Improving support for locators

Cites can include locators to specify a location within a larger work, e.g.
“(Doe et al. 2000, pp. 14-24)”. CSL 1.0 has a defined set of locator terms,
such as “volume”, “chapter”, “page”, etc. (see
http://citationstyles.org/downloads/specification.html#locators for the
full list). With the citeproc-js JSON input format, the locator information
of a cite is stored in two fields: a) “label”, which specifies the locator
term, e.g. “page”, and b) “locator”, which stores the (numeric) locator
value, e.g. “14-24”. See
https://bitbucket.org/bdarcus/citeproc-test/src/6fda4656cf9d/processor-tests/humans/locator_SimpleLocators.txt#cl-61
for an example.
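
For illustration, the two-field model can be sketched in Python. The `SHORT_TERMS` table and the `render_locator` function are hypothetical stand-ins, not part of citeproc-js; a real processor would draw the terms from the CSL locale, and the plural test here is only a crude assumption.

```python
# Hypothetical (singular, plural) short forms; real styles read these from a CSL locale.
SHORT_TERMS = {
    "page": ("p.", "pp."),
    "chapter": ("chap.", "chaps."),
    "volume": ("vol.", "vols."),
}


def render_locator(cite):
    """Render a cite's locator from its "label" and "locator" fields."""
    singular, plural = SHORT_TERMS[cite["label"]]
    value = cite["locator"]
    # Assumption: a range or a list of values calls for the plural term.
    is_plural = any(sep in value for sep in ("-", ",", "&"))
    return f"{plural if is_plural else singular} {value}"


cite = {"id": "ITEM-1", "label": "page", "locator": "14-24"}
print(render_locator(cite))
```

With the cite above, this prints `pp. 14-24`.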

While this approach works quite well, there have been several user requests
to expand the number of locator terms. We could just expand the current
set, although it might be hard to cover all the desired locator terms, and
having a long list would increase UI clutter. More problematic are
hierarchical locators (e.g., “Act I, scene i, lines 12-23”).

One of the suggestions that has come up is to keep the current list of
locator terms mostly untouched, and add an “empty” locator term. If this
locator type is selected, both the locator descriptions and values would be
stored in the “locator” field of the citeproc input JSON, e.g.
{ "label": "none", "locator": "Act I, scene i, lines 12-23" }.

Does anybody have an opinion about whether this is an acceptable way to
improve the support for locators?

Rintze

P.S. The related GitHub issue can be found at

I’m strongly in favor of this.

Sounds like a good way to fall back on any custom scheme for users.

Hi,

Sounds good here too.

Minor issue:

{ "label": "none", "locator": "Act I, scene i, lines 12-23" }

First point is this is easier as just:

{ "locator": "Act I, scene i, lines 12-23" }

I.e., just make the label key optional.
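
As a rough sketch of how a processor might treat the two spellings alike, the Python below passes free text through whenever the label is “none” or absent. The function name and the `terms` table are made up for illustration, and treating a missing key like “none” is an assumption drawn from this suggestion, not existing citeproc-js behaviour.

```python
def render_cite_locator(cite, terms):
    """Return the locator string, passing free text through when no label applies."""
    # Assumption: a missing "label" key means the same as the "none" label.
    label = cite.get("label", "none")
    value = cite["locator"]
    if label == "none":
        # Free-text locator: the user already wrote the descriptions.
        return value
    return f"{terms[label]} {value}"


terms = {"page": "p.", "chapter": "chap."}  # hypothetical term table
print(render_cite_locator({"label": "none", "locator": "Act I, scene i, lines 12-23"}, terms))
print(render_cite_locator({"locator": "Act I, scene i, lines 12-23"}, terms))
print(render_cite_locator({"label": "page", "locator": "14"}, terms))
```

The first two calls print the free text unchanged; the third prints `p. 14`.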

The second point is more about the substance of the proposal. If I
understand right, we have two, completely orthogonal, issues here.

  1. custom locators (what to do about point locator labels we haven’t itemized?)

  2. multiple locators

This proposal (as represented in the example above; it doesn’t seem
to be formalized as such) attempts to solve both problems at once,
with the upshot that the second problem requires a free text value.

For sake of argument, why not split these issues, so that you have a
point locator defined as a list of key values; something like this?

[
  { "act": "1" },
  { "scene": "3" },
  { "value": "foo 5" }
]
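
To make the list-of-key-values idea concrete, here is a minimal Python sketch of how such a list might be rendered. The function name, the `terms` table, and the choice of “, ” as a joiner are all assumptions for illustration; nothing here is specified by CSL.

```python
def render_point_locator(parts, terms):
    """Join a list of single-key dicts into one locator string."""
    rendered = []
    for part in parts:
        # Each dict is assumed to hold exactly one key/value pair.
        key, value = next(iter(part.items()))
        if key == "value":
            # "value" marks free text that carries no label.
            rendered.append(value)
        else:
            # Fall back to the raw key when the style has no term for it.
            rendered.append(f"{terms.get(key, key)} {value}")
    return ", ".join(rendered)


locator = [{"act": "1"}, {"scene": "3"}, {"value": "foo 5"}]
terms = {"act": "act", "scene": "sc."}  # hypothetical term table
print(render_point_locator(locator, terms))
```

For the list above this prints `act 1, sc. 3, foo 5`.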

Bruce

How would this work in a user interface?

Is that really our concern?

Seems to me we want a good, stable, spec that leaves room for
different sorts of UI approaches.

It might be, for example, that some apps (like Zotero) just have a
simple field and parse that field as a set of comma-separated
values.

It might be that other apps treat it as more structured data.
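
A simple-field approach of this kind might look like the Python sketch below. The `KNOWN_LABELS` set and the parsing rule (first word is the label, the rest is the value) are assumptions made for the example, not a description of what Zotero actually does.

```python
# Hypothetical set of labels the parser recognizes.
KNOWN_LABELS = {"act", "scene", "line", "lines", "page", "chapter"}


def parse_locator_field(text):
    """Split a comma-separated field into a list of single-key dicts."""
    parts = []
    for chunk in text.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        head, _, rest = chunk.partition(" ")
        if head.lower() in KNOWN_LABELS and rest:
            parts.append({head.lower(): rest.strip()})
        else:
            # Unrecognized leading word: keep the whole chunk as free text.
            parts.append({"value": chunk})
    return parts


print(parse_locator_field("act 1, scene 3, foo 5"))
```

Here “foo” is not a known label, so the last chunk is kept whole under the free-text key `value`.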

I guess I just think that if we’re going to go this way, we had better
define the model (is a point locator a single value, or is it a list
of key-values) so that it’s clear, and that we probably shouldn’t be
changing it later.

Bruce

I adopted an approach like this in CSL for Law. Some specialised
problems emerged, and it would be good to have them taken into account
at this point. We hold individual provisions in Zotero as separate
items (having one item for an entire statute is not useful, but if we
are able to link and comment on individual provisions, it is very
useful). One CSL item field (section) is reserved for locator
information. This consists of (optional) label strings (sec., p. etc)
and accompanying locator strings. There can be several sets of these
in the field, each separately evaluated for pluralization,
numeric/non-numeric status, and so forth.

There were a number of tricky issues, but one of the most difficult
was working out how an actual, user-supplied locator field and label
should interact with the leading, structured portion supplied via the
“section” field. For example, suppose an entry for 23 USC 253(a),
where the section field is set as “sec. 253(a)”. Now suppose this
provision is cited at 253(a)(i), with “(i)” supplied via the user
locator. Alternatively, suppose it is cited at 23 USC 253(a) para. 2,
where the “para.” element might come from the pull-down menu, or be
supplied as an abbreviated term at the start of the locator field.

There are also problems with multiple locators, such as “23 USC 253(a) & 264”.

I was able to make things work pretty well (I think), but the
experience suggested to me that mixing two structures in input
(pull-down labels for the initial element, possibly overridden by a
leading in-field label like “sec.”, and embedded in-field labels for
the rest) was a headache. It will be easier to just use embedded
in-field label abbreviations everywhere and dispense with the
pull-down label altogether in the UI. The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting. We need to localize the label to the style (so “sec.” or
“section” in some styles, and a section symbol in others), but
preserve user-supplied connecting punctuation. It’s a messy
half-structure, but that’s how things are referenced, and I concluded
that there isn’t much that can be done for further discipline.
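
The “half-structure” described here, with labels localized and everything else passed through, can be sketched in Python as follows. The style names, the `STYLE_TERMS` table, and the regular expression are hypothetical; this is not the CSL for Law implementation, only an illustration of the idea.

```python
import re

# Hypothetical per-style renderings of in-field label abbreviations.
STYLE_TERMS = {
    "abbreviated": {"sec.": "sec.", "para.": "para."},
    "spelled-out": {"sec.": "section", "para.": "paragraph"},
    "symbol": {"sec.": "§", "para.": "¶"},
}

# Match a label only when it stands alone and is followed by whitespace.
LABEL_RE = re.compile(r"(?<!\S)(sec\.|para\.)(?=\s)")


def localize_labels(field, style):
    """Swap embedded labels for the style's form; leave everything else untouched."""
    terms = STYLE_TERMS[style]
    return LABEL_RE.sub(lambda m: terms[m.group(1)], field)


print(localize_labels("sec. 253(a) & sec. 264", "symbol"))
print(localize_labels("sec. 253(a) para. 2", "spelled-out"))
```

The first call prints `§ 253(a) & § 264` and the second `section 253(a) paragraph 2`; the user-supplied “&” and the parenthesized subsections survive unchanged.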

Frank

I guess I just think that if we’re going to go this way, we had better
define the model (is a point locator a single value, or is it a list
of key-values) so that it’s clear, and that we probably shouldn’t be
changing it later.

Just emphasizing here this is my key point. But …

There were a number of tricky issues, but one of the most difficult
was working out how an actual, user-supplied locator field and label
should interact with the leading, structured portion supplied via the
“section” field. For example, suppose an entry for 23 USC 253(a),
where the section field is set as “sec. 253(a)”. Now suppose this
provision is cited at 253(a)(i), with “(i)” supplied via the user
locator. Alternatively, suppose it is cited at 23 USC 253(a) para. 2,
where the “para.” element might come from the pull-down menu, or be
supplied as an abbreviated term at the start of the locator field.

There are also problems with multiple locators, such as “23 USC 253(a) & 264”.

Right. While I don’t understand all the intricacies of your case, the
point is my suggestion here does not mean these fields are free text;
they are comma-separated lists of structured key values, where
optionally a key is empty and its value is free text.

So users would have to account for that presumably. I can’t imagine
any other way.

I was able to make things work pretty well (I think), but the
experience suggested to me that mixing two structures in input
(pull-down labels for the initial element, possibly overridden by a
leading in-field label like “sec.”, and embedded in-field labels for
the rest) was a headache. It will be easier to just use embedded
in-field label abbreviations everywhere and dispense with the
pull-down label altogether in the UI.

OK. Sounds like we agree.

The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting.

But that presumes we all accept the requirement that users should be
able to do that. I don’t (but my opinion is just one).

We need to localize the label to the style (so “sec.” or
“section” in some styles, and a section symbol in others), but
preserve user-supplied connecting punctuation.

Given this is the key problem now in moving forward with a concrete
proposal, can you expand on why you say this is a requirement?

Consider how Zotero deals with dates; why not something like that here?

Bruce

The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting.

But that presumes we all accept the requirement that users should be
able to do that. I don’t (but my opinion is just one).

Accept which requirement, exactly? That users should be able to create
complex cites such as “23 USC 253(a) & 264”?

We need to localize the label to the style (so “sec.” or
“section” in some styles, and a section symbol in others), but
preserve user-supplied connecting punctuation.

Given this is the key problem now in moving forward with a concrete
proposal, can you expand on why you say this is a requirement?

Consider how Zotero deals with dates; why not something like that here?

Do you suggest using a dedicated cs:locator rendering element, akin to
cs:dates? How would that work?

Rintze

The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting.

But that presumes we all accept the requirement that users should be
able to do that. I don’t (but my opinion is just one).

Accept which requirement, exactly? That users should be able to create
complex cites such as “23 USC 253(a) & 264”?

Should be able to enter that data as is and get useful citations out
the other end. I’m talking about data input here; something not per se
the domain of CSL.

We need to localize the label to the style (so “sec.” or
“section” in some styles, and a section symbol in others), but
preserve user-supplied connecting punctuation.

Given this is the key problem now in moving forward with a concrete
proposal, can you expand on why you say this is a requirement?

Consider how Zotero deals with dates; why not something like that here?

Do you suggest using a dedicated cs:locator rendering element, akin to
cs:dates? How would that work?

I don’t know ATM. I still am not clear on the use cases (and in
particular legal eccentricities).

I also think we need to clarify that this discussion is not about what
we might call “reference locators” (identifiers that locate a
reference source within some larger containing entity), but rather
“point locators” (identifiers that locate a specific fragment of
content within a reference source). Right?

Bruce

The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting.

But that presumes we all accept the requirement that users should be
able to do that. I don’t (but my opinion is just one).

Accept which requirement, exactly? That users should be able to create
complex cites such as “23 USC 253(a) & 264”?

Should be able to enter that data as is and get useful citations out
the other end. I’m talking about data input here; something not per se
the domain of CSL.

Expressed as input in the scheme I described, the locator portion of
the bogus example “23 USC §§ 253(a) & 264” might be broken down like
this:

{
  "section": [
    { "section": "253" }
  ],
  "locator": [
    { "none": "(a) &" },
    { "section": "264" }
  ]
}

Where the “section” variable is drawn from the persistent Item, and
the “locator” variable is drawn from the supplementary item data set
for this citation. As you can see, the “none” locator label has a role
to play when the Item specifier is supplemented in the citation to
refer to a smaller subunit of the target resource.
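
One way to read this breakdown is sketched below in Python: the Item's “section” list and the citation's “locator” list are concatenated, the number of labelled sections picks the singular or plural symbol, and “none” fragments are attached verbatim. The function name and the spacing rules are assumptions made for the example, not a description of the actual CSL for Law code.

```python
def render_statute_locator(data, symbol=("§", "§§")):
    """Merge the Item's "section" list with the citation's "locator" list."""
    parts = data["section"] + data["locator"]
    # Count labelled sections to choose between the singular and plural symbol.
    count = sum(1 for part in parts if "section" in part)
    out = symbol[1] if count > 1 else symbol[0]
    for part in parts:
        key, value = next(iter(part.items()))
        if key == "none":
            # Free text (e.g. "(a) &") attaches directly to the preceding value.
            out += value
        else:
            # Labelled values are separated from what precedes them by a space.
            out += " " + value
    return out


data = {
    "section": [{"section": "253"}],
    "locator": [{"none": "(a) &"}, {"section": "264"}],
}
print(render_statute_locator(data))
```

For the input above this prints `§§ 253(a) & 264`.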

We need to localize the label to the style (so “sec.” or
“section” in some styles, and a section symbol in others), but
preserve user-supplied connecting punctuation.

Given this is the key problem now in moving forward with a concrete
proposal, can you expand on why you say this is a requirement?

Consider how Zotero deals with dates; why not something like that here?

Do you suggest using a dedicated cs:locator rendering element, akin to
cs:dates? How would that work?

I don’t know ATM. I still am not clear on the use cases (and in
particular legal eccentricities).

As background, Thomas Bruce has a useful series of posts on
legislative identifiers and Linked Data.

blog.law.cornell.edu/metasausage/2012/05/07/identifiers-part-1/

I also think we need to clarify that this discussion is not about what
we might call “reference locators” (identifiers that locate a
reference source within some larger containing entity), but rather
“point locators” (identifiers that locate a specific fragment of
content within a reference source). Right?

For statutes and other large documents/archives with a nested
structure, the boundary between the two gets fuzzy.

The use case shown above (with
“&”) also shows that a strict key/value representation will be too
limiting.

But that presumes we all accept the requirement that users should be
able to do that. I don’t (but my opinion is just one).

Accept which requirement, exactly? That users should be able to create
complex cites such as “23 USC 253(a) & 264”?

Should be able to enter that data as is and get useful citations out
the other end. I’m talking about data input here; something not per se
the domain of CSL.

Expressed as input in the scheme I described, the locator portion of
the bogus example “23 USC §§ 253(a) & 264” might be broken down like
this:

{
  "section": [
    { "section": "253" }
  ],
  "locator": [
    { "none": "(a) &" },
    { "section": "264" }
  ]
}

Where the “section” variable is drawn from the persistent Item, and
the “locator” variable is drawn from the supplementary item data set
for this citation. As you can see, the “none” locator label has a role
to play when the Item specifier is supplemented in the citation to
refer to a smaller subunit of the target resource.

But what does this all mean; in particular the “253(a) & 264” bit?

To break it fully down (consider this legal citations for dummies, of
which I am one):

23 = volume (?)
USC = container-title (e.g. it’s an abbreviation for the code, which
is a periodical)
§§ = ??
253 = section (of the volume?)
(a) = ?? (is this a subsection of “253”, and therefore a point locator?)
& = (what it seems?)
264 = section (also of the volume?)

I also think we need to clarify that this discussion is not about what
we might call “reference locators” (identifiers that locate a
reference source within some larger containing entity), but rather
“point locators” (identifiers that locate a specific fragment of
content within a reference source). Right?

For statutes and other large documents/archives with a nested
structure, the boundary between the two gets fuzzy.

OK. Perhaps the example above can help us understand the fuzziness?

Bruce

But what does this all mean; in particular the “253(a) & 264” bit?

To break it fully down (consider this legal citations for dummies, of
which I am one):

23 = volume (?)
USC = container-title (e.g. it’s an abbreviation for the code, which
is a periodical)
§§ = ??
253 = section (of the volume?)
(a) = ?? (is this a subsection of “253”, and therefore a point locator?)
& = (what it seems?)
264 = section (also of the volume?)

http://www.law.cornell.edu/citation/2-300.htm

Basic Legal Citation

That’s awesome!

But it doesn’t completely break it down for me. Am I right to assume that:

§§ = “sections” (plural)

If that’s the case, then why isn’t the input …

{ "section": ["253(a)", 264] }

… (where I’m assuming the ‘(a)’ bit is just a subsection of 253)?

Finally, is this a point locator, or a reference locator? Or both?

Bruce