Input: page field

From comments posted by Brecht Machiels.

Brecht writes:

  • Which values are allowed for the “page” input field? I see multiple
    ranges can also be specified. I think the CSL spec should, in general,
    also define the format of the input fields. Personally, I would opt for a
    structured format (like the date fields) as opposed to a string-format
    (the page field). Individual CSL processors can still convert a
    string-formatted field to the structured data. This would require changes
    to the tests.

There is a similar issue with the locator field (and, in MLZ/CSL-m,
the section field on legal item types). Configured for MLZ,
citeproc-js currently parses the content of all three (page,
section, locator) in order to extract label overrides and embedded
labels, to combine locator and section (a hard-coded pinpoint
available on things like statute items) where appropriate, and to
work out whether the top-level label should be pluralized.

The logic works, and it addresses some show-stopping issues affecting
legal resources (namely label overrides and embedded labels), but it is
completely off-specification.

If there is a move to specify structured input for these fields, I can
provide use cases from the legal side. It would be good to have them
covered in the specification, although it would take a fair amount of
work to pin the behaviour down.

Frank

Hello,

It’s these kinds of things in the test suite, which I run into now and
then, that feel like black magic. Since it’s not part of the spec, it’s
hard to support that behavior. For now, I’m focusing on supporting what’s
described in the spec.

I understand that the behavior is hard to describe due to its very nature.
I believe the first step is to define a clear structured input data format.

I assume that in citeproc-js, you are parsing the string input data into
some kind of structured format? This could serve as a starting point.

The page field could be a list of page ranges, where a "page range"
doesn’t necessarily need to have an end page specified (to indicate a
single page).

page: [
  ['ii', 'vi'],
  [5],
  [8, 9]
]
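
As a rough sketch of how a processor might convert a string-formatted
page field into such a list of ranges: the names parse_pages and
_page_value below are hypothetical, and the input conventions (commas
between ranges, a hyphen or en dash inside a range) are assumptions made
for illustration, not something the CSL spec defines.

```python
import re

# Hypothetical helper, not taken from any CSL processor.
# Assumption: ranges are separated by commas, and the two ends of a
# range are separated by one or more hyphens or an en dash.
_RANGE_SEP = re.compile(r"\s*[-\u2013]+\s*")


def _page_value(token):
    """Return an int for plain decimal page numbers, else the string (e.g. 'ii')."""
    return int(token) if token.isdecimal() else token


def parse_pages(field):
    """Convert a string such as 'ii-vi, 5, 8-9' into a list of page ranges.

    Each range is a list of one element (a single page) or two elements
    (first and last page).
    """
    ranges = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty segments such as a trailing comma
        tokens = [t for t in _RANGE_SEP.split(part, maxsplit=1) if t]
        ranges.append([_page_value(t) for t in tokens])
    return ranges


if __name__ == "__main__":
    print(parse_pages("ii-vi, 5, 8\u20139"))
```

This ignores embedded labels and page identifiers that themselves
contain hyphens, so it would not cover the legal use cases mentioned
above.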

Does this cover all use cases?

Regards,
Brecht

Hello,

A while ago we started a list of observations that should be considered
when standardizing the input format. Perhaps you could add the page field
there as well for future reference:

Sylvester