page number collapsing

So do we want to support this?

http://forums.zotero.org/discussion/7185?page=1

If we do, it probably suggests a global (or maybe citation or bib)
option, something like “page-number-collapse”, with a value that names
the algorithm to use (the only one I implemented in my XSLT code was
Chicago’s).

Bruce

A big yes. But why not use a number attribute instead of an option? That
seems a bit more flexible.

Rintze

Makes sense.

So I guess the RNC code would be something like:

    # to be optional on cs:number
    attribute collapse-range { "chicago" }

We’d thus want to document the chicago algorithm (and any others we’d include).

Of course, we also have to figure out that other problem that Frank
noted with locators.

Bruce
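
To make the "document the chicago algorithm" point concrete, here is a minimal Python sketch of one reading of the Chicago abbreviation table: all digits when the first number is below 100 or a multiple of 100, only the changed part for 101-109 and the like, and at least two digits otherwise. The name `collapse_chicago` is made up for illustration, and editions of the manual differ on details, so treat the rules encoded here as an assumption and not as the documented algorithm.

```python
def collapse_chicago(first: int, last: int) -> str:
    """Abbreviate the end of an integer page range (assumed Chicago rules)."""
    if first <= 0 or last < first:
        raise ValueError("expected 0 < first <= last")
    if first == last:
        return str(first)
    s1, s2 = str(first), str(last)
    # Below 100, at a multiple of 100, or when the digit counts differ:
    # keep every digit of the second number.
    if first < 100 or first % 100 == 0 or len(s1) != len(s2):
        return f"{s1}-{s2}"
    # Count the leading digits the two numbers share.
    shared = 0
    while s1[shared] == s2[shared]:
        shared += 1
    changed = s2[shared:]
    if first % 100 < 10:
        # 101-109, 201-209, ...: keep the changed part only.
        end = changed
    else:
        # 110-199, 210-299, ...: keep at least two digits.
        end = s2[-max(2, len(changed)):]
    return f"{s1}-{end}"
```

With these rules, `collapse_chicago(321, 328)` gives `321-28` and `collapse_chicago(1496, 1500)` gives `1496-500`, while `collapse_chicago(71, 72)` is left as `71-72`.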

I guess there also should be an (explicit) option so that a page range gets
expanded if it is supplied in a collapsed form.

Rintze

Blah.

Maybe …

Bruce

citeproc-js has a range mechanism (used for citation-number and
year-suffix collapsing) that could be extended to support this, but it
requires clean integers and formatting hints to work from. The
problem is how to get the data into structured form without adding
significantly to the hassle of data entry, and I think that’s going to
be a real problem, unfortunately. It’s date parsing on steroids, with
roman numerals (pages xi-xxv or XI-XXV), prefixed sequence numbers
(sections N23-N25), and combined locators with labels (ch. 3, pp. 3-7,
chs. 4-9). There’s very little structure to work from.

I agree that it would be really nice to have this, for a bunch of
reasons (I can imagine a world where clicking on pages while taking
notes on a document sets the correct locators for a cite tied to a
note), but the UI challenges would be formidable.

Frank

I’m not sure it’s that big a problem. The practical use case for this
is page numbers for the source; not so much locators. In most apps,
this will be a single field (as in Zotero), or maybe even two.

Bruce

Even for simple page numbers you’d need to do something to handle
upper and lowercased roman numerals, I suppose, so the application
would need to cope with that and deliver a number and a hint. Apart
from that, so long as all that is needed is range collapsing against a
list of numbers, it’s no real problem at the implementation end.

But doesn’t the distinction between locators and page specifiers begin
to evaporate with the introduction of hierarchical relations?
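
As an illustration of what "range collapsing against a list of numbers" involves, a minimal Python sketch follows. It is not the citeproc-js mechanism; the function name `collapse_numbers` and the choice to collapse runs of two or more are assumptions made for the example.

```python
from typing import Iterable


def collapse_numbers(numbers: Iterable[int]) -> str:
    """Turn a list of clean integers into comma-separated ranges."""
    runs = []  # each run is a [start, end] pair of consecutive integers
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n  # extend the current run
        else:
            runs.append([n, n])  # start a new run
    return ", ".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in runs
    )
```

For example, `collapse_numbers([1, 2, 3, 5, 7, 8])` yields `1-3, 5, 7-8`. The point of the sketch is that the logic is trivial once the input is already a list of integers; the hard part is getting there from free text.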

> Even for simple page numbers you’d need to do something to handle
> upper and lowercased roman numerals, I suppose, so the application
> would need to cope with that and deliver a number and a hint. Apart
> from that, so long as all that is needed is range collapsing against a
> list of numbers, it’s no real problem at the implementation end.

Well, and we can also define what’s allowed. For example, we can start
by saying only page numbers can get collapsed, and only if the input
is an integer range.

> But doesn’t the distinction between locators and page specifiers begin
> to evaporate with the introduction of hierarchical relations?

No. By “locators” here I meant the details of the citation, not
the source.

Bruce
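
The "number and a hint" idea for roman numerals could look roughly like the Python sketch below. The function name `parse_page` and the hint strings (`arabic`, `roman-lower`, `roman-upper`) are hypothetical, chosen only for illustration; anything that is neither a plain integer nor a well-formed roman numeral is rejected, in line with the suggestion to restrict what is allowed.

```python
import re
from typing import Optional, Tuple

# Well-formed roman numerals up to 3999, matched in upper case.
_ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_page(text: str) -> Optional[Tuple[int, str]]:
    """Return (number, hint) for a single page token, or None."""
    text = text.strip()
    if re.fullmatch(r"[0-9]+", text):
        return int(text), "arabic"
    # Require consistent case so that mixed forms like "Xi" are rejected.
    if text and (text.islower() or text.isupper()) and _ROMAN.fullmatch(text.upper()):
        upper = text.upper()
        total = 0
        # A symbol is subtracted when the next symbol is larger (IV, IX, ...).
        for ch, nxt in zip(upper, upper[1:] + " "):
            value = _VALUES[ch]
            total += -value if _VALUES.get(nxt, 0) > value else value
        return total, "roman-lower" if text.islower() else "roman-upper"
    return None
```

Here `parse_page("xi")` returns `(11, "roman-lower")` and `parse_page("XXV")` returns `(25, "roman-upper")`, so a processor could collapse the numbers and then re-render them using the hint. A prefixed value such as `N23` returns `None` and would be passed through untouched.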

FWIW, I was annoyed by PubMed serving collapsed page ranges in its XML, so I
wrote a bit of translator code some time ago to expand page ranges. I think
it already should handle your last two cases: prefixed sequence numbers,
which are becoming common with electronic-only journals (e.g. E53-E56), and
multiple number ranges in a single string (also important for non-continuous
page ranges). Roman numerals shouldn’t be much of a problem either, as long
as you (reliably) can use hyphens as indicators that a range is present in
the string.

(original patch)


(plus a minor bug-fix)

Rintze

On Wed, May 27, 2009 at 12:40 AM, Frank Bennett <@Frank_Bennett> wrote:

> It’s date parsing on steroids, with
> roman numerals (pages xi-xxv or XI-XXV), prefixed sequence numbers
> (sections N23-N25), and combined locators with labels (ch. 3, pp. 3-7,
> chs. 4-9). There’s very little structure to work from.

Great stuff! I’m not against supporting parsed ranges per se; I just
think that maybe the CSL processor isn’t the best place for string
parsing code to reside.
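
For readers without access to the patch, here is a rough Python sketch of the kind of pre-processing described above: expanding collapsed ranges, including prefixed sequence numbers and several ranges in one string, before the data reaches the CSL processor. It is not Rintze's translator code. The name `expand_page_string` and the assumptions that ranges are separated by commas and marked by a hyphen or en dash are illustrative only.

```python
import re

# Optional letter prefix + digits, optionally followed by a range end.
_PART = re.compile(r"([A-Za-z]*)(\d+)(?:\s*[-\u2013]\s*([A-Za-z]*)(\d+))?")


def expand_page_string(pages: str) -> str:
    """Expand collapsed ranges such as 'E53-6, 101-8' (assumed input format)."""
    out = []
    for part in pages.split(","):
        part = part.strip()
        m = _PART.fullmatch(part)
        if m is None or m.group(4) is None:
            out.append(part)  # not a recognised range; pass through
            continue
        prefix, start, end_prefix, end = m.groups()
        if end_prefix and end_prefix != prefix:
            out.append(part)  # mismatched prefixes; do not guess
            continue
        if len(end) < len(start):
            # Borrow the missing leading digits from the start number.
            end = start[: len(start) - len(end)] + end
        out.append(f"{prefix}{start}-{prefix}{end}")
    return ", ".join(out)
```

With these assumptions, `expand_page_string("E53-6, 101-8, xi-xxv")` returns `E53-E56, 101-108, xi-xxv`: the prefixed and plain ranges are expanded, and the roman-numeral range is passed through unchanged.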