Chicago range collapsing

Rintze asked me about this, and I see Frank is putting it on the
radar. Here’s (old) XSLT code that correctly does Chicago range
collapsing. Starting on line 39 …

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-xsl/trunk/lib/main/functions.xsl?view=markup

Note, though, that it only works if the beginning and end of the range are both integers.

Note also that this solution was suggested by David Carlisle, who’s a math
guy. There are other, perhaps more standard, ways to do the same
thing, I guess.

Bruce

Translated to Python:

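# Python 3's operator module has no idiv, so floordiv is aliased to keep the names.
from operator import floordiv as idiv, mod

def collapse(begin, end):
    # Collapse only when both pages share the same hundred
    # and begin is not a round hundred.
    if begin > 100 and mod(begin, 100) and idiv(begin, 100) == idiv(end, 100):
        return str(begin) + '-' + str(mod(end, 100))
    else:
        return str(begin) + '-' + str(end)

>>> collapse(101, 108)
'101-8'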

… though I’d use an en dash for the separator (if I could figure out
how to enter it in Linux!).

So when I implement this, I’ll obviously add a third parameter for the
algorithm.

Bruce

Nice! Deliciously concise, and almost right. Collapsing fails with
this code when three or more digits change, which one of the rules
requires. Depending on whose account of the Chicago collapsing rules
we follow, either one or three of the examples in common circulation
fail. (See the comment in the attached self-contained Python test file.)

Frank

TESTNUM.py (1.34 KB)
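
To make the failure concrete, here is a small check of the two-digit collapse from earlier in the thread, written for current Python. The page numbers are illustrative assumptions (11564 to 11615 changes three digits in a five-digit number), and `collapse` is re-declared with `//` and `%` only so the snippet runs on its own; it is not the attached test file.

```python
def collapse(begin, end):
    # Same logic as the translation above, using // and % directly.
    if begin > 100 and begin % 100 and begin // 100 == end // 100:
        return str(begin) + '-' + str(end % 100)
    return str(begin) + '-' + str(end)

# Two changed digits within the same hundred: collapsed as intended.
print(collapse(321, 328))      # 321-28
# Three changed digits: the hundreds differ, so nothing is collapsed at all.
print(collapse(11564, 11615))  # 11564-11615
```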

I’m not so sure. According to the latest CMS, 9.64 (the actual source):

“if three digits change in a four digit number, use all four.”

It doesn’t address five-digit numbers, but I presume the same behavior applies.

Been a long day, but I think the above suggests it’s correct?

Bruce

There is a five-digit number with three digits changed in both of the
linked guides, and it’s shown collapsed in both.

There’s that one case left, but it’s easy to fix with one more
conditional in the algorithm. We’re nearly there. I’ve written to
Elena for guidance.

Frank

OK, I see that example in CMS. The language is rather vague.

Well, it seems the case is technically valid, but surely quite
uncommon. Still, if someone could figure out how to tweak my algorithm
to cover it (since I’m not sure I can; never was good at math), that’d
be nice to have.

Bruce

Me neither! It looks more like a string operation, though. The
attached code doubles the number of code lines, but it does the right
thing by CMS. Elena has confirmed the rules, so I think we’re ready to
slot this in.

Frank

TESTNUM.py (1.75 KB)
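
For readers without the attachment, the string-based idea can be sketched roughly as follows. This is not the attached TESTNUM.py; the function name `collapse_chicago` is hypothetical. The rules encoded are the ones mentioned in this thread (keep two digits within the same hundred, print all four digits when three change in a four-digit number, still collapse a five-digit number when three digits change), plus my reading that starts ending in 01 to 09 keep only the changed part. Treat those rules as assumptions to check against CMS.

```python
EN_DASH = '\u2013'

def collapse_chicago(begin, end):
    b, e = str(begin), str(end)
    # Not abbreviated: starts below 100, round hundreds, non-increasing
    # ranges, or an end with a different number of digits.
    if begin < 100 or begin % 100 == 0 or end <= begin or len(b) != len(e):
        return b + EN_DASH + e
    # Count the leading digits shared by both numbers (end > begin and equal
    # lengths guarantee a difference, so the loop stops in bounds).
    common = 0
    while b[common] == e[common]:
        common += 1
    changed = len(b) - common
    # Rule quoted in the thread: three changed digits in a four-digit number.
    if len(b) == 4 and changed >= 3:
        return b + EN_DASH + e
    if begin % 100 < 10:
        tail = e[common:]             # assumption: changed part only
    else:
        tail = e[-max(2, changed):]   # at least two digits, more if needed
    return b + EN_DASH + tail

print(collapse_chicago(321, 328))      # 321–28
print(collapse_chicago(1496, 1504))    # 1496–1504
print(collapse_chicago(11564, 11615))  # 11564–615
```

Working on the digit strings keeps leading zeros in the tail, which is harder to get right with `mod` alone.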

Could this be easily expanded to support discontinuous page-ranges like
"102,122-27"? I’m not sure how common these are, but it would just require a
search pattern for (numeric)(dash)(numeric), I think.

Rintze

On Sun, Aug 23, 2009 at 3:53 AM, Frank Bennett <@Frank_Bennett> wrote:

Yes, we’ll scan for range-like strings when processing the page field.

A couple of further items to confirm on this:

(a) I would like to assume that page numbers in the page field will
always be fully expanded in the data. This will allow stricter
validation, so that things like 23-4 (where the reference is to a
single page or location identified by the string “23-4”) can pass
through unmolested. Is this acceptable?

(b) I am inclined to assume that page numbers with arbitrary prefixes
(like N315-N333) should always be printed fully expanded. It seems to
me that collapsing page numbers that have extra clutter in them might
be confusing or even misleading. Is that a safe assumption?

Frank

Could this be easily expanded to support discontinuous page-ranges like
"102,122-27"? I’m not sure how common these are, but it would just require a
search pattern for (numeric)(dash)(numeric), I think.

Yes, we’ll scan for range-like strings when processing the page field.

I suggest someone write the spec language that explains how one
determines “range-like string”, perhaps even include a regular
expression.

A couple of further items to confirm on this:

(a) I would like to assume that page numbers in the page field will
always be fully expanded in the data. This will allow stricter
validation, so that things like 23-4 (where the reference is to a
single page or location identified by the string “23-4”) can pass
through unmolested. Is this acceptable?

To me, yes.

(b) I am inclined to assume that page numbers with arbitrary prefixes
(like N315-N333) should always be printed fully expanded. It seems to
me that collapsing page numbers that have extra clutter in them might
be confusing or even misleading. Is that a safe assumption?

I think it is.

Going back to my point above, we should probably formalize this in the
language of the spec.

Bruce
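
As a starting point for that spec language, here is one possible sketch of scanning a page field for "range-like" strings. The pattern `RANGE_RE` and the helper `collapse_page_field` are hypothetical names, not anything agreed in this thread. The sketch follows assumptions (a) and (b) above: only bare, fully expanded integer pairs are collapsed, while strings such as 23-4 or N315-N333 pass through untouched. The two-digit collapse stands in for whichever algorithm is finally chosen.

```python
import re

# Hypothetical pattern: two bare integers joined by a hyphen or en dash,
# with no letters, digits, or further dashes touching either side.
RANGE_RE = re.compile(r'(?<![\w-])(\d+)\s*[-–]\s*(\d+)(?![\w-])')

def collapse(begin, end):
    # Two-digit collapse from earlier in the thread, with an en dash.
    if begin > 100 and begin % 100 and begin // 100 == end // 100:
        return f'{begin}–{end % 100}'
    return f'{begin}–{end}'

def collapse_page_field(pages):
    def repl(match):
        first, last = match.group(1), match.group(2)
        begin, end = int(first), int(last)
        # Leave alone anything that is not a fully expanded, increasing
        # range (e.g. "23-4") or that carries leading zeros.
        if end <= begin or str(begin) != first or str(end) != last:
            return match.group(0)
        return collapse(begin, end)
    return RANGE_RE.sub(repl, pages)

print(collapse_page_field('102,122-127'))  # 102,122–27
print(collapse_page_field('N315-N333'))    # N315-N333
print(collapse_page_field('23-4'))         # 23-4
```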

Update on this …

I simplified Frank’s code to this:

if algorithm == 'chicago':
    if begin > 100 and mod(begin, 100) and idiv(begin, 100) == idiv(end, 10$
        return(str(begin) + '–' + str(mod(end, 100)))
    elif begin >= 10000:
        return(str(begin) + '–' + str(mod(end, 1000)))
    else:
        return(str(begin) + '–' + str(end))

The elif statement replaces a number of lines of other code, and the tests pass.

Bruce

Oops …

On Wed, Sep 30, 2009 at 11:29 AM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

   if begin > 100 and mod(begin, 100) and idiv(begin, 100) == idiv(end, 10$

if begin > 100 and mod(begin, 100) and idiv(begin, 100) == idiv(end, 100):

Bruce
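
Putting the corrected condition back into the fragment gives something that can be run as is. This is a sketch: wrapping it in a `collapse` function with an `algorithm` parameter and falling back to the fully expanded form for other algorithms are assumptions, and `floordiv` is aliased to `idiv` because Python 3's `operator` module no longer provides `idiv`.

```python
from operator import floordiv as idiv, mod

def collapse(begin, end, algorithm='chicago'):
    if algorithm == 'chicago':
        if begin > 100 and mod(begin, 100) and idiv(begin, 100) == idiv(end, 100):
            # Same hundred: keep the last two digits of the end page.
            return str(begin) + '–' + str(mod(end, 100))
        elif begin >= 10000:
            # Five digits or more: keep the last three digits of the end page.
            return str(begin) + '–' + str(mod(end, 1000))
    # Assumption: everything else is printed fully expanded.
    return str(begin) + '–' + str(end)

print(collapse(101, 108))      # 101–8
print(collapse(1496, 1504))    # 1496–1504
print(collapse(11564, 11615))  # 11564–615
```

One thing worth checking against the tests: `mod(end, 1000)` keeps at most three digits and drops leading zeros, so `collapse(12991, 13001)` returns 12991–1. A range like that may still need the string-based handling.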

Great stuff! I’ve applied this to citeproc-js, and it does indeed
perform perfectly. The replaced code can’t be deleted completely
because it’s shared with “minimal”, but it’s good to have the code for
chicago self-contained.