Custom year-suffix-delimiters

There is some variation in the delimiters used for collapsed year-suffixes,
e.g. “(Doe 2000a, b, c; 2001)” or “(Doe 2000a;b;c, 2001)”. Currently Zotero
uses a hard-coded ", " string, as CSL previously offered no way to
specify this delimiter. Recently Bruce added a new variable, year-suffix, to
fix this issue (
http://sourceforge.net/tracker/?func=detail&aid=2212677&group_id=117435&atid=678021).
As delimiters can be set for text elements, I guess that the CSL code could
then look like:

// dates/test001.txt
{
"citation":[ {"source":"productiveJohn"} ],
"testof": "citation",
"csl":"

", "result":"(Doe 2000a,b; 2001)" }

I asked Bruce if this was what he had in mind, and he suggested that I
post this example here for discussion. I tried to write it in test format,
so hopefully it is of some use to Frank.

Rintze
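
To make the intended behaviour concrete, here is a minimal Python sketch of how a configurable year-suffix delimiter could be applied when collapsing cites. It is not citeproc-js or Zotero code: the function name render_cluster, its parameters, and the (author, year, suffix) tuple layout are all hypothetical, and the cites are assumed to be sorted already.

```python
from itertools import groupby


def render_cluster(cites, year_suffix_delimiter=", ", cite_group_delimiter="; "):
    """Render sorted (author, year, suffix) tuples as one collapsed citation.

    Hypothetical helper for illustration only. Suffixes that share an
    author and year are joined with year_suffix_delimiter; the resulting
    year groups are joined with cite_group_delimiter.
    """
    author_chunks = []
    for author, by_author in groupby(cites, key=lambda c: c[0]):
        year_chunks = []
        for year, by_year in groupby(by_author, key=lambda c: c[1]):
            suffixes = [c[2] for c in by_year]
            year_chunks.append(f"{year}{year_suffix_delimiter.join(suffixes)}")
        author_chunks.append(f"{author} " + cite_group_delimiter.join(year_chunks))
    return "(" + cite_group_delimiter.join(author_chunks) + ")"


cites = [("Doe", 2000, "a"), ("Doe", 2000, "b"), ("Doe", 2001, "")]
print(render_cluster(cites, year_suffix_delimiter=","))
```

With year_suffix_delimiter="," this reproduces the "result" string of the test above; swapping the two delimiters gives the "(Doe 2000a;b;c, 2001)" style of output.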

Rintze,

That looks good to me. I’ll work up some sample data to drive it when
I get to that point.

Disambiguation by adding names just passed, and I’m feeling a little
pleased with myself, so I’m going to take a rest from the code for a
few days. But this was a huge and dreadful step cleared: we’re just
about ready to play “Break My Program”.

Frank
2009/3/27 Rintze Zelle <@Rintze_Zelle>:

That’s great!

Bruce

Rintze,

I’ve taken a closer look at the year-suffix test, and I’d like to
raise just one issue re the year-suffix variable. Here’s the test:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-js/branches/fbennett/std/tests/dates/test001.txt?revision=857&view=markup

Here’s the discussion of the year-suffix variable that the test illustrates:

https://sourceforge.net/mailarchive/forum.php?thread_name=53208a5f0903020639n7da5d177p47e6f243f7f90e9b%40mail.gmail.com&forum_name=xbiblio-devel

I’m onboard with treating year-suffix as a magic variable rendered
through a text element. I didn’t do things that way in the first-cut
implementation of year-suffix that I just finished, but that was an
oversight; this is the right way to do it.

I get stuck on that delimiter=",", though. I wonder whether its
function might be a little too far removed from that of ordinary
delimiters to have the same attribute name. Text elements can
take a delimiter= attribute already, but it’s a delimiter between
components contained inside the element (i.e. for a text element,
between multiple variables). The same is true of all other delimiters
in CSL; multiple elements within the element holding the delimiter=
attribute are joined with it. In this case, though, it sets a magic
prefix, used when joining the year-suffix to a preceding cite, when
the suffix is the first thing rendered. I don’t think that’s obvious
from the term and its appearance in the code. Could it possibly be
given a special name, maybe something with “collapse” in it, to
provide a hint?

Frank

When I wrote the test, I was also struck by how differently “delimiter” is
used in this specific case. The only thing that comes close to the thing we
want to achieve here is the current way to set delimiters between different
cited items in a citation cluster (if I have the terminology right): via the
layout-element. Maybe here support for an additional argument could be
included, e.g.:

This would keep things together and might be clearer.

Rintze

Or it could be an option (“year-suffix-delimiter”). I’d probably opt for that.

Bruce
2009/4/2 Rintze Zelle <@Rintze_Zelle>:

I’d be happy with that if there is agreement all around.

That’s fine (you proposed this earlier:
http://sourceforge.net/mailarchive/message.php?msg_name=fbb7c5df0903031708r54e6db1bq9e76a0d797d6d1ed%40mail.gmail.com
).

Rintze

This is now working in the fbennett branch of citeproc-js.

As if to drive home the value of varied contributions to the test
suite, implementing this feature turned up two important bugs in
citeproc-js that I hadn’t come across on my own: reverse ordering of
items registered without a sort key; and superfluous spacing in some
grouped environments. So special thanks to Rintze for this; it’s the
first contributed test, and it caught three birds with one stone!

Test:
http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-js/branches/fbennett/std/tests/dates/test001.txt?revision=863&view=markup

Data:
http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-js/branches/fbennett/std/items/year-suffixes-1.txt?revision=833&view=markup
http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-js/branches/fbennett/std/items/year-suffixes-2.txt?revision=833&view=markup
http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/citeproc-js/branches/fbennett/std/items/year-suffixes-3.txt?revision=833&view=markup

Frank
2009/4/2 Rintze Zelle <@Rintze_Zelle>:

As for the changes this requires in the schema, is this what you had in
mind, Bruce?

http://groups.google.com/group/zotero-dev/web/year-suffix-delimiter.patch

Rintze

Yes, but we might want to specify a generic delimited pattern, rather
than to use ‘text’. Do we want to constrain that with a regular
expression, or not?

What could be the benefit of limiting the possible values?

Rintze

Mainly clarity. There’s no practical point in allowing someone to have
a value of “aaaa” or “24”; the only likely values can be represented by
something like “\s*(:|;|,)?\s*”.

But admittedly it’s not that important; just asking.

Bruce
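
For reference, here is a small Python sketch of what constraining the option value with that pattern would accept and reject. The helper name is_plausible_delimiter is made up for illustration, and Python's re module stands in for whatever regular-expression dialect the schema would actually use.

```python
import re

# The pattern quoted in the message above: optional whitespace around at
# most one of ":", ";" or ",".
DELIMITER_PATTERN = re.compile(r"\s*(:|;|,)?\s*")


def is_plausible_delimiter(value):
    """Return True if the whole value matches the suggested pattern.

    Hypothetical helper, not part of any CSL tooling.
    """
    return DELIMITER_PATTERN.fullmatch(value) is not None


for candidate in [", ", ";", " : ", "aaaa", "24", ""]:
    print(repr(candidate), is_plausible_delimiter(candidate))
```

Note that, because the punctuation group is optional, the pattern as written also accepts an empty string and whitespace-only values.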

Well, okay. I don’t have a strong opinion though (I don’t know if the
increased clarity in the use of the schema outweighs the added complexity of
the schema and the decreased flexibility in the creation of styles).

BTW, this is a bit off-topic, but while I was browsing Google Scholar
looking for different types of delimiters, I came across something else:
collapsing of year-suffixes, e.g.:

“(O’Reilly et al., 1997; Dhanabal et al., 1999a–c)” and “[Korzan et al.,
2000a-c]”
versus
“(Hall, 2000; Domingo, 1999; Barron, 2000abc)”

So, do we want to support this (I think we should), and if so, how? Maybe a
boolean option (e.g. collapse-year-suffixes) would suffice?

Rintze
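
The two collapsed forms quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: suffixes are single lowercase letters, a run needs at least three members before it becomes a range, and the name collapse_year_suffixes and its parameters are hypothetical rather than anything defined by CSL.

```python
def collapse_year_suffixes(suffixes, range_join="\u2013", min_run=3):
    """Collapse consecutive single-letter suffixes into ranges.

    Hypothetical helper: returns a list of parts, so the caller can still
    choose the delimiter used to join them.
    """
    runs = []
    for s in suffixes:
        if runs and ord(s) == ord(runs[-1][-1]) + 1:
            runs[-1].append(s)  # extends the current consecutive run
        else:
            runs.append([s])    # starts a new run
    parts = []
    for run in runs:
        if len(run) >= min_run:
            parts.append(run[0] + range_join + run[-1])
        else:
            parts.extend(run)
    return parts


# Range form, as in "1999a–c":
print("1999" + ",".join(collapse_year_suffixes(["a", "b", "c"])))
# Run-together form, as in "2000abc", is just an empty delimiter:
print("2000" + "".join(["a", "b", "c"]))
```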

At the implementation level, this is collapse=“citation-number” with
different colors; it shouldn’t be much extra trouble to support it. But
it raises an off-by-one issue in the naming scheme for the collapse
options. Although it would change existing CSL, would it be possible
to adopt the following progression:

(equals current "year") (equals current "year-suffix") (this new feature)

Frank

Good point.

OTOH, if we contemplated this, we might as well throw in another
related issue and address them all together:

Page number collapsing sometimes follows different algorithms. In my
implementation, I just used Chicago’s. But one could imagine there may
be others.

So the issue may be not just whether to collapse a particular list of
tokens or integers, but how.

Bruce

I spent some time thinking about this one today, and saw this recent
item on the Zotero forum:

Localizing the range join character would be simple enough. I’m
wondering about what might happen with final joins in a numeric
series.

How much variety is there likely to be? If one were to aim at
supporting hard cases, so that easy cases take care of themselves, I
think the following may cover the possible extent of the pain:

The two series [1,2,3,5,6,7] and [1,2,3,5,7] could render as:

1,2,3,5,6,7 and 1,2,3,5,7 (series)

1-3,5-7 and 1-3,5,7 (series with range collapse only)
1,2,3,5,6&7 and 1,2,3,5&7 (series with final join only)

1-3&5-7 and 1-3,5&7 (both, priority for range joins, final join for
ranges ok)
1-3,5-7 and 1-3,5&7 (both, priority for range joins, final join for
singletons only)
1-3,5,6&7 and 1-3,5&7 (both, priority for singleton joins)

(Plus things should work with arabic, roman or year-suffix, but that’s
just an implementation wrinkle.) Some of these don’t make any sense,
but on the assumption that editorial caprice knows no bounds, I wonder
what the CSL options capable of capturing all of them would look like.
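
One way to see that a small set of options can span all six renderings is the following Python sketch. It is a thought experiment, not a proposal for actual CSL option names: render_series, collapse, final_join and the three mode values are invented here, and runs are assumed to need at least three members before they collapse into a range.

```python
def group_runs(numbers):
    """Split a sorted list of integers into runs of consecutive values."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


def render_series(numbers, collapse=False, final_join=None, mode="ranges-ok",
                  delimiter=",", range_join="-"):
    """Render a numeric series; all option names are hypothetical.

    mode applies only when both collapse and final_join are set:
      "ranges-ok"          final join may precede a range
      "singletons-only"    final join is dropped if the last part is a range
      "singleton-priority" the last number is kept out of any range
    """
    if not numbers:
        return ""
    if collapse and final_join and mode == "singleton-priority":
        runs = group_runs(numbers[:-1]) + [numbers[-1:]]
    elif collapse:
        runs = group_runs(numbers)
    else:
        runs = [[n] for n in numbers]

    parts = []  # (text, is_range) pairs
    for run in runs:
        if len(run) >= 3:
            parts.append((f"{run[0]}{range_join}{run[-1]}", True))
        else:
            parts.extend((str(n), False) for n in run)

    texts = [text for text, _ in parts]
    last_is_range = parts[-1][1]
    use_final = (final_join is not None and len(parts) > 1
                 and not (mode == "singletons-only" and last_is_range))
    if not use_final:
        return delimiter.join(texts)
    return delimiter.join(texts[:-1]) + final_join + texts[-1]


a, b = [1, 2, 3, 5, 6, 7], [1, 2, 3, 5, 7]
print(render_series(a, collapse=True, final_join="&"),
      render_series(b, collapse=True, final_join="&"))
```

Under these assumptions, two booleans' worth of choice (collapse, final join) plus a three-valued mode is enough to reproduce every row of the list above.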

Just to add some complexity, maybe this should be handled in CSL as well:
https://www.zotero.org/trac/ticket/1083

“For better or worse IEEE wants brackets to work in a different way than the
CSL suggests.
Multiple citations must appear like [1]-[5], not like [1-5].”

Rintze
On Mon, Apr 6, 2009 at 11:05 AM, Frank Bennett <@Frank_Bennett> wrote:

More complexity:
http://groups.google.com/group/zotero-dev/browse_thread/thread/674585c991cc8e11

“I’m trying to create a new style which uses only the start page [ed:from a
page range].”

Glad to see this, actually; it’s required by Bluebook. You could
maybe just do something like:

?

Just to add some complexity, maybe this should be handled in CSL as well:
https://www.zotero.org/trac/ticket/1083

“For better or worse IEEE wants brackets to work in a different way than
the CSL suggests.
Multiple citations must appear like [1]-[5], not like [1-5].”

Looks like maybe a case for multiple collapse values …

?

Frank
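
For the IEEE case quoted above, the difference between the two bracket styles can be shown with a short Python sketch. It is illustrative only: render_numeric_cites and the bracket_each flag are invented names, the "," delimiter is an assumption, and runs are assumed to need at least three members before they collapse.

```python
def group_runs(numbers):
    """Split a sorted list of integers into runs of consecutive values."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


def render_numeric_cites(numbers, bracket_each=False, range_join="-",
                         delimiter=",", min_run=3):
    """Render collapsed numeric cites with either bracket style.

    bracket_each=False wraps the whole cluster once, e.g. [1-5];
    bracket_each=True wraps every number, e.g. [1]-[5].
    All names here are hypothetical.
    """
    def fmt(n):
        return f"[{n}]" if bracket_each else str(n)

    parts = []
    for run in group_runs(numbers):
        if len(run) >= min_run:
            parts.append(fmt(run[0]) + range_join + fmt(run[-1]))
        else:
            parts.extend(fmt(n) for n in run)
    body = delimiter.join(parts)
    return body if bracket_each else f"[{body}]"


print(render_numeric_cites([1, 2, 3, 4, 5]))
print(render_numeric_cites([1, 2, 3, 4, 5], bracket_each=True))
```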