include-period vs. strip-periods/strip-characters

Apologies for my somewhat sparse presence on this list as of late, but
I wanted to make sure we at least investigate this issue before
releasing 1.0. Currently, we have an include-period attribute, which
determines whether a period is added to locale terms. We’ve discussed
this several times before, but we’ve never committed to anything:

https://sourceforge.net/mailarchive/message.php?msg_id=188351850903220251k17336a2q82a2974204d9364f%40mail.gmail.com
https://sourceforge.net/mailarchive/message.php?msg_id=9A5E1819-A25A-4760-9D76-1726BCF5BB5E%40caltech.edu
https://sourceforge.net/mailarchive/message.php?msg_id=0BB1DD0A-287B-483C-958D-0131B8FDFCEC%40simonster.com

Since it seems that 1.0 will not be backwards-compatible with 0.8,
perhaps we can add an option to strip periods before it goes live? The
first use case for this is locale abbreviations, e.g.

Apr. -> Apr

We currently do the reverse, adding periods to locale terms based on
form, but stripping the periods seems more logical and there is less
apparent black magic involved. This would require some changes to the
locales. The second use case is journal abbreviations, e.g.

Nat. Cell Biol. -> Nat Cell Biol

This can only be accomplished by stripping the periods, since we have
no way of knowing which words are abbreviations and which are not.
Thus, include-period is not really sufficient for this purpose.
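
As a rough illustration of the two use cases above, whole-string period
stripping can be sketched in Python. The function name strip_periods is
hypothetical and not taken from any CSL processor; it only shows the
intended effect on an already-rendered string.

```python
def strip_periods(text: str) -> str:
    """Remove every period from an already-rendered string (illustrative only)."""
    return text.replace(".", "")


print(strip_periods("Apr."))             # Apr
print(strip_periods("Nat. Cell Biol."))  # Nat Cell Biol
```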

We have two possible ways of implementing this feature. The first,
suggested by Rintze, is

and the second, which I suggested a while back, is

I have no real preference here. Both of these should be comparably
simple in implementation. The former seems more extensible, but I’m
not sure it would ever get used for anything besides periods, in which
case the extensibility would be for naught. If anyone has a preference
or issue with this change, please let me know. Otherwise, I’ll commit
this tomorrow.

Thanks,
Simon

Question: would this only apply to the “short” form?

It doesn’t seem like this stipulation would be necessary.
include-period as it is currently implemented applies only to the short
form because periods shouldn’t be added to the symbol or long forms, but if
the short forms simply included periods in locales.xml and the other
forms did not, then we wouldn’t need this kind of magic. I can’t think
of any cases off the top of my head where periods would be desirable
in symbol or long forms but not in short forms.

Simon

The first use case for this is locale abbreviations, e.g.

Apr. -> Apr

We currently do the reverse, adding periods to locale terms based on
form, but stripping the periods seems more logical and there is less
apparent black magic involved. This would require some changes to the
locales.

Plus I guess that the current setup (which depends on include-period)
wrongly attaches a period on months that aren’t abbreviated, like
“May” (although you can of course work around that by checking whether
the short and long forms are the same, and skip the period if they
are).
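
The workaround mentioned in parentheses could look roughly like the
following Python sketch. The term table and the function name are made
up for illustration; real locale data and processors will differ.

```python
# Hypothetical month terms as (long form, short form without period).
MONTH_TERMS = {
    4: ("April", "Apr"),
    5: ("May", "May"),
}


def short_month(month: int, include_period: bool = True) -> str:
    """Return the short month term, appending a period only when the
    short form actually differs from the long form."""
    long_form, short_form = MONTH_TERMS[month]
    if include_period and short_form != long_form:
        return short_form + "."
    return short_form


print(short_month(4))  # Apr.
print(short_month(5))  # May
```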

The second use case is journal abbreviations, e.g.

Nat. Cell Biol. -> Nat Cell Biol

This can only be accomplished by stripping the periods, since we have
no way of knowing which words are abbreviations and which are not.
Thus, include-period is not really sufficient for this purpose.

The only other related case I can think of right now is patent
numbers. Some styles use “U.S. patent 3,002,329” while others use
“U.S. patent 3002329”, e.g.:

http://www.aresearchguide.com/12biblio.html#30
http://www.library.dal.ca/Files/How_do_I/pdf/apa_style.pdf
So there you’d like to strip the commas (assuming you store the patent
numbers in your database with commas). Or you’d need a way to parse
the patent number and choose to apply a thousands separator or not
(which are locale-specific, but I digress).
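
Both options can be sketched in a few lines of Python. The helper names
are hypothetical, and the second helper assumes an English-style comma
as the thousands separator.

```python
def strip_characters(text: str, chars: str) -> str:
    """Remove every occurrence of the given characters from text."""
    return text.translate(str.maketrans("", "", chars))


def group_thousands(digits: str) -> str:
    """Re-insert comma separators into a plain run of digits."""
    return f"{int(digits):,}"


print(strip_characters("3,002,329", ","))  # 3002329
print(group_thousands("3002329"))          # 3,002,329
```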

Rintze

Question: would this only apply to the “short” form?

I just need to figure out how to specify it. My impulse is to limit to
short forms for clarity.

If it can apply to any form, we can specify it as a formatting
attribute, which seems simpler to me, but I have no strong opinion.

Simon

So then a generalized strip-characters attribute would support this
case. But, if we wanted to take the localized approach, we might be
better off extending the element?

Simon

The use of strip-characters would be limited in this case (it would be
a one-way street, so it wouldn’t help with styles that specify the use
of a thousands delimiter unless you’ve carefully curated your
database). The latter option would be preferable, but hasn’t nearly
the priority of strip-periods. Newly issued patent numbers don’t sport
these thousands separators anymore anyway (e.g.
http://ep.espacenet.com/help?locale=en_EP&method=handleHelpTopic&topic=publicationnumber).
So to keep things simple I advocate strip-periods for CSL 1.0.

On Mon, Jul 13, 2009 at 7:51 PM, Simon Kornblith <@Simon_Kornblith> wrote:

On Jul 13, 2009, at 10:28 AM, Rintze Zelle wrote:

On Mon, Jul 13, 2009 at 7:43 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

I just need to figure out how to specify it. My impulse is to limit to
short forms for clarity.

IMHO the specification should afford sufficient clarity. I don’t see
the need to complicate the schema just to limit strip-periods to
short-form variables in cs:text (restricting it to variables in
cs:text should do just fine).

Rintze

I vote for making this a formatting attribute that just strips
trailing periods, and allowing it everywhere. Using it on a number
would not be useful, but that’s also true of text-case (which is
currently a formatting attribute available everywhere).

Stripping only trailing periods doesn’t cover use cases like
abbreviated journal titles or organizational names.

I guess I’d vote for adding “strip-periods” and for it to apply to the
whole string; only on cs:text.
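
The difference between the two proposals shows up on multi-word
abbreviations. A minimal Python sketch, with hypothetical function
names that are not part of any CSL implementation:

```python
def strip_trailing_periods(text: str) -> str:
    """Strip periods only from the end of the string."""
    return text.rstrip(".")


def strip_all_periods(text: str) -> str:
    """Strip periods from the whole string."""
    return text.replace(".", "")


title = "Nat. Cell Biol."
print(strip_trailing_periods(title))  # Nat. Cell Biol
print(strip_all_periods(title))       # Nat Cell Biol
```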

Bruce

And label as well?

My last two messages in this thread were out of focus, and seem to
have stalled progress toward settling this item.

In response to Bruce’s last:

I guess I’d vote for adding “strip-periods” and for it to
apply to the whole string; only on cs:text.

I should have just written:

+1

Frank