Double points in output

Hello Simon (and others),

In this style:

<title>American Psychological Association</title>
 <id>http://www.zotero.org/styles/apa</id>

you use this macro for authors:

which is subsequently called by:

   <text macro="author" suffix="."/>

which (in my citeproc-py implementation) produces output like this:

Grutters, M., van Raaphorst, W., & Helder, W.. (2001). Total
hydrolysable amino acid mineralisation in sediments across the
northeastern Atlantic continental slope (Goban Spur). Deep-Sea Research I, 48. , 811–832.

My concern is the double points after the last author name and before
the year. I doubt that this is what the style wants, right? On the
other hand, the macro with suffix set to “.” seems reasonable, as the
last characters of the author macro could also be a label with
parentheses. A label won’t get printed for authors and therefore the
last character becomes a point from the initials of the last author.

Is this a flaw in CSL, the style, or should I just find and replace
all double points with a single one? Or should a suffix (or prefix)
not get added if it is already part of the text?

How does Zotero handle this?

Johan
http://www.johankool.nl/

If I had to guess, I’d say that Simon will say he’s got some special
handling to account for funky punctuation. I could see that depending
on the specific context, as you note, you could end up with a single
trailing period. The only reason you have a double one here is because
the last character before the suffix is an initialized given name.

If I’m right, it’d be nice to confirm this, and to write down the
rules somewhere so others can easily implement it consistently.

Bruce

If we said that a suffix is only added when the text it applies to
does not already end with it, would that solve this problem without
interfering with situations where the repetition is actually expected?

I would suggest the same thing for prefix too, as well as for delimiter.

The general rule would then be:

A prefix, delimiter or suffix will not be inserted if the text it
applies to already contains the requested prefix, delimiter or suffix.
If a part of the prefix, delimiter or suffix is already present, only
insert the remainder.

examples:
suffix=". " text="Kool, J. " --> "Kool, J. "
suffix=". " text="Kool, J." --> "Kool, J. "
delimiter=", " text1="Kool, J." text2=" D'Arcus, B." --> "Kool, J., D'Arcus, B."
delimiter=", " text1="Kool, J.," text2="D'Arcus, B." --> "Kool, J., D'Arcus, B."
delimiter=", " text1="Kool, J., " text2="D'Arcus, B." --> "Kool, J., D'Arcus, B."
prefix="(" text="(Kool, J.)" --> "(Kool, J.)"
prefix=" (" text="(Kool, J.)" --> " (Kool, J.)"
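A minimal Python sketch of the rule above, to make the examples concrete. It reads "already contains" as "the end of the text overlaps the start of the suffix" (and the mirror image for a prefix), which is an interpretation rather than something the rule spells out. The helper names overlap, add_suffix, add_prefix and join are made up for illustration and are not part of citeproc-py or any other processor.

```python
def overlap(left: str, right: str) -> int:
    """Length of the longest ending of `left` that is also a beginning of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def add_suffix(text: str, suffix: str) -> str:
    # Skip the part of the suffix that the text already ends with.
    return text + suffix[overlap(text, suffix):]


def add_prefix(text: str, prefix: str) -> str:
    # Skip the part of the prefix that the text already starts with.
    n = overlap(prefix, text)
    return prefix[:len(prefix) - n] + text


def join(text1: str, text2: str, delimiter: str) -> str:
    # Drop what text1 already ends with, then what text2 already starts with.
    rest = delimiter[overlap(text1, delimiter):]
    rest = rest[:len(rest) - overlap(rest, text2)]
    return text1 + rest + text2


print(repr(add_suffix("Kool, J.", ". ")))
print(repr(join("Kool, J.,", "D'Arcus, B.", ", ")))
print(repr(add_prefix("(Kool, J.)", " (")))
```

With this reading, the author list from the first mail keeps a single period, because the "." suffix fully overlaps the final initial.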

Does that make sense?

Johan

I ran into the same problem Liam is talking about. The mail from April
(earlier in this thread) proposes how to avoid double punctuation.

Johan

Ah, thanks Johan, that makes sense.

Are there any occasions where this rule might be violated, i.e. in the case
of nested parentheses?

Regards,

Liam.

Hello Liam,

Good point about the nested parentheses. This rule will indeed cause a
problem there. It will make it impossible to output something like
"(1st edition (part 3))". We need to think about how to change the
proposed rule so that it deals properly with brackets "([{}])".

The general rule could then be:

A prefix, delimiter or suffix will not be inserted if the text it
applies to already contains the requested prefix, delimiter or suffix.
If a part of the prefix, delimiter or suffix is already present, only
insert the remainder. If the prefix, delimiter or suffix contains any
of the following characters: "([{}])" always insert that character and
the remainder if applicable.

examples:
suffix=". " text="Kool, J. " --> "Kool, J. "
suffix=". " text="Kool, J." --> "Kool, J. "
delimiter=", " text1="Kool, J." text2=" D'Arcus, B." --> "Kool, J., D'Arcus, B."
delimiter=", " text1="Kool, J.," text2="D'Arcus, B." --> "Kool, J., D'Arcus, B."
delimiter=", " text1="Kool, J., " text2="D'Arcus, B." --> "Kool, J., D'Arcus, B."
prefix="(" suffix=")" text="Kool, J.)" --> "(Kool, J.))"
prefix=" (" suffix=") " text="Kool, J.) " --> " (Kool, J.) ) "
suffix=" )" text="Kool, J. " --> "Kool, J. )"
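One way to read the bracket exception, as a rough Python sketch: only the part of a suffix before its first bracket (or the part of a prefix after its last bracket) may be merged with the text, and everything from the bracket onwards is always inserted. That reading is an assumption chosen because it matches the examples above. The names BRACKETS, overlap, add_suffix and add_prefix are hypothetical, and the delimiter case is left out for brevity.

```python
BRACKETS = set("([{}])")


def overlap(left: str, right: str) -> int:
    """Length of the longest ending of `left` that is also a beginning of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def add_suffix(text: str, suffix: str) -> str:
    # Split the suffix at its first bracket: the part before it may merge
    # with the end of the text, the bracket and what follows is always added.
    cut = next((i for i, ch in enumerate(suffix) if ch in BRACKETS), len(suffix))
    mergeable, forced = suffix[:cut], suffix[cut:]
    return text + mergeable[overlap(text, mergeable):] + forced


def add_prefix(text: str, prefix: str) -> str:
    # Mirror image: everything up to and including the last bracket is
    # always added, only the part after it may merge with the text.
    cut = max((i + 1 for i, ch in enumerate(prefix) if ch in BRACKETS), default=0)
    forced, mergeable = prefix[:cut], prefix[cut:]
    n = overlap(mergeable, text)
    return forced + mergeable[:len(mergeable) - n] + text


nested = add_prefix(add_suffix("1st edition (part 3)", ")"), "(")
print(nested)
```

Under this reading the nested case from Liam's question comes out with both closing brackets, while plain punctuation such as ". " is still merged.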

Hi Johan,

There might be further problems with spacing (i.e. two suffixes: ") ",
" )" - should include the first, but not the second space?).

Could this be handled by an additional prefix/suffix/delimiter attribute
like “no-repeat”? Then for the default case it uses your rule below, but can
be overridden by the CSL designer? I suspect this is a rare use case, but I
don’t know my formats well enough.

Regards,

Liam.

Hello Liam,

There might be further problems with spacing (i.e. two suffixes: ") ",
" )" - should include the first, but not the second space?).

Yes, that is how I would see it.
suffix=" )" text="text) " --> "text) )" (single space between brackets)

Could this be handled by an additional prefix/suffix/delimiter
attribute like “no-repeat”? Then for the default case it uses your
rule below, but can be overridden by the CSL designer? I suspect this
is a rare use case, but I don’t know my formats well enough.

That would be another possibility. Let’s see what Bruce thinks about
this one.

Johan

I don’t like the idea of forcing a style writer to figure this out.
The adjustment should be automatic.

It seems like it might make sense to just itemize the characters that
are likely to matter here. It might be a regular expression as simple as
(,|\.|\s)?

Can anyone figure out the algorithm that Simon used in Zotero?

Bruce

I think this is it:

// clean up
if(string.length && string[0] == "." &&
        Zotero.CSL.FormattedString._punctuation.indexOf(this.string[this.string.length-1]) != -1) {
    // if string already ends in punctuation, preserve the existing stuff
    // and don't add a period
    string = string.substr(1);
} else if(this.string[this.string.length-1] == "(" && string[0] == " ") {
    string = string.substr(1);
} else if(this.string[this.string.length-1] == " " && string[0] == ")") {
    this.string = this.string.substr(0, this.string.length-1);
}
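For anyone implementing this outside JavaScript, here is a rough Python rendering of the same three checks. The snippet does not show what Zotero.CSL.FormattedString._punctuation contains, so the PUNCTUATION value below is a guess, and the function name append is made up. It also assumes the cleaned-up string is simply concatenated afterwards, which the quoted lines do not show.

```python
# Assumed value: the contents of Zotero's _punctuation are not shown above.
PUNCTUATION = ".!?"


def append(current: str, string: str) -> str:
    """Append `string` to `current`, applying the three clean-up checks."""
    if string.startswith(".") and current and current[-1] in PUNCTUATION:
        # Text already ends in punctuation: keep it and drop the new period.
        string = string[1:]
    elif current.endswith("(") and string.startswith(" "):
        # No space right after an opening parenthesis.
        string = string[1:]
    elif current.endswith(" ") and string.startswith(")"):
        # No space right before a closing parenthesis.
        current = current[:-1]
    return current + string


print(append("Grutters, M., van Raaphorst, W., & Helder, W.", ". (2001)"))
```

Note that the first check only looks at the very last character, so text that ends in a period followed by a space is not caught by it.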

Bruce

A slightly separate issue, but a related one that we should probably
solve at the same time: the way various abbreviations are currently
localized in the locales.xml files is incorrect with regard to periods.

To be properly localized, terms in the locales.xml files need to include
all periods, since certain abbreviations may or may not require trailing
periods depending on the language used. An example would be the short
version of March, which is “Mar.” in English but, according to the
current version of locales-de-DE.xml, “März” in German (which is the
same as the long form). At the moment, if an explicit “.” suffix is
specified in the style, it would get added on to März, and, based on
feedback in the Zotero forums from before we removed periods from the
locales.xml files, localized abbreviations with included periods would
get double periods. Both of those are wrong, so I think there needs to
be some sort of subtractive–rather than additive–suffix mechanism that
removes any periods that are present. Maybe something as simple as a
strip-periods attribute? In the normal case of the attribute not being
present, the string would be passed through as is, and existing styles
would need to be modified not to include the explicit period suffixes on
short form terms.

- Dan

Dan,

The case of März would be the same as the English “May”: word short
enough to not require abbreviation.

So how does ‘May’ work?

Also, I need some more specific recommendation. Exactly what changes
are you proposing to the schema, to exactly where?

Bruce

The case of März would be the same as the English “May”: word short
enough to not require abbreviation.

So how does ‘May’ work?

It doesn’t. The MLA style, which has include-period="true" in the
date-part month/short element, produces “May. 2008” in Zotero.

Also, I need some more specific recommendation. Exactly what changes
are you proposing to the schema, to exactly where?

From a quick glance at the schema, I’d suggest replacing all
occurrences of “include-period” with “strip-period”. Here’s the current
comment:

## include-period adds a period after a term if and only if the
## term used (not necessarily term specified; see above) is
## of form "short" or "verb-short"

The comment can probably just be changed to begin with “strip-period
removes a period at the end of a term…”.
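To illustrate the difference between the two mechanisms, a small Python sketch. The function names and the strip_period flag are hypothetical stand-ins for the schema attributes, and the term values are only the ones mentioned in this thread.

```python
def render_additive(term: str, include_period: bool = False) -> str:
    """Current behaviour: locale terms carry no period, the style adds one."""
    return term + "." if include_period else term


def render_subtractive(term: str, strip_period: bool = False) -> str:
    """Proposed behaviour: locale terms carry their own period, the style may remove it."""
    if strip_period and term.endswith("."):
        return term[:-1]
    return term


# Additive: every short month gets a period, including ones that are not abbreviated.
print(render_additive("Mar", include_period=True))
print(render_additive("May", include_period=True))

# Subtractive: the locale decides, and the style only strips on request.
print(render_subtractive("Mar."))
print(render_subtractive("May"))
print(render_subtractive("März"))
print(render_subtractive("Mar.", strip_period=True))
```

The additive version cannot tell "Mar" from "May", which is where the "May. 2008" output comes from. The subtractive version leaves that knowledge in the locale file.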

- Dan

I just ran into this problem too (I sent a description an hour or so
ago).

But my output, here, would be:
Grutters, M. , van Raaphorst, W. , & Helder, W. .

since initialize-with has a space.
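A small Python sketch of why the space matters, assuming a naive processor that simply concatenates initials, delimiters and the suffix. The function names are made up, the ", " and ", & " delimiters are hard-coded guesses at what the style specifies, and only the initials from the example are used as given names.

```python
def initialize(given: str, initialize_with: str) -> str:
    # Reduce each given name to its first letter followed by initialize-with.
    return "".join(part[0] + initialize_with for part in given.split())


def author_list(names, initialize_with: str, suffix: str = ".") -> str:
    formatted = [f"{family}, {initialize(given, initialize_with)}"
                 for family, given in names]
    # Naive concatenation: no check of what the previous piece ends with.
    return ", ".join(formatted[:-1]) + ", & " + formatted[-1] + suffix


names = [("Grutters", "M"), ("van Raaphorst", "W"), ("Helder", "W")]
print(author_list(names, initialize_with="."))
print(author_list(names, initialize_with=". "))
```

With initialize-with set to "." the list ends in a double period; with ". " every initial is followed by a stray space before the next delimiter or the suffix, so a check that only compares the last character would not catch it.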

The harvard1 style, instead, renders everything just fine. Is there a
consensus on whether this is a style or a language issue?

Andrea