how much bugged a style may be?

Andrea_Rossato1 · November 30, 2010, 8:47pm

I think the mhra-x example shows the condition most likely to lead to
this problem: a space delimiter that gets suppressed because of
adjacent punctuation. But this is just a guess based on the time I
spent analyzing the problem.

Andrea

Frank_Bennett · November 30, 2010, 9:04pm

Andrea,

I’ll happily adjust to do whatever the specification calls for. The
CSL behind the tests you raise in your mail all could be written to
avoid the need for duplicate suppression. So no dispute there.

I’ve laid out the reasoning behind space suppression in my previous
mail. The trade-off for eliminating this behavior would be breakage in
a number of existing styles. To protect against that, both at this
point and in future development, we would need a test framework, with
a good foundation of test cases for all extant styles. I don’t think
anyone is proposing to build that infrastructure, so the breakage
would mostly emerge from user feedback. That would mean a lot of
back-and-forth correspondence to debug styles, often under
severe time pressure at both ends.

Can we isolate the types of conditions which are likely to lead to
these problems? Do we know of some examples, beyond the ones in the
test suite that Andrea identified?

We should have specific tests for the various combinations. They are
currently embedded in full styles, which is not terribly transparent.
I’m pretty familiar with the possibilities from that recent work with
Carles, and I’ve been thinking about simplifying the test cases. I’ll
move that up in the todo list.

Frank_Bennett · November 30, 2010, 9:40pm

The tests do need to be made explicit, and I will set that up.

But my original question remains: What is the benefit of not
suppressing extraneous spaces?

Bruce_D_Arcus1 · November 30, 2010, 9:42pm

I thought Andrea answered that already: that it hides what are
effectively style bugs.

I haven’t figured out my position on this, but I think that’s the
debate: about whether to fix these bugs in implementation code, or to
force style authors to do it.

Bruce

Frank_Bennett · November 30, 2010, 10:24pm

I thought Andrea answered that already: that it hides what are
effectively style bugs.

Whether they are bugs or not depends on the specification. So to
recast the question, what is lost by requiring in the specification
that duplicate spaces be suppressed?

I haven’t figured out my position on this, but I think that’s the
debate: about whether to fix these bugs in implementation code, or to
force style authors to do it.

Yes, with the qualification that for “bugs” I would say “duplicate spaces”.

Andrea_Rossato1 · December 1, 2010, 7:59am

Sorry Frank but I’m not really following you here.

Take bugreports_DuplicateSpaces. You mention a bug report from Carles
about IEEE:

http://groups.google.com/group/citeproc-js/browse_thread/thread/6f9c7620c6c97ff1?hl=en_US

Carles asks whether this is a citeproc-js bug or a style bug. You
promptly replied that it was the former, and it was fixed for that
specific case; a number of other specific cases were to follow.

Shall we first have a look at the style before saying it is not
bugged?

The first issue Carles reported was:

Before New York:

[1]
D’Arcus, B., Boundaries of Dissent:
Protest and State Power in the Media Age, New York: Routledge,
2006.

This is the output of the publisher macro:

That macro is used only in two places after this:

     <text variable="citation-number" prefix="[" suffix="]"/>
     <text macro="author" prefix=" " suffix=", "/>

here:

and here:

           <group delimiter=", ">
              <text macro="title"/>
              <text variable="container-title" font-style="italic"/>
              <text macro="editor"/>
              <text macro="publisher"/>
              <text macro="page"/>
           </group>

I see only 3 possibilities:

1. the style author wants to have an extra space before the
   publisher-place when there is also a title and/or an author name
   (which ends with ", ");
2. the style author wants to have a space after the citation number
   and before the publisher-place when there is no author, no
   editor, no translator, and no title (for types like bill, book,
   graphic, legal_case, motion_picture, report, and song), or, in
   the case of chapters and paper-conferences, wants the same space
   after the citation number when there is no author, no title, no
   container-title and no editor;
3. that extra space is unintended.

If 1. is true, your solution is actually a bug introduced in
citeproc-js. If 2. is true, there is a simple solution that does not
require changing citeproc-js:

       <text variable="citation-number" prefix="[" suffix="] "/>
       <text macro="author" suffix=", "/>

But I really doubt 2 could be true.

And what about the second part of Carles’s report?

Before vol. 18:

[2]
Bennett, F.G., Jr., “Getting Property Right:
‘Informal’ Mortgages in the Japanese Courts,” PRL & PJ,
vol. 18, Aug. 2009, p. 463-509.

This is a simpler case. That output is produced by this group:

           <group delimiter=", ">
              <text macro="title"/>
              <text variable="container-title" font-style="italic"/>
              <text variable="volume" prefix=" vol. "/>
              <date variable="issued">
                 <date-part name="month" form="short" suffix=". " strip-periods="true"/>
                 <date-part name="year"/>
              </date>
              <text macro="page"/>
           </group>

Here we have the usual 3 possibilities:

1. the author wanted to have 2 spaces after the container-title and
   before the “vol.” label for the volume number;
2. the author of the style is taking care to place a space after
   the citation number if the author, the title, and the
   container-title are all missing;
3. the space is unintended.

If 1. is true, citeproc-js is bugged. If 2. is true, I’ve already given
you a better solution.

I’m sorry, I’m not a native English speaker, but when case 3. is
true, “bugged” is the only word that comes to my mind to describe
the style. An unintended space is not a duplicate space.

Your answer to Carles’s bug report was wrong. As Carles suggested, the
bug was in the style.
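
To make the second case concrete, here is a minimal Python sketch of
how the group delimiter ", " and the volume prefix " vol. " combine
into two spaces. It is only an illustration: the helper names
(with_affixes, render_group, collapse_spaces) are hypothetical and do
not come from citeproc-js or citeproc-hs, and the cleanup step is an
assumption about what "space suppression" roughly amounts to.

```python
import re


def with_affixes(value, prefix="", suffix=""):
    """Apply prefix/suffix only when the variable has a value."""
    return f"{prefix}{value}{suffix}" if value else ""


def render_group(children, delimiter):
    """Join the non-empty rendered children with the group delimiter."""
    return delimiter.join(c for c in children if c)


def collapse_spaces(text):
    """Hypothetical cleanup step: squeeze runs of spaces into one."""
    return re.sub(r" {2,}", " ", text)


container = with_affixes("PRL & PJ")
volume_bugged = with_affixes("18", prefix=" vol. ")  # as in the style
volume_fixed = with_affixes("18", prefix="vol. ")    # leading space dropped

raw = render_group([container, volume_bugged], ", ")
fixed = render_group([container, volume_fixed], ", ")

print(repr(raw))                   # verbatim output of the bugged style
print(repr(collapse_spaces(raw)))  # same style after the cleanup step
print(repr(fixed))                 # corrected style, no cleanup needed
```

The cleaned-up output of the bugged style and the verbatim output of
the corrected style come out identical, which is why the cleanup
hides the problem instead of exposing it.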

Now, I understand that the Zotero repository may be full of bugged
styles and I understand that it would be a mess to enforce a stricter
policy now. But the decision to automatically remove “duplicate
spaces” is not technically grounded and will not make those bugs
disappear. They’ll just keep passing unnoticed.

If you believe that the benefit of letting those bugs pass unnoticed
is greater than the mess of fixing them, this is fine with me. I will
be following the specification. But I will not relabel them from
“bug” to “duplicate spaces”. Period.

Andrea

Frank_Bennett · December 1, 2010, 8:06am

This is not for my desk. You need to communicate with the style
authors and maintainers.

Frank_Bennett · December 1, 2010, 8:47am

To clarify, there are multiple stakeholders in this. I will not make
any changes to the behavior of the processor that would adversely
affect present consumers of CSL.

If the CSL specification is amended to dictate that spaces be passed
through verbatim, then I will cheerfully make citeproc-js behave in
that way. For testing purposes. But only for testing purposes. Such
a strict mode will not be enabled by default until I am satisfied, on
the basis of a good body of tests applied to every style in the
repository, that the change will not break the output of any style.

With the qualification in the paragraph immediately above, I’m happy
to make the processor do whatever is decided here. Meanwhile, if you
feel that I have acted irresponsibly, or that I have misrepresented
the requirements of CSL, or that I have acted without proper
consultation, I’m very sorry that you feel that way. I certainly
won’t stand in the way of changes to the test suite, for that reason
or any other reason, if supported by the consensus of this list.

In order to promote a resolution of the duplicate spaces issue, I will
now remove myself from this discussion. I have already said
everything that I had to say on the subject, and will cheerfully
follow any consensus that eventually emerges.

Frank

Bruce_D_Arcus1 · December 1, 2010, 1:46pm

Frank,

…

To clarify, there are multiple stakeholders in this. I will not make
any changes to the behavior of the processor that would adversely
affect present consumers of CSL.

The problem I have is that I don’t know how real this problem is, and
can’t balance that against the potential issues Andrea identifies.

If the CSL specification is amended to dictate that spaces be passed
through verbatim, then I will cheerfully make citeproc-js behave in
that way. For testing purposes. But only for testing purposes. Such
a strict mode will not be enabled by default until I am satisfied, on
the basis of a good body of tests applied to every style in the
repository, that the change will not break the output of any style.

With the qualification in the paragraph immediately above, I’m happy
to make the processor do whatever is decided here. Meanwhile, if you
feel that I have acted irresponsibly, or that I have misrepresented
the requirements of CSL, or that I have acted without proper
consultation, I’m very sorry that you feel that way.

While I can’t say what Andrea “feels,” I certainly did not take his
comments to be in any way intimating the above. I take his comments to
be the reasonable questions of a subsequent implementor. And given
that CSL is shaped by the practical experience of implementors, this
seems entirely appropriate.

And as I say above, I still don’t have much sense of the real impact
that not normalizing whitespace and punctuation would have on the
world of existing styles.

[snip]

Bruce

Bruce_D_Arcus1 · December 1, 2010, 2:03pm

However … a more important consideration is what’s in the spec. And
I don’t see anything on this behavior. Am I correct that we don’t
document what citeproc-js currently does?

If yes, then I would think that would favor Andrea’s argument with
respect to the (1.0) test suite.

That still leaves the question of what to do long term, of course;
whether to suggest citeproc-js deprecate this behavior, for example.

An issue that covers both, short and long term, is that if we
tweaked those tests Andrea identifies to be strict according to the
spec, then citeproc-js would presumably fail those tests.

Bruce

Andrea_Rossato1 · December 1, 2010, 5:04pm

Frank, I’m really sorry if you had the impression I was accusing you
of acting irresponsibly or that you misrepresented CSL requirements,
because that was far from being my “feelings”.

Actually I think you did a great job in making CSL a consistent and
robust language, providing code to corroborate your proposal and
putting together a test-suite which makes the implementation of CSL an
easy task: I’ve been able to solve almost all my problems when
upgrading to CSL-1.0 by just reading your code, your tests and your
documentation.

But in this specific case I just think you have been too gentle with
style coders, presuming a wrong result was caused by your code when
instead it was a style fault.

If you think my tone was provocative I apologize…

Andrea

Andrea_Rossato1 · December 1, 2010, 5:08pm

I must confess I did not test this issue, but I would be surprised if
citeproc-js were to fail those tweaked tests.

To put it differently, as far as I know citeproc-js behaves perfectly
with correctly formatted styles. The problem, if you want to call it a
problem - I think it is because it hides style bugs - is that it
behaves perfectly even with incorrectly formatted styles.

Andrea

Bruce_D_Arcus1 · December 1, 2010, 5:16pm

But I thought you started this by identifying a test that you thought
should produce one output but produced something else, and that the
latter was a consequence of this “cleaning” process?

If that’s the case, then changing the test to not assume cleaned up
punctuation would mean that citeproc-js would not produce "correct"
output.

At least that’s my assumption.

So we’re left with three questions really:

1. should the 1.0 test suite assume cleaned punctuation and whitespace
   collapsing?

My answer is no, since it’s not in the spec.

2. if we go with no for #1, then should we change the relevant tests
   accordingly (in which case citeproc-js fails), or simply remove or
   otherwise change them so that the JS and haskell implementations both
   pass (in which case we don’t really test this).

I don’t know.

3. long-term, do we want to add this behavior to the spec in some
   future release?

I don’t know. Moreover, we can probably wait on #3.

Bruce

Rintze_Zelle · December 1, 2010, 5:21pm

What kind of clean up does the CSL processor in Zotero 2.0.9 perform? Does
it eliminate repeating periods/punctuation, but not spaces/whitespace?

One solution might be to enable space suppression by default, but allow for
deactivation for style editing environments (e.g. by adding a checkbox to
csledit.xul, “Deactivate space suppression”).

Rintze

Andrea_Rossato1 · December 1, 2010, 5:38pm

An issue that covers both issue, short and long term, is that if we
tweaked those tests Andrea identifies to be strict according to the
spec, then citeproc-js would presumably fail those tests.

I must confess I did not test this issue, but I would be surprised if
citeproc-js were to fail those tweaked tests.

To put it differently, as far as I know citeproc-js behaves perfectly
with correctly formatted styles. The problem, if you want to call it a
problem - I think it is because it hides style bugs - is that it
behaves perfectly even with incorrectly formatted styles.

But I thought you started this by identifying a test that you thought
should produce one output but produced something else, and that the
latter was a consequence of this “cleaning” process?

If that’s the case, then changing the test to not assume cleaned up
punctuation would mean that citeproc-js would not produce “correct”
output.

At least that’s my assumption.

Well, your assumption is correct, and the tests were meant to check
whether citeproc-js is producing the “incorrect” but intended result.

To summarize:

bugged style → intended (but incorrect) result
fixed style → intended and correct result

My assumption is that we do not want to check if citeproc-js will
produce the correct-but-not-intended result with a bugged style. We
want to check if citeproc-js will produce the correct result with a
clean style.

So we’re left with three questions really:

1. should the 1.0 test suite assume cleaned punctuation and whitespace
   collapsing?

My answer is no, since it’s not in the spec.

2. if we go with no for #1, then should we change the relevant tests
   accordingly (in which case citeproc-js fails), or simply remove or
   otherwise change them so that the JS and haskell implementations both
   pass (in which case we don’t really test this).

I don’t know.

We could check if the processor produces the correct and intended
result with a correct input in those specific cases. I’d leave the
amended tests.

3. long-term, do we want to add this behavior to the spec in some
   future release?

I do not have any strong opinion on that. But I think a good solution
could be to require implementations to be strict when requested.

I don’t know. Moreover, we can probably wait on #3.

I think citeproc-js deployment may be a constraint: if we want the
Zotero style repository (which is quite a valuable resource to CSL) to
survive unchanged - and I think this is something valuable for CSL -
we should address the problem quickly.

Andrea

Bruce_D_Arcus1 · December 1, 2010, 5:44pm

Or, we could simply fix offending styles to not rely on this?

I’m guessing there’s a way to test for styles likely to have this
issue (I don’t know, maybe we look for group elements with a suffix
where there’s also a child with a similar suffix?).
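
A rough sketch of what such a check might look like, in Python. This
is a heuristic under stated assumptions, not a validated detector:
the function name suspicious_groups and the two rules it applies are
hypothetical, the sample markup is invented for illustration, and it
uses un-namespaced element names (a real CSL file declares the CSL
namespace, so the tag lookup would need adjusting).

```python
import xml.etree.ElementTree as ET

# Invented sample; real styles carry the CSL namespace on every element.
SAMPLE = """<layout>
  <group delimiter=", ">
    <text macro="title"/>
    <text variable="volume" prefix=" vol. "/>
  </group>
  <group suffix=".">
    <text macro="publisher" suffix="."/>
  </group>
  <group delimiter=", ">
    <text macro="page" prefix="p. "/>
  </group>
</layout>"""


def suspicious_groups(xml_text):
    """Return (reason, child attributes) pairs for possible affix clashes."""
    findings = []
    root = ET.fromstring(xml_text)
    for group in root.iter("group"):
        delimiter = group.get("delimiter", "")
        suffix = group.get("suffix", "")
        for child in group:
            prefix = child.get("prefix", "")
            child_suffix = child.get("suffix", "")
            # Rule 1: delimiter ends with a space, child prefix starts with one.
            if delimiter.endswith(" ") and prefix.startswith(" "):
                findings.append(("delimiter/prefix space", dict(child.attrib)))
            # Rule 2: group suffix and child suffix end with the same character.
            if suffix and child_suffix and suffix[-1] == child_suffix[-1]:
                findings.append(("repeated suffix", dict(child.attrib)))
    return findings


for reason, attrs in suspicious_groups(SAMPLE):
    print(reason, attrs)
```

A static check like this only flags candidates. Whether a flagged
pattern actually misbehaves still depends on which variables are
present for a given item.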

Bruce

Andrea_Rossato1 · December 1, 2010, 6:21pm

I think citeproc-js deployment may be a constraint: if we want the
Zotero style repository (which is quite a valuable resource to CSL) to
survive unchanged - and I think this is something valuable for CSL -
we should address the problem quickly.

Or, we could simply fix offending styles to not rely on this?

I’m guessing there’s a way to test for styles likely to have this
issue (I don’t know, maybe we look for group elements with a suffix
where there’s also a child with a similar suffix?).

I think your guess is wrong, as Frank repeatedly said in this thread.

While I do not agree with the technical argument I think Frank
is factually right when he writes:

Simon_Kornblith · December 1, 2010, 7:22pm

It should be possible to test in a brute-force fashion by rendering
different item types with different (likely) sets of fields and
checking whether enabling or disabling space suppression makes a
difference, which would cover a large set of possible issues.
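
As a sketch of that brute-force idea, assuming a toy renderer in
place of a real processor: the render function below hard-codes a
small layout loosely modelled on the IEEE fragments quoted earlier in
this thread, and suppress is a hypothetical stand-in for the space
suppression step. Both names, and the field values, are made up for
illustration. It varies only the set of fields, not the item type.

```python
import re
from itertools import combinations

FIELDS = {
    "author": "Doe, J.",
    "title": "A Title",
    "container": "PRL & PJ",
    "volume": "18",
}


def render(item):
    """Toy stand-in for a style: affixes apply only when the field exists."""
    def text(name, prefix="", suffix=""):
        value = item.get(name)
        return f"{prefix}{value}{suffix}" if value else ""

    author = text("author", prefix=" ", suffix=", ")
    parts = [text("title"), text("container"), text("volume", prefix=" vol. ")]
    group = ", ".join(p for p in parts if p)
    return "[1]" + author + group


def suppress(text):
    """Hypothetical cleanup: collapse runs of spaces into one."""
    return re.sub(r" {2,}", " ", text)


def affected_combinations(fields):
    """List field subsets whose output changes when suppression is applied."""
    hits = []
    names = sorted(fields)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            item = {name: fields[name] for name in subset}
            raw = render(item)
            if suppress(raw) != raw:
                hits.append(subset)
    return hits


for subset in affected_combinations(FIELDS):
    print(subset)
```

With a real processor the same loop would be run per style and per
item type, comparing output with suppression on and off.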

The real problem is that someone has to go through all of the styles
and fix them, because I don’t think there’s an automated way to take
care of that.

The other real problem is that style authors will have to test their
styles with a bunch of different combinations of fields defined, since
the processor would no longer be forgiving of errors that present
themselves only under certain combinations. This is, as Frank notes, a
burden, and potentially a good reason to keep the current space
suppression behavior. 99% of the time, space suppression is going to
fix problems that have gone unnoticed, rather than create problems,
and in the uncommon event that someone explicitly wants to double a
space, there are ways to do that even with space suppression on.

Simon

Bruce_D_Arcus1 · December 2, 2010, 2:50pm

I guess I’m not convinced.

Let’s ask ourselves a high-level question: under what conditions does
unintended behavior around punctuation happen?

Tentative answer: when a CSL structure (cs:text, cs:group, etc.) has
an affix which is based on an assumption about surrounding data that
turns out not to be present, AND there’s a generated affix on the
other side.

Example for illustration:

We have a cs:group with a suffix just before a URL macro at the end of
a reference:

<cs:bibliography suffix=".">
  <cs:group suffix=",">
    …
  </cs:group>
  <cs:text macro="URL"/>
</cs:bibliography>

If there’s no URL output, we have a spurious comma at the end of the
reference “,.”. So we have:

suffix for group
empty macro result
suffix for bibliography

A hypothesis, then: maybe the problem is in particular with elements
with suffixes followed by trailing cs:text elements?

Another problem: when affixes are attached to the macro definition,
rather than to the cs:text that invokes it. E.g. better to do:

<cs:text macro="URL" prefix=", "/>
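
The difference between the two placements can be sketched in Python.
This is a simplified model, assuming that an element with no output
drops its own affixes; the function names are hypothetical and the
rendering rules are far cruder than a real CSL processor.

```python
def affix(value, prefix="", suffix=""):
    """Wrap a rendered value; an empty value yields no affixes at all."""
    return f"{prefix}{value}{suffix}" if value else ""


def render_entry(body, url):
    """Group carries the "," suffix, then the URL macro, then the final "."."""
    group = affix(body, suffix=",")
    url_text = affix(url)
    return affix(group + url_text, suffix=".")


def render_entry_fixed(body, url):
    """The comma travels with the URL call as its prefix instead."""
    return affix(affix(body) + affix(url, prefix=", "), suffix=".")


print(render_entry("Title", ""))        # comma left dangling before the period
print(render_entry_fixed("Title", ""))  # comma disappears along with the URL
print(render_entry_fixed("Title", "http://example.org"))
```

Attaching the separator to the element that may be empty means the
separator vanishes with it, so no cleanup pass is needed.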

To Simon’s (and I think Frank’s) point, the main problem is we just
don’t know how big the problem is. When I get some time, I’ll see if I
can code something up and test it.

Bruce

Andrea_Rossato1 · December 2, 2010, 3:54pm

I think your guess is wrong, as Frank repeatedly said in this thread.

While I do not agree with the technical argument I think Frank
is factually right when he writes:

While I agree with Andrea that suppressing extraneous spaces in the
processor is a burden – I was surprised to discover, in
post-deployment feedback, the variety of situations in which they can
arise – there are a couple of reasons for going the extra mile. If
the processor is forgiving in this case (as well as with duplicate
punctuation), that makes it easier to recombine macros. There is also
an argument for making things easy for style authors and maintainers,
since their time is a scarce resource in the ecosystem.

We also lack a testing framework and test cases for the styles
themselves. Without a means of catching misformatting before
deployment, passing spaces more strictly would probably result in more
list traffic, with glitches turning up against novel data combinations
in the hands of users.

I’m finishing a few tests over the Zotero style repository to get some
numbers (I have already done some testing but I need some more time -
obviously I’m testing with citeproc-hs). I think that will be helpful
in getting a better understanding of the problem. I think you are
right if you think the problem can be circumscribed to a few patterns.
I’ll come back soon with the data.

Andrea
