The current citeproc-js implementation suppresses multiple spaces only
where they arise from the combination of neighboring affixes (prefix,
suffix or delimiter). If two spaces are explicitly set within a
single affix, they will be passed through as they stand.
(Confirmatory test checked in a few minutes ago.)
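To make the distinction concrete, here is a minimal sketch in Haskell. It is not the citeproc-js code: `joinFragments` and `render` are hypothetical names, and the sketch assumes that prefixes, delimiters and suffixes arrive as plain strings, one fragment per affix. Spaces are dropped only at the boundary between two fragments; a double space inside a single fragment is left alone.

```haskell
import Data.Char (isSpace)

-- | Join two rendered fragments. If the left one already ends in a
-- space, the leading spaces of the right one are dropped.
-- (Hypothetical helper, not taken from any processor.)
joinFragments :: String -> String -> String
joinFragments a b
  | null a           = b
  | isSpace (last a) = a ++ dropWhile isSpace b
  | otherwise        = a ++ b

-- | Render a sequence of fragments (prefix, content, delimiter, ...),
-- collapsing spaces only where two fragments meet.
render :: [String] -> String
render = foldl joinFragments ""
```

Under these assumptions, `render ["Title,", " ", " London"]` gives `"Title, London"`, while `render ["Vol.  2"]` keeps both of its spaces, because they sit inside one fragment.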
While I agree with Andrea that suppressing extraneous spaces in the
processor is a burden – I was surprised to discover, in
post-deployment feedback, the variety of situations in which they can
arise – there are a couple of reasons for going the extra mile. If
the processor is forgiving in this case (as well as with duplicate
punctuation), that makes it easier to recombine macros. There is also
an argument for making things easy for style authors and maintainers,
since their time is a scarce resource in the ecosystem.
Well, I did not say it is a burden. Actually here it is just a matter
of changing this function:
isPunct = and . map (flip elem ".,;:!?") $ headInline is
into this one:
isPunct = and . map (flip elem ".,;:!? ") $ headInline is
A single character would do the job.
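For readers without the citeproc-hs source at hand, the effect of that one character can be seen in a standalone sketch. It assumes that `headInline` yields the leading character of the following output as a (possibly empty) String; `headChars` below is a hypothetical stand-in for it, and the two predicate names are mine.

```haskell
-- | Hypothetical stand-in for headInline: the first character of the
-- following fragment, or the empty string if there is none.
headChars :: String -> String
headChars = take 1

-- | The two variants of the predicate, differing only in the trailing
-- space inside the character set.
isPunctWithoutSpace, isPunctWithSpace :: String -> Bool
isPunctWithoutSpace = all (`elem` ".,;:!?")  . headChars
isPunctWithSpace    = all (`elem` ".,;:!? ") . headChars
```

With these definitions a fragment such as `" London"` counts as starting with punctuation only for `isPunctWithSpace`, while `", London"` satisfies both. An empty fragment also satisfies both, since `all` over an empty list is `True`, just as `and . map` is in the original.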
So I removed that character to see what would happen. citeproc-hs was
passing 401 tests and now passes 392. Three of them happen to be bugs
that the character was hiding.
Shall we have a look at the rest?
bugreports_DuplicateSpaces: this is clearly a style bug (related to
the one that opened this thread, actually):
the macro “publisher-place” has a space prefix when publisher-place is
set, but the macro is used inside a group with a delimiter:
<group delimiter=", ">
<text macro="title"/>
<text macro="publisher"/>
</group>
The prefix is not needed.
bugreports_DuplicateSpaces3: another style bug you are trying to hide.
<group prefix=" ">
<text term="in" text-case="capitalize-first" suffix=": "/>
<text macro="editor"/>
<text variable="container-title" font-style="italic" prefix=" " suffix="."/>
<text variable="volume" prefix="Vol " suffix="."/>
<text macro="edition" prefix=" "/>
The “container-title” has a prefix regardless of the presence of an
editor. The correct solution should be:
<group prefix=" ">
<text term="in" text-case="capitalize-first" suffix=": "/>
<text macro="editor" suffix=" "/>
<text variable="container-title" font-style="italic" suffix="."/>
<text variable="volume" prefix="Vol " suffix="."/>
<text macro="edition" prefix=" "/>
bugreports_LabelsOutOfPlace hides a bug in chicago-full: a macro
starting with a prefix=" "
...
used inside a group with a delimiter ending with a space:
<group delimiter=". ">
<text macro="contributors" />
<text macro="title" />
<text macro="description" />
<text macro="secondary-contributors" />
bugreports_parenthesis and bugreports_DuplicateSpaces2 are quite
interesting tests, since they both test the same group in the
same style, mhra-x.csl.
This is tough, and could indeed be a case where
"If the processor is forgiving in this case (as well as with
duplicate punctuation), that makes it easier to recombine macros."
We are not talking about macros, though. Still:
<group prefix=" " suffix="">
<text variable="container-title" font-style="italic" prefix=" "/>
<text variable="volume" prefix=" "/>
<text variable="issue" prefix=", no. "/>
<date variable="issued" prefix=" (" suffix=")">
<date-part name="month" suffix=" "/>
<date-part name="day" suffix=", "/>
<date-part name="year"/>
</date>
<text variable="page" prefix=": "/>
</group>
We would like to use a group delimiter set to a space, but if there is
an “issue” the delimiter should not be applied. That could be a use
case for your solution, if and only if CSL is not expressive enough
to handle this kind of case.
But what about:
I prepared a test here:
http://gorgias.mine.nu/citeproc/haskell_ExtraSpacesMhra-x.txt
I do not need to suppress any space to pass it.
collapse_TrailingDelimiter is specifically bugged, as far as my
understanding of CSL goes. Shouldn’t “et-al” be subject to the
delimiter?
<names variable="author">
<name and="symbol" delimiter=", " form="short" />
<et-al prefix=" " />
</names>
simplespace_case1 is another case of a style bug you are trying to
hide: harvard1.
<group prefix=" " delimiter=" ">
<text term="in" text-case="capitalize-first"/>
<text macro="editor"/>
<text variable="container-title" font-style="italic" suffix="."/>
<text variable="collection-title" suffix="."/>
<group suffix="." delimiter=", ">
<text macro="publisher" prefix=" "/>
<text macro="pages"/>
</group>
</group>
You have a group with a delimiter (" ") and a member with a prefix set
to " ".
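The pattern behind several of these failures (a delimiter ending in a space, followed by a member whose output starts with a space prefix) can be looked for in the style itself. What follows is only a minimal sketch in Haskell, not part of citeproc-hs: the `Element` type, `leadingSpace` and `redundantPrefixes` are hypothetical, suffixes are left out of the model, macros are not resolved, and the first member of a group is assumed to always render.

```haskell
import Data.Char (isSpace)

-- | A stripped-down model of a CSL rendering element (hypothetical
-- type; only the attributes relevant to leading spaces are kept).
data Element = Element
  { elLabel     :: String     -- human-readable name, e.g. "macro:publisher"
  , elPrefix    :: String     -- prefix attribute, "" when absent
  , elDelimiter :: String     -- delimiter attribute, "" when absent
  , elChildren  :: [Element]
  }

-- | A leaf element such as <text/> with an optional prefix.
leaf :: String -> String -> Element
leaf label prefix = Element label prefix "" []

endsWithSpace :: String -> Bool
endsWithSpace s = not (null s) && isSpace (last s)

-- | Label of the element whose prefix puts a space at the leading edge
-- of the output, looking through wrappers that have no prefix.
leadingSpace :: Element -> Maybe String
leadingSpace el = case elPrefix el of
  (c:_) | isSpace c -> Just (elLabel el)
        | otherwise -> Nothing
  []    -> case elChildren el of
             (child:_) -> leadingSpace child
             []        -> Nothing

-- | Elements whose space prefix duplicates the trailing space of the
-- delimiter of an enclosing group.
redundantPrefixes :: Element -> [String]
redundantPrefixes el = here ++ concatMap redundantPrefixes (elChildren el)
  where
    here
      | endsWithSpace (elDelimiter el) =
          [ l | Just l <- map leadingSpace (drop 1 (elChildren el)) ]
      | otherwise = []

-- | The harvard1 group quoted above, transcribed by hand.
harvard1 :: Element
harvard1 = Element "group" " " " "
  [ leaf "term:in" ""
  , leaf "macro:editor" ""
  , leaf "variable:container-title" ""
  , leaf "variable:collection-title" ""
  , Element "group" "" ", "
      [ leaf "macro:publisher" " "
      , leaf "macro:pages" ""
      ]
  ]
```

On this model, `redundantPrefixes harvard1` returns `["macro:publisher"]`: the outer delimiter already supplies the space that the publisher prefix adds again.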
We also lack a testing framework and test cases for the styles
themselves. Without a means of catching misformatting before
deployment, passing spaces more strictly would probably result in more
list traffic, with glitches turning up against novel data combinations
in the hands of users.
In the short run you are right, bugged styles will survive because
their bugs will be hidden by your 300 lines of code.
So although coding for the suppression of extra spaces is a headache,
there were some reasons for doing it.
It is not a headache. It hides bugs away, pretending there are none. I
think using bugged styles to check whether the processor hides them well
is a faulty approach to creating a robust standard language for
formatting citations.
I think I have given you enough evidence to ask you to be more
specific about the advantages you think extra space elimination would
bring to the clarity and consistency of CSL.
Sorry I was so long.
Andrea