As anticipated, I wrote a few tests to get some numbers. I didn't
have much time, so this is just preliminary data. If you think it
is worthwhile, I can do more testing. Keep in mind that these tests
could expose citeproc-hs bugs; they would probably be more meaningful
if conducted with citeproc-js.
First, a description of the test procedure.
I'm not very familiar with Zotero, and I don't know how the style
repository is structured. I first tried to update the "dependent"
directory (about 1150 styles) with the csl-utils, but it failed with
an error I had no time to investigate. So I tried the root directory
(about 275 styles), and that worked on the first try.
I had to remove all.csl, blank.csl, irish-historical-studies.csl,
journalistica.csl, sp-legal.csl, test.csl, unisa-harvard.csl,
unisa-harvard3.csl and vienna-legal.csl, for various reasons (mostly
citeproc-hs bugs, I suppose): the "sp" locale is not recognized
(shouldn't it be "es"?); I do not have the "en-EI" and "en-AU" locale
files; citeproc-hs fails if the element is not present.
I tested the styles using only one type, article-journal, because it
instantiates a large number of variables. I then stressed the styles
with missing data: no container-title, no volume, etc. One entry
(ITEM-3) contains just a date and a page range.
There are two tests.
The first one tests for the presence of two (or more) consecutive
spaces. The results seem to depend quite a lot on the quality of the
input data.
With ITEM-3, 106 out of 271 styles produce consecutive spaces.
Without ITEM-3, 70 styles fail. I had no time to read the styles.
The citeproc-js approach would remove this problem entirely.
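For illustration, here is a minimal sketch of this first check in Python. It assumes each style's output for the test items has already been rendered to plain strings (the rendering step, via citeproc-hs or citeproc-js, is not shown), and the function names are hypothetical; this is not the code that produced the numbers above.

```python
import re

# Two or more consecutive ASCII spaces (tabs and newlines are not counted).
DOUBLE_SPACE = re.compile(r" {2,}")


def has_double_space(text: str) -> bool:
    """Return True if the rendered text contains two or more consecutive spaces."""
    return DOUBLE_SPACE.search(text) is not None


def styles_with_double_spaces(rendered: dict[str, list[str]]) -> set[str]:
    """Return the names of styles where at least one rendered entry has consecutive spaces."""
    return {
        style
        for style, entries in rendered.items()
        if any(has_double_space(entry) for entry in entries)
    }


# Toy input, not real test data.
rendered = {
    "style-a.csl": ["Doe, J. (2008).  Title. pp. 1-10."],
    "style-b.csl": ["Doe, J. 2008. Title. pp. 1-10."],
}
print(styles_with_double_spaces(rendered))  # {'style-a.csl'}
```

Collapsing runs of spaces afterwards (e.g. `DOUBLE_SPACE.sub(" ", text)`) would hide the symptom, which is roughly what a processor-side fix does; the check above is meant to expose it instead.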
The second test checks whether the style produces a space followed by
punctuation. This is another common case where the possible absence
of some data has not been taken into account.
With ITEM-3, 111 out of 271 styles fail; without ITEM-3, 54 fail.
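A similar sketch for the second check, again assuming plain-string output per style and hypothetical function names. Which characters count as "punctuation" is an assumption here: only closing marks are flagged, since a space before an opening parenthesis is normal. Some locales (e.g. French) legitimately put a space before ":" or ";", so those styles may need to be excluded or the class adjusted.

```python
import re

# A space immediately followed by closing punctuation.
# The character class is an assumption, not a CSL rule.
SPACE_BEFORE_PUNCT = re.compile(r" [.,;:)\]]")


def space_before_punct(text: str) -> list[str]:
    """Return every space+punctuation pair found in the rendered text."""
    return SPACE_BEFORE_PUNCT.findall(text)


def styles_with_space_before_punct(rendered: dict[str, list[str]]) -> set[str]:
    """Return the names of styles where at least one rendered entry has a space before punctuation."""
    return {
        style
        for style, entries in rendered.items()
        if any(space_before_punct(entry) for entry in entries)
    }


# Typical symptom of a missing variable: the delimiter survives, the value does not.
print(space_before_punct("Doe, J. , Title ."))  # [' ,', ' .']
```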
Finally, I checked whether the two problems overlap, but I had no time
to do the math (I spent the afternoon and night with my two-year-old
daughter). Still, I think maybe 50-60 styles suffer from both problems
(with ITEM-3).
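Once the two sets of failing style names exist, the overlap is a set intersection. A minimal sketch follows; the function name is hypothetical and the sets in the example are toy data, not the actual results.

```python
def overlap_report(double_space: set[str], space_punct: set[str]) -> dict[str, int]:
    """Count styles failing only one check, both checks, or at least one."""
    return {
        "double_space_only": len(double_space - space_punct),
        "space_punct_only": len(space_punct - double_space),
        "both": len(double_space & space_punct),
        "either": len(double_space | space_punct),
    }


# Toy sets for illustration only.
print(overlap_report({"a.csl", "b.csl", "c.csl"}, {"b.csl", "c.csl", "d.csl"}))
# {'double_space_only': 1, 'space_punct_only': 1, 'both': 2, 'either': 4}
```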
Now, I don't know how meaningful this data is. Is it meaningful to
test styles with low-quality input data? I think it is, if we want to
appraise the quality of styles and how well they use CSL's expressiveness.
The test logs can be found here:
http://gorgias.mine.nu/citeproc/zoteroStyle/
If you want the code (which is not optimized: it takes five minutes
to run a test on my Eee PC), I can post it.
Andrea