I’ve been doing some work on my citeproc-py (brechtm/citeproc-py on
GitHub, “Yet another Python CSL Processor”) and have written down some
questions and remarks about some of the tests and the CSL spec. Note that
I could simply be misunderstanding or misinterpreting things for some of these.
The CSL spec is contradictory about number detection:
Tests whether the given variables contain numeric content.
versus
Content is considered numeric if it solely consists of numbers.
For example, “2nd” tests “true” whereas “second” and “2nd edition”
test “false”.
This does not seem to agree with condition_IsNumeric.
Chicago page range format: what to do with five or more digits?
Which values are allowed for the “page” input field? I see multiple
ranges can also be specified. I think the CSL spec should, in general,
also define the format of the input fields. Personally, I would opt for a
structured format (like the date fields) as opposed to a string-format
(the page field). Individual CSL processors can still convert a
string-formatted field to the structured data. This would require changes
to the tests.
Shouldn’t “page-first” be a number variable? It is used with number in
page_NumberPageFirst
The spec doesn’t say anything about the nested groups special case.
variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with
the CSL spec:
cs:group and its child elements are suppressed if a) at least one
rendering element in cs:group calls a variable (either directly or via
a macro), and b) all variables that are called are empty.
In the group in the else section only the title variable is called. For
ITEM-3, this variable is empty, so the group should be suppressed, but it
isn’t.
Should a nested group always act as if it’s (successfully) calling a
variable? If so, the spec should mention this.
I seem to remember citeproc-js postprocesses its output to remove
duplicate affixes. The CSL spec doesn’t say anything about this, AFAIK.
What’s the official stance on this? I would personally avoid doing this,
unless the spec includes an unambiguous definition on how this should work.
locale_TitleCaseGarbageLangEnglishLocale: is “en” a valid locale? If so,
and default-locale=“en”, which locale should we use?
textcase_SkipNameParticlesInTitleCase (1): I believe this behavior is
not part of the CSL spec, is it?
textcase_SkipNameParticlesInTitleCase (2): the result doesn’t seem to
follow the CSL spec. The ‘a’ after the colon should be capitalized:
In both cases, stop words are lowercased, unless they are the first or
last word in the string, or follow a colon.
date_VariousInvalidDates: why is ‘Spring’ in the output?
page_Chicago: is the example S input data correct? It strikes me as a
confusing way of representing a page range (in addition to saving only a
single digit).
A large number of tests test functionality that is not in the CSL spec,
but is provided by citeproc-js (raw dates, static ordering, literal names,
…). I think these should be indicated as such, or perhaps moved to a
separate directory. This would make it easier to check other CSL
processors’ compatibility.
The CSL spec is contradictory about number detection:
Tests whether the given variables contain numeric content.
versus
Content is considered numeric if it solely consists of numbers.
For example, “2nd” tests “true” whereas “second” and “2nd edition”
test “false”.
This does not seem to agree with condition_IsNumeric.
I can see how the current description in the specification might be
somewhat confusing, but it is meant to agree with https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/condition_IsNumeric.txt.
In “Tests whether the given variables contain numeric content.”, I
mean to say that the test is against the entire string contents of
each variable. In a string like “2nd edition”, the “edition” substring
means that the entire string is non-numeric.
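To make the whole-string reading concrete, here is a minimal Python sketch. It is not taken from citeproc-py or citeproc-js; the function name `is_numeric` is hypothetical, and the handling of letter prefixes/suffixes and of comma, hyphen, and ampersand separators is an assumption about how "numeric content" is meant to be read, not something this thread settles.

```python
import re

# One "number": digits with an optional alphabetic prefix and/or suffix,
# e.g. "2", "2nd", "D2". (Assumed reading of "numeric content".)
_NUMBER = r"[A-Za-z]*\d+[A-Za-z]*"
# Assumption: several numbers may be joined by a comma, hyphen, or ampersand.
_SEPARATOR = r"\s*[,&-]\s*"
_NUMERIC = re.compile(rf"{_NUMBER}(?:{_SEPARATOR}{_NUMBER})*")


def is_numeric(value) -> bool:
    """Return True only if the ENTIRE value looks numeric."""
    return _NUMERIC.fullmatch(str(value).strip()) is not None


print(is_numeric("2nd"))          # True
print(is_numeric("second"))       # False
print(is_numeric("2nd edition"))  # False: "edition" makes the whole string non-numeric
```

The key point is the use of `fullmatch` rather than `search`: the test applies to the full contents of the variable, so any extra word disqualifies it.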
Chicago page range format: what to do with five or more digits?
The specification currently links to http://www.aahn.org/guidelines.html, but it seems like the content we
relied on moved to http://www.aahn.org/stylesheet.html . The latter
page shows an excerpt from CMoS that we almost copied verbatim.
Sebastian, could you check if CMoS 16th edition gives any guidance on
number ranges of 5 or more digits?
Which values are allowed for the “page” input field? I see multiple
ranges can also be specified. I think the CSL spec should, in general,
also define the format of the input fields. Personally, I would opt for a
structured format (like the date fields) as opposed to a string-format
(the page field). Individual CSL processors can still convert a
string-formatted field to the structured data. This would require changes
to the tests.
The spec doesn’t say anything about the nested groups special case.
variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with
the CSL spec:
cs:group and its child elements are suppressed if a) at least one
rendering element in cs:group calls a variable (either directly or via
a macro), and b) all variables that are called are empty.
In the group in the else section only the title variable is called. For
ITEM-3, this variable is empty, so the group should be suppressed, but it
isn’t.
Should a nested group always act as if it’s (successfully) calling a
variable? If so, the spec should mention this.
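Read literally, the quoted suppression rule can be sketched as below. This is only an illustration of the spec text as quoted; the function names and the item data are hypothetical, and the nested-group case is deliberately left out because the spec is silent on it.

```python
def is_empty(value) -> bool:
    """Treat missing values and empty strings/lists/dicts as empty."""
    return value is None or value == "" or value == [] or value == {}


def group_is_suppressed(called_variables, item) -> bool:
    """Literal reading of the quoted cs:group rule.

    called_variables: names of all variables called by rendering elements
    inside the group, directly or via a macro.
    """
    if not called_variables:
        # Condition (a) fails: no rendering element calls a variable.
        return False
    # Condition (b): every called variable is empty.
    return all(is_empty(item.get(name)) for name in called_variables)


# Hypothetical item without a title (not the actual test fixture data).
item = {"id": "ITEM-3", "author": [{"family": "Doe"}]}
print(group_is_suppressed(["title"], item))            # True
print(group_is_suppressed(["title", "author"], item))  # False
print(group_is_suppressed([], item))                   # False
```

Under this reading, a group that only calls an empty title is suppressed, which is why the test result looks inconsistent with the spec wording.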
I seem to remember citeproc-js postprocesses its output to remove
duplicate affixes. The CSL spec doesn’t say anything about this, AFAIK.
What’s the official stance on this? I would personally avoid doing this,
unless the spec includes an unambiguous definition on how this should work.
I’m convinced that CSL processors need to do some suppression of
duplicated punctuation. Frank just prepared some tests that describe
the current behavior in citeproc-js, and I hope to write up some
requirements for the specification in the next few weeks based on
those. See
Don’t know. I think you can ignore this unit test. Frank?
page_Chicago: is the example S input data correct? It strikes me as a
confusing way of representing a page range (in addition to saving only a
single digit).
A large number of tests test functionality that is not in the CSL spec,
but is provided by citeproc-js (raw dates, static ordering, literal names,
…). I think these should be indicated as such, or perhaps moved to a
separate directory. This would make it easier to check other CSL
processors’ compatibility.
Sylvester Keil proposed using a Cucumber format for unit tests, which
would allow tests to be tagged:
If somebody else helps with the technical infrastructure, I’d be happy
to help reclassify the existing unit tests.
CMoS page range specs don’t change for ranges with more than 4 digits, i.e.
"Use two digits unless more are needed to include all changed parts"
12345-46
12345-678
12345-6789
The different rules for multiples of a hundred and the first nine numbers
thereafter remain,
i.e. cite all digits when dealing with multiples of a hundred
12300-12345
and only the changed digit(s) for the first nine numbers thereafter
12301-8
On Thu, Aug 8, 2013 at 2:05 PM, David Lawrence <@David_Lawrence> wrote:
shouldn’t we write something like
“If numbers are four or more digits long and three or more digits
change, use all digits” ?
Yes, this is the reason why I asked in the first place. I should
probably have mentioned that.
For now, in citeproc-py, if the number of common digits between the start
and end page numbers is less than two, it uses the expanded form.
12345-468
12345-13576
123456-5614
I’m not sure which I prefer. As long as it’s clearly defined, I’m happy.
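The fallback described above can be sketched roughly as follows. This is not the actual citeproc-py code; the function name is hypothetical, and the sketch only covers the "fewer than two common leading digits" check, not the special cases for multiples of a hundred.

```python
def abbreviate_page_range(first: int, last: int) -> str:
    """Drop shared leading digits, but fall back to the expanded form
    when fewer than two leading digits are shared."""
    a, b = str(first), str(last)
    expanded = f"{a}-{b}"
    if len(a) != len(b):
        return expanded
    # Count the leading digits that both page numbers have in common.
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    if shared < 2 or shared == len(b):
        return expanded
    return f"{a}-{b[shared:]}"


print(abbreviate_page_range(12345, 12468))    # 12345-468
print(abbreviate_page_range(12345, 13576))    # 12345-13576
print(abbreviate_page_range(123456, 125614))  # 123456-5614
```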
The current CSL spec (and thus Brecht’s implementation in -py) is incorrect
according to the current CMoS.
Here are three examples from the relevant chapter (9.60)
1496–500
11564–615
12991–3001
i.e. even when only one digit stays the same, only the changing digits are
displayed after the en-dash.
Since we call this rule “Chicago” we should change this in the specs (and
implementers should change this accordingly).
According to the manual, these rules have never changed, so we must have
gotten that wrong at some point. Sorry for never catching that.
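For implementers, the rule as described in this thread might be sketched like this in Python. It is a sketch under assumptions, not text from the spec or from any processor: the function name is hypothetical, and treating first numbers below 100 as "use all digits" is my recollection of the CMoS table rather than something stated above.

```python
def chicago_page_range(first: int, last: int, dash: str = "\u2013") -> str:
    """Abbreviate a page range along the lines of CMoS 9.60 (sketch)."""
    a, b = str(first), str(last)
    # Use all digits for first numbers below 100 (assumption), for multiples
    # of a hundred, and whenever the two numbers differ in length.
    if last <= first or len(a) != len(b) or first < 100 or first % 100 == 0:
        return f"{a}{dash}{b}"
    # Count the leading digits that do not change.
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    changed = len(b) - shared
    if first % 100 < 10:
        keep = changed          # x01 through x09: changed part only
    else:
        keep = max(2, changed)  # otherwise: two digits, or more if needed
    return f"{a}{dash}{b[-keep:]}"


print(chicago_page_range(1496, 1500))    # 1496–500
print(chicago_page_range(11564, 11615))  # 11564–615
print(chicago_page_range(12991, 13001))  # 12991–3001
print(chicago_page_range(12301, 12308))  # 12301–8
```

Note that, unlike the current spec wording, this never expands a range just because few leading digits are shared: only the changed digits (with a minimum of two) follow the en-dash.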