Locale files: Use ordinal instead of suffix="." in some date formats

Date formats in some locales, including Danish, German, and Hungarian,
contain dots, e.g., Danish “1. Marts 2013”.

CSL locale files currently specify such date formats as a day number
followed by a suffix that contains the dot.

However, the number, e.g., “1.”, is not really a cardinal number with a
suffix, but rather an ordinal.

If the ordinal suffix is defined as “.”,
there is usually no difference in output between using

<date-part name="day" suffix=". "/>

and

<date-part name="day" form="ordinal" suffix=" "/>

The difference becomes crucial only when formatting date ranges: with
the suffix form, neither citeproc-js nor pandoc-citeproc outputs the
expected “1.–2. Marts 2013”; both output “1–2. Marts 2013” instead.

(On pandoc-citeproc, see
https://github.com/jgm/pandoc-citeproc/issues/12 and
https://github.com/jgm/pandoc-citeproc/issues/18; for two citeproc-js
tests see https://gist.github.com/nickbart1980/8271897)

It seems the processors are behaving as expected: the first suffix in
a range is removed (alternatively, you could say only one suffix is
rendered after the day range) – and it has to be, or else you’d get
"March 1,–2, 2012" in en-US.

Thus, if all dots must be kept with their numbers, as is the case for
Danish, German, Hungarian, and other dates, the number format must not
be defined as number plus suffix “.”, but as ordinal number (with the
ordinal suffix defined, elsewhere, as “.”).
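
As a rough sketch of what this would look like in a locale file (abridged; the da-DK language tag and the surrounding layout are assumptions for illustration, and a real locale file contains many more terms and date formats):

```xml
<locale xmlns="http://purl.org/net/xbiblio/csl" version="1.0" xml:lang="da-DK">
  <date form="text">
    <!-- the day is an ordinal, so its "." comes from the ordinal term below -->
    <date-part name="day" form="ordinal" suffix=" "/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>
  <terms>
    <!-- ordinal suffix defined as "." (other terms omitted in this sketch) -->
    <term name="ordinal">.</term>
  </terms>
</locale>
```

The intent is that the dot then belongs to the number itself, rather than to a suffix that a processor may drop from the first day of a range.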

Thus I’d like to propose the following:

  1. in all CSL locale files where the ordinal suffix is defined as ‘.’, and
    where the ‘suffix’ attribute of a ‘<date-part name="day" …>’ or
    ‘<date-part name="month" …>’ definition contains a dot, add
    ‘form="ordinal"’ or ‘form="ordinal-leading-zeros"’ (see below) to the
    definition, and remove the dot from the suffix attribute.

  2. add the value “ordinal-leading-zeros” to the form attribute for
    date-parts day and month to enable leading zeros in date formats such
    as “01.03.2013” and “01.–23.03.2013”.
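
To illustrate the second point, a numeric date format might then look roughly like the fragment below. Note that “ordinal-leading-zeros” is only the value proposed here and is not valid in current CSL, and that the “month” date-part does not currently accept an ordinal form at all. The fragment also assumes the ordinal suffix is defined as “.”:

```xml
<!-- HYPOTHETICAL: "ordinal-leading-zeros" is the proposed value, not part of current CSL -->
<date form="numeric">
  <!-- no suffix needed: each "." would come from the ordinal term -->
  <date-part name="day" form="ordinal-leading-zeros"/>
  <date-part name="month" form="ordinal-leading-zeros"/>
  <date-part name="year"/>
</date>
```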

As you’re probably aware, the value “ordinal-leading-zeros” currently
doesn’t exist in CSL. I’ve read your pandoc tickets and googled a bit,
and I’m not yet convinced that the periods in dates like "01.03.2013"
truly represent ordinal suffixes. The entire format has also been
deprecated (search e.g. for “DIN 5008” in
http://www.cl.cam.ac.uk/~mgk25/iso-time.html ; or see the entry for
Germany at http://en.wikipedia.org/wiki/Date_format_by_country ), so
I’m not ready to add this attribute value right now. My opinion can be
easily swayed if an authoritative source is available, of course.

For all non-numeric dates, I agree that we should remove the suffix
from the “day” date-part and rely on ordinal suffixes instead for the
appropriate locales. Nick, can you open a pull request for the
"locales" repo?

Rintze

Rintze - you only mean 01.03.2013, right? Because 1.3.2013 is most
certainly still the standard date format in Germany, no matter what the DIN
says. And it’s read as “Erster Dritter 2013”, so these are ordinals. Not
sure about the leading zeros either, though.

So we would need “ordinal” as an additional option for the "month"
date-part as well?

Rintze

Ah you’re right. If we’re going that route - yes. I also don’t have strong
feelings on this, but I think what Nick says makes sense and the only
reason it hasn’t come up before is that Zotero et al. don’t pass on range
data to the citeprocs, so this will likely become more important once they
do.

I have issued a pull request for the locale files for cs-CZ, da-DK, de-AT,
de-CH, de-DE, et-EE, fi-FI, is-IS, nb-NO and nn-NO. To the best of my
knowledge, all these languages use periods as ordinal indicators, these
ordinal indicators must not be separated from their numbers, and the format
for day ranges is “d.-d. monthname yyyy”.

Other languages listed as using periods as ordinal indicators in
<http://en.wikipedia.org/wiki/Ordinal_indicator> include Croatian, Faroese,
Hungarian, Latvian, Polish, Slovak, Slovene, Serbian, and Turkish.

I have left the locale files for these unchanged for the moment:

hr-HR, sk-SK, sl-SI, and sr-RS have leading zeros in their textual day
formats, so this will have to wait until CSL allows leading zeros for
ordinals.

Faroese has no CSL locale file.

hu-HU and lv-LV have a YMD format, and I am not sure how ranges are handled
here.

pl-PL’s textual day format currently does not contain a period (though
http://en.wikipedia.org/wiki/Date_and_time_notation_in_Poland claims they
are used “often”).

And in tr-TR, periods seem to be used in the numerical but not the textual
form (see http://en.wikipedia.org/wiki/Date_and_time_notation_in_Turkey).

If anyone has more specific information on any of these, please let me know.

As to leading zeros for ordinals, both in textual and numerical date
formats, it seems these are used in many locales, are actually quite
common, and are also specified by style files:

  • Leading zeros are used in the textual day formats of current hr-HR,
    sk-SK, sl-SI, and sr-RS CSL locale files. All these use “.” suffixes that
    should be changed to ordinals.

  • Ordinals with leading zeros are more popular than ordinals without in
    some locales, and common in others: A Google search for a numerical date
    on German sites reports 9,110,000 results for ‘“01.01.2014” site:de’,
    compared with 680,000 results for ‘“1.1.2014” site:de’; a similar search
    on Norwegian sites reports 784,000 dates with leading zeros and 38,500
    without. On Finnish sites leading zeros are in the minority but still
    frequent, with 252,000 results with vs. 673,000 without leading zeros,
    and likewise on Czech sites, with 593,000 vs. 1,350,000.

  • The style guide used for harvard7de.csl (German) uses leading zeros for
    day and month in numerical dates.

Taken together, I would say these are excellent arguments in favour of
introducing the option of using leading zeros for ordinals in CSL.

On 7 January 2014 05:21, Sebastian Karcher <@Sebastian_Karcher> wrote: