locale files - Part I

This is a long (series of) e-mail containing a request for a small change,
a request for a not-so-small one that I discussed in private with
Rintze, and then a proposal for a broader framework to manage the "locale"
files, facilitate the i18n of CSL further, and provide some
cross-pollination with other open source projects (first in line being

1- The simple request: renaming the file “locales-ar-AR.xml”. While
it is understandable that the file contains only generic Modern Arabic
material, and that the person who named it probably tried to copy the
naming of the other generic language files: en-EN, fr-FR, pt-PT, de-DE
(repeating the token)… in this case it created an aberration. The first
token "ar" stands for the language (Arabic) and the second, “AR”, for the
country (Argentina), making for a very odd combination: a locale never
used before!!! :)
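
To make the point concrete, here is a minimal sketch of how such a tag splits into its language and country parts. The lookup tables and the function names `split_locale_tag` and `describe` are made up for this illustration and cover only the codes mentioned in this thread; they are not part of CSL or of any processor.

```python
# Hypothetical lookup tables, limited to the codes mentioned in this thread.
LANGUAGES = {"ar": "Arabic", "eu": "Basque"}
REGIONS = {
    "AR": "Argentina",
    "DZ": "Algeria",
    "AE": "United Arab Emirates",
    "EG": "Egypt",
}


def split_locale_tag(tag):
    """Split 'ar-EG' into ('ar', 'EG'); a bare 'ar' gives ('ar', None)."""
    language, _, region = tag.partition("-")
    return language.lower(), (region.upper() or None)


def describe(tag):
    """Spell out what a locale tag actually claims to be."""
    language, region = split_locale_tag(tag)
    language_name = LANGUAGES.get(language, language)
    if region is None:
        return language_name
    return f"{language_name} ({REGIONS.get(region, region)})"


print(describe("ar-AR"))  # the odd combination discussed above
print(describe("ar"))     # the proposed generic name
```

Read this way, "ar-AR" comes out as Arabic as used in Argentina, while a bare "ar" makes no claim about a country.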

My proposal is that the name gets changed to a generic "locales-ar.xml",
just like the name of the Basque file (locales-eu.xml), and then other
locale-specific Arabic files (-ar-DZ, -ar-AE, -ar-EG, …) could be built
(fairly quickly) by recording only their changes relative to a central file.

The incorrect labelling of that file is a sign of a bigger problem with the
structure of the locale files in CSL: the lack of a standard place to
put language-dependent tokens… which takes me to the next request,
coming in Part II.

Paulo Ney

ar-AR is obviously wrong. I’d be happy to accept a pull request fixing that
to ar or one of us will do it eventually, I’m a bit swamped atm.

On Sun, Oct 6, 2013 at 5:52 AM, Paulo Ney de Souza <@Paulo_Ney_de_Souza> wrote:

The main question here is whether we want a country-specific code,
like “ar-EG” (Egypt), or just the language tag “ar”. So far we have always
included country subtags wherever possible. The only exception was
"eu" for Basque, since that language isn’t associated with a country.

Since Paulo mentioned the locale contains Modern Standard Arabic
(http://en.wikipedia.org/wiki/Modern_Standard_Arabic), which doesn’t
have a clear country of origin, I suggest “ar”. That might also be
more in line with what we want to do with other locale files (like
renaming “de-DE” to “de”, see Paulo’s follow-up posts).


citeproc-js can adapt to using two-character language codes as first
fallback without much trouble (if the corresponding locale files are

For term-specific fallback, we already handle that in the context of
locale variants embedded in the styles. Making that work for locales
themselves in citeproc-js should not require much effort (it may work
that way already).

That’s for citeproc-js only, of course.
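
A rough sketch of the kind of fallback described above, in Python rather than JavaScript and not taken from citeproc-js: try the full tag, then the two-character language code, then a default. The function names, the "en-US" default and the toy term data are all assumptions made for the illustration.

```python
def fallback_chain(tag, default="en-US"):
    """Return the locale tags to try, most specific first, without duplicates."""
    language = tag.split("-")[0]
    chain = []
    for candidate in (tag, language, default):
        if candidate not in chain:
            chain.append(candidate)
    return chain


def lookup_term(name, tag, locales):
    """Return the first definition of `name` found along the fallback chain."""
    for candidate in fallback_chain(tag):
        terms = locales.get(candidate)
        if terms and name in terms:
            return terms[name]
    return None


# Toy data standing in for parsed locale files (not real CSL locale content).
LOCALES = {
    "ar": {"and": "wa"},
    "ar-EG": {},
    "en-US": {"and": "and", "et-al": "et al."},
}

print(fallback_chain("ar-EG"))
print(lookup_term("and", "ar-EG", LOCALES))
print(lookup_term("et-al", "ar-EG", LOCALES))
```

Because the lookup is done per term, a country-specific file only needs to carry the terms that differ from the generic one; anything it leaves out is picked up further down the chain.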


Actually, fallback within the locale files themselves is already part
of the CSL spec:
(see steps 4 to 6). It just never really came up before because (with
very few exceptions) all current locale files contain the same set of