Locale files - script variants

A user wishes to contribute a Latin variant of the Serbian CSL locale
file (in addition to the existing Cyrillic variant). See

Is it acceptable to add a script subtag to the locale file name and
xml:lang value? E.g. “locales-sr-RS-Latn.xml” (for Latin) and
“locales-sr-RS-Cyrl.xml” (for Cyrillic). I guess we could omit the
script subtag from one variant, e.g. keep the Cyrillic variant as
“locales-sr-RS.xml”. See also
http://www.w3.org/International/articles/language-tags/Overview.en.php#script

Rintze

I know that Frank built citeproc-js to be aware of BCP 47 semantics, which
are used extensively in MLZ, but can we do this without demanding the same
of other processors?

Avram,

I think that in locale filenames, citeproc-js will only handle the
first two elements of a tag at the moment. What sort of syntax do you
have in mind?

Frank

Off-list, Rintze and I came to the conclusion that a processor can
just treat the RS-Latn pair as a single tag, with the same fallback
behaviour as a single element. In this case, sr-RS-Latn would fall
back to plain-vanilla sr, with whatever default mapping is defined for it
in the processor (so sr-RS). That covers script variants under the RFC
5646 specification, and should be simple to implement in processors.
citeproc-js needs a small adjustment to handle the filename in that
case, but it should be easy to do.
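To make the fallback concrete, here is a minimal sketch of how a processor might resolve a locale under that rule. It is not how citeproc-js or any other processor actually does it; the names resolve_locale, DEFAULT_LOCALES and AVAILABLE, and the contents of both tables, are assumptions made up for illustration.

```python
# Hypothetical default mapping from a bare language to its primary locale.
DEFAULT_LOCALES = {"sr": "sr-RS", "en": "en-US"}

# Hypothetical set of locales for which a locale file exists.
AVAILABLE = {"sr-RS", "sr-RS-Latn", "en-US"}


def resolve_locale(tag, available=AVAILABLE, defaults=DEFAULT_LOCALES):
    """Return the locale to load for `tag`, or None if nothing matches."""
    # Compare case-insensitively, since RFC 5646 tags are not case sensitive.
    by_lower = {name.lower(): name for name in available}
    exact = by_lower.get(tag.lower())
    if exact is not None:
        return exact
    # Treat everything after the first hyphen as one opaque element:
    # drop it in a single step and fall back to the bare language.
    language = tag.split("-", 1)[0].lower()
    fallback = defaults.get(language)
    if fallback is not None:
        return by_lower.get(fallback.lower())
    return None


print(resolve_locale("sr-RS-Latn"))
print(resolve_locale("sr-RS-Latn", available={"sr-RS", "en-US"}))
```

With the tables above, the first call finds the Latin file directly, and the second (where no Latin file is available) drops "RS-Latn" in one step and lands on the default for sr. A processor that knows nothing about script subtags needs no extra logic beyond this.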

So I’m fine with including the file in the repo, if others are happy with it.

Frank

Minor thing: shouldn’t file names be lower-case generally?

I’ll defer to others on filenaming conventions, but RFC 5646 language
tags are not case sensitive, so there would be no problem with
lowercasing as far as the specification is concerned.

Here, we run into a conflict with the standards for the language tags,
which are generally cased in a particular way.

Here is the language from RFC 5646:

The ABNF syntax also does not distinguish between upper- and
lowercase: the uppercase US-ASCII letters in the range ‘A’ through
‘Z’ are always considered equivalent and mapped directly to their
US-ASCII lowercase equivalents in the range ‘a’ through ‘z’. So the tag
“I-AMI” is considered equivalent to that value “i-ami” in the
‘irregular’ production.

Although case distinctions do not carry meaning in language tags,
consistent formatting and presentation of language tags will aid
users. The format of subtags in the registry is RECOMMENDED as the
form to use in language tags. This format generally corresponds to
the common conventions for the various ISO standards from which the
subtags are derived.
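For what it is worth, RFC 5646 (section 2.1.1) notes that the recommended format can be reproduced without consulting the registry: everything is lowercase, except that two-letter subtags are uppercase and four-letter subtags are title case, unless they come first in the tag or follow a singleton. A rough sketch of that rule follows; the function name format_tag is an assumption, and the sketch simplifies by lowercasing everything after the first single-character subtag.

```python
def format_tag(tag):
    """Apply the casing conventions of RFC 5646, section 2.1.1, to a tag."""
    out = []
    after_singleton = False
    for index, part in enumerate(tag.split("-")):
        lower = part.lower()
        if index > 0 and not after_singleton and len(part) == 2:
            # Two-letter subtag (typically a region): uppercase.
            out.append(lower.upper())
        elif index > 0 and not after_singleton and len(part) == 4:
            # Four-letter subtag (typically a script): title case.
            out.append(lower.capitalize())
        else:
            out.append(lower)
        if len(part) == 1:
            # Simplification: once a singleton is seen, the rest stays lowercase.
            after_singleton = True
    return "-".join(out)


print(format_tag("sr-rs-latn"))
print("locales-" + format_tag("SR-RS-LATN") + ".xml")
```

Because the rule depends only on subtag length and position, a repository could store lowercase file names and still display the conventional form, or the other way round.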

So Avram and I are not at odds. Case carries no meaning, but for
readability you would adhere to the registry convention (uppercase for
the region subtag, title case for the script subtag).

Frank