Undoubtedly the new schema is a closer representation of reality: it treats
language as language and locale as locale, and keeps those two things
separate. But the benefits of adopting the new scheme go beyond organization
and elegance:
- It is a lot easier to build a new file and test it, including the
  fall-back … pushing i18n a bit further.

- The token translations become single-sourced, which makes their
  maintenance a lot easier and, above all, more accurate. For example, right
  now a change in de-DE may inadvertently affect people using de-AT.

- It would allow translations to start from the top-level language file.
  The case of pt-BR is an obvious example of the problem the current layout
  creates. The person who did the translation evidently started from en-US
  (as recommended by the documentation) rather than from the more natural
  pt-PT file. As a result they missed several terms, left others
  untranslated, mistranslated a few more, and omitted the gender
  specification for the ordinals, all of which had already been done in the
  pt-PT file. If these terms were located in a general language file, the
  translator would only need to confirm a few things and note the
  differences.

- Most important of all, it would allow control over the translation of
  each individual token and over its quality. I have all of them loaded in a
  database, along with the translations from BibLaTeX and a few other
  projects. Comparing frequencies for the tokens whose translations differ
  among the projects, one can see that some translations were done by a
  machine and not a human!

- The locale files can now be written by a single script run, and they come
  out factored into language and locale directly from the db.

- Right now the db can only be seen from the inside, but I am preparing a
  web interface where anyone will be able to see a token in 3-4 languages of
  their choice, enter their own translation, and even save a locale file for
  themselves and for the project. Mistakes can be more readily identified
  and fixed.

- From the db one can interact with other projects in the same area, like
  BibLaTeX, which shares some 75% of its translation tokens with CSL. Some of
  the incorrect translations I have detected come from languages and tokens
  supported in BibLaTeX. The cross-pollination should go both ways.

- The db is fairly complete on standard tokens (months, ordinals, names of
  countries, languages, etc.) for more than 180 languages, which reduces the
  work of a translator to a few other entries and makes it possible to
  generate quite a few more locale files.
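
As a rough illustration of the language/locale factoring and fall-back
described above, here is a minimal Python sketch. The function names
(`factor`, `lookup`), the in-memory dictionaries standing in for the db, and
the sample terms are all assumptions made for illustration; this is not the
actual script or schema.

```python
from typing import Dict, Optional, Tuple

Terms = Dict[str, str]  # token name -> translated text

# Illustrative sample data only; not copied from any real locale file.
LOCALES: Dict[str, Terms] = {
    "de-DE": {"and": "und", "editor": "Herausgeber", "month-01": "Januar"},
    "de-AT": {"and": "und", "editor": "Herausgeber", "month-01": "Jänner"},
}


def factor(locales: Dict[str, Terms]) -> Tuple[Terms, Dict[str, Terms]]:
    """Split full per-locale tables into a shared language table
    plus the per-locale overrides that differ from it."""
    tables = list(locales.values())
    if not tables:
        return {}, {}
    # A token goes into the language table only when every locale
    # carries it with exactly the same translation.
    shared = {
        token: text
        for token, text in tables[0].items()
        if all(t.get(token) == text for t in tables[1:])
    }
    # Everything else stays in the locale it came from.
    overrides = {
        name: {tok: txt for tok, txt in terms.items() if shared.get(tok) != txt}
        for name, terms in locales.items()
    }
    return shared, overrides


def lookup(token: str, locale: str,
           overrides: Dict[str, Terms], language: Terms) -> Optional[str]:
    """Resolve a token: locale override first, then the language table."""
    return overrides.get(locale, {}).get(token, language.get(token))


shared, overrides = factor(LOCALES)
print(shared)
print(overrides)
print(lookup("month-01", "de-AT", overrides, shared))
```

With this split, a correction to a shared term is made once in the language
table, and a locale such as de-AT only carries the entries where it really
differs.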
Paulo Ney