locale files - Part III - the benefits

Undoubtedly the new schema is a closer representation of reality: it treats
language as language and locale as locale, and keeps those two things separate.
But the benefits of adopting the new scheme go beyond organization:

  • It is a lot easier to build a new file and test it, including the
    fall-back … pushing i18n a bit further.

  • the token translations become single-sourced, making their
    maintenance a lot easier and, above all, more accurate. For example, right
    now a change in de-DE may inadvertently affect people using de-AT.

  • it would allow translations to start from the top-level language file.
    The case of pt-BR is an obvious example of the problem with the current
    approach. The person who did the translation evidently started from
    en-US (as recommended by the documentation) rather than the more natural
    pt-PT file, and as a result missed several terms, left others untranslated,
    mistranslated a few more and omitted the gender specification for
    the ordinals, even though all of this was already done in the pt-PT file.
    If it were located in a general language file, the translator would only
    need to confirm a few things and note the differences.

  • but the most important one is that it would allow control over the
    translation of each individual token and over its quality. I have
    all of them loaded in a database, along with the translations from BibLaTeX
    and a few other projects. Comparing frequencies for the ones that
    differ among the projects, one can see that some translations have been
    done by a machine and not a human!

  • the locale files can now be written by a single script run and come
    out factored into language and locale directly from the db.

  • right now I can only see the DB from the inside, but I am preparing a
    web interface where anyone will be able to see a token in 3-4 languages
    of their choice, enter their own translation and even save a locale file
    for themselves and for the project. Mistakes can be more readily identified.

  • from the db one can interact with other projects in the same area, like
    BibLaTeX, which shares some 75% of the translation tokens with CSL. Some of
    the incorrect translations I have detected come from languages and tokens
    supported in BibLaTeX. The cross-pollination should go both ways.

  • the db is fairly complete on standard tokens (months, ordinals, names
    of countries, languages, etc.) in more than 180 languages, reducing the
    work of a translator to a few other entries and making it possible to
    generate quite a few more locale files.
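
To make the language/locale factoring and its fall-back concrete, here is a
minimal sketch in Python. It is not the actual script or database: the table
names, the resolve_terms helper and the sample entries are assumptions chosen
only for illustration. A locale such as de-AT stores just its differences, and
everything else falls back to the shared de language table.

```python
# Hypothetical, hand-written tables standing in for the database.
# Shared translations live once, at the language level.
LANGUAGE_TERMS = {
    "de": {"month-01": "Januar", "editor": "Herausgeber"},
}

# Each locale records only what differs from its language.
LOCALE_OVERRIDES = {
    "de-DE": {},
    "de-AT": {"month-01": "Jänner"},
}


def resolve_terms(locale):
    """Return the effective terms for a locale: language terms first,
    then the locale's own differences layered on top."""
    language = locale.split("-")[0]
    merged = dict(LANGUAGE_TERMS.get(language, {}))
    merged.update(LOCALE_OVERRIDES.get(locale, {}))
    return merged


if __name__ == "__main__":
    for code in ("de-DE", "de-AT"):
        print(code, resolve_terms(code))
```

With this layout a change made for de-DE stays in the de-DE overrides and
cannot leak into de-AT, while a fix to the shared de table reaches both.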

Paulo Ney