CSL style formatter

Hi all,

I finished a JavaScript implementation of
https://github.com/citation-style-language/utilities/blob/master/csl-reindenting-and-info-reordering.py,
live at http://rintze.zelle.me/style-formatter/. Feedback welcome.

The tool indents and formats styles according to our CSL style
repository standards. Like the Python script, it also reorders the
elements in cs:info, which circumvents a limitation in RELAX NG and
allows for stricter validation against
https://github.com/citation-style-language/schema/blob/master/csl-repository.rnc.
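The reorder-and-reindent step can be sketched with Python's standard xml.etree.ElementTree. This is neither the JavaScript formatter nor the repository's Python script. The ORDER and LINK_RELS lists are assumed orders chosen for illustration (the authoritative order is whatever csl-repository.rnc requires), two-space indentation is assumed, and for brevity the sketch drops XML comments instead of moving them along with their elements.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
ET.register_namespace("", CSL_NS)  # serialize CSL elements without a prefix

# Assumed cs:info child order, for illustration only; the authoritative
# order is the one csl-repository.rnc requires.
ORDER = ["title", "title-short", "id", "link", "author", "contributor",
         "category", "issn", "eissn", "issnl", "summary", "updated", "rights"]
# Assumed secondary order for cs:link, keyed on its "rel" attribute.
LINK_RELS = ["self", "template", "documentation"]


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def sort_key(el: ET.Element) -> tuple:
    """Primary key: element name; secondary key: distinguishing attribute."""
    name = local_name(el.tag)
    primary = ORDER.index(name) if name in ORDER else len(ORDER)
    secondary = 0
    if name == "link":
        rel = el.get("rel")
        secondary = LINK_RELS.index(rel) if rel in LINK_RELS else len(LINK_RELS)
    elif name == "category":
        # cs:category with citation-format goes before those with field
        secondary = 0 if el.get("citation-format") is not None else 1
    return (primary, secondary)


def reorder_info(style: ET.Element) -> None:
    """Sort the children of cs:info in place.

    sorted() is stable, so elements with equal keys (e.g. several
    cs:author elements) keep their original relative order; unknown
    elements go last.
    """
    info = style.find(f"{{{CSL_NS}}}info")
    if info is None:
        return
    children = sorted(info, key=sort_key)
    for child in list(info):
        info.remove(child)
    info.extend(children)


def format_style(xml_text: str) -> str:
    """Reorder cs:info, then reindent with two spaces (Python 3.9+)."""
    style = ET.fromstring(xml_text)  # the default parser drops comments
    reorder_info(style)
    ET.indent(style, space="  ")
    return ET.tostring(style, encoding="unicode")
```

Because ET.indent only rewrites whitespace-only text and tails, the reindent can safely run after elements have been moved around.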

It is probably useful as a standalone tool (maybe at
"http://formatter.citationstyles.org"), but we might want to integrate
it into the CSL validator (http://validator.citationstyles.org/). We
could also create a CSL style submission wizard for users that starts
with validation against csl.rnc (to fix the big errors), followed by
reindenting and reordering and then validation against
csl-repository.rnc (along with any additional automated checks we can
come up with), and finally the creation of a pull request. Frank
already wrote some code for the last bit for his MLZ/Juris-M project.

Rintze

Oh, and it can be tested with
https://gist.github.com/rmzelle/e3311deb8826c7d86376 and
https://gist.github.com/rmzelle/352f12a5a290118d76b6 . The examples
demonstrate reordering of cs:info child elements (and XML comments),
reindenting, trimming of the style title, and escaping of visually
hard-to-identify characters.
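The last two operations can be sketched in a few lines of Python. The set of "hard-to-identify" characters below is a hypothetical selection, not the formatter's actual list, and the choice of decimal rather than hexadecimal character references is likewise an assumption.

```python
# Hypothetical selection of characters that look like an ordinary space,
# or are invisible, in most editors.
HARD_TO_SEE = {
    "\u00a0",  # no-break space
    "\u2002",  # en space
    "\u2003",  # em space
    "\u2009",  # thin space
    "\u200b",  # zero width space
    "\u202f",  # narrow no-break space
}

# Whitespace characters as defined by XML (space, tab, CR, LF).
XML_WHITESPACE = " \t\r\n"


def escape_hard_to_see(xml_text: str) -> str:
    """Replace each listed character with a decimal character reference.

    Meant for serialized XML: inside text and attribute values the
    reference is equivalent to the literal character.
    """
    return "".join(f"&#{ord(ch)};" if ch in HARD_TO_SEE else ch
                   for ch in xml_text)


def trim_title(title: str) -> str:
    """Remove leading and trailing XML whitespace from a cs:title value.

    Only XML whitespace is stripped, so a deliberate no-break space at
    either end is kept (and can then be escaped).
    """
    return title.strip(XML_WHITESPACE)
```

For example, escape_hard_to_see("p.\u00a012") gives "p.&#160;12", which makes the no-break space obvious to a reviewer reading the raw file.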

Rintze

Cool!

But why do we care about element order in the metadata?

I do believe we could construct the RNG to ensure a particular order. It
just doesn’t seem a good idea, as order isn’t relevant to the content
meaning.

The repository has established stricter validation including such things as
order in order to simplify maintenance and review tasks as the total number
of styles continues to grow.

Tools like this help to bring styles into that cleaned up and standardized
state.

Yeah, the order is similar to indenting: there’s no reason to enforce it
for validation since it doesn’t matter for functionality, but it makes
sense for the repository since it helps with maintenance.

Congrats Rintze, very happy to have these, I’ll give them a proper spin
over the next week.

Well, from a pure XML perspective the order of elements in cs:info is
indeed arbitrary, but with an unordered list we run into a limitation
of RELAX NG. With the use of interleave, which permits arbitrary
element order, we can write only one definition per element tagName
(http://books.xmlschemata.org/relaxng/relax-CHP-6-SECT-9.html and
http://www.relaxng.org/pipermail/relaxng-user/2004-April/000432.html).
This is problematic since we reuse some elements (cs:link,
cs:category) with different sets of attributes. Ordering cs:info makes
it possible to forgo interleave and make sure there is:

  • only one cs:link with “self”
  • any number of cs:link with “template”
  • at least one cs:link with “documentation”
  • only one cs:category with “citation-format”
  • any number of cs:category with “field”.

(see https://github.com/citation-style-language/schema/blob/master/csl-repository.rnc#L26)
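For comparison, the cardinality rules in the list above are simple to state procedurally. The check below is a hypothetical helper, not part of the CSL schema or validator. It reads "only one" as "exactly one" and uses the CSL 1.0.1 attribute names (rel on cs:link, citation-format and field on cs:category).

```python
import xml.etree.ElementTree as ET

CSL = "{http://purl.org/net/xbiblio/csl}"


def check_info(info: ET.Element) -> list[str]:
    """Check cs:link and cs:category counts in a cs:info element.

    Restates the listed rules in code, reading "only one" as
    "exactly one". Returns a list of problems (empty if none).
    """
    rels = [link.get("rel") for link in info.findall(f"{CSL}link")]
    formats = [c for c in info.findall(f"{CSL}category")
               if c.get("citation-format") is not None]
    problems = []
    if rels.count("self") != 1:
        problems.append(f'expected one cs:link rel="self", found {rels.count("self")}')
    if rels.count("documentation") < 1:
        problems.append('expected at least one cs:link rel="documentation"')
    if len(formats) != 1:
        problems.append(f"expected one cs:category with citation-format, found {len(formats)}")
    # rel="template" links and cs:category elements with field are
    # allowed in any number, so they need no check.
    return problems
```

Counting in code has no trouble with repeated element names; the difficulty is specific to expressing the same rules in a RELAX NG grammar without fixing the element order.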

And having cs:info ordered makes it much easier for reviewers to make
sure all the pieces are there.

Rintze

P.S. Bruce, I have an old email from you where you wrote:

On Tue, Apr 26, 2011 at 10:43 AM, Rintze Zelle <> wrote:

Any specific hints? I’d like to keep the cs:category elements within cs:info
unordered (and I would preferably not require that the cs:category elements
should be grouped together), but I would like to limit the occurrence of
cs:category carrying citation-format to once.

I don’t believe that’s possible. You can do:

field = element category { attribute type { "field" } }
format = element category { attribute type { "format" } }

category = field | format

But you can’t then keep them unordered.


When working on things related to the CSL project, I thought “oh,
thankfully the order is deterministic”. That was probably when I used
hacky ways to count CSL files that contain something (without parsing
each one of them), but in general it’s useful that all styles are
indented and sorted. I do appreciate it, and it makes them much easier
to read as well.
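A minimal sketch of that kind of unparsed count, assuming a directory of .csl files; the function name and the search string in the example are hypothetical. A plain substring match like this is only trustworthy when every file is formatted the same way, which is exactly what consistent indenting and ordering buy.

```python
from pathlib import Path


def count_styles_containing(directory: str, needle: str) -> int:
    """Count .csl files whose raw text contains `needle` (no XML parsing)."""
    return sum(
        1
        for path in Path(directory).glob("*.csl")
        if needle in path.read_text(encoding="utf-8")
    )
```

For instance, count_styles_containing("styles", 'citation-format="author-date"') would count author-date styles, provided all of them write that attribute the same way.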

Well, for the repository I only run the Python script by hand every
once in a while, so not all styles are indented and sorted all the
time.

Rintze