Styles repository - status update

Hi all,

Just an update on what I’ve been up to lately:

Over the past three months or so, I’ve checked all dependent styles to
see if I could identify their journals. I also checked if each journal
was still active, added missing ISSNs, marked up eISSNs as such, and
added default-locales and documentation links. This was prompted by
the rather poor state of all the “vancouver” dependents. Simon added
these a few years ago based on an official list
(http://www.icmje.org/journals.html), but it’s become quite clear to
me that this list is poorly curated. E.g., we had a whole bunch of
styles for journals that had long been discontinued (some as far
back as the eighties), and I must have deleted over 100 styles. Many
styles that had “vancouver” as the parent also needed to point to a
different Vancouver variant or a wholly different style.
It was rather labor-intensive, but I think our dependent styles are
now in much better shape.

Going forward, I would like to become a bit more strict about the
dependent styles that go into the repository. I plan to write an
extension to the CSL schema so that we can use Travis CI to check for
additional requirements. I would like to require each dependent style
to have a default-locale (with possible exceptions for multilingual
styles), a documentation link, and a self link. I also would like to
disallow having two (or more) instances of cs:issn, since generally
one should be a cs:eissn.
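For illustration, here is a minimal Python sketch of the dependent-style requirements listed above, written as a standalone script rather than as the planned schema extension. It assumes the CSL 1.0 namespace and that "self" and "documentation" appear as rel values on cs:link. The function name check_dependent_style and its messages are made up for this example.

```python
import sys
import xml.etree.ElementTree as ET

NS = {"cs": "http://purl.org/net/xbiblio/csl"}


def check_dependent_style(xml_data):
    """Return a list of problems found in a dependent CSL style (str or bytes)."""
    root = ET.fromstring(xml_data)
    info = root.find("cs:info", NS)
    if info is None:
        return ["missing cs:info"]
    problems = []
    # Multilingual styles may legitimately lack this; whitelist them separately.
    if "default-locale" not in root.attrib:
        problems.append("missing default-locale on cs:style")
    rels = {link.get("rel") for link in info.findall("cs:link", NS)}
    for required in ("self", "documentation"):
        if required not in rels:
            problems.append(f"missing {required} link")
    # A second cs:issn is usually an online ISSN that should be a cs:eissn.
    if len(info.findall("cs:issn", NS)) > 1:
        problems.append("more than one cs:issn (should one be cs:eissn?)")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_dependents.py dependent/*.csl
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for problem in check_dependent_style(f.read()):
                print(f"{path}: {problem}")
```

Reading files as bytes lets the XML declaration's encoding be honored by the parser.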

You might also have noticed that Charles added some Springer styles,
which bumped us to having over 4000 CSL styles (of which 800 are
independent). Charles has made a nice script that allows us to quickly
create dependent styles from just a template and a CSV file with (at a
minimum) journal titles and ISSNs (see
https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles
). In addition to the Springer styles, Charles and I have prepared
metadata for over a dozen other publishers. I hope we can get
similar data for other publishers (maybe Elsevier?).

Finally, today I added “self” links to all dependent styles. I did
this since Zotero 4.0 will offer automatic updating of CSL styles, and
will follow the “self” link of styles to find style updates.
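A minimal sketch of how such links could be added in bulk. It assumes, as with the repository's styles, that each style's cs:id doubles as its canonical URL and that cs:id sits on a line of its own. It edits the raw text instead of re-serializing the XML, so comments and formatting stay untouched. The name add_self_link is hypothetical, and this is not necessarily how the links were actually added.

```python
import re

SELF_LINK = re.compile(r'<link\b[^>]*\brel="self"')
ID_LINE = re.compile(r"^([ \t]*)<id>([^<]+)</id>[ \t]*\n", re.MULTILINE)


def add_self_link(xml_text):
    """Insert <link rel="self"> right after cs:id unless the style already has one."""
    if SELF_LINK.search(xml_text):
        return xml_text  # idempotent: styles that already have a self link are left alone
    match = ID_LINE.search(xml_text)
    if match is None:
        raise ValueError("style has no cs:id on a line of its own")
    indent, style_id = match.group(1), match.group(2).strip()
    # Assumption: the style ID is also the URL the style is served from.
    link = f'{indent}<link href="{style_id}" rel="self"/>\n'
    return xml_text[:match.end()] + link + xml_text[match.end():]
```

Because the function returns its input unchanged when a self link already exists, it can safely be rerun over the whole repository.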

Best,

Rintze

All sounds good to me.

I also would like to disallow having two (or more) instances of cs:issn,
since generally one should be a cs:eissn.
Just so I understand this correctly: one eISSN and one ISSN would still be
OK, yes? You just don’t want two cs:issn elements, correct?

I actually asked Elsevier about the availability of data on journals and
citation styles, and while they were quite nice (I just caught someone
at random on chat), they said they didn’t have it.
It might be possible to write a scraper, since the examples they use and the
layout of the author-guidelines pages are the same across most journals,
but that’d be more involved.

And of course thanks for your heroic work on the dependent styles.

On Wed, Mar 27, 2013 at 9:37 PM, Rintze Zelle <@Rintze_Zelle> wrote:

This is great, Rintze, thanks so much for all the hard work! I agree with all your comments too.

Regarding the script: it’s a nice way to maintain dependent styles in a more controlled way. Even if updating the journal list (in tab-delimited text files) and running the script can sometimes take longer than tweaking the one affected style, it’s still worth the extra effort, as it means better-maintained dependent styles in the future. I wonder if this logic could be pushed further by having an even easier way to edit the dependent list without having to touch any of the XML directly. Sometimes, the best tool for the job is a spreadsheet (though Excel can be very dangerous to use, as it tends to interpret ISSNs as dates).
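On the spreadsheet point, one cheap safeguard is to validate every ISSN in the journal lists before generating styles. A value Excel has turned into a date will fail the format check, and a mistyped digit will usually fail the check digit. Below is a minimal Python sketch using the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with X standing for 10). The function names are illustrative and not part of the existing script.

```python
import re

ISSN_PATTERN = re.compile(r"^(\d{4})-(\d{3})([\dX])$")


def issn_check_digit(first_seven):
    """Check digit for the first seven ISSN digits (weights 8..2, mod 11)."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_issn(value):
    """True for a well-formed ISSN such as 0378-5955 with a correct check digit."""
    match = ISSN_PATTERN.match(value.strip().upper())
    if match is None:
        return False  # e.g. a value a spreadsheet has reformatted as a date
    return issn_check_digit(match.group(1) + match.group(2)) == match.group(3)
```

For example, is_valid_issn("0378-5955") is True, while is_valid_issn("0378-5956") is False.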

In any case, feel free to take a look at how the script works. You just edit the tab-delimited list of journals and the template, and then run the script. The CSL files are created locally and are then ready to be copied into the CSL repo. I have just added better documentation on how that works:

https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles
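To make the workflow concrete, here is a rough Python sketch of the same idea. It reads a tab-delimited journal list, substitutes each row's values into a template, and produces one style per journal. Several details are assumptions for illustration: the #COLUMN# placeholder syntax, the required "title" column, and the file-naming rule. The script's actual conventions are described in its own documentation.

```python
import csv
import io
import re
from xml.sax.saxutils import escape


def slugify(title):
    """Lowercase the title and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def generate_styles(template, journal_list_tsv, updated):
    """Yield (filename, style_text) for each row of a tab-delimited journal list.

    Each header column NAME fills the placeholder #NAME# in the template, and
    #UPDATED# and #SLUG# are filled in as well. Values are escaped for &, <, >
    and double quotes, so they are safe in element text and in attributes.
    """
    reader = csv.DictReader(io.StringIO(journal_list_tsv), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        values = {key: (row[key] or "") for key in reader.fieldnames}
        values["updated"] = updated
        values["slug"] = slugify(values["title"])
        style = template
        for key, value in values.items():
            style = style.replace(f"#{key.upper()}#", escape(value, {'"': "&quot;"}))
        yield values["slug"] + ".csl", style
```

Because the template is treated as plain text, placeholders can appear anywhere in it, including inside attribute values.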

Charles

Hi Charles,

The script looks great! If you come up with more complex requirements or need more flexible templates you could also use CSL-Ruby – that’s exactly the sort of thing it was written to do (generate/manipulate styles). If you’re interested I could take a look over the weekend and adapt your script to use it? Let me know.

Sylvester

Hi Rintze,

Thanks for your great work! I can start adding more tests for dependent styles as soon as you’re ready.

Best,
Sylvester

CSL-Ruby looks very useful, but I don’t know if it can readily replace what the script currently does. The placeholder strings can be anywhere in the template, and one would need to walk through all the XML elements to look for those placeholders. Or else, we would have to define what each column in the journal lists can be used for. The current script doesn’t care about the template format, and the output is only as valid as the template itself.

Charles

I also would like to disallow having two (or more) instances of cs:issn,
since generally one should be a cs:eissn.
Just so I understand this correctly: one eISSN and one ISSN would still be
OK, yes? You just don’t want two cs:issn elements, correct?

Yes. The cs:eissn element was only recently introduced in CSL 1.0.1,
so many existing styles used cs:issn for both the online and print
ISSNs. It’s obviously preferable to mark the online ISSN with
cs:eissn. Also, I don’t really care for storing ISSN-L identifiers
(with cs:issnl), and have even gone as far as deleting these from
styles. They haven’t really caught on as a way to replace print and
online ISSNs, and are redundant if we already have the latter (since
the ISSN-L always seems to match one of those two).
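Since the ISSN-L seems to always duplicate the print or online ISSN, its removal could be automated. Any ISSN-L that does not match would stay in place for a human to look at. Below is a minimal Python sketch, assuming each cs:issnl sits on its own line. The name strip_redundant_issnl is hypothetical, and this is not the procedure that was actually used.

```python
import re
import xml.etree.ElementTree as ET

NS = {"cs": "http://purl.org/net/xbiblio/csl"}


def strip_redundant_issnl(xml_text):
    """Remove cs:issnl lines whose value repeats a cs:issn or cs:eissn value."""
    info = ET.fromstring(xml_text).find("cs:info", NS)
    if info is None:
        return xml_text
    known = {e.text.strip()
             for tag in ("cs:issn", "cs:eissn")
             for e in info.findall(tag, NS) if e.text}
    for issnl in info.findall("cs:issnl", NS):
        value = (issnl.text or "").strip()
        if value in known:
            # Edit the raw text so comments and indentation elsewhere are untouched.
            pattern = r"[ \t]*<issnl>\s*" + re.escape(value) + r"\s*</issnl>[ \t]*\n?"
            xml_text = re.sub(pattern, "", xml_text)
    return xml_text
```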

Regarding the script: it’s a nice way to maintain dependent styles in a more controlled way. Even if updating the journal list (in tab-delimited text files) and running the script can sometimes take longer than tweaking the one affected style, it’s still worth the extra effort, as it means better-maintained dependent styles in the future. I wonder if this logic could be pushed further by having an even easier way to edit the dependent list without having to touch any of the XML directly.

I agree. Some ideas for further improvements:

  • we have metadata for several publishers, but as journals get added,
    renamed, or discontinued, we need to update it. We could do this
    ourselves periodically, e.g. once a year (several publishers publish
    their journal metadata online). It is probably also worthwhile to
    clarify our documentation on contributing styles to let users (and
    publishers) know that we have publisher-specific metadata, and that if
    they want to add/rename/delete a style from one of these publishers,
    they don’t need to bother with touching CSL XML.
  • the current script does an admirable job of creating dependents. It
    also allows us to suppress the creation of discontinued/renamed styles
    (via a “skip list”). What it doesn’t do, however, is provide a way to
    identify dependent styles from the repository that are outdated and
    should be removed. It would be enough if the script, after freshly
    generating all its dependents, would check if there are any
    script-generated dependents in the styles repository that aren’t
    present in the fresh batch (script-generated styles are easily
    identifiable via the XML comment we add to each of them). We can then
    delete those by hand.
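The check in the second point boils down to a set difference. Here is a minimal Python sketch. The marker string is a hypothetical placeholder for whatever comment the script actually writes, and the folder names in the usage note are likewise hypothetical.

```python
from pathlib import Path

# Hypothetical: replace with the comment text the generator actually adds.
GENERATED_MARKER = "generated by generate_dependent_styles"


def load_styles(directory):
    """Map file name -> text for every .csl file in a directory."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(directory).glob("*.csl")}


def outdated_generated_styles(repo_styles, fresh_names, marker=GENERATED_MARKER):
    """Names of script-generated styles in the repo that the fresh run didn't produce."""
    return sorted(name for name, text in repo_styles.items()
                  if marker in text and name not in fresh_names)
```

Usage, with hypothetical folder names: outdated_generated_styles(load_styles("dependent"), set(load_styles("output"))) lists candidates for deletion, which can then be reviewed and removed by hand. Hand-made dependents lack the marker, so they are never flagged.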

Rintze

I have put up a gist to illustrate how you could quickly generate a style using CSL-Ruby. Now, honestly, I have mostly used it to test existing styles, so I have not added a lot of convenience methods, but these could be added very quickly. Nevertheless, there are already a few features that show how this can be useful: for example, you could load the independent_parent of the style with a simple command. The parent would be loaded and parsed on the spot, and you could, for example, make sure that certain metadata match, or pull information from the related style. Furthermore, you can validate the style with one command before saving, etc.

When you have a very clear template this may not help you so much, but if you want more flexibility I think there is a lot of potential there (as I said, I haven’t used it much to generate styles, but all the groundwork has been laid, it is really just a matter of adding more syntactic sugar).

Another thing is that the template is essentially one-way, whereas you can parse the generated styles again with CSL-Ruby if you want to make adjustments later on, and so forth.

Anyway, I just wanted to throw this out there, because there’s a lot of good stuff implemented in CSL-Ruby that I simply haven’t had the time to document.

Here is the gist if you’re curious:

In addition to this, we should explain to users how they can quickly
check whether their journal is covered by one of our independent
styles. E.g. we could write documentation with questions such as:

  • Is the journal published by Elsevier?
  • If so, does the citation format in the Instructions to Authors match
    that of Biological Conservation?
  • If so, use elsevier-harvard.csl.

Et cetera. It’s basically what Sebastian already does himself.

Rintze

I agree. Some ideas for further improvements:

  • we have metadata for several publishers, but as journals get added,
    renamed, or discontinued, we need to update it. We could do this
    ourselves periodically, e.g. once a year (several publishers publish
    their journal metadata online). It is probably also worthwhile to
    clarify our documentation on contributing styles to let users (and
    publishers) know that we have publisher-specific metadata, and that if
    they want to add/rename/delete a style from one of these publishers,
    they don’t need to bother with touching CSL XML.
  • the current script does an admirable job of creating dependents. It
    also allows us to suppress the creation of discontinued/renamed styles
    (via a “skip list”). What it doesn’t do, however, is provide a way to
    identify dependent styles from the repository that are outdated and
    should be removed. It would be enough if the script, after freshly
    generating all its dependents, would check if there are any
    script-generated dependents in the styles repository that aren’t
    present in the fresh batch (script-generated styles are easily
    identifiable via the XML comment we add to each of them). We can then
    delete those by hand.

Rintze

Yes, better integration with the actual repository would be nice. At the moment, all the script does is generate those styles and then one has to manually copy them into the repo.

There are in fact two issues with this:

  • if a style has not changed, the script will still create a new style with an updated timestamp, and simply copying it over into the repo will create a needless update. For now I deal with this by using the GitHub.app, quickly going through each file to spot the ones that really changed, and only ticking those for the commit. It works, but it's obviously error-prone and it gets old quickly.
  • discontinued journals, as you indicate. For this, we could simply add a ‘_discontinued.txt’ list, distinct from the ‘_skip.txt’ list (which is for journals we are not yet sure what to do with, or that are exceptions, etc.).

In both cases, the script will need to look into the actual repo and manipulate things there as well. This is the next step, and obviously something I was not comfortable automating until it was clear we had something solid, which I think we have at this point. Manual inspection of the result before committing the changes, and proper validation, are still necessary, of course.
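For the first issue, one option is to compare each freshly generated style with the copy already in the repo while ignoring the cs:updated timestamp. Only files that are new, or that differ in some other way, would then be copied over. Here is a minimal Python sketch, assuming cs:updated is written as <updated>...</updated> with no attributes. The function names are hypothetical.

```python
import re

UPDATED = re.compile(r"<updated>[^<]*</updated>")


def normalized(style_text):
    """Blank out the cs:updated timestamp so it doesn't count as a change."""
    return UPDATED.sub("<updated/>", style_text)


def really_changed(old_text, new_text):
    return normalized(old_text) != normalized(new_text)


def files_to_copy(repo_styles, fresh_styles):
    """Fresh styles (name -> text) that are new or differ beyond cs:updated."""
    return sorted(name for name, text in fresh_styles.items()
                  if name not in repo_styles
                  or really_changed(repo_styles[name], text))
```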

Charles

In addition to this, we should explain to users how they can quickly
check whether their journal is covered by one of our independent
styles. E.g. we could write documentation with questions such as:

  • Is the journal published by Elsevier?
  • If so, does the citation format in the Instructions to Authors match
    that of Biological Conservation?
  • If so, use elsevier-harvard.csl.

Et cetera. It’s basically what Sebastian already does himself.

Rintze

That’s an excellent point, and in fact, we should have that kind of info in our Papers help pages as well :slight_smile:

Charles

I added “csl-repository.rnc” to the schema repository
(https://github.com/citation-style-language/schema/blob/master/csl-repository.rnc).
It can be used instead of csl.rnc. It builds on the normal
schema, but changes a few things. So far, it:

  • forbids more than one cs:issn element
  • forbids some attributes on cs:style in dependent styles (“class”,
    global options, and inheritable name options)
  • forbids affixes on cs:et-al (“prefix” and “suffix”)
  • requires cs:rights, set to “This work is
    licensed under a Creative Commons Attribution-ShareAlike 3.0
    License”
  • requires at least one cs:link element.
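Several of these rules are also easy to express as a plain script, which could help contributors who don't have a RELAX NG validator at hand. The rough Python approximation below is not a substitute for validating against the schema itself. The attribute list is only an illustrative subset of the global and inheritable name options, and dependents are recognized by their “independent-parent” link. The name repository_problems is hypothetical.

```python
import xml.etree.ElementTree as ET

CSL = "http://purl.org/net/xbiblio/csl"
NS = {"cs": CSL}
RIGHTS_TEXT = ("This work is licensed under a Creative Commons "
               "Attribution-ShareAlike 3.0 License")
# Illustrative subset only: "class", some global options, some name options.
FORBIDDEN_ON_DEPENDENT = {
    "class", "initialize-with-hyphen", "page-range-format",
    "demote-non-dropping-particle", "et-al-min", "et-al-use-first",
    "initialize-with", "name-as-sort-order", "sort-separator",
}


def repository_problems(xml_data):
    """Approximate a few of the repository rules; returns a list of problems."""
    root = ET.fromstring(xml_data)
    info = root.find("cs:info", NS)
    if info is None:
        return ["missing cs:info"]
    problems = []
    links = info.findall("cs:link", NS)
    if not links:
        problems.append("no cs:link element")
    if len(info.findall("cs:issn", NS)) > 1:
        problems.append("more than one cs:issn")
    if any(link.get("rel") == "independent-parent" for link in links):
        for attr in sorted(FORBIDDEN_ON_DEPENDENT & set(root.attrib)):
            problems.append(f'attribute "{attr}" not allowed on dependent cs:style')
    rights = info.find("cs:rights", NS)
    if rights is None or (rights.text or "").strip() != RIGHTS_TEXT:
        problems.append("cs:rights missing or not the standard license text")
    for et_al in root.iter(f"{{{CSL}}}et-al"):
        for affix in ("prefix", "suffix"):
            if affix in et_al.attrib:
                problems.append(f"cs:et-al has a forbidden {affix} attribute")
    return problems
```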

With regard to the last point: ideally I would have csl-repository.rnc
require one “self” link, one “independent-parent” link (for
dependents), one or more “documentation” links, and allow one or more
“template” links (for independents). However, this is not possible
with RELAX NG as long as we want to allow the elements within cs:info
to be unordered (this is because of a limitation of the “interleave”
operator). I guess the question here is what csl-repository.rnc will
be used for. I wouldn’t want to require style contributors to use a
specific cs:info element order, as they already have enough on their
plate. But I already periodically reorder the cs:info elements with a
Python script (https://github.com/citation-style-language/utilities/blob/master/csl-reindenting-and-info-reordering.py)
for consistency, so we could use a version that requires a specific
element order to make sure that the styles in the repo are in good
shape.
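Here is a minimal Python sketch of the reordering idea; ET.indent requires Python 3.9+. The INFO_ORDER list is an assumption for illustration, and the linked script defines its own order. Python's sorted() is stable, so repeated elements such as cs:link keep their relative order. Note that this sketch drops the XML declaration and any comments outside the root element, which a real tool would need to preserve.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
# Assumed canonical order, for illustration only.
INFO_ORDER = ["title", "title-short", "id", "link", "author", "contributor",
              "category", "issn", "eissn", "issnl", "summary", "published",
              "updated", "rights"]
RANK = {f"{{{CSL_NS}}}{name}": i for i, name in enumerate(INFO_ORDER)}


def reorder_info(xml_text):
    """Sort cs:info children into INFO_ORDER and re-indent the style."""
    ET.register_namespace("", CSL_NS)  # serialize with a default namespace, not ns0:
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    info = root.find(f"{{{CSL_NS}}}info")
    if info is not None:
        # Unknown elements and comments sort to the end.
        children = sorted(info, key=lambda child: RANK.get(child.tag, len(RANK)))
        for child in list(info):
            info.remove(child)
        info.extend(children)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```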

I cleaned up the current styles so they validate against this stricter
schema. I didn’t add stricter tests for cs:category and
“default-locale” since there will be exceptions in the style
repository, and I wanted all styles to validate against
csl-repository.rnc (at least for now).

Rintze

I forgot to add that while I was writing the extension schema, I found
a small error in the CSL 1.0.1 schema:

The bug allowed dependent styles to validate against the rules for
independent styles, which made the schema more lax than it should be.
Not a problem for CSL 1.0.1, but csl-repository.rnc should be used
together with a fixed version of csl.rnc.

Rintze

Why don’t we just add the requirements (as well as those you mentioned below: ordering, self-link etc.) to the tests? We can add a description and a link to the schema to the Readme stating these as mandatory requirements for pull-requests. GitHub will send each pull request and subsequent changes to Travis-CI. We could even go further, by adding a Hook that auto-responds to bad pull requests with a message reiterating the rules or something along those lines.

I have committed to many open source projects and typically, the more established they are, the more burden is placed on the person making the commit/pull request. Ideally, it should not be necessary for you (or other maintainers) to have to respond to every pull request that leaves a style in a bad shape, but only review those which pass all the repository rules.

If I remember correctly, the tests validate each style against the schema; if you can express the conditions, I can adapt some styles to be validated against the new schema, as well.

Sylvester

Why don’t we just add the requirements (as well as those you mentioned below: ordering, self-link etc.) to the tests? We can add a description and a link to the schema to the Readme stating these as mandatory requirements for pull-requests. GitHub will send each pull request and subsequent changes to Travis-CI. We could even go further, by adding a Hook that auto-responds to bad pull requests with a message reiterating the rules or something along those lines.

A lot of style contributors are new to GitHub. Our instructions
explain how to create new pull requests (which is relatively simple),
but it’s still rather complicated to modify an existing pull request
via the website (basically, you have to navigate to the style file in
the contributor’s fork of the repo, in the branch used for the pull
request; there the contributor can use the Edit button). I was hoping
we could have contributors test as much as possible before creating a
pull request, hence the stricter schema.

I have committed to many open source projects and typically, the more established they are, the more burden is placed on the person making the commit/pull request. Ideally, it should not be necessary for you (or other maintainers) to have to respond to every pull request that leaves a style in a bad shape, but only review those which pass all the repository rules.

Yes. The style repo is rather unique though compared to many other
open source projects in that we get a lot of one-off contributions
from inexperienced users. Because of that I just think that the
current method of manually creating pull requests isn’t going to scale
much further. I’m also wondering how many users simply decide not to
bother submitting their style improvements because it’s too much work.
And, even if we have a process that only generates correct pull
requests, the style repository maintainers will still need to review
all the correct ones, which is going to be a significant burden if the
repository gets even more popular. (see
http://www.ohloh.net/p/CitationStyleLanguage for graphs that show the
growth in the project activity)

But maybe I’m asking the wrong question. To stir up the discussion a
bit: developers who enjoy the availability of CSL styles (yes you!),
be it from BibSonomy, colwiz, Docear, Drupal’s biblio module,
jekyll-scholar, KCite, Mendeley, pandoc, Qiqqa, or Zotero, what would
you do if Sebastian and I stopped handling style submissions? So far,
only Charles Parnot from Papers helps out. Do any of you have a
process in mind that doesn’t involve relying on me and Sebastian? :stuck_out_tongue:

If I remember correctly, the tests validate each style against the schema; if you can express the conditions, I can adapt some styles to be validated against the new schema, as well.

Some of the additional requirements in csl-repository.csl are already
part of your test framework, but I’ll look at the ones that aren’t.

Rintze

Hi,

I have committed to many open source projects and typically, the more
established they are, the more burden is placed on the person making the
commit/pull request. Ideally, it should not be necessary for you (or other
maintainers) to have to respond to every pull request that leaves a style
in a bad shape, but only review those which pass all the repository rules.

Yes. The style repo is rather unique though compared to many other
open source projects in that we get a lot of one-off contributions
from inexperienced users. Because of that I just think that the
current method of manually creating pull requests isn’t going to scale
much further. I’m also wondering how many users simply decide not to
bother submitting their style improvements because it’s too much work.

I think Sylvester’s and Rintze’s approaches are really the second-best and
best solutions, respectively, to the current problem.
If we can find a way to make style submissions easier and more reliably
correct, that would be ideal. I see some issues - e.g. the fact that we can’t
seem to get people to name styles correctly (even though e.g. the visual editor
mostly does that right by itself), or to include ISSNs and links to style
guides - that I wouldn’t really know how to automate. Also, things like
checking whether a style should be a dependent style seem relatively hard
to automate, but I’m not an expert on automation and it’s very well
possible that I’m underestimating what can be done.

But if we can’t get this to work, I think the only sustainable way to go is
to do what Sylvester says and increase the burden on contributors. I’m less
apprehensive about that than Rintze. The majority of pull requests we get
that require a lot of attention are for new styles. I’d argue that - while
it’s obviously good to cover as many styles as possible - we now have so
many styles that my main concern would be to improve their quality rather
than their quantity. (that includes outright mistakes - styles not
conforming to their style guides, incomplete styles that don’t cover all
commonly used item types (not all styles need to cover, say, legal cases,
but most should probably cover theses and reports), as well as things like
poor coding of affixes that leads to incorrect output for certain data
constellations and requires, as Andrea has pointed out, a fair number of
processor workarounds that aren’t actually part of the spec.)

And, even if we have a process that only generates correct pull
requests, the style repository maintainers will still need to review
all the correct ones, which is going to be a significant burden if the
repository gets even more popular. (see
http://www.ohloh.net/p/CitationStyleLanguage for graphs that show the
growth in the project activity)

Right - and I don’t see any way around that, so this:

But maybe I’m asking the wrong question. To stir up the discussion a
bit: developers who enjoy the availability of CSL styles (yes you!),
be it from BibSonomy, colwiz, Docear, Drupal’s biblio module,
jekyll-scholar, KCite, Mendeley, pandoc, Qiqqa, or Zotero, what would
you do if Sebastian and I stopped handling style submissions? So far,
only Charles Parnot from Papers helps out. Do any of you have a
process in mind that doesn’t involve relying on me and Sebastian? :stuck_out_tongue:

really does strike me as a rather important question. For me, the problem
isn’t just that dealing with erroneous pull requests is a lot of unpaid
work, but also that it’s not exactly exciting and quite repetitive for the
most part (e.g. I mind much less looking over correct pull requests to
improve the code).

Shifting to more near term solutions:
I think stricter Travis conditions are a good idea.
The other thing I’d like to see - and I assume that’s possible - is that a
failing Travis would automatically lead to a message along the lines of
what we’re writing manually now:
“Thank you for your submission to the citation-styles repository.
Unfortunately, our friendly maintenance-bot Travis tells us that there are
some problems with your submission. Please make sure that you’ve followed
all the style requirements [link]. Travis’s output [link] may give you
additional clues about the problems in your submission.”

Happy egg (or afikomen) hunting:
Sebastian–
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University

I agree that with all the styles now in the repository, getting stricter is the way to go, and aiming for quality should be the goal. Being less friendly with pull requests and asking more from contributors is perhaps unfortunate, but it's the only sustainable way IMO.

The fact that most of the work is done by unpaid contributors is a real concern, and Rintze asks a very good question. Interestingly, and this may come as a surprise, I am not paid by Papers/Springer. I just work on some of the CSL stuff on the side. I am in direct contact with the Papers people, and we receive a lot of useful feedback on issues with some styles, but we don’t have the resources to address it all at the moment. We hope to have someone with more time on their hands to dedicate to this, though at the moment there is no promise of that. One of the reasons I keep working on this is that I am personally excited by the project and its achievements. I want CSL to become the de facto standard.

The idea of a grant is a great one. The project has reached a level of maturity that makes it a good candidate for that. If the project received a grant, of say $100,000, the way I would like it to be spent is to pay one person for 1-2 years to devote 100% of her/his time to writing new styles, and correcting existing ones. I am pretty sure the Papers guys would be perfectly happy to send all the feedback we have to this person :slight_smile:

Charles

Alternatively, we could ask Zotero/Mendeley/Papers/etc. for some
publicity to help us attract some volunteers to help out. But since
we’re talking about grunt work, it’s unlikely that there will be many
takers.

Rintze

Like giving away free licenses to style contributors? :)

Alternatively, we could ask Zotero/Mendeley/Papers/etc. for some
publicity to help us attract some volunteers to help out. But since
we’re talking about grunt work, it’s unlikely that there will be many
takers.

Yeah, I’ve thought about that in the context of Zotero translators - my
idea would be that people can learn something while volunteering. That’s
essentially what happened in my case. I wonder, though, if we have
sufficiently attractive learning options available for CSL.
S.