Papers and CSL

Hi everybody,

Finally! I am sorry it took so long, but here is a more official email introducing Mekentosj, Papers and myself to the CSL developer community. I will assume this is the right mailing list / outlet to do that, but let me know if that’s not the case.

First, to clarify, the reason why we did not introduce ourselves and post here before, is simply that we were incredibly busy with the aftermath of the Papers 2.0 release, with an intensity that did not really slow down until we released Papers 2.1 a couple of weeks ago. It was always our intention to reach out to you guys, but sadly, that could not be done in a constructive and useful way without distracting too much from more urgent short-term tasks. As you may know, we are a very small team, with limited resources.

OK, so now here is what we have been wanting to say but were holding back until time permitted…

The best thing to start with is a very very big thank you to the CSL developers and to the wider CSL community as well, for both the language and the styles. Papers 2 would likely not have its ‘Manuscripts’ component if it were not for CSL. Or it would be much more limited and a big resource sink. I have been the one working on that component of Papers, and I was very impressed by the quality of the work. The specifications of the language are very well written and documented, the design appears very strong, and the results of the initiative are clearly fantastic: the language is supported by several software packages now, and the CSL style repository is really large and growing at a good rate. I personally have a natural tendency to reinvent the wheel at the slightest excuse, but here, there was absolutely no incentive to do this. I felt very confident with what I saw, and I learned a lot.

I also wanted to give you an overview of how CSL was integrated into Papers, and which components of CSL we used.

From the start, we went with the CSL 1.0 specifications. At the time, Zotero was still using 0.8 in the official release, and the style repository was still at version 0.8 and hosted on SourceForge, but it seemed to make more sense to just go straight to 1.0. The timing turned out very well, as the 1.0 specs were ready, and the transition was already well under way.

The processor was written in Objective C, and built on top of the Cocoa frameworks that all modern Mac apps are expected to use. Of course, this is also the same foundation used for the rest of Papers. We debated a bit whether to use the existing JavaScript processor, which could also have been a very good fit. But in this case, we decided to instead reinvent the wheel, and write our own processor. The main concern was performance: using the JavaScript processor would mean running everything through an extra layer, which would affect both speed and memory usage. Mac OS X has a good interpreter for JavaScript, even accessible from ObjC, but we still made the decision to go with ‘pure’ Objective C.

It helped a lot with the decision to see (1) a strong and clear specification and (2) a suite of tests. I spent a lot of quality time on the spec page, obviously. But I also leveraged the existing test suite to drive the development and refine the implementation in Objective C. I went through two relatively large refactorings as I hit some issues with the design of my implementation, but thanks to the tests, that could be done with confidence. Not all the tests pass, even today, and in fact, I am still using an old version of the tests, but a large majority of them are covered. The bottom line is: I wanted to let you know that these tests have been absolutely invaluable, and are a really great complement to the specs. Some of the edge cases and subtleties are better explained and resolved with clear examples, and those tests provide exactly the tool needed.

If you are curious: I wrote a separate app for running the tests. I think that app should be called a harness, if I understand correctly what that means. Anyway, the job performed by that app/harness is to take the fixtures from the CSL repo, as-is, translate the data structures into Papers2 data structures, compute the actual result as Papers2 thinks it should be, and then compare it to the expected result. The test fails or passes (and my app can also mark it as a regression, so I can immediately see the negative side-effects of a change in the code). This tool, combined with the fixtures you provided, has been so useful to me that it just seemed natural to start creating fixtures as CSL-related bugs were reported by our users. Some of the bugs had not been detected by the CSL fixtures, so it seemed very natural to add them to the tests. The fixtures follow the same format as the ones in the repo. I have never in fact fixed a bug without creating a corresponding test, which I am very happy about.
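
As a rough illustration of the harness logic described above (a Python sketch, not the actual Objective C tool), the pass/fail/regression bookkeeping could look like the following. The `Fixture` fields, the `run_fixtures` function, and the `previous` dictionary of earlier outcomes are all hypothetical names, and each fixture is assumed to be already parsed and translated into whatever the processor accepts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Fixture:
    name: str       # hypothetical identifier, e.g. the fixture file name
    input: object   # item data, assumed already translated for the processor
    expected: str   # expected formatted output taken from the fixture


def run_fixtures(
    fixtures: Iterable[Fixture],
    process: Callable[[object], str],
    previous: Dict[str, bool],
) -> Dict[str, str]:
    """Return one status per fixture: 'pass', 'fail' or 'regression'.

    `previous` maps fixture names to whether they passed on the last run.
    """
    statuses = {}
    for fixture in fixtures:
        try:
            passed = process(fixture.input) == fixture.expected
        except Exception:
            passed = False  # a crash in the processor counts as a failure
        if passed:
            statuses[fixture.name] = "pass"
        elif previous.get(fixture.name, False):
            # Passed on the last run but fails now: flag it separately.
            statuses[fixture.name] = "regression"
        else:
            statuses[fixture.name] = "fail"
    return statuses
```

Keeping the previous outcomes around is what makes the "regression" status possible: a fixture that never passed is just a known hole, while one that used to pass points at the most recent code change.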

Note that in Papers 2.1, we introduced some Papers2-specific CSL variables to deal with some user requests. Inspired by CSS and vendor-specific extensions, I prefixed those variables with ‘papers2_’, and thus named those papers2_pmid, papers2_pmcid and papers2_notes. They are quite self-explanatory. Please let me know if you have any questions, or comments on that. We certainly don’t want to ‘fork’ CSL or go crazy with new variables or other modifications of the CSL. On the contrary, we think a strong and consistent CSL implementation across multiple platforms and software packages is also in Mekentosj’s best interest.
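
To make the vendor-prefix idea concrete, here is a small Python sketch of how a processor could separate standard variables from the ‘papers2_’ ones. It is only an illustration: the item is modeled as a plain dictionary, `split_variables` is a hypothetical helper, and the values shown are made up.

```python
from typing import Dict, Tuple

VENDOR_PREFIX = "papers2_"  # prefix named in the email above


def split_variables(item: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split an item's variables into (standard, vendor-specific)."""
    standard = {k: v for k, v in item.items() if not k.startswith(VENDOR_PREFIX)}
    vendor = {k: v for k, v in item.items() if k.startswith(VENDOR_PREFIX)}
    return standard, vendor


# Hypothetical item; the PMID value is a placeholder, not a real record.
item = {"title": "An example article", "papers2_pmid": "00000000"}
standard, vendor = split_variables(item)
```

Because the extensions share one prefix, a processor that does not know about them can drop or ignore them in a single step, which is the same property that makes vendor prefixes workable in CSS.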

At this stage, we thus have a good Objective C processor, with still a few holes, as well as a nice harness app to keep fixing bugs without breaking existing behavior, and with the ability to leverage present and future CSL fixtures. In the back of our minds, we have the idea of open-sourcing all that stuff maybe one day, but at the moment, it still has too many dependencies shared with the rest of Papers2. Also, it’s not clear we can do a good job at managing an open-source project in an acceptable way for others (our poor record in communicating with the CSL devs so far is not a good indication!). Also, I am not even sure that it would be very useful to many people other than us. Let us know what you think.

I have described the processor, now I want to describe a bit how we integrate the CSL styles themselves.

When we launched Papers 2.0, we shipped CSL 1.0 styles. At the time, I used the Zotero svn repository to pull the 0.8 styles, then used the XSLT file to translate the styles to 1.0. For a while, this is also how the styles were updated with new versions of Papers 2.0.x. For Papers 2.1.10 (soon in beta), I have updated the script to directly use the GitHub repository. Back then and still now, we apply a few more filters: to remove styles that have a non-commercial license (that’s only a small number of them), to adjust a few titles we thought were not correct or did not like (e.g. ‘Nature Journal’ becomes simply ‘Nature’), and finally to generate a list of acknowledgments. If you open Papers and check the Acknowledgments in the About Papers… window, you’ll see it lists every single author and contributor of every single CSL style we include in the Papers2 app package (that makes a big list!!). All the styles are distributed as part of the Papers2.app package. We do not have live updates that check the content of the GitHub repository, though we might do it at some point. We simply run our CSL update script for new Papers2 releases. Users can also easily install styles or overwrite the built-in styles with their own version (http://support.mekentosj.com/kb/pro-tips/pro-tip-adding-additional-citation-styles).
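
The license filter and the acknowledgment list from the update script described above can be sketched roughly as follows, in Python. This is not the actual script: the function names are hypothetical, the styles are assumed to be available as XML strings, and the non-commercial check is a simple text heuristic on the `rights` element that would need tuning against the real repository.

```python
import xml.etree.ElementTree as ET

# Namespace used by CSL style files.
CSL_NS = {"cs": "http://purl.org/net/xbiblio/csl"}


def is_non_commercial(root) -> bool:
    """Heuristic (assumption): look for non-commercial wording in <rights>."""
    rights = root.find("cs:info/cs:rights", CSL_NS)
    if rights is None:
        return False
    text = ((rights.text or "") + " " + rights.get("license", "")).lower()
    return "noncommercial" in text or "non-commercial" in text or "-nc" in text


def credited_names(root) -> list:
    """Collect the names of a style's authors and contributors."""
    names = []
    for tag in ("author", "contributor"):
        for name in root.findall(f"cs:info/cs:{tag}/cs:name", CSL_NS):
            if name.text and name.text.strip():
                names.append(name.text.strip())
    return names


def build_acknowledgments(style_sources) -> list:
    """Return a sorted, de-duplicated list of names from the kept styles."""
    names = set()
    for xml_text in style_sources:
        root = ET.fromstring(xml_text)
        if is_non_commercial(root):
            continue  # style is filtered out, so it is not shipped or credited
        names.update(credited_names(root))
    return sorted(names)
```

Collecting the names into a set means a person who contributed to many styles is listed only once.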

I also have a handful of styles contributed by some users that I’d like to push to the main CSL repository. I’d be happy to have direct commit access, but I don’t know what the process is to get it. Maybe getting commit access would be counterproductive for you, so I’m just as happy to submit them through the normal submission process (I have read the instructions, etc… so no worries, I don’t need them, and I know it’s not the right place to ask questions about this). Whatever minimizes the amount of work on your end. Same for the fixtures.

Please let us know if you have any concerns, questions, comments… about the above. I’ll do my best to answer your emails on this list (note I am on vacation Dec 13-24, though… including today actually ;-).

I will also be sending 2 other emails to this list soon. They’re shorter emails, and on topics different enough, so I did not want to mix up everything on one thread.

Thanks again for the wonderful contributions of all the CSL developers,

Charles–
Charles Parnot
@Charles_Parnot
twitter: @cparnot

Hi Charles,

Hi everybody,

Finally! I am sorry it took so long, but here is a more official email introducing Mekentosj, Papers and myself to the CSL developer community. I will assume this is the right mailing list / outlet to do that, but let me know if that’s not the case.

This is the right place. Welcome.

The fixtures follow the same format as the ones in the repo. I have never
in fact fixed a bug without creating a corresponding test, which I am very
happy about.

Can these test fixtures already be contributed to Frank’s test suite?

Note that in Papers 2.1, we introduced some Papers2-specific CSL variables
to deal with some user requests. Inspired by CSS and vendor-specific
extensions, I prefixed those variables with ‘papers2_’, and thus named
those papers2_pmid, papers2_pmcid and papers2_notes. They are quite
self-explanatory. Please let me know if you have any questions, or comments
on that. We certainly don’t want to ‘fork’ CSL or go crazy with new
variables or other modifications of the CSL. On the contrary, we think a
strong and consistent CSL implementation across multiple platforms and
software packages is also in Mekentosj’s best interest.

I’m a strong proponent of adding PMID and PMCID variables to CSL 1.0.1.

At this stage, we thus have a good Objective C processor, with still a few
holes, as well as a nice harness app to keep fixing bugs without breaking
existing behavior, and with the ability to leverage present and future CSL
fixtures. In the back of our minds, we have the idea of open-sourcing all
that stuff maybe one day, but at the moment, it still has too many
dependencies shared with the rest of Papers2. Also, it’s not clear we can
do a good job at managing an open-source project in an acceptable way for
others (our poor record in communicating with the CSL devs so far is not a
good indication!). Also, I am not even sure that it would be very useful to
many people other than us. Let us know what you think.

There is interest in an Objective C citeproc:
https://groups.google.com/d/msg/zotero-dev/lAcACTgY5uQ/MrESI5i9aBUJ

When we launched Papers 2.0, we shipped CSL 1.0 styles. At the time, I
used the Zotero svn repository to pull the 0.8 styles, then used the XSLT
file to translate the styles to 1.0. For a while, this is how the styles
were updated as well with new versions of Papers 2.0.x. For Papers 2.1.10
(soon in beta), I have updated the script to directly use the GitHub
repository. Back then and still now, we apply a few more filters, to remove
styles that have a non-commercial license (that’s only a small number of
them)

Do we still have (L)GPL styles? I really think we should get rid of that
type of license in the style repo.

to adjust a few titles we thought were not correct or did not like (e.g.
‘Nature Journal’ becomes simply ‘Nature’)

Can you supply us with an overview of these adjustments? I might fix up a
few styles upstream if I agree with the changes.

I also have a handful of styles contributed by some users, that I’d like
to push to the main CSL repository. I’d be happy to have direct commit
access, but then I also don’t know what the process is to get commit
access. Maybe getting commit access would be counterproductive for you, so
I’m just as happy to submit them through the normal submission process (I
have read the instructions, etc… so no worry, I don’t need them, and I know
it’s not the right place to ask questions about this). Whatever minimizes
the amount of work on your end. Same for the fixtures.

As I indicated in the other thread, just make sure that the styles you
commit are valid CSL 1.0. If you start with a few, Sebastian and I will
review them and let you know if there’s anything we would like to see
done differently.

Thanks for the update, Charles, it’s much appreciated.

Rintze

On Tue, Dec 20, 2011 at 2:53 AM, Charles Parnot <@Charles_Parnot> wrote:

If you’d like commit rights, just send me your github user name and I’ll add you to the “Style Editors” org team.

Ah, I forgot that in my previous email: my username is cparnot, see https://github.com/cparnot

There is interest for an Objective C citeproc: https://groups.google.com/d/msg/zotero-dev/lAcACTgY5uQ/MrESI5i9aBUJ

Ah, thanks for the link!

Do we still have (L)GPL styles? I really think we should get rid of that type of license in the style repo.

to adjust a few titles we thought were not correct or did not like (e.g. ‘Nature Journal’ becomes simply ‘Nature’)

Can you supply us with an overview of these adjustments? I might fix up a few styles upstream if I agree with the changes.

I’ll send more details after the 24th, but for instance, here is the complete list of title adjustments (current name and modified name separated by a tab). I checked the websites of a number of these journals to confirm the actual name they use. Some of those changes are a matter of taste. There were a few typos, though they may have been fixed since then (I just run a script that changes any occurrence of the titles before the tabs).

AIDS: an International Bimonthly Journal	AIDS
AIDS Journal	AIDS
AJR: American Journal of Roentgenology	American Journal of Roentgenology
American Journal of Archaeology (Author-Date)	American Journal of Archaeology
Bioinformatics Journal	Bioinformatics
Cell Journal	Cell
Diabetes Journal	Diabetes
European Radiology, Neuroradiology (Elsevier)	European Radiology, Neuroradiology
European Hospital Journal	European Hospital
FEMS based Journal	FEMS
Genes and Development Journal	Genes and Development
Genome Biology Journal	Genome Biology
History Journal	History
History and Theory Journal	History and Theory
International Organization	International Organization
Hydrogeology Journal (Author-Date)	Hydrogeology Journal
IEEE-w-url	IEEE with URL
Indian Pacing and Electrophsyiology Journal	Indian Pacing and Electrophysiology Journal
INTER. Romanian Institute for Inter-Orthodox, Inter-Confessional and Inter-Religious Studies	Romanian Institute for Inter-Orthodox, Inter-Confessional and Inter-Religious Studies (INTER)
Journal of Pragmatics (Author-Date)	Journal of Pragmatics
Journal of the Swedish Medical Association (see “Lakartidningen”)	Journal of the Swedish Medical Association
Metabolic Engineering Journal	Metabolic Engineering
Meteoritics & Planetary Science Journal	Meteoritics & Planetary Science
Nature Journal	Nature
New Foundland Medical Association Journal	Newfoundland Medical Association Journal
Nucleic Acids Research Journal	Nucleic Acids Research
Pharmacognosy Reviews [Phcog Rev.]	Pharmacognosy Reviews
Pharmcognosy Magazine [Phcog Mag.]	Pharmacognosy Magazine
PJNZ - Pharmacy Journal of New Zealand	Pharmacy Journal of New Zealand (PJNZ)
Public Library of Science Journals	Public Library of Science Journals (PLoS)
PNAS Journal	PNAS
PS: Political Science & Politics	Political Science & Politics
RNA Journal	RNA
Science journal	Science
SPIE_BiOS	SPIE BiOS
SMALL-wiley	Small (Wiley)
SNGTV	Société Nationale des Groupements Techniques Vétérinaires
SciBX: Science-Business eXchange	Science-Business eXchange (SciBX)
the Ceylon Medical Journal	Ceylon Medical Journal
The Journal of Neuroscience (Author-Date)	The Journal of Neuroscience
The Journal of Investigative Dermatology	Journal of Investigative Dermatology
The Journal of Chemical Physics	Journal of Chemical Physics
Tijdschrift Voor Nucelaire Geneeskunde	Tijdschrift Voor Nucleaire Geneeskunde
UN_ECLAC_CEPAL-v3.1	ECLAC / CEPAL v3.1
Yeast Journal	Yeast
Invisu. Art and Humanities (french)	Invisu
University of South Australia 2011 (Harvard-based author-date system)	University of South Australia 2011
University of South Australia 2007 (Harvard-based author-date system)	University of South Australia 2007
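
The renaming step itself can be sketched in a few lines of Python. This is an illustration only, not the script mentioned above: `parse_adjustments` and `adjust_title` are hypothetical names, and the sketch assumes a whole-title exact match rather than replacing every occurrence of the text.

```python
def parse_adjustments(text: str) -> dict:
    """Parse 'current<TAB>modified' lines into a lookup table."""
    mapping = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        current, modified = line.split("\t", 1)
        mapping[current.strip()] = modified.strip()
    return mapping


def adjust_title(title: str, mapping: dict) -> str:
    """Return the adjusted title, or the original if no adjustment is listed."""
    return mapping.get(title, title)


adjustments = parse_adjustments("Nature Journal\tNature\nIEEE-w-url\tIEEE with URL\n")
new_title = adjust_title("Nature Journal", adjustments)
```

Matching the whole title avoids accidentally rewriting a longer title that merely contains one of the listed names.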

I went through the list, and adopted most of your changes:


Rintze

On Wed, Dec 21, 2011 at 2:44 AM, Charles Parnot <@Charles_Parnot> wrote:

Great to hear, and glad that was useful!

charles