Hi everybody,
Finally! I am sorry it took so long, but here is a more official email introducing Mekentosj, Papers and myself to the CSL developer community. I will assume this is the right mailing list / outlet to do that, but let me know if that’s not the case.
First, to clarify, the reason we did not introduce ourselves and post here earlier is simply that we were incredibly busy with the aftermath of the Papers 2.0 release, with an intensity that did not really slow down until we released Papers 2.1 a couple of weeks ago. It was always our intention to reach out to you guys, but sadly, that could not be done in a constructive and useful way without distracting too much from more urgent short-term tasks. As you may know, we are a very small team, with limited resources.
OK, so now here is what we have been wanting to say but were holding back until time permitted…
The best thing to start with is a very, very big thank you to the CSL developers and to the wider CSL community as well, for both the language and the styles. Papers 2 would likely not have its ‘Manuscripts’ component if it were not for CSL. Or it would be much more limited and a big resource sink. I have been the one working on that component of Papers, and I was very impressed by the quality of the work. The specifications of the language are very well written and documented, the design appears very strong, and the results of the initiative are clearly fantastic: the language is supported by several software packages now, and the CSL style repository is really large and growing at a good rate. I personally have a natural tendency to reinvent the wheel at the slightest excuse, but here, there was absolutely no incentive to do this. I felt very confident with what I saw, and I learned a lot.
I also wanted to give you an overview of how CSL was integrated into Papers, and which components of CSL we used.
From the start, we went with the CSL 1.0 specifications. At the time, Zotero was still using 0.8 in the official release, and the style repository was still at version 0.8 and hosted on SourceForge, but it seemed to make more sense to go straight to 1.0. The timing turned out very well, as the 1.0 specs were ready, and the transition was already well on its way.
The processor was written in Objective-C and built on top of the Cocoa frameworks that all modern Mac apps are expected to use. Of course, this is also the same foundation used for the rest of Papers. We debated a bit whether to use the existing JavaScript processor, which could also have been a very good fit. But in this case, we decided to instead reinvent the wheel, and write our own processor. The main concern was performance: using the JavaScript processor would mean running everything through an extra layer, which would affect both speed and memory usage. Mac OS X has a good interpreter for JavaScript, even accessible from Objective-C, but we still made the decision to go with ‘pure’ Objective-C.
It helped a lot with the decision to see (1) a strong and clear specification, and (2) a suite of tests. I spent a lot of quality time on the spec page, obviously. But I also leveraged the existing test suite to drive the development and refine the implementation in Objective-C. I went through two relatively large refactorings as I hit some issues with the design of my implementation, but thanks to the tests, that could be done with confidence. Not all the tests pass, even today, and in fact, I am still using an old version of the tests, but a large majority of them are covered. The bottom line is: I wanted to let you know that these tests have been absolutely invaluable, and are a really great complement to the specs. Some of the edge cases and subtleties are better explained and resolved with clear examples, and those tests provide exactly the tool needed.
If you are curious: I wrote a separate app for running the tests. I think that app should be called a harness, if I understand correctly what that means. Anyway, the job performed by that app/harness is to take the fixtures from the CSL repo, as-is, translate the data structures into Papers2 data structures, compute the actual result as Papers2 thinks it should be, and then compare it to the expected result. The test fails or passes (and my app can also mark it as a regression, so I can immediately see the negative side-effects of a change in the code). This tool, combined with the fixtures you provided, has been so useful to me that it just seemed natural to start creating fixtures as CSL-related bugs were reported by our users. Some of the bugs had not been detected by the CSL fixtures, so it made sense to add them to the tests. The fixtures follow the same format as the ones in the repo. In fact, I have never fixed a bug without creating a corresponding test, which I am very happy about.
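For readers who want to picture the pass/fail/regression logic described above, here is a minimal sketch in Python. It is not the actual Objective-C harness: the `Fixture` type, the `run_suite` function and the toy processor are all hypothetical names chosen for illustration, and the fixtures are assumed to be already parsed into simple Python values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fixture:
    """Hypothetical, already-parsed test fixture (not the real CSL file format)."""
    name: str
    items: List[dict]   # input bibliographic data
    expected: str       # expected formatted output


def run_suite(
    fixtures: List[Fixture],
    process: Callable[[List[dict]], str],
    previous: Dict[str, str],
) -> Dict[str, str]:
    """Run every fixture and classify it as 'pass', 'fail' or 'regression'.

    A regression is a fixture that fails now but was recorded as 'pass'
    in the previous run (the `previous` mapping of name -> status).
    """
    statuses: Dict[str, str] = {}
    for fixture in fixtures:
        try:
            actual = process(fixture.items)
        except Exception:
            # A crash in the processor counts as a failed comparison.
            actual = None
        if actual == fixture.expected:
            statuses[fixture.name] = "pass"
        elif previous.get(fixture.name) == "pass":
            statuses[fixture.name] = "regression"
        else:
            statuses[fixture.name] = "fail"
    return statuses


def toy_process(items: List[dict]) -> str:
    """Stand-in for a real CSL processor: joins titles with '; '."""
    return "; ".join(item["title"] for item in items)


if __name__ == "__main__":
    fixtures = [
        Fixture("title_single", [{"title": "A"}], "A"),
        Fixture("title_pair", [{"title": "A"}, {"title": "B"}], "A, B"),
    ]
    last_run = {"title_single": "pass", "title_pair": "pass"}
    for name, status in run_suite(fixtures, toy_process, last_run).items():
        print(f"{name}: {status}")
```

Saving the returned mapping after each run and passing it back in as `previous` the next time is what lets a harness of this kind flag a newly broken test separately from a test that has never passed.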
Note that in Papers 2.1, we introduced some Papers2-specific CSL variables to deal with some user requests. Inspired by CSS and vendor-specific extensions, I prefixed those variables with ‘papers2_’, and thus named them papers2_pmid, papers2_pmcid and papers2_notes. They are quite self-explanatory. Please let me know if you have any questions or comments on that. We certainly don’t want to ‘fork’ CSL or go crazy with new variables or other modifications of CSL. On the contrary, we think a strong and consistent CSL implementation across multiple platforms and software packages is also in Mekentosj’s best interest.
At this stage, we thus have a good Objective-C processor, with still a few holes, as well as a nice harness app to keep fixing bugs without breaking existing behavior, and with the ability to leverage present and future CSL fixtures. In the back of our minds, we have the idea of open-sourcing all that stuff maybe one day, but at the moment, it still has too many dependencies shared with the rest of Papers2. Also, it’s not clear we can do a good job at managing an open-source project in an acceptable way for others (our poor record in communicating with the CSL devs so far is not a good indication!). Also, I am not even sure that it would be very useful to many people other than us. Let us know what you think.
I have described the processor; now I want to describe a bit how we integrate the CSL styles themselves.
When we launched Papers 2.0, we shipped CSL 1.0 styles. At the time, I used the Zotero svn repository to pull the 0.8 styles, then used the XSLT file to translate the styles to 1.0. For a while, this is how the styles were updated as well with new versions of Papers 2.0.x. For Papers 2.1.10 (soon in beta), I have updated the script to directly use the GitHub repository. Back then and still now, we apply a few more filters, to remove styles that have a non-commercial license (that’s only a small number of them), to adjust a few titles we thought were not correct or did not like (e.g. ‘Nature Journal’ becomes simply ‘Nature’), and finally generate a list of acknowledgments. If you open Papers and check the Acknowledgments in the About Papers… window, you’ll see it lists every single author and contributor of every single CSL style we include in the Papers2 app package (that makes a big list!!). All the styles are distributed as part of the Papers2.app package. We do not have live updates that check the content of the GitHub repository, though we might do it at some point. We simply run our CSL update script for new Papers2 releases. Users can also easily install styles or overwrite the built-in styles with their own version (http://support.mekentosj.com/kb/pro-tips/pro-tip-adding-additional-citation-styles).
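The three filters mentioned above (dropping non-commercial styles, adjusting titles, collecting acknowledgments) could be sketched roughly as follows in Python. This is only an illustration, not the actual update script: the function names are hypothetical, and detecting a non-commercial license by looking for "NonCommercial" or "by-nc" in the style's `rights` text is an assumption made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

# CSL 1.0 XML namespace.
CSL_NS = {"cs": "http://purl.org/net/xbiblio/csl"}

# Title adjustments; the single entry is the example given in the email.
TITLE_OVERRIDES = {"Nature Journal": "Nature"}


def is_non_commercial(rights: str) -> bool:
    """Assumed heuristic: look for a non-commercial marker in the rights text."""
    text = rights.lower()
    compact = text.replace("-", "").replace(" ", "")
    return "noncommercial" in compact or "by-nc" in text


def build_bundle(style_texts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (titles of kept styles, sorted unique author/contributor names)."""
    titles: List[str] = []
    names = set()
    for xml_text in style_texts:
        info = ET.fromstring(xml_text).find("cs:info", CSL_NS)
        if info is None:
            continue  # not a usable style file
        rights = info.findtext("cs:rights", default="", namespaces=CSL_NS)
        if is_non_commercial(rights):
            continue  # filter 1: skip styles with a non-commercial license
        title = info.findtext("cs:title", default="", namespaces=CSL_NS).strip()
        titles.append(TITLE_OVERRIDES.get(title, title))  # filter 2: fix titles
        # filter 3: collect every author and contributor for the acknowledgments
        people = info.findall("cs:author/cs:name", CSL_NS)
        people += info.findall("cs:contributor/cs:name", CSL_NS)
        names.update(p.text.strip() for p in people if p.text)
    return titles, sorted(names)
```

Calling `build_bundle` on the text of each `.csl` file gives the list of style titles to ship and the names to list in an acknowledgments window.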
I also have a handful of styles contributed by some users, that I’d like to push to the main CSL repository. I’d be happy to have direct commit access, but then I also don’t know what the process is to get commit access. Maybe getting commit access would be counterproductive for you, so I’m just as happy to submit them through the normal submission process (I have read the instructions, etc… so no worry, I don’t need them, and I know it’s not the right place to ask questions about this). Whatever minimizes the amount of work on your end. Same for the fixtures.
Please let us know if you have any concerns, questions, or comments about the above. I’ll do my best to answer your emails on this list (note I am on vacation Dec 13-24, though… including today actually ;-).
I will also be sending 2 other emails to this list soon. They’re shorter emails, and on topics different enough, so I did not want to mix up everything on one thread.
Thanks again for the wonderful contributions of all the CSL developers,
Charles–
Charles Parnot
@Charles_Parnot
twitter: @cparnot