Pursuing grants for CSL-related development

As most of you will know, Mendeley and Columbia University were able to
secure funding from the Sloan Foundation for the development of the
online CSL editor.

Jeffrey Lancaster was involved at Columbia, and wrote to me that he’d
be happy to assist us with thinking about, and possibly applying for,
further grants. He might have some valuable insight, so I suggest we
shouldn’t waste the opportunity :). Any thoughts? I know Frank Bennett
tried (but failed) to get a grant for development of Multilingual
Zotero, so maybe it’s time for a retry.

Personally, I would really like to see the process of submitting
styles to the repository become more automated. Sebastian Karcher,
Charles Parnot and I spend a lot of time handling style submissions.
The volume of style submissions has increased quite a bit over the
past year, and a large fraction of the work is just making sure that
the submissions are done correctly. While I have tried to document the
process as clearly as possible for users, we still deal with a lot of
incorrect GitHub pull requests, submissions of invalid CSL styles and
style metadata that hasn’t been entered correctly. My motivation to
continue to perform this labor for free has its limits, so I welcome
any thoughts on how to lessen this burden.
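
A minimal sketch of the kind of automated submission check described above, in Python. The function name `check_style` and the specific rules (non-empty title, id and updated, plus a "self" link) are illustrative assumptions, not the repository's actual requirements; only the CSL namespace URI is taken from the CSL schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by CSL 1.0 styles.
CSL_NS = "http://purl.org/net/xbiblio/csl"


def check_style(xml_text):
    """Return a list of human-readable problems found in a CSL style.

    The rules below are illustrative assumptions, not the repository's
    official submission requirements.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    if root.tag != f"{{{CSL_NS}}}style":
        return ["root element is not <style> in the CSL namespace"]

    info = root.find(f"{{{CSL_NS}}}info")
    if info is None:
        return ["missing <info> element"]

    problems = []
    # Metadata elements that must be present and non-empty.
    for name in ("title", "id", "updated"):
        element = info.find(f"{{{CSL_NS}}}{name}")
        if element is None or not (element.text or "").strip():
            problems.append(f"<info> has no non-empty <{name}>")

    # Assumed rule: every style links to its own location.
    links = info.findall(f"{{{CSL_NS}}}link")
    if not any(link.get("rel") == "self" for link in links):
        problems.append('<info> has no <link rel="self">')
    return problems
```

A check like this could run on every pull request and report its list of problems back to the submitter, so that maintainers only look at submissions that already pass.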

Rintze

I think this is the key, and as you suggest, is not really sustainable.

So the question is how we address:

  1. what the project would be to fix the problem? Is it a full-blown
    repository web app, for example, that could tightly integrate with the
    editor, that had the sort of broader review model I’ve previously
    advocated (e.g. that makes it easy and attractive for non-technical
    users to become style editors and reviewers)?

  2. how do we fund it?

I have recently come to the conclusion that a more attractive style editor, that could be used by “regular” people is not the way to go. Unfortunately, such a style editor will be really hard to do, and I am not even sure it can be done. I feel like the resources put into such an effort would be better spent in creating new styles, fixing existing ones, and cleaning things up. And yes, writing tools that help with this is also a good thing, but we have to strike the right balance between “user-friendly” and “developer-friendly”. The recent Travis stuff has been very very useful, and a great asset for the project. I don’t know if I would have dared to create all those Springer styles without it. And it was really set up very quickly, with relatively small efforts (of course, it looked like it from my end!), aided by somebody who knew the setup and approached things in a very pragmatic way.

A grant to write a great CSL editor might be more sexy than paying somebody to just go through styles, but the latter would be more efficient for the project IMO.

If you think of the CSL styles as code, then the distinction between a user and a developer is clear: the user is writing a paper and wants their f**ing bibliography to be done (but is OK reporting a problem) and the developer is the person contributing to the code (the XML!). Now, if we distinguish between the CSL “user” and the CSL “developer”, there are still things that could be done better for both categories.

For the users:

  • a better style browser, including a way to find a style that matches what they want (and yes, the current csl-editor is a good start for that)
  • a better reporting tool for style issues, where such a report should have clear fields for the expected output, the actual output, and the values of the different fields (ideally, with citeproc-js showing the output, so a user can reproduce the ‘bug’)

For the developers:

  • a better style browser (the same as the one for the users!!)
  • a more strict process for submitting styles (what we discussed about pull requests)
  • a better development environment, and the csl-editor actually has some very interesting components there; but again, we are talking about an editor for technical people, and that’s fine, let’s focus on that

Charles

Yup, agree with you Charles.

The only thing I would add to your list is a way for your "developers"
to easily see the impact of a style change. So a preview of a style
diff, if you will.

Bruce

I think a style editor and someone in charge of maintaining styles serve
two different purposes. The style editor is very useful for people to
identify the styles they want and it’s great to allow them to make small
alterations/customizations. Since we don’t accept just any style into the
repository, I think that’s quite valuable and I’m very glad we have the
editor around.

But I completely agree with Charles that the style editor won’t solve the
quality problem. Two reasons: 1. People without experience are less attuned
to details in styles - even the librarians I trained for Zotero often
missed small details. 2. As soon as you make larger changes, inexperienced
users write bad CSL, be it with the editor or without. (In particular they
use affixes instead of groups way too much).

I think fixing existing styles may take up less work than Charles suggests -
I think 10 a day isn’t unrealistic. This is probably too low-tech for a
google-summer project, is it? Sylvester, any thoughts?

The only thing I would add to your list is a way for your “developers” to
easily see the impact of a style change.

Agreed - didn’t Frank have something like that? Frank - is that still
around? Could you explain how it’s working?

Sebastian

For the users:

  • a better style browser, including a way to find a style that matches what they want (and yes, the current csl-editor is a good start for that)
  • a better reporting tool for style issues, where such a report should have clear fields for the expected output, the actual output, and the values of the different fields (ideally, with citeproc-js showing the output, so a user can reproduce the ‘bug’)

Two years ago, I built a little plugin for Zotero for this purpose:

https://bitbucket.org/fbennett/csl-feedback-zotero

I’m pretty sure it will no longer run against current code, but it
added a “Report style error” button to the document dialogs (edit
citation, edit bibliography). The user would fix up the citation
(using the wysiwyg editor already offered in the dialogs) and click a
submit button. A complete test was added to a “CSL Submission” group,
tagged for the style.

The idea didn’t catch on at the time (the execution was hackish), but
maybe something could be done with the idea of a “reporting API” for
CSL projects. If you had infrastructure for style-level testing,
quality evaluation could be automated to some degree, and that might
(just thinking out loud) open a path to sustainability through
“sponsored styles” or somesuch.

Frank

Yup, agree with you Charles.

The only thing I would add to your list is a way for your “developers”
to easily see the impact of a style change. So a preview of a style
diff, if you will.

Ah, yes, good one!

Style-level tests really make a difference. I have a batch of tests
that I run against the MLZ legal styles, and I’ve found that they
relieve a lot of worry about side-effects when changes are made. The
volume of data grows pretty quickly though (there are 10,000+ fixtures
for just six styles), so working out good developer-side tools for
managing the suite would be important, if you choose to go that route.

I think a style editor and someone in charge of maintaining styles serve two different purposes. The style editor is very useful for people to identify the styles they want and it’s great to allow them to make small alterations/customizations. Since we don’t accept just any style into the repository, I think that’s quite valuable and I’m very glad we have the editor around.

But I completely agree with Charles that the style editor won’t solve the quality problem. Two reasons: 1. People without experience are less attuned to details in styles - even the librarians I trained for Zotero often missed small details. 2. As soon as you make larger changes, inexperienced users write bad CSL, be it with the editor or without. (In particular they use affixes instead of groups way too much).

I fully understand Rintze’s hesitation where putting more responsibility into the hands of contributors is concerned; after all, these are individuals who have already made an effort to help: they created a pull request – now, obviously, we don’t want to scare them away by adding too many obstacles; at the same time, however, I believe the style repository has (as Rintze illustrates) become such an asset by now, that you can take the liberty to be a little stricter. The salient point is that the rules must be well documented and that we do help and support contributors as best we can – but I think all contributors are aware of the quality they’re getting from the style repository and will understand the little extra effort involved in becoming contributors themselves.

So, as I’ve said before, I’ll gladly add more rules to the tests for them to be enforced automatically. Ultimately, we could combine this with style-level tests – Travis CI supports Node and Haskell, so we could process test data with the open-source citation processors.

All of this, though, does not replace the work of repository maintainers like Rintze or Sebastian; these are just ways to make their work more efficient and effective.

I think fixing existing styles may take up less work than Charles suggests - I think 10 a day isn’t unrealistic. This is probably too low-tech for a google-summer project, is it? Sylvester, any thoughts?

I’ve been thinking about that, too. You’re right that fixing styles is probably not sufficiently challenging, but we’ve already mentioned lots of ideas on this list which would make for good summer of code projects. It’s too late to work on an application now, but perhaps this is something to keep in mind for next year?

Sylvester

Style-level tests really make a difference. I have a batch of tests
that I run against the MLZ legal styles, and I’ve found that they
relieve a lot of worry about side-effects when changes are made. The
volume of data grows pretty quickly though (there are 10,000+ fixtures
for just six styles), so working out good developer-side tools for
managing the suite would be important, if you choose to go that route.

I like that. And indeed, being able to see the effect of a change in the form of a diff in the output would be very useful not just for your work, but in general for any style being edited.

The fixtures in your case are a bunch of publication entries, with the expected output for the different styles, correct?
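
A minimal sketch of what fixture-based style tests with an output diff could look like, in Python. Everything here is hypothetical: the fixture layout, the function names, and the two toy renderers `style_v1` and `style_v2`, which merely stand in for running a real CSL processor (such as citeproc-js) with the old and new versions of a style.

```python
import difflib


def render_all(render, fixtures):
    """Render every fixture with `render`, one output line per item."""
    return [f"{item['id']}: {render(item)}" for item in fixtures]


def preview_change(render_old, render_new, fixtures):
    """Return a unified diff of the rendered output before and after a change."""
    before = render_all(render_old, fixtures)
    after = render_all(render_new, fixtures)
    return list(difflib.unified_diff(
        before, after, fromfile="before", tofile="after", lineterm=""))


def failing_fixtures(render, fixtures):
    """Return the ids of fixtures whose output differs from 'expected'."""
    return [item["id"] for item in fixtures if render(item) != item["expected"]]


# Toy stand-ins for "the style before the edit" and "the style after the edit".
def style_v1(item):
    return f"{item['author']} ({item['year']}) {item['title']}."


def style_v2(item):
    return f"{item['author']}, {item['year']}. {item['title']}."


# Hypothetical fixtures: item data plus the expected formatted output.
FIXTURES = [
    {"id": "book-1", "author": "Doe J", "year": 2012,
     "title": "A Book", "expected": "Doe J (2012) A Book."},
    {"id": "article-1", "author": "Roe R", "year": 2013,
     "title": "An Article", "expected": "Roe R (2013) An Article."},
]

if __name__ == "__main__":
    print("failing after change:", failing_fixtures(style_v2, FIXTURES))
    print("\n".join(preview_change(style_v1, style_v2, FIXTURES)))
```

The diff shows a reviewer exactly which formatted entries change, while the list of failing fixtures flags unintended side-effects against the recorded expected output.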

Charles

For an institutional home, maybe (maybe) an active university consortium.

http://www.universitas21.com/about

or
http://www.sakaiproject.org/about-sakai

Having spent a bit too much time with the latter over the past couple
of years, I’d have to say no to that; its culture and governance
(they just merged with Jasig) seem a poor fit.

Bruce

Sorry to hear that. U21 is an even less likely candidate, I’m afraid.
Something will turn up.

It’s tricky. I developed a spec for a fairly serious institution a year or so ago, but unfortunately the project got lost in the bureaucracy. I have heard of some opportunities coming to light via, for example, this Australian organisation http://www.intersect.com.au, but I don’t think that our local university systems are sufficiently, thus identifying funding opportunities for generic tools is tricky. Hopefully this will improve over the next five years or so.

JISC? (http://www.jisc.ac.uk/fundingopportunities.aspx)

They seem to fund some projects related to CSL, although I don’t know
if it’s a problem that none of us are UK-based.

Rintze

How about asking all of our Twitter followers combined? Some may have ideas of organizations/institutions that would fund this type of stuff.

How to phrase that?

Charles