CSL Funds & Projects

Sebastian_Karcher · April 20, 2019, 2:26am

CSL occasionally receives donations and we currently have some funds left from a) a Mendeley/Elsevier donation a while back and b) a more recent donation from ProQuest, which is using CSL styles in RefWorks (and other products). Just to remind everyone, I’m currently receiving these as an independent contractor for tax purposes and then paying them out as communally decided.

To facilitate accounting, I have started putting all CSL funds in a separate high-yield savings account, which has the added benefit of earning some interest while we can’t decide. I’ve created a simple spreadsheet that shows the current budget (obviously this isn’t automated; I need to update it).

We have in the past paid some reward to those of us who actively maintain the style repository or significantly contribute to new/improved styles. I think it makes sense to continue to do this, but I would personally like to use a good batch of the current funds to improve CSL infrastructure. In particular:

1. Improve the CSL Editor: definitely things like updating/modernizing the code and making updates easier. If possible, it’d be really cool to integrate the editor with a project like anystyle.io so it wouldn’t require specifically formatted citations – you could just paste any sample. We think this is doable, but a larger project. The most qualified person to do this – Sylvester, who has written both anystyle.io and the citeproc-ruby parser – is unfortunately not available for this.
2. Improve the CI for style reviews. I think Rintze’s and my biggest wish would be for the CI to generate a small set of sample citations for submitted PRs to help with a sanity check. I’m not a CI expert, but given the existing infrastructure, this would seem to be eminently feasible for someone who is.

My questions at this point:

Do those two projects sound good for CSL funding?
What’s a reasonable budget for each of them?
Is anyone interested in working on either of them or knows someone who is (we have traditionally favored working with people who are already interested in CSL in some way or the other; that ensures familiarity with the project and also a principal motivation to help the project along rather than to check the boxes for a dev contract)?
Are there other, similar specific dev investments that we should be thinking about?

As a note on scope, we understand CSL funding to comprise roughly anything that is or should be on the citation-style-language github. So we wouldn’t want to fund anything particular to a specific citeproc, but adding something that’s useful for all/most citeproc developers is definitely in scope.

@Rintze_Zelle will almost certainly have additional thoughts on this, but input is welcome from everyone.

Rintze_Zelle · April 21, 2019, 4:08am

The ability to easily generate diffs of style output would also be extremely valuable. E.g. for the following scenarios (some of which would be outside of pull request CI):

Pull requests where an existing style is modified (diff between output of existing repo style and PR version)
Pull requests where a new style is submitted (diff between output of existing repo template style and PR style)
Pull requests where we want to add some commits (diff between PR version and our edited pre-commit version)
Comparing output of any two styles (e.g. between an existing repo style and edited version, to allow people to check their edits before creating a pull request)
Comparing output of different citeproc-js versions (e.g. to check for regressions or changes in behavior)
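A minimal sketch of the diff idea behind these scenarios, assuming the rendered citations for each style are already available as plain strings (producing them with a CSL processor is out of scope here). The function name `diff_rendered`, the labels, and the sample citations are hypothetical; only Python's standard `difflib` module is used.

```python
import difflib


def diff_rendered(old_lines, new_lines, old_label="repo style", new_label="PR style"):
    """Return a unified diff (list of strings) between two lists of rendered citations."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # keep output free of trailing newlines
        )
    )


# Hypothetical rendered output for the same items under two style versions.
old_output = [
    "Doe, J. (2019). A title. Journal, 1(2), 3-4.",
    "(Doe, 2019)",
]
new_output = [
    "Doe, J. 2019. A title. Journal 1 (2): 3-4.",
    "(Doe, 2019)",
]

for line in diff_rendered(old_output, new_output):
    print(line)
```

An empty result means the two versions render identically for the sample items, which would be the quick sanity check for a pull request that is not supposed to change output.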

Yes :). Maybe $2000-$4000 for (1), and $1000-$2000 for (2), depending on scope? Modernizing the style editor seems more urgent and is probably significantly more work.

retorquere · April 22, 2019, 12:21pm

on the CI part: Is there already infrastructure to run tests similar to this?

Otherwise, this seems doable. Samples of what’s desired and a pointer to what’s already available in terms of testing infra would be really helpful to get an idea of the scope of the project.

Rintze_Zelle · April 22, 2019, 8:35pm

As a separate possible line of expense, it might be a good idea to start subscribing to a proper backup service for CSL’s main repositories (“styles”, “locales”, “documentation”, etc.). GitHub endorses BackHub, which costs $144/year for the cheapest tier (10 repos). Or we could hack something cheaper together ourselves. Seems like a good thing to have in case an account gets hacked or we dramatically mess up a repo ourselves.

https://help.github.com/en/articles/backing-up-a-repository
https://backhub.co/pricing/

I’m particularly curious to hear from the large downstream projects (Zotero, Mendeley, etc.) whether they think this would be valuable to them.

(we separately might also want to use Zenodo to e.g. archive CSL schema releases for long-term archiving and getting citable DOIs; https://guides.github.com/activities/citable-code/)

Dan_Stillman · April 22, 2019, 9:12pm

Not that I want to be the one to have argued against backups if something bad happens, but I don’t see a lot of value in this. The whole point of Git is that a copy of the entire commit history exists on everyone’s computer (and would still be available in the reflog for a while even after a catastrophic force-push). And while that doesn’t include things like issues and pull requests, restoring those from a backup would be pretty ugly. So while periodically running one of the open-source scripts that dumps everything available via the API seems like a decent idea, I don’t see this as something that’s particularly worth paying for.
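As a rough sketch of what such an API dump could look like (not one of the existing open-source scripts), the following pages through the GitHub REST endpoint for repository issues, which also returns pull requests. The helper names `fetch_json` and `dump_issues` are hypothetical, the requests are unauthenticated and therefore tightly rate-limited, and issue comments would need a separate pass.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_json(url):
    """Fetch one page from the GitHub REST API (unauthenticated)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def dump_issues(owner, repo, fetch=fetch_json, per_page=100):
    """Collect all issues and pull requests of a repo, page by page.

    `fetch` is injectable so the pagination logic can be tested offline.
    """
    items = []
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}"
        )
        batch = fetch(url)
        if not batch:  # an empty page means everything has been read
            return items
        items.extend(batch)
        page += 1
```

Calling `dump_issues("citation-style-language", "styles")` and writing the result out with `json.dump` would give a plain snapshot; for a repository with thousands of issues an access token would be needed to stay within rate limits.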

I’d say the most important thing would be making sure that people with write access have 2FA enabled and limited API key settings.

Frank_Bennett · April 23, 2019, 1:24am

On style tests and comparisons, I’m working on some changes to Citeproc Test Runner that may be useful. An outline of the plan is in the dev section of the Jurism docs. Basically, the plan is to set up style-specific collections in a public Zotero group, and dump tests from there into eponymous subdirectories of a GitHub project that houses style tests. The dumped fixtures will then either be approved as valid using the Test Runner, or manually edited in the RESULT field to reflect desired output.

An additional part of the plan, not reflected in the Jurism docs note, is to output each form of the item: initial reference; ibid; subsequent; and !near-note, each with and without locator, plus the bibliography entry if applicable. That should give a pretty good picture of what a style does; and by running tests for one style against another, you can get an at-a-glance view of the differences.
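To make the size of that test matrix concrete, here is a small sketch that enumerates the forms described above. The position labels and the function name `citation_forms` are illustrative assumptions, not identifiers from the Test Runner.

```python
from itertools import product

# Labels are illustrative; "not-near-note" stands for the "!near-note" case.
POSITIONS = ["initial", "ibid", "subsequent", "not-near-note"]
LOCATOR_OPTIONS = ["without locator", "with locator"]


def citation_forms(include_bibliography=True):
    """List every citation form to render for one item."""
    forms = [f"{pos}, {loc}" for pos, loc in product(POSITIONS, LOCATOR_OPTIONS)]
    if include_bibliography:
        forms.append("bibliography entry")
    return forms


for form in citation_forms():
    print(form)
```

That is eight citation forms per item plus the bibliography entry where the style has one, so a handful of items per style already yields a fairly detailed fingerprint.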

I need this for Jurism development anyway, so no funding is needed to push it along, but someone might be able to build something interesting on top of it for use in CI or so.

Rintze_Zelle · April 23, 2019, 3:04am

Okay, just checking. (for a free solution, we could also e.g. introduce quarterly releases of the “styles” and “locales” repos and have those automatically deposited into Zenodo)

Rintze_Zelle · April 23, 2019, 3:29am

We currently use Travis CI on various repos, with e.g. RSpec tests for the citation-style-language/styles and citation-style-language/locales repos for quality control, and webhooks to alert Sheldon (citation-style-language/Sheldon, the pull request bot for the CSL styles repository), distribution-updater (citation-style-language/distribution-updater, a WSGI app that updates the CSL styles distribution repo based on a Travis CI webhook), and Zotero of build results.

Sheldon posts GitHub comments to pull requests in our “styles” and “locales” repos to assist contributors. See the posts by csl-bot in pull request #4003 of citation-style-language/styles (“Update universiti-kebangsaan-malaysia.csl” by mazleha) for an example. distribution-updater updates citation-style-language/styles-distribution, the repository for distributing validated CSL styles, whenever a Travis build for the “master” branch of citation-style-language/styles completes successfully. (The styles-distribution repo is a bit redundant now that GitHub offers protected branches, but we haven’t bothered making the change yet.)

retorquere · April 23, 2019, 7:56am

That looks pretty comprehensive already. Does csl-bot not already do what’s requested above? I see it producing differences in the GH issue.

Rintze_Zelle · April 23, 2019, 11:07am

Sheldon/csl-bot just links to the Travis CI build reports of failing builds. The tests in these builds currently don’t include any CSL processor-based citation rendering. The differences reported in e.g. https://travis-ci.org/citation-style-language/styles/builds/509502463 are just the result of some string matching within the CSL XML code (see e.g. https://github.com/citation-style-language/styles/blob/5f60c0b0c26b463754661c587c95a0626f60e999/spec/styles_spec.rb#L184).

Sebastian_Karcher · April 26, 2019, 1:00pm

I’ve created a GH issue with a mock-up for the citation rendering part of the request. I agree with Rintze that easy access to the diffs would be great, but don’t currently have a good idea how that would even look, so will leave that to him to add either in the same mock-up or in a separate ticket.

Edit: One more idea in this issue: https://github.com/citation-style-language/Sheldon/issues/14

cormacrelf · April 27, 2019, 2:08am

There are auto-generated diffs at https://aglc4.cormacrelf.net/csl that basically turn split green/red when there are differences in a normalised HTML string. There’s a lot of code in jest-csl to build on, similar to Frank’s test runner, but there are also React components for laying out test results (no real difference from diffing) that could be made into a static site with one page per file (with Gatsby.js) that the ci bot embeds in a comment or links to.

Sebastian_Karcher · May 6, 2019, 1:30am

@retorquere – no particular rush, but just wanted to see if the tickets & mock-ups make sense and if you think something along these lines might be doable?

retorquere · May 6, 2019, 12:05pm

Sorry – I have been slammed the past week, but that should clear up around next Monday.

The mockup looks OK, but that seems like a fairly simple thing to do? If I’m reading it right, it’s just to add a rendered citation/bibliography when the test passes?

Sebastian_Karcher · May 6, 2019, 2:19pm

… and ideally better (customizable) error reports, right on github, when it fails. Might well be simple – all the better. It’ll save us a ton of time and, more importantly, reduce poor quality styles.

retorquere · May 6, 2019, 2:41pm

Just so I have a clear picture – this would then be a modification of Sheldon qua scope. Would it be OK to introduce node there? That would be the easiest way to make sure it actually goes through citeproc-js.

Sebastian_Karcher · May 6, 2019, 3:12pm

Absolutely, yes. Keep an eye towards ease of maintenance, but we’re agnostic in terms of tooling.

retorquere · May 6, 2019, 5:20pm

Wait – Sheldon is just a separate bot that picks up on Travis results, correct? That’s trickier because it doesn’t have easy access to the PR context to generate the diffs. It seems to me it would be a lot easier to add this to the actual test runner – the test runner could either actively push out comments to the GH issue (that’s how the BBT builds do it), or actively ping the bot with build assets, which then takes care of announcing to the GH issue.

I haven’t worked with Travis bots before – I’d have to dig into that first.

Sebastian_Karcher · May 6, 2019, 8:24pm

We’re completely agnostic about methods. We don’t want people to have to click through to Travis etc., but absolutely this doesn’t have to be a Travis bot/app. Sheldon is some years old, it’s possible the testing framework simply wasn’t sufficiently advanced to do this or that Sylvester simply was more comfortable doing this with a bot, but in either case, in spite of having a name, we’re not terribly attached to Sheldon per se as long as we get similar output (and can customize the message text with reasonable effort).

retorquere · May 28, 2019, 10:14pm

WRT backing up the repos, pushing a copy of the repo to Backblaze in a nightly Travis job (or anything else where a clone can be fetched and copied to B2) would run about $5 for a rolling full year of daily full snapshots if my calculations are correct (locales + styles is currently 156MB but let’s call that 200MB to factor in growth, times 365 days makes some 73GB for one year, which at $0.005/GB/month would be $4.38/year).
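The arithmetic in this estimate can be checked with a few lines. This sketch takes the figures from the post (200 MB per snapshot, 365 retained snapshots, $0.005 per GB per month) and assumes decimal gigabytes and a full year of snapshots already in storage; the function name is hypothetical.

```python
def yearly_backup_cost(snapshot_mb, snapshots_kept, usd_per_gb_month):
    """Return (stored GB, USD per year) for a full set of retained snapshots."""
    stored_gb = snapshot_mb * snapshots_kept / 1000  # decimal GB, as in the post
    yearly_usd = stored_gb * usd_per_gb_month * 12
    return stored_gb, yearly_usd


stored_gb, yearly_usd = yearly_backup_cost(200, 365, 0.005)
print(f"{stored_gb:.0f} GB retained, about ${yearly_usd:.2f}/year")
# prints "73 GB retained, about $4.38/year"
```

Since the snapshots accumulate gradually, the first year would come out below this figure; $4.38 is the steady-state cost once 365 snapshots are being kept.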
