Citation Style Language

CSL Funds & Projects

I’ve created a GH issue with a mock-up for the citation rendering part of the request. I agree with Rintze that easy access to the diffs would be great, but I don’t currently have a good idea of how that would even look, so I’ll leave that to him to add either in the same mock-up or in a separate ticket.

Edit: One more idea in this issue: https://github.com/citation-style-language/Sheldon/issues/14

There are auto-generated diffs at https://aglc4.cormacrelf.net/csl that basically show a green/red split when there are differences in a normalised HTML string. There’s a lot of code in jest-csl to build on, similar to Frank’s test runner, and there are also React components for laying out test results (no real difference from diffing) that could be made into a static site with one page per file (with Gatsby.js) that the CI bot embeds in a comment or links to.
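The green/red trigger described above can be approximated by normalising the rendered HTML before comparing; a minimal sketch (the normalisation rules here are guesses, not what jest-csl actually does):

```javascript
// Compare two rendered citations after normalising the HTML.
// The specific normalisation steps are assumptions, not jest-csl's real rules.
function normalise(html) {
  return html
    .replace(/\s+/g, ' ')            // collapse runs of whitespace
    .replace(/<\/?span[^>]*>/g, '')  // drop wrapper spans
    .trim();
}

// Green when true, red when false.
function sameOutput(expected, actual) {
  return normalise(expected) === normalise(actual);
}
```

Anything more ambitious (word-level diff colouring, say) would sit on top of a comparison like this.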

@retorquere – no particular rush, but I just wanted to see if the tickets & mock-ups make sense and if you think something along these lines might be doable?

Sorry – I have been slammed the past week, but that should clear up around next Monday.

The mockup looks OK, but that seems like a fairly simple thing to do? If I’m reading it right, it’s just to add a rendered citation/bibliography when the test passes?

… and ideally better (customizable) error reports, right on github, when it fails. Might well be simple – all the better. It’ll save us a ton of time and, more importantly, reduce poor quality styles.

Just so I have a clear picture – this would then fall within Sheldon’s scope. Would it be OK to introduce node there? That would be the easiest way to make sure it actually goes through citeproc-js.

Absolutely, yes. Keep an eye towards ease of maintenance, but we’re agnostic in terms of tooling.

Wait – Sheldon is just a separate bot that picks up on Travis results, correct? That’s trickier, because it doesn’t have easy access to the PR context to generate the diffs. It seems to me it would be a lot easier to add this to the actual test runner – the test runner could either actively push out comments to the GH issue (that’s how the BBT builds do it), or actively ping the bot with build assets, which then takes care of announcing to the GH issue.

I haven’t worked with Travis bots before – I’d have to dig into that first.

We’re completely agnostic about methods. We don’t want people to have to click through to Travis etc., but this absolutely doesn’t have to be a Travis bot/app. Sheldon is some years old; it’s possible the testing framework simply wasn’t sufficiently advanced at the time, or that Sylvester was simply more comfortable doing this with a bot. Either way, in spite of it having a name, we’re not terribly attached to Sheldon per se, as long as we get similar output (and can customize the message text with reasonable effort).

WRT backing up the repos: pushing a copy of the repo to Backblaze in a nightly Travis job (or anything else where a clone can be fetched and copied to B2) would run about $5 for a rolling full year of daily full snapshots, if my calculations are correct. locales + styles is currently 156 MB; call it 200 MB to factor in growth. 200 MB × 365 days makes some 73 GB for one year, which at $0.005/GB/month would be $4.38/year.
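The arithmetic in the post, spelled out (figures and the B2 storage price are as quoted there; this mirrors the post’s calculation of a full 73 GB held for twelve months):

```javascript
// Rolling year of daily full snapshots on B2, using the numbers above.
const snapshotGB = 0.2;          // 156 MB today, padded to 200 MB for growth
const days = 365;
const pricePerGBMonth = 0.005;   // B2 storage price quoted in the post

const storedGB = snapshotGB * days;                 // ≈ 73 GB after a full year
const yearlyCost = storedGB * pricePerGBMonth * 12; // ≈ $4.38/year
console.log(storedGB.toFixed(0) + ' GB, $' + yearlyCost.toFixed(2) + '/year');
```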

As an update, @retorquere has done an amazing job updating Sheldon so that we now get previews of changes in PRs. He’s put a lot of work into this and still offered to do it at the low end of our suggested rate above, i.e. for US$1,000.

This is already making our work reviewing style PRs easier, so Rintze and I would be more than happy to pay this out. We’ll wait a week for any concerns raised here and then, absent objections, pay Emiliano.

We’re still looking for someone who wants to take on the csl-editor update. Please post here and/or in a separate thread.


These previews are pretty amazing!

I’m happy they help.

In what sense are you guys looking to modernize the CSL editor? Just an update to ES6? Of the demo site or of the cslEditorLib? I could do that but that seems like a marginal win. Do you mean streamlining the deployment of the demo site (stuff like webpacking for example)?
Or do you mean something like React-ifying the editor?

@retorquere

  1. Seeing no objections to releasing the pay – are you able to send me an invoice for US$1,000? It can be informal and by email.
  2. For the web editor, we’re honestly not 100% sure. Here are some general thoughts:
    a) cslEditorLib relies on a bunch of dependencies, some of them outdated. It’d be nice to make sure this is all in shape and, more generally, that the JS used is up to date (including ES6)
    b) the whole deploy process is tedious: in cslEditor, update submodules, update citeproc separately, then regenerate the example citations, then go over to the demo site, update the processor, then deploy. There has got to be a way to make that simpler or even automate it
    c) The generation of sample citations is ugly – it writes one single massive (and ever-growing) JSON file using absurd amounts of memory (IIRC that’s a common problem when writing JSON in JavaScript, but there are solutions). Speeding that up and making it more robust (i.e. less memory intensive) would be great.
    d) While a–c would just be updates, the real vision would be to fundamentally change the basic functionality: instead of requiring specific citations, we would use something like the citation parser behind anystyle.io and allow people to just paste any citation (e.g. the samples from the author guidelines). Sylvester seemed to think that was totally doable, and if it is, that’d be a total game changer.
  1. I’ve never sent an invoice in my life. If you can give me the text that would work for you, I’ll gladly bounce it back to you.

2.a AFAICT, all dependencies are git submodules, not npm packages, so the best that can be achieved there is a pull from their HEAD. CodeMirror is available as an npm package, as is jstree, but jstree, for example, hasn’t been updated in 7 years, so I’m not sure package versioning would help there. I’m not familiar with the module system used in the editor – I see a “define”, for example, that looks like it brings in libraries, but I’m not sure where it comes from. Maybe we should bring in Steve Ridout as the lead on this stuff, and I could just assist him where it’s helpful.

2.b should be doable using Travis

2.c why does it write one massive (and growing) JSON file? generateExampleCitations.js doesn’t currently run for me so it’s not easy for me to establish what it does exactly.

2.d I don’t yet understand what the specific citations (from 2.c?) do, and how they are used in the CSL editor. Also, anystyle.io is written in Ruby, so I don’t yet grok how it would be incorporated into the JS-based CSL editor – this seems best left to Sylvester, if you ask me.

From what I’ve seen, anystyle.io only determines which parts of the citation correspond to which properties, not the style. If that’s going to be used, it’d still have to generate citations in multiple styles to match the input citation against the styles. So the big JSON file would probably not be needed (unless we want to do both), and we would have to decide whether to host an API for citation generation or have it in the browser. The generation is probably pretty costly, so I’m not sure about an approach there.
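For the matching itself, even a crude token-overlap score might be enough to rank styles against a pasted citation; a toy sketch (a stand-in for whatever fuzzy matching the editor really uses, not its actual algorithm):

```javascript
// Toy token-overlap similarity for ranking rendered style samples
// against a pasted citation. All names here are hypothetical.
function tokens(s) {
  return new Set(s.toLowerCase().match(/[a-z0-9]+/g) || []);
}

function similarity(a, b) {
  const ta = tokens(a), tb = tokens(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / Math.max(ta.size, tb.size); // Jaccard-ish score in 0..1
}

function bestStyle(input, samplesByStyle) {
  let best = null, score = -1;
  for (const [style, sample] of Object.entries(samplesByStyle)) {
    const s = similarity(input, sample);
    if (s > score) { score = s; best = style; }
  }
  return { best, score };
}
```

A real implementation would weight punctuation and field order, since those are exactly what distinguish styles, but the ranking structure would look like this.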

2a) It’d be great to get Steve on this, but from our last communications (a couple of years back) I’m not optimistic. All dependencies (except for citeproc-js) are git submodules, yes.
2c) The JSON contains the sample citations in all ~2k styles – it’s growing because we add more styles and it’s enormous for obvious reasons
2d) So the editor’s search-by-example function matches the entered example against the examples (in the JSON from 2c). The idea would be to allow any example and parse that. As Lars says, that does require on-the-fly generation of the styles. The preference would be to run this in the browser so we don’t have to deal with servers, user data, and security. I’m not sure if this is possible, but as I said, that’d be a total game changer. If this is possible but requires more work than our small budget allows, I’d try to find additional money for this one.

FWIW – I had thought of this as a natural project for Lars, but happy to have you collaborate, have one take the lead and have the other advise, whatever works. Just please agree on any monetary components beforehand so that there’s no unnecessary conflict.

For pricing, I’d say we’d be happy to offer $2,000 for modernization along the lines of a–c (within the realm of what’s possible) and another $2,000 if d) is possible.

Let me know what you think.

Back-of-the-envelope numbers based on the trials summarized in Performance testing? suggest about three to five minutes to generate a full set of samples from arbitrary input. There is some overhead to instantiate the processor for each style. In the test rig:

Cell 83ms
Chicago Fullnote 133ms
APA 257ms

Taking 90ms as an optimistic average build time, and 15ms as the rendering time for three additional cites (citeproc-ruby clocks in at about 20ms; this is again on the optimistic side for citeproc-js), yields 4.4 minutes as a low-end estimate:

bash> echo 4k 1964 d 0.090* r 0.015 3**+ 60/p|dc
4.4190
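The same estimate in plain JavaScript, for anyone who doesn’t read dc:

```javascript
// Low-end estimate for regenerating samples in all styles, per the post.
const styles = 1964;          // style count used in the dc one-liner
const buildSec = 0.090;       // optimistic per-style processor build time
const renderSec = 0.015 * 3;  // 15ms each for three additional cites

const minutes = styles * (buildSec + renderSec) / 60;
console.log(minutes.toFixed(4)); // 4.4190
```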


I don’t need to be the lead on this. I figured that since there was no uptake yet, I could at least get some small things off the ground just to kick off, but it’s slow going for me.

It seems to me that in the current implementation, the samples only need to be updated when there’s a change in the styles repo. Even if generation takes long, that shouldn’t matter; Travis could take care of it. But even for this constrained case, I’m still not grokking the basics: citeproc-js is available as a node module, so I’m still trying to figure out why pregeneration uses requirejs, or why there’s a JSDOM and jQuery dependency (it runs in Node, yes? Not the browser?). And this is just a minor and relatively isolated part of the stack.

All that goes to show that I’m currently unqualified to take the lead here. I’m not even sure I want to be paid for the minor work I could do on this as things stand. If Lars wants to take point on this, I’m A-OK with that. If Lars gets the full funds and I can help here and there, I’m also A-OK with that. I declare here and now, definitively, that if Lars gets involved, any budding monetary dispute is pre-solved: even if we both do half the work (which seems unlikely), I’m OK with Lars getting the full sum. If Lars joins, I’ll be happy to reiterate this, or, if that makes people uncomfortable, I will take a small, pre-determined amount.

WRT anystyle, it may be possible to bodge something together along the following lines:

  1. run a citation through anystyle (e.g. Putnam, Hilary [1985] A comparison of something with something else, New Literary History, vol. 17, pp. 61–79.)
  2. We get back [{"author":[{"family":"Putnam","given":"Hilary"}],"title":"A comparison of something with something else","container-title":"New Literary History","volume":"17","page":"61–79","language":"en","issued":{"date-parts":[[1985]]},"id":"putnam1985a","type":"article-journal"}]
  3. We take a pre-generated sample citation of type article-journal and, in its formattedBibliography, replace the author, title, year, etc. by a naive text match (this is the bodge part)
  4. Run the search as usual.
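Step 3, the bodge itself, could be as dumb as string replacement against the fields the pre-generated sample is known to contain. A sketch; the sample text and the `sampleFields` bookkeeping are hypothetical, though the parsed object follows CSL-JSON as in step 2:

```javascript
// Naively substitute parsed CSL-JSON fields into a pre-rendered sample.
// `sampleFields` records what text the sample used for each field, so we
// can swap it out by plain string replacement (the bodge part).
function bodgeSample(sample, sampleFields, parsed) {
  const year = String(parsed.issued['date-parts'][0][0]);
  return sample
    .replace(sampleFields.author, `${parsed.author[0].family}, ${parsed.author[0].given}`)
    .replace(sampleFields.title, parsed.title)
    .replace(sampleFields.year, year);
}
```

The result is then fed to the existing search as if the user had typed it.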

But that does depend on the existing search being fuzzy, not crisp, because the odds of getting crisp input this way are minimal.

In principle it should be possible to extend anystyle to do style detection, but I’m not sure where we’d get a sufficient corpus to train it on.