Citum: a reimagining of CSL

UPDATE

I’ve decided to distance this project from CSL, so the new name, and org, is citum. DNS is currently propagating, but this will be the new apex url:

I believe I’ve solved, or at least have a plan to resolve, all of the issues and pain points we identified in this virtual meeting we had in 2022.

The project is almost feature complete, which is to say it supports the features in CSL 1.0 plus many more: multilingual support, sectional bibliographies, advanced EDTF dates, etc.


Earlier Context

A few years ago, I started experimenting with a new approach to evolving CSL in the bdarcus/csln repository on GitHub ("Reimagining CSL").

But I’m busy, and am an amateur programmer with aspirations that (far) exceed my time or skills.

In the past week, however, I’ve been doing a deep dive into new agentic coding tools, notably the latest Claude Opus and Google Gemini models.

After I got more comfortable with how to exploit these tools, I threw this new project together in less than 24 hours, and I have to say: I am super impressed. It already does much more than what I achieved with the earlier project (though it borrows code from it).

I should add, however, that a huge question remains for me: whether the 100% fidelity claim mentioned in the README is even possible. I aim to figure this out over the coming weeks (though early progress is slow, so it may take many weeks!).

Basically, I had these tools analyze how to extend my earlier experiments (which got pretty far actually) in order to bring the vision to completion.

Perhaps the most interesting possibility this opens up, I think, is reflected in the contributing section of the README, which you can see if you create a new issue and select “domain expert.”

The prior art analysis is also super interesting. It’s the result of me asking how to synthesize all the information in the respective code bases, the csln repo issue tracker, and the spec documents for CSL/M 1.0. That’s now incorporated into the roadmap (this file, while human readable, is aimed at the LLM tools).

Here’s a parallel project with the start of a Rust-based server. I intend to build a client UI on top of it, based on an idea I’ve previously talked about, and I will make sure the core code supports it (for live-previewing and such). Here’s the browsing UI I imagine:

And this is a representation of the creation wizard I’ve previously discussed, with the idea being it has live previewing.

I’ve added a couple of design docs to address some long-standing issues and questions:

  1. How to deal with style reuse and duplication; here there will be no dependent styles, but a more composable alternative. Notably also, iterative development is now driven by the priorities and knowledge reflected in actually-existing 1.0 styles.
  2. How to make finding and creating styles easier.

I got a basic demo of the Rust server + SvelteKit client front-end working. The previews are (mostly) generated dynamically on the server.

As I said above, I’m trying to develop these in parallel so that they are fully complementary, though I will now turn back to the core code.

The last few weeks I’ve focused on improving the XML-based migration of styles to these new, quite different models. When I saw multiple high-end models fail, I decided to pull the plug on the approach, which was wasting a lot of time and resources.

Instead, I had a strong hunch that an earlier idea of mine, inferring templates from formatted citeproc-js output, would be simpler and more reliable.

So, I had the new Opus 4.6 model run an analysis of the two approaches, and had an architect agent propose a plan.

That plan is here; it uses the XML for what it’s good at, and a new JS inferrer script for the templates.

The plan also includes initial experiments that justify the change in approach:

The inferrer validates that the hard problem (template structure) is better solved by observing output than by parsing XML. The XML compiler’s 0% bibliography match was not a bug — it was evidence that procedural-to-declarative translation via macro flattening is fundamentally harder than reverse-engineering from rendered output.
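To make "observing output" concrete, here is a minimal sketch of the idea in Python. It is not the actual inferrer (which is a JS script driven by citeproc-js); the function name `infer_template`, the field names, and the sample entry are all hypothetical. It simply locates each known field value in a rendered string and records the order of the fields and the punctuation between them.

```python
def infer_template(reference: dict[str, str], rendered: str):
    """Infer field order and delimiters from one rendered entry.

    Hypothetical helper for illustration only; not part of Citum.
    Returns (template, suffix), where template is a list of
    (field, prefix) pairs in order of appearance.
    """
    # Find where each known field value shows up in the rendered output.
    hits = []
    for field, value in reference.items():
        pos = rendered.find(value)
        if value and pos != -1:
            hits.append((pos, field, value))
    hits.sort()

    template = []
    cursor = 0
    for pos, field, value in hits:
        if pos < cursor:
            # Overlaps a field we already consumed; a real tool needs
            # smarter disambiguation than this sketch provides.
            continue
        # Whatever sits between the previous field and this one is
        # treated as this field's prefix (delimiter, bracket, etc.).
        template.append((field, rendered[cursor:pos]))
        cursor = pos + len(value)
    return template, rendered[cursor:]


# Illustrative input data and rendered output (assumed, not real test data).
reference = {
    "author": "Berger, P. L., & Luckmann, T.",
    "issued": "1966",
    "title": "The social construction of reality",
    "publisher": "Anchor Books",
}
rendered = (
    "Berger, P. L., & Luckmann, T. (1966). "
    "The social construction of reality. Anchor Books."
)
template, suffix = infer_template(reference, rendered)
```

A single entry only yields one observation; the interesting part of the real problem is reconciling many such observations across reference types into one declarative template, which this sketch does not attempt.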

The next step is to hook up this script to other scripts in order to test the full rendering impact.

Latest updates:

  1. Per above, I ditched the approach of trying to parse 1.0 XML macros and templates and map them to the very different new model; instead the focus will be deriving styles from the output (using citeproc-js). It’s much easier to reason about and debug parsing common input data than it is the insanely complex 1.0 styles.
  2. I also found an extension of this idea: an LLM can not only do a good job of creating a style, but can also iteratively improve the code to match the expected output. This is reflected in a new styleauthor agent and skill. More.

And this wrinkle from APA, not supported in CSL 1.0, now works (along with integral/narrative citations generally)!

=== apa-7th.yaml ===

CITATIONS (Non-Integral):
  [pew_social_media] (Auxier & Anderson, 2021)
  [berger_luckmann] (Berger & Luckmann, 1966)
  [vaswani_attention] (Vaswani et al., 2017)
  [aad_atlas_higgs] (Aad et al., 2012)

CITATIONS (Integral):
  [pew_social_media] Auxier and Anderson (2021)
  [berger_luckmann] Berger and Luckmann (1966)
  [vaswani_attention] Vaswani et al. (2017)
  [aad_atlas_higgs] Aad et al. (2012)
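
For readers unfamiliar with the distinction, the two modes in the output above can be sketched in a few lines of Python. This is only an illustration of the formatting difference, not how the Citum engine implements it; the function names, the `et_al_min` parameter, and the co-author names used in the example are assumptions.

```python
def format_names(family_names: list[str], conjunction: str, et_al_min: int = 3) -> str:
    """Join family names, collapsing to 'et al.' at et_al_min or more.

    Hypothetical helper; assumes at least one name is supplied.
    """
    if len(family_names) >= et_al_min:
        return f"{family_names[0]} et al."
    if len(family_names) == 2:
        return f"{family_names[0]} {conjunction} {family_names[1]}"
    return family_names[0]


def cite(family_names: list[str], year: int, integral: bool = False) -> str:
    """Render an author-date citation in integral or non-integral mode."""
    if integral:
        # Narrative form: names are part of the sentence, year in parentheses.
        return f"{format_names(family_names, 'and')} ({year})"
    # Parenthetical form: the whole citation sits inside the parentheses.
    return f"({format_names(family_names, '&')}, {year})"


non_integral = cite(["Auxier", "Anderson"], 2021)
integral = cite(["Auxier", "Anderson"], 2021, integral=True)
many = cite(["Vaswani", "Shazeer", "Parmar"], 2017, integral=True)
```

The point is that the conjunction and the placement of the parentheses both change with the mode, so the mode has to be a first-class input to rendering rather than a prefix or suffix bolted onto one fixed layout.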

yeah this makes sense. forcing CSL 1.0 XML macros into a totally different model was always going to be brittle. macro flattening + conditionals = semantic mess. that 0% bibliography match wasn’t a bug, it was proof the translation layer was wrong.

inferring templates from citeproc-js output is just smarter. you’re treating rendered output as the contract and reverse-engineering structure from there — that’s a problem LLMs can actually handle.

the fact that APA integral/narrative citations now work is the real win. if CSL 1.0 couldn’t express it cleanly but your new pipeline can, you’re clearly on the right track.

It turns out they (particularly the latest codex models) are really good at this. I added a skill that just has them iteratively refine the style against the target, while simultaneously looking for code improvement opportunities. So a good upgrade “wave” sees big jumps in the style metrics AND useful code improvements.
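
As a rough illustration of what a "style metric" in such a loop might look like, here is a minimal Python sketch that scores rendered output against the target output per reference. The names `match_rate` and `mismatches` and the whitespace normalization are my assumptions for the example, not the project's actual tooling.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivial spacing differences do not count."""
    return " ".join(text.split())


def mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return the reference ids whose rendered output differs from the target."""
    return [
        ref_id
        for ref_id, want in expected.items()
        if normalize(actual.get(ref_id, "")) != normalize(want)
    ]


def match_rate(expected: dict[str, str], actual: dict[str, str]) -> float:
    """Fraction of target entries reproduced exactly (after normalization)."""
    if not expected:
        return 1.0
    return 1 - len(mismatches(expected, actual)) / len(expected)


# Target output (the "authority") versus what a draft style currently renders.
expected = {
    "pew_social_media": "(Auxier & Anderson, 2021)",
    "berger_luckmann": "(Berger & Luckmann, 1966)",
    "vaswani_attention": "(Vaswani et al., 2017)",
    "aad_atlas_higgs": "(Aad et al., 2012)",
}
actual = dict(expected, vaswani_attention="(Vaswani, Shazeer, et al., 2017)")

rate = match_rate(expected, actual)
failing = mismatches(expected, actual)
```

An agent loop would then edit the style, re-render, and keep the change only if the rate goes up; the list of failing ids tells it where to look next.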

I’ve since broadened that “authority” system beyond CSL; so there are now a few styles I had it port from biblatex, since they can’t be represented fully in CSL; in those cases, it uses the biblatex output as the source of truth.

Also, by way of update, the citum-core project is now pretty much feature complete, with everything listed here now implemented.

It’s notable that a lot of these features don’t exist in CSL.

It also passes all strict clippy linting, and includes over 600 automated tests.

There are just some little things I need to do before calling this 1.0, and publishing libraries and relevant binaries in the right places.

What I need now: human testing above all.


Also in the realm of cool news, @zepinglee is looking into integrating this into LaTeX.

And I have offered to hand off that proof-of-concept project to him to develop.

More on the infrastructure end of the project, the core repo now has ~700 automated tests. But they’re not super transparent to humans.

So I’ve integrated a solution for that into the GitHub CI:

It’s not complete, but it documents the behavioral logic for two core crates: the engine (processor), and the migration crate that translates CSL styles into Citum styles.

A number of core tests, BTW, were ported from the CSL and CSL-M test suites.

If anyone catches some we missed, let me know.

I stumbled into implementing a few related features that in retrospect make perfect sense, but which I wasn’t thinking much about, since I am a monolingual English-language scholar and was never really involved in CSL style development and maintenance.

Together they should make the explosion of styles in CSL (language and other small variants of big styles like APA or Chicago; and the problems with dependent styles in general) obsolete.

The three primary changes:

  1. First, what I call “presets” in Citum aren’t just aliases, as dependent styles in CSL are. They define default behavior, which can be locally overridden. So you want Chicago author-date but a different et al. rule? You can very concisely represent that. And while I am currently adding support for it at the style level, it’s pervasive throughout the design, so that feature ends up being IMO a superior solution to both dependent styles and macros.
  2. Second, the biggest styles, with the most dependent styles in the CSL world, are compiled into the engine. So users don’t have to worry about finding, keeping track of, or updating these styles.
  3. Finally, the locale system can also be locally overridden. So language-variant styles (like chicago-author-date-de.csl), with variant locale files, should also no longer be needed.
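
The "defaults plus local overrides" idea can be sketched as a recursive merge. This is only an illustration in Python under my own assumptions: the key names (`contributors`, `et_al`, `min`, `use_first`) and the values are made up for the example and are not Citum's actual schema or Chicago's actual rules.

```python
from copy import deepcopy


def apply_overrides(preset: dict, overrides: dict) -> dict:
    """Return a new config: preset defaults with overrides merged on top.

    Nested dicts are merged key by key; anything else in `overrides`
    replaces the preset value outright. The preset is left untouched.
    """
    merged = deepcopy(preset)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical preset standing in for a built-in author-date style.
author_date_preset = {
    "contributors": {"et_al": {"min": 4, "use_first": 1}, "and": "text"},
    "dates": {"form": "year"},
}

# A variant only states what differs: here, a different et al. threshold.
variant = apply_overrides(
    author_date_preset,
    {"contributors": {"et_al": {"min": 3}}},
)
```

The variant stays a one-line delta against the preset, which is the property that would let a language or journal variant avoid copying a whole style file.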

The PR, now merged.

I was more cautious with this work, because it’s a big change with far-reaching implications (not easy to revert, for example).

One thing I had the agent do to help with decision-making is to update the citum-analyze binary to include metrics that might allow a reasonable approximation of potential benefits compared to a CSL approach to this. A review agent then called those results conservative. So on balance I think the benefits largely outweigh the small to moderate increase in complexity.

I think the only way to really test, though, is to run a bulk process with an enhanced citum-migrate where the code + agent figures out how to optimize. We’ve made a start at that, but not really dug in.

It’s worth noting that I’m implementing some of the features to enable the new style wizard I’m working on. So I iterate back and forth between the two projects to co-evolve them as well.

That hub API UX is hard, but I should be able to demo it in the coming weeks. Here’s a screenshot of a working UI (which, BTW, points to a cool feature that the entire UI is built on: live WASM-based previewing that is super fast):

Making this very early alpha available in case people want to try it out.

https://hub.citum.org

Notable:

  • “Dependent styles” are just entries in a registry, which are stored in the db.
  • There is WASM-based live previewing everywhere, including in …
  • … the wizard, which actually works, and I think shows what the schema design makes possible.

The big caveat is I’m still not sure if this iteration of the wizard is good enough for complex real world styles. As I said above, this UX is really hard, and I’ve just been focused on getting it to work, rather than rigorously testing it myself.

If this or an iteration of it does end up working well, however, the idea of the hub is that it would be the easier-to-use and easier-to-maintain successor to the CSL solutions (the styles repo, the editor, etc).

So you can imagine extending this to include dedicated maintainer roles scoped by styles or categories, style versioning, etc; users will be able to “fork” and share styles, bookmark them, etc, which can also sync with their local citum installation.