Upcoming CSL meetup context

Right now, without specific processor magic, 1.0.1 input data is already not compatible with 1.0.2 styles (and vice versa), due to event-title and software.

I’m not sure I understand what you mean. Could you elaborate?

If you have CSL-JSON data with event-title, a 1.0.1 style that uses <text variable="event"/> would not work, and CSL-JSON data with event does not work with a 1.0.2 style using <text variable="event-title"/>. The same goes for <if type="software"/> and <if type="book" variable="version"/>, unless the processor transforms the CSL-JSON data to make it compatible with the style version (though the style only says 1.0, making it difficult to know whether it’s compatible).
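To make the "processor transforms the data" option concrete, here is a minimal Python sketch of such a shim. The function names to_1_0_1 and to_1_0_2 are made up for illustration, and the book-plus-version heuristic is only an assumption based on the two conditions quoted above; a real processor would need more care.

```python
from copy import deepcopy


def to_1_0_1(item: dict) -> dict:
    """Return a copy of a CSL-JSON item reshaped for a 1.0.1-era style."""
    out = deepcopy(item)
    # 1.0.2 data carries "event-title"; a 1.0.1 style reads "event".
    if "event-title" in out and "event" not in out:
        out["event"] = out.pop("event-title")
    # A 1.0.1 style tests for software as a book that has a version.
    if out.get("type") == "software":
        out["type"] = "book"
    return out


def to_1_0_2(item: dict) -> dict:
    """Return a copy of a CSL-JSON item reshaped for a 1.0.2 style."""
    out = deepcopy(item)
    if "event" in out and "event-title" not in out:
        out["event-title"] = out.pop("event")
    # Heuristic (assumption): a book with a version was meant as software.
    if out.get("type") == "book" and "version" in out:
        out["type"] = "software"
    return out
```

Even with a shim like this, the processor still has to know which direction to convert in, which is where the imprecise version string becomes the real problem.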

I see there’s some discussion in [Feedback CSL 1.0.2]: Publication type "event" - #31 by Philipp_Zumstein already. That would help but the version strings need to be more precise in this case.

Meeting notes (still open for comments)

(I tried to quickly add them here, but formatting got screwed up)

High-level summary

Great discussion, but we didn’t really resolve the key questions. Hopefully we can in the coming months.

Any further thoughts on how to do that, please post them below.

My further thoughts and ideas

YAML tests?

One small thing I was wondering: should we switch to YAML (or TOML) for the test suite, per @cormacrelf’s example?

Define two (or more) change types, and match them with GitHub review teams

One bigger question, that is about process and labor:

If we do decide we want to release a bigger change, and so have a plan to evolve CSL going forward, perhaps we could have GitHub teams to review different kinds of changes?

For example, processor developers could be on a “Processor reviewers” team, and we could have a separate team for variables/type/term additions and such (adding strings).

Issue triage could then classify and auto-assign reviewers.

And if we don’t want to be bothered, we should just say so and declare CSL effectively frozen, aside from minor changes.

But per @Sebastian_Karcher’s point during the meeting, I’d prefer not to give up the goal of being able to evolve CSL, despite all of the inertia.

Is a schema with more modular features possible?

On that point, I guess the even bigger question is if it’s technically possible to do what some were musing about: having modular features that processors could support, or ignore without consequence for backward compatibility.

I haven’t ever really looked into this idea, but I’m skeptical it is possible.

But that would solve a lot of problems if it was.

Certainly some of what we added to “1.1” could be ignored by a 1.0 processor. But we’d basically need rules for that; e.g. “Processors shall ignore unknown nodes.” And with that, it may constrain some of what we added.
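As a rough illustration of what an "ignore unknown nodes" rule would mean in practice, here is a toy Python renderer. It is only a sketch: the KNOWN set, the render function, and the future-node element are all hypothetical, and real CSL rendering involves far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the only nodes this toy processor understands.
KNOWN = {"text", "group"}


def render(node: ET.Element, item: dict) -> str:
    tag = node.tag.split("}")[-1]  # drop an XML namespace, if present
    if tag not in KNOWN:
        return ""  # the rule under discussion: skip what we don't know
    if tag == "text":
        # Either a literal value or the content of a variable.
        return node.get("value") or str(item.get(node.get("variable", ""), ""))
    # group: render the children and join the non-empty results
    parts = [render(child, item) for child in node]
    return node.get("delimiter", "").join(p for p in parts if p)


layout = ET.fromstring(
    '<group delimiter=", ">'
    '<text variable="title"/>'
    '<future-node variable="title"/>'
    '<text value="n.d."/>'
    "</group>"
)
print(render(layout, {"title": "Book One"}))  # Book One, n.d.
```

The unknown node simply drops out of the output here. That is harmless for additive rendering nodes, but presumably not for anything that changes how its neighbours should behave, which is the constraint mentioned above.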

But I’m not sure; maybe we should look into that more?

Off the top of my head, for example:

  • cs:intext would work, while this alternative would not
  • split titles almost surely would not
  • EDTF dates: not sure
  • related-items: no

My apologies for the no-show, y’all.


Hi all, sorry I could not come to the meeting. A few thoughts:

  • I’m glad to see the discussion about an automatic test suite for the major styles.
  • When CSL is updated, it would be good to publish an “archived” style repository for the older version, to support people using processors that only support the older version.
  • I like the idea that styles that support the new version could still work with the old version; but I don’t have a good enough idea how much of a constraint that would be on improving CSL.
  • I’ve often wished for a tool that can “prune” CSL in a test case so that it’s only as big as it needs to be. (I’m often too lazy to do this, so my test cases are often big and contain many copies of the same CSL style.) It should be possible to add tooling to a CSL processor that keeps track of which macros have been used; this information could be input to a tool that removes unnecessary macros.
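A simpler cousin of the pruning tool in the last bullet can be sketched without touching a processor at all. The Python below does static reachability: it keeps the macros referenced, directly or through other macros, from outside any macro definition, and drops the rest. Note this is a swapped-in technique, not the runtime tracking described above; a macro that is referenced but never executed for the test's input would survive. The function name prune_unused_macros is made up.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace from a tag name."""
    return tag.split("}")[-1]


def prune_unused_macros(style_xml: str) -> str:
    root = ET.fromstring(style_xml)
    macros = {m.get("name"): m for m in root if local(m.tag) == "macro"}

    # Seed with macro calls made outside any macro definition
    # (cs:text and cs:key both use a "macro" attribute).
    pending = [
        el.get("macro")
        for top in root if local(top.tag) != "macro"
        for el in top.iter() if el.get("macro")
    ]
    used = set()
    while pending:
        name = pending.pop()
        if name in used or name not in macros:
            continue
        used.add(name)
        # Follow calls made from inside this macro.
        pending.extend(
            el.get("macro") for el in macros[name].iter() if el.get("macro")
        )

    for name, macro in macros.items():
        if name not in used:
            root.remove(macro)
    return ET.tostring(root, encoding="unicode")
```

One caveat: ElementTree drops XML comments on parsing and rewrites namespace prefixes on output, so this is fine for shrinking a test case but not for editing a style that will be published.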

Edit: One more thing came to mind. When I was working on citeproc-hs, I observed that the test suite contained a lot of tests for minor corner cases that went beyond anything described in the CSL spec. My README.md contains the following:

Although this library is much more accurate in implementing the CSL spec than pandoc-citeproc was, it still fails some of the tests from the CSL test suite (67/862). However, most of the failures are on minor corner cases, and in many cases the expected behavior goes beyond what is required by the CSL spec. (For example, we intentionally refrain from capitalizing terms in initial position in note styles. It makes more sense for the calling program, e.g. pandoc, to do the capitalization when it puts the citations in notes, since some citations in note styles may already be in notes and in this case their rendering may not require capitalization. It is easy to capitalize reliably, hard to un-capitalize reliably.)

Indeed, in a lot of these corner cases I think citeproc-hs produces better output than the expected test output. I seem to recall discussing this with one of you at the time. Anyway, the gist of my comment is that it would be nice to separate out true CSL conformance tests (which test behavior specifically mandated by the spec) from tests that were just in there as citeproc-js regression tests. The latter tests may still be useful, but they should be separate.

We actually have tried to do this, I think in response to your earlier feedback.

Do you have a list of these remaining tests that you would propose?

Also, while on this topic, any thoughts on changing these to use YAML, like this?

July 22
… the gist of my comment is that it would be nice to separate out true CSL conformance tests (which test behavior specifically mandated by the spec) from tests that were just in there as citeproc-js regression tests. The latter tests may still be useful, but they should be separate.

We actually have tried to do this, I think in response to your earlier feedback.

Great! Looking it over, it looks better than I remember. I still see some things which, as far as I can see, don’t test anything mentioned in the spec. Here are some examples of failures from citeproc-hs. In each case I may have simply missed a relevant part of the spec.

Does the spec specify these behaviors for quotes?

[FAILED]   test/csl/flipflop_OrphanQuote.txt
--- expected
+++ actual
-Nation of "Positive Obligations " of State under the European Convention on Human Rights (1)
+Nation of “Positive Obligations ” of State Under the European Convention on Human Rights (1)

[FAILED]   test/csl/quotes_QuotesUnderQuotesFalse.txt
--- expected
+++ actual
 <div class="csl-bib-body">
-  <div class="csl-entry"> 'Title with ‘quotes’ in it',.</div>
+  <div class="csl-entry"> ’Title with ‘quotes’ in it’,.</div>

Does the spec say to do this? (As noted in my last mail, it’s not always desirable.)

[FAILED]   test/csl/magic_CapitalizeFirstOccurringTerm.txt
--- expected
+++ actual

Does the spec specify these things?

[FAILED]   test/csl/sort_OmittedBibRefMixedNumericStyle.txt
--- expected
+++ actual
   <div class="csl-entry">1. Anderson, Book One</div>
-  <div class="csl-entry">2. [CSL STYLE ERROR: reference with no printed form.]</div>
+  <div class="csl-entry">[CSL STYLE ERROR: reference with no printed form.]</div>
   <div class="csl-entry">3. Crane, Book Two</div>

[FAILED]   test/csl/sort_OmittedBibRefNonNumericStyle.txt
--- expected
+++ actual
   <div class="csl-entry">Anderson, Book One</div>
+  <div class="csl-entry">[CSL STYLE ERROR: reference with no printed form.]</div>
   <div class="csl-entry">Crane, Book Two</div>

Here I wonder if the absence of space between the Latin and Chinese names is really intended?

[FAILED]   test/csl/name_EtAlWithCombined.txt
--- expected
+++ actual
   <div class="csl-entry">John Doe。</div>
-  <div class="csl-entry">——著,Ziggy Zither等點校。</div>
+  <div class="csl-entry">——著,Ziggy Zither 等點校。</div>

[Edit: added later] In this case I don’t understand why the space before Frinkle (which is present in the reference database) disappears:

[FAILED]   test/csl/sort_LeadingApostropheOnNameParticle.txt
--- expected
+++ actual
   <div class="csl-entry">d’Wander, W</div>
-  <div class="csl-entry">de’ Frinkle, B</div>
+  <div class="csl-entry">de’Frinkle, B</div>
   <div class="csl-entry">in ’t Horvath, P A B</div>

Also, while on this topic, any thoughts on changing these to use YAML, like this?

I guess I’m not sold on the advantages. The existing format is very straightforward to parse, and no special quoting or escaping is needed in the different sections. YAML has some complex quoting and escaping rules, which sometimes trip people up. In addition, you’d have to indent CSL when pasting it into the test. None of this is a big deal, though; if there are substantial advantages to switching to YAML, I wouldn’t object.
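To back up the "straightforward to parse" claim, here is a short Python sketch. It assumes each section of a fixture is wrapped in lines of the form >>===== NAME =====>> and <<===== NAME =====<<, which is my recollection of the layout rather than something shown in this thread, so check it against real fixtures. parse_fixture is a made-up name.

```python
import re

# Assumed layout: ">>===== NAME =====>>", the body, "<<===== NAME =====<<".
SECTION = re.compile(
    r"^>>=+ (?P<name>[A-Z0-9-]+) =+>>\n(?P<body>.*?)\n?^<<=+ (?P=name) =+<<$",
    re.MULTILINE | re.DOTALL,
)


def parse_fixture(text: str) -> dict:
    """Map each section name to its raw, unescaped body."""
    return {m["name"]: m["body"] for m in SECTION.finditer(text)}
```

The bodies come back exactly as written, with no quoting or indentation to undo, which is the property a YAML version would have to give up.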



We did this for 1.0.1–> 1.0.2 (the 1.0.1 branch of the repository) and would do the same moving forward.

Ooh I like that


First of all, I’d really like not to freeze CSL, and to see it keep evolving. But we could freeze CSL 1 and make a fresh start with CSL 2.0, while keeping CSL 1 around so the vast number of available styles stays usable. Maybe we could still add the intext feature to a version of CSL 1, as this is more or less already there.

As I said in the meeting, I think the current release model entails a number of drawbacks. I’m still unsure whether there are good and practical alternatives, but I wanted to post some ideas before I forget.

Would regular (yearly?) releases be an option?

Instead of aiming for a complete release, we could release whatever we have at a certain time, e.g. once a year in August, or maybe every other year in August. A comment period could then start in January before the next release.

CSL → meta-CSL ?

Right now, CSL operates on a number of different levels. E.g., adding a new term is totally different from adding a new citation mode. Maybe we could make the distinction more explicit. CSL, as a DSL, could specify that there are terms and variables of certain types (names, dates, etc.), but not what these things are. That could then be specified elsewhere. Processors could perhaps also just pull in such a list from somewhere instead of maintaining a list of variables themselves. Term- and variable-only updates would then be much easier.
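As a sketch of the "pull in such a list" idea: the Python below loads a small registry that groups variable names by kind and checks a style's variables against it. The registry shape, its contents, and the function names are all assumptions made for illustration; no such published list is being described here.

```python
import json

# Hypothetical registry document, maintained outside the language spec.
REGISTRY_JSON = """
{
  "names": ["author", "editor"],
  "dates": ["issued"],
  "standard": ["title", "event-title"]
}
"""


def load_registry(text: str) -> dict:
    """Map each variable name to its kind (names, dates, standard)."""
    data = json.loads(text)
    return {var: kind for kind, variables in data.items() for var in variables}


def unknown_variables(used: list, registry: dict) -> list:
    """Variables a style uses that the registry does not list."""
    return sorted(set(used) - set(registry))


registry = load_registry(REGISTRY_JSON)
print(registry["issued"])                              # dates
print(unknown_variables(["title", "event"], registry))  # ['event']
```

With this split, adding a variable means updating the registry only; the processor code that handles "a date" or "a name" would not change.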

Add a beta channel for new features

Could we add, say, a beta channel to try out new features? Of course, these would have to be implemented somewhere by someone. We could then try out new stuff and codify it afterwards if things work out.

Make the language more powerful on a lower level

Instead of adding complex features, we could give users more power to make certain things themselves, e.g. with additional testing features.


I really like this idea (and the others as well).

We could then take a longer time to get it right, and maybe figure out the technical challenges.

I also think it’s cleaner from a marketing and communication POV; and of course potentially also much cleaner from a technical POV.

Instead of adding complex features, we could give users more power to make certain things themselves, e.g. with additional testing features.

Do you have any idea of what that might look like practically?

@John_MacFarlane - WDYM by “archived repository”? Do you mean that literally?

I worry a little about using branching as a strategy for this with such a large and monolithic repository, but I’m not sure if that worry is misplaced.

Conversation has died down, but I think we might draw a few high-level conclusions from the discussion so far:

  1. There may be some process things we can do in the short-run (like my suggestion of review teams), that would be easy, and make sense. But if we set up those teams, for example, folks will have to be willing to accept invites, and occasionally do reviews.
  2. There are some technical questions (raised most notably by Denis and me here, but also by the Zotero folks during the virtual chat) that will only really be answered with the help of processor developers: the people actually coding CSL. So we might encourage them to think about some of the questions we’ve raised, and weigh in if and when they can. It may well be possible to find a technical solution that reduces some of the other inertial barriers to moving CSL forward.
  3. I think in any case we should rename 1.1 to 2.0, and prepare for a cleaner break, a bit farther out. I hope we’re talking a year or so, rather than multiple years.

Not really. Citavi, for example, allows users to use arbitrary C# code in their style to code their own tests. We could make something similar, but IDK if that will work given the variety of implementations.

That’s the rub; the whole point of CSL is that it’s language-independent.

Yes, sure. I was contemplating whether using a language such as Lua might be an option here… But that, too, might work well for some implementations (pandoc’s citeproc, citeproc-rs, citeproc-lua), while I don’t know about the implementations in Ruby, Elisp, and PHP.
And even if it were technically feasible, it would still open a can of worms, as it goes against the whole “XML can be validated” kind of thing.
Anyway, that’s probably a discussion for a different thread, if at all …

I’ve only thought about this just now, so may be missing important details (like substitution and formatting), but …

What about ditching XML entirely, in favor of one of the new cross-language template languages?

Here’s liquid:

Hello {{ 'tobi' | upcase }}
Hello tobi has {{ 'tobi' | size }} letters!
Hello {{ '*tobi*' | textilize | upcase }}
Hello {{ 'now' | date: "%Y %h" }}

So the | pipes the variable through one or more transformation functions, which can include arguments.

For CSL, the obvious filters, or filter groups:

  • Names
  • Dates
  • Titles

Could have a default filter for each group, and then variants that encapsulate arguments (the logic in existing style macros) for ease of authoring.

A simple hypothetical style fragment:

{{ author.names | shorten-names-apa }} ({{ issued | date-year | date-suffix }}). {{ title }}.
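To get a feel for how little machinery the pipe syntax needs, here is a toy Python renderer for templates of this shape. It is a sketch under heavy assumptions: the filter names in FILTERS are invented, and it handles neither dotted paths like author.names nor filter arguments, both of which a real design would need.

```python
import re

# Hypothetical filters; each takes the current value and returns a new one.
FILTERS = {
    "family-names": lambda names: " & ".join(n["family"] for n in names),
    "date-year": lambda date: str(date["date-parts"][0][0]),
    "upcase": lambda text: text.upper(),
}

PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def render(template: str, item: dict) -> str:
    def expand(match: re.Match) -> str:
        # "issued | date-year" -> variable "issued", filters ["date-year"]
        variable, *filters = [p.strip() for p in match.group(1).split("|")]
        value = item[variable]
        for name in filters:
            value = FILTERS[name](value)
        return str(value)

    return PLACEHOLDER.sub(expand, template)


item = {
    "author": [{"family": "Doe", "given": "Jane"}, {"family": "Roe", "given": "Max"}],
    "issued": {"date-parts": [[2020, 5, 1]]},
    "title": "Book One",
}
print(render("{{ author | family-names }} ({{ issued | date-year }}). {{ title | upcase }}.", item))
# Doe & Roe (2020). BOOK ONE.
```

All of the citation knowledge would live in the filters; the template itself stays a flat string, which is the property that might make styles easier to edit.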

And then combine the template strings in a larger YAML file?

  mode: intext
  et-al-min: 4
  et-al-use-first: 1
  disambiguate-add-year-suffix: true
  disambiguate-add-names: true
  disambiguate-add-givenname: true
  givenname-disambiguation-rule: primary-name
  collapse: year
  after-collapse-delimiter: "; "
  template: ({{ author.name }}, {{ date | fmt-year }})

EDIT: it occurred to me that the more complex logic in CSL now is actually described in the simple attributes. So in the above, the template can actually be pretty simple. I don’t know how substitution would work elegantly however ATM.

Any future development work would come down to describing the filters, much like we do now for rendering elements, and adding input/output examples to the test suite.

And in refactoring, can consider things like multilingual from the beginning.

Finally, this would be:

  • easier for users to edit
  • maybe possible to convert to from existing styles

This assumes most of the logic can be contained in the filters, so the actual templates are pretty simple. Am not sure about that.

Am I crazy, or might this be a way to address all of the above issues?

If you think about it, all the collective knowledge of citation formatting is embedded in that style repo; maybe we can use that to facilitate a more radical break (and also do other cool things)?

Or maybe the basic idea of chained filter transformation could be implemented in the XML syntax; not sure. As I said, this is just a very tentative idea.

Can you explain why that should solve these issues?

Getting ahead of myself there, given this is a hunch more than fully formed idea, but the essence is:

Maybe it’s possible to put more logic in named “filters” and parameters so as to simplify the actual template syntax, which could make compatibility issues easier to manage going forward?

If still XML, it might mean heavier use of attributes (which are easy to ignore in processing).

EDIT: me playing a bit:


## A single `template` element, that can be used at top-level, or
## within `citation` and `bibliography`
template = element template { template.atts,
                              ( template.list | template.render )+ }

template.atts = attribute context { text }?, attribute name { text }?

template.render = element render { render.atts }

## And a list element.
template.list = element list { template.affixes, template.render+ }

## A general conditional.
template.cond = attribute when { text }

template.affixes = attribute suffix { text }?,
                   attribute prefix { text }?,
                   attribute delimiter { text }?

render.atts = attribute variable { render.vars }
render.vars = "author" | "editor" | "issued" | "title" | "cited-locators"
render.fmt =
    ## a template to call for partial rendering
    attribute template { text }?,
    attribute bold { xsd:boolean }?,
    ## an attribute that takes a list of filters, which transform the input
    attribute filters { list { xsd:token+ } }?,
    ## a conditional and substitute; can we do these with attributes?
    attribute substitute { list { render.vars+ } }?

Example XML:

<template name="citation-apa-paren"
          description="For default rendering of parenthetical APA citations.">
  <list prefix="(" suffix=")" delimiter="; ">
    <render variable="author" suffix=" "/>
    <render variable="issued" filters="fmt-date-apa"/>
    <!-- unclear filters vs other templates -->
    <render variable="cited-locators" filters="fmt-locators-apa"/>
  </list>
</template>

Aside: there’s one or more problems with this example, since list here isn’t exactly equivalent to cs:group, and I’m not actually clear on the logic (though CSL 1 has the same issue). But it’s enough to illustrate the idea for now.

Basically, in this scenario, we merge cs:macro, cs:text, cs:group, etc into a consistent template, render, and list, where template can be used in different places.

There’s also a consistent way to signal when to transform the input, and when to output it as is.
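Here is a rough Python sketch of how a processor might walk the template, list, and render elements, with a filters attribute deciding whether a value is transformed or output as is. Everything in it is an assumption for illustration: the filter names, the use of a plain locator variable, and the choice to put prefix and suffix on render.

```python
import xml.etree.ElementTree as ET

# Hypothetical filters, looked up by the names listed in a "filters" attribute.
FILTERS = {
    "family-names": lambda names: " & ".join(n["family"] for n in names),
    "fmt-year": lambda date: str(date["date-parts"][0][0]),
}


def render_node(node: ET.Element, item: dict) -> str:
    if node.tag == "render":
        value = item.get(node.get("variable"))
        if value is None:
            return ""
        # No "filters" attribute means the value is output as is.
        for name in node.get("filters", "").split():
            value = FILTERS[name](value)
        return node.get("prefix", "") + str(value) + node.get("suffix", "")
    # template and list: render the children, join the non-empty results
    parts = [p for p in (render_node(child, item) for child in node) if p]
    if not parts:
        return ""
    body = node.get("delimiter", "").join(parts)
    return node.get("prefix", "") + body + node.get("suffix", "")


template = ET.fromstring(
    '<template name="citation-paren">'
    '<list prefix="(" suffix=")" delimiter=", ">'
    '<render variable="author" filters="family-names"/>'
    '<render variable="issued" filters="fmt-year"/>'
    '<render variable="locator" prefix="p. "/>'
    "</list>"
    "</template>"
)
item = {
    "author": [{"family": "Doe"}],
    "issued": {"date-parts": [[2020]]},
    "locator": "12",
}
print(render_node(template, item))  # (Doe, 2020, p. 12)
```

The whole walker is two branches, which gives some support to the hunch that parsing and processing code could shrink, though substitution and conditionals are exactly the parts this sketch leaves out.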

This could include potentially loading templates from a separate file.

I can imagine extracting and converting, for example, the most widely used and developed styles (the chicagos, APA, etc.) from the styles repo and making them available in one or more files from a CSL NEXT styles repo, so new styles wouldn’t often have to include new template “macros”.

Given the huge technical debt, the question is not just whether this could work, but if a clean break is the best path forward.

But given the conversation last Summer, I felt like we were stuck, and so thought this worth proposing.

My hunch (again) is that for existing processors, if we did this right, it would simplify the parsing and processing code (potentially a lot?), and allow reuse of much of the existing logic (since the key processing-related attributes would stay).

It should also simplify style updating and such, and schema maintenance (since the schema itself should be much simpler, and less often in need of updating).

I’ve been discussing the ideas with @Denis_Maier off-forum the past few days, and decided instead to just ask the question over here.