CSL Discourse

CSL 1.2 Planning


#1

Hi everyone. @Philipp_Zumstein and I went through the zotero-bits issue tracker a while ago and added the items that we think will work well for an uncomplicated 1.2 release. Our criteria were that

  1. It should be a (fairly) uncontroversial change and
  2. It should not require any functionality updates from processors, just new terms/fields/item types

You can find our list here: https://github.com/citation-style-language/zotero-bits/milestone/2

My suggestion would be to keep general comments here and put details on specific issues in the relevant issue.

I’d love to get this out this fall – is that realistic?
@Rintze_Zelle @Bruce_D_Arcus1 @Frank_Bennett (and of course anyone else who is interested)


#2

Fine by me, and your suggestion on process makes sense.

I do think we want input from implementers though, in part so they can plan how they’ll deal with 1 vs 1.2.

Speaking of which, do we have a plan on that? Do all 1.x versions get treated the same from a repo standpoint?


#3

Sounds good. Focusing on a simple release will allow us to focus on our own internal development and release process. My thoughts:

As for @Bruce_D_Arcus1’s comment, impact on implementations should be minimal until we start updating styles in the “master” branches of the styles and locales repositories. Once we’re close to release we’ll create “1.0.1” branches in each, so implementors can switch over to those if they’re not ready to move to the new CSL version. We can, for example, leave a 1-month window between finalizing the new CSL version and creating those branches, and the point where we start accepting styles and locale files in “master” for the new version.


#4

The set of specific changes looks fairly good to me in terms of the proposals being fleshed out and easy for implementation. The one other set of fields that seems to fit these criteria would be additional creator types for audiovisual items: producer, writer (for both scriptwriters and lyricists), and a term for performer. These would immediately improve CSL support for films, music, etc., and I think they fit the two key criteria here.


#5

@Bruce_D_Arcus1 I’m adopting semantic versioning in citeproc-rs. Styles can declare their version requirements like “1.1” or “1.1.0”, which means >=1.1.0 but <2.0.0. A style that doesn’t depend on any features introduced in 1.1 or later can stick with “1.0” or “1.0.1” etc. and doesn’t need updating. You can use any version requirement syntax supported by the semver crate.

This requirement is checked by the engine: the engine knows what version it supports, and it will bail out if its own version is not within the specified range. This could, I suppose, be a warning, but few CSL tools have the ability to report warnings at runtime, so I’m not sure of the utility there. Such a bailout would be a prompt to update one’s engine.

From that point on, the engine just processes as normal. It won’t disable 1.1-only features for 1.0-supporting styles, and verifying that declared 1.0 support would actually still work in 1.0 is up to the style author, not the engine. You could do this by downloading an old engine. (Although it will disallow parsing CSL-M-only features for CSL styles and vice versa. These errors get reported at parse time.)
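As a rough illustration of the check described above, here is a minimal Python sketch (not the citeproc-rs code, which uses the Rust semver crate). It covers only the plain “1.1” / “1.1.0” form with a major version of 1 or higher, and the function names parse_version, engine_satisfies and check_style are made up for this example.

```python
def parse_version(text):
    """Parse "1.1" or "1.1.0" into a (major, minor, patch) tuple, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a version: {text!r}")
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def engine_satisfies(requirement, engine_version):
    """True if engine_version >= requirement and below the next major version.

    Assumption: major version >= 1. Semver treats 0.x requirements differently,
    and this sketch does not handle that case.
    """
    lower = parse_version(requirement)
    upper = (lower[0] + 1, 0, 0)
    return lower <= parse_version(engine_version) < upper


def check_style(requirement, engine_version):
    """Bail out (raise) when the engine is outside the range the style declares."""
    if not engine_satisfies(requirement, engine_version):
        raise RuntimeError(
            f"style requires CSL {requirement}, but this engine supports "
            f"{engine_version}; consider updating the engine"
        )
```

With this sketch, a style declaring “1.0” passes on a 1.2.0 engine, while a style declaring “1.1” makes a 1.0.1 engine bail out. Nothing is checked beyond that, which matches the point that the engine does not verify a style's claimed 1.0 compatibility.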

As far as I know, CSL does use de facto semver numbers, and the 1.1 release will correspond to the “adds features but introduces no breaking changes” rule for minor version bumps. There is a note in the spec about styles declaring themselves 1.0-compatible, but maybe 1.1 could include a statement of the versioning semantics! The tricky part of that is declaring what a breaking change is. One interpretation would be “would an old style fail to parse or produce an error?” but it should probably cover more than that.

I also introduced a new way to specify that a style is CSL-M:

<style ... version="1.1" variant="csl-m">

The current version="1.1mlz1" is non-standard and cannot be parsed into a semantic version range (see the semver spec) but that specific string is special-cased for compatibility. I don’t know if there are other CSL-M version numbers that should be recognised; it’s not clear to me what the 1 in mlz1 means. @Frank_Bennett ? CSL-M version numbers are a separate stream from CSL. The engine will declare one engine version per variant.
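To make the version/variant split concrete, here is a small Python sketch of how a processor might read the two attributes and special-case the legacy string. This is only an illustration: the function name style_identity, the default variant "csl", and mapping "1.1mlz1" to version "1.1" are all assumptions, since the thread does not say which CSL-M version that string should correspond to.

```python
import xml.etree.ElementTree as ET

# The one legacy string mentioned above; whether others exist is an open question.
LEGACY_CSL_M_VERSIONS = {"1.1mlz1"}


def style_identity(style_xml):
    """Return (variant, version) for the root <style> element of a style."""
    attrs = ET.fromstring(style_xml).attrib
    version = attrs.get("version", "")
    if version in LEGACY_CSL_M_VERSIONS:
        # Not a valid semantic version, so it is mapped by hand.
        # Assumption: treat it as CSL-M "1.1".
        return ("csl-m", "1.1")
    # Assumption: a style without a variant attribute is plain CSL.
    return (attrs.get("variant", "csl"), version)
```

An engine could then look up its own supported version per variant (one entry for "csl", one for "csl-m") and run the usual range check against the returned version.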


#6

The “1.1mlz1” version string on the CSL-M styles and schema is a decade-old lazy hack. The stable of styles is still small, so switching to a standard, more meaningful version/variant identifier makes good sense. The “variant” attribute looks like a good way to handle arbitration between the two systems, without cluttering up the version string with information of a different kind. I think it’s safe to assume that citeproc-js and citeproc-rs are the only tools that attempt to parse the CSL-M variant. I’ll give it some more thought, though, and post again here before making the jump in the hosted styles.


#7

I’ve had some more thoughts about this since, and I think I have something even better, borrowing heavily from the Rust compiler/language evolution process, which I am clearly a fan of. Essentially:

  • Have a list of all independent features your compiler supports
  • Have an optional top-level <features> <feature name="xxx" /> ... </features> section in a style to require + enable individual ones as necessary.
  • Features are unstable and not guaranteed to be around forever. Use at your own risk.
  • Throw errors if you do not support a feature that a style says it requires
  • Throw errors if a style uses a syntax-based feature it does not declare (some features might merely change the output without any new syntax)
  • Canonical feature naming for common implementation is based on tracking issues, so multiple engines can agree on the name of a feature even if they differ in implementation/progress/etc.
  • Eventually, either accept features because they make it into the CSL spec, or remove them. Accepted features no longer need to be declared in the <features> section, and removed features throw errors.
  • Have a shorthand <feature name="csl-m" /> that enables all the features that make up CSL-M, as that list would be quite long. That would make your job pretty easy if you wanted: only support the one feature, “csl-m”. Doing individual flags may be quite a lot of work, although if there is any low-hanging fruit like self-contained single if (cslM) { ... } blocks, it would help to convert them.
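The rules in this list could be sketched roughly as follows. This is a hedged Python illustration, not how citeproc-rs implements it: every feature name here is a hypothetical placeholder, the contents of the "csl-m" bundle are invented, and the example assumes a style without an XML namespace to keep the lookup short.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder names, not real feature flags of any engine.
SUPPORTED = {"example-feature-a", "example-feature-b"}
REMOVED = {"example-removed-feature"}
CSL_M_BUNDLE = {"example-feature-a", "example-feature-b"}  # what "csl-m" expands to


class FeatureError(Exception):
    """Raised for unsupported, removed, or undeclared features."""


def enabled_features(style_xml):
    """Collect the features a style declares, rejecting unknown or removed ones."""
    root = ET.fromstring(style_xml)
    enabled = set()
    for node in root.findall("./features/feature"):
        name = node.get("name")
        if name == "csl-m":
            enabled |= CSL_M_BUNDLE  # shorthand for the whole CSL-M set
        elif name in REMOVED:
            raise FeatureError(f"feature {name!r} has been removed")
        elif name not in SUPPORTED:
            raise FeatureError(f"this engine does not support feature {name!r}")
        else:
            enabled.add(name)
    return enabled


def require_feature(enabled, name, element):
    """Called by the parser when it meets syntax gated behind a feature."""
    if name not in enabled:
        raise FeatureError(
            f'<{element}> needs <feature name="{name}" /> in <features>'
        )
```

A parser would call enabled_features once per style and then call require_feature whenever it meets gated syntax, so both kinds of error surface at parse time.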

citeproc-rs features defined here – In styles they are kebab-case, but Rust identifiers can’t contain hyphens, so a conversion to snake_case is done at runtime. Technically, the engine only supports a few of those, so most should be commented out. I have one removed feature just to try it out, although I do think form="abbreviated" would have been better :).

In action – You can see citeproc-rs yell at you with parse-time errors if you delete either of those lines and keep the <conditions> block as-is. Those ones also know which features you could enable to use the syntax, but this wouldn’t be universal.

The <features> section should be an addition to the spec in itself, hopefully in 1.1. The only reason for that is so you can get meaningful “this engine does not support features” or “this engine does not support specific feature X” errors, rather than “unknown element <features>”. Individual feature flags should not be part of the spec, only the syntax for declaring them. It would be great to have features map 1:1 to GitHub tracking issues in a single repo. This is so people who use them can be linked to their status, and (importantly) be provided with information if a feature they required was removed. (As Rintze noted, evolution issues are in the wrong place at the moment, but GitHub seems to have built an issue-mover in beta.)

Some benefits of such a scheme are:

  • People who need things faster, like the beneficiaries of Propachi, can live on the edge, but they are forced to document what’s different about the CSL they’re speaking.
  • You can grep over the styles repository to see if anyone is relying on a feature. You don’t have to eventually accept every feature that is used, but it helps to know.
  • You know what modifications an edgy style you’ve found needs before it will run on stable-only again.
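The “grep over the styles repository” idea could look something like this in Python. It is a sketch under assumptions: it treats any element named feature with a name attribute as a declaration, counts each feature once per style, and the function name feature_usage is invented for the example.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def feature_usage(styles_dir):
    """Count how many .csl files under styles_dir declare each feature."""
    usage = Counter()
    for path in sorted(Path(styles_dir).rglob("*.csl")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        declared = set()
        for node in root.iter():
            # Drop any "{namespace}" prefix so namespaced styles match too.
            local_name = node.tag.rsplit("}", 1)[-1]
            if local_name == "feature" and node.get("name"):
                declared.add(node.get("name"))
        usage.update(declared)
    return usage
```

Calling feature_usage on a checkout of the styles repository returns a Counter, so most_common() would list the features that the most styles rely on.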

Let me know what you think, and if there are any difficulties I haven’t imagined.


#8

A clear structure for documenting features at the edges would be a very good thing, and if a framework is established, I can add information on CSL-M details. It would be a good opportunity for housecleaning.

It seems there would be several layers to a framework. In ascending order of difficulty:

  • Naming and describing features
  • Testing feature behavior
  • Dynamically applying features in validation and processor configuration

To start with naming and description, presumably there would need to be a registry of some sort, to ensure that each name maps to exactly one behavior. It would need to be curated in some way, so the management angle would be important.
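As a starting point for discussion, a registry with unique names might be as simple as the following Python sketch. Everything here is hypothetical: the class name FeatureRegistry, the fields, and the three statuses (borrowed from the unstable / accepted / removed lifecycle proposed earlier in the thread) are illustrations, not an agreed design.

```python
class FeatureRegistry:
    """Keeps one description and tracking issue per feature name."""

    STATUSES = ("unstable", "accepted", "removed")

    def __init__(self):
        self._entries = {}

    def register(self, name, description, tracking_issue):
        """Add a new feature; refuse names that are already taken."""
        if name in self._entries:
            raise ValueError(f"feature name {name!r} is already registered")
        self._entries[name] = {
            "description": description,
            "tracking_issue": tracking_issue,
            "status": "unstable",  # every feature starts out unstable
        }

    def set_status(self, name, status):
        """Move a feature through its lifecycle."""
        if status not in self.STATUSES:
            raise ValueError(f"unknown status {status!r}")
        self._entries[name]["status"] = status

    def lookup(self, name):
        """Return a copy of the entry for name (raises KeyError if absent)."""
        return dict(self._entries[name])
```

In practice the curated data would more likely live in a reviewed file or in the tracking issues themselves, and code like this would only validate it.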