Versioning of CSL and the CSL spec

There are some things that have to be patched up in the specification, and I
was wondering how we should start versioning stuff. Till now, the spec only
has a date-stamp (“version 2010-03-21”), but I don’t think that’s enough. Do
you all agree with the following proposal?

1.* changes: used for backwards-incompatible changes in the CSL schema.
These are changes that require a style upgrade (using XSLT), like moving
cs:et-al from being a child element of cs:names to being a child element of
cs:name. With each upgrade, the required value of the version attribute on
cs:style is incremented (e.g. from “1.0” to “1.1”).

1.0.* changes: used for backwards-compatible changes in the CSL schema. An
example is the addition of the new delimiter-precedes-et-al option, which
offers more flexibility but isn’t required in CSL 1.0.0 styles. With
backwards-compatible changes, the value of the version attribute on cs:style
doesn’t change (it stays “1.0”).

1.0.0.* changes: used for changes in the specification that don’t require
changes to the CSL schema. An example is the recently discussed
instructions on the delimiter the CSL processor should use between names and
et-al.

This way we can easily keep the version of the specification aligned with
that of the schema. It also allows us to make versioned changes to the spec
without having to change the version of the schema.
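
Read as a rule, the proposal above could be sketched roughly like this in Python. The change labels (`incompatible_schema`, `compatible_schema`, `spec_only`) and both function names are hypothetical and only serve the illustration; nothing here is part of CSL itself.

```python
# Hypothetical labels for the three kinds of change described above.
BUMP_INDEX = {
    "incompatible_schema": 1,  # 1.* changes: styles need upgrading
    "compatible_schema": 2,    # 1.0.* changes: schema grows, old styles stay valid
    "spec_only": 3,            # 1.0.0.* changes: specification text only
}


def bump(version: str, change: str) -> str:
    """Return the next four-part version for the given kind of change."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad "1.0" to 1.0.0.0
    index = BUMP_INDEX[change]
    parts[index] += 1
    for i in range(index + 1, 4):
        parts[i] = 0  # reset the lower levels
    return ".".join(str(p) for p in parts)


def style_version_attribute(version: str) -> str:
    """Value of the version attribute on cs:style: the first two parts only."""
    return ".".join(version.split(".")[:2])
```

Under this reading, only the first kind of change ever alters what `style_version_attribute` returns, which is the alignment between spec and schema the proposal is after.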

Rintze

+1

Me too. This looks like a good arrangement.

This has come up again, so I want to revisit.

So they change the CSL model as represented in the csl.rnc file in ways that require style files to be modified to properly run.

What about adding a new element that processors can ignore?

Because it doesn’t require style changes, I think we might make the reasonable argument that they can go in a 1.0.x release, so long as we say something about this (how processors handle unknown elements and attributes) in the spec.

+1

But: what about new variables and types, which is the most frequent request we get?

I vote they go here.

+1

The implication of what I say just above is that we can do a 1.0.2 release that folds in most of the outstanding consensus changes, and then save a few of the others for a 1.1 release.

The great thing, if we can do this, is that it would be a nice improvement in a minor release, without the need to change styles, etc.

If my understanding of @Rintze_Zelle1’s explanation is correct, he actually suggests 4 release levels, right? And there’s nothing about going from 1.x to 2.0.
I’ve thought that what is said about 1.x changes actually describes X. changes.
1.0.x should be 1.x changes.
And so on…
At least isn’t that what we’re currently doing?

To help with discussing, let’s use this numbering scheme for generic version numbers:
A.B.C.D

Basically, we seem to be debating whether the versioning should primarily reflect the scope of changes needed for styles versus processors.

Versioning based on styles is mostly concerned with whether a style built using an older version can still be interpreted by a processor running a newer version on the same A.B.*.* level. So, a style written using 1.0.1 can be interpreted by a 1.0.2 processor without any changes (or, similarly, the style number could be changed to 1.0.2 without becoming an invalid style).

Versioning based on processor changes instead is concerned with the scope of changes entailed in processor behavior. From this perspective, adding new terms, types, or variables requires little to no changes in processor behavior, so they would be a C-level release (A.B.*; 1.0.2), but changes that require potentially larger changes to processor behavior, such as a new type of element or special behavior for a specific term or reserved macro name would be a B-level release (A.*).

The main question seems to be whether to add cs:intext (or similar) as a C-level release (1.0.2) or as a B-level release (1.1.0). Bruce, you seem in favor of a style-centered versioning scheme to aid in tracking the performance and update needs of the many CSL styles. Sebastian seemed to be operating from a more processor-centered versioning scheme to aid in the implementation of new releases (e.g., a 1.0.2 style could go into immediate effect because there would be no blocking processor changes needed). If cs:intext were a 1.0.2 change, then any of the simpler changes in that release would not be available until processors were ready for cs:intext.

I think we first need to decide which approach we will take.

Partially, but see below.

I’m not concerned about this backward compatibility question.

I’m more concerned with the reverse: what happens when we introduce changes in styles not explicitly anticipated by a processor (the simplest case being a new variable)?

This is why I’ve suggested we have a compatibility section in the spec that lays out these expectations; that is a contract that says “CSL will only do X changes for minor releases, and implementers must gracefully deal with them.”
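
From the implementer’s side, such a contract might look something like the following Python sketch: a processor that renders nothing for a variable it does not know, instead of failing. `KNOWN_VARIABLES` and `render_variable` are hypothetical names, the variable set is a small illustrative subset, and real processors will be structured differently.

```python
import warnings

# Illustrative subset of the variables a hypothetical processor was built for.
KNOWN_VARIABLES = {"title", "publisher", "volume", "page"}


def render_variable(name: str, item: dict) -> str:
    """Render one variable, degrading gracefully if the processor predates it."""
    if name not in KNOWN_VARIABLES:
        # Assumed contract: a variable added in a later minor release is
        # reported and then treated as empty, not as a fatal error.
        warnings.warn(f"unknown variable {name!r} ignored")
        return ""
    return str(item.get(name, ""))
```

The point of the sketch is only that the fallback behavior is decided once, in the spec, so style authors know what an older processor will do with a newer style.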

But from the style perspective, the main reason for this is to allow innovation in styles, which has been tightly constrained because of how slowly we’ve felt we needed to move (with me probably representing the most conservative position).

Yes, the perspective here is indeed the developer. Do they need to change their code to read new styles, or will those styles break in some way if not?

I agree, except I think maybe we can reconcile these two.

For the sake of argument, in the aggressive approach I’m contemplating here, we issue a 1.0.2 release with intext, and we add the section on compatibility to the spec, which addresses the concern I have under the “styles” (I edited this) part.

Am I missing something that tells us this is a bad idea?

I’d like to suggest again that we look at the principles of semantic versioning.

We have three levels, A.B.C or MAJOR.MINOR.PATCH
To cite from their website:

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards compatible manner, and
  3. PATCH version when you make backwards compatible bug fixes.

First, this probably means that 1.0.1 was not an optimal name.
Second, adding something that is purely additive and does not break existing functionality should lead to a minor version bump. The next version should therefore be 1.1, with or without a new cs:in-text (or however we end up calling it) element. (You could perhaps argue that adding a variable or an item type is a bug fix.)
If the release after that consists only of cs:in-text, it will then be 1.2. Again, this is purely additive and has no effect on existing styles. Of course, this is a bigger change than a couple of variables, but I don’t think that makes a big difference from a semantic versioning point of view.

But if we, at some point, choose to adopt a new datamodel, this will clearly break the way things work currently, styles will have to be updated, or processors will need a compatibility layer for legacy styles. So, that will be a major version bump then.

Long story short, I propose the next version is 1.1.
Also, new versions with new features—however big or small—should lead to MINOR version bumps.
A MAJOR version bump only if changes break existing behavior.
PATCHES are really just PATCHES. In our context, that probably involves fixing typos, moving variables that ended up in the wrong place, or adding a variable to the schema that’s in the specification but was forgotten, or vice versa.
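
As a rough illustration of these rules, here is a small Python sketch. The change labels (`breaking`, `feature`, `fix`) and the function name are made up for the example.

```python
def next_version(version: str, change: str) -> str:
    """Apply the MAJOR.MINOR.PATCH rules to a three-part version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "breaking":  # existing styles or behavior break
        return f"{major + 1}.0.0"
    if change == "feature":   # purely additive, however big or small
        return f"{major}.{minor + 1}.0"
    if change == "fix":       # typos, schema/specification mismatches
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")
```

With these labels, `next_version("1.0.1", "feature")` gives `"1.1.0"`, and a later release that only adds cs:in-text would go from there to `"1.2.0"`.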

Does that work for us?

One wrinkle is the relation between standard versioning, and the version attribute and schema version of CSL.

Would your 1.1 require a “1.1” version attribute value?

If yes, what does that do to style maintenance, etc.?

If no, does that get confusing for people?

Let’s play this through. We release a new version 1.1.
citeproc-js gets updated immediately and is now CSL 1.1 compliant.
So, citeproc-js can now process CSL styles up to version 1.1.
A style with version=1.1 needs a 1.1 compliant citeproc, i.e., a citeproc that implements all features in that version of the specification.
But styles with version 1.0.1 will still continue to work; they just can’t use the new features. Here, it depends on the implementation. Currently, it’s already possible to use CSL-M features with citeproc-js, even when version is set to 1.0. pandoc-citeproc is less liberal. In general, I think a citeproc should give you warnings if you use a style that requires a feature set that the citeproc currently doesn’t support.
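
A warning of that kind could be as simple as the following Python sketch. The function names are hypothetical, and the rule that compatibility only holds within one MAJOR version is an assumption of the sketch, not something the spec currently states.

```python
import warnings


def parse_version(text: str) -> tuple:
    """Turn "1.1" or "1.0.1" into a three-part tuple such as (1, 1, 0)."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # so that "1.1" compares equal to "1.1.0"
    return tuple(parts)


def supports(processor_version: str, style_version: str) -> bool:
    """Return True if the processor covers the style's declared version."""
    processor = parse_version(processor_version)
    style = parse_version(style_version)
    # Assumption: same MAJOR version, and the style is not newer than the processor.
    ok = processor[0] == style[0] and style <= processor
    if not ok:
        warnings.warn(
            f"style declares CSL {style_version}, "
            f"processor implements CSL {processor_version}"
        )
    return ok
```

A 1.1 processor would accept a 1.0.1 style silently, while a 1.0.1 processor given a 1.1 style would warn and leave it to the caller to decide whether to continue.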

So, concerning this:

Would your 1.1 require a “1.1” version attribute value?

Only if you need the full 1.1 set of features.

I don’t know; maybe I’m overthinking this.

Whatever can best, easiest and quickest get us to clear some of the backlog is good by me.

It’s possible I am also reflecting the bias of someone thinking about how to evolve the schema and validation as well.

I worry, though I’m not sure if fully justified, about expanding the number of versioned schema numbers.

Related: as we introduce new versions, do we want the latest schema to always validate earlier versions, or do we have separate schema files for each?

My impulse is the former, but I suspect it could get pretty complicated to implement and maintain.

Long story short, I propose the next version is 1.1.
Also, new versions with new features—however big or small—should lead to MINOR version bumps.
A MAJOR version bump only if changes break existing behavior.
PATCHES are really just PATCHES.

I 100% agree with this approach, would be happy to adopt it, and would also just encourage us not to overthink this. I think it’s good to have some clarity, but in the end, the version numbers are not the most critical part to get right (e.g. 1.0.1 was a good release in spite of its probably sub-optimal version number).

Yes, at least as long as we are on the same MAJOR version.
1.9.4 is ultimately a superset of 1.0.1, so everything valid in 1.0.1 must, by definition, be valid in 1.9.4. If not, it should be considered a bug. When we switch to another major version, that’s a different story, but for now we should be safe.

I guess I’m not following.

We have a few variables and item types, the most minimally-invasive of changes, that we agree we need to add.

You guys are proposing those get added to a new 1.1 release?

If yes, this is what I’m not following.

So, for the sake of argument, if after that we want to add a single new item type, that would mean v1.2?

And therefore separate schemas?

But, don’t we need a new schema after adding a new item type, no matter if it’s a B-level or a C-level release? I don’t see where this makes life significantly easier.

You seem to follow this approach that’s in CSL 1.0 Specification Update 2010-05-30:

A three-number system (e.g. “1.2.3”) will be used for versioning of the CSL schema and specification. The first and second number are used for respectively major and minor backwards incompatible updates to the CSL schema (these updates will require upgrading of existing CSL styles). The third number is used for small backwards compatible updates. Each update to the CSL schema will be accompanied by an updated CSL specification. In addition, minor date-versioned updates to the CSL specification can be released without accompanying changes to the CSL schema (as is the case for the current specification update).

The problem here is, of course, that we don’t really know what the difference between the first and the second level is. The distinction between major or minor incompatibilities seems rather fuzzy to me. Also, in that logic a new cs:in-text could still be on the third level since there’s no need to update styles if you don’t need the new features, so 1.0.3?

But I agree with @Sebastian_Karcher that version numbers aren’t the most important point.

So, what about that as a compromise:

First level: changes or breaks existing behavior
Second level: adds new features, but is completely backwards compatible
Third level: adds item types, variables or terms
Fourth level: bug fixes.

No; an update to a schema is not the same thing as a completely different schema.

The implication of what I am saying is that we merge the PRs to master, tag 1.0.2, and are done.

As a style author, I can simply stay on master, and update the schema as needed.

A new 1.x for every minor change most likely implies one of two strategies:

  1. create separate csl-1.0.rnc and csl-1.1.rnc files.
  2. create separate schema patterns on one csl.rnc file to cleanly separate 1.0 and 1.1 validation, which means an already complex schema gets more complex.

What I mean by 2 is something like this conceptually:

style = style-1.0 | style-1.1

… where every place in the schema where we add stuff, we have to split off separate patterns for different 1.x releases.
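
To make the cost of strategy 2 a bit more concrete, here is a rough Python analogue (not RELAX NG) in which every 1.x release needs its own set of allowed values. The 1.0 entry is a small illustrative subset, and `example-new-variable` is a placeholder, not a proposed CSL variable.

```python
# Allowed variables per release; every new 1.x release adds another entry.
VARIABLES_BY_VERSION = {
    "1.0": {"title", "volume", "page"},  # illustrative subset only
}
# Placeholder for whatever a 1.1 release would add on top of 1.0.
VARIABLES_BY_VERSION["1.1"] = VARIABLES_BY_VERSION["1.0"] | {"example-new-variable"}


def invalid_variables(style_version: str, used: set) -> set:
    """Return the variables a style uses that its declared version does not allow."""
    return set(used) - VARIABLES_BY_VERSION[style_version]
```

The same kind of split would have to be repeated for item types, terms, and every other place where a 1.x release adds something, which is the extra complexity I’m worried about.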

So the PRs I just submitted wouldn’t be the right path, I think; I’d need to refactor to explicitly separate out 1.0 and 1.1 variables.

I think I’m right here, and if I am, doesn’t that explain why this is a problem?

By “level” you mean x.x.x.x?

So in this case, you’d say the variable changes are 1.0.2, intext is 1.1, and rearranging some core CSL model structure would be 2.0?

Exactly, 2.0: names are not names but numbers; hierarchical data model; deprecate citation and bibliography, put everything in macros that can be called from the processor; styles are now written in lisp.

I’d just like to throw in that I don’t think we will (nor should) suddenly completely change how CSL operates or constantly tweak the schema, even in minor ways, and how we version should reflect this. I think the idea of having four-digit version numbers for what I think will (and again: should) remain, at most(!), yearly updates is a bit silly.
