In what order are sort keys evaluated?

This one is a simple question, I think.

I’m looking at sort_BibliographyDescendingViaCompositeMacro. Its sort specification is as follows:

<sort>
  <key variable="title" sort="descending"/>
  <key macro="citation-number" sort="descending"/>
</sort>

Here the citation-number macro is in effect a wrapper for the citation-number variable. The expected output is:

<div class="csl-bib-body">
  <div class="csl-entry">[4] Aaaa</div>
  <div class="csl-entry">[3] Bbbb</div>
  <div class="csl-entry">[2] Xxxx</div>
  <div class="csl-entry">[1] Zzzz</div>
</div>

I don’t understand why; in fact this seems backwards to me. The first sort key is a straightforward sort on the title variable, descending. That, I would expect, should result in an order of “zzzz” to “aaaa”. There’s nothing left unsorted, so nothing for the secondary sort on citation-number to do. The expected result seems to assume that the sorts would run in the reverse of the order I thought they did.

(If it helps, per the spec, emphasis added)

Sort keys are evaluated in sequence. A primary sort is performed on all items using the first sort key. A secondary sort, using the second sort key, is applied to items sharing the first sort key value. A tertiary sort, using the third sort key, is applied to items sharing the first and second sort key values. Sorting continues until either the order of all items is fixed, or until the sort keys are exhausted. Items with an empty sort key value are placed at the end of the sort, both for ascending and descending sorts.

Since there are no items here sharing the first sort-key value, the secondary sort shouldn’t happen, and it certainly shouldn’t reverse the order of the first one. Should the order of the sort-keys be reversed in the csl here?

(Or am I just confused about ascending and descending? It always confuses me, because we think of going “down” the alphabet from A-Z … but I think in conventional computer terms that’s ascending, as ASCII chars “go up” from A?)
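To make my reading of that paragraph concrete, here is a minimal sketch in Python of sequential key evaluation as I understand it. It is not how citeproc-js or any other processor actually does it; the helper names (is_empty, make_comparator) and the citation numbers in the sample data are invented for illustration.

```python
from functools import cmp_to_key

def is_empty(value):
    """Treat None and the empty string as an empty sort key value."""
    return value is None or value == ""

def make_comparator(keys):
    """keys: list of (field, descending) pairs, evaluated in sequence."""
    def compare(a, b):
        for field, descending in keys:
            va, vb = a.get(field), b.get(field)
            if is_empty(va) and is_empty(vb):
                continue  # both empty: let the next key decide
            if is_empty(va):
                return 1   # empty values go last, ascending or descending
            if is_empty(vb):
                return -1
            if va == vb:
                continue  # shared value: fall through to the next key
            result = -1 if va < vb else 1
            return -result if descending else result
        return 0  # keys exhausted; sorted() is stable, so input order is kept
    return compare

# Hypothetical data: these citation numbers are invented for illustration.
items = [
    {"title": "Aaaa", "citation-number": 4},
    {"title": "Xxxx", "citation-number": 2},
    {"title": "Zzzz", "citation-number": 1},
    {"title": "Bbbb", "citation-number": 3},
]

# Mirrors the <sort> above: title descending, then citation-number descending.
keys = [("title", True), ("citation-number", True)]
ordered = sorted(items, key=cmp_to_key(make_comparator(keys)))
titles = [item["title"] for item in ordered]
print(titles)  # ['Zzzz', 'Xxxx', 'Bbbb', 'Aaaa']
```

Since every title is distinct, the comparator never gets as far as the second key, which is why I would expect Z to A here.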

I agree with your reading of the spec: the titles should run from z to a. That is, in fact, what I’m seeing in Zotero with a recent citeproc-js (with the exact style from the fixture), so I’m not sure what’s going on with the test. I don’t think Frank releases citeproc versions that don’t pass all tests, so it’s all rather odd. @Frank_Bennett will need to help us out here.

That’s obviously wrong, isn’t it. It must run Z-A there, so that definitely needs to be amended in the test (and citeproc-js needs to be made to behave accordingly).

About the treatment of citation-number in this cluster of tests, give me a few days to poke around and think about it. I’ll want to preserve the current behavior of the processor in its discrimination between sort-key calls to that variable via macro vs direct, because some deployments rely on it: but arguably that behavior should be made optional at my end, with the standard tests reflecting a clean reading of the specification, and citeproc-js running them with the option turned off. Needs a bit of thought, but will be a good one to resolve.

OK. Perfect. It helps me, as I encounter results from time to time that surprise me (a tiny fraction of the many failures which simply make me sigh at my stupidity) to ask about them. I’m sorry to try your patience. None of this is in the slightest bit urgent: I’ve plenty to be getting on with.

This is all good. Playing with this fixture, I’m getting some very strange returns from the processor. It’s all citeproc-js-specific brokenness so I won’t bother you with the details, but the problems are triggered by using citation-number as a sort key. There be bugs to fix here, and it’s really good to catch them.

I’ve looked into citeproc-js sorting quirks around citation-number. The experience has prompted some thoughts about sorts on that particular variable, most of which stray off into the weeds as far as the spec is concerned … but in ways that may make sense. Not sure if this thread is the right place to dump them, but here goes (please forgive the italics, which look kind of strident on the page here):

  • Macro calls and variable calls from cs:key should behave in the same way. Discriminating between them is confusing and nuts.
  • Numbers applied to citations must always align exactly with the numbers applied to their respective bib entries.
  • When citation-number is used as the primary bib sort key, it seems to me that sort="descending" should reverse the order of bib entries, but assign numbering in ascending order.
  • When citation-number is used as a secondary bib sort key, it seems to me that sort="descending" should reverse the order of numbering, but leave the order of bib entries untouched.

I can’t think of any use case that couldn’t be covered with that set of rules. They are better than the mess I have at the moment, but well off-spec, and if I implement them in citeproc-js it will have to be behind a configuration setting. Anyway, whaddaya think?

Points 1 and 2 make sense to me (i.e. accord with my expectation). I’m not sure I understand your third point. You are envisaging that a primary sort on “citation number” should be in effect a sort on citation order, but that numbers will then be (re-)assigned, i.e. that citation-number will in such a case become reverse-citation-number, as it were? Doable of course, but I hate special cases because one ends up with a disproportionate amount of code that is dealing with just them. What is the rationale for it? The most natural meaning of “sort by citation-number descending” would be that citation number [100] comes before citation number [99] …

Similarly with your fourth point. What that seems to be is not a sort at all, but a way of re-assigning citation numbers. I can’t quite get my head around what it should do or why.

Yes, those two are the weird ones. But I’m thinking that if primary controls order (i.e. document order or reverse document order), and secondary controls numbering only, it covers the issue that I linked to earlier (reverse numbering, so that number-of-publications is evident from the top of a personal bib listing). It’s still kind of hackish, but the use case is a real one, and when you think about it, what else would citation-number in secondary position be useful for?

There is actually a wrinkle with (1). It’s an issue similar to the handling of dates within macros, which is part of the CSL 1.0.1 spec. If citation-number is called via a macro, special handling is needed to make it behave as citation-number in the sort. It’s a problem even if you stick with (2) as the sole rule to follow. The key needs to grok that a macro that calls the citation-number variable somewhere, when the calling cs:key has the sort="descending" attribute, is meant to put bib entries in reverse document order. It’s not simple to do, and it forces a choice between implementing identical variable/macro behavior only when citation-number is the only thing the macro calls, or throwing away the remainder of the sort key returned by the macro. It’s a tough choice, but the latter is what we do with dates, per the spec.

I can see there are implementation problems with macros. I mean, there’s probably a basic design tension here (in the sense that a sensible sort needs some sort of uniformity in the key structure, which a macro cannot be guaranteed to provide, and so “steps must be taken”). I still have a mountain to climb on macros as sort keys. But the behaviour you advocate is in principle correct, in the sense that one thing that should be guaranteed is that if a macro emits only one value, then in principle that should be the same as a direct sort, or something like that.

I’m afraid the other suggestion does seem hackish. If one was designing this cleanly, it seems to me we would:

  • Distinguish between citation-number (which is a printable token, where there is a totally consistent one-to-one mapping between any cited item and one citation number) and citation-order, which is the order the cited items actually appeared in the document.

  • Permit sorts either on citation-number or citation-order (which might not be the same).

  • Have global options specifying when citation-number is to be allocated (there are a number of possible points, ranging from the moment a cite is first encountered in input, at the earliest, to the moment that it is rendered in a bibliography, at the latest), e.g. allocate-citation-numbers="delay".

  • Have a separate global option specifying how citation-number is to be allocated – whether in order of citation or reverse order of citation.

However, we don’t live in such a perfect world, so we should make do. Still, if the behaviour is going to be as odd as this would make it, we need a clear understanding of what the right behaviour is. And it should be specified in the spec, because as things stand these effects are, to my mind, contradictory of it, however desirable in practice.

Is this the behaviour you have in mind?

A cs:key element which specifies a sort key consisting of a citation-number operates specially. There are four cases:

  • <key variable="citation-number" sort="ascending"/> specified as primary sort. Sorts in the order in which items were cited, from first to last.
  • <key variable="citation-number" sort="descending"/> specified as primary sort. Sorts in the reverse order in which items were cited, from last to first. After sorting, the citation-number is reassigned, so that the last cited item gets citation number 1. Such reassignment, however, is only done when the sort takes place in the context of rendering a bibliography. [RIGHT? THIS MAKES NO SENSE IF WE HAVE A REVERSE PRIMARY SORT IN A CITATION CONTEXT?]
  • <key variable="citation-number" sort="ascending"/> specified as a secondary or later sort key. Sorts affected citations (i.e. those not sorted by the primary key) into the order in which they were cited, from first to last, leaving the citation numbers unchanged.
  • <key variable="citation-number" sort="descending"/> specified as a secondary or later sort key. First sorts the affected citations (i.e. those not sorted by the primary key) into the reverse of the order in which they were cited, from last to first. Then, in a bibliography context only, reallocates all citation numbers in the entire sorted list produced by the primary sort, from the highest number available to 1. [IS THAT RIGHT?]

The effect of the fourth option is that a bibliography can be produced that is sorted (say) alphabetically, but in which citation numbers are assigned from the highest possible value to the lowest for all the items in the bibliography.
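As a rough sketch of that fourth case (my own illustration in Python, not citeproc-js code), assume the primary key is title ascending and that every title is distinct, so the secondary key has nothing left to reorder and only the renumbering step remains. The function name renumber_descending and the sample data are hypothetical.

```python
def renumber_descending(sorted_items):
    """Return copies of the items numbered from len(items) down to 1, top to bottom."""
    total = len(sorted_items)
    return [
        {**item, "citation-number": total - position}
        for position, item in enumerate(sorted_items)
    ]

# Hypothetical items, numbered in the order they were cited.
cited = [
    {"title": "Bbbb", "citation-number": 1},
    {"title": "Cccc", "citation-number": 2},
    {"title": "Aaaa", "citation-number": 3},
]

# Primary sort: alphabetical by title.
by_title = sorted(cited, key=lambda item: item["title"])

# Secondary citation-number descending, bibliography context: renumber only.
bibliography = renumber_descending(by_title)
entries = [f"[{item['citation-number']}] {item['title']}" for item in bibliography]
print(entries)  # ['[3] Aaaa', '[2] Bbbb', '[1] Cccc']
```

If numbers in citations must always match the bibliography, the in-text cites would have to pick up these reassigned numbers as well.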

That helps clarity. I’m thinking that, to avoid confusion, tests of the two weird rules should maybe be placed outside the standard test suite. At the next iteration of the spec, designers might opt for the distinction between citation-order and citation-number that you suggest, and off-spec oddities in the test suite should not complicate the choice. That probably goes for other things you’ll find in there as well. A good time for housecleaning.

On the second rule, that is what I had in mind, but your statement of it gives me second thoughts. It would be simpler and more intuitive to preserve the citation-order/citation-number alignment in that case, wouldn’t it. So with titles cited in sequence “First,” “Second,” “Third,” applying <key variable="citation-number" sort="descending"/> as primary would just reverse the order of presentation in the bib, yielding “[3] Third,” “[2] Second,” “[1] First.” IOW citation-number should just work as a bog-standard sort key in primary position.
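In Python terms, that simplified primary case is nothing more than an ordinary reverse sort with the numbers left alone. This is a sketch with made-up data structures, not processor code:

```python
# Items numbered in the order they were cited.
cited = [
    {"title": "First", "citation-number": 1},
    {"title": "Second", "citation-number": 2},
    {"title": "Third", "citation-number": 3},
]

# Primary <key variable="citation-number" sort="descending"/>:
# reverse the presentation order, keep each item's number.
bib = sorted(cited, key=lambda item: item["citation-number"], reverse=True)
rendered = [f"[{item['citation-number']}] {item['title']}" for item in bib]
print(rendered)  # ['[3] Third', '[2] Second', '[1] First']
```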

Yes, I think that would be much easier. The bizarre, or rather usefully idiosyncratic, behaviour is then confined to secondary sorting, which is probably as it should be. Indeed, otherwise (I think) on your approach if one wanted a reverse sort where items were listed in reverse order of citation but bigger citation numbers appeared at the top, one would need

<sort>
  <key variable="citation-number" sort="descending"/>
  <key variable="citation-number" sort="descending"/>
</sort>

Which is not exactly intuitive!

I’ve had a go at coding in a branch of the citeproc-js repo, and implemented the rules we’ve been kicking around here. It passes sixteen new tests, but necessarily fails three old ones:

Diffs to make these pass are here.

Here are the new tests. Take a look, see what you think. If these look acceptable I can merge them. If the Secondary tests look too out-there, they can be moved to the local tests for citeproc-js. Views on that?

Very helpful, thanks. I think I’m failing the ones I expect to fail (i.e. I pass all the primary ones, and fail the secondary descending ones – there’s one secondary that I fail for reasons I don’t understand, but that’s my problem, I think, and I need to think properly about it).

Incidentally there’s an error in the CSL – a group that doesn’t close in the citation layout. The first time I ran this and they ALL crashed, I had a little panic.

ETA: Yes. I know why I fail the one that I fail, and it is indeed my problem.

Progress! I’ve fixed the CSL (sorry about that, ouch). Okay to move the secondary descending fixtures to citeproc-js local, and merge the rest to test-suite master?

I’ve gone ahead and made that change - moving the secondary descending fixtures to the citeproc-js repo, and merging the rest to test-suite main.