Sorting and printing citation numbers

I am having a test failure where I do not understand the logic of the correct result. (In fact, I am having several test failures which I think are symptomatic of the same lack of understanding on my part, but one test can serve as the core case.) The test is sort_BibliographyCitationNumberDescending.txt.

The basic scheme is this: we have eight items (item-1 to item-8), which are cited in the order 1-8. The citation layout does not print the citation number, and there is no pre-sort: so each acquires an implicit citation number from 1 (for item-1) to 8 (for item-8), but none of those numbers is printed during the citation phase (or implicit citation phase).

The bibliography section is sorted by citation-number descending, so we expect item-8 to be output first, and then successively items 7, 6 (etc). I’ve pasted the expected and actual output below: note that the book titles do not correspond to the item numbers. The order of titles is (correctly) “001” (= item-1), “003”, “004”, “006”, “002”, “005”, “007” and “008” (= item-8).

My sort order is the same as that in the test. But the bibliography also prints the citation number, with affixes “[” and “]”. My processor is printing the citation number as it was used in the sort, so item-8 gets citation number “[8]” etc. But the reference test, although it sorts item-8 with citation number 8, is printing that citation number as “[1]”. Why?
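To make the two behaviours concrete, here is a minimal Python sketch. It is not how citeproc-js or my processor is actually written; the names `cite_all`, `bibliography` and `renumber` are made up for illustration, and only the titles come from the test. Both variants sort on the same key; they differ only in which number gets printed.

```python
# Titles in citation order (item-1 .. item-8), as in the test fixture.
TITLES = ["001", "003", "004", "006", "002", "005", "007", "008"]


def cite_all(titles):
    """Assign implicit citation numbers 1..n in order of first citation."""
    return [
        {"id": f"item-{n}", "title": f"Book {t}", "citation_number": n}
        for n, t in enumerate(titles, start=1)
    ]


def bibliography(items, renumber):
    """Sort by citation-number descending, then format each entry.

    renumber=False prints the number that was used as the sort key.
    renumber=True prints the entry's position in the sorted bibliography.
    (The flag is hypothetical; it only names the two behaviours.)
    """
    ordered = sorted(items, key=lambda it: it["citation_number"], reverse=True)
    lines = []
    for position, item in enumerate(ordered, start=1):
        printed = position if renumber else item["citation_number"]
        lines.append(f"[{printed}] {item['title']}")
    return lines


items = cite_all(TITLES)
got = bibliography(items, renumber=False)      # what my processor prints
expected = bibliography(items, renumber=True)  # what the test expects
```

With `renumber=False` the list starts at "[8] Book 008"; with `renumber=True` it starts at "[1] Book 008", which is the fixture's expected output.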

This seems to assume that a citation number (as used for sorting) and the printed form of the citation number should be different, and that the printed form should only be “baked in” when it is first printed. I’m not sure I follow the logic of this in terms of its use case, and tracking “printed-citation-number” and “actual-citation-number” separately would be tiresome and potentially ambiguous. I wonder if it is necessary, in practical terms?

I am probably missing something obvious, but I’m afraid I’m stuck.

tests/sort_BibliographyCitationNumberDescending.txt FAILED
-------- EXPECTED --------
<div class="csl-bib-body">
  <div class="csl-entry">[1] Book 008</div>
  <div class="csl-entry">[2] Book 007</div>
  <div class="csl-entry">[3] Book 005</div>
  <div class="csl-entry">[4] Book 002</div>
  <div class="csl-entry">[5] Book 006</div>
  <div class="csl-entry">[6] Book 004</div>
  <div class="csl-entry">[7] Book 003</div>
  <div class="csl-entry">[8] Book 001</div>
</div>

----------- GOT -----------
<div class="csl-bib-body">
  <div class="csl-entry">[8] Book 008</div>
  <div class="csl-entry">[7] Book 007</div>
  <div class="csl-entry">[6] Book 005</div>
  <div class="csl-entry">[5] Book 002</div>
  <div class="csl-entry">[4] Book 006</div>
  <div class="csl-entry">[3] Book 004</div>
  <div class="csl-entry">[2] Book 003</div>
  <div class="csl-entry">[1] Book 001</div>
</div>

Error at byte 54: 1:8

Believe it or not, this test figured in an actual use case.

But that use case seems to give what I expected the result to be (namely the citation numbers running from high to low) and not what the expected result of the test is, where the citations are ordered in reverse-order of citation, but their numbering is still low to high … so I’m afraid I’m still puzzled. But I’ll keep thinking.

Me too, I should look back at this more carefully. If we can simplify the internal logic and still address user requirements, that would be a good thing. One thing I remember from working on that one (vaguely but I hope correctly) is that citeproc-js assigns “citation numbers” differently in the bib depending on whether the variable is called on the key directly or via a macro—which is weird and off-spec, but cleared the hurdle in that particular case.

Well. That would explain why you get the result you do. FWIW my procedure (based on assumptions which may be wrong) is as follows:

  • Citation number gets assigned when an item is first cited (either actually cited or “ghost” cited, with ghost cites being arranged to occur inevitably after actual cites). Once assigned it doesn’t change.

  • No citation number will ever be assigned in the bibliography, because everything will always have been at least ghost cited before the bibliography runs, in order to disambiguate.
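The two rules above can be sketched roughly as follows. This is an illustration under my own assumptions, not real processor code: `CitationRegistry`, `cite`, `ghost_cite_remaining` and `number_of` are hypothetical names.

```python
class CitationRegistry:
    """Assign-once citation numbers (hypothetical sketch)."""

    def __init__(self):
        self._numbers = {}  # item id -> citation number

    def cite(self, item_id):
        """Return the item's number, assigning the next free one on first cite."""
        if item_id not in self._numbers:
            self._numbers[item_id] = len(self._numbers) + 1
        return self._numbers[item_id]

    def ghost_cite_remaining(self, all_item_ids):
        """Ghost-cite every item not yet cited, after all actual cites."""
        for item_id in all_item_ids:
            self.cite(item_id)

    def number_of(self, item_id):
        """Look up a number in the bibliography phase; never assigns one."""
        return self._numbers[item_id]  # KeyError would mean a missed ghost cite


registry = CitationRegistry()
first = registry.cite("item-2")    # first actual cite gets 1
second = registry.cite("item-1")   # next new item gets 2
again = registry.cite("item-2")    # re-citing does not change the number
registry.ghost_cite_remaining(["item-1", "item-2", "item-3"])
ghost = registry.number_of("item-3")  # ghost-cited item gets 3
```

Because `number_of` only reads, the bibliography phase can sort and print but can never hand out a new number, which is the behaviour described in the second bullet.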

That sufficiently explains why I get the results I do: once a citation number is assigned, which occurs when the cite is first cited, it never changes. I can’t completely understand why you get another result though, because if a citation number hasn’t been assigned before the sort runs (which is before anything ends up in the bibliography) the outcome should, I’d have thought, be … well, undefined I guess.

Note that my use case assumes a different model from yours to some extent because for reasons that don’t matter here I have to assume that the very first thing the processor always does is to ghost-cite everything it expects to see in the order it expects to see it. If its expectations in that regard are disappointed, it has to run again (having updated its idea of what it expects to see) – so we always disambiguate everything except per-cite disambiguation as the very first thing we do. I don’t know if that makes a difference; it shouldn’t because the disambiguation phase should discard any citation numbers it creates. (There’s a reason for taking this apparently sub-optimal approach, because I’m targeting LaTeX, in which “the moving finger writes and having writ, moves on” – i.e. once something is output, it’s output, so we cannot disambiguate early cites based on later ones, which requires a multi-run approach to disambiguation in worst cases.)

There are styles like ACM SIG Proceedings, though, that sort alphabetically in the bibliography with citations in the document assigned in bib order.

Oh yes! Right. Now you mention it I was wrong about one thing. I do assign citation numbers during the initial “dummy” run, for exactly that reason. So, effectively, the very first thing the processor does is “pretend” that it’s printing a bibliography: it takes the citations it expects to see, in the order it expects to see them, and sorts them etc. as if they were going to be in a bibliography, doing any disambiguation it expects to need to do, and so forth. Then it processes the document, at which point (of course) it may be disappointed in its expectations, in which case when (at the third stage) it produces a bibliography, it will be wrong. But if that is so then the next time it runs (when it will know what actual citations it got), the dummy run will work. Since citation numbers get assigned after the sorting phase of the dummy run, when we are “pretending” to output a bibliography, we cope with citation numbers that are assigned in bibliography order.

It’s an odd-sounding procedure, but actually effectively the standard LaTeX method: at least two passes during which the citation data is first collected and then sorted etc., with the only difference being that the processing happens during the run rather than between runs.
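In case it helps, the dummy-run scheme boils down to something like the sketch below. It is heavily simplified (no disambiguation, no output), and every name in it (`dummy_run`, `process`, `max_runs`, the sample titles) is invented for illustration.

```python
def dummy_run(expected_ids, sort_key):
    """Pretend to print a bibliography: sort the expected cites, then number them."""
    ordered = sorted(expected_ids, key=sort_key)
    return {item_id: n for n, item_id in enumerate(ordered, start=1)}


def process(document_cites, expected_ids, sort_key, max_runs=5):
    """Rerun until the cites actually seen match what the run expected to see."""
    for run in range(1, max_runs + 1):
        # Numbers are fixed here, before any real cite is output.
        numbers = dummy_run(expected_ids, sort_key)
        # Unique item ids in first-cited order, as found while processing.
        seen = list(dict.fromkeys(document_cites))
        if seen == expected_ids:
            return numbers, run
        # Expectations were disappointed: remember what we saw and run again.
        expected_ids = seen
    raise RuntimeError("citations did not stabilise")


# Hypothetical items, sorted alphabetically by title in the bibliography.
titles = {"a": "Zebra", "b": "Apple", "c": "Mango"}
numbers, runs = process(["a", "b", "a", "c"], expected_ids=[], sort_key=titles.get)
```

Starting with no expectations, the first run only collects the cites; the second run numbers them in bibliography order, so "b" (Apple) gets 1 even though "a" was cited first.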