API and tests with reference to citation collapsing

Hi,

I’ve just finished implementing the new collapsing features, but I
have some trouble with the way we test citation collapsing.

Basically, citation numbers, if I understand it correctly, depend on the
position of the first citation, or, to put it more formally, the
citation number is the position the citation occupies in the list of
citations with the position variable set to “first”, ordered by their
first occurrence (unless some other sorting routine is set).

So, collapse_CitationNumberRangesMixed presents this CITATION_ITEMS section:
[
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ]
]

Here we have 6 citations, and they should be numbered from 1 to 6.
Thus this would be the output:

([1]-[6])
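
A minimal sketch of the numbering rule described above, assuming the default case where no other sorting is set. The function name `number_citations` is hypothetical and is not taken from citeproc-hs or citeproc-js; it only walks the CITATION_ITEMS clusters and numbers each item the first time it is seen.

```python
def number_citations(clusters):
    """Assign 1-based citation numbers by order of first occurrence.

    `clusters` mirrors the CITATION_ITEMS layout: a list of clusters,
    each cluster being a list of {"id": ...} dictionaries.
    """
    numbers = {}
    for cluster in clusters:
        for cite in cluster:
            item_id = cite["id"]
            if item_id not in numbers:
                # First time this item is cited: it takes the next number.
                numbers[item_id] = len(numbers) + 1
    return numbers


# The single cluster from collapse_CitationNumberRangesMixed (no ITEM-4).
clusters = [[
    {"id": "ITEM-1"}, {"id": "ITEM-2"}, {"id": "ITEM-3"},
    {"id": "ITEM-5"}, {"id": "ITEM-6"}, {"id": "ITEM-7"},
]]
numbers = number_citations(clusters)
```

Under this rule ITEM-5 gets number 4 and ITEM-7 gets number 6, whatever the INPUT section contains, which is why the six cites form one unbroken range.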

In other words, the INPUT should not be relevant, and to appropriately
test the collapsing routine we should have this CITATION_ITEMS:
[
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-4"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ],
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ]
]

which would produce:
([1]-[7])
([1]-[3], [5]-[7])
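
The two-cluster case can be sketched as follows. This is only an illustration: `number_citations`, `collapse` and `ids` are hypothetical helpers, and the `min_run=3` threshold (collapse only runs of three or more consecutive numbers) is my assumption, not something stated above. The output format simply copies the strings shown in this message.

```python
def number_citations(clusters):
    """Number items by first occurrence across all clusters (1-based)."""
    numbers = {}
    for cluster in clusters:
        for cite in cluster:
            numbers.setdefault(cite["id"], len(numbers) + 1)
    return numbers


def collapse(nums, min_run=3):
    """Render citation numbers, collapsing consecutive runs into ranges."""
    nums = sorted(set(nums))
    parts = []
    i = 0
    while i < len(nums):
        # Extend j to the end of the run of consecutive numbers starting at i.
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            parts.append(f"[{nums[i]}]-[{nums[j]}]")
        else:
            parts.extend(f"[{n}]" for n in nums[i:j + 1])
        i = j + 1
    return "(" + ", ".join(parts) + ")"


def ids(*ns):
    """Build one CITATION_ITEMS cluster from item numbers."""
    return [{"id": f"ITEM-{n}"} for n in ns]


clusters = [ids(1, 2, 3, 4, 5, 6, 7), ids(1, 2, 3, 5, 6, 7)]
numbers = number_citations(clusters)
rendered = [collapse(numbers[c["id"]] for c in cluster) for cluster in clusters]
# rendered == ["([1]-[7])", "([1]-[3], [5]-[7])"]
```

Because the first cluster cites ITEM-4, the second cluster really does have a gap at number 4, so the test exercises the mixed-range case.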

The same applies to CitationNumberRangesMixed2,
CitationNumberRangesMixed3 and, for similar reasons, to
YearSuffixCollapse and YearSuffixCollapseNoRange.

More generally, the INPUT, I think, should not be relevant in the
ordering of citations (it is not in citeproc-hs). Moreover, references
appearing in the INPUT which have not been cited should not be taken
into account (for instance, in YearSuffixCollapse ITEM-2 has not been
cited and so it is not possible to have the ‘b’ year suffix assigned
to it).
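
To make the point about uncited references concrete, here is a rough sketch. The reference data is invented for illustration and is not the actual content of YearSuffixCollapse; `assign_year_suffixes` is a hypothetical helper, and ordering suffixes by first citation is my assumption.

```python
import string


def assign_year_suffixes(references, clusters):
    """Give year suffixes only to cited items that share author and year."""
    # Collect cited ids in order of first occurrence.
    cited = []
    for cluster in clusters:
        for cite in cluster:
            if cite["id"] not in cited:
                cited.append(cite["id"])

    by_id = {ref["id"]: ref for ref in references}

    # Group the cited items by (author, year); uncited items never enter a group.
    groups = {}
    for item_id in cited:
        ref = by_id[item_id]
        groups.setdefault((ref["author"], ref["year"]), []).append(item_id)

    suffixes = {}
    for members in groups.values():
        if len(members) > 1:  # a suffix is needed only for ambiguous items
            for letter, item_id in zip(string.ascii_lowercase, members):
                suffixes[item_id] = letter
    return suffixes


# Hypothetical INPUT: three items with the same author and year.
references = [
    {"id": "ITEM-1", "author": "Smith", "year": 2000},
    {"id": "ITEM-2", "author": "Smith", "year": 2000},
    {"id": "ITEM-3", "author": "Smith", "year": 2000},
]
# Only ITEM-1 and ITEM-3 are cited.
clusters = [[{"id": "ITEM-1"}, {"id": "ITEM-3"}]]
suffixes = assign_year_suffixes(references, clusters)
# suffixes == {"ITEM-1": "a", "ITEM-3": "b"}
```

In this sketch ITEM-2 never receives a suffix, so 'b' goes to the second cited item rather than to the second item listed in the INPUT.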

This leads me to another issue, the fact that 2 distinct ways of
feeding citations to the citeproc are present in the test-suite. I’m
coming to think we should have only one, and so use CITATION_ITEMS and
get rid of CITATIONS (which seems more related to the citeproc-js
internals). Would that be possible? Would it make sense?

Andrea

Hi,

I’ve just finished implementing the new collapsing features, but I
have some trouble with the way we test citation collapsing.

Basically, citation numbers, if I understand it correctly, depend on the
position of the first citation, or, to put it more formally, the
citation number is the position the citation occupies in the list of
citations with the position variable set to “first”, ordered by their
first occurrence (unless some other sorting routine is set).

So, collapse_CitationNumberRangesMixed presents this CITATION_ITEMS section:
[
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ]
]

Here we have 6 citations, and they should be numbered from 1 to 6.
Thus this would be the output:

([1]-[6])

Yes, if those are the only items seen by the processor.

In other words, the INPUT should not be relevant, and to appropriately
test the collapsing routine we should have this CITATION_ITEMS:
[
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-4"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ],
    [
        {
            "id": "ITEM-1"
        },
        {
            "id": "ITEM-2"
        },
        {
            "id": "ITEM-3"
        },
        {
            "id": "ITEM-5"
        },
        {
            "id": "ITEM-6"
        },
        {
            "id": "ITEM-7"
        }
    ]
]

which would produce:
([1]-[7])
([1]-[3], [5]-[7])

Yes, that should be the result, assuming that the bibliography is not
sorted. Sorting might change the sequence numbers, but no such style
exists, as far as we know.

The same applies to CitationNumberRangesMixed2,
CitationNumberRangesMixed3 and, for similar reasons, to
YearSuffixCollapse and YearSuffixCollapseNoRange.

More generally, the INPUT, I think, should not be relevant in the
ordering of citations (it is not in citeproc-hs). Moreover, references
appearing in the INPUT which have not been cited should not be taken
into account (for instance, in YearSuffixCollapse ITEM-2 has not been
cited and so it is not possible to have the ‘b’ year suffix assigned
to it).

If I understand you to mean that the sequence numbers are fixed in the
bibliography, by order of first reference (or by sort order, at least
in theory), and that the citation numbers inside a cite cluster should
be sorted before rendering, yes.
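
A small sketch of that reading, with hypothetical helper names: the numbers are fixed once by order of first reference, and a later cluster that cites the items in a different order has its numbers sorted before rendering.

```python
def number_citations(clusters):
    """Fix sequence numbers by order of first reference (1-based)."""
    numbers = {}
    for cluster in clusters:
        for cite in cluster:
            numbers.setdefault(cite["id"], len(numbers) + 1)
    return numbers


def sorted_cluster_numbers(cluster, numbers):
    """Look up the fixed numbers for one cluster and sort them for rendering."""
    return sorted(numbers[cite["id"]] for cite in cluster)


clusters = [
    [{"id": "ITEM-1"}, {"id": "ITEM-2"}, {"id": "ITEM-3"}],
    [{"id": "ITEM-3"}, {"id": "ITEM-1"}],  # cited out of numeric order
]
numbers = number_citations(clusters)
second = sorted_cluster_numbers(clusters[1], numbers)
# second == [1, 3]
```

The second cluster does not renumber anything: ITEM-3 keeps number 3 even though it is cited first there.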

This leads me to another issue, the fact that 2 distinct ways of
feeding citations to the citeproc are present in the test-suite. I’m
coming to think we should have only one, and so use CITATION_ITEMS and
get rid of CITATIONS (which seems more related to the citeproc-js
internals). Would that be possible? Would it make sense?

I agree that that’s probably a good idea. It will make the test cases
more verbose, but it would be better if they reflected the API. It
will take a lot of typing, though, or a lot of script-fiddling, to get
the test cases refactored.

The same applies to CitationNumberRangesMixed2,
CitationNumberRangesMixed3 and, for similar reasons, to
YearSuffixCollapse and YearSuffixCollapseNoRange.

More generally, the INPUT, I think, should not be relevant in the
ordering of citations (it is not in citeproc-hs). Moreover, references
appearing in the INPUT which have not been cited should not be taken
into account (for instance, in YearSuffixCollapse ITEM-2 has not been
cited and so it is not possible to have the ‘b’ year suffix assigned
to it).

If I understand you to mean that the sequence numbers are fixed in the
bibliography, by order of first reference (or by sort order, at least
in theory), and that the citation numbers inside a cite cluster should
be sorted before rendering, yes.

I’m attaching a patch for the test-suite that fixes this problem. All
edited tests work fine with citeproc-js.

This leads me to another issue, the fact that 2 distinct ways of
feeding citations to the citeproc are present in the test-suite. I’m
coming to think we should have only one, and so use CITATION_ITEMS and
get rid of CITATIONS (which seems more related to the citeproc-js
internals). Would that be possible? Would it make sense?

I agree that that’s probably a good idea. It will make the test cases
more verbose, but it would be better if they reflected the API. It
will take a lot of typing, though, or a lot of script-fiddling, to get
the test cases refactored.

It is not a top priority but, since I’m reviewing all tests, I can
start the work and send patches here. Would that be fine?

Andrea

collapsing.diff (4.85 KB)

If it’s convenient, you could push changes directly to the test suite
repository in Bruce’s BitBucket account. We could tag releases, and
treat the tip as a work in progress.

Frank