How do you disambiguate that?

Hi,

In the test below we have four references: two with the same names,
and two others that differ from the first reference in a family name
and in a given name, respectively.

Disambiguation for each pair of citations, taken on its own, would be:

Smith, Brown & Jones (1980); Smith, Brown & Jones (1980)
Smith, Brown, et al. (1980); Smith, Benson, et al. (1980)
Smith, Brown & J. Jones (1980); Smith, Brown & A. Jones (1980)
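
Taken pair by pair, the results above come from adding family names until the two cites stop matching. Here is a minimal Python sketch of that step. It uses a simplified model that compares family names only and stops at the full author list. `names_needed` is a hypothetical helper, not code from citeproc-js or citeproc-hs.

```python
Author = tuple[str, str]  # (family, given), as in the test INPUT


def names_needed(a: list[Author], b: list[Author]) -> int:
    """Smallest count of leading family names that tells two author lists apart.

    Simplified model of disambiguate-add-names: only family names are
    compared, and expansion stops at the full list if they never differ.
    """
    fa = [family for family, _ in a]
    fb = [family for family, _ in b]
    longest = max(len(fa), len(fb))
    for n in range(1, longest + 1):
        if fa[:n] != fb[:n]:
            return n  # the first n names already differ
    return longest  # family names never differ: show everything
```

For the third pair the family names never differ, so all three names are shown. Only the given names (initials) can then tell the cites apart.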

But what happens if we put all three citations in the same document?

citeproc-js and citeproc-hs do the following:

Smith, Brown & J. Jones (1980); Smith, Brown & J. Jones (1980)
Smith, Brown & J. Jones (1980); Smith, Benson, et al. (1980)
Smith, Brown & J. Jones (1980); Smith, Brown & A. Jones (1980)

Is that correct? I have the feeling that the first one should be:

Smith, Brown & J. Jones (1980); Smith, Brown & Jones (1980)

(that would be a bit troublesome here, but it could be done)

Andrea

===== MODE =====>>
citation
<<===== MODE =====<<

===== RESULT =====>>
Smith, Brown & J. Jones (1980); Smith, Brown & J. Jones (1980)
Smith, Brown & J. Jones (1980); Smith, Benson, et al. (1980)
Smith, Brown & J. Jones (1980); Smith, Brown & A. Jones (1980)
<<===== RESULT =====<<

===== CITATION-ITEMS =====>>
[
    [
        {
            "id": "ITEM-1",
            "label": "page",
            "locator": "100"
        },
        {
            "id": "ITEM-2",
            "label": "chapter",
            "locator": "200"
        }
    ],
    [
        {
            "id": "ITEM-1",
            "label": "page",
            "locator": "100"
        },
        {
            "id": "ITEM-3",
            "label": "chapter",
            "locator": "200"
        }
    ],
    [
        {
            "id": "ITEM-1",
            "label": "page",
            "locator": "100"
        },
        {
            "id": "ITEM-4",
            "label": "chapter",
            "locator": "200"
        }
    ]
]
<<===== CITATION-ITEMS =====<<

===== CSL =====>>

2009-08-10T04:49:00+09:00

<<===== CSL =====<<

===== INPUT =====>>
[
    {
        "author": [
            {
                "family": "Smith",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Brown",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Jones",
                "given": "John",
                "static-ordering": false
            }
        ],
        "id": "ITEM-1",
        "issued": {
            "date-parts": [
                [
                    1980
                ]
            ]
        },
        "type": "book"
    },
    {
        "author": [
            {
                "family": "Smith",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Brown",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Jones",
                "given": "John",
                "static-ordering": false
            }
        ],
        "id": "ITEM-2",
        "issued": {
            "date-parts": [
                [
                    1980
                ]
            ]
        },
        "type": "book"
    },
    {
        "author": [
            {
                "family": "Smith",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Benson",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Jones",
                "given": "John",
                "static-ordering": false
            }
        ],
        "id": "ITEM-3",
        "issued": {
            "date-parts": [
                [
                    1980
                ]
            ]
        },
        "type": "book"
    },
    {
        "author": [
            {
                "family": "Smith",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Brown",
                "given": "John",
                "static-ordering": false
            },
            {
                "family": "Jones",
                "given": "Arthur",
                "static-ordering": false
            }
        ],
        "id": "ITEM-4",
        "issued": {
            "date-parts": [
                [
                    1980
                ]
            ]
        },
        "type": "book"
    }
]
<<===== INPUT =====<<

With givenname-disambiguation-rule=“all-names” (and the other options set in the
test), the output above is definitely correct, because the authors
John Jones and Arthur Jones need to be distinguished, regardless of
whether their initials are needed to disambiguate cites.

If you change the options, adding disambiguate-add-year-suffix=“true”,
and changing givenname-disambiguation-rule to “by-cite”, citeproc-js
collapses everything with et al. and year-suffixes except the last
cite, which renders with “A. Jones” as the final name. That does seem
wrong: plain “Jones” would be sufficient to disambiguate the cites.
What do you get from citeproc-hs?

Frank

My only concern is the change to ITEM-2. The J. is needed
to disambiguate ITEM-1 from ITEM-4. Do I need to change ITEM-2 too,
just because it has the same contributor list as ITEM-1?

Adding disambiguate-add-year-suffix=“true” would collapse ITEM-1 and
ITEM-2. ITEM-3 would stay the same.

I haven’t implemented “by-cite” yet, so I’m not sure what the correct
result should be.

Andrea

The contributor lists are irrelevant in this case. From the spec:

With a value of “all-names”, “all-names-with-initials”,
“primary-name”, or “primary-name-with-initials”, disambiguation is
performed for all relevant names, without regard to ambiguity in
individual cites. Transformations governed by these rules apply to all
cites throughout the document. Disambiguation of cites is in this case
incidental to the disambiguation of names.


So if the names “John Jones” and “Arthur Jones” appear in cites within
the document, the given names or initials need to be added to both of
them, to distinguish the two authors.

(For clarity, perhaps the spec language could be amended to read “for
all relevant names individually”.)
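
Read this way, the “all-names” rule is a document-wide check on names rather than on cites. Here is a rough Python sketch. It assumes every listed name is rendered, and it treats “same family name, different given name” as the only kind of name ambiguity, which simplifies the spec. `families_needing_givenname` is a hypothetical helper.

```python
from collections import defaultdict
from collections.abc import Iterable

Author = tuple[str, str]  # (family, given)


def families_needing_givenname(cited: Iterable[list[Author]]) -> set[str]:
    """Family names that need given names/initials everywhere in the document.

    Simplified reading of "all-names": a family name is ambiguous if it
    belongs to more than one distinct given name anywhere in the cited
    author lists, whether or not any single cite is ambiguous.
    """
    givens = defaultdict(set)  # family name -> set of given names seen
    for authors in cited:
        for family, given in authors:
            givens[family].add(given)
    return {family for family, names in givens.items() if len(names) > 1}
```

With the four items from the test INPUT, only “Jones” is returned, because it belongs to both John Jones and Arthur Jones. That is why “J. Jones” also shows up in the ITEM-1 and ITEM-2 cites.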

With the year suffix added, I get the same result here. Yay.

After mulling over the result with year-suffix and by-cite, I decided
that retaining the initial here is definitely weird, so I’ve pushed a
test fixture covering it, and patched up citeproc-js to conform to it.

Frank

I must confess I’m still a bit confused about the relations among the
various disambiguation options… so, to recapitulate, with the
initial example and:

  • disambiguate-add-names=“true” we get:

    Smith, Brown & Jones (1980); Smith, Brown & Jones (1980)
    Smith, Brown & Jones (1980); Smith, Benson, et al. (1980)
    Smith, Brown & Jones (1980); Smith, Brown & Jones (1980)

    Is this correct, or should the second one be “Smith, Brown, et al.
    (1980); Smith, Benson, et al. (1980)”?

  • disambiguate-add-names=“true”, disambiguate-add-givenname=“true”
    and givenname-disambiguation-rule=“all-names”:

    Smith, Brown & J. Jones (1980); Smith, Brown & J. Jones (1980)
    Smith, Brown & J. Jones (1980); Smith, Benson, et al. (1980)
    Smith, Brown & J. Jones (1980); Smith, Brown & A. Jones (1980)

    This is correct, as you explained before.

  • disambiguate-add-names=“true”, disambiguate-add-givenname=“true”,
    givenname-disambiguation-rule=“all-names”, and
    disambiguate-add-year-suffix=“true”

    Smith et al. (1980a); Smith et al. (1980b)
    Smith et al. (1980a); Smith, Benson, et al. (1980)
    Smith et al. (1980a); Smith, Brown & A. Jones (1980)

And this? Should the last one be “Smith, Brown & Jones”?

This is what I mysteriously get out of my code, BTW. I’ve been trying
hard to understand why, with no success … ;-)

I hate disambiguation. I really do!

Andrea

Me too; there’s not much fun in it, that’s for sure.

Your tests raise further issues in citeproc-js. I’ll get them fixed,
then get back. For now, it’s late here and I’d better get some sleep.

More tomorrow!

Frank

Andrea,

This is a beautiful set of test data. It’s helping to clarify things
that have been quietly lurking about without clear solutions.

  • disambiguate-add-names=“true” we get:

    Smith, Brown & Jones (1980); Smith, Brown & Jones (1980)
    Smith, Brown & Jones (1980); Smith, Benson, et al. (1980)
    Smith, Brown & Jones (1980); Smith, Brown & Jones (1980)

    Is this correct, or should the second one be “Smith, Brown, et al.
    (1980); Smith, Benson, et al. (1980)”?

It’s correct. The pairing of cites in the output isn’t relevant,
although it’s a very helpful pattern for readability. ITEM-1 should
render in the same form everywhere in the document. It’s the first
item in each pair, so Smith, Brown & Jones (1980) is exactly right for
it, in the second line as well.

  • disambiguate-add-names=“true”, disambiguate-add-givenname=“true”
    and givenname-disambiguation-rule=“all-names”:

    Smith, Brown & J. Jones (1980); Smith, Brown & J. Jones (1980)
    Smith, Brown & J. Jones (1980); Smith, Benson, et al. (1980)
    Smith, Brown & J. Jones (1980); Smith, Brown & A. Jones (1980)

    This is correct, as you explained before.

Yep, that looks right. ITEM-1 and ITEM-2 are not fully
disambiguated, but we just provide all the information we can.

  • disambiguate-add-names=“true”, disambiguate-add-givenname=“true”,
    givenname-disambiguation-rule=“all-names”, and
    disambiguate-add-year-suffix=“true”

    Smith et al. (1980a); Smith et al. (1980b)
    Smith et al. (1980a); Smith, Benson, et al. (1980)
    Smith et al. (1980a); Smith, Brown & A. Jones (1980)

And this? Should the last one be “Smith, Brown & Jones”?

This one gets interesting. Technically, the cites disambiguate fine
here, and as you say, the initial “A.” looks superfluous. But I
wonder …

If you think about a reader digging through the bibliography looking
for these entries, and if you assume that the bibliography list is
sorted alphabetically by author, their task will be made much easier
if the names added by disambiguate-add-names are retained for the
items disambiguated with year-suffix (ITEM-1 and ITEM-2). The extra
information isn’t relevant to disambiguation, but it can be relevant
to finding the entries in a linear listing, because it helps the
reader figure out where to start looking. “Smith et al.” could be
“Smith, Anderson and Zappa” or “Smith, Zappa and Anderson”; if there
are 50 works with Smith as the primary author, you’d want to know
which end of the list to start from.

I’m not sure what the style guides say, so I guess that checking them
out would be the next step. But before that, does anyone have views
on this one?

Frank

While waiting for Europe and the Americas to wake up … I spent some
time on this today, mulling over possible use cases, and looking at
some style guides. Here’s a firmer statement.

  • Chicago author-date explicitly requires that the in-text references
    correspond exactly to the authors listed in the reference list.

  • APA is less explicit, but seems to require the same.

  • Both Chicago and APA sort on the last names of all listed authors in
    sequence, followed by the year, in their author-date systems.

  • There is anecdotal evidence on the forums of styles that list the
    primary author, followed by the year, in the reference list, with the
    full list of authors later in the citation. I have not found a manual
    that calls for this, but it’s clearly out there.

  • While the styles described above would be a safe case for collapsing
    back to primary author + year-suffix, such styles presumably always
    use only the primary author in in-text references, using year-suffix
    as required, and sort on the primary author and the date.

  • If these are the systems in circulation, there is no use case for
    collapsing year-suffixed cites back to the primary author + et al.
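
The Chicago/APA author-date sort described in the list above can be sketched in Python as a key built from all author family names in sequence, followed by the year. The title as a third key and the use of `casefold()` are assumptions for illustration. The field names follow the CSL-JSON used in the test INPUT.

```python
from typing import Any


def author_date_sort_key(item: dict[str, Any]) -> tuple:
    """All author family names in sequence, then year, then title."""
    families = tuple(a["family"].casefold() for a in item.get("author", []))
    year = item["issued"]["date-parts"][0][0]  # first date-part is the year
    title = item.get("title", "").casefold()   # tie-breaker (assumption)
    return (families, year, title)
```

Sorting the four test items with this key puts ITEM-3 (Benson) first. ITEM-1, ITEM-2 and ITEM-4 tie, and they keep their input order because Python’s sort is stable.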

So the simple question is:

Does everyone agree that it makes sense to always retain the names
that were added by disambiguate-add-names on a year-suffixed cite?

(I have more sophisticated handling running in citeproc-js at the
moment, but will very happily roll back the changes if the answer to
the question above is “yes”.)

Frank

While Europe woke up 5 hours ago, it had to spend some of that time
trying to grasp the problem and the implications, in terms of code and
results, of your proposal, which makes perfect sense to me; that
was actually the way I wrote the code initially.

So, since disambiguate-add-givenname must be applied before applying
disambiguate-add-year-suffix, the above example becomes:

Smith, Brown & J. Jones (1980a); Smith, Brown & J. Jones (1980b)
Smith, Brown & J. Jones (1980a); Smith, Benson, et al. (1980)
Smith, Brown & J. Jones (1980a); Smith, Brown & A. Jones (1980)

with all ambiguity removed.
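
To make rows like these easy to check, here is a hypothetical renderer that reproduces the cite shape used in this thread. Names are joined with commas and “&”, “et al.” appears when the list is truncated, initials are added only for flagged family names, and a year suffix is optional. `render_cite` is not part of citeproc-js or citeproc-hs. It assumes the earlier steps have already decided how many names to show and which names get initials. The year suffix is simply added on top, and the names are not collapsed back.

```python
Author = tuple[str, str]  # (family, given), as in the test INPUT


def render_cite(
    authors: list[Author],
    year: int,
    shown: int,
    initial_families: frozenset[str] = frozenset(),
    suffix: str = "",
) -> str:
    """Render a cite in the shape used in this thread (hypothetical helper)."""

    def fmt(family: str, given: str) -> str:
        # Prefix an initial only to family names flagged as ambiguous.
        if family in initial_families:
            return f"{given[0]}. {family}"
        return family

    shown = max(1, min(shown, len(authors)))  # clamp to a sensible range
    names = [fmt(family, given) for family, given in authors[:shown]]
    if shown < len(authors):
        # Truncated list: "Smith et al." or "Smith, Brown, et al."
        if len(names) == 1:
            head = f"{names[0]} et al."
        else:
            head = ", ".join(names) + ", et al."
    elif len(names) == 1:
        head = names[0]
    else:
        # Full list: "Smith, Brown & Jones"
        head = ", ".join(names[:-1]) + " & " + names[-1]
    return f"{head} ({year}{suffix})"
```

For example, rendering ITEM-1 with all three names, “Jones” flagged for initials, and suffix “a” gives “Smith, Brown & J. Jones (1980a)”, the first cite in the output above.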

And this is the case with disambiguate-add-names plus
disambiguate-add-year-suffix:

Smith, Brown & Jones (1980a); Smith, Brown & Jones (1980b)
Smith, Brown & Jones (1980a); Smith, Benson, et al. (1980)
Smith, Brown & Jones (1980a); Smith, Brown & Jones (1980)

Then there is the case of disambiguate-add-year-suffix only. I suppose
you would solve it in this way (letters depending on the bibliography
sorting):

Smith et al. (1980a); Smith et al. (1980b)
Smith et al. (1980a); Smith et al. (1980c)
Smith et al. (1980a); Smith et al. (1980d)

Is this correct?

Andrea

That looks great. I have these and a couple of corollary variations
expressed as test cases here. I’ve also adjusted three of the
existing tests to reflect this behavior. Is it alright to push the
changes into the test suite?

Thanks for your patience over all this back-and-forth, Andrea. I was
guilty of thinking too hard about this.

Frank

Thank you for your help in sorting this problem out. It’s a go from
me.

Just one question: generating those suffixes is quite hard for me,
since bibliography and citations are rendered independently. I’ve been
looking at your code but haven’t yet understood how you handle
the problem. Can you give me a hint, please?

Thanks.

Andrea

While trawling through documentation I came across the Chicago rule on
sorting and year suffixes. As you say, the suffixes applied have to
depend on the bibliography sort order. I think that’s what happens,
but I’m going to build out the Chicago example as a test case, just to
be sure. After I’ve confirmed that it’s working right, I’ll look back
at the code and (attempt to?) explain how it works.

Frank

It’s been a long while since I’ve dealt with this, but the conceptual
approach in my original XSLT code was that the suffix is indeed
generated in the bibliography. I would group the raw (internal)
bibliography list by author (by a concatenated normalized string of
all the names, really*), and each item in the group would get an
index. That index could then be converted to the suffix for output.

Bruce

* E.g. an item with “John Doe” and “Jane Smith” would sort and group
  on “doe-john;smith-jane”, or that sort of thing.
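
Bruce’s approach can be sketched in Python roughly as follows. Three details are assumptions made here for illustration: grouping on the year as well as the names, starting the index at 1, and the flat `year` field. `name_key`, `group_key` and `suffix_indexes` are hypothetical names, not taken from the XSLT code.

```python
from itertools import groupby
from typing import Any


def name_key(authors: list[dict[str, str]]) -> str:
    """Concatenated, normalized names, e.g. "doe-john;smith-jane"."""
    return ";".join(f"{a['family']}-{a.get('given', '')}".lower() for a in authors)


def group_key(item: dict[str, Any]) -> tuple[str, int]:
    # Names plus year: only works by the same authors in the same year clash.
    return (name_key(item["author"]), item["year"])


def suffix_indexes(bibliography: list[dict[str, Any]]) -> dict[str, int]:
    """Index items that share a group key, counting from 1.

    `bibliography` must already be in bibliography sort order: groupby only
    merges adjacent items, so equal keys have to sit next to each other.
    """
    indexes: dict[str, int] = {}
    for _, run in groupby(bibliography, key=group_key):
        group = list(run)
        if len(group) > 1:  # a lone item needs no suffix
            for i, item in enumerate(group, start=1):
                indexes[item["id"]] = i
    return indexes
```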

That’s actually the purpose of the disambiguate_YearSuffixAndSort.txt
test, which sorts according to the bibliography sort. Right?

Andrea

Yep. I’m in a suspenders-and-a-belt mood this evening though. :-)

Okay, everything checks out, and a couple of fresh Chicago year-suffix
tests have been pushed to the test suite. The tests failed at first,
and I thought there was trouble in the processor; but then I noticed
that chicago-author-date.csl just didn’t yet have a tertiary sort key
for the title.

About the year-suffix assignment, the processor has a persistent
registry that holds, among other things, the disambiguation parameters
for each reference item, as well as its bibliography sort keys. When
disambiguation is performed, the cites that still clash are returned
by the disambiguation function to a “leftovers” list. The leftovers
are sorted using their bibliography sort keys, and then integers are
assigned to the year-suffix disambiguation value of each item in the
list, in sequence. At rendering time, the value is used to generate
the year-suffix.
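
Here is a rough Python sketch of that flow. It assumes that “still clashing” means “renders to the same cite string after the earlier disambiguation steps”, and that numbering restarts for each group of clashing cites. The dictionaries stand in for the processor’s registry, and `assign_year_suffixes` is a hypothetical name, not a citeproc-js function.

```python
from collections import defaultdict
from typing import Any


def assign_year_suffixes(rendered: dict[str, str], sort_keys: dict[str, Any]) -> dict[str, int]:
    """Number the items whose cites still clash, in bibliography order.

    rendered:  item id -> cite text after the earlier disambiguation steps
    sort_keys: item id -> bibliography sort key (any mutually comparable value)
    Returns item id -> 1-based year-suffix number, for clashing items only.
    """
    clashes: dict[str, list[str]] = defaultdict(list)
    for item, cite in rendered.items():
        clashes[cite].append(item)  # identical text means still ambiguous

    suffixes: dict[str, int] = {}
    for items in clashes.values():
        if len(items) < 2:
            continue  # the "leftovers" are only the groups that still clash
        ordered = sorted(items, key=lambda item: sort_keys[item])
        for n, item in enumerate(ordered, start=1):
            suffixes[item] = n
    return suffixes
```

With disambiguate-add-year-suffix alone, all four test items render as “Smith et al. (1980)”. They therefore form a single group and receive 1 to 4 in bibliography order, which is where the “a” to “d” suffixes come from.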

The sorting happens in src/registry.js, around line 511. The function
that generates the year suffix is in src/util_number.js, at around
line 121. As a bit of trivia, the suffix is surprisingly tricky to
generate correctly for large values; it cannot be built with a
straightforward numeric mapping, because it is based on a “numbering
system” with no zero placeholder.
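
A “numbering system with no zero placeholder” is bijective base 26: the letters a to z stand for 1 to 26, and z is followed by aa. Ordinary base 26 with a as 0 would not work, because “a” and “aa” would then both mean zero. Here is a minimal Python sketch, assuming 1-based input. It is not a transcription of the function in src/util_number.js.

```python
def year_suffix(n: int) -> str:
    """Convert a 1-based integer to a letter suffix: 1 -> "a", 26 -> "z", 27 -> "aa"."""
    if n < 1:
        raise ValueError("year-suffix numbers start at 1")
    letters = []
    while n > 0:
        # Shift to 0..25 before dividing; this is what removes the zero digit.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))
```

So 1 gives “a”, 26 gives “z”, 27 gives “aa”, and 703 gives “aaa”.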