Disambiguation: A Probably Silly Question

I have a probably dumb question about disambiguation.

Where the spec says (of add-names):

[names] are added one by one to all members of a set of ambiguous cites, until no more cites in the set can be disambiguated by adding expanded names

(And similarly in relation to the name-adding that happens after expansion has failed to completely disambiguate).

It seems to me to allow two interpretations:

  • Progressively add names one by one. On each occasion when a name is added, stop iff adding that name did not improve the position or iff the cites are now fully disambiguated, i.e. treat “until” as an algorithmic test requiring some progress on each pass.
  • Add as many names as you like, if doing so will disambiguate the cite, i.e. treat "until" as really meaning just "if", so that the rule is really "add the minimum number of names required to disambiguate the cite, if that is possible".
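To make the difference concrete, here is a minimal Python sketch of both readings. It is only an illustration under simplifying assumptions: a cite is reduced to a plain list of names, the hypothetical helper `render` just takes the first n names (no et al., no formatting), and "progress" in interpretation 1 is measured as a drop in the number of cites that still collide. None of these function names come from the spec or from any CSL processor.

```python
from collections import Counter


def render(names, n):
    """Hypothetical rendering: just the first n names (no et al., no formatting)."""
    return tuple(names[:n])


def ambiguous_count(cites, n):
    """How many cites still collide with another cite when n names are shown."""
    counts = Counter(render(c, n) for c in cites)
    return sum(1 for c in cites if counts[render(c, n)] > 1)


def interpretation_1(cites, start=1):
    """Add one name at a time; stop (keeping the previous count) when a step makes no progress."""
    n = start
    longest = max(len(c) for c in cites)
    while ambiguous_count(cites, n) > 0 and n < longest:
        if ambiguous_count(cites, n + 1) >= ambiguous_count(cites, n):
            break  # no progress: roll back and leave it to e.g. year suffixes
        n += 1
    return n


def interpretation_2(cites, start=1):
    """Add as many names as needed to make the cites distinct, if that is possible at all."""
    longest = max(len(c) for c in cites)
    for n in range(start, longest + 1):
        if ambiguous_count(cites, n) == 0:
            return n
    return start  # names cannot disambiguate: leave the name list as it was


cites = [
    ["John Smith", "John Doe", "Jane Roe"],
    ["John Smith", "John Doe", "Albertina Other"],
]
print(interpretation_1(cites))  # 1
print(interpretation_2(cites))  # 3

# 99 authors that differ only in the last one
shared = [f"Author {k}" for k in range(98)]
many = [shared + ["X"], shared + ["Y"]]
print(interpretation_1(many), interpretation_2(many))  # 1 99
```

The last case is the worry about long name lists: under the second reading both cites end up showing all 99 names.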

The difference can be seen if one considers two cites:

A. John Smith, John Doe, Jane Roe
B. John Smith, John Doe, Albertina Other

and assume that our initial attempt is set to print just one name.

On interpretation 1:

  • At stage 1, we have ambiguous cites (Smith / Smith).
  • At stage 2, after adding a name we still have ambiguous cites (Smith, Doe / Smith, Doe).
  • If we add one more name, we will get unambiguous cites (Smith, Doe, Roe / Smith, Doe, Other).

On the first interpretation, since stage 2 (adding a name) makes "no progress", we stop there (in fact, we don't add the name: it did no good, so we roll back to "Smith" and add a year suffix if that's possible). On the second interpretation, we continue. In theory we would do so even if there were 99 authors and they differed only at author number 99.

Both interpretations seem plausible. The first (of course) fails to disambiguate cites which are potentially capable of being disambiguated by adding names. But the second, while it achieves disambiguation in every case where it is theoretically possible, does so at the price of potentially adding a massive number of names to a large number of cites, even if doing so achieves only a "minor win". And since, as I understand it, names are always added to all the ambiguous cites (where possible), this could result in lots of long name lists.

Neither presents any notably greater technical difficulty. In the average case, I suspect, the results will not be markedly different. Which should it be?

It's interpretation 2: "add the minimum number of names required to disambiguate the cite, if that is possible", though it's not technically the minimum number of names. For example, take a set of three:

A. John Smith, John Doe, Jane Roe
B. John Smith, John Doe, Albertina Other
C. John Smith, John Doe, Laura Ingalls

you’d expand all of them to their third name, even though technically you could leave one of them as Smith et al.
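A quick check of that point, as a sketch only: cites are reduced to surnames, and the hypothetical `render` appends "et al." whenever names are dropped (a real style's delimiters and et-al settings would differ). Once A and B show three names, C still renders distinctly as "Smith et al.", so expanding all three is more than strictly necessary.

```python
def render(names, n):
    """Hypothetical rendering: first n names, plus "et al." if any were dropped."""
    shown = ", ".join(names[:n])
    return shown + " et al." if n < len(names) else shown


def all_distinct(cites, widths):
    """True if every cite renders differently at the given per-cite name counts."""
    rendered = [render(c, w) for c, w in zip(cites, widths)]
    return len(set(rendered)) == len(rendered)


cites = [
    ["Smith", "Doe", "Roe"],      # A
    ["Smith", "Doe", "Other"],    # B
    ["Smith", "Doe", "Ingalls"],  # C
]

print(all_distinct(cites, [1, 1, 1]))  # False: all three are "Smith et al."
print(all_distinct(cites, [3, 3, 3]))  # True: what the rule produces
print(all_distinct(cites, [3, 3, 1]))  # True: C could stay "Smith et al."
```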

I think the "can be" following "until" clarifies this in the current version of the spec, but I'm also open to rephrasing.

One thing I'm not sure about, though, is how exactly a "set" is defined. Consider:

A. John Smith, John Doe, Jane Roe
B. John Smith, John Doe, Albertina Other
C. John Smith, Mark Jones, Laura Ingalls

Does C become Smith, Jones, Ingalls or just Smith, Jones? If A, B and C are a "set" (and since they're initially ambiguous, I think they are), then we'd do the former, but that seems excessive (and, I think, incorrect per APA rules, which have the most in-depth description of this rule of any style guide; @bwiernik would know this).


Technically, the second C is disambiguated at Smith, Jones, et al. Most styles adopt that position and allow “et al” to stand in for just one name. APA takes a rather ahistorical stance that “et al” means only “and others” plural (et alia) and not also “and another” singular (et alium) and so insists that “et al” must only stand in for two or more authors when disambiguating.

That's a fairly unique (and obscure and dumb) rule, so I'm okay with CSL not supporting it.
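The two conventions differ only in how many names "et al" may stand in for. A sketch with a hypothetical `min_omitted` parameter (not a CSL option, just a name for the threshold): with `min_omitted=1` the second C renders as "Smith, Jones, et al.", while an APA-style `min_omitted=2` forces the third name to be printed.

```python
def render(names, shown, min_omitted=1):
    """Show `shown` names; let "et al." replace the rest only if at least
    `min_omitted` names would be dropped, otherwise print them all."""
    omitted = len(names) - shown
    if omitted < max(min_omitted, 1):
        return ", ".join(names)
    return ", ".join(names[:shown]) + ", et al."


c = ["Smith", "Jones", "Ingalls"]
print(render(c, 2))                 # Smith, Jones, et al.
print(render(c, 2, min_omitted=2))  # Smith, Jones, Ingalls
```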

Thanks. I think (leaving aside the question of whether "et al" is added) the second C gets to "Smith, Jones" because it is never ambiguous at that point. Basically, at each stage, cites that are no longer ambiguous are removed from the set and are not processed further.
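That reading can be sketched as follows. Assumptions, as before: a cite is reduced to its list of surnames and the hypothetical `render` shows only the first n names. On each pass, every cite that still collides with some other cite gets one more name; cites that have become unique stop growing but are still compared against. The sketch leaves out the roll-back for cites that names cannot disambiguate.

```python
from collections import Counter


def render(names, width):
    """Hypothetical rendering: the first `width` names only."""
    return tuple(names[:width])


def add_names(cites, start=1):
    """Widen the still-ambiguous cites one name per pass; unique cites stop growing."""
    widths = [start] * len(cites)
    while True:
        counts = Counter(render(c, w) for c, w in zip(cites, widths))
        ambiguous = [i for i, (c, w) in enumerate(zip(cites, widths))
                     if counts[render(c, w)] > 1]
        growable = [i for i in ambiguous if widths[i] < len(cites[i])]
        if not growable:
            return widths  # everything is unique, or there are no more names to add
        for i in growable:
            widths[i] += 1


print(add_names([["Smith", "Doe", "Roe"],
                 ["Smith", "Doe", "Other"],
                 ["Smith", "Jones", "Ingalls"]]))  # [3, 3, 2]
print(add_names([["Smith", "Doe", "Roe"],
                 ["Smith", "Doe", "Other"],
                 ["Smith", "Doe", "Ingalls"]]))    # [3, 3, 3]
```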