I have a probably dumb question about disambiguation.
Where the spec says (of
[names] are added one by one to all members of a set of ambiguous cites, until no more cites in the set can be disambiguated by adding expanded names
(And similarly in relation to the name-adding that happens after expansion has failed to completely disambiguate).
It seems to me to allow two interpretations:
- Progressively add names one by one. On each occasion when a name is added, stop iff adding that name did not improve the position or iff the cites are now fully disambiguated, i.e. treat “until” as an algorithmic test requiring some progress on each pass.
- Add as many names as you like, if doing so will disambiguate the cite, i.e. treat “until” as really meaning just “if”, so that the rule becomes “add the minimum number of names required to disambiguate the cite, if that is possible”.
The difference can be seen if one considers two cites:
A. John Smith, John Doe, Jane Roe
B. John Smith, John Doe, Albertina Other
and assume that our initial attempt is set to print just one name.
On interpretation 1:
- At stage 1 we have ambiguous cites (Smith / Smith)
- At stage 2, after adding a name we still have ambiguous cites (Smith, Doe / Smith, Doe).
- If we add one more name, we will get unambiguous cites (Smith, Doe, Roe / Smith, Doe, Other).
On the first interpretation, since stage 2 (adding a name) makes “no progress”, we stop there. In fact, we don’t add the name at all: it did no good, so we roll back to “Smith” and add a year suffix if that’s possible. On the second interpretation, we continue. In theory we would do so even if there were 99 authors and the cites differed only at author number 99.
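To make the difference concrete, here is a rough Python sketch of the two readings applied to cites A and B. It is only an illustration of how I understand the two interpretations, not anything taken from the spec or from an existing processor. The function names (`add_names_stepwise`, `add_names_minimum`, `count_distinct`) are made up for this example, and a cite is simplified to a plain list of name strings. It also treats the ambiguous set as a whole and ignores partial disambiguation within larger sets.

```python
from typing import Sequence

def count_distinct(cites: Sequence[Sequence[str]], n: int) -> int:
    """Number of distinct renderings when each cite shows its first n names."""
    return len({tuple(c[:n]) for c in cites})

def add_names_stepwise(cites: Sequence[Sequence[str]], start: int = 1) -> int:
    """Interpretation 1 (hypothetical sketch): add one name per pass and
    stop, without keeping the new name, as soon as a pass makes no progress."""
    n = start
    max_n = max(len(c) for c in cites)
    while n < max_n and count_distinct(cites, n) < len(cites):
        if count_distinct(cites, n + 1) == count_distinct(cites, n):
            break  # the extra name did no good: roll back to n names
        n += 1
    return n

def add_names_minimum(cites: Sequence[Sequence[str]], start: int = 1) -> int:
    """Interpretation 2 (hypothetical sketch): use the smallest number of
    names that fully disambiguates the set; if none does, keep the start."""
    max_n = max(len(c) for c in cites)
    for n in range(start, max_n + 1):
        if count_distinct(cites, n) == len(cites):
            return n
    return start

cite_a = ["John Smith", "John Doe", "Jane Roe"]
cite_b = ["John Smith", "John Doe", "Albertina Other"]

print(add_names_stepwise([cite_a, cite_b]))  # 1: stage 2 made no progress
print(add_names_minimum([cite_a, cite_b]))   # 3: keeps going until Roe / Other

# The 99-author case: identical for 98 names, different only at the last one.
shared = [f"Author {i}" for i in range(1, 99)]
long_a = shared + ["Jane Roe"]
long_b = shared + ["Albertina Other"]
print(add_names_stepwise([long_a, long_b]))  # 1
print(add_names_minimum([long_a, long_b]))   # 99
```

In this sketch the two functions return the number of names to print for every cite in the ambiguous set, so the second reading prints all 99 names on both cites in the last case.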
Both interpretations seem plausible. The first (of course) fails to disambiguate cites that could have been disambiguated by adding more names. The second achieves disambiguation in every case where it is theoretically possible, but at the price of potentially adding a massive number of names to a large number of cites, even where doing so achieves only a “minor win”. Since, as I understand it, names are always added to all the ambiguous cites (if they can be), this could result in lots of long name lists.
Neither presents any notably greater technical difficulty. In the average case, I suspect, the results will not be markedly different. Which should it be?