Disambiguation: changes to test fixtures

Andrea, Sylvester:

Quite some time ago, Andrea raised an objection to the spec
description of disambiguation behavior, which I should have addressed,
but didn’t:

http://xbiblio-devel.2463403.n2.nabble.com/disambiguation-one-more-tt5131926.html#a5133985

Rintze has proposed a revision that clarifies the behavior:

https://github.com/citation-style-language/documentation/pull/16

Reviewing the amendments, I was reminded of a flaw in citeproc-js,
which in some situations was failing to drop names that do not
contribute to disambiguation. The failure is reflected in some of the
fixtures in the test suite. I took another look at my code, and
managed to clean up the behavior. The effect can be seen in the
following changeset:

https://bitbucket.org/bdarcus/citeproc-test/changeset/e4225e251798

The affected fixtures are linked below. Please take a look at them,
and let us know whether you approve of the new behavior. The effect on
the specification is outlined in comments to Rintze’s github pull
request, linked above.

https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AddNamesFailure.txt
https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AddNamesFailureWithAddGivenname.txt
https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AndreaEg1.txt
https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_ByCiteRetainNamesOnFailureIfYearSuffixNotAvailable.txt

Many thanks,
Frank

Frank,

bibtex-ruby does not implement any disambiguation support yet, so everything is fine by me.

In unrelated news, I am currently working on a wrapper around citeproc-js for the csl-test environment, to make it possible to access different citeproc engines through a single API. I have run into several issues with different JavaScript interpreters (using xml4e for Rhino and SpiderMonkey and xmldom for everything else). Right now the wrapper is designed to work with any of the common interpreters (V8, Rhino, SpiderMonkey, node, JavaScriptCore, or JScript), so I wanted to ask whether you are aware of any major obstacles to getting citeproc-js to work in any of them.

Many thanks
Sylvester

Hi Frank,

sorry I’m so slow in catching up… as usual.

Frank Bennett <@Frank_Bennett> writes:

https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AddNamesFailure.txt
https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AddNamesFailureWithAddGivenname.txt
https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_AndreaEg1.txt

Yes, I agree.

https://bitbucket.org/bdarcus/citeproc-test/src/beda343d95bc/processor-tests/humans/disambiguate_ByCiteRetainNamesOnFailureIfYearSuffixNotAvailable.txt

I’m not so sure, here. I think that instead of:

Asthma et al. (1990); Asthma et al. (1990);
Dropsy, Enteritis, X. Fever (2000); Dropsy, Enteritis, Y. Fever (2000)

it should be:

Asthma, Bosworth Bronchitis, et al. (1990); Asthma, Beauregarde Bronchitis, et al. (1990);
Dropsy, Edward Enteritis, et al. (2000); Dropsy, Ernie Enteritis, et al. (2000)

When disambiguate-add-names fails to disambiguate a cite, the 1.0
specification says all names should be shown regardless of “et-al”
limitations, whereas the amendments now require the minimum effort to be used:

If cites cannot be (fully) disambiguated by expanding the rendered names,
and if ``disambiguate-add-names`` is set to "true", then the names still
hidden as a result of et-al abbreviation after the disambiguation attempt of
``disambiguate-add-names`` are added one by one to all members of a set of
ambiguous cites, until no more cites in the set can be disambiguated by
adding *expanded* names.

This applies also to:

  1. disambiguate_ByCiteBaseNameCountOnFailureIfYearSuffixAvailable
    Not: Asthma, Bronchitis, Cold (1990a); Asthma, Bronchitis, Cold (1990b);
    Dropsy, Enteritis, X. Fever (2000); Dropsy, Enteritis, Y. Fever (2000)

    But: Asthma, Bosworth Bronchitis, et al. (1990); Asthma, Beauregarde Bronchitis, et al. (1990);
    Dropsy, Edward Enteritis, et al. (2000); Dropsy, Ernie Enteritis, et al. (2000)

  2. disambiguate_ByCiteGivennameExpandCrossNestedNames
    Not: J. Doe, Jane Roe, Robert Jones; J. Doe, Josephine Roe, R. Jones; J. Doe, Jane Roe, Richard Jones
    But: J. Doe, Jane Roe, Robert Jones; J. Doe, Josephine Roe, et al.; J. Doe, Jane Roe, Richard Jones

  3. disambiguate_ByCiteRetainNamesOnFailureIfYearSuffixNotAvailable
    Not: Asthma et al. (1990); Asthma et al. (1990);
    Dropsy, Enteritis, X. Fever (2000); Dropsy, Enteritis, Y. Fever (2000)

    But: Asthma, Bosworth Bronchitis, et al. (1990); Asthma, Beauregarde Bronchitis, et al. (1990);
    Dropsy, Edward Enteritis, et al. (2000); Dropsy, Ernie Enteritis, et al. (2000)

  4. disambiguate_AllNamesBaseNameCountOnFailureIfYearSuffixAvailable: the
    first part is not correct (this is the old behavior).
    Not: Asthma, Bronchitis, Cold (1990a); Asthma, Bronchitis, Cold (1990b);
    But: Asthma et al. (1990a); Asthma et al. (1990b);

  5. disambiguate_YearSuffixAtTwoLevels: I’m not entirely sure, here.
    Not: Smith, Jones & Brown (1986a); Smith, Jones & Brown (1986b);
    Smith, Jones, Brown, et al. (1986a); Smith, Jones, Brown, et al. (1986b)
    But: Smith et al. (1986a); Smith et al. (1986b);
    Smith et al. (1986c); Smith et al. (1986d)

Other problematic tests:

  • disambiguate_ByCiteGivennameShortFormInitializeWith: initialize-with
    is set and so the full name (Smith) should be shown. And so:
    Not: Roe
    J Doe; A Doe
    Smith; Smith
    But: Roe
    J Doe; A Doe
    Thomas Smith; Ted Smith

  • disambiguate_ByCiteDisambiguateCondition: the “disambiguate” condition
    should be applied after year-suffix if the year suffix is not enough:

    Not: Doe & Roe, Book A (2000); Doe & Roe, Book B (2000)
    But: Doe et al. (2000a); Doe et al. (2000b)

I would also like to address the problem of those tests implying a
stateful processor. Maybe we could move them to a different subdirectory.
These are the ones I remember in the disambiguate group:

disambiguate_AllNamesSimpleSequence
disambiguate_DisambiguateWithThree
disambiguate_DisambiguateWithThree2
disambiguate_YearSuffixFiftyTwoEntries
disambiguate_YearSuffixAtTwoLevels

Thanks,
andrea

Andrea,

Thanks for looking carefully.

One issue is very clear. In the
disambiguate_ByCiteRetainNamesOnFailureIfYearSuffixNotAvailable test,
by-cite disambiguation should cycle through name expansions after
adding names to see if anything helps. The processor currently only
attempts one “step” of name expansion with this disambiguation rule,
which is why disambiguation doesn’t occur on the second full name in
that pairing. It’s a known limitation at the moment, which may be a
hangover from days before the disambiguation code was cleaned up and
made easier to comprehend and control. I’ll look into improving on
that when time permits. I agree that it should be possible – and if
you have a running implementation with better behavior, the spec can
certainly be amended to that effect.

On the last point raised (concerning
disambiguate_ByCiteDisambiguateCondition), applying the disambiguation
condition after year-suffix is applied would have the same effect as
disabling it altogether, since year-suffix always succeeds. I think
the current behavior there is probably correct, although there must be
very few styles that apply those rules.

Frank

If I understand correctly, you agree that expanding rendered names with
initials and, if needed, given-names should be done before adding new
names. This is how I interpret the spec, and it is also how
citeproc-hs is coded.

Another way to approach disambiguation is first to try adding names one by
one, then to try names plus initials one by one, and then to try
names plus given-names one by one. This is the way citeproc-js
seems to work. But this is not the way I interpret the spec. So the spec
should be amended only if we think this second disambiguation algorithm
is to be preferred.
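
A minimal Python sketch of the first reading (run the full expansion cycle on each name as it becomes visible, and only then add the next name). It is not taken from citeproc-hs or citeproc-js. The data model, the function names, the three name forms, and the given names "Albert" and "Crispin" are assumptions made for illustration. The sketch reverts a name to its short form when expanding it does not help, so it would miss cases where two expansions are only useful together.

```python
from typing import List, Sequence, Tuple

Name = Tuple[str, str]  # (given, family): an assumed, simplified name model
FORMS = ("short", "initial", "full")  # assumed expansion steps, in order


def render_name(name: Name, form: str) -> str:
    given, family = name
    if form == "short":
        return family
    if form == "initial":
        return f"{given[0]}. {family}"
    return f"{given} {family}"


def render_cite(names: Sequence[Name], forms: Sequence[str]) -> str:
    # Only as many names as there are entries in `forms` are shown.
    shown = [render_name(n, f) for n, f in zip(names, forms)]
    text = ", ".join(shown)
    if len(forms) < len(names):
        text += ", et al." if len(shown) > 1 else " et al."
    return text


def all_distinct(cites: Sequence[Sequence[Name]], forms: Sequence[str]) -> bool:
    rendered = [render_cite(c, forms) for c in cites]
    return len(set(rendered)) == len(rendered)


def disambiguate(cites: Sequence[Sequence[Name]], base_count: int = 1) -> List[str]:
    """Render a set of mutually ambiguous cites with the fewest added names."""
    longest = max(len(c) for c in cites)
    forms = ["short"] * base_count
    for pos in range(longest):
        if pos >= len(forms):
            forms.append("short")  # make one more name visible
        for form in FORMS:  # full expansion cycle on this name
            forms[pos] = form
            if all_distinct(cites, forms):
                return [render_cite(c, forms) for c in cites]
        forms[pos] = "short"  # expansion did not help: keep the short form
    # Failure: fall back to the base rendering, without the added names.
    return [render_cite(c, ["short"] * base_count) for c in cites]


first = [("Albert", "Asthma"), ("Bosworth", "Bronchitis"), ("Crispin", "Cold")]
second = [("Albert", "Asthma"), ("Beauregarde", "Bronchitis"), ("Crispin", "Cold")]
print(disambiguate([first, second]))
# ['Asthma, Bosworth Bronchitis, et al.', 'Asthma, Beauregarde Bronchitis, et al.']
```

With two identical name lists the function returns the base rendering ("Asthma et al.") for both cites, which corresponds to not retaining added names on failure.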

Frank Bennett <@Frank_Bennett> writes:

On the last point raised (concerning
disambiguate_ByCiteDisambiguateCondition), applying the disambiguation
condition after year-suffix is applied would have the same effect as
disabling it altogether, since year-suffix always succeeds. I think
the current behavior there is probably correct, although there must be
very few styles that apply those rules.

Actually year-suffix always succeeds because you add a suffix even when
there is no year. See:

date_YearSuffixWithNoDate
date_YearSuffixImplicitWithNoDate

which produce:

(John Doe n.d.-a [Accessed: June 01, 1965]; John Doe n.d.-b [Accessed: June 01, 2065])

I must confess that I do not like this solution very much. Actually I
think that the last disambiguation step (evaluating the style with the
disambiguate condition set to true) was intended exactly to deal with
cases where there is not a year to add a suffix to.

On the other hand, I understand there may be use cases I’m not aware of
(which means nothing, BTW) that call for such behavior. If this is
true the spec should be amended so that the fifth step becomes the
fourth and the year-suffix is also to be applied to the “no date” term
when it is used. Still, what happens when it is not used?
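
To make the point about year-suffix concrete, here is a small Python sketch of suffix assignment that also attaches the suffix to the “no date” term, as in the `n.d.-a` / `n.d.-b` output quoted above. The function names, the input shape (a pre-rendered names string plus an optional year), and the continuation after "z" ("aa", "ab", ...) are assumptions for illustration, not the behavior of any particular processor.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Cite = Tuple[str, Optional[int]]  # (rendered names, year or None): assumed shape


def suffix_label(index: int) -> str:
    """0 -> 'a', 25 -> 'z', 26 -> 'aa' (assumed continuation after 'z')."""
    label = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        label = chr(ord("a") + rem) + label
    return label


def add_year_suffixes(cites: Sequence[Cite]) -> List[str]:
    # Group cites that render identically (same names, same year or no year).
    groups: Dict[Cite, List[int]] = defaultdict(list)
    for index, key in enumerate(cites):
        groups[key].append(index)

    rendered = [""] * len(cites)
    for (names, year), members in groups.items():
        for rank, index in enumerate(members):
            # Only ambiguous cites (groups of two or more) get a suffix.
            suffix = suffix_label(rank) if len(members) > 1 else ""
            if year is None:
                date = "n.d." + (f"-{suffix}" if suffix else "")
            else:
                date = f"{year}{suffix}"
            rendered[index] = f"{names} {date}"
    return rendered


print(add_year_suffixes([
    ("John Doe", None),
    ("John Doe", None),
    ("Smith et al.", 1986),
    ("Smith et al.", 1986),
    ("Roe", 2000),
]))
# ['John Doe n.d.-a', 'John Doe n.d.-b', 'Smith et al. 1986a', 'Smith et al. 1986b', 'Roe 2000']
```

Because every group of identical cites receives distinct suffixes, this step cannot fail, which is why a “disambiguate” condition evaluated after it would never be reached.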

Andrea

I can see how it would result in an earlier termination in some cases.
I’m not sure when I’d find the time for the revisions to the
processor. What do you think, do you prefer that method (i.e. running
the full add/expand cycle on each name as it is added)?

By the way, it is probably better to look at specification.txt in the
citation-style-language/documentation repository on GitHub for
the most recent version of the description for disambiguation. I rewrote
some stuff after the pull request Frank linked to. In rendered form (pulled
straight from the repo):

http://rst.projectfondue.com/api/v1/rst2html/?rst_url=https%3A%2F%2Fraw.github.com%2Fcitation-style-language%2Fdocumentation%2Fmaster%2Fspecification.txt&css_url=http%3A%2F%2Fcitation-style-language.github.com%2Fstyles%2Fcss%2Fscreen.css&output_type=html&callback=&document_output=whole&highlight_style=manni

Rintze

On Mon, Oct 31, 2011 at 7:36 AM, Frank Bennett <@Frank_Bennett> wrote:

Rintze Zelle <@Rintze_Zelle> writes:

By the way, it is probably better to look
at GitHub - citation-style-language/documentation: Citation Style Language documentation
specification.txt for the most recent version of the description for
disambiguation. I rewrote some stuff after the pull request Frank
linked to. In rendered form (pulled straight from the repo):

the discussion is indeed about the impact of this version of the spec
(and the fact that, when add-names fails to disambiguate, the added
names are no longer retained, as they were under the 1.0 specification).

Andrea

Frank Bennett <@Frank_Bennett> writes:

On Mon, Oct 31, 2011 at 11:21 AM, andrea rossato <@andrea_rossato2> wrote:

I can see how it would result in an earlier termination in some cases.
I’m not sure when I’d find the time for the revisions to the
processor. What do you think, do you prefer that method (i.e. running
the full add/expand cycle on each name as it is added)?

I’m not sure, but I think running the full add/expand cycle on each name
as it is added results in fewer names and thus a smaller impact on
the “et-al” settings. For me it is also easier to code – but I’m not
sure, since I didn’t try the other method.

Andrea

That sounds right to me.

Rintze: If the discussion here is clear, would any change to spec be
needed to specify clearly the behavior Andrea has implemented, or did
I just misread (or mis-write) the docs?

@Frank: we had a chat about this very topic on August 2nd, where we seemed
to agree to add the minimum number of names.

In my reading of the spec, this is what is required (emphasis mine):

“all-names”
Name expansion has the dual purpose of disambiguating cites and names. All
rendered ambiguous names, in both ambiguous and unambiguous cites, are
subject to disambiguation. Each name is progressively transformed until
it is disambiguated. Names that can not be disambiguated remain in their
original form.

“by-cite”
Default. As “all-names”, but the goal of name expansion is limited to
disambiguating cites. Only ambiguous names in ambiguous cites are affected,
and disambiguation stops after the first name that eliminates cite
ambiguity.
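
As an illustration of the “all-names” wording quoted above, the following Python sketch transforms each name step by step until it no longer collides with any other rendered name. The name model, the three forms, the function names, and the given names "John", "Anna", and "Jane" are assumptions for illustration; the “by-cite” variant, which would restrict this to ambiguous cites and stop at the first name that removes the ambiguity, is not shown.

```python
from typing import Dict, Iterable, Tuple

Name = Tuple[str, str]  # (given, family): an assumed, simplified name model
FORMS = ("short", "initial", "full")  # assumed transformation steps, in order


def render_name(name: Name, form: str) -> str:
    given, family = name
    if form == "short":
        return family
    if form == "initial":
        return f"{given[0]}. {family}"
    return f"{given} {family}"


def expand_all_names(people: Iterable[Name]) -> Dict[Name, str]:
    """Pick, for each person, the first form that no other person shares."""
    distinct = set(people)
    result: Dict[Name, str] = {}
    for person in distinct:
        others = distinct - {person}
        # Default: a name that cannot be disambiguated keeps its original form.
        result[person] = render_name(person, "short")
        for form in FORMS:
            mine = render_name(person, form)
            if all(render_name(other, form) != mine for other in others):
                result[person] = mine
                break
    return result


forms = expand_all_names([
    ("Thomas", "Smith"), ("Ted", "Smith"),
    ("John", "Doe"), ("Anna", "Doe"),
    ("Jane", "Roe"),
])
print(forms[("Thomas", "Smith")], "|", forms[("John", "Doe")], "|", forms[("Jane", "Roe")])
# Thomas Smith | J. Doe | Roe
```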


Rintze

Yup, that’s clear. Sorry for being lazy; I was about to doze off.