disambiguate-add-date option?

I don’t think you’re comparing the output, I think you’re comparing
the original data. Then there is less ambiguity, right? I agree that
the maximum number of authors can’t be determined, but couldn’t you just
run a loop to find out how many there are?

The problem is that you don’t know which part of the original data
will be used by that specific style. You could keep track of it by
consuming the input, as Stephan suggested, though I think that is
going to make the code quite complex. But I might be wrong.

You can look here to see how Simon does it in Zotero:
https://www.zotero.org/trac/browser/extension/branches/1.0/chrome/content/zotero/xpcom/cite.js

It’s true that your example styles don’t work in Zotero, but I suspect
they could be fixed to work–I just don’t have time to play with them
right now–maybe during the week.

As I said I discovered the issues my examples point out by reading the
source code you are linking to. Maybe you are right and you can fix
the problems easily.

What I wanted to stress is that you cannot assume, as the code does,
that since a citation has a date field set, and the style is using the
disambiguate-add-year-suffix option then the citation is disambiguated
by applying this rule. It may be the case that the style is not
actually using the date for that specific reference type, as is the
case with my 3rd style example.

Andrea

My point was that the number is low and only grows with the number of
authors of a work, which always is a low number. So I don’t see a reason
to worry about performance here.

Why not? The problem is not just the number of authors, but the
complexity of the style too. You need to re-evaluate the style to
re-generate the citation. It is the combination of the complexity of
the style and the complexity of the citation that gives you the
complexity of re-generating a citation. I’m not really familiar with
complexity calculation, but shouldn’t this stuff be exponential (or
maybe just quadratic)?

But you know whether the content of a certain variable has become part
of the output, at least if you keep record. Author-year collapsing can
only be reasonably done if for each cited work in a multiple work
citation the same variables are printed and the printed variables, with
the exception of the year, are identical. Additionally, one should
require that the individual citations would generate the same output if
the years were identical. Under these circumstances one can collapse the
citations by printing any one of the citations and substituting the year
for a comma-separated list of years.
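The collapsing rule above can be sketched roughly as follows. This is only an illustration, not code from any CSL implementation: the `Printed` record (a map from variable name to the string the style actually printed) and the function name `collapse_years` are assumptions made for the example. It checks the first two conditions (same printed variables, identical values except the year) and does not attempt the additional check that the cites would render identically if the years were equal.

```python
from typing import Dict, List, Optional

# Hypothetical record kept while rendering: variable name -> printed string.
Printed = Dict[str, str]


def collapse_years(cites: List[Printed], year_var: str = "year") -> Optional[Printed]:
    """Return one collapsed cite, or None when collapsing is not safe."""
    if not cites or year_var not in cites[0]:
        return None
    first = cites[0]
    for cite in cites[1:]:
        # Every cite must have printed exactly the same set of variables.
        if cite.keys() != first.keys():
            return None
        # Everything except the year must be identical.
        for name, value in cite.items():
            if name != year_var and value != first[name]:
                return None
    # Print any one of the cites, with the year replaced by a list of years.
    collapsed = dict(first)
    collapsed[year_var] = ", ".join(cite[year_var] for cite in cites)
    return collapsed
```

For example, two cites that both printed only an author "Doe" and a year would collapse into one cite with the year "1999, 2001", while a pair in which only one cite printed a title would be left alone.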

consuming the (cached) input could be another solution. I think it
would be more complicated though. There’s some code for consuming
variables in the Haskell implementation, but I don’t like this
approach.

Moreover, by using my idea I can apply some generic programming
techniques[1], and I like this idea quite a lot (I’ve been struggling
for quite some time in search of a problem that could be addressed
with an SYB approach ;-)).

Andrea

[1] Computer Science - Vrije Universiteit Amsterdam

Andrea Rossato wrote:
On Mon, Aug 04, 2008 at 10:02:52AM +0200, Stephan Tolksdorf wrote:

My point was that the number is low and only grows with the number of
authors of a work, which always is a low number. So I don’t see a reason
to worry about performance here.

Why not? The problem is not just the number of authors, but the
complexity of the style too. You need to re-evaluate the style to
re-generate the citation. It is the combination of the complexity of
the style and the complexity of the citation that gives you the
complexity of re-generating a citation. I’m not really familiar with
complexity calculation, but shouldn’t this stuff be exponential (or
maybe just quadratic)?

I don’t see anything exponential and even if you have O(n^2) or worse
performance you don’t necessarily run into problems as long as you can
assume that n is small enough.

Stephan

Might we confirm this assumption?

I’m not sure if this is helpful, but …

Thinking out loud about how I might do this in XSLT, what if an
intermediate representation included both the shortened version and
other versions, such that you didn’t need to re-generate? E.g. if we
break down “Doe et al.” into:

  1. the shortened and “final” label string (“Doe et al”)
  2. a string that concatenates the author names in full
    (“doe,jane;smith,stephen;sanders,susan”); can be used both for
    sorting, and for comparison to determine if further disambiguation
    might be needed*

So, for example, if you have more than one item with the author sort
string “doe,jane;smith,stephen;sanders,susan”, then you know you need
to group them, and, depending on the style, append a suffix to each.
This is what I do in my XSLT code*.
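A rough Python sketch of the grouping idea described above (not the XSLT code itself). The names `author_sort_key` and `year_suffixes` are made up for the example, and grouping on the year as well as the author string is an assumption about how a year suffix would be applied.

```python
from collections import defaultdict
from string import ascii_lowercase
from typing import Dict, List, Tuple

Name = Tuple[str, str]  # (family, given)


def author_sort_key(names: List[Name]) -> str:
    """Concatenate full names, e.g. 'doe,jane;smith,stephen;sanders,susan'."""
    return ";".join(f"{family},{given}".lower() for family, given in names)


def year_suffixes(items: Dict[str, Tuple[List[Name], str]]) -> Dict[str, str]:
    """Map item id -> year suffix ('' when the item is already unique).

    Assumption: items collide when both the author sort string and the
    year are equal. Limited to 26 items per group in this sketch.
    """
    groups = defaultdict(list)
    for item_id, (names, year) in items.items():
        groups[(author_sort_key(names), year)].append(item_id)
    suffixes = {}
    for ids in groups.values():
        for index, item_id in enumerate(ids):
            suffixes[item_id] = ascii_lowercase[index] if len(ids) > 1 else ""
    return suffixes
```

The same key string serves both for sorting and for the equality test, which is the point of keeping it in the intermediate representation.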

What about adding names? Let’s say we have two references:

“Doe et al” => “doe,jane;smith,stephen;sanders,susan”
“Doe et al” => “doe,john;smith,stephen;sanders,susan”

And let’s further say the given names need to be initialized.

You know by comparison that you need to disambiguate. But how? We not
only need to know what names to add, but how.

How about we first save the formatting rules somewhere so we don’t
need to look at the style again?

We then iterate through the names until we get to a pair that are not
the same. In this case, that would be “doe,jane” and “doe,john”. Since
these are in fact the first authors, all we need to do is add the
given names, formatted according to the style.
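The iteration described above might look something like this in Python. It is a sketch under assumptions: `first_difference` and `initialize` are hypothetical helpers, and the initializing rule shown (first letter plus a period) stands in for whatever the style actually specifies.

```python
from typing import List, Optional, Tuple

Name = Tuple[str, str]  # (family, given)


def first_difference(a: List[Name], b: List[Name]) -> Optional[int]:
    """Index of the first position where the two name lists differ."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index
    if len(a) != len(b):
        # One list is a prefix of the other.
        return min(len(a), len(b))
    return None  # identical lists: adding names cannot help


def initialize(given: str) -> str:
    """Stand-in formatting rule: 'Jane Mary' -> 'J. M.'"""
    return " ".join(part[0] + "." for part in given.split())
```

One thing the example data shows: “Jane” and “John” both initialize to “J.”, so for this particular pair the initialized form would still collide and the full given name would be needed.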

Bruce

  * I have a function in my XSLT code that constructs this string and
    inserts it into an intermediate representation, though I don’t have
    the more complex disambiguation options implemented.

My point was that the number is low and only grows with the number of
authors of a work, which always is a low number. So I don’t see a reason
to worry about performance here.

Why not? The problem is not just the number of authors, but the
complexity of the style too. You need to re-evaluate the style to
re-generate the citation.

Might we confirm this assumption?

I’m not sure if this is helpful, but …

So far I’m not able to find a counter-example. So I’d call it at least
a grounded conjecture.

What about adding names? Let’s say we have two references:

“Doe et al” => “doe,jane;smith,stephen;sanders,susan”
“Doe et al” => “doe,john;smith,stephen;sanders,susan”

And let’s further say the given names need to be initialized.

You know by comparison that you need to disambiguate. But how? We not
only need to know what names to add, but how.

How about we first save the formatting rules somewhere so we don’t
need to look at the style again?

We then iterate through the names until we get to a pair that are not
the same. In this case, that would be “doe,jane” and “doe,john”. Since
these are in fact the first authors, all we need to do is add the
given names, formatted according to the style.

I think your encoding is not enough, though. You may need to take into
account name-as-sort-order and the related sort-separator, which may
be hidden down a long chain of macro calls.

The solution I’m coding (it is not working yet and I’m not sure it
will) is this:

In the output I have a type, a list (let’s say of ‘contributors’) that I
can match on, which is made up of all the possible disambiguated
forms.

That is to say, I have a list of the possible outcomes of adding a
name, a second name, etc. Moreover, for each name I have the list of
the possible outcomes, given the style, of the given-name
disambiguation rule.

For instance:
FC [FN "Rossato, A. " ["Rossato, Andrea"], S ", ", FS "et al."]
   [[FN "Rossato, A. " ["Rossato, Andrea"], S ", ", FN "Pascuzzi, G. " ["Pascuzzi, Giovanni"], S ", ", FS "et al."],
    [FN "Rossato, A. " ["Rossato, Andrea"], S ", ", FN "Pascuzzi, G. " ["Pascuzzi, Giovanni"], S ", & ", FN "Caso, R. " ["Caso, Roberto"]]]

After evaluating the style I will search for citations with the same
FC (FormatedContributors) and the same formatted year and I will start
disambiguating them till a non colliding version is found.
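To make the search step concrete, here is a small Python sketch of the idea (the implementation discussed above is in Haskell and, as noted, not yet working, so this is not a transcription of it). It assumes each cite carries a non-empty list of pre-rendered contributor strings, shortest first, plus its formatted year. The function name and data layout are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# cite id -> (candidate contributor strings, shortest first; formatted year)
Cites = Dict[str, Tuple[List[str], str]]


def disambiguate_contributors(cites: Cites) -> Dict[str, str]:
    """Pick, for each cite, the shortest candidate that no longer collides."""
    result = {cid: forms[0] for cid, (forms, _year) in cites.items()}
    groups = defaultdict(list)
    for cid, (forms, year) in cites.items():
        # Cites collide when the default form and the year are both equal.
        groups[(forms[0], year)].append(cid)
    for ids in groups.values():
        if len(ids) < 2:
            continue
        depth = max(len(cites[cid][0]) for cid in ids)
        for level in range(1, depth):
            # A cite with fewer candidates stays at its longest form.
            forms = [cites[cid][0][min(level, len(cites[cid][0]) - 1)] for cid in ids]
            if len(set(forms)) == len(forms):
                result.update(zip(ids, forms))
                break
        # If no level separates the group, the default forms are kept and
        # some other rule (for example a year suffix) has to take over.
    return result
```

Because the candidates are computed once while evaluating the style, the search itself never has to look at the style again.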

Andrea

This is probably true. I’ve created a ticket to correct these issues
in Zotero.
https://www.zotero.org/trac/ticket/1097

Simon

You know by comparison that you need to disambiguate.

No, you do not.

Today I got lost in the complexity of disambiguation and I think
I hit a problem which makes all this disambiguation even more
problematic.

The problem is the locator. The output of a citation with a locator is
not comparable with the citation of the same reference, but without
locator. And you cannot remove the locator from a formatted citation,
since you cannot know what it will look like.

And evaluating a citation with the locator variable unset will not
help you, since the output could differ a lot (think about an <if>
conditional on the locator) and disambiguating this one could be
totally different from disambiguating the citation with the locator.

The only solution I can think of is this one: you disambiguate if two
citations share the same group of names, equally formatted, and the
same year, equally formatted. The problem is that I will disambiguate
even if the citations, which have the same names and year, are already
disambiguated (by the title or whatever else - an edition number?).
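A small Python sketch of that comparison, and of the over-disambiguation it implies. The `Rendered` record and the function name are assumptions for illustration; `rest` stands for everything else the style printed (title, locator, edition number), which the comparison deliberately ignores.

```python
from collections import Counter
from typing import List, NamedTuple


class Rendered(NamedTuple):
    names: str  # formatted group of names
    year: str   # formatted year
    rest: str   # anything else the style printed; never compared


def needs_disambiguation(cites: List[Rendered]) -> List[bool]:
    """Flag every cite whose (names, year) pair occurs more than once."""
    counts = Counter((cite.names, cite.year) for cite in cites)
    return [counts[(cite.names, cite.year)] > 1 for cite in cites]
```

With this rule, two cites rendered as "Doe 2008" are both flagged even when one carries "Title A, p. 3" and the other "Title B" in `rest`, which is exactly the case where the full output was already unambiguous.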

I know that you may think these issues are mostly theoretical since
I’m considering eventualities with a low probability of occurrence.
Nonetheless I think that if an implementation is not able to evaluate
all syntactically correct expressions because of a language feature,
maybe there could be an issue with the language feature itself.

Or maybe I’m just too fed up with this problem and I need to step back
and let it sleep in the back of my mind for a few days. Maybe I’m just
wrong.

Andrea

You know by comparison that you need to disambiguate.

No, you do not.

Sure you do; you know that if specified, you need to add the year
suffix. In author-date styles, that’s the most common and important
one.

There are different kinds of disambiguation here, so it might make
sense to account for that in the code.

Today I got lost in the complexity of disambiguation and I think
I hit a problem which makes all this disambiguation even more
problematic.

The problem is the locator. The output of a citation with a locator is
not comparable with the citation of the same reference, but without
locator. And you cannot remove the locator from a formatted citation,
since you cannot know what it will look like.

So don’t compare the formatted output of the entire citation.

And evaluating a citation with the locator variable unset will not
help you, since the output could differ a lot (think about an <if>
conditional on the locator) and disambiguating this one could be
totally different from disambiguating the citation with the locator.

The only solution I can think of is this one: you disambiguate if two
citations share the same group of names, equally formatted, and the
same year, equally formatted.

That’s basically what I’m suggesting.

The problem is that I will disambiguate
even if the citations, which have the same names and year, are already
disambiguated (by the title or whatever else - an edition number?).

Not following you here.

I know that you may think these issues are mostly theoretical since
I’m considering eventualities with a low probability of occurrence.
Nonetheless I think that if an implementation is not able to evaluate
all syntactically correct expressions because of a language feature,
maybe there could be an issue with the language feature itself.

I don’t think this is really a problem with CSL; it’s a function of
the ugly eccentricities of citation formatting.

Or maybe I’m just too fed up with this problem and I need to step back
and let it sleep in the back of my mind for a few days.

That might make sense.

I hope Stephan will find some time to experiment with this, and maybe
offer some concrete input.

Bruce

The problem is that I will disambiguate
even if the citations, which have the same names and year, are already
disambiguated (by the title or whatever else - an edition number?).

Not following you here.

If two citations have the same group of names and the same year (the
only elements of a formatted citation you can compare), then you must
apply the disambiguation rules, even if the citation style for that
reference type includes other strings that would be enough for
disambiguation.

I know that you may think these issues are mostly theoretical since
I’m considering eventualities with a low probability of occurrence.
Nonetheless I think that if an implementation is not able to evaluate
all syntactically correct expressions because of a language feature,
maybe there could be an issue with the language feature itself.

I don’t think this is really a problem with CSL; it’s a function of
the ugly eccentricities of citation formatting.

Which could also mean that the above solution is the best we can
achieve.

Andrea

I think the disambiguate conditional should work here, at least in
theory, although I haven’t tested this particular use of it with the
Zotero code.

Simon

Since the date should only be added if there are multiple interviews
with the same subject and interviewer, there should be something like
a disambiguate-add-date option in CSL. It would be relevant to letters
with the same recipient and sender but different dates. Would it be
possible to add this option?

This makes sense, but there’s a little problem related to the previous
discussion with Andrea: how would we specify the formatting of the
date?

I’m not really clear: would the disambiguate prefix on the conditional
work? I wasn’t the one who added the following, and so I’m not really
clear:

If text inside an <if disambiguate="true"> block can be used to
differentiate two otherwise identical citations, it will be added.
If the citations remain identical after its addition, it will not
be added.

I think the disambiguate conditional should work here, at least in
theory, although I haven’t tested this particular use of it with the
Zotero code.

Does that mean we don’t need an option then? MLA style uses this code
below, and it seems to work:

Best,
Elena

Never mind. This doesn’t work:

     <else-if position="subsequent">
       <group delimiter=", ">
         <text macro="contributors-short"/>
         <text macro="title-short"/>
         <choose>
           <if disambiguate="true">
             <text macro="issued"/>
           </if>
         </choose>
         <text macro="point-locators-subsequent"/>
       </group>
     </else-if>

This theoretically should add a date after title if the citation is
the same, but it doesn’t.
Best,
Elena