author-only citations (was: do we need a CSL (citation) API?)

Changing the subject …

Re the pandoc discussions, there seems to be acceptance of the
"author-only" use case.

I have a hard time following that discussion, so will just ask: what
the hell is an “author-only” citation? It seems a contradiction in
terms.

Bruce

It may indeed be a contradiction in terms. It would be a nice one,
though, since it helps to solve a problem.

The problem is to have a single document source which will
consistently produce a valid output with (possibly) every CSL citation
style.

And hence the issue we are facing: suppose we are talking about some
author’s work, and not just referring to it.

In pandoc we could express that with:

Doe [-@item1] said that...

That is to say, we hard-code the author’s name and suppress it (by
setting the suppress-author bit) in the following citation.

That would be rendered, in an in-text author-date style:

Doe (2005) said that...

In a footnote style, that would be:

Doe(1) said that...

(1) A book, 2005.

And what about a purely numeric style? Suppose we decide that, in such
styles, “suppress-author” is ignored (or has some effect on
the formatting). Then it would render as:

Doe [1] said that…

The problem here is ‘Doe’. In such a style that may be undesirable.
Moreover, since citation numbers are generated on the fly, how would
you refer to that reference, if you wanted to use numbers?

Hence the idea: we produce ‘Doe’ too, with an “author-only” citation
and a new rule: in a purely numeric style “suppress-author” will
suppress the output and an “author-only” citation will become a
citation number without formatting.

See here for a hopefully more comprehensible example:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#automating-text-insertions

Since even the author name is produced by the citeproc, switching
between different citation styles will always produce a consistent
output. Well, not really always…
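The rule above can be sketched as a small function, assuming three style classes ("author-date", "note", "numeric") and a flat item record. renderCite and its field names are hypothetical, not the citeproc-js or citeproc-hs API, and layout decorations such as superscripting are not modeled. It shows how an author-only cite followed by a suppress-author cite of the same item stays consistent when the style changes:

```javascript
// Hypothetical sketch of the proposed "author-only" / "suppress-author" rule.
// renderCite and the item fields are illustrative names, not a real API.
function renderCite(styleClass, item, flags = {}) {
  const authorOnly = Boolean(flags.authorOnly);
  const suppressAuthor = Boolean(flags.suppressAuthor);

  if (styleClass === "numeric") {
    // Proposed rule: "suppress-author" suppresses the output entirely,
    // while "author-only" becomes the bare citation number.
    if (suppressAuthor && !authorOnly) return "";
    return `[${item.number}]`;
  }

  // Author-date and note styles: "author-only" yields just the name.
  if (authorOnly) return item.author;

  if (styleClass === "note") {
    // Footnote text only; placing the note mark is left to the caller.
    return suppressAuthor
      ? `${item.title}, ${item.year}.`
      : `${item.author}, ${item.title}, ${item.year}.`;
  }

  // In-text author-date style.
  return suppressAuthor ? `(${item.year})` : `(${item.author} ${item.year})`;
}

// An author-only cite followed by a suppress-author cite of the same item.
const item = { author: "Doe", title: "A book", year: 2005, number: 1 };
for (const style of ["author-date", "numeric"]) {
  const parts = [
    renderCite(style, item, { authorOnly: true }),
    renderCite(style, item, { suppressAuthor: true }),
    "said that...",
  ];
  console.log(parts.filter(Boolean).join(" "));
}
// author-date: Doe (2005) said that...
// numeric:     [1] said that...
```

In the numeric case the hard-coded ‘Doe’ disappears, because the author part itself now comes from the processor.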

There are some problems too, as the pandoc discussion demonstrates. In
pandoc you will be able to write:

Doe [-@item1; but see also @item2] said that...

That will be rendered, in an author-date style, as:

Doe (2005, but see also Roe 2004) said that...

and in a footnote style as:

Doe(1) said that...
(1) 2005, but see also Roe 2004.

What should happen in a numeric style?

Doe, but see also [2], said that...

I think.

Anyway, in pandoc, we are going to introduce a “textual citation”, so
that you could also write:

@item1 said that...

which will produce:

Doe (2005) said that...

Doe(1) said that...
(1) A Book, 2005.

Reference [1] said that...

The citeproc, though, will translate:

@item1 said that…

into:

[+@item1] [-@item1] said that…

where ‘+@’ means “set the ‘author-only’ bit”.

Obviously the first one, [+@item1], is a special citation (since it
occurs before [-@item1]), and must not affect the positioning of the
other citations ([-@item1] is not ibid, and may even be the “first”,
right?).
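A simplified sketch of this expansion, and of position tracking that skips the special author-only cite. expandTextual and assignPositions are hypothetical helpers; real CSL positions are computed per citation rather than per item, and “ibid” has further conditions that are not modeled here:

```javascript
// Hypothetical sketch: expand a textual citation into the pair
// [+@id] [-@id] and assign simplified CSL-style positions.
function expandTextual(id) {
  return [
    { id, authorOnly: true },     // [+@id]: the in-text author part
    { id, suppressAuthor: true }, // [-@id]: the ordinary cite, author suppressed
  ];
}

// Simplified per-item position tracking (CSL works per citation cluster).
function assignPositions(cites) {
  const seen = new Set();
  let lastId = null;
  return cites.map((cite) => {
    // Author-only cites are special: they get no position and do not
    // count as an earlier mention of the item.
    if (cite.authorOnly) return { ...cite, position: null };
    let position;
    if (!seen.has(cite.id)) position = "first";
    else if (cite.id === lastId) position = "ibid";
    else position = "subsequent";
    seen.add(cite.id);
    lastId = cite.id;
    return { ...cite, position };
  });
}

// "@item1 said that...", later "@item2", then "@item1" again.
const cites = [
  ...expandTextual("item1"),
  ...expandTextual("item2"),
  ...expandTextual("item1"),
];
console.log(assignPositions(cites).map((c) => c.position));
// positions: null, first, null, first, null, subsequent
```

Because the author-only cite is skipped, the [-@item1] that follows it is still “first”, not “ibid”.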

Does all this make sense?

Andrea

The problem here is ‘Doe’. In such a style that may be undesirable.

OK, let me stop here, since this appears to be the crux of the matter.
Under which conditions would this be “undesirable”? Having a hard time
imagining a practical use case.

Moreover, since citation numbers are generated on the fly, how would
you refer to that reference, if you wanted to use numbers?

Not following here.

Hence the idea: we produce ‘Doe’ too, with an “author-only” citation
and a new rule: in a purely numeric style “suppress-author” will
suppress the output and an “author-only” citation will become a
citation number without formatting.

See here for a hopefully more comprehensible example:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#automating-text-insertions

Still confused. Sorry. As I say above, I think I’m just not
understanding the use case.

Bruce

I have a hard time following that discussion, so will just ask: what
the hell is an “author-only” citation? It seems a contradiction in
terms.

It’s a familiar issue that has come up before in the Zotero forums –
the request by LaTeX users for a “\citet” command equivalent.

It turns out to be not just a matter of mirroring BibTeX or saving a
little typing effort. A common scientific citation style in China
uses a reference number instead of the author name. So where in-text
author date would look like this:

Smith (2000) says so, and others agree. (Brown 1999)

The pure numeric style will look like this (expressing in English):

Source [1] says so, and others agree. ^[2]^

To switch seamlessly between the two styles, citeproc needs to have
control over both the author name and the citation, so the
"suppress-author" toggle is not sufficient. An “author-only” toggle
is the nickname used in the pandoc discussion for citations that
involve inserting an author name, or a placeholder for the author
name, into the document text.

As proof that the use case is a real one, a Zotero user posted a
sample paper that uses the pure-numeric style here:

http://www.box.net/shared/fpgyyhjb4b

Reference marks are circled in red in the sample. 文献 in Chinese means
“source” or “reference”. Note the selective use of superscripting.

The related thread is here:

http://forums.zotero.org/discussion/6703/%3FFocus%3D29224#Comment_29417

Frank

It’s a familiar issue that has come up before in the Zotero forums –
the request by LaTeX users for a “\citet” command equivalent.

This is where I’m getting confused. It seems like these are different use cases?

Anyway, moving on …

The pure numeric style will look like this (expressing in English):

Source [1] says so, and others agree. ^[2]^

So what does the “1” denote in this example? What you say above seems
to suggest it is a number to represent the author, rather than the
source. Is that what you mean to say?

If it is, then I don’t see how CSL has any way to support that. If
it’s not, then I don’t see how this is a citet case.

Bruce

“Source [1]” actually represents both the author and the trailing cite
fragment. CSL can definitely support this, if you think of the output
as consisting of two elements: a text-document segment and a citation
segment. Currently, we only return a citation segment (the string
that gets inserted as the in-text or the footnote citation, depending
on the style). To implement this, we would extend the processors to
return both segments.

It may help to change the nomenclature, and call it "author-in-text"
instead of “author-only”. A processor call with the "author-in-text"
toggle set would return string values in both segments. It would be
up to the calling application (the Zotero plugin, or the pandoc
processor) to set the toggle, and to do something sensible with the
returned strings.

In the experimental implementation, generating the strings requires
two runs of the processor in this case (one for the author string, and
one for the citation or citation fragment). That is I think what
makes things confusing; there is in fact only one citation (consisting
of two elements), and if it is generated with a single API command, it
will be easier to understand, and much easier for calling applications
to handle.

Expressed in JavaScript, the return values would be something like this:

On Mon, Nov 8, 2010 at 6:11 AM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

On Sun, Nov 7, 2010 at 3:54 PM, Frank Bennett <@Frank_Bennett> wrote:

On Mon, Nov 8, 2010 at 3:00 AM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

On Sun, Nov 7, 2010 at 11:59 AM, Andrea Rossato <@Andrea_Rossato1> wrote:

On Sun, Nov 07, 2010 at 10:00:08AM -0500, Bruce D’Arcus wrote:

On Sat, Nov 6, 2010 at 11:17 PM, Frank Bennett <@Frank_Bennett> wrote:


in-text author-date style:
  “author-in-text” toggle set:
    {"author-text": "Smith", "citation": "(2000)"}
  “author-in-text” toggle NOT set:
    {"author-text": "", "citation": "Smith (2000)"}

footnote style:
  “author-in-text” toggle set:
    {"author-text": "Smith", "citation": "Book A (2000)"}
  “author-in-text” toggle NOT set:
    {"author-text": "", "citation": "Smith, Book A (2000)"}

pure-numeric style:
  “author-in-text” toggle set:
    {"author-text": "Source [1]", "citation": ""}
  “author-in-text” toggle NOT set:
    {"author-text": "", "citation": "[1]"}

Both citeproc-js and citeproc-hs are able to produce appropriate
output; it’s just a matter of defining an API to invoke the behavior,
and implementing a small wrapper in each processor to invoke the
formatting engine appropriately and return the results in a structured
form.
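A sketch of what such a wrapper might look like, reproducing the return values listed above. It assumes a stand-in runProcessor in place of the real formatting engine and a fixed English term "Source" for the numeric style; citeWithSegments and runProcessor are hypothetical names. The wrapper makes the two processor runs internally, so the calling application issues a single call per citation:

```javascript
// Hypothetical stand-in for one run of the formatting engine,
// hard-coded to reproduce the example values above.
const SOURCE_TERM = "Source"; // assumed localized term for numeric styles

function runProcessor(styleClass, item, flags = {}) {
  const { authorOnly = false, suppressAuthor = false } = flags;
  if (styleClass === "numeric") {
    if (authorOnly) return `${SOURCE_TERM} [${item.number}]`;
    return suppressAuthor ? "" : `[${item.number}]`;
  }
  if (authorOnly) return item.author;
  if (styleClass === "note") {
    return suppressAuthor
      ? `${item.title} (${item.year})`
      : `${item.author}, ${item.title} (${item.year})`;
  }
  // In-text author-date style.
  return suppressAuthor ? `(${item.year})` : `${item.author} (${item.year})`;
}

// One API call; with the toggle set, the wrapper runs the engine twice
// and returns both segments.
function citeWithSegments(styleClass, item, authorInText = false) {
  if (!authorInText) {
    return { "author-text": "", citation: runProcessor(styleClass, item) };
  }
  return {
    "author-text": runProcessor(styleClass, item, { authorOnly: true }),
    citation: runProcessor(styleClass, item, { suppressAuthor: true }),
  };
}

console.log(
  citeWithSegments("numeric", { author: "Smith", title: "Book A", year: 2000, number: 1 }, true)
);
```

What the calling application does with "author-text" (inserting it into the sentence) and with "citation" (in-text or footnote) stays its own decision, as described above.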

Yes they are, but from a practical point of view, \citet and citeproc’s
present use of the “author-only” bit in citation items lead to what
natbib users expect. Or at least this is my understanding of the
pandoc discussion.

Andrea

ps: maybe “author-only” is an inappropriate label?

Hang on …

“Source [1]” actually represents both the author and the trailing cite
fragment.

You all keep skipping past the bit that I’m tripping up on. :-)

I don’t understand your response here. How does it represent “the
author” and what do you mean by the “trailing cite fragment”?

To break this down as far as possible, what do the following fragments refer to?

“Source” (document text which isn’t our concern? some generic prefix
for a citation?)
"[1]" (is it a number for the reference, the same as you’d see in any
numeric style? something else?)

Bruce

That might be.

The citet/citep distinction in natbib is, as I understand it,
essentially a distinction between, respectively, an active statement
where the author name is a subject of the sentence the citation is
enclosed in, and a passive statement where the author subject is
unstated.

The suppress-author flag that you often find in Zotero, Endnote, etc.
is a way to achieve the active form, but via an admittedly somewhat
hack-ish solution of entirely removing the author output.

So to simplify, we have:

  1. active, suppress-author
  2. active, print author outside the citation marker (citet)
  3. (standard) passive (citep)

I still don’t know what to say about the case Frank brought up,
because I don’t understand it.

Part of the complication is how to come up with a solution that
balances the need to switch styles radically (which I still think
important), and also support multi-reference citations.

Bruce

To break this down as far as possible, what do the following fragments refer to?

“Source” (document text which isn’t our concern? some generic prefix
for a citation?)
"[1]" (is it a number for the reference, the same as you’d see in any
numeric style? something else?)

“Source”: localized term, inserted implicitly by the processor.
"[1]": reference number, same as other numbered styles.

“Source”: localized term, inserted implicitly by the processor.

So what I refer to as a “generic prefix for a citation.”

“[1]”: reference number, same as other numbered styles.

OK, so then how is this connected to the citet case? And how is it any
different than any generic numeric style?

Is it that this prefix only gets inserted on a per-citation (or
reference?) basis?

Bruce

Yes, that’s one difference. Another is that it doesn’t carry the
formatting decorations applied via cs:layout (i.e. it’s not
superscripted, as noted in the example above).
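A small sketch of those two differences, assuming a term table keyed by locale and the ^...^ superscript notation used earlier in this thread to stand for the cs:layout decoration. numericCite and SOURCE_TERMS are hypothetical; the English and Chinese terms come from the example above, and the spacing in the Chinese form is a guess:

```javascript
// Hypothetical sketch for the pure-numeric style discussed above.
const SOURCE_TERMS = { en: "Source", zh: "文献" }; // assumed locale term table

function numericCite(number, { authorInText = false, locale = "en" } = {}) {
  if (authorInText) {
    // In-text form: localized term plus the plain reference number,
    // without the layout decoration (not superscripted).
    const term = SOURCE_TERMS[locale] ?? SOURCE_TERMS.en;
    return `${term} [${number}]`;
  }
  // Ordinary citation: the layout decoration applies (superscript here).
  return `^[${number}]^`;
}

console.log(
  `${numericCite(1, { authorInText: true })} says so, and others agree. ${numericCite(2)}`
);
// Source [1] says so, and others agree. ^[2]^
```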

So the similarity with citet is just that citeproc is inserting some
content in the body of the sentence (outside the citation).

In this sense, like citet, it’s more a nice-to-have than a necessity?

And the difference between the two cases is that in one the content is
a fixed string, while in the other it depends on the reference
metadata (author)?

I will conclude here by saying most of my writing is in author-date
formats, so I do find myself requiring citet-like rendering. I’d just
always had the sense that it introduces difficulties given other
design goals and choices, and that I don’t mind writing author names
myself and suppressing it in the citation. If there’s a way to prove
me wrong, then that’d be great. But I would want the UI, whether GUI
in Zotero/OOo or textual format in markdown/pandoc, to be as simple
and intuitive as possible.

Bruce

In pandoc we are headed towards an explicit textual citation:

@item1 said that...

as opposed to normal citations:

This is a citation [see @item1, p. 4, etc; see also Doe’s opposite
opinion in -@item2, chap. 3].

The first “textual citation” will be passed to the citeproc as a
single cite with 2 identical citations, the first one with the
“author-only” bit set, the second with the “suppress-author” bit set.
(We are discussing whether to allow multiple citations in the textual
syntax.) The first one will be used by pandoc in place of “@item1” in
the text body, while the second one will be placed in-text or in a
footnote according to the style class.

Normal citations may have the author suppressed but there is no syntax
to set the “author-only” flag, which remains hidden.

The difference from the citeproc-js API is that, while citeproc-js
requires citation_items input like this:

[
  [
    {
      "author-only": 1,
      "id": "ITEM-1", "prefix": "ciao"
    }
  ],
  [
    {
      "id": "ITEM-1",
      "suppress-author": 1, "prefix": "ciao"
    }
  ]
]

I think citeproc-hs will probably require:

[
  [
    {
      "author-only": 1,
      "id": "ITEM-1"
    },
    {
      "id": "ITEM-1",
      "suppress-author": 1
    },
    {maybe other citations??}
  ]
]

If John comes up with a syntax for multiple textual citations, the
first array may include other subsequent citations.
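The two input shapes carry the same items; only the grouping differs. A sketch of converting between them, assuming the field names shown above (toTwoCitations and toOneGroup are hypothetical helpers, not part of either processor):

```javascript
// Hypothetical helpers converting between the two input shapes above.
// Split a single group whose first item carries "author-only" into the
// two-citation shape: [[author-only item], [remaining items]].
function toTwoCitations(group) {
  const [head, ...rest] = group;
  if (head !== undefined && head["author-only"]) {
    return [[head], rest];
  }
  return [group]; // ordinary citation: leave it as one group
}

// The reverse: merge the citations back into one group, keeping order.
function toOneGroup(citations) {
  return citations.flat();
}

const hsInput = [
  { "author-only": 1, id: "ITEM-1" },
  { id: "ITEM-1", "suppress-author": 1 },
];
console.log(JSON.stringify(toTwoCitations(hsInput)));
// [[{"author-only":1,"id":"ITEM-1"}],[{"id":"ITEM-1","suppress-author":1}]]
```

Any further textual citations in the group would simply stay in the second array.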

Frank, what would you think of such an approach?

Andrea

What about having just one item, with “author-only” (or, more
descriptive of the concept I’m putting forward with this note,
“author-in-text”), by itself, without the second item? When the
processor sees “author-in-text” on the first item, it should be able
to just run itself twice with that data. It would make life simpler
for the calling application.

The problem would be the consistency of returned data. With an
"author-in-text" citation you’ll get back a first part to be placed
in-text and a second part which is class dependent.

It seems to me that two calls would make life simpler for the
calling application. Since in Haskell citation_items are
represented, as in our JSON format, as lists, it would be nice to keep
some symmetry between the list of items and the evaluated output, and
thus keep the relation one call / one piece of output. The calling
application then checks the citation_item type (“author-in-text”) to
decide where to place the corresponding evaluated output.

Andrea

So the user will type one item in the document, and the pandoc layer
will figure out that it should be handled with two calls?

I think we are going back to the single item with an “author-in-text”
bit you suggested. The symmetry I was talking about cannot be
guaranteed due to citation collapsing.

This is the idea: for each citation group citeproc-hs returns a list
of output data, each of which may correspond to a single cited item. If
the citation group starts with an “author-in-text” citation, then the
head of the returned list of rendered output is to be shown in-text.
The rest is the rendered citation group.
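On the calling side, this amounts to peeling off the head of the returned list. A sketch, assuming the processor returns an array of rendered strings per citation group and that items use the “author-in-text” field name; placeOutput is a hypothetical helper, and the sample output list is made up to show collapsing (two Roe items rendered as one string), which is why outputs cannot simply be zipped with items:

```javascript
// Hypothetical calling-application helper for the last proposal.
function placeOutput(items, rendered) {
  const startsInText = items.length > 0 && Boolean(items[0]["author-in-text"]);
  if (startsInText && rendered.length > 0) {
    const [inText, ...citation] = rendered;
    return { inText, citation };
  }
  return { inText: "", citation: rendered };
}

// Outputs do not line up one-to-one with items: ITEM-1 yields both
// "Doe" and "2005", while ITEM-2 and ITEM-3 collapse into one string.
console.log(
  placeOutput(
    [{ id: "ITEM-1", "author-in-text": 1 }, { id: "ITEM-2" }, { id: "ITEM-3" }],
    ["Doe", "2005", "Roe 2004, 2006"]
  )
);
```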

Andrea