do we need a CSL (citation) API?

I forget if I posted a note about the discussion of markdown syntax
extensions for more citations on the pandoc list, but I think it raises
a bigger question: should we be defining a CSL API?

That could have some benefits short term, in the sense of clarifying
the syntax issues in pandoc, but may have longer term benefits (like
being able to use the markdown/pandoc format in other contexts for
processing or, longer term, fixing some of the big interop problems
I’ve been ranting about forever).

Just a thought …

Bruce

This is a recurring theme, and my opinion is that yes, we need it,
even though there’s no hurry: we have citeproc-js, and the
"discretionary" group of tests.

BTW, citeproc-hs passes 5 out of 6 of those tests. The missing one is
discretionary_CitationNumberAuthorOnlyThenSuppressAuthor: I needed
Frank’s explanation to understand it. Well, I know it is also
documented in the citeproc-js manual… :-)

A documented API could also be useful to underline CSL’s
expressiveness, thus avoiding the need to refer to bibtex as a model.

Anyway, the fact that we share a common test-suite, which implies the
adoption of a common format (JSON) for the input, forces us to share
the API, so, at least for now, there’s no strict necessity for such a
document. Still, I have some proposals for adding new features
(like the ability to keep a rendered citation from appearing in the
bibliography): if we had a documented API we would also have a more
structured way of making such proposals, and, for people working with
strict time constraints, that could be helpful in the long run.

Andrea

Yeah, it strikes me the pandoc discussion is significantly confused by
having to simultaneously discuss model and syntax. One person is
constantly referring to natbib, and at least you and I keep wanting to
refer to, without any concrete document to reference, an implicit CSL
API.

For now, I’ve created a place-holder:

https://bitbucket.org/bdarcus/csl-schema/wiki/API

Bruce

Great stuff, Andrea. This is wonderful to see.

Re the pandoc discussions, there seems to be acceptance of the
“author-only” use case. This presents two issues: (1) whether CSL
undertakes to handle that case; and if so (2) what the API for
handling it should look like.

The experimental code in citeproc-js handles this with two
separately-invoked runs of the processor, one for the “author-only”
part, and one for the “suppress-author” part. Apart from the
BibTeX/natbib thing with \citet or \citep or whatever it’s called,
this behavior would allow calling applications to handle a common
pure-numeric style used in China, and possibly elsewhere, which sets
“author-only” citations very differently. Instead of the common form

According to Smith (2000), this is correct.

… the pure-numeric style requires this …

According to Source [1], this is correct.

Setting aside implementation details, handling such a style requires two things:

(1) An author name provided by the processor must always appear in
the text; and
(2) The rump citation must appear in-text or in-note, depending on the style.

If this case is to be handled in pandoc, and if the APIs of citeproc-js
and citeproc-hs are to be kept in alignment, this would imply a change
to the current return value of citeproc-js, dividing the citation
returned by all calls into two segments: a text-insert segment, and
a citation-insert segment.

That shouldn’t cause any serious difficulties. Applications that
don’t yet handle the “author-only” case (Zotero, Mendeley) would only
need to adjust their code to pick up the citation-insert segment of
the return.
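
To make the proposed split concrete, here is a minimal sketch of what a two-segment return value might look like. It is not citeproc-js code: the class name CitationResult, its field names, and the two helper functions are all hypothetical, chosen only to mirror the "text-insert" and "citation-insert" segments described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationResult:
    """Hypothetical two-segment return value (names are illustrative)."""
    text_insert: str      # author name that must appear in the running text
    citation_insert: str  # the rump citation, set in-text or in a note


def legacy_output(result: CitationResult) -> str:
    """What an application without "author-only" support would pick up."""
    return result.citation_insert


def textual_output(result: CitationResult) -> str:
    """Join both segments for an in-text style, skipping an empty one."""
    parts = [result.text_insert, result.citation_insert]
    return " ".join(p for p in parts if p)


# An "author-only" style call: the name goes in the text, the rest follows.
smith = CitationResult(text_insert="Smith", citation_insert="(2000)")
sentence = "According to %s, this is correct." % textual_output(smith)

# An ordinary call: nothing to insert in the text, the whole cite is the rump.
plain = CitationResult(text_insert="", citation_insert="(Smith 2000)")
```

Under these assumptions, an application that ignores the text-insert segment keeps working unchanged for ordinary citations, since legacy_output(plain) is the full cite.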

If that were accepted, the next step would be to consider what the
processor call for the case above should look like and how it should
be interpreted.

Frank

In pandoc this is going to be handled with two separate runs of the
processor too, so I do not understand why citeproc-js should change
its return value (with a warning: I’m far from being familiar with
citeproc-js internals).

That is to say, (1) is provided by the first run, a citation with the
“author-only” bit set, while (2) should be provided by the second run,
a citation with the “suppress-author” bit set. In a purely numeric
style (1) would be <the capitalized term “reference” + the number
without formatting> and (2) nothing. In a note style (2) would be the
footnote.

I’m adopting the two-separate-runs approach you took because it makes
sense, since (2) can be set to either be rendered in-text or in a
footnote.

With reference to the pandoc markdown syntax, at the syntax level this
is going to be a single in-text citation (we are not going to generally
support “author-only” citations for the time being) and not two
separate citations. Internally, something like “@item1” will be
translated into two separate citations, the first one with
“author-only” set, the second one with “suppress-author” set (that
could be our definition of a “textual citation”). Using two runs of the
processor makes it easier to switch between in-text and note styles and
produce coherent output.
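
As a rough illustration of that translation (not pandoc or citeproc-hs code), the sketch below expands a textual-citation token into the two citations. The function name expand_textual_citation and the dictionary layout are assumptions; only the flag names “author-only” and “suppress-author” come from the discussion.

```python
from typing import Dict, List


def expand_textual_citation(token: str) -> List[Dict[str, object]]:
    """Turn a token such as "@item1" into the two citations of a textual cite."""
    if not token.startswith("@") or len(token) == 1:
        raise ValueError("expected a token of the form @key, got %r" % token)
    key = token[1:]
    return [
        {"id": key, "author-only": True},      # first run: author in the text
        {"id": key, "suppress-author": True},  # second run: the rest of the cite
    ]


citations = expand_textual_citation("@item1")
```

Each element would then be handed to the processor in its own run, so the second result can be placed in-text or in a footnote depending on the style.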

Multiple “suppress-author” citations within a cite will be allowed. I
provided a few use cases here:
http://groups.google.com/group/pandoc-discuss/msg/56d03197ea9e1462
and here:
http://groups.google.com/group/pandoc-discuss/msg/0cf91b1b3dfbf6be

The same could seem reasonable with “author-only” citations too. I
think the only thing specific to the use case Frank proposed is that,
in purely numeric styles, an “author-only” citation will produce
that specific output (‘Reference [1]’) and “suppress-author” will
produce none.

Still, I wonder if multiple “author-only” citations in a cite make
sense (since the output is just the author name, multiple citations
should have a delimiter?). Does this mean it would make sense to
ignore the “author-only” bit in cites with multiple citations?

Another question: in a numeric style, is the output always capitalized
(Source or Reference)? (If the answer is no, I wonder how to handle
that.)

Andrea

(*) I have a terminology problem here:

- I use "cite" to indicate the equivalent of a bibtex \cite
   command (some other time I call it "citation group"): a list of
   citations belonging to a single set (the layout's delimiter
   will delimit them);

- "citation" is the individual bibliographic reference.

This seems (to me) to be the way these words are used in the CSL
documentation (but I could be wrong, and the fact that I’m not able
to understand the “by-cite” disambiguation rules could be clear
evidence of that).