Ideal API to expose for CSL processor? (and global citation affixes)

Related to this, but I want to broaden the audience.

Note that the subsequent discussion has uncovered a CSL development question.

An Org-mode developer is asking what API they should expose to let external tools (like CSL processors) take over citation processing.

Specifically, he asks:

we need to decide about how external processors could plug
into the export framework.

  • For example, it could be a simple variable bound to a list of
    functions. Each function accepts three arguments: the export
    back-end, as a symbol, the full citation, as a list of triplets
    (key, prefix, suffix) along with global prefix/suffix, and the
    usual INFO communication channel. Does it need more?

  • Also, the prefix/suffix may contain some Org markup, so this
    needs to be also processed. Should it happen before, or after the
    external processor does its job? I.e., should the function
    translate into Org or target format?

My impulse on the second question is the latter, and on the first question … not sure.
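To make the first bullet of the quoted proposal concrete, here is a minimal sketch in Python (not Emacs Lisp, and not Org's actual API). The names `Citation`, `citation_processors`, `export_citation`, and `toy_processor` are hypothetical, and the rule that the first function returning a non-None value wins is an assumption; the proposal does not say how several functions would be combined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Citation:
    # One (key, prefix, suffix) triplet per cited work, as in the proposal.
    triplets: list[tuple[str, str, str]]
    global_prefix: str = ""
    global_suffix: str = ""


# A processor gets the back-end name, the full citation, and the INFO
# channel (modelled here as a plain dict). It returns rendered text,
# or None to decline (assumed convention).
Processor = Callable[[str, Citation, dict], Optional[str]]

# The "simple variable bound to a list of functions".
citation_processors: list[Processor] = []


def export_citation(backend: str, citation: Citation, info: dict) -> str:
    """Return the output of the first processor that handles the citation."""
    for processor in citation_processors:
        rendered = processor(backend, citation, info)
        if rendered is not None:
            return rendered
    # Fallback when no external processor takes over: emit the bare keys.
    return "; ".join(key for key, _, _ in citation.triplets)


def toy_processor(backend: str, citation: Citation, info: dict) -> Optional[str]:
    """Hypothetical processor that only knows how to target HTML."""
    if backend != "html":
        return None
    parts = [
        " ".join(p for p in (prefix, key, suffix) if p)
        for key, prefix, suffix in citation.triplets
    ]
    body = "; ".join(parts)
    full = " ".join(
        p for p in (citation.global_prefix, body, citation.global_suffix) if p
    )
    return f'<span class="citation">({full})</span>'


citation_processors.append(toy_processor)
example = Citation([("doe", "cf.", "p. 4")], global_prefix="see")
print(export_citation("html", example, {}))
```

In this sketch the affixes are plain strings, so the second question (whether markup in them is processed before or after the external processor runs) is left open.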

And a little farther down, he says:

Decide about API Org should provide for it to be useful. Here are
some low hanging fruits:

  • List all “.bib” files associated to the document,

  • List all citations,

  • Return citation key at point, if any.

  • Anything else?
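As a rough illustration of two of these items ("list all citations" and "return citation key at point"), here is a Python sketch that works on raw text. It assumes keys are written as `@key`, which matches the examples in this thread but is not a statement about the final Org syntax; the names `CITE_KEY`, `list_citation_keys`, and `citation_key_at_point` are hypothetical. A real implementation would work on the parsed document rather than a regular expression, which also matches things like e-mail addresses.

```python
import re

# Assumed key syntax: "@" followed by word characters, allowing internal
# ":", "." or "-" but no trailing punctuation.
CITE_KEY = re.compile(r"@(\w+(?:[:.\-]\w+)*)")


def list_citation_keys(text):
    """Return every citation key in the text, in document order."""
    return [m.group(1) for m in CITE_KEY.finditer(text)]


def citation_key_at_point(text, point):
    """Return the key whose "@key" span covers offset `point`, else None."""
    for m in CITE_KEY.finditer(text):
        if m.start() <= point < m.end():
            return m.group(1)
    return None


sample = "[(cite): PREFIX; @doeA; cf. @doe p. 4; SUFFIX]"
print(list_citation_keys(sample))
print(citation_key_at_point(sample, sample.index("@doe p") + 2))
```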

It’s been a while since I’ve thought about these details. What suggestions do you all have?

Is it providing a list of all citations (each of which is represented as a “triplet,” per his suggestion), bib data for those citations, and a style?

@Frank_Bennett
@cormacrelf

I’m going to test out an argument. Does this make sense?

The one thing I would say about this (the “triplet” model) is to think about the user experience.

At least for CSL processors like citeproc-org, locators are distinct from prefixes and suffixes.

So the code parses the full string, and converts it into a map, with possible keys something like cite_key, prefix, suffix, locators, suppress_author.

My thought is that if org citation syntax doesn’t explicitly incorporate that, then it leaves greater room for user confusion and frustration.

So citation should have the following keys in the end?

  • cite_key
  • prefix
  • suffix
  • locators
  • suppress_author (per @Denis_Maier below, better this not be a boolean, so “mode” would be better)
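As a sketch only, the keys above could be modelled like this in Python. The field names follow the list; the `Mode` values and the shape of `locators` (a list of label/value pairs) are illustrative assumptions, not something any processor is known to use.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Illustrative values; a "mode" replaces the suppress_author boolean.
    NORMAL = "normal"
    SUPPRESS_AUTHOR = "suppress-author"
    AUTHOR_IN_TEXT = "author-in-text"


@dataclass
class Cite:
    cite_key: str
    prefix: str = ""
    suffix: str = ""
    # Assumed shape: (label, value) pairs such as ("page", "4").
    locators: list = field(default_factory=list)
    mode: Mode = Mode.NORMAL


cite = Cite(
    "doe",
    prefix="cf.",
    locators=[("page", "4")],
    suffix="for many interesting comments",
    mode=Mode.SUPPRESS_AUTHOR,
)
print(cite)
```

Keeping `locators` separate from `suffix` is the point of the argument above: the processor no longer has to guess where one ends and the other begins.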

I have just had a quick try with pandoc. Take this input document (test.md):

[@doe]

[@doe, 4]

[cf. @doe]

[cf. @doe, 4]

[cf. @doe, 4 for many interesting comments]

[cf. @doe, for many interesting comments]

[-@doe, 4]

@doe argues ...

@doe [4] argues ..

[@doeA; @doeB]

Now, pandoc test.md -t native > pandoc-ast.txt gives me:

[Para [Cite [Citation {citationId = "doe", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@doe]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [], citationSuffix = [Str ",",Space,Str "4"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@doe,",Space,Str "4]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [Str "cf.\160"], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[cf.",Space,Str "@doe]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [Str "cf.\160"], citationSuffix = [Str ",",Space,Str "4"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[cf.",Space,Str "@doe,",Space,Str "4]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [Str "cf.\160"], citationSuffix = [Str ",",Space,Str "4",Space,Str "for",Space,Str "many",Space,Str "interesting",Space,Str "comments"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[cf.",Space,Str "@doe,",Space,Str "4",Space,Str "for",Space,Str "many",Space,Str "interesting",Space,Str "comments]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [Str "cf.\160"], citationSuffix = [Str ",",Space,Str "for",Space,Str "many",Space,Str "interesting",Space,Str "comments"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[cf.",Space,Str "@doe,",Space,Str "for",Space,Str "many",Space,Str "interesting",Space,Str "comments]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [], citationSuffix = [Str ",",Space,Str "4"], citationMode = SuppressAuthor, citationNoteNum = 0, citationHash = 0}] [Str "[-@doe,",Space,Str "4]"]]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Str "@doe"],Space,Str "argues",Space,Str "\8230"]
,Para [Cite [Citation {citationId = "doe", citationPrefix = [], citationSuffix = [Str "4"], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Str "@doe",Space,Str "[4]"],Space,Str "argues",Space,Str ".."]
,Para [Cite [Citation {citationId = "doeA", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0},Citation {citationId = "doeB", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@doeA;",Space,Str "@doeB]"]]]

So, in pandoc each cite seems to be a list containing multiple citations. Each citation consists of the following keys:

  • citationId
  • citationPrefix
  • citationSuffix
  • citationMode
  • citationNoteNum
  • citationHash

After the list, there is also the original input as a string.

Now, concerning your suggestion.

  1. Pandoc uses citationMode instead of suppress_author. I guess this is a good choice, as it also allows for other citation modes, like narrative citations.

  2. Pandoc does not explicitly use a key like locators. Instead, locators are inferred from the suffix, as the manual states:

    pandoc-citeproc will use heuristics to distinguish the locator from the suffix. In complex cases, the locator can be enclosed in curly braces (using pandoc-citeproc 0.15 and higher only):

    Like so:

    [@smith{ii, A, D-Z}, with a suffix]
    [@smith, {pp. iv, vi-xi, (xv)-(xvii)} with suffix here]
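To illustrate the idea of inferring a locator from the suffix, here is a deliberately naive Python sketch. It is not pandoc-citeproc's actual heuristic: it only recognises a brace-delimited locator or a leading page number, and the names `BRACED`, `NUMERIC`, and `split_locator` are hypothetical.

```python
import re

# An explicit locator in curly braces, optionally preceded by a comma.
BRACED = re.compile(r"^\s*,?\s*\{([^{}]*)\}\s*,?\s*(.*)$", re.S)
# Naive fallback: an optional "p."/"pp." label and a number or number range.
NUMERIC = re.compile(
    r"^\s*,?\s*((?:pp?\.\s*)?\d+(?:\s*-\s*\d+)?)\s*,?\s*(.*)$", re.S
)


def split_locator(suffix):
    """Split a raw suffix into (locator, remaining_suffix)."""
    for pattern in (BRACED, NUMERIC):
        m = pattern.match(suffix)
        if m:
            return m.group(1).strip(), m.group(2).strip()
    # No locator found: the whole thing is a plain suffix.
    return "", suffix.lstrip(" ,").rstrip()


for raw in (
    "{ii, A, D-Z}, with a suffix",
    ", {pp. iv, vi-xi, (xv)-(xvii)} with suffix here",
    ", 4 for many interesting comments",
    ", for many interesting comments",
):
    print(split_locator(raw))
```

The fallback pattern will happily misread a suffix like ", 42nd Street" as a locator, which is the kind of ambiguity an explicit locators key would avoid.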
    
Concerning global affixes: these are currently not really supported by pandoc, as they are clearly not included in CSL. (Biblatex knows global pre- and suffixes, so this might come from there.)

Anyway, pandoc parses this org-mode-citation [(cite): PREFIX; @doeA; cf. @doe p. 4; SUFFIX] as:

,Para [Cite [Citation {citationId = "doeA", citationPrefix = [Str "PREFIX"], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0},Citation {citationId = "doe", citationPrefix = [Str "cf."], citationSuffix = [Space,Str "p.",Space,Str "4",Str "SUFFIX"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] []]]

The global prefix is merged with the first citation’s (non-existent) prefix, the global suffix is merged with the second citation’s suffix.
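The merging behaviour described above can be sketched in a few lines of Python. This is an illustration of the flattening idea, not pandoc's code; the function name `flatten_global_affixes` and the dict layout are hypothetical, and joining with a single space is an assumption (the pandoc output above concatenates "4" and "SUFFIX" with no space between them).

```python
def flatten_global_affixes(cites, global_prefix="", global_suffix=""):
    """Fold citation-level affixes into the first and last cite.

    `cites` is a list of dicts with "key", "prefix" and "suffix" entries.
    A new list is returned; the input is left untouched.
    """
    flat = [dict(c) for c in cites]
    if not flat:
        return flat
    if global_prefix:
        first = flat[0]
        first["prefix"] = " ".join(
            p for p in (global_prefix, first["prefix"]) if p
        )
    if global_suffix:
        last = flat[-1]
        last["suffix"] = " ".join(
            s for s in (last["suffix"], global_suffix) if s
        )
    return flat


cites = [
    {"key": "doeA", "prefix": "", "suffix": ""},
    {"key": "doe", "prefix": "cf.", "suffix": "p. 4"},
]
for cite in flatten_global_affixes(cites, "PREFIX", "SUFFIX"):
    print(cite)
```

Once flattened, the information that "PREFIX" applied to the whole group is gone, so a processor that later reorders the cites cannot keep it at the front.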

(Perhaps explicit support for global affixes would be something to be discussed for CSL 1.2, or later…)

Thanks much @Denis_Maier; this is very helpful.

I’m also putting a screenshot of the Zotero citation UI here.

I’m totally agnostic about this question, either in org, or in CSL.

My impulse is to think it’s overkill though. Does anyone know of a practical need for this feature that cannot be addressed with the existing flat approach?

So to clarify, we have citations, which are lists of (let’s call them) cites.

In CSL implementations, including pandoc, cites can have affixes, but citations cannot.

In biblatex, and the proposed org syntax, both levels can have affixes.

So think of an example like:

Doe has argued (see 2016, 2018).

Really, the “see” is a prefix for the citation, since its target is the group of cites that follow.

Obviously, we can still handle this, in say the pandoc syntax, by simply doing:

Doe has argued [see -@doe2016, -@doe2018].

Is that good enough?

Is there a situation where the current approach breaks down? Maybe where the cites need to be re-sorted within the citation? Take a case like the one above, where the style sorts cites in date order but the user instead enters this:

Doe has argued [see -@doe2018, -@doe2016].

Do we, as @Denis_Maier suggests, need to consider the two-level support for the next release?

Yeah, I think that’s pretty much it. If cites are re-sorted automatically, a global prefix would be much easier to handle; and if you manually re-sort the cites later, a global prefix is probably more user friendly too. Having said that, I have never been inclined to use this even in biblatex.

On the other hand, after thinking a bit more about it: Do affixes fall into the domain of CSL at all? At least currently this seems to be entirely out of the specs, right?

I think the expectations around this are more implicit currently.

So you think this should be added to the specs? Perhaps as an addendum dealing with API details?

Perhaps yes.

I think we didn’t originally put anything in because we thought it premature, and less important than the core of the style spec.

But at this point, we have a lot of experience, so maybe worth revisiting?

Global affixes would definitely be an improvement. The current flat approach works, but it is a pain when revising a document. For example, let’s say I’m writing in APA style (cites sorted alphabetically), and I have the citation (cf. Jones, 2016; Smith, 2008). But then I later add a cite to “Bradley, 2020” to this citation. It now becomes (Bradley, 2020; cf. Jones, 2016; Smith, 2008), and the “cf.” has to be moved manually. It would be really helpful to be able to specify things like “cf.” as a global prefix.
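The APA example can be reproduced with a small Python sketch. The `render` function and the tuple layout are hypothetical and stand in for a real processor; the only behaviour modelled is alphabetical sorting of cites, with per-cite prefixes travelling with their cite and a global prefix staying at the front.

```python
def render(cites, global_prefix=""):
    """Render (author, year, prefix) tuples, sorted alphabetically by author."""
    ordered = sorted(cites, key=lambda c: (c[0], c[1]))
    parts = []
    for author, year, prefix in ordered:
        text = f"{author}, {year}"
        parts.append(f"{prefix} {text}" if prefix else text)
    body = "; ".join(parts)
    return f"({global_prefix} {body})" if global_prefix else f"({body})"


# Flat approach: "cf." belongs to Jones and moves with it when sorted.
flat = [("Jones", 2016, "cf."), ("Smith", 2008, ""), ("Bradley", 2020, "")]
print(render(flat))
# (Bradley, 2020; cf. Jones, 2016; Smith, 2008)

# Two-level approach: "cf." belongs to the citation as a whole.
two_level = [("Jones", 2016, ""), ("Smith", 2008, ""), ("Bradley", 2020, "")]
print(render(two_level, global_prefix="cf."))
# (cf. Bradley, 2020; Jones, 2016; Smith, 2008)
```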

Based on this conversation, I suggested to the org devs to keep the global affixes.

If we do define an API in CSL, we should probably do the same.

I also just suggested they add support for textual citations, since we’ve heard this feature request forever.

And I restarted discussion on how to implement this in CSL on github.

I had an epiphany that this might be much easier than I assumed. I hope I’m right, but feel free to tell me I’m wrong :smile:

On API, I started something while I’m thinking about this.

It took me a while to understand @Frank_Bennett’s work on intext support in CSL, but once I finally got it (thanks @bwiernik!), it’s pretty cool!

Nice to see we’ll be able to finally cross this long-running request off the list.

If anyone is curious, here’s the org citation code, which partially prompted my question here.
