pandoc and citeproc-hs: an update

Hi,

(I’m posting this to both the pandoc and xbiblio-devel lists).

I’ve been working on citeproc-hs quite a lot lately, and I now deem
it Andrea Complete. Unfortunately this means nothing more than that it
is usable. By me. Alone, I’m afraid. (Which is still good enough,
though… ;).

The only major deficiencies, as far as I remember (and understand),
are:

  1. no citation sorting (sorting for the bibliography is
    implemented);

  2. no implementation of the “disambiguate” option (I must admit I
    still don’t have a clear idea of how to implement it, I should add:
    basically I wonder if there’s a way to know if two citations will need
    to be disambiguated before evaluating the style, which would make
    everything much much nicer);

  3. no implementation of “hanging-indent”, “line-formatting” and
    “second-field-align”.

The rest of CSL should be there.

The MODS parser needs some work, but it perfectly fits my needs. In
other words, in order to start improving it I need to make a release
and have some feedback/bug reports.

And I’m going to make a 0.1 release soon.

The integration with pandoc is done, even though it has not been
committed yet (John may want to have the problem of citation linking
solved before committing. But I think this is a non-CSL option, and we
may need to find a way to deal with non-CSL options in pandoc before
pushing these latest changes).

Anyway, if you want to give it a try, read below for installation
instructions.

To give you a taste, at least (if you do not want to install
everything yourself), here are a few tests.

First, an example of in-text citations. The MODS file comes from the
citeproc-py source tree:
http://gorgias.mine.nu/pandoc/Test.mods

A copy of the applied styles can be found here:
http://gorgias.mine.nu/pandoc/styles/

Here’s the source code:
http://gorgias.mine.nu/pandoc/test.markdown

AMA:
http://gorgias.mine.nu/pandoc/html/text_ama.html

APA:
http://gorgias.mine.nu/pandoc/html/text_apa.html

Chicago:
http://gorgias.mine.nu/pandoc/html/text_chicago_aut_date.html

Harvard:
http://gorgias.mine.nu/pandoc/html/text_harvard.html

Footnote citation. This is the source:
http://gorgias.mine.nu/pandoc/test_note.markdown

AMA:
http://gorgias.mine.nu/pandoc/html/note_ama.html

APA:
http://gorgias.mine.nu/pandoc/html/note_apa.html

Chicago:
http://gorgias.mine.nu/pandoc/html/note_chicago_full_note.html

To install pandoc with citeproc-hs support you need to grab the latest
citeproc-hs source from the darcs2 repository:
http://code.haskell.org/citeproc-hs

or you can grab it from here:
http://gorgias.mine.nu/pandoc/citeproc-hs-0.1pre.tar.gz

You’ll first need to install HXT and its dependencies from here:
http://hackage.haskell.org/packages/archive/pkg-list.html

All these packages (citeproc-hs included) can be installed very
simply, if you have GHC (the Glasgow Haskell Compiler), with the
following commands, run in the source tree of the decompressed
packages:

runhaskell Setup.lhs configure
runhaskell Setup.lhs build
runhaskell Setup.lhs install (as root)

(the file Setup.lhs may be named Setup.hs)

After that you can install pandoc by grabbing the source code from
the Subversion repository and applying this patch:
http://gorgias.mine.nu/pandoc/pandoc_citeproc-hs.diff

with this command, run in the pandoc source directory:
patch -p0 < pandoc_citeproc-hs.diff

and then:
runhaskell Setup.lhs configure
runhaskell Setup.lhs build
runhaskell Setup.lhs install (as root)

Then you can try with:
pandoc --mods Test.mods -t html --csl chicago-fullnote-bibliography.csl test_note.markdown

Hope you’ll enjoy it.

Andrea

ps: the pandoc list will also get the last git patches.

The disambiguate options tend to apply to the author-date and author
styles.

These …

disambiguate-add-givenname
disambiguate-add-names
disambiguate-add-title

… all relate to author family names. So if you have John Doe (1999)
and Jane Doe (1999) cited in the same group, you need to know what to
do so you don’t have (Doe, 1999a, 1999b).

Couldn’t you just compare the author labels for a citation, and if
they’re the same, switch on disambiguation?
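
A minimal sketch of that comparison, in Python rather than citeproc-hs’s
Haskell, and not taken from either code base. It assumes each citation
has already been rendered once to a plain label string; the item keys,
the labels and the find_ambiguous helper are all made up for
illustration.

```python
from collections import defaultdict

def find_ambiguous(cites):
    """Group item keys by rendered label; keep labels shared by several items.

    `cites` is a list of (item_key, rendered_label) pairs (hypothetical shape).
    """
    by_label = defaultdict(list)
    for key, label in cites:
        # The same item cited twice is not a collision, so record each key once.
        if key not in by_label[label]:
            by_label[label].append(key)
    return {label: keys for label, keys in by_label.items() if len(keys) > 1}

cites = [
    ("doe-john-1999", "Doe, 1999"),
    ("doe-jane-1999", "Doe, 1999"),
    ("smith-2001", "Smith, 2001"),
]
ambiguous = find_ambiguous(cites)
print(ambiguous)
# prints {'Doe, 1999': ['doe-john-1999', 'doe-jane-1999']}
```

Note that this needs one rendering pass before the collisions are
known, so it detects ambiguity after evaluating the style, not before.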

This …

disambiguate-add-year-suffix

… is critical for author-date styles, and happens much more commonly
than the above options in my experience. Hope you have this one
working :)

Bruce

The problem is that I still do not understand how to implement them,
and specifically I do not understand the relationship between citation
disambiguation and bibliography formatting (which should come into
play only with the year-suffix disambiguation, I think).

Suppose we have two works by the same author in the same year with
disambiguate-add-year-suffix: which one gets the ‘a’ and which one
gets the ‘b’? Must I use the order of citation? Do I need to
disambiguate the bibliography too?

Andrea

> Suppose we have two works by the same author in the same year with
> disambiguate-add-year-suffix: which one gets the ‘a’ and which one
> gets the ‘b’?

The suffix gets assigned based on the ordered bibliography. So …

> Must I use the order of citation?

… no. Except, of course, that it would look weird to have (Doe,
2000b, 2000a), which is why sorting of the citation is helpful.

> Do I need to disambiguate the bibliography too?

I guess, yes.

Bruce

So, to summarize:

  1. I collect the disambiguation options

  2. first I turn on a sort of disambiguate variable to see if the style
    evaluation will disambiguate the citations (with the ).

  3. if 2. does not produce any result, I apply the options collected
    in 1. But these options are applied in a fixed order.

So, for example, Harvard has:

<option name="disambiguate-add-year-suffix" value="true"/>
<option name="disambiguate-add-names" value="true"/>
<option name="disambiguate-add-givenname" value="true"/>

First I add names, disregarding et-al options; if no disambiguation is
achieved then I add given names; if I still don’t get a result I then
add the year suffix. In this latter case I change the bibliography
accordingly.
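
The fixed order described above could be sketched roughly as follows.
This is Python for illustration only, not the citeproc-hs
implementation. It assumes all three options are enabled (as in the
Harvard example), that the input is the set of colliding references in
bibliography order, and that a label is simply "names year"; the
render and disambiguate helpers and the data layout are hypothetical.

```python
def render(ref, add_names=False, add_givenname=False, year_suffix=""):
    """Render a simplified author-date label for one reference."""
    # Without add_names only the first author is shown, followed by "et al.".
    authors = ref["authors"] if add_names else ref["authors"][:1]
    shown = [
        f"{given} {family}" if add_givenname else family
        for given, family in authors
    ]
    label = ", ".join(shown)
    if len(authors) < len(ref["authors"]):
        label += " et al."
    return f"{label} {ref['year']}{year_suffix}"

def disambiguate(refs):
    """Try each step in a fixed order; return (step_name, labels)."""
    steps = [
        ("none", {}),
        ("add-names", {"add_names": True}),
        ("add-givenname", {"add_names": True, "add_givenname": True}),
    ]
    for name, flags in steps:
        labels = [render(r, **flags) for r in refs]
        if len(set(labels)) == len(labels):  # every label is distinct
            return name, labels
    # Last resort: year suffixes, assigned in the order the refs were given.
    labels = [
        render(r, year_suffix=chr(ord("a") + i)) for i, r in enumerate(refs)
    ]
    return "add-year-suffix", labels

same_surname = [
    {"authors": [("John", "Doe")], "year": 1999},
    {"authors": [("Jane", "Doe")], "year": 1999},
]
same_author = [
    {"authors": [("John", "Doe")], "year": 1999},
    {"authors": [("John", "Doe")], "year": 1999},
]
print(disambiguate(same_surname))
# prints ('add-givenname', ['John Doe 1999', 'Jane Doe 1999'])
print(disambiguate(same_author))
# prints ('add-year-suffix', ['Doe 1999a', 'Doe 1999b'])
```

Where “add-title” would fit in this chain is left out, since that is
the open question.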

And what about “add-title”? It should come before year-suffix. Does it
come before “add-names” too?

Thanks,
Andrea

> So, for example, Harvard has:
>
> First I add names, disregarding et-al options; if no disambiguation is
> achieved then I add given names; if I still don’t get a result I then
> add the year suffix. In this latter case I change the bibliography
> accordingly.

I think you treat the first differently than the others. For the
first, each item in your reference list will have an optional year
suffix. If that’s present, you print it in the citation. So I’d first
process the list, then worry about the citation.

> And what about “add-title”? It should come before year-suffix. Does it
> come before “add-names” too?

While I’d expect it’d be uncommon to have these cases together, I’m
starting to wonder if there shouldn’t be some explicit coding for the
layout here. I’d say the title would come at the end, though; like
(Doe, 1999a, Some Title). Note, though, that this disambiguation is
redundant, since the suffix already achieves that.

Bruce