disambiguation, pandoc and other news

Hi,

I pushed a few patches that add disambiguation and citation sorting
support.

Moreover pandoc (the svn tree) can now be compiled with citeproc
support without patches (by just running configure with the “-f
citeproc” flag).

The disambiguation bits are meant to be a real implementation and
should support every syntactically correct style, with one major
exception: it doesn’t (and will not) support the "add-title" option
(I’ll write a second message about it, since I’m going to propose
getting rid of it).

There are minor issues, which I think are related to the semantics of
those disambiguation options: if a style has short names (or long with
"initialize-with") and “add-given-names” is not present, two lists of
authors with the same last name and different given names (or
initials) will not receive a year suffix, since the author is not the
same (the given names, and possibly the initials, are different).
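
The year-suffix behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the citeproc-hs code: the function name, the tuple layout and the two "Doe" references are hypothetical, and the grouping key (family name, given name, year) is an assumption about what "the same author" means here.

```python
from collections import defaultdict
from string import ascii_lowercase


def year_suffixes(refs):
    """Map reference ids to a year suffix.

    refs is a list of (ref_id, family, given, year) tuples.
    Assumption: two references collide only if the whole author
    (family AND given name) and the year are identical.
    """
    groups = defaultdict(list)
    for ref_id, family, given, year in refs:
        groups[(family, given, year)].append(ref_id)

    suffixes = {}
    for ids in groups.values():
        if len(ids) > 1:
            # Sketch only: handles at most 26 colliding references.
            for letter, ref_id in zip(ascii_lowercase, ids):
                suffixes[ref_id] = letter
    return suffixes


refs = [
    ("Pascuzzi2004a", "Pascuzzi", "Giovanni", 2004),
    ("Pascuzzi2004b", "Pascuzzi", "Giuseppe", 2004),
    ("Doe2001a", "Doe", "John", 2001),  # hypothetical entries
    ("Doe2001b", "Doe", "John", 2001),
]
print(year_suffixes(refs))
```

With a short-name style both Pascuzzi references render the same way, yet neither gets a suffix, because the given names differ and so the authors are not treated as the same; only the two hypothetical "Doe" references are suffixed.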

Or, for instance, the apa style has long names with initials in the
bibliography, but has “add-given-names” as a citation option. Now,
if two authors share the same last name and initials, the citation will
be disambiguated with the given name, but the bibliography will not, so
you can distinguish the citations but you cannot tell which
citation refers to which bibliographic entry. See the example below.

Here you’ll find a test suite, built upon the zotero issues examples:

http://gorgias.mine.nu/csl/disambig/

For instance, if you run:
pandoc --mods disamb.mods --csl apa.csl disamb.markdown

you will get:
Giovanni Pascuzzi, 2004; Giuseppe Pascuzzi, 2004;

when citing Pascuzzi2004a and Pascuzzi2004b, but the bibliography is:

Pascuzzi, G. (2004). The Brother’s Book.
Pascuzzi, G. (2004). First Book.

See the output here:
http://gorgias.mine.nu/csl/disambig/apa_test.html
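
The mismatch in the apa example can be reproduced with a small sketch. It is written in Python for illustration and is not how citeproc-hs or the apa style is implemented; the helper names `cite` and `bib_name` are hypothetical, and the formatting is only an approximation of the output shown above.

```python
def initials(given):
    """Reduce given names to initials, e.g. 'Giovanni' -> 'G.'."""
    return " ".join(part[0] + "." for part in given.split())


def cite(author, year, add_given_name):
    """In-text citation; the given name is added only when disambiguating."""
    family, given = author
    name = f"{given} {family}" if add_given_name else family
    return f"{name}, {year}"


def bib_name(author):
    """Author as printed in the bibliography: family name plus initials."""
    family, given = author
    return f"{family}, {initials(given)}"


giovanni = ("Pascuzzi", "Giovanni")
giuseppe = ("Pascuzzi", "Giuseppe")

citations = [cite(a, 2004, add_given_name=True) for a in (giovanni, giuseppe)]
bib_names = [bib_name(a) for a in (giovanni, giuseppe)]

print(citations)  # ['Giovanni Pascuzzi, 2004', 'Giuseppe Pascuzzi, 2004']
print(bib_names)  # ['Pascuzzi, G.', 'Pascuzzi, G.']
```

The citations differ, but both bibliography entries start with the same author string, so nothing in the bibliography tells the reader which entry belongs to which citation.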

The test suite is this one:
pandoc --mods disamb.mods --csl disamb.csl disamb.markdown

and this is the output:
http://gorgias.mine.nu/csl/disambig/test_suite.html

I think this is the right way to add the title (as the Modern Language
Association (MLA) style does).

Performance: it is an issue indeed, but I think my approach is overall
efficient.

In this file:
http://gorgias.mine.nu/csl/disambig/disamb_performance.markdown

there are about 800 citations of the 8 colliding references of the
disamb.mods collection:

[14:15:25]$ time pandoc --mods disamb.mods --csl apa.csl disamb_performance.markdown > /dev/null
real 0m2.539s
user 0m2.412s
sys 0m0.036s

This is not bad, after all.

Nonetheless I’m really looking forward to seeing other implementations
for possible improvements.

Collapsing is the only major missing feature, together with some
minor formatting options that will mostly be taken care of within
pandoc. This means citeproc-hs can now get some broader testing and
debugging.

Please let me know about any issue you may find.

Thanks,

Andrea

So does that mean pandoc now includes your code (the stuff that deals
with CSL, etc.; not the in-document citation stuff), or do we still
need to compile that separately?

Bruce

Oh, and … how does one do this; I’m not seeing a configure file?

Bruce

First update your citeproc-hs installation:

darcs pull
runhaskell Setup.hs configure
runhaskell Setup.hs build
runhaskell Setup.hs install

Then grab svn pandoc:

svn checkout http://pandoc.googlecode.com/svn/trunk/ pandoc

then run:
runhaskell Setup.hs configure -f citeproc -v

you’ll see something like:
Flags chosen: citeproc=True,…
…etc
Dependency citeproc-hs-any: using citeproc-hs-0.1
…etc

then

runhaskell Setup.hs build
runhaskell Setup.hs install

Then you can run pandoc with the ‘--mods’ and ‘--csl’ flags.

Andrea

Thanks for the detailed instructions!

I previously had no problem compiling citeproc-hs, but I’m now getting this:

$ runhaskell Setup.lhs configure
Configuring citeproc-hs-0.1…
Setup.lhs: At least the following dependencies are missing:
hxt >=8.1

I presume this is some sort of path issue, b/c I do have hxt 8.1
installed (for some reason, at ~/lib). Any suggestions on this? Have
you changed the dependencies recently?

Bruce

No recent dependency change (hxt-8.1 has been there for quite some
time; before that it was hxt-8.0).

You could try passing the path of hxt to configure with something
like:
runhaskell Setup.lhs configure --with-hxt=~

or something like this.

Hope this helps.

Andrea

runhaskell Setup.lhs configure --with-hxt=~/lib/hxt-8.1.0/ghc-6.8.2/
Setup.lhs: Unrecognised flags:
--with-hxt=~/lib/hxt-8.1.0/ghc-6.8.2/

Sigh … I don’t know why it’s always such a PITA to compile things
with Haskell (for me at least).

Bruce

>> You could try passing the path of hxt to configure with something
>> like:
>> runhaskell Setup.lhs configure --with-hxt=~
>>
>> or something like this.
>
> runhaskell Setup.lhs configure --with-hxt=~/lib/hxt-8.1.0/ghc-6.8.2/
> Setup.lhs: Unrecognised flags:
> --with-hxt=~/lib/hxt-8.1.0/ghc-6.8.2/

Probably you installed hxt with the ‘--user’ flag and you need to run
configure with the same flag for citeproc-hs:

from ‘runhaskell Setup.lhs configure --help’:

      --user                         allow dependencies to be satisfied
                                     from the user package database. also
                                     implies install --user
      --global                       (default) dependencies must be
                                     satisfied from the global package
                                     database

so, try with:
runhaskell Setup.lhs configure --user

> Sigh … I don’t know why it’s always such a PITA to compile things
> with Haskell (for me at least).

I must say that this is not my experience, but I’m not running a Mac
and I think this is the issue.

> probably you installed hxt with the ‘--user’ flag

Yeah, that seems to be what happened, though I honestly don’t know how.

> and you need to run
> configure with the same flag for citeproc-hs:
>
> from ‘runhaskell Setup.lhs configure --help’:
>
>      --user                         allow dependencies to be satisfied
>                                     from the user package database. also
>                                     implies install --user
>      --global                       (default) dependencies must be
>                                     satisfied from the global package
>                                     database
>
> so, try with:
> runhaskell Setup.lhs configure --user

OK, that works.

>> Sigh … I don’t know why it’s always such a PITA to compile things
>> with Haskell (for me at least).
>
> I must say that this is not my experience, but I’m not running a Mac
> and I think this is the issue.

Yeah, probably; I think I need to transition to Linux. For Mac users,
it’d be good if the forthcoming v0.47 macport of pandoc could include
options to make it easy to get it working with citeproc (e.g. so users
don’t have to go through the hassles that I’ve had to go through).

In any case, I’ll try to test this out soon.

Bruce