Generating Formatted Citations for the Web from MODS

Hi List,

[Note: this is a cross-post from the MODS listserv[0]; this may be a better
place to post.]

After careful consideration over the past several months, we have decided to
store our web application’s citations in MODS as opposed to our proprietary
and often problematic relational structure. Great news for sure. We are now
able to generate EndNote files, RIS files, BibTeX files, DC, and MARCXML,
with the latter two being less desired by our end users.

Ideally, our board of directors and (more importantly) our end users would
like to generate formatted HTML citations in various styles, much as Google
Scholar gives the user the choice of MLA, APA, and Chicago. The problem
appears to be that, while there are several leads, no ready-made resource
exists for a proper transformation to HTML.

The most promising lead is the citeproc project and the Citation Style
Language[1]. They have projects in various stages in multiple languages.
However, of the languages on that list, I am only able to work in Java,
Python, and JavaScript. The problem to me is that most implementations
expect a JSON format that is not too well documented; as best as I can tell,
some of the discussions I’ve come across on this format are several years
old at this point.

Only one purports to work with MODS. citeproc-hs[2], a Haskell library,
seems to have once expected MODS, but (1) I am not familiar with Haskell and
(2) it appears not to have been kept up to date. I have not ruled it out
completely, but need to consult a primer on Haskell first.

The Python library, citeproc-py[3], claims to work with BibTeX. However,
they are still having issues with UTF-8[4]. Additionally, either the mapping
is off in their BibTeX parser or bibutils[5] is producing poor BibTeX files
from the input MODS files. Finally, according to the README.rst[6], the
library is still not ready for production.

Ideally, there would be an XQuery/XSL transformation that we could call from
our web application, which is built upon eXist-db[7]. I suppose our next
step may be writing our own transformation. However, coming to this as a
programmer and not a librarian, I may not be searching in all the right
places. Do I need to write my own transformation, or has the wheel already
been created?

Best,
Matt

PS: I apologize if I have misrepresented anything about CSL and the various
citeproc projects. I am still only a couple of weeks into this project.
[0] http://listserv.loc.gov/cgi-bin/wa?A1=ind1412&L=mods
[1] http://citationstyles.org/
[2] https://hackage.haskell.org/package/citeproc-hs
[3] https://github.com/brechtm/citeproc-py
[4] https://github.com/brechtm/citeproc-py/issues/25
[5] https://sourceforge.net/projects/bibutils/
[6] https://github.com/brechtm/citeproc-py/blob/master/README.rst#citeproc-py
[7] http://exist-db.org/exist/apps/homepage/index.html

Matt, the most mature CSL code is the JavaScript implementation by
Frank Bennett.

I don’t believe, though Frank can correct me if I’m wrong, that
citeproc-js can import MODS files.

So I am guessing you’d probably want to get someone to write an XSLT
transformation to generate the JSON input data format from the MODS.

I’m not aware of any such code, but it may already exist. Also worth
noting that conceptually such code would be pretty similar to an XSLT
that could convert MODS to any other similar text format: RIS,
EndNote, etc. So if you could find such an openly licensed XSLT, it
should be fairly easy to adapt to a CSL workflow.
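
To make the MODS-to-JSON idea concrete, here is a minimal sketch of the
mapping in Python rather than XSLT. It is only an illustration under stated
assumptions: the sample record, the GENRE_TO_TYPE table, and the function
names are all hypothetical, every name element is treated as an author, and
real MODS records vary far more than this handles.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"m": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record; real MODS is usually much richer than this.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3" ID="smith2014">
  <titleInfo><title>An Example Article</title></titleInfo>
  <name type="personal">
    <namePart type="family">Smith</namePart>
    <namePart type="given">Jane</namePart>
  </name>
  <genre>journal article</genre>
  <originInfo><dateIssued>2014</dateIssued></originInfo>
  <relatedItem type="host">
    <titleInfo><title>Journal of Examples</title></titleInfo>
  </relatedItem>
</mods>"""

# Assumed genre-to-CSL-type mapping; extend it to match your own records.
GENRE_TO_TYPE = {"journal article": "article-journal", "book": "book"}


def text_of(element, path):
    """Return the stripped text found at `path`, or None if it is absent."""
    found = element.find(path, MODS_NS)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def mods_to_csl(mods):
    """Map one <mods> element to a CSL JSON item (a plain dict)."""
    genre = text_of(mods, "m:genre")
    item = {
        "id": mods.get("ID", "item-1"),
        "type": GENRE_TO_TYPE.get(genre, "article"),
    }

    title = text_of(mods, "m:titleInfo/m:title")
    if title:
        item["title"] = title

    # Simplification: every top-level name is treated as an author.
    authors = []
    for name in mods.findall("m:name", MODS_NS):
        person = {}
        family = text_of(name, "m:namePart[@type='family']")
        given = text_of(name, "m:namePart[@type='given']")
        if family:
            person["family"] = family
        if given:
            person["given"] = given
        if person:
            authors.append(person)
    if authors:
        item["author"] = authors

    # Only a leading four-digit year is handled here.
    issued = text_of(mods, "m:originInfo/m:dateIssued")
    if issued and issued[:4].isdigit():
        item["issued"] = {"date-parts": [[int(issued[:4])]]}

    container = text_of(mods, "m:relatedItem[@type='host']/m:titleInfo/m:title")
    if container:
        item["container-title"] = container

    return item


csl_item = mods_to_csl(ET.fromstring(SAMPLE_MODS))
print(json.dumps([csl_item], indent=2))
```

The same field-by-field mapping could be expressed as XSLT or XQuery
templates; the hard part is deciding how each MODS variant in your data
maps to a CSL variable, not the language used to write it down.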

But there’s another option, I suppose:

I wrote the first implementation of CSL using MODS in an eXist-based
workflow. So that code would probably work, with the downside that it
would be based on an old version of the CSL spec. But you might not
care about that.

http://sourceforge.net/p/xbiblio/code/HEAD/tree/attic/citeproc-xsl/

Bruce

Hi Bruce,

Thanks for the prompt response. citeproc-js was the first project I looked
at, and my recollection is that you are correct: there is no support for
importing MODS. Can you recommend any good resources, outside of the
citeproc-js project and its documentation, on the JSON it expects? I think
that may just be the route I go.

As for the original implementation of CSL using MODS, that is actually what
brought me to CSL (okay, it was actually Google). I’m afraid that not only
would it be based on an older version of the CSL spec, it would also likely
expect a much older version of MODS than the one we are using.

I just cannot help feeling a bit surprised that, having MODS, BibTeX,
EndNote, RIS, DC, and MARCXML, I cannot find a solid route to transform one
of them into an HTML-formatted citation. I am not tied to using CSL
necessarily, but hands down it looks like one of the best resources out
there. I feel like I am about to reinvent the wheel and that if I just keep
researching a little longer I’ll find something.

Best,
Matt

I believe the folks at Uni Bielefeld’s Katalog Plus do what you want. They
definitely use CSL, I believe via citeproc-js in citeproc-node, and I’d
assume they use their MARC data. I don’t actually think the code is open,
but it might be, and/or they might be willing to share stuff with you. I
don’t have a direct contact (maybe someone is reading along here), but I do
know one of the principal architects tweets under @ChPietsch:
https://twitter.com/ChPietsch

Another place to look would be Docear (they definitely read along here),
which uses citeproc-js and has a native database in BibTeX, so they’re
definitely converting BibTeX to CSL JSON somehow, and they are fully open.
But since the Bielefeld people do exactly what you have in mind, that’d be
my first attempt.

The Asia & Europe Cluster at Heidelberg carries their content in MODS,
channeled to citeproc-js for rendering, with multilingual citation
support. The architect at Heidelberg is Jens Petersen; his contact
details are here:

http://www.asia-europe.uni-heidelberg.de/en/people/associate-members/associate-members-person-details/persdetail/oestergaard-petersen.html

You are very right to say that documentation on walking data between
the various formats should be closer to the surface. As things have
stabilized over the past year or so, it is within view - all we are
wanting is the time to work on it!

Frank

> The problem to me is that most expect a JSON format that is not
> too well documented; as best as I can tell, some of the discussions I’ve
> come across on this format are several years old at this point.

You’ve found our CSL JSON schema? I’m pretty sure both Zotero and
Mendeley use this to ensure conformity.


(for individual bibliographic items)

(for citations to said items)
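
As a rough illustration of the two shapes mentioned above, here is a small
Python sketch with one hand-written bibliographic item and one citation
that refers to it. The data is made up, the helper names are hypothetical,
and the assumption that an item needs at least "id" and "type" is my
reading of the schema, so check the schema itself before relying on it.

```python
import json

# A made-up bibliographic item using common CSL variable names.
item = {
    "id": "smith2014",
    "type": "article-journal",
    "title": "An Example Article",
    "author": [{"family": "Smith", "given": "Jane"}],
    "container-title": "Journal of Examples",
    "volume": "12",
    "page": "34-56",
    "issued": {"date-parts": [[2014, 12]]},
}

# A made-up citation pointing at the item above by its id.
citation = {
    "citationID": "cite-1",
    "citationItems": [{"id": "smith2014", "locator": "40", "label": "page"}],
    "properties": {"noteIndex": 0},
}

# Assumption: these are the keys the item schema requires.
REQUIRED_ITEM_KEYS = ("id", "type")


def missing_keys(obj, required):
    """List the required keys that are absent from `obj`."""
    return [key for key in required if key not in obj]


def unresolved_ids(cite, items):
    """List ids cited in `cite` that match no entry in `items`."""
    known = {entry["id"] for entry in items}
    return [ref["id"] for ref in cite["citationItems"] if ref["id"] not in known]


print(json.dumps({"items": [item], "citation": citation}, indent=2))
print("missing item keys:", missing_keys(item, REQUIRED_ITEM_KEYS))
print("unresolved ids:", unresolved_ids(citation, [item]))
```

A check like this is no substitute for validating against the published
schema, but it catches the most common mistake when generating the JSON by
hand: citing an id that no item carries.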

> citeproc-hs[2], a Haskell library, seems to have once expected MODS, but
> (1) I am not familiar with Haskell and (2) it appears not to have been kept
> up to date. I have not ruled it out completely, but need to consult a
> primer on Haskell first.

I recommend taking another look here. The original author of
citeproc-hs, Andrea Rossato, has indeed been relatively inactive for a
while, but recently announced a willingness to resume work on
citeproc-hs. Meanwhile, the author of pandoc, John MacFarlane, created
pandoc-citeproc (based on citeproc-hs, but much more actively
maintained), and extensive work was done to get a good BibTeX-to-CSL
JSON mapping in pandoc (mostly by Nick Bart on the pandoc forums).

https://code.google.com/p/citeproc-hs/issues/detail?id=101#c10


https://groups.google.com/forum/#!forum/pandoc-discuss

Best,
Rintze

Dear all,

Please note that citeproc-java supports several input formats such as
BibTeX, RIS and EndNote. It also has a nice command-line tool that
can be used to convert BibTeX or EndNote to CSL, for example.

What you have to do, though, is build the tool from source, as the
current release 0.6 does not have these features yet (except for BibTeX
import). However, building is relatively easy. It’s all described on the
website:

http://michel-kraemer.github.io/citeproc-java/

After building the tool you can import RIS and EndNote in the same way
you import BibTeX.

I’m currently working on version 1.0 which I plan to release some time
soon. I’m very busy at the moment so the release might be around Christmas.

I hope this helps. Please let me know, if you need any assistance.

Cheers,
Michel

Matthew,

I don’t know what motivated you or your organization to choose MODS; the
main problem I see is that MODS is nowhere near as standardized as biblatex,
bibtex (which I would consider too limited for serious use, though),
or CSL; so if I were you I’d probably try to use one of these formats for
my database instead.

Still, if you must use MODS, some background on citeproc-hs and
pandoc-citeproc:

citeproc-hs and pandoc-citeproc incorporate bibutils to first convert all
bibliography database input formats bibutils recognizes to MODS (with the
exception of CSL JSON and MODS itself for citeproc-hs, and CSL JSON, CSL
YAML, bibtex, biblatex, and MODS for pandoc-citeproc).

Thus, as Andrea Rossato, the citeproc-hs author, has pointed out
repeatedly, citeproc-hs’s MODS parser has been written with the sole aim of
parsing MODS records generated by bibutils, and nothing else, so depending
on the MODS flavour you will be using, your mileage may vary considerably.

Unfortunately, the whatever-to-MODS (bibutils) and MODS-to-CSL JSON
(citeproc-hs) routines suffered from many bugs, some of which have not been
fixed to this date (for open bug reports see
http://sourceforge.net/p/bibutils/discussion/general/ and
http://code.google.com/p/citeproc-hs/issues).

pandoc-citeproc inherited the citeproc-hs MODS parser essentially
unchanged, and since pandoc-citeproc bypasses the MODS-related routines of
both bibutils and citeproc-hs completely for what appear to be its most
popular input formats, biblatex and bibtex (CSL JSON or CSL YAML not
needing conversion anyway), there has never been much demand on the
pandoc-citeproc forum for fixing MODS-related bugs.

That being said, if you want to get an idea of whether pandoc-citeproc’s
MODS-to-CSL JSON conversion could work for you, try pandoc-citeproc --bib2json yourbibfile.mods.
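
If you want to run that check from a script rather than by hand, a thin
Python wrapper could look like the sketch below. The function names are
hypothetical, and it assumes that pandoc-citeproc is on your PATH and
writes the converted JSON to standard output.

```python
import json
import shutil
import subprocess


def bib2json_command(path):
    """Build the command line quoted above for one bibliography file."""
    return ["pandoc-citeproc", "--bib2json", path]


def convert_to_csl_json(path):
    """Run the conversion and parse the result.

    Assumes the tool prints JSON to stdout; raises if it is not installed
    or exits with an error.
    """
    if shutil.which("pandoc-citeproc") is None:
        raise RuntimeError("pandoc-citeproc was not found on PATH")
    result = subprocess.run(
        bib2json_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Show the command without running it.
print(" ".join(bib2json_command("yourbibfile.mods")))
```

Calling convert_to_csl_json on a handful of representative records and
inspecting what comes back is a quick way to see how well the parser copes
with your particular MODS flavour.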

Best,
Nick

To your first question, MODS comes out of the library world, and so is also
a way to bring MARC data into the 21st century. Pretty sure that’s why
they’re using it.

Also, while MODS is less, shall we say, controlled, it’s also more
expressive than the alternatives you note.