0.6 release

OK, I started to put the first release up on the SourceForge release
system – version 0.6 – but got annoyed enough with it (it’s sort of
ridiculous how bad the SF interface is, given its function) that I gave
up and just posted it here:

http://www.users.muohio.edu/darcusb/xslt/citeproc-0.6.tar.gz

This version is basically the “book” version, with all the fixes in
place that I needed to format my recently finished book.

I also fixed the citekey and number class rendering, which somewhere
along the way broke.

Finally, I included another DocBook example stylesheet: the one I used
to format my book. It processes a series of XIncluded (external)
chapters and then spits everything out into separate files, including
the bibliography and notes.

A changelog is included in the archive, though it only covers the
changes made since I moved the repository to darcs.

The plan:

For the next release I want to:

  1. Remove the existing styles – which are named with generic class
    names – and replace them with real-world canonical styles that
    correspond to these classes. I will definitely include MLA, APA, and
    Chicago, but I could use some help on good candidates for the number
    and citekey classes. If anyone wants to help on the somewhat tedious
    task of tracking down citation style details for these, please let me
    know.

I want to do this first, because it may uncover some changes I need to
make to the CSL schema before going forward. I already know, for
example, that the way I configure “et al.” handling is not
sufficient.

  2. Remove the DocBook stylesheets I just added, and put them under
    separate version control. I was finding too much of my changelog
    relating to the stylesheets, which are just examples, after all.

  3. The big project: figure out how to reimplement the foot/endnote
    class under the new architecture.

  4. Add a simple (Bash, probably) script to make running citeproc
    easier for those not familiar with XSLT.

  5. Better (more aimed at end-user) documentation.

BTW, I’ve run across a novel way to implement a web interface for
editing CSL files; it involves using Javascript to do drag-and-drop
list reordering. I know nothing about JS, alas, but it’s encouraging
that it should be possible to do what I envision.

Bruce

I fear I can’t help with generic MLA/APA styles since I’m not used to
them but I’d like to try your solution to see how it performs with
styles from my scientific field.

That’d be great.

  1. add a simple (Bash, probably) script to make running citeproc
    easier for those not familiar with XSLT.

  2. Better (more aimed at end-user) documentation.

Both would be very helpful for me since I have only very basic
knowledge of XML/XSL etc. I’ve basically no clue how I would make use
of your solution. So answers to basic questions like:

  • what tools are required
  • what must be changed (and where) to customize a citation style
  • how to call the processor

Much of this is documented already. See the main page in the doc
directory and let me know if you have any questions.

All you really need is Saxon 8 (the latest version is 8.4; note, though,
that this version has some performance issues with my stylesheets).

Bruce

On Mon, 16 May 2005 16:18:20 +0200, “Matthias Steffens” <@Matthias_Steffens> said:

Ok, I think I’ve successfully installed Saxon 8 (I don’t know how to
test that it’s running successfully, but anyhow).

Using the OSX terminal I’ve cd’ed into the main ‘citeproc-0.6’
directory and executed the following command:

java net.sf.saxon.Transform -o test.html samples/docbook-test.xml
xsl/document/dbng-xhtml.xsl citation-style="author-year"

This gives me the following output:

– output start -----

CiteProc XSL Stylesheets v0.6.0

citation style: author-year
citation class: author-year

Recoverable error on line 109 of file:/Users/msteffens/Languages/XML/XML%20Tools/Bibliographic%20XML%20Tools/xbiblio/citeproc-0.6/xsl/citeproc.xsl:
FODC0005: java.net.ConnectException: Connection refused
Error on line 109 of file:/Users/msteffens/Languages/XML/XML%20Tools/Bibliographic%20XML%20Tools/xbiblio/citeproc-0.6/xsl/citeproc.xsl:
FODC0005: Failed to load document
http://localhost:8080/exist/servlet/db/mods?_query=declare%20namespace%20mods="http://www.loc.gov/mods/v3";%20for%20$citekey%20in%20('Veer1996a',%20'TimesP2001a',%20'Tilly2000a',%20'Tilly2002a',%20'Thrift1990a',%20'NW2000-0207',%20'NW2000-0424a',%20'Tremblay2001a')%20return%20//mods:mods[@ID=$citekey]&_howmany=-1
Transformation failed: Run-time errors were reported

– output end -----

Any idea what the problem is?

Could this be a permission issue? And am I allowed to have spaces in
the path name?

Is it correct that ‘samples/docbook-test.xml’ should be the input XML
file and ‘-o test.html’ specifies the name & location of the output
file?

I don’t see anything for MODS in ‘xsl/document’. So, how would I
invoke conversion of a MODS file?

Thanks, Matthias

Stupid me. I thought that ‘docbook-test.xml’ contained the
bibliographic data. I have no eXist XML DB installed on my machine.

refbase doesn’t support SRU/W yet (does citeproc-0.6 already support
this method?). Instead I’d like to parse a real MODS file and have it
converted to plain text, html or whatever. How would I do that?

Thanks, Matthias

Matthias Steffens wrote:

refbase doesn’t support SRU/W yet (does citeproc-0.6 already support
this method?). Instead I’d like to parse a real MODS file and have it
converted to plain text, html or whatever. How would I do that?

Currently citeproc is set up to format documents. You specify a flat file
as the db by adding the parameter bibdb=flatfile, which by default
uses the mods.xml file in the data directory.

That can be configured too.
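
For reference, a minimal sketch of what the resulting invocation might look like, built as a Python argument list so it can be inspected before being run. The input document and stylesheet paths are the ones used earlier in this thread; the helper name `saxon_command` is made up for illustration, and Saxon 8 is assumed to be on the Java classpath.

```python
def saxon_command(source, stylesheet, output, **params):
    """Build the argument list for a Saxon command-line transformation.

    Argument order: java net.sf.saxon.Transform [options] source stylesheet [name=value ...]
    """
    command = ["java", "net.sf.saxon.Transform", "-o", output, source, stylesheet]
    # Stylesheet parameters follow the stylesheet as name=value pairs.
    # Python keyword names cannot contain "-", so "_" is mapped to "-".
    for name, value in params.items():
        command.append(f"{name.replace('_', '-')}={value}")
    return command


command = saxon_command(
    "samples/docbook-test.xml",       # input document (contains the citations)
    "xsl/document/dbng-xhtml.xsl",    # document stylesheet
    "test.html",                      # output file
    citation_style="author-year",
    bibdb="flatfile",                 # read records from the flat MODS file
)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` from the main citeproc-0.6 directory once the printed command looks right.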

I should say that if all you want is single-item entry display in a
browser, and you should many may want that. it might be worth
considering doing a citeproc-light. Some of the really complicated
processing citeproc does (sorting, grouping, etc.*) only really matters
in the context of document formatting.

Bruce

  * Example: it’s deceptively difficult to get this output (Doe 1999a,
    1999c; Smith 2000), but the suffixes and contractions and such don’t
    matter for web display.
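
To illustrate why that output takes more than one pass, here is a rough Python sketch of the two steps involved: assigning year suffixes across the whole bibliography, then collapsing the cited items by author. It is not how citeproc’s XSLT does it; the sample records, citekeys, and function names are all invented for the illustration.

```python
from collections import defaultdict
from itertools import groupby
from string import ascii_lowercase


def assign_suffixes(bibliography):
    """Return {citekey: year label}, adding a/b/c where author and year collide."""
    groups = defaultdict(list)
    for key, (author, year, title) in bibliography.items():
        groups[(author, year)].append((title, key))
    labels = {}
    for (_author, year), members in groups.items():
        members.sort()  # suffix order follows title order within the group
        for index, (_title, key) in enumerate(members):
            suffix = ascii_lowercase[index] if len(members) > 1 else ""
            labels[key] = f"{year}{suffix}"
    return labels


def format_citation(citekeys, bibliography, labels):
    """Sort the cited items, then contract repeated author names."""
    cited = sorted(citekeys, key=lambda k: (bibliography[k][0], labels[k]))
    parts = []
    for author, keys in groupby(cited, key=lambda k: bibliography[k][0]):
        parts.append(f"{author} " + ", ".join(labels[k] for k in keys))
    return "(" + "; ".join(parts) + ")"


bibliography = {
    "doe-alpha": ("Doe", 1999, "Alpha"),
    "doe-beta": ("Doe", 1999, "Beta"),
    "doe-gamma": ("Doe", 1999, "Gamma"),
    "smith-delta": ("Smith", 2000, "Delta"),
}
labels = assign_suffixes(bibliography)
print(format_citation(["smith-delta", "doe-gamma", "doe-alpha"], bibliography, labels))
```

The point of the sketch is that the suffix for any one item depends on every other item in the bibliography, which is why this only matters when formatting a whole document.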

Bruce D’Arcus wrote:

I should say that if all you want is single-item entry display in a
browser, and you should many may want that.

um, should say:

“and you think many may want that.”

Am tired!

Bruce

Currently citeproc is set up to format documents. You specify a
flat file as the db by adding the parameter bibdb=flatfile, which
by default uses the mods.xml file in the data directory.

Ok.

I should say that if all you want is single-item entry display in a
browser, and you should many may want that. it might be worth
considering doing a citeproc-light.

What I’m interested in is to have citeproc work in a similar fashion
for output of references as bibutils does for output of common
bibliographic formats. I.e., I’d like to integrate citeproc with
refbase so that:

  1. a user can select one or more bibliographic database entries in a
    web browser, choose a citation style from a drop down and click on
    “Cite” (all this is already provided by refbase)

  2. refbase will generate MODS records for all selected entries and
    pass them to citeproc. In a first incarnation this could simply
    mean saving a MODS XML file to a tmp directory and calling citeproc
    via the command line, specifying the path to the tmp file as the
    input file:

    exec("java net.sf.saxon.Transform -o …")

(Could this step somehow be enhanced if refbase supported SRU/W?)

  3. citeproc will convert all MODS records into references (formatted
    according to the given citation style) and return them

  4. refbase will take these references and do with them whatever was
    requested by the user (display as plain text or HTML, send as email,
    etc.)
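
The steps above could be sketched roughly as follows. refbase itself is PHP, so this Python version only illustrates the flow: write the MODS to a temporary file, call Saxon on it, and read the formatted result back. It assumes a stylesheet that accepts a MODS file directly as its input document, and the function name and file names are hypothetical.

```python
import os
import subprocess
import tempfile


def format_references(mods_xml, stylesheet, style, run=subprocess.run):
    """Write MODS to a temp file, run Saxon on it, and return the output text.

    `stylesheet` is the path to a stylesheet that takes MODS as input
    (an assumption; no such path is given in this thread).
    `run` is injectable so the flow can be tried without Java installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "records.xml")
        output = os.path.join(tmp, "references.html")
        with open(source, "w", encoding="utf-8") as handle:
            handle.write(mods_xml)
        command = [
            "java", "net.sf.saxon.Transform",
            "-o", output, source, stylesheet,
            f"citation-style={style}",
        ]
        run(command, check=True)  # raises if the transformation fails
        with open(output, encoding="utf-8") as handle:
            return handle.read()
```

Starting a JVM for every request is slow, so this only makes sense as the “first incarnation” described above.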

Some of the really complicated processing citeproc does (sorting,
grouping, etc.*) only really matters in the context of document
formatting.

Yes, I can imagine that. Regarding refbase, this would also be of
interest, but it is a goal that is a bit further away. Basically, I’d
like a future version of refbase to be able to act as a possible MODS
source (instead of the eXist XML DB). If I’ve understood you correctly,
this would require refbase to support SRU/W, right?

Thanks, Matthias

Matthias Steffens wrote:

I should say that if all you want is single-item entry display in a
browser, and you should many may want that. it might be worth
considering doing a citeproc-light.

What I’m interested in is to have citeproc work in a similar fashion
for output of references as bibutils does for output of common
bibliographic formats. I.e., I’d like to integrate citeproc with
refbase so that:

  1. a user can select one or more bibliographic database entries in a
    web browser, choose a citation style from a drop down and click on
    “Cite” (all this is already provided by refbase)

  2. refbase will generate MODS records for all selected entries and
    pass them to citeproc. In a first incarnation this could simply
    mean saving a MODS XML file to a tmp directory and calling citeproc
    via the command line, specifying the path to the tmp file as the
    input file:

    exec("java net.sf.saxon.Transform -o …")

(Could this step somehow be enhanced if refbase supported SRU/W?)

  3. citeproc will convert all MODS records into references (formatted
    according to the given citation style) and return them

  4. refbase will take these references and do with them whatever was
    requested by the user (display as plain text or HTML, send as email,
    etc.)

OK, so I wonder if this isn’t something like the reading list sort of
thing that the Oxford people are working on (and wanting to use citeproc
for as well)? The result, then, is just a formatted list of references?

In that case, you could actually think of a simple DocBook document that
contains the citations, and on which citeproc is run in the bibliography
mode only (not the citations). There may be other ways to address it too.

Current output modes for citeproc are xhtml, fo, and tex, with wordml
and opendocument (openoffice’s file format) planned.

Plain text is simple enough to add.

BTW, did you see the demo I came up with using clientside XSLT and Atom?

http://www.users.muohio.edu/darcusb/feeds/bib-atom.xml

Yes, I can imagine that. Regarding refbase, this would also be of
interest, but it is a goal that is a bit further away. Basically, I’d
like a future version of refbase to be able to act as a possible MODS
source (instead of the eXist XML DB). If I’ve understood you correctly,
this would require refbase to support SRU/W, right?

Yes. An XSLT processor can ingest documents over HTTP. This is how
integration with eXist works, which has a nice little RESTful server.

SRU is the obvious standard to support here, though in theory something
even more lightweight is possible. All citeproc does is issue a single
call – over a potentially very long url query – that says “give me all
the MODS records that correspond to X list of unique citekey pointers.”
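
As an illustration of that single call, here is a small Python sketch that rebuilds the kind of eXist REST URL seen in the error log earlier in this thread. The base URL, the XQuery text, and the `_query` and `_howmany` parameters are taken from that log; the function name is made up, and citekeys are assumed not to contain quote characters.

```python
from urllib.parse import quote, urlencode


def exist_query_url(citekeys, base="http://localhost:8080/exist/servlet/db/mods"):
    """Build one URL that asks eXist for every MODS record in `citekeys`."""
    keys = ", ".join(f"'{key}'" for key in citekeys)
    xquery = (
        'declare namespace mods="http://www.loc.gov/mods/v3"; '
        f"for $citekey in ({keys}) "
        "return //mods:mods[@ID=$citekey]"
    )
    # quote_via=quote encodes spaces as %20, as in the logged URL.
    query = urlencode({"_query": xquery, "_howmany": -1}, quote_via=quote)
    return f"{base}?{query}"


print(exist_query_url(["Veer1996a", "Tilly2000a"]))
```

Because the whole key list travels in the query string, the URL grows with the number of citations in the document.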

Another theoretical possibility is for someone to write extension
functions for XSLT processors. That could allow calls like:

<xsl:copy-of select="xbib:return_record_with_id($cites)"/>

I don’t know that this is necessary, and it is something to consider
further out in any case.

Bruce

Matthias Steffens wrote:

What I’m interested in is to have citeproc work in a similar
fashion for output of references as bibutils does for output of
common bibliographic formats.

OK, so I wonder if this isn’t something like the reading list sort
of thing that the Oxford people are working on (and wanting to use
citeproc for as well)? The result, then, is just a formatted list
of references?

Yes. The returned list of formatted references must retain all local
formatting (i.e. bold, italic, uppercase, etc), so ‘xhtml’ output is
what I’d normally want.

In that case, you could actually think of a simple DocBook document
that contains the citations, and on which citeproc is run in the
bibliography mode only (not the citations).

Sounds good. Is this ‘bibliography mode’ already available in
citeproc-0.6 and (if so) how do I invoke it?

I think it wouldn’t be too difficult for refbase to dynamically
generate a simple DocBook document containing all the citations. I
assume that the given citation strings must equal the MODS identifiers
given in each record’s ID attribute (mods:mods/@ID)?

However, I’m not sure why this step would be necessary. I mean, for
my setup the whole purpose of this DocBook document would be to pass
the citation IDs, right? If citeproc could start its work with a MODS
file (dynamically generated by refbase), all the IDs would be already
present. That said, would it be too difficult to modify citeproc in
such a way that it could start its work with a given MODS file (but
without a DocBook document) and return a list of formatted references
as xhtml?

Current output modes for citeproc are xhtml, fo, and tex, with wordml
and opendocument (openoffice’s file format) planned.

Plain text is simple enough to add.

Yes, plain text would be very useful. I can imagine that users might
also want RTF output but I have no clue if this would be difficult to
implement.

BTW, did you see the demo I came up with using clientside XSLT and Atom?

http://www.users.muohio.edu/darcusb/feeds/bib-atom.xml

I’m not sure it works for me. Safari (on OSX 10.4 Tiger) always
converts the URL to

feed://www.users.muohio.edu/darcusb/feeds/bib-atom.xml

and throws up a strange error then. Firefox displays some text as web
page:

Some Journal Article
Jane Doe, Some Article

Some annotations, complete with rich content, including “embedded quotes”.

but not the volume/issue/pages information that’s present in the
feed’s source. What am I supposed to see? A fully formatted reference?

Basically, I’d like a future version of refbase to be able to act
as a possible MODS source (instead of the eXist XML DB). If I’ve
understood you correctly, this would require refbase to support
SRU/W, right?

Yes. An XSLT processor can ingest documents over HTTP. This is how
integration with eXist works, which has a nice little RESTful server.

SRU is the obvious standard to support here, though in theory something
even more lightweight is possible. All citeproc does is issue a single
call – over a potentially very long url query – that says “give me all
the MODS records that correspond to X list of unique citekey pointers.”

I’m not sure how difficult it would be to support SRU. As a first
measure it would be fine for me to simply provide support for the SRU
query that citeproc generates.

What exactly does the SRU query that citeproc sends out look like?

Another theoretical possibility is for someone to write extension
functions for XSLT processors. That could allow calls like:

<xsl:copy-of select="xbib:return_record_with_id($cites)"/>

Ok. I guess this means that such an extension function could send a
query in a form that refbase understands already?

Could you give me any pointers to (or examples of) such a function?
What’s the language used for the function?

Thanks again, Matthias

Matthias Steffens wrote:

However, I’m not sure why this step would be necessary. I mean, for
my setup the whole purpose of this DocBook document would be to pass
the citation IDs, right? If citeproc could start its work with a MODS
file (dynamically generated by refbase), all the IDs would be
already present.

True. I just came up with an example that works, but I need to think a
bit more if there’s not a better way.

The structure of citeproc now pretty much assumes it’s working with a
document, so my “solution” was to tell it that mods:mods/@ID is in fact
a citation. It works, but it brings with it some processing overhead.

That said, would it be too difficult to modify citeproc in such a way
that it could start its work with a given MODS file (but without a
DocBook document) and return a list of formatted references as xhtml?

Yes, plain text would be very useful. I can imagine that users might
also want RTF output but I have no clue if this would be difficult
to implement.

RTF is sort of a PITA, but it is just text, so certainly possible.

but not the volume/issue/pages information that’s present in the
feed’s source. What am I supposed to see? A fully formatted
reference?

No, I didn’t get around to doing the rest of the templates. It’s just to
show a simple XSLT can be put to good use for these sorts of uses.

Basically, I’d like a future version of refbase to be able to act
as a possible MODS source (instead of the eXist XML DB). If I’ve
understood you correctly, this would require refbase to support
SRU/W, right?

Yes. An XSLT processor can ingest documents over HTTP. This is how
integration with eXist works, which has a nice little RESTful
server.

SRU is the obvious standard to support here, though in theory
something even more lightweight is possible. All citeproc does is
issue a single call – over a potentially very long url query –
that says “give me all the MODS records that correspond to X list
of unique citekey pointers.”

I’m not sure how difficult it would be to support SRU. As a first
measure it would be fine for me to simply provide support for the SRU
query that citeproc generates.

What exactly does the SRU query that citeproc sends out look like?

I’ve not actually tried it, but an expert in these technologies has
suggested something like:

http://localhost:8081/biblio?operation=searchRetrieve&version=1.1&query=cite.key+any+"Smith1992a+Smith1992b+Mitchell1995a"&recordSchema=mods&startRecord=1&maximumRecords=9999
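
A URL of that shape could be assembled from a list of citekeys along these lines. This is an untested sketch in Python: the host, the `cite.key` index name, and the parameter values simply mirror the suggestion above, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(citekeys, base="http://localhost:8081/biblio"):
    """Build an SRU 1.1 searchRetrieve URL asking for MODS records by citekey."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        # CQL: match any of the space-separated keys in the cite.key index
        "query": 'cite.key any "{}"'.format(" ".join(citekeys)),
        "recordSchema": "mods",
        "startRecord": 1,
        "maximumRecords": 9999,
    }
    # urlencode percent-encodes the double quotes and turns spaces into "+"
    return base + "?" + urlencode(params)


print(sru_search_url(["Smith1992a", "Smith1992b", "Mitchell1995a"]))
```

A server on the refbase side would then only need to understand this one query pattern, rather than the whole of SRU.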

Another theoretical possibility is for someone to write extension
functions for XSLT processors. That could allow calls like:

<xsl:copy-of select="xbib:return_record_with_id($cites)"/>

Ok. I guess this means that such an extension function could send a
query in a form that refbase understands already?

Could you give me any pointers to (or examples of) such a function?
What’s the language used for the function?

Each XSLT processor has its own way of writing extension functions.
So in Saxon it’s Java (and I think maybe XQuery), in libxslt it’s C, etc.

The person who suggested the above (Mike Taylor, of Index Data) has also
suggested the possibility of writing an XSLT extension for ZOOM, such
that one could query either SRU/W or Z39.50 catalogs from an XSLT.

Quite interesting, actually.

Bruce

Bruce D’Arcus wrote:

True. I just came up with an example that works, but I need to think a
bit more if there’s not a better way.

Here’s one way:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  xmlns:xdoc="http://www.pnp-software.com/XSLTdoc"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mods="http://www.loc.gov/mods/v3"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:db="http://docbook.org/ns/docbook"
  xmlns:cs="http://purl.org/NET/xbiblio/csl"
  xmlns:bib="http://purl.org/NET/xbiblio/citeproc"
  xmlns:exist="http://exist.sourceforge.net/NS/exist"
  exclude-result-prefixes="db xdoc xhtml mods xs cs exist bib">

  <!-- Assumed: the xdoc, xhtml, db and cs namespace URIs above, and the
       XHTML wrapper elements in the root template below. -->

  <xsl:import href="../citeproc.xsl"/>
  <xsl:output method="xhtml" encoding="utf-8" indent="yes"/>
  <xdoc:doc type="stylesheet">
    <xdoc:short>Stylesheet to transform MODS to XHTML.</xdoc:short>
    <xdoc:author>Bruce D’Arcus</xdoc:author>
    <xdoc:copyright>2005, Bruce D’Arcus</xdoc:copyright>
  </xdoc:doc>

  <xsl:param name="include-bib">yes</xsl:param>
  <xsl:variable name="title">References</xsl:variable>

  <!-- Copy of the input MODS document, used as the list of records to format -->
  <xsl:variable name="raw-biblist">
    <xsl:copy-of select="."/>
  </xsl:variable>

  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="$title"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:value-of select="$title"/>
        </h1>
        <!-- Run citeproc in bibliography mode only -->
        <xsl:call-template name="bib:format-bibliography">
          <xsl:with-param name="output-format" select="'xhtml'"/>
        </xsl:call-template>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

  1. Remove the existing styles – which are named with generic class
    names – and replace them with real-world canonical styles that
    correspond to these classes. I will definitely include MLA, APA, and
    Chicago, but I could use some help on good candidates for the number
    and citekey classes. If anyone wants to help on the somewhat tedious
    task of tracking down citation style details for these, please let me
    know.

I fear I can’t help with generic MLA/APA styles since I’m not used to
them but I’d like to try your solution to see how it performs with
styles from my scientific field. I.e. I’d like to generate a few
styles based on your ‘author-year.csl’.

  1. add a simple (Bash, probably) script to make running citeproc
    easier for those not familiar with XSLT.

  2. Better (more aimed at end-user) documentation.

Both would be very helpful for me since I have only very basic
knowledge of XML/XSL etc. I’ve basically no clue how I would make use
of your solution. So answers to basic questions like:

  • what tools are required
  • what must be changed (and where) to customize a citation style
  • how to call the processor

would help me a lot. In other words: what do I need to do if I want
to send MODS to the processor and get formatted references back (in
plain text or html)?

[Sorry if this is too basic but I just don’t know better ;-)]

Thanks, Matthias