CSL Style to convert to JATS XML

Dear list,

Although it seems obvious that someone must have thought about using CSL to format citations for JATS XML, I can’t find any information on this topic. Specifically, I am using pandoc-citeproc with Pandoc and have created the pandoc-jats [1] writer to generate JATS XML. I would like to generate JATS elements [2].

Has there been any work on something like this? Would it be a reasonable approach to write a CSL style that outputs JATS-conformant XML? I am currently not interested in the opposite direction, going from JATS XML to Citeproc.

[1] https://github.com/mfenner/pandoc-jats
[2] http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/dobs.html#dob-refs

Best,

Martin

Hi Martin,

If I understand right, this is effectively formatted citations wrapped in
metadata elements?

If yes, I would point out this is the same type of case as earlier
discussions about RDFa output. I think the solution is not in styles, but
in formatting engines outputting HTML with enough structure that it can be
easily converted to other, related, formats.

Bruce

I think this isn’t about formatted citations wrapped in metadata, but
rather converting references from CSL’s data format into the NIH-defined
JATS XML format. The NIH provides some tagging guidelines for references
(http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/dobs.html#dob-refs)
and a few real-world examples
(http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/JournalPubv3-sample1.xml,
an XML file), which help give an idea of how the elements work within JATS
documents.

Martin, I suppose you’re thinking about a CSL style for this because Pandoc
is already well set up to process CSL for references?

At first glance, it would seem seriously awkward to try and write an
XML-based CSL style that’s more about creating structured XML output than
about formatting references as strings. If it’s even possible at all, I
imagine there would have to be lots of bracket-escaping going on, since
you’d basically be embedding templated XML output within an XML CSL style.
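
To make the escaping concern concrete, here is a small Python sketch (not taken from any actual style) that builds one CSL `text` element whose `prefix` and `suffix` attributes carry JATS tags as literal text. The helper name `csl_text_with_wrapper` is made up for illustration, and `article-title` is only an example element.

```python
from xml.sax.saxutils import quoteattr


def csl_text_with_wrapper(variable: str, jats_element: str) -> str:
    """Return a CSL <text> element that wraps a variable in a JATS tag.

    The JATS tags travel as attribute values, so their angle brackets
    have to be entity-escaped to keep the CSL file well-formed.
    """
    prefix = f"<{jats_element}>"
    suffix = f"</{jats_element}>"
    # quoteattr() escapes &, < and > and adds the surrounding quotes.
    return (
        f"<text variable={quoteattr(variable)} "
        f"prefix={quoteattr(prefix)} suffix={quoteattr(suffix)}/>"
    )


print(csl_text_with_wrapper("title", "article-title"))
```

Every JATS tag the style emits would need this treatment, which is where the readability cost comes from.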

I wonder if there’s any way to get Pandoc to include the references in the
data structure passed to the Doc method (
https://github.com/mfenner/pandoc-jats/blob/master/JATS.lua#L70)? If you
could get the array of references as metadata['references'] then you could
write a Lua function to convert the CSL-ready reference data into JATS XML.
I expect that would be no more difficult, and probably much more readable
and maintainable, than creating a CSL style for this purpose.
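
As a rough illustration of that conversion step (in Python rather than Lua, purely for the sake of a runnable sketch), the function below maps one CSL-style reference to a JATS `ref` element. The field mapping, the hard-coded `publication-type="journal"`, and the name `csl_to_jats_ref` are assumptions for this example, not what pandoc-jats does.

```python
import xml.etree.ElementTree as ET


def csl_to_jats_ref(item: dict) -> ET.Element:
    """Convert one CSL JSON item (journal article only) to a JATS <ref>."""
    ref = ET.Element("ref", {"id": item["id"]})
    # Assumption: every item is treated as a journal article.
    citation = ET.SubElement(
        ref, "element-citation", {"publication-type": "journal"}
    )

    authors = item.get("author", [])
    if authors:
        group = ET.SubElement(
            citation, "person-group", {"person-group-type": "author"}
        )
        for author in authors:
            name = ET.SubElement(group, "name")
            ET.SubElement(name, "surname").text = author.get("family", "")
            ET.SubElement(name, "given-names").text = author.get("given", "")

    # Illustrative subset of CSL variables and their JATS counterparts.
    mapping = [
        ("title", "article-title"),
        ("container-title", "source"),
        ("volume", "volume"),
    ]
    for csl_key, jats_tag in mapping:
        if csl_key in item:
            ET.SubElement(citation, jats_tag).text = str(item[csl_key])
    return ref


# Made-up reference data, only to show the shape of the input.
example = {
    "id": "ref1",
    "type": "article-journal",
    "title": "A made-up title",
    "container-title": "Journal of Examples",
    "volume": "1",
    "author": [{"family": "Doe", "given": "Jane"}],
}
print(ET.tostring(csl_to_jats_ref(example), encoding="unicode"))
```

Because the output is built as an element tree, escaping of special characters in titles and names is handled by the serializer rather than by hand.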

–greg

Ah, OK. This was reminding me of DocBook, which has a distinction
between raw (data only) and cooked (formatted) references and
citations. I was assuming the latter, but you’re saying the former?

If yes, then yeah, you need to be able to spit out that raw data
somewhere in the citation process and transform it. My guess is if
pandoc can’t do that now, it would be easy for John to add (because
certainly at some point there’s a list of that raw data).

Thanks a lot for the feedback. It seems that a CSL style is not the best approach; instead, I should try to get the references from pandoc-citeproc as structured metadata and do the formatting into JATS XML using Lua.

Best,

Martin

Update: I have created a CSL style that generates JATS-compatible XML and have just added a pull request at the styles repo.

Even though the XML required for JATS is fairly structured, there is enough extra work needed so that CSL processing makes sense - otherwise I would have to recreate many of the CSL rules in a custom implementation (e.g. name and date parsing). One added benefit is that any reference manager can use the style.
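
As one example of the kind of rule that would otherwise need reimplementing, here is a hedged Python sketch that turns a CSL `issued` value with `date-parts` into separate `year`, `month` and `day` elements. It only covers the numeric `date-parts` form; literal or raw dates, ranges and seasons are deliberately ignored, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def issued_to_jats(issued: dict) -> list:
    """Map CSL date-parts such as [[2013, 12, 13]] to JATS date elements."""
    date_parts = issued.get("date-parts") or [[]]
    first_date = date_parts[0]  # a second entry would be the end of a range
    elements = []
    # zip() stops at the shorter sequence, so year-only dates work too.
    for tag, value in zip(("year", "month", "day"), first_date):
        element = ET.Element(tag)
        element.text = str(value)
        elements.append(element)
    return elements


for element in issued_to_jats({"date-parts": [[2013, 12, 13]]}):
    print(ET.tostring(element, encoding="unicode"))
```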

I haven’t yet covered many of the special cases (content other than journal articles). One small issue I have run into is the handling of ‘et al.’: JATS recommends showing 6 authors followed by , followed by the last author for 9 authors or more (following the APA style guide). This isn’t quite possible, as ‘…’ is hardcoded for ‘et-al-use-last’. Is there a way to replace ‘…’ with an optional custom value - ‘’ in my case? Similarly, I would need a prefix and suffix around individual name elements rather than around the whole list of names - I have worked around it for now.
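
The truncation rule described here (six authors, a marker, then the last author, once there are nine or more) can be sketched in a few lines of Python. The marker is passed in as a parameter because its actual value is not recoverable from the message as it appears here; `truncate_authors` and its parameter names are made up for illustration.

```python
def truncate_authors(names, marker, show_first=6, min_total=9):
    """Shorten a long author list: first N names, a marker, the last name.

    Lists with fewer than `min_total` names are returned unchanged.
    """
    if len(names) < min_total:
        return list(names)
    return list(names[:show_first]) + [marker, names[-1]]


# Placeholder names and marker, only to show the behaviour.
authors = [f"Author {i}" for i in range(1, 10)]
print(truncate_authors(authors, marker="[marker]"))
```

In CSL itself the list shape is controlled by attributes such as ‘et-al-use-last’, but the marker between the truncated list and the last name is fixed, which is the limitation raised above.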

Best,

Martin

On 13.12.2013 at 15:44, Martin Fenner <@Martin_Fenner1> wrote:

Update: I have created a CSL style that generates JATS-compatible XML and have just added a pull request at the styles repo.

Even though the XML required for JATS is fairly structured, there is enough extra work needed so that CSL processing makes sense - otherwise I would have to recreate many of the CSL rules in a custom implementation (e.g. name and date parsing). One added benefit is that any reference manager can use the style.

I haven’t yet covered many of the special cases (content other than journal articles). One small issue I have run into is the handling of ‘et al.’: JATS recommends showing 6 authors followed by , followed by the last author for 9 authors or more (following the APA style guide). This isn’t quite possible, as ‘…’ is hardcoded for ‘et-al-use-last’. Is there a way to replace ‘…’ with an optional custom value - ‘’ in my case?

Is the style meant to run on arbitrary CSL processors? If so, the
changes to accommodate these requirements would need to be adopted into
the specification to get them in place across the board.

If a processor-specific approach would be useful, there are hooks in
citeproc-js for wrapping output in “off-schema” markup. The hooks were
put in there for possible RDFa support, which hasn’t yet materialized,
but they might be useful here.

I am currently using this style with citeproc-hs (and noted some small inconsistencies with the CSL processing of this style). I’m not sure whether there is enough general interest to use CSL to produce machine-readable output to change the specification. I personally think it is a useful direction as these machine-readable outputs become increasingly important (and CSL could play a role in this), but I’m sure not everyone agrees as it increases the scope.

Best, Martin

While not a main goal of CSL, I find this idea very interesting, and I agree that it would be nice to make sure that reasonable hooks are in place for that purpose. I think it is already quite good at it, and things like JATS or BibTeX are good examples of the kind of things that should be possible (and make for good test cases for the different processors out there :wink:).

Charles

--
Charles Parnot
@Charles_Parnot

twitter: @cparnot

Also, converting to CSL JSON would be quite cool. I tried to write a
style for that but didn’t get very far.

However, in my opinion, changing the specification just to support these
cases may be counter-productive. It’s always the same with
specifications: people often try to make them as generic as possible in
order to support a wide range of applications. But then they become too
complex and are not adopted anymore, which ultimately means their end. I
suggest focusing on what CSL is really meant for instead. Converting
between different formats should be left to tools such as bibutils.

Just my two cents. Though, as I said, I would find it cool if I could
export JSON from every reference manager supporting CSL, but that’s a
different story :slight_smile:

Cheers,
Michel

This.

I think this job is for processors; not the CSL styles.

Along these lines, I once had an idea to create simple JSON maps,
modeled loosely on bibutils C source, to configure output formatting.
Alas, I never got very far, but it’s probably viable.

Thanks for the feedback. Even though you can probably do 90% of the work needed in CSL, and would otherwise have to duplicate some of the CSL functionality, at the end of the day a machine-readable output format is not a good fit for CSL. This should be done by the CSL processor, or by another utility that takes CSL JSON as input.

It would be very helpful if applications implementing CSL allowed the export of CSL JSON. Zotero supports this format; Mendeley and Papers do not. I would actually prefer CSL YAML as the output format - something that the biblio2yaml utility by John MacFarlane supports - but that is a different discussion.

I will therefore work on something that can transform CSL JSON (or the corresponding Lua table representation) into JATS XML.

Best,

Martin