Citeproc json data input specs

Hi Bruce,

The Relax NG schema defines an XML format as serialization of
the data model (by the way the same format could and should also
be defined by an XML Schema document).

Yes, except that a) I HATE XSD, and b) CSL is authored in RNG, and so
it’s easy to pull in patterns to avoid duplication.

As far as I can tell from

http://bitbucket.org/bdarcus/csl-schema/src/tip/csl-data.rnc
http://bitbucket.org/bdarcus/csl-schema/src/tip/csl-types.rnc
http://bitbucket.org/bdarcus/csl-schema/src/tip/csl-variables.rnc

the same structure can also be expressed in XSD so it should be possible
to automatically derive an XSD file from it, for instance
with James Clark’s Trang

http://www.thaiopensource.com/relaxng/trang.html

Having an official XSD for the CSL record input format is not a top
priority, but it is surely necessary for widespread adoption because some
people love XSD (or do not know better ;-). The RNG should be the master, of course.

The JSON input format is not explicitly defined by a schema
because there is no widely adopted schema language for JSON.
I tried some of the JSON schema languages that were mentioned
in this thread, but they all seem impractical: either they are too
complex or they cannot express everything needed. Moreover, a schema
without a validator is of little use. I think it is more practical
to implement validators directly in several programming languages.

OK, so you’re suggesting to define the JSON schema in code?

Following the KISS principle, yes: the CSL input format is simple enough.

Normally JSON is mapped to a native data structure built of
objects/maps/hashes, arrays/lists/vectors, and strings. Instead of
requiring each application to load an additional library that validates
the JSON, you can write a simple validator in pseudo-code or provide
equivalent snippets of JavaScript, PHP, Perl, etc. I bet it’s fewer than 20
lines of readable code (without the predefined tables of allowed
variables and item types).
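
A minimal sketch of such a validator, written in Python here rather than in one of the languages named above. The two tables are hypothetical, abbreviated stand-ins for the lists in csl-types.rnc and csl-variables.rnc, and the sketch assumes every variable is a plain string, so names, dates and rich text are not covered.

```python
# Hypothetical, abbreviated tables; the real lists live in csl-types.rnc
# and csl-variables.rnc.
ALLOWED_TYPES = {"article-journal", "book", "chapter"}
ALLOWED_VARIABLES = {"id", "type", "title", "container-title",
                     "publisher", "volume", "page"}


def validate_item(item):
    """Return a list of error messages; an empty list means the item passed."""
    if not isinstance(item, dict):
        return ["item must be an object"]
    errors = []
    item_type = item.get("type")
    # Check the string type first so unhashable values cannot reach the set.
    if not isinstance(item_type, str) or item_type not in ALLOWED_TYPES:
        errors.append("unknown or missing type: %r" % (item_type,))
    for key, value in item.items():
        if key == "type":
            continue  # already reported above
        if key not in ALLOWED_VARIABLES:
            errors.append("unknown variable: %r" % (key,))
        elif not isinstance(value, str):
            errors.append("variable %r must be a string" % (key,))
    return errors
```

Because the function returns one message per problem, a caller can report exactly which variable was rejected instead of a generic schema failure.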

By the way I would name the JSON format CSL/JSON and the XML
format CSL/XML. Other CSL input formats that could be useful
are CSL/RDF for CSL as Linked Data and CSL/Microformat to embed
CSL data in HTML.

Makes sense, except I’m not interested in doing CSL/RDF; that’s what
BIBO is for.

Yes, a mapping from BIBO to CSL/JSON or CSL/XML is more useful to start
with.

Jakob
--
Jakob Voß, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de

I’ve never met anyone who “loves” XSD :-)

If someone asks us for an XSD because they really need it, then yes,
it’s trivial enough to provide. But that hasn’t happened yet. It may
also not be a perfect match for the RNG, since RNG has some features XSD
does not.

Bruce

Did one of you already find a way to validate JSON input using csl-data.rnc?
I would like to help out with extending csl-data.rnc, but I would greatly
appreciate the support offered by validation. Frank already prepared a
script to extract the JSON from the citeproc unit tests (
http://bitbucket.org/bdarcus/csl-utils/changeset/54171a592344). Perhaps the
following script would prove useful to convert the JSON files to XML?:
http://michalkorecki.com/content/introducing-json-xml-jquery-plugin

Rintze

Did one of you already find a way to validate JSON input using csl-data.rnc?
I would like to help out with extending csl-data.rnc, but I would greatly
appreciate the support offered by validation.

Jakob looked into this in some detail but concluded the JSON schema
stuff isn’t adequate. Would be nice to know why. For example, is it
because it can only support very simple patterns?

In any case, if that’s true, it’s probably easy enough to write a
dedicated validator.

Frank already prepared a
script to extract the JSON from the citeproc unit tests
(http://bitbucket.org/bdarcus/csl-utils/changeset/54171a592344). Perhaps the
following script would prove useful to convert the JSON files to XML?:
http://michalkorecki.com/content/introducing-json-xml-jquery-plugin

I’m not too worried about converting among this stuff (xml <-> json,
rnc <-> whatever). We just need a target we can work on to agree on
the logical input model.

Bruce

Bruce D’Arcus wrote:

Did one of you already find a way to validate JSON input using csl-data.rnc?
I would like to help out with extending csl-data.rnc, but I would greatly
appreciate the support offered by validation.

Jakob looked into this in some detail but concluded the JSON schema
stuff isn’t adequate. Would be nice to know why. For example, is it
because it can only support very simple patterns?

It’s because JSON schema is not widely adopted, requires additional
libraries that are not available for every programming language, and
only gives general error messages if something fails. It also does not
cover all patterns. A JSON schema for the CSL input format would not be less
code than a validator in most programming languages.

In any case, if that’s true, it’s probably easy enough to write a
dedicated validator.

I just wrote a validator in PHP. It reuses csl-types.rnc and
csl-variables.rnc but it is not tested and does not handle dates and
rich text yet because I am not sure about the final layout:

Frank already prepared a
script to extract the JSON from the citeproc unit tests
(http://bitbucket.org/bdarcus/csl-utils/changeset/54171a592344). Perhaps the
following script would prove useful to convert the JSON files to XML?:
http://michalkorecki.com/content/introducing-json-xml-jquery-plugin

I’m not too worried about converting among this stuff (xml <-> json,
rnc <-> whatever). We just need a target we can work on to agree on
the logical input model.

You can imagine an unlimited number of XML formats and JSON formats for
exactly the same information. That’s why I prefer to talk about data
models independently of particular serializations, which should be
derived in a second step.
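
To illustrate the point, the sketch below takes one record and derives two serializations from it. The XML element names and the choice to put the type in an attribute are assumptions made for the example, not the layout defined in csl-data.rnc.

```python
import json
import xml.etree.ElementTree as ET

# One record in the abstract data model: a mapping from variables to values.
record = {"type": "book", "title": "Example Title", "publisher": "Example Press"}


def to_json(item):
    """Serialize the record as a JSON object."""
    return json.dumps(item, sort_keys=True)


def to_xml(item):
    """Serialize the same record as XML (hypothetical element layout)."""
    root = ET.Element("item", {"type": item["type"]})
    for name in sorted(item):
        if name != "type":
            ET.SubElement(root, name).text = item[name]
    return ET.tostring(root, encoding="unicode")


print(to_json(record))
print(to_xml(record))
```

Both outputs carry the same information; only the second step, the serialization, differs.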

Yesterday I tried to map the CSL specification, the citeproc-js
documentation and csl-data.rnc, and I found a number of points that
differ or are not quite clear:

I’d propose to use (Extended) Backus-Naur Form together with a description
in human language instead of a specific schema language. Every
schema language has its limits in the patterns that you can express.
Moreover, there are many pitfalls in using more complicated patterns from
schema languages. For instance, the XML Schema datatype xsd:gYear
includes year zero and timezones, but in CSL we want no timezones and no
year zero, right?
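
As an illustration of how small such a rule is when written out directly, here is a sketch of a year check that rejects timezones and year zero. The function name is hypothetical, and whether negative years or more than four digits should be accepted is an assumption, not something settled above.

```python
import re

# Optional minus sign followed by at least four digits, and nothing else,
# so a trailing timezone such as "Z" or "+01:00" is rejected.
_YEAR = re.compile(r"-?[0-9]{4,}")


def is_plain_year(text):
    """Accept a year string without timezone and without year zero."""
    if not _YEAR.fullmatch(text):
        return False
    return int(text) != 0
```

For example, `is_plain_year("2010")` is true, while `"2010Z"`, `"2010+01:00"` and `"0000"` are all rejected.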

By the way, times and timezones may be good to include in dates. For
instance if you cite this, you should include the time :-)

Cheers
Jakob

You can imagine an unlimited number of XML formats and JSON formats for
exactly the same information. That’s why I prefer to talk about data
models independently of particular serializations, which should be
derived in a second step.

Yesterday I tried to map the CSL specification, the citeproc-js
documentation and csl-data.rnc, and I found a number of points that
differ or are not quite clear:

http://bitbucket.org/bdarcus/csl-schema/issue/22/clearly-define-csl-input-format

I’d propose to use (Extended) Backus-Naur Form together with a description
in human language instead of a specific schema language. Every
schema language has its limits in the patterns that you can express.
Moreover, there are many pitfalls in using more complicated patterns from
schema languages. For instance, the XML Schema datatype xsd:gYear
includes year zero and timezones, but in CSL we want no timezones and no
year zero, right?

By the way, times and timezones may be good to include in dates. For
instance if you cite this, you should include the time :-)

http://twitter.com/jkrums/status/1121915133

I don’t know; just trying to reuse. We hadn’t considered times at all,
since the use case never came up.

Same here (sans the “super”).

Also, I’m wondering if it would be useful to write down the input format in
a document separate from the rest of the CSL specification? That would allow
us to discuss the allowed input parameters in detail without adding
too much technical jargon to the CSL specification.
We could use Frank’s citeproc-js manual as a starting point (which can
later be slimmed down to avoid duplication):
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#data-input

Rintze

Rintze Zelle wrote:

The other documents (CSL 1.0 specification, upgrade-notes from CSL 0.8 to
1.0, primer on CSL 1.0) are in http://bitbucket.org/bdarcus/csl-docs , so
that might be a better place.

Rintze

I don’t mind. Speaking about the repositories I found

http://bitbucket.org/bdarcus/csl-schema - CSL Schema
http://bitbucket.org/bdarcus/csl-locales - CSL Locales
http://bitbucket.org/bdarcus/csl-docs - CSL Docs
http://bitbucket.org/bdarcus/csl-utils - CSL Utils

In addition there are test fixtures in

http://bitbucket.org/fbennett/citeproc-js/overview

Moreover there is a kind of CSL-Styles repository at

https://www.zotero.org/svn/csl/

but this is bound to Zotero and CSL 0.8. How about

http://bitbucket.org/bdarcus/csl-styles - CSL Styles

Does Mercurial also support git-like submodules, or how do you manage this
on your local machines?

Jakob

Speaking about the repositories I found

http://bitbucket.org/bdarcus/csl-schema - CSL Schema
http://bitbucket.org/bdarcus/csl-locales - CSL Locales
http://bitbucket.org/bdarcus/csl-docs - CSL Docs
http://bitbucket.org/bdarcus/csl-utils - CSL Utils

In addition there are test fixtures in

http://bitbucket.org/fbennett/citeproc-js/overview

The test fixtures have moved to:

http://bitbucket.org/bdarcus/citeproc-test

Does Mercurial also support git-like submodules, or how do you manage this
on your local machines?

I just have separate checkouts, nothing special.

Rintze

I’m not sure if it changes things, but I noticed that JSON Schema has seen
some development since this thread went silent:

New draft:
http://tools.ietf.org/id/draft-zyp-json-schema-03.html

JavaScript and Ruby validators:
https://groups.google.com/d/topic/json-schema/VXrhZM4EOSs/discussion

Rintze

Are there any converters from BibTeX (.bib) to CSL JSON?

One in Haskell or Python would be just kickass.
--
View this message in context: http://xbiblio-devel.2463403.n2.nabble.com/Citeproc-json-data-input-specs-tp5135372p7265916.html
Sent from the xbiblio-devel mailing list archive at Nabble.com.

Pandoc uses citeproc-hs, which in turn uses bibutils to convert
different data formats, including BibTeX. I’m pretty sure it (likely
citeproc-hs) supports CSL JSON as well.

I know there are Python libraries out there that parse BibTeX. It should
be trivial to dump the results to CSL JSON. But there’s nothing I’m
aware of ATM. Feel free to write one up and post it on GitHub ;-)
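
A rough sketch of the dumping step, assuming the BibTeX entry has already been parsed into a key, an entry type and a plain dict of fields by some other library. The type table is a hypothetical partial mapping, the name splitting is naive, and the "date-parts" layout for dates follows citeproc-js, which is an assumption here since the date format was still under discussion in this thread.

```python
import json

# Hypothetical, partial mapping from BibTeX entry types to CSL item types.
TYPE_MAP = {
    "article": "article-journal",
    "book": "book",
    "inproceedings": "paper-conference",
}


def split_name(name):
    """Naively split 'Family, Given' or 'Given Family' into name parts."""
    if "," in name:
        family, given = [part.strip() for part in name.split(",", 1)]
    else:
        given, _, family = name.strip().rpartition(" ")
    return {"family": family, "given": given}


def bibtex_to_csl(key, entry_type, fields):
    """Build a CSL JSON item from already-parsed BibTeX data."""
    # Unknown entry types fall back to the generic "article" type.
    item = {"id": key, "type": TYPE_MAP.get(entry_type.lower(), "article")}
    if "title" in fields:
        item["title"] = fields["title"]
    if "author" in fields:
        item["author"] = [split_name(n) for n in fields["author"].split(" and ")]
    container = fields.get("journal") or fields.get("booktitle")
    if container:
        item["container-title"] = container
    if fields.get("year", "").isdecimal():
        # Assumed date layout: a list of [year, month, day] lists.
        item["issued"] = {"date-parts": [[int(fields["year"])]]}
    return item


entry = {"author": "Doe, Jane and John Smith", "title": "A Title",
         "journal": "A Journal", "year": "2010"}
print(json.dumps([bibtex_to_csl("doe2010", "article", entry)], indent=2))
```

Real BibTeX names (with "von" parts, braces, or "Jr.") need a proper parser; the split above only handles the two simplest forms.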

Bruce

The bibtex-ruby gem converts .bib to CSL/JSON; if any features are missing, just open an issue on the project’s GitHub page.

Best,
Sylvester
