test suite?

So, since Liam’s getting back to citeproc-rb, shall we talk about a
test suite that we might share across implementations?

I think we ought to divide the tests between a few different classes:

  1. CSL parsing/writing: correctly reading and writing CSL files; not
    sure how this would work

  2. CSL processing: stuff like sorting, suffix generation (2000a,
    2000b, etc.), substitution, first/subsequent, ibid
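Year-suffix generation (item 2) is a good example of behaviour a shared test can pin down. Below is a minimal Python sketch, not the full CSL disambiguation procedure. It assumes the items are already in bibliography sort order and that "same author string, same year" is the only collision rule. The field names `id`, `author` and `year` are hypothetical.

```python
from collections import defaultdict
from string import ascii_lowercase


def year_suffix(n):
    """Map 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ... (bijective base 26)."""
    letters = ""
    n += 1
    while n:
        n, r = divmod(n - 1, 26)
        letters = ascii_lowercase[r] + letters
    return letters


def assign_year_suffixes(items):
    """Return {item id: year label}, adding a/b/c... only where author+year collide.

    Assumes `items` is already sorted and that each item is a dict with the
    hypothetical keys "id", "author" (a display string) and "year".
    """
    groups = defaultdict(list)
    for item in items:
        groups[(item["author"], item["year"])].append(item["id"])

    labels = {}
    for (_author, year), ids in groups.items():
        if len(ids) == 1:
            labels[ids[0]] = str(year)  # no collision: plain year
        else:
            for i, item_id in enumerate(ids):
                labels[item_id] = f"{year}{year_suffix(i)}"
    return labels


items = [
    {"id": "doe1", "author": "Doe", "year": 2000},
    {"id": "doe2", "author": "Doe", "year": 2000},
    {"id": "roe1", "author": "Roe", "year": 2000},
]
print(assign_year_suffixes(items))
# {'doe1': '2000a', 'doe2': '2000b', 'roe1': '2000'}
```

This sketch continues from "z" to "aa" so that it never runs out of letters. That is a choice made here, so check what the target style actually expects.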

It seems to me data input parsing and formatted output ought to be
either a different class of tests, or not necessary.

Bruce

That sounds good to me.

I wouldn’t mind some test cases for output though, partly because I’m less
than clear about what I should be generating in many cases. Similarly I
noticed that CiteProc-Py supports a good number of potential inputs. This
seems logical - I had started with the notion of supporting input conforming
to bibliontology, but a range of sources would be better obviously. So if
not test cases, some small set of a) sample reference input files and b)
expected outputs would be very helpful. There is some test data in both
CiteProc-Py and the csl project which seem good starting points for this.
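One possible shape for such shared data is a single JSON fixture that holds both the input references and the expected formatted entries. Every implementation can then run the same comparison. Here is a Python sketch. The fixture layout and the `render_entry` stand-in are hypothetical; a real test would call the implementation's own formatter instead.

```python
import json

# Hypothetical fixture: input references plus the output a style should produce.
FIXTURE = json.loads("""
{
  "items": [
    {"id": "a", "author": "Doe", "year": 2000, "title": "First"},
    {"id": "b", "author": "Roe", "year": 1999, "title": "Second"}
  ],
  "expected": ["Doe (2000) First", "Roe (1999) Second"]
}
""")


def render_entry(item):
    """Stand-in formatter (hypothetical); a real test calls the CSL processor."""
    return f'{item["author"]} ({item["year"]}) {item["title"]}'


def check_fixture(fixture, render):
    """Compare rendered entries with expected ones; return a list of mismatches."""
    actual = [render(item) for item in fixture["items"]]
    expected = fixture["expected"]
    mismatches = [
        (i, exp, act)
        for i, (exp, act) in enumerate(zip(expected, actual))
        if exp != act
    ]
    if len(actual) != len(expected):
        # Report a count mismatch as a pseudo-entry with index None.
        mismatches.append((None, len(expected), len(actual)))
    return mismatches


print(check_fixture(FIXTURE, render_entry))  # []
```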

Regards,

Liam.

2008/6/12 Bruce D’Arcus <@Bruce_D_Arcus1>:

Hello,

I agree with Bruce and Liam that a test suite is a good thing to have.
We should have test files with bibliographic references in the various
supported formats (bibtex, mods, etc.) which contain references
exhibiting the special cases (e.g. multiple references from the same
author, etc.) and some simple CSL files that test these cases as well
as the expected output.

It has been suggested to use YAML for the references, but this would
require adding a YAML processor, which in Python can only be done by
adding a library. I would prefer to avoid that, so as not to become too
dependent on external libraries.

Johan
http://www.johankool.nl/

Avoidance of external libraries is a fool’s errand. An external
library is a layer of code that is someone else’s problem (unless
you’re the maintainer); just ask CPAN.

JSON is probably a more sensible serialisation format, as it’s easier
to write, and can be input and output by JavaScript natively (which
is good for Zotero integration).

</ 2¢

Thanks Johan.

Might it be possible to load libraries dynamically in these cases? In my
case I’d like to support (probably) BibTeX, Bibliontology as well as basic
citational data, in YAML or JSON. In all of these cases external Ruby
libraries are required, but they can be conditionally loaded. Are similar
facilities available in Python, and if so, would you be happy with this
approach?
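In Python an import is an ordinary statement. A module can therefore be imported lazily with `importlib.import_module`, and a missing package can be detected by catching `ImportError`. In the sketch below, the format-to-module table is hypothetical; `yaml` is the import name of the PyYAML package.

```python
import importlib

# Hypothetical table: input format -> module that can parse it.
# "json" ships with Python; "yaml" (PyYAML) may or may not be installed.
OPTIONAL_PARSERS = {"json": "json", "yaml": "yaml"}


def load_parser(fmt):
    """Import the parser module for `fmt` on demand; return None if unavailable."""
    module_name = OPTIONAL_PARSERS.get(fmt)
    if module_name is None:
        return None  # format not known to this table
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None  # package not installed: feature quietly disabled


def available_formats():
    """List the formats whose parser modules can actually be imported."""
    return sorted(fmt for fmt in OPTIONAL_PARSERS if load_parser(fmt) is not None)
```

If PyYAML is absent, `load_parser("yaml")` returns `None`. YAML support then switches off without stopping the whole tool from starting.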

Regards,

Liam.

2008/6/18 Johan Kool <@Johan_Kool2>:

I might be mixing up JSON and YAML, but anyway, I guess it is possible
to use the required libraries for testing only. Nevertheless, having
the test files available as bibtex (and other formats) would be very
valuable for testing too.

As for Kieren’s remark about external libraries: I agree that it is
always good to hand off code to specialized libraries. However, in this
case the library is needed for testing purposes only, not for using
the tool as intended. If I can avoid requiring users to install a
third-party library, I think that is a good thing. The advantage of
using the library has to outweigh the hassle of installing it.

Johan

> I might be mixing up JSON and YAML, but anyway, I guess it is possible
> to use the required libraries for testing only. Nevertheless, having
> the test files available as bibtex (and other formats) would be very
> valuable for testing too.

JSON is essentially a subset of YAML. JSON is also a lot easier as a
human-writable data serialisation format, especially for people who
aren’t experienced programmers, or who rely on screen readers. And it has
native support in all modern web browsers, which YAML doesn’t.

> As for Kieren’s remark about external libraries: I agree that it is
> always good to hand off code to specialized libraries. However, in this
> case the library is needed for testing purposes only, not for using
> the tool as intended. If I can avoid requiring users to install a
> third-party library, I think that is a good thing. The advantage of
> using the library has to outweigh the hassle of installing it.

OK, that’s clearer.

In Perl, where there is a well-established distribution packaging
system, we would put

requires 'XML::Parser';

in the Makefile.PL to declare external modules required to run the
distribution, and

build_requires 'YAML::Any';   # ensures a decent YAML parser is installed, but is not fussy about which one

for modules required only to test the distribution.

As for requiring something for testing, wouldn’t environment variables work?
I don’t know any Python or Ruby, so again I’ll stick to Perl:

in the script:

use if $ENV{TESTING_XBIBLIO}, 'YAML';   # load the module only when testing

and from the command line:

TESTING_XBIBLIO=1 perl mytestscript.pl
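The same gate in Python reads the variable from `os.environ` and attempts the import only when the variable is set. This is a sketch: `TESTING_XBIBLIO` is the variable name used above, and treating PyYAML as the test-only dependency is an assumption.

```python
import importlib
import os


def testing_enabled():
    """True when TESTING_XBIBLIO is set to a non-empty value."""
    return bool(os.environ.get("TESTING_XBIBLIO"))


# Pull in the test-only dependency only when the flag is set, so ordinary
# users never need PyYAML installed.
yaml = None
if testing_enabled():
    try:
        yaml = importlib.import_module("yaml")
    except ImportError:
        raise SystemExit("TESTING_XBIBLIO is set but PyYAML is not installed") from None
```

Run it as `TESTING_XBIBLIO=1 python mytestscript.py`. Without the variable, `yaml` stays `None` and PyYAML is never touched.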

Actually some kind of minimalist format (say, JSON) would be useful for
interim citation data, once it is converted from a source like BibTeX.

My original thought, after some discussion with Bruce, was that
Bibliontology would provide such a format, on the basis that it is
explicitly designed to support other formats, i.e.:

BibTeX/MODS/etc -> Bibliontology -> Citeproc

And I think this is still a viable option. However it entails greater
explicit commitment on the part of users, and the presence of an RDF parser
in a Citeproc implementation, so there is also a case for a minimal JSON
"microformat" consisting of just the CSL variables. Conceivably this could
be for more than just test cases, i.e. a client using Citeproc-(py/rb/…)
could store and load conformant JSON citational data. IMO it would also be
very useful to establish some concordance between Bibliontology and the CSL
variable set, since to some degree there appears to me to be a duplication
of effort otherwise.
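Below is a minimal sketch of what such a JSON record and its loader could look like in Python. The keys are CSL variable names. The nested shapes for `author` and `issued` are an assumption that borrows the later CSL-JSON convention (`family`/`given`, `date-parts`). The accepted-key list is a hand-picked subset, not the full CSL variable set.

```python
import json

# Hypothetical minimal record keyed by CSL variable names.
RECORD = """
{
  "id": "doe2000",
  "type": "book",
  "title": "An Example Title",
  "author": [{"family": "Doe", "given": "Jane"}],
  "issued": {"date-parts": [[2000]]},
  "publisher": "Example Press"
}
"""

# Illustrative subset of accepted keys; a real list would follow the CSL schema.
KNOWN_KEYS = {"id", "type", "title", "author", "editor", "issued",
              "publisher", "publisher-place", "container-title",
              "volume", "issue", "page"}


def load_record(text):
    """Parse one JSON record and reject keys outside KNOWN_KEYS."""
    data = json.loads(text)
    unknown = set(data) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    return data


record = load_record(RECORD)
print(record["author"][0]["family"], record["issued"]["date-parts"][0][0])  # Doe 2000
```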

Kieren’s subsequent point re: browser support is valid; I only mentioned YAML
because it is preferred in the Ruby world. JSON is fine as well.

Regards,

Liam.

2008/6/18 Johan Kool <@Johan_Kool2>:

> Actually some kind of minimalist format (say, JSON) would be useful for
> interim citation data, once it is converted from a source like BibTeX.
>
> My original thought, after some discussion with Bruce, was that
> Bibliontology would provide such a format, on the basis that it is
> explicitly designed to support other formats, i.e.:
>
> BibTeX/MODS/etc → Bibliontology → Citeproc
>
> And I think this is still a viable option. However it entails greater
> explicit commitment on the part of users, and the presence of an RDF parser
> in a Citeproc implementation, so there is also a case for a minimal JSON
> “microformat” consisting of just the CSL variables.

Right, those are the two choices: a) an internal model that is really
(only) focused on the output formatting, or b) one that is really a
full, robust, and flexible model that could do a lot more.

My hunch is the former option is more sensible, at least as a first cut.

> Conceivably this could be for more than just test cases, i.e. a client
> using Citeproc-(py/rb/…) could store and load conformant JSON citational
> data. IMO it would also be very useful to establish some concordance
> between Bibliontology and the CSL variable set, since to some degree
> there appears to me to be a duplication of effort otherwise.

Now that v1.0 of bibo is out, let’s see if we can harmonize it
and the CSL model. It’s not that they’re the same, since bibo is an
RDF graph and the CSL model is simpler; we just need to make sure
bibo can be mapped to CSL reasonably.
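One way to start such a mapping is a flat table from property IRIs to CSL variable names, plus a step that flattens the triples of one resource. This is a Python sketch. The table is partial and illustrative, so check each pair against the bibo and CSL documentation. The triples are plain tuples rather than output from any particular RDF library.

```python
# Partial, illustrative concordance from bibo / Dublin Core property IRIs
# to CSL variable names. Every pair is an assumption to verify against both specs.
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"

CONCORDANCE = {
    DCTERMS + "title": "title",
    BIBO + "volume": "volume",
    BIBO + "issue": "issue",
    BIBO + "pages": "page",
    BIBO + "doi": "DOI",
    BIBO + "isbn": "ISBN",
}


def to_csl(triples, subject):
    """Flatten (subject, predicate, object) triples for one resource into a dict.

    Predicates with no CSL counterpart are collected under "_unmapped" so that
    gaps in the mapping stay visible instead of being silently dropped.
    """
    record, unmapped = {}, []
    for s, p, o in triples:
        if s != subject:
            continue
        if p in CONCORDANCE:
            record[CONCORDANCE[p]] = o
        else:
            unmapped.append(p)
    if unmapped:
        record["_unmapped"] = unmapped
    return record


triples = [
    ("ex:a", DCTERMS + "title", "An Example"),
    ("ex:a", BIBO + "volume", "12"),
    ("ex:a", BIBO + "authorList", "ex:list1"),
    ("ex:b", DCTERMS + "title", "Other"),
]
print(to_csl(triples, "ex:a"))
```

Anything without a counterpart ends up in `_unmapped`, such as the author list in this example. That is where the graph-versus-flat-record difference shows up, and those cases need more than a key-to-key table.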

> Kieren’s subsequent point re: browser support is valid; I only mentioned YAML
> because it is preferred in the Ruby world. JSON is fine as well.

I believe the file I have in the repo is both YAML and JSON. For the
simple stuff this use/test case requires, it ought to be possible to
maintain that.

Bruce

> I might be mixing up JSON and YAML, but anyway, I guess it is possible
> to use the required libraries for testing only. Nevertheless, having
> the test files available as bibtex (and other formats) would be very
> valuable for testing too.

My thinking on using JSON/YAML for testing is that we can tailor the
input format to the CSL model, and so we isolate out issues that may
have more to do with data parsing. I think a test suite can get really
far by just doing simple tests with simple data.

> As for Kieren’s remark about external libraries: I agree that it is
> always good to hand off code to specialized libraries. However, in this
> case the library is needed for testing purposes only, not for using
> the tool as intended. If I can avoid requiring users to install a
> third-party library, I think that is a good thing. The advantage of
> using the library has to outweigh the hassle of installing it.

But keep in mind that installation tools like Python’s setuptools
(easy_install, etc.), RubyGems and CPAN make installing
dependencies trivial for the user. From the user’s perspective, they
don’t really even need to know there are external dependencies.

The one thing that can be a little tricky, and so worth looking out
for, is C-based libraries. Unfortunately, for example,
it seems Python BibTeX libraries have C-compilation dependencies.

Bruce

Any recommendations for what would be a good (the best?) JSON library
for Python?

Johan

This app uses simplejson, and I trust its author’s opinion.

http://lcsh.info
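Since Python 2.6, the standard library has shipped a `json` module derived from simplejson. A common idiom is therefore to prefer the built-in module and to fall back to simplejson only where it is missing:

```python
# Prefer the standard-library json module (Python 2.6+, derived from
# simplejson); fall back to simplejson on interpreters that lack it.
try:
    import json
except ImportError:  # very old Pythons only
    import simplejson as json

data = json.loads('{"title": "Example", "issued": 2000}')
print(json.dumps(data, sort_keys=True))  # {"issued": 2000, "title": "Example"}
```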

Bruce

The latest commit for Citeproc-rb makes it usable as a gem, with a
conventional file/directory structure to suit.

I’ve started work on some fixtures, with the following convention:

test/test_csl.rb - test case
test/fixtures/csl_test_data.json - Bruce’s test data in JSON
test/fixtures/styles/test_sort.csl - simplified version of
chicago-author-date.csl, to test sort

Something like this is hopefully easy to copy or modify for Citeproc-PY and
other ports.
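For Citeproc-Py, the same layout could be reused as-is. Below is a hypothetical Python helper that locates and reads the shared fixtures relative to a checkout root. The directory and file names come from the list above; everything else is an assumption.

```python
import json
from pathlib import Path


def fixture_paths(root):
    """Paths of the shared fixtures under `root`, mirroring the citeproc-rb layout."""
    fixtures = Path(root) / "test" / "fixtures"
    return {
        "data": fixtures / "csl_test_data.json",
        "sort_style": fixtures / "styles" / "test_sort.csl",
    }


def load_test_data(root):
    """Parse the shared JSON test data from its conventional location."""
    with open(fixture_paths(root)["data"], encoding="utf-8") as fh:
        return json.load(fh)
```

A `test/test_csl.py` could then call `load_test_data` with the repository root, so the Ruby and Python ports can share a single copy of the fixtures.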

Incidentally, should I be calling this CiteProc-RB, CiteProc-rb or
Citeproc-rb?

Regards,

Liam.

2008/6/19 Bruce D’Arcus <@Bruce_D_Arcus1>:

> The latest commit for Citeproc-rb makes it usable as a gem, with a
> conventional file/directory structure to suit.
>
> I’ve started work on some fixtures, with the following convention:
>
> test/test_csl.rb - test case
> test/fixtures/csl_test_data.json - Bruce’s test data in JSON
> test/fixtures/styles/test_sort.csl - simplified version of
> chicago-author-date.csl, to test sort
>
> Something like this is hopefully easy to copy or modify for Citeproc-PY and
> other ports.

Great!

> Incidentally, should I be calling this CiteProc-RB, CiteProc-rb or
> Citeproc-rb?

Hmm … I don’t much like the last one, but have no strong preference
between the first two, or even citeproc-rb.

But I’m not that great with names. Any other opinions?

Bruce

>> I’ve started work on some fixtures, with the following convention:
>>
>> test/test_csl.rb - test case
>> test/fixtures/csl_test_data.json - Bruce’s test data in JSON
>> test/fixtures/styles/test_sort.csl - simplified version of
>> chicago-author-date.csl, to test sort
>>
>> Something like this is hopefully easy to copy or modify for Citeproc-PY and
>> other ports.

> Great!

It seems perfect for the Haskell implementation too.

>> Incidentally, should I be calling this CiteProc-RB, CiteProc-rb or
>> Citeproc-rb?

> Hmm … I don’t much like the last one, but have no strong preference
> between the first two, or even citeproc-rb.
>
> But I’m not that great with names. Any other opinions?

The Haskell implementation has been named citeproc-hs, all lowercase.

Cheers,
Andrea