Infrastructure for style-level testing

Hi, all,

The experience of two major deployments of citeproc-js has driven home
the value of the test suite fixtures. But after writing upward of 640
of the things, I don’t relish the idea of continuing to do this by
hand. :-)

Thanks to generous feedback from many quarters, the processor itself
now seems to be pretty well under control. The problem of style
testing remains with us, however, and the bottleneck is obvious: no
one in their right mind is going to sit down and write the hundreds of
tests that would be required to prove that any individual style is
trouble-free. Yet that level of testing will be needed, it seems to
me, if we want CSL to establish solid credibility with and find favor
among major publishers.

The “syntax” of the test fixtures is admittedly a dog’s breakfast
currently, set up in a rush before speeding forward with the coding.
JSON has been used to describe the input data mostly because it does
not require conversion (saving me the time required to write a
converter), and because typing hundreds of tests in JSON by hand is at
least a little less painful than writing hundreds of tests in XML.
However, now that we have the processor running in at least two
interactive applications, recasting the fixtures in a single uniform
syntax seems a good idea, and the obvious choice would be XML.

What makes XML attractive at this point is that citeproc-js is now
running in two interactive applications, with many thousands of users
between them. Users typically recognize style errors when they are
entering citations in the word processor. At that point, the document
state is instantiated in the processor registry, and can be readily
accessed via JS. It would be very simple to build a DOM construct
containing the items cited, the citations set in the document, the
context (bibliography/citation), the expected result string (provided
by the user), and the style code or a pointer to the style in use (if
a standard style), and to dump that to disk. Such a
reporting facility would save time in the fielding of user error
reports, and it would make it possible to contemplate building
comprehensive test suites for individual styles.
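
To make the idea concrete, here is a minimal sketch of what such a dumped report might contain. It is written in Ruby only for brevity; in a real plugin this would be JS working against the processor registry. Everything named here is an assumption for illustration: the `StyleErrorReport` struct, the element names (`style-error-report`, `style`, `items`, `citations`, `expected`), the example.org style URI, and the choice to embed item data as JSON text. None of it is an agreed format.

```ruby
require 'json'

# Hypothetical report structure; none of these names come from citeproc-js.
StyleErrorReport = Struct.new(:items, :citations, :mode, :expected, :style_ref)

# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Serialize a report as a small XML document. Item and citation data are
# embedded as JSON text here purely to keep the sketch short.
def report_to_xml(report)
  <<~XML
    <style-error-report mode="#{xml_escape(report.mode)}">
      <style href="#{xml_escape(report.style_ref)}"/>
      <items>#{xml_escape(JSON.generate(report.items))}</items>
      <citations>#{xml_escape(JSON.generate(report.citations))}</citations>
      <expected>#{xml_escape(report.expected)}</expected>
    </style-error-report>
  XML
end

report = StyleErrorReport.new(
  [{ 'id' => 'ITEM-1', 'type' => 'book', 'title' => 'Law & Order' }],
  [[{ 'id' => 'ITEM-1' }]],
  'citation',                              # context: citation or bibliography
  '<i>Law & Order</i>',                    # expected string typed by the user
  'http://example.org/styles/some-style'   # placeholder pointer to the style
)

puts report_to_xml(report)
```

A test runner would then only need to parse one such file, render the items with the referenced style, and compare the output against the expected string.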

So … it would be a great help to the cause if a persevering soul or
hungry intern could be persuaded to produce a plugin or extension for
use with one or both of our citeproc-js-consuming projects, offering
a “Report style error” button in the respective word-processor plugin
menu. I would be very (very) happy to adapt the citeproc-js test
runner to process such output, and (although I’m only guessing) I
reckon that the same goes for Andrea Rossato and other developers.

One could even argue that a robust set of style-level tests, coupled
with a simple CSL IDE built with, say, xulrunner, would have a greater
impact than a style editor, since you could then (safely) rely on
contributions by amateur programmers and relative newcomers to the CSL
scene as collaborative style maintainers.

So that’s the pitch. I should add that I’m unlikely to undertake such
a project off my own bat, as the processor is looking pretty stable,
and I need to get on with other tasks. But I do think this would be a
good thing for CSL, so I thought I would float the suggestion, in case
anyone out there is in a Christmas mood vis-a-vis our expanding
community.

Frank

(cross-posted to citeproc-js list, with apologies for any double-deliveries)

One could even argue that a robust set of style-level tests, coupled
with a simple CSL IDE built with, say, xulrunner, would have a greater
impact than a style editor, since you could then (safely) rely on
contributions by amateur programmers and relative newcomers to the CSL
scene as collaborative style maintainers.

Just want to note that I think this highlights that a lot of us are
thinking about the importance of what comes down to smart previewing
as a possible way to facilitate a number of things, including style
creation and editing. E.g.:

https://github.com/citation-style-language/styles/issues#issue/4

Some of us, for example, were again talking about Dan Stillman’s
original thought about a direct-edit preview style creation interface.
With good preview data, and some UI magic to deal with hairy things
like names and dates, this might be a really good idea.

Bruce

Frank Bennett <@Frank_Bennett> writes:

So … it would be a great help to the cause if a persevering soul or
hungry intern could be persuaded to produce a plugin or extension for
use with one or both of our citeproc-js-consuming projects, offering
a “Report style error” button in the respective word-processor plugin
menu. I would be very (very) happy to adapt the citeproc-js test
runner to process such output, and (although I’m only guessing) I
reckon that the same goes for Andrea Rossato and other developers.

One could even argue that a robust set of style-level tests, coupled
with a simple CSL IDE built with, say, xulrunner, would have a greater
impact than a style editor, since you could then (safely) rely on
contributions by amateur programmers and relative newcomers to the CSL
scene as collaborative style maintainers.

Frank’s guess is correct: I’d be delighted to support such a facility
for style development, and I agree that this could be more important than
a style editor for the long-term robustness of CSL.

andrea rossato

Absolutely. The citeproc-test suite is an invaluable resource for processor development, so to make the tests more accessible to style authors or to an even larger user base is an effort I would definitely support.

By the way, (and because I still owe Bruce and Rintze an example) I have been playing around with converting the citeproc-test JSON data into cucumber features; you can take a look at an example at:

https://github.com/inukshuk/citeproc-ruby/blob/master/features/condition/is_numeric.feature

The advantage of cucumber features is that they are extremely intuitive and easy to write. In this case, though, the main complexity in formulating a test case lies in defining the style and input data. If I understand it correctly, this is also Frank’s position, and I would agree that it is a good approach to aim at generating test cases from within an application that already maintains the relevant input data.

Sylvester

Yes. Only downside is it’s then Ruby-specific.

Seems you have some other news of sorts: that one can now do “gem
install citeproc-ruby”.

Cool!

Bruce

Yes. Only downside is it’s then Ruby-specific.

Yes (although I believe there is a Java implementation around). However, the step definitions in this case are extremely simple (e.g., see https://github.com/inukshuk/citeproc-ruby/blob/master/features/step_definitions/citeproc_steps.rb), so you could easily use Ruby sort of like a shell script that fires up rhino, hugs, or ghc and feeds the test data to citeproc-js or citeproc-hs.
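
As a rough sketch of that "Ruby as a shell script" idea: a step definition could hand the test input to an external process on stdin and read the rendered result from stdout. The `run_external` helper below is hypothetical, and the stand-in command is just a Ruby one-liner so that the example runs anywhere. A real setup would substitute a rhino or ghc invocation plus a runner script, neither of which is defined here.

```ruby
require 'open3'
require 'rbconfig'

# Feed test input to an external processor on stdin and return what it
# prints on stdout, raising if the process exits with a non-zero status.
def run_external(command, input)
  output, status = Open3.capture2(*command, stdin_data: input)
  raise "#{command.first} exited with #{status.exitstatus}" unless status.success?
  output
end

# Stand-in "processor": a Ruby one-liner that upcases whatever it reads.
# In practice this array would name the external interpreter and a runner
# script for citeproc-js or citeproc-hs (assumed, not shown).
stand_in = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']

result = run_external(stand_in, 'doe, john. 2010.')
puts result
```

The point of the design is that the feature files stay processor-neutral; only the command array changes per implementation.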

Not to stray too far off topic, though, you are of course right that an implementation-agnostic test suite is the best option going forward.

Seems you have some other news of sorts: that one can now do “gem
install citeproc-ruby”.

Cool!

It is still very experimental and incomplete, but I guess ‘release early’ won’t hurt at this point. (It only works in Ruby 1.9 at the moment, though, because of differences in Unicode handling.) I have also reconsidered your advice and changed the license to a two-clause BSD license.

Sylvester

It is still very experimental and incomplete, but I guess ‘release early’ won’t hurt at this point. (It only works in Ruby 1.9 at the moment, though, because of differences in Unicode handling.)

I remembered 1.8 didn’t deal with unicode very well. But even with
1.9.2, I get this:

x = ['ö','o','a','x']
=> ["ö", "o", "a", "x"]
x.sort
=> ["a", "o", "x", "ö"]

I was used to this just working in Saxon. Is there any way to get it to
sort right in 1.9.2?

I have also reconsidered your advice and changed the license to a two-clause BSD license.

Still compatible with GPL?

Bruce

It is still very experimental and incomplete, but I guess ‘release early’ won’t hurt at this point. (It only works in Ruby 1.9 at the moment, though, because of differences in Unicode handling.)

I remembered 1.8 didn’t deal with unicode very well. But even with
1.9.2, I get this:

x = ['ö','o','a','x']
=> ["ö", "o", "a", "x"]
x.sort
=> ["a", "o", "x", "ö"]

I was used to this just working in Saxon. Is there any way to get it to
sort right in 1.9.2?

Unicode strings are a big mess right now; encoding works like a charm but manipulation is painful. For example, this does not work either:

x.map(&:upcase)
=> ["ö", "O", "A", "X"]

For this reason I’m currently using the unicode-utils gem (which doesn’t support sorting yet I believe), but it is something that I need to assess separately. I remember that Unicode strings worked fairly well under 1.8 using active-support.
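
For what it is worth, one crude workaround can be sketched as below. It assumes a much later Ruby than the 1.9.2 discussed here, since `String#unicode_normalize` only arrived in Ruby 2.2, and it is accent folding rather than real locale-aware collation. The helper names `fold_key` and `accent_folded_sort` are made up for the example.

```ruby
# Build a crude sort key: decompose accented characters (NFD), drop the
# combining marks, and fold case. This is NOT locale-aware collation.
def fold_key(string)
  string.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase
end

# Sort by the folded key; ties fall back to the original string, so a
# plain letter sorts before its accented variant deterministically.
def accent_folded_sort(strings)
  strings.sort_by { |s| [fold_key(s), s] }
end

x = ['ö', 'o', 'a', 'x']
p accent_folded_sort(x)
```

With the array from the example above, this places ö directly after o instead of at the end. Languages whose alphabets order accented letters differently would still need a proper collation library.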

I have also reconsidered your advice and changed the license to a two-clause BSD license.

Still compatible with GPL?

Yes. It’s the ‘FreeBSD’ License and it is listed as GPL compatible:

http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses

Sylvester

We could start with:

In any case, what do we need?

That work?

Anything else?

https://gist.github.com/909764

Might be cool if the XML test could get fed to cucumber.

Bruce

Off the top of my head:

An optional akin to the ‘citation items’ or ‘bibsection’ in the current test suite (if no input is specified, all the elements in are to be processed).

The tag should support inline styles as well as URIs to load a certain predefined style. Because styles can change, that is not ideal for processor testing, but it makes writing the tests easier and is useful for style testing; that way we can use the same test format for testing processors and styles.

An optional tag outside of the definition. Again, this should support inline definition or definition by reference.

An optional tag.

An optional tag (or similar); this is something I find quite useful in cucumber: you can add any number of tags to a scenario description. For example, the tags ‘experimental’, ‘v1.0’, ‘edtf’, or ‘html’ could denote that the test is experimental, is expected to work in CSL 1.0, uses EDTF input data, or formats the expected result in HTML. Tags are easily extensible and optional, as they don’t alter the test itself, but they are useful for filtering (e.g., I want to run tests that are tagged ‘v1.0’ but not ‘v1.1’).
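
The filtering use case could look roughly like this. It is only a sketch: the `TaggedTest` struct, the `filter_by_tags` helper, and the test names are invented for illustration and do not correspond to anything in the existing suite.

```ruby
# Hypothetical in-memory form of a test: a name plus a list of free-form tags.
TaggedTest = Struct.new(:name, :tags)

# Keep tests that carry every tag in `with` and none of the tags in `without`.
def filter_by_tags(tests, with: [], without: [])
  tests.select do |test|
    (with - test.tags).empty? && (without & test.tags).empty?
  end
end

tests = [
  TaggedTest.new('example_one', %w[v1.0]),
  TaggedTest.new('example_two', %w[v1.0 v1.1 edtf]),
  TaggedTest.new('example_three', %w[experimental html])
]

# "Run tests that are tagged 'v1.0' but not 'v1.1'."
selected = filter_by_tags(tests, with: %w[v1.0], without: %w[v1.1])
puts selected.map(&:name)
```

Because the tags never touch the test body, a runner that does not understand them can simply ignore them.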

Sylvester

I’ve added the start of the schema:

https://github.com/citation-style-language/schema/commit/b98aecf31577eb8c188e57c56d381332a2f58881

I mostly focused on getting the basic RNC structure right, with
details still to be worked out. For example, apropos the last point,
do we want to constrain the tag values?

I have a few comments in the source to indicate other questions.

Feel free to comment on the commit, or fork and hack.

Bruce