CiteProc for .NET

Hi everyone,

My name is Fouke Boss. I am a software developer from the Netherlands
with a background in mathematics. In my spare time I am working on a new
piece of genealogy software with some interesting ideas (or at least I
think so). Part of this software is about citing sources, and
through this presentation
(http://fhtw.byu.edu/static/conf/2008/tucker-10-presentation-fhtw2008.pdf)
I became interested in CSL.

And so for the last couple of weeks I have been working on a Microsoft .NET
implementation of the CSL 1.0.1 specification. So far, the
specification, together with the test suite, has been clear enough for
the most part, and so earlier today I was able to put a first
version of this .NET implementation online at
https://github.com/fouke-boss/citeproc-dotnet.

If you are a Windows user (yeah, I know), you could download the
binaries (in the /Binaries folder) and run the CiteProc.WpfDemo.exe to
get a first impression of the current capabilities and shortcomings.

I am hoping to implement the remaining features in the next couple of
weeks, and for this, I’d like to ask you all for some help. At the
moment, my main issues are these:

  1. I’m trying to figure out if my currently stateless processor (it
    receives a list of items, and returns a formatted bibliography or
    citation) should in fact be stateful in order to implement
    disambiguation properly. And (related as far as I can tell): what is the
    exact purpose of all these test cases with
    CITATIONS/CITATION-ITEMS/BIBENTRIES/BIBSECTION sections in them?

  2. The specifications do not mention anything about the removal of
    multiple spaces, dots, commas or other punctuation, but the test suite
    requires this behavior, and quite rightly so imho. Can anyone enlighten
    me about the exact rules the other processors have implemented?

  3. I am thinking of supplementing the CSL Test Suite with a ‘Basic Test
    Suite’ that systematically tests each and every element or attribute
    (e.g. the current CSL Test Suite does not contain any test case for
    Chicago page ranges). A first (but far from complete) draft can be found
    in the GitHub repository. Rintze Zelle pointed out that a move to
    Cucumber has been considered. Do you think such an additional set of
    tests is useful? Which format is preferred?

  4. What license should I use? I’ve looked around a bit (‘CPAL or AGPL’
    for citeproc-js, ‘AGPL and the FreeBSD’ for CiteProc-Ruby), but I don’t
    have any experience in this matter. Any thoughts or suggestions?

Any help would be appreciated!
With kind regards,

Fouke

Hi Fouke,

Welcome to the club of CiteProc implementers!

The tests you’re referring to are citeproc-js tests, not CSL-only
tests, i.e., they also cover implementation details of citeproc-js
which are not part of the specification. We have been planning to
create an implementation-agnostic test suite for a few years now. If
anyone’s to blame that it isn’t ready yet, it’s me: I volunteered to
work on it, but never found the time to do it.

If you’re interested in additional tests, there are a lot of
citeproc-ruby and csl-ruby tests here:

https://github.com/inukshuk/citeproc-ruby/tree/master/spec/citeproc/ruby

These are approximately 1,000 tests covering many aspects of CSL and
some citeproc-ruby implementation details (I’m certain there are tests
for page ranges, for example).

Finally, here are most of the citeproc-js tests converted to Cucumber
features (not including tests added in the last two years):

I have tried for a long time to write a stateless processor, but I
don’t think it’s possible (disambiguation aside, you have citation
styles which use ibidem for consecutive citations and, if I remember
correctly, there are a number of other requirements which make it
necessary to manage state). My current solution is to distinguish
between a stateless renderer component, a CSL node tree for style and
locale, a formatter (for formatting the output and handling things like
double spaces, flipping single or double quotation marks etc.), and a
stateful processor. I don’t think you can make do with significantly
less.
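
To make the ibidem point concrete, here is a minimal sketch of the split between a stateless renderer and a stateful processor. It is not taken from citeproc-ruby or any other implementation: the names `render_item` and `Processor`, the item fields, and the hard-coded "Ibid." string are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def render_item(item: dict) -> str:
    """Stateless renderer (hypothetical): same item in, same string out."""
    return f"{item['author']}, {item['title']}"


@dataclass
class Processor:
    """Stateful processor (hypothetical): remembers the last cited item id."""
    last_cited_id: Optional[str] = None

    def cite(self, item: dict) -> str:
        # A consecutive citation of the same item needs knowledge of the
        # previous call, which a purely stateless function does not have.
        if item["id"] == self.last_cited_id:
            text = "Ibid."
        else:
            text = render_item(item)
        self.last_cited_id = item["id"]
        return text


a = {"id": "a", "author": "Doe", "title": "First Book"}
b = {"id": "b", "author": "Roe", "title": "Second Book"}
processor = Processor()
outputs = [processor.cite(x) for x in (a, a, b, a)]
```

The renderer stays a pure function, and only the thin processor layer carries state between calls, so the same item can render differently depending on what was cited just before it.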

Sylvester


Hi Sylvester,

Thanks for the welcome!

Yeah, an implementation-agnostic test suite would be ideal, as it’s
frustrating sometimes thinking I’ve implemented one feature, only to
find that the test cases also depend on features that are not part of
the CSL specs. Maybe later on, when I’ve implemented most of the
remaining features, I will contact you again to see how I can help to
create such a suite. That will also be a good time for me to switch from
the citeproc-js test format to Cucumber (SpecFlow as it’s called for
.NET), and then I will also have a look at the Ruby test sets. Thanks.

As for stateless vs stateful: like you, I’ve tried to avoid a stateful
implementation until now, but I too no longer see any way to avoid it.
As you don’t either, I’ll give in then and embrace the dark side.

The other components you mention are already part of my solution, which is
reassuring. I added one other: I actually compile the style/locale node
tree into C# code.
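
As a rough illustration of the "compile the node tree" idea, the sketch below turns a tiny style tree into a render function once, instead of walking the tree for every item. It is only an analogy: the .NET implementation is described as generating C# code, whereas this uses Python closures, and the node shapes (`kind`, `variable`, `children`, `delimiter`) and the name `compile_node` are assumptions made for this example.

```python
from typing import Callable, Dict

Render = Callable[[Dict[str, str]], str]


def compile_node(node: dict) -> Render:
    """Compile a (hypothetical) style node into a render function."""
    kind = node["kind"]
    if kind == "text":
        variable = node["variable"]
        # Render the named variable, or nothing if the item lacks it.
        return lambda item: item.get(variable, "")
    if kind == "group":
        children = [compile_node(child) for child in node["children"]]
        delimiter = node.get("delimiter", "")

        def render_group(item: Dict[str, str]) -> str:
            parts = [child(item) for child in children]
            # Skip empty children so the delimiter is not doubled.
            return delimiter.join(part for part in parts if part)

        return render_group
    raise ValueError(f"unknown node kind: {kind}")


style = {
    "kind": "group",
    "delimiter": ", ",
    "children": [
        {"kind": "text", "variable": "author"},
        {"kind": "text", "variable": "title"},
    ],
}
render = compile_node(style)
full = render({"author": "Boss", "title": "CiteProc"})
partial = render({"title": "CiteProc"})
```

The tree is inspected once at compile time, and the resulting function only does the work that the style actually requires.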

Could you point out to me which parts of your code deal with the double
spaces/quotes/punctuation?

Thanks for now,
Regards,

Fouke

Sylvester Keil wrote on 2016-06-29 13:23:
> Hi Fouke,

FWIW, I fully agree that the divergence between CSL specs and what
citeprocs need to do is a problem and it’s one of the things that we’d like
to address in the next set of the specs (pretty sure that Rintze is on the
same page with me on this). I’m not 100% sure to what degree that’ll be
possible while still keeping the specs a manageable document – so the
exact format this takes and how long it’ll take is unclear.

I think it would be best to work toward a situation where the
specification is a clean subset of the unit test coverage, meaning
that:

  • all behavior described in the spec should be covered by, and be
    consistent with, our unit tests
  • additional tests may be used to go above and beyond the spec

This has the advantage that the specification, which is still the only
source of in-depth information on CSL, doesn’t need to bloat to cover
every single edge case. That helps both readability for style authors,
and maintainability. I think it’s fine if the spec takes a
higher-level view of some topics (like suppression of duplicated
punctuation) and just defers to our unit tests for the desired
behavior.
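
To show what "defer to the unit tests" could look like for duplicated punctuation, here is a small sketch of a cleanup pass with one example check. The three rules are assumptions chosen for illustration, not the rules implemented by citeproc-js, citeproc-ruby, or any other processor, and the function name `tidy` is made up.

```python
import re


def tidy(text: str) -> str:
    """Hypothetical cleanup pass for rendered output."""
    # Collapse runs of spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Drop spaces that precede a comma, semicolon, colon, or period.
    text = re.sub(r" +([,;:.])", r"\1", text)
    # Collapse an immediately repeated punctuation mark into one.
    text = re.sub(r"([,;:.])\1+", r"\1", text)
    return text.strip()


cleaned = tidy("Smith,  J..  Title ,, 2016.")
```

Under these assumed rules the example input becomes "Smith, J. Title, 2016.". A pass this naive would also flatten a legitimate ellipsis, which is the kind of edge case that is easier to pin down in tests than in spec prose.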

I’d be happy to help curate the tests if somebody else sets up a
proper infrastructure. Once we have that, we can work to organize/tag
tests, ensure proper coverage, and minimize tests as much as possible.

Rintze