plans moving forward?

So CSL has managed to evolve fairly nicely with a fairly informal
process, with different people stepping up to provide the labor to get
particular things done. Most recently Rintze has done a great job with
the 1.0 release.

But Rintze is finishing his PhD and moving on to a new job (one, oddly
enough, that’s only about two hours from me!), and so this raises
again the question of how we want to manage things going forward.

The biggest question has to do with the revision scheme that Rintze
proposed, and which seems sensible to me: what process do we want to
use to determine more significant, backward incompatible, changes? I
mean, ideally we don’t have to do this at all, but just wondering.

We could continue as we’ve done, which is that effectively if someone
proposes a solution that doesn’t garner any objections, and can
provide a reasonable test scenario and spec code for it, then we add
it, and just leave it up to other implementors to catch up.

But I’m not sure that’s the best strategy, particularly as we start to
roll out the CSL editor, online repo, etc. We don’t want to have
multiple versions of styles, for example.

The labor question is perhaps somewhat less important, but perhaps
some of my suggestions above might point us in a direction to address
that as well?

So, any thoughts or ideas?

Bruce

So CSL has managed to evolve fairly nicely with a fairly informal
process, with different people stepping up to provide the labor to get
particular things done. Most recently Rintze has done a great job with
the 1.0 release.

But Rintze is finishing his PhD and moving on to a new job (one, oddly
enough, that’s only about two hours from me!), and so this raises
again the question of how we want to manage things going forward.

The biggest question has to do with the revision scheme that Rintze
proposed, and which seems sensible to me: what process do we want to
use to determine more significant, backward incompatible, changes? I
mean, ideally we don’t have to do this at all, but just wondering.

Just to note that four of the items in the tracker are hard, but
important enough that they should not be left unattended (#17 on
cleaning up identifiers, #16 on institution names, #15 on making
publisher an author variable, with multiple publisher-place attribute
values, #14 on listing the last author’s name in a truncated listing).

We could continue as we’ve done, which is that effectively if someone
proposes a solution that doesn’t garner any objections, and can
provide a reasonable test scenario and spec code for it, then we add
it, and just leave it up to other implementors to catch up.

But I’m not sure that’s the best strategy, particularly as we start to
roll out the CSL editor, online repo, etc. We don’t want to have
multiple versions of styles, for example.

Multiple versions of styles are inevitable (the CSL 0.8.1 styles
remain available for use with 0.8.1 processors); but the version
attribute and update-styles.xsl are our friends.
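
As a rough illustration of how the version attribute can help here, the sketch below reads the attribute from a style's root element so that a repository or calling application could decide whether the style needs to go through an upgrade step first. This is only a sketch under assumptions: the function names and the two sample styles are made up for the example, and a style with no version attribute is simply reported as needing attention rather than assigned to any particular CSL release.

```python
import xml.etree.ElementTree as ET

# XML namespace used by CSL style files.
CSL_NS = "http://purl.org/net/xbiblio/csl"


def style_version(csl_text):
    """Return the version attribute of the root <style> element, or None."""
    root = ET.fromstring(csl_text)
    if root.tag != "{%s}style" % CSL_NS:
        raise ValueError("root element is not a CSL <style>")
    return root.get("version")


def needs_upgrade(csl_text, target="1.0"):
    """Hypothetical policy: anything not declaring the target version needs work."""
    return style_version(csl_text) != target


# Two minimal, made-up style skeletons for demonstration only.
sample_new = '<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" class="in-text"/>'
sample_old = '<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text"/>'

if __name__ == "__main__":
    for label, text in (("new", sample_new), ("old", sample_old)):
        print(label, style_version(text), needs_upgrade(text))
```

A style flagged this way would then be a candidate for a conversion step such as update-styles.xsl; the sketch does not perform that conversion itself.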

The labor question is perhaps somewhat less important, but perhaps
some of my suggestions above might point us in a direction to address
that as well?

I’m not sure what you’re suggesting.

When I wrote about process earlier, I was more concerned with issues
that go beyond the four corners of the schema. There are three axes
to interoperability: (1) that CSL processors conform to the schema and
specification; (2) that CSL styles produce the same output for
congruent bibliographic records in all calling applications; and
(3) that the CSL language and styles be a reliable tool for formatting
ordinary citations in all major fields of publishing.

Item (1) is satisfied by the current process. Item (2) requires
knowledge of the item type and field assignments in calling
applications (there must be no one-to-many mappings to CSL variables,
and CSL should perhaps provide some guidance on field content
appropriate to particular categories of content). Item (3) is pretty
well covered for fields other than law (apart from issues around
review articles, translations, and other hierarchical items). Legal
support is a major task, though. The essential metadata sets for
legal materials are complex, vary between jurisdictions, and vary over
time within any given jurisdiction. There are efforts out there to
produce uniform standards for data exchange between jurisdictions, but
it’s certain that CSL would need a slightly extended set of item
types, and the sort of guidance on mapping conventions described under
Item (2), to get this going.

I’m concerned that Item (2), in particular, is beyond our capacity to
handle as a small circle of developers. Yet guidance on mapping
conventions will be needed, to make CSL processors attractive as a
general solution to citation formatting issues.

I’m kind of stuck for a solution to this at this point, although I’m
pretty sure that, with working code out there, it will be addressed by
someone sooner or later – and certainly one option would be to down
tools at this point (after dealing with the pending issues for 1.1),
and let Items (2) and (3) sort themselves out in other fora. That has
certainly worked out pretty well for development of the Javascript
language (cough, cough). I guess it depends on how much central
control you think is needed to keep things on track.

Frank

I think backward-incompatible changes are unavoidable if we want to continue
to improve CSL; so will be the need for hosting older versions of CSL styles.

We could continue as we’ve done, which is that effectively if someone
proposes a solution that doesn’t garner any objections, and can
provide a reasonable test scenario and spec code for it, then we add
it, and just leave it up to other implementors to catch up.

But I’m not sure that’s the best strategy, particularly as we start to
roll out the CSL editor, online repo, etc. We don’t want to have
multiple versions of styles, for example.

I think backward-incompatible changes are unavoidable if we want to continue
to improve CSL; so will be the need for hosting older versions of CSL styles.

Yes, I know. But there’s an important strategic question about the
pace of change. It would be awkward and confusing for us, say, in
three months to issue a 1.1 change, when we have Zotero running 1.0
code, the CSL Editor for 1.0 “coming soon”, and Mendeley running 0.8.
We want to first get some traction on 1.0, which by definition means
some stability.

Hello,

It would be awkward and confusing for us, say, in
three months to issue a 1.1 change, when we have Zotero running 1.0
code, the CSL Editor for 1.0 “coming soon”, and Mendeley running 0.8.
We want to first get some traction on 1.0, which by definition means
some stability.

I’d hope that in three months’ time we’d have Mendeley running 1.0
(using citeproc). The current plan is that the next release will use
the ‘old’ 0.8 processor, and after that we’ll migrate. Having said
that, I agree with the basic point about possible confusion arising
from having several different versions in active use.

Regards,
Robert.

Which “citeproc”? Frank’s JS version?

Bruce

I think backward-incompatible changes are unavoidable if we want to continue
to improve CSL; so will be the need for hosting older versions of CSL styles.

Yes, I know. But there’s an important strategic question about the
pace of change. It would be awkward and confusing for us, say, in
three months to issue a 1.1 change, when we have Zotero running 1.0
code, the CSL Editor for 1.0 “coming soon”, and Mendeley running 0.8.
We want to first get some traction on 1.0, which by definition means
some stability.

I’m fully aware that each backward-incompatible release has a significant
cost: we have to provide an upgrade path for styles, CSL processors have to
be updated and tested, and implementers have to transition to a new CSL
processor. But given the current speed of CSL development I’d say a 1.1
release in the next three months is highly unlikely. I would be happy if we
can maintain a 1 to 2 year release cycle for major releases.

Rintze

Right. But remember my bigger question here: are we content to
continue to figure out these details (say, when x.x releases) on an ad
hoc basis as we go along, or do we need to talk about something a
little more explicit or formal?

I don’t have a strong opinion; just wondering …

Bruce

Which “citeproc”? Frank’s JS version?

Yes. At least, that is the plan.

Regards,
Robert.

What would be the alternative(s) to the current practice?

Rintze

Interesting. Please update us once you get into it.

Bruce

On Mon, May 10, 2010 at 12:27 PM, Robert Knight <@Robert_Knight> wrote:

Not really sure. It could involve, I suppose, more formal roadmaps,
with dates and added features?

Bruce

We could continue as we’ve done, which is that effectively if someone
proposes a solution that doesn’t garner any objections, and can
provide a reasonable test scenario and spec code for it, then we add
it, and just leave it up to other implementors to catch up.

But I’m not sure that’s the best strategy, particularly as we start to
roll out the CSL editor, online repo, etc. We don’t want to have
multiple versions of styles, for example.

I think backward-incompatible changes are unavoidable if we want to continue
to improve CSL; so will be the need for hosting older versions of CSL styles.

Yes, I know. But there’s an important strategic question about the
pace of change. It would be awkward and confusing for us, say, in
three months to issue a 1.1 change, when we have Zotero running 1.0
code, the CSL Editor for 1.0 “coming soon”, and Mendeley running 0.8.
We want to first get some traction on 1.0, which by definition means
some stability.

I agree that longish release cycles are good, and that a 1.1 schema
release can wait for a year or so until things settle down.

That would assume some clear idea of where we’re going. Do you, or anyone
for that matter, have long-term targets in mind for CSL that could warrant a
roadmap?

I know there are still some features missing for full-blown legal style
support, but I think CSL 1.0 is already pretty mature (it definitely is for
the natural sciences). If it is just a matter of maintaining CSL by adding a
feature here and there, my vote would be to just have a release every now
and then when we think we made some significant progress.

Rintze

I’m stepping in just to share the experience of a catching-up
developer. I’m using the test-suite and the documentation to implement
all the CSL-1.0 features, and I think that the development environment
you have created in the last year or so provides the detailed
information that writing an implementation requires. You really
did a great job.

I think that separating the test-suite from citeproc-js and tagging it
together with the schema and the specification would be a step forward
(right now name ordering is different from the CSL-1.0 specification,
for instance).

The Haskell implementation, with the patches I’ve just pushed, now
passes 295 tests out of 499. I don’t think I’ll ever reach 100% of
the present test-suite, since some of the citeproc-js features will
work differently in citeproc-hs, for instance name parsing.

I’m rushing to have a new release soon, to provide pandoc with a
CSL-1.0 compatible processor, with the features the citeproc-js API is
offering. That would possibly help provide the broader user base
needed to start addressing the other issues Frank was raising.

Andrea

Hi Andrea,

I think that separating the test-suite from citeproc-js and tagging it
together with the schema and the specification would be a step forward
(right now name ordering is different from the CSL-1.0 specification,
for instance).

The Haskell implementation, with the patches I’ve just pushed, now
passes 295 tests out of 499. I don’t think I’ll ever reach 100% of
the present test-suite, since some of the citeproc-js features will
work differently in citeproc-hs, for instance name parsing.

Given this, do you have any substantive changes to suggest for the test suite?

I’m rushing to have a new release soon, to provide pandoc with a
CSL-1.0 compatible processor, with the features the citeproc-js API is
offering. That would possibly help provide the broader user base
needed to start addressing the other issues Frank was raising.

Cool; look forward to it!

Bruce

I’m stepping in just to share the experience of a catching-up
developer. I’m using the test-suite and the documentation to implement
all the CSL-1.0 features, and I think that the development environment
you have created in the last year or so provides the detailed
information that writing an implementation requires. You really
did a great job.

I think that separating the test-suite from citeproc-js and tagging it
together with the schema and the specification would be a step forward
(right now name ordering is different from the CSL-1.0 specification,
for instance).

Disentangling the test fixtures from the citeproc-js source is a very
good idea, and this is a good time to begin working on it. It will
require a new archive (I’ve opened a working repository, which can be
shifted elsewhere later if need be), and a refactored version of the
test.py script, with facilities only for validating the embedded CSL
against the schema, and for “compiling” the JSON version of the
fixtures.

The standard suite should contain only purely standard tests. As
you’ve noticed, they’re currently a mongrel mixture of standard tests
at level 1.0, tests of experimental features (those marked with
version “1.1x”), and tests of functionality specific to citeproc-js
(the functionality documented in the “Dirty Tricks” section of the
processor manual). Some work at my end will be needed
to make things a bit more modular in the citeproc-js testing
framework, after which I should be able to move a clean set of
standard tests to the external repository.

Once things are cleanly separated, there are certainly possibilities
for housecleaning. The category prefixes and the test names were just
tossed in on the fly, and often don’t particularly make sense. The
fixtures themselves often contain more data, or more complex CSL, than
is needed to test the particular feature targeted. This has been
useful, in a way; the randomness sometimes catches bugs that I might
have missed had the tests been constructed in a more conservative and
systematic fashion.
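
To illustrate what the category prefixes are good for once the names are tidied up, here is a minimal sketch of grouping tests by prefix and selecting a few groups to run. It is an illustration only: it assumes each test name has the form prefix_Name with an underscore separator, and the test names and helper functions shown are hypothetical, not taken from the suite or from any test runner.

```python
from collections import defaultdict


def group_by_prefix(test_names):
    """Map each category prefix to the list of test names filed under it."""
    groups = defaultdict(list)
    for name in test_names:
        # Assumed naming convention: "<prefix>_<TestName>".
        prefix, _, rest = name.partition("_")
        groups[prefix].append(rest)
    return dict(groups)


def select(test_names, wanted_prefixes):
    """Keep only the tests whose category prefix was asked for."""
    wanted = set(wanted_prefixes)
    return [n for n in test_names if n.partition("_")[0] in wanted]


# Hypothetical test names, for demonstration only.
names = ["name_Simple", "name_Particles", "sort_ByDate", "date_Range"]

if __name__ == "__main__":
    for prefix, tests in sorted(group_by_prefix(names).items()):
        print(prefix, len(tests))
    print(select(names, ["name", "date"]))
```

With consistent prefixes, a summary line such as "N tests in M groups" falls out of the grouping directly, and an implementer can run just the groups relevant to the feature being worked on.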

We could also use lots more tests. :slight_smile:

There should probably also be some documentation. I can offer the
existing chapter from the processor manual as a starting point.

It will take a while to get the separation done. I’ll post to the list
when something is ready.

Frank

I’m stepping in just to share the experience of a catching-up
developer. I’m using the test-suite and the documentation to implement
all the CSL-1.0 features, and I think that the development environment
you have created in the last year or so provides the detailed
information that writing an implementation requires. You really
did a great job.

I think that separating the test-suite from citeproc-js and tagging it
together with the schema and the specification would be a step forward
(right now name ordering is different from the CSL-1.0 specification,
for instance).

Andrea, everyone,

I’ve separated the non-standard and standard test fixtures in the test
suite, and moved the latter to a separate repository, here:

http://bitbucket.org/fbennett/citeproc-test

There are almost certainly still some citeproc-js-isms among the
tests. If folks call them to my attention as you come across them, I
can move them from the standard set to my local supplementary test
bundle. Apart from that small request, please treat this as a
community resource. The current location is just a holding zone; it
might make sense to move the canonical repository over to Bruce’s
account, so it can be housed together with the schema and docs.

Frank

Thanks, Frank, for your work. I’ll submit the new tests I write, and
any corrections I want to propose, as patches to that repository.

About the citeproc-js-isms: actually I’m considering citeproc-js as
the reference implementation and I’m planning to expose an API which
should be modeled on citeproc-js’s API. In other words, citeproc-hs
could share with citeproc-js the input, citation-items and the
citations JSON data format (input and citation-items are already
supported).

I wonder whether it could be useful to add a testsuite for covering
extensions that may be expected in a style processor. I would do my
best to support it.

Andrea

ps: this is a run of the testsuite with my tree (should be the same
with the public repository).

443 tests in 35 groups
282 successes
161 failures

ps2: to run it with the darcs version, after installing citeproc-hs,
edit test/test_basic.hs to point to the test/machines/ subdirectory
and run:
   runhaskell test/test_basic.hs
or
   runhaskell test/test_basic.hs name nameattr nameorder

ps3: Bruce: installing hs-bibutils, the Haskell bindings to bibutils,
should now be really easy (no need to install bibutils, patch it or
whatever). citeproc-hs comes with bibutils support by default now.

About the citeproc-js-isms: actually I’m considering citeproc-js as
the reference implementation and I’m planning to expose an API which
should be modeled on citeproc-js’s API. In other words, citeproc-hs
could share with citeproc-js the input, citation-items and the
citations JSON data format (input and citation-items are already
supported).

We’ve talked about this before, but given this and other recent
developments, is it time to formalize the data model and JSON
representation?

I wonder whether it could be useful to add a testsuite for covering
extensions that may be expected in a style processor. I would do my
best to support it.

You mean stuff not formally a part of CSL, but implemented in, say, citeproc-js?

ps: this is a run of the testsuite with my tree (should be the same
with the public repository).

   443 tests in 35 groups
   282 successes
   161 failures

Cool. What’s your sense of what these data tell us about the state of
the processor?

ps2: to run it with the darcs version, after installing citeproc-hs,
edit test/test_basic.hs to point to the test/machines/ subdirectory
and run:
   runhaskell test/test_basic.hs
or
   runhaskell test/test_basic.hs name nameattr nameorder

ps3: Bruce: installing hs-bibutils, the Haskell bindings to bibutils,
should now be really easy (no need to install bibutils, patch it or
whatever). citeproc-hs comes with bibutils support by default now.

Awesome!

So this is only for the darcs version; right?

When are you planning to push out a new release?

Bruce