CSL roadmap

Rintze_Zelle · September 10, 2012, 5:46pm

Hi all,

Now that CSL 1.0.1 is out the door, I would like to inquire what
people think should be on the CSL roadmap for the coming year or so,
and what should have the highest priority. In particular, I’m curious
about the things people would like me to work on. Please speak up!

My own thoughts:

I’m relatively happy with the workflow for accepting style patches.
Having Travis CI check each pull request, and having the ability to
run the same tests locally
(https://github.com/citation-style-language/styles/wiki/Test-Environment)
has made it much easier to keep errors out. We now also rely much more
on the style authors themselves to fix style errors. Big kudos to
Sylvester for setting up the tests, and thanks to Sebastian and
Charles for helping out with adding styles! I just hope CSL doesn’t
become too popular (the current workload is still okay).

I don’t think we need to rush with trying to incorporate Frank’s MLZ
extensions to CSL into official CSL. MLZ will be a nice testing
ground, and I’d rather wait a little to see how the MLZ styles perform
in practice.

Several sites have started offering citeproc-JSON: see e.g.
http://blog.bibsonomy.org/2012/07/feature-of-week-citation-style-language.html,
http://www.doi.org/doi_handbook/5_Applications.html#5.4.1 and
http://crosscite.org/cn/#sec-4-1 . I think we could improve the
consistency of CSL style output between implementations by writing
better documentation on input expectations. This covers:
- date formats. Bruce seems to be a big proponent of adopting the
  Extended Date/Time Format as much as possible (EDTF, see
  http://www.loc.gov/standards/datetime/ )
- name parsing (e.g. extracting non-dropping and dropping particles
  from two-field names)
- field assignments (which item type should, or shouldn’t, have
  which fields). Aurimas prepared a map for Zotero:
  http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
  could standardize the fields exposed to the CSL processors among the
  different reference managers (on a per item type basis). My hope is
  that we can clean up Zotero’s metadata model in the coming year
  (tickets as compiled by the Zotero user community can be found at
  https://github.com/ajlyon/zotero-bits/issues ), and offer the result
  as a guideline for other reference managers.
- the JSON schemas. These could use some documentation.
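
To make the input expectations listed above concrete, here is a minimal sketch of a single citeproc-JSON item with a structured date and structured name particles. The item itself (id, title, authors) is made up for illustration, and `short_family` is a hypothetical helper, not part of any CSL processor; the key names (`date-parts`, `non-dropping-particle`, `dropping-particle`) follow the CSL data model as I understand it.

```python
import json

# A made-up item in citeproc-JSON form. Dates are given as structured
# "date-parts" ([year, month, day]) rather than as a free-form string,
# and name particles are already separated from the family name.
item = {
    "id": "example-1",
    "type": "article-journal",
    "title": "An illustrative title",
    "author": [
        # "von" is dropped when only the last name is shown.
        {"family": "Humboldt", "given": "Alexander", "dropping-particle": "von"},
        # "van" is kept when only the last name is shown.
        {"family": "Gogh", "given": "Vincent", "non-dropping-particle": "van"},
    ],
    "issued": {"date-parts": [[2012, 9, 10]]},
}


def short_family(name):
    """Hypothetical helper: the name as shown when only the last name is printed."""
    parts = [name.get("non-dropping-particle"), name["family"]]
    return " ".join(p for p in parts if p)


print(json.dumps(item, indent=2))
print([short_family(n) for n in item["author"]])
```

If every reference manager delivered names and dates in this pre-parsed shape, processors would not have to guess at particles or date formats, which is where much of the inconsistency between implementations seems to come from.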

One of the things I’d really like to see is an improved CSL test
suite. Sylvester posted a Cucumber mockup format a while back
(https://github.com/inukshuk/citeproc-ruby/blob/1c420de0f7a86b7c35782dee86ce62cbebb47ab9/features/condition/is_numeric.feature).
I never had much success adding styles to the current test suite
setup, and I really would like to be able to categorize tests (e.g. on
CSL version, and on the CSL feature that is being tested). If the
infrastructure is there, I wouldn’t mind annotating the existing tests
by hand.

There are still quite a few open CSL tickets
(https://github.com/citation-style-language/schema/issues). Some have
solutions that require a backwards incompatible release (i.e. CSL 1.1)
and style upgrades, while other tickets have stalled due to the
absence of good ideas or due to disagreement. I don’t expect too much
progress here, unless people step in and reboot the discussions.

I agree with Bruce that we might want to revisit our release
strategy. Going 2.5 years between releases is a bit long. I would
favor slightly more frequent releases (e.g. 1 per year), with
agreement on what can and what cannot end up in a 1.0.x release. We
could also allow certain new backward compatible features as soon as
they are approved and continuously update the spec between formal
releases.

Best,

Rintze

Sebastian_Karcher · September 10, 2012, 11:08pm

First, congratulations on the - as it looks so far - quite smooth release.

I’m relatively happy with the workflow for accepting style patches.
Having Travis CI check each pull request, and having the ability to
run the same tests locally
(Test Environment · citation-style-language/styles Wiki · GitHub)
has made it much easier to keep errors out. We now also rely much more
on the style authors themselves to fix style errors. Big kudos to
Sylvester for setting up the tests, and thanks to Sebastian and
Charles for helping out with adding styles! I just hope CSL doesn’t
become too popular (the current workload is still okay).

agreed - I’m a little concerned what will happen when the visual
editor goes live for real. While it writes pretty clean code, it
doesn’t validate and it does some odd things still.

I don’t think we need to rush with trying to incorporate Frank’s MLZ
extensions to CSL into official CSL. MLZ will be a nice testing
ground, and I’d rather wait a little to see how the MLZ styles perform
in practice.
agreed again - a lot of MLZ functionality already works with CSL once
the respective fields are added to Zotero et al.

(…)

field assignments (which item type should, or shouldn’t, have
which fields). Aurimas prepared a map for Zotero:
http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
could standardize the fields exposed to the CSL processors among the
different reference managers (on a per item type basis). My hope is
that we can clean up Zotero’s metadata model in the coming year
(tickets as compiled by the Zotero user community can be found at
Issues · zotero/zotero-bits · GitHub ), and offer the result
as a guideline for other reference managers.
This is, imho, really the top priority - with different fields/item
we’ll get inconsistent outputs - cf. e.g. publishers for journals.
Happy to help work on this.

(…)

There are still quite a few open CSL tickets (
Issues · citation-style-language/schema · GitHub ). Some have
solutions that require a backwards incompatible release (i.e. CSL 1.1)
and style upgrades, while other tickets have stalled due to the
absence of good ideas or due to disagreement. I don’t expect too much
progress here, unless people step in and reboot the discussions.

I think some of them are pretty important. My top two are:

github.com/citation-style-language/schema

Support multiple items per citation-number (for chemistry journals)

opened 01:13PM - 24 Mar 11 UTC

bdarcus

1.2

We need to be able to support compound reference styles with output like:

```
(21) (a) Childs, A. F.; Goldsworthy, L. J.; Harding, G. F.; King, F. E.; Nineham, A. W.; Norris, W. L.; Plant, S. G. P.; Selton, B.; Tompsett, A. L. L. J. Chem. Soc. 1948, 2174–2177. (b) Chase, B. H.; Downes, A. M. J. Chem. Soc. 1953, 3874–3877.
```

My understanding is that this is conceptually like a footnote citation, where the individual references simply get labeled based on their position in the list. If I understand that right, then all we need is to be able to get the index number for the position within the citation, and then format that using `cs:number`. Does that mean we need to add something like `reference-in-citation-index` to the `cs-numbers` pattern?

Aside: we have `first-reference-note-number` as a simple variable (not on number). Two things:

1. well, should it have been under number?
2. we probably need to establish consistent naming of three things:
   - the full citation as a whole
   - the individual references within the full citation
   - the item in the bibliography

github.com/citation-style-language/schema

Expand number of locator terms

opened 01:29PM - 03 May 12 UTC

rmzelle

1.1

See http://forums.zotero.org/discussion/14717/universal-locator-type/

In addition to just expanding the number of locator terms, we could also add an empty-string locator type, so that the user can store the label in the locator variable field.

Regardless of the solution chosen, we probably should make an inventory of missing locator types. Identified so far:

- Plays: Act I, scene i, lines 12-23 (or act, scene, verse)
- Ancient sources: Book 1, lines 1-8; Book 6, chapters 57‐58
- Bibles: Psalm 86:5; 1 Cor. 13:1 New International Version (Book Chapter:Verse, Translation)
- Legal: article

http://www.longwoodshakespeare.org/quoteverse.html
http://blog.apastyle.org/apastyle/2009/12/happy-holiday-citing-citation-of-classical-works.html
http://www.skidmore.edu/classics/courses/2006fall/ssp100/citations.pdf

---

See also the discussion at http://xbiblio-devel.2463403.n2.nabble.com/Improving-support-for-locators-td7578007.html

The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University

Rintze_Zelle · September 11, 2012, 12:16am

field assignments (which item type should, or shouldn’t, have
which fields). Aurimas prepared a map for Zotero:
http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
could standardize the fields exposed to the CSL processors among the
different reference managers (on a per item type basis). My hope is
that we can clean up Zotero’s metadata model in the coming year
(tickets as compiled by the Zotero user community can be found at
Issues · zotero/zotero-bits · GitHub ), and offer the result
as a guideline for other reference managers.
This is, imho, really the top priority - with different fields/item
we’ll get inconsistent outputs - cf. e.g. publishers for journals.
Happy to help work on this.

The ball for this is really in the court of the Zotero team, though. I
don’t think you, me, Avram, Grégoire (Gracile) and others can do much
more prep work for Simon and Dan, and I’m pretty sure we’re all
available to give input once they start work on it. But it seems to be
a few months out at least (post-November).

There are still quite a few open CSL tickets (
Issues · citation-style-language/schema · GitHub ). Some have
solutions that require a backwards incompatible release (i.e. CSL 1.1)
and style upgrades, while other tickets have stalled due to the
absence of good ideas or due to disagreement. I don’t expect too much
progress here, unless people step in and reboot the discussions.

I think some of them are pretty important. My top two are:
Support multiple items per citation-number (for chemistry journals) · Issue #36 · citation-style-language/schema · GitHub
Expand number of locator terms · Issue #94 · citation-style-language/schema · GitHub

The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

I’m actually hoping that we might be able to come up with a consistent
solution that addresses both custom fields, custom locator labels
(Expand number of locator terms · Issue #94 · citation-style-language/schema · GitHub), and
custom identifiers
(Identifiers · Issue #33 · citation-style-language/schema · GitHub).

A recent thought of mine was to create some sort of distinct
name-space for custom variables, e.g.:

That way you would have complete freedom to define custom variable
names, and there wouldn’t be conflicts with the ‘core’ set of official
CSL variables.

Regardless of the particular solution chosen, it might help if we make
a fresh start with the entire discussion. I think nobody will dispute
the usefulness of having custom fields, but I’m very keen to learn, in
the precisest terms possible, what implications Bruce, Dan, Simon, and
others foresee when it comes to custom variables (e.g. with regards to
syncing). That would help me understand what the considerations should
be when thinking of solutions. (a pointer would suffice as well, if
this already has been discussed)

Rintze

Rintze_Zelle · September 11, 2012, 12:40am

To stress my point on the need to discuss this in detail, I think that
the representation of custom fields in CSL styles isn’t nearly as
problematic an issue as coming to agreement regarding surrounding
issues like syncing (where I’m not an expert) and metadata storage.

Rintze

Bruce_D_Arcus1 · September 11, 2012, 1:04am

field assignments (which item type should, or shouldn’t, have
which fields). Aurimas prepared a map for Zotero:
http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
could standardize the fields exposed to the CSL processors among the
different reference managers (on a per item type basis). My hope is
that we can clean up Zotero’s metadata model in the coming year
(tickets as compiled by the Zotero user community can be found at
Issues · zotero/zotero-bits · GitHub ), and offer the result
as a guideline for other reference managers.
This is, imho, really the top priority - with different fields/item
we’ll get inconsistent outputs - cf. e.g. publishers for journals.
Happy to help work on this.

The ball for this is really in the court of the Zotero team, though. I
don’t think you, me, Avram, Grégoire (Gracile) and others can do much
more prep work for Simon and Dan, and I’m pretty sure we’re all
available to give input once they start work on it. But it seems to be
a few months out at least (post-November).

There are still quite a few open CSL tickets (
Issues · citation-style-language/schema · GitHub ). Some have
solutions that require a backwards incompatible release (i.e. CSL 1.1)
and style upgrades, while other tickets have stalled due to the
absence of good ideas or due to disagreement. I don’t expect too much
progress here, unless people step in and reboot the discussions.

I think some of them are pretty important. My top two are:
Support multiple items per citation-number (for chemistry journals) · Issue #36 · citation-style-language/schema · GitHub
Expand number of locator terms · Issue #94 · citation-style-language/schema · GitHub

The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

I’m actually hoping that we might be able to come up with a consistent
solution that addresses both custom fields, custom locator labels
(Expand number of locator terms · Issue #94 · citation-style-language/schema · GitHub), and
custom identifiers
(Identifiers · Issue #33 · citation-style-language/schema · GitHub).

A recent thought of mine was to create some sort of distinct
name-space for custom variables, e.g.:

That way you would have complete freedom to define custom variable
names, and there wouldn’t be conflicts with the ‘core’ set of official
CSL variables.

Regardless of the particular solution chosen, it might help if we make
a fresh start with the entire discussion. I think nobody will dispute
the usefulness of having custom fields,

I would

In short, and in general, we need to put a high premium on
interoperability of data and styles.

I don’t have time to get into this in depth ATM, but we probably need
discussion of this and the previous concerns I’ve raised about
managing change in the CSL schema and spec over time.

Bruce

Sylvester_Keil · September 11, 2012, 6:42am

Several sites have started offering citeproc-JSON: see e.g.
BibSonomy Blog: Feature of the week: Citation Style Language export
, DOI® Handbook and
http://crosscite.org/cn/#sec-4-1 . I think we could improve the
consistency of CSL style output between implementations by writing
better documentation on input expectations. This covers:

date formats. Bruce seems to be a big proponent of adopting the
Extended Date/Time Format as much as possible (EDTF, see
Extended Date Time Format (EDTF) Specification (Library of Congress) )

name parsing (e.g. extracting non-dropping and dropping particles
from two-field names)

Name parsing should strictly be optional; I’ve written a name parser for citeproc-ruby to deal with single-field names, but because of language / cultural differences this can quickly become infeasible. In some languages, for example, even word segmentation alone is a hard problem. I’ve gone to great lengths to support names passed in a single field; IIRC Frank even handles Japanese names specifically – but I would be careful to make it mandatory for CSL processors to implement such features as it sets the bar pretty high.
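
To illustrate why this is hard to mandate, here is a deliberately naive sketch of particle extraction from two-field names. It is not the citeproc-ruby or citeproc-js algorithm; `split_particles` is a hypothetical function, and the rule it uses (lowercase tokens at the start of the family field or at the end of the given field are particles) is an assumption that only roughly fits some European naming conventions.

```python
def split_particles(family, given):
    """Naive heuristic: peel lowercase tokens off a two-field name.

    Leading lowercase tokens in `family` become the non-dropping
    particle; trailing lowercase tokens in `given` become the dropping
    particle. Scripts without letter case are left untouched.
    """
    fam_tokens = family.split()
    i = 0
    # Always keep at least one token as the family name proper.
    while i < len(fam_tokens) - 1 and fam_tokens[i].islower():
        i += 1

    giv_tokens = given.split()
    j = len(giv_tokens)
    # Always keep at least one token as the given name proper.
    while j > 1 and giv_tokens[j - 1].islower():
        j -= 1

    result = {
        "family": " ".join(fam_tokens[i:]),
        "given": " ".join(giv_tokens[:j]),
    }
    if i:
        result["non-dropping-particle"] = " ".join(fam_tokens[:i])
    if j < len(giv_tokens):
        result["dropping-particle"] = " ".join(giv_tokens[j:])
    return result


print(split_particles("van Gogh", "Vincent"))
print(split_particles("Humboldt", "Alexander von"))
print(split_particles("山田", "太郎"))
```

Even this toy version has to special-case scripts without letter case, and it will misfire on capitalized particles or on names where a lowercase word is part of the family name, which is the kind of language-specific judgment that seems too much to require of every processor.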

field assignments (which item type should, or shouldn’t, have
which fields). Aurimas prepared a map for Zotero:
http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
could standardize the fields exposed to the CSL processors among the
different reference managers (on a per item type basis). My hope is
that we can clean up Zotero’s metadata model in the coming year
(tickets as compiled by the Zotero user community can be found at
Issues · zotero/zotero-bits · GitHub ), and offer the result
as a guideline for other reference managers.

the JSON schemas. These could use some documentation.

I would agree that the data/input format should receive more attention in the future.

A few months ago, I recorded some observations about the format here:
Processor input (JSON) · citation-style-language/schema Wiki · GitHub

One of the things I’d really like to see is an improved CSL test
suite. Sylvester posted a Cucumber mockup format a while back (
citeproc-ruby/features/condition/is_numeric.feature at 1c420de0f7a86b7c35782dee86ce62cbebb47ab9 · inukshuk/citeproc-ruby · GitHub
). I never had much success adding styles to the current test suite
setup, and I really would like to be able to categorize tests (e.g. on
CSL version, and on the CSL feature that is being tested). If the
infrastructure is there, I wouldn’t mind annotating the existing tests
by hand.

I’m still working on the rewrite of citeproc-ruby; I’ve made good progress over the summer and I’m currently able to test individual rendering elements in isolation. Once I move on to the point where I can work with integration and acceptance tests, I intend to continue the work in the csl-test-suite package. The plan, right now, is to use Cucumber, because I think it offers a number of features that we need, but to make it a priority for the tests to be convertible to the current JSON format so that other implementations can use the tests without having to make any changes.

Sylvester


Rintze_Zelle · September 11, 2012, 1:24pm

Name parsing should strictly be optional; I’ve written a name parser for citeproc-ruby to deal with single-field names, but because of language / cultural differences this can quickly become infeasible. In some languages, for example, even word segmentation alone is a hard problem. I’ve gone to great lengths to support names passed in a single field; IIRC Frank even handles Japanese names specifically – but I would be careful to make it mandatory for CSL processors to implement such features as it sets the bar pretty high.

Agreed that parsing of unstructured data (dates, names) should be
optional. (The closest the spec comes to discussing unstructured data
is in its description of the is-numeric conditional.)

the JSON schemas. These could use some documentation.

I would agree that the data/input format should receive more attention in the future.

A few months ago, I recorded some observations about the format here:
Processor input (JSON) · citation-style-language/schema Wiki · GitHub

Jakob Voss posted some comments a while back, too:

github.com/citation-style-language/schema

Clearly define CSL input format

opened 04:28PM - 19 Mar 11 UTC

closed 01:55PM - 13 Aug 20 UTC

bdarcus

1.1

The CSL input format is described in the CSL specification (the CSL data model), in the citeproc-js documentation (the CSL/JSON variant) and in the RELAX NG file (the CSL/XML variant). All should be losslessly mappable to each other but there are some differences.

1. The non-dropping-particle from citeproc-js is called prefix in csl-data.rnc - this is not a problem but may be confusing.
2. In csl-data.rnc a reference is identified by a URI but in citeproc-js by a simple string of any form.
3. I could not find the 'container-uri' property from csl-data.rnc in citeproc-js.
4. csl-data.rnc seems to allow multiple dates of the same type and dates with different type in start-date and end-date.
5. In csl-data.rnc the value-space of a year is xsd:gYear but in citeproc-js it is a non-zero integer.
6. In csl-data.rnc the value-space of month and day is 00..99 instead of 1..12 and 1..31.
7. In csl-data.rnc a date may have a day but no month at the same time.
8. In csl-data.rnc a season cannot be of any literal form like in citeproc-js.
9. In citeproc-js the season is a property of the whole date variable but in csl-data.rnc it applies to a start and/or an end-date.
10. In citeproc-js the circa flag is a property of a whole date variable but in csl-data.rnc the circa flag applies to a start-date and/or an end-date.
11. It is not clear how open-ended date ranges are encoded in csl-data.rnc.
12. In citeproc-js the list of allowed markup tags in rich text variables is i, b, sup, sub, sc, span@nocase, span@nodecor, ", ' plus locale quotes, and the tags can be nested (?), but in csl-data.rnc the tags cannot be nested, there are no locale quotes and the list of tags is:

```
i, b, sup, sub, abbr, cite, cite@part, span, span@protect
```

Citeproc-js almost complies with the specification, except for point 8 (and maybe 9). Points 10, 11 and maybe 2 can be solved by better documentation. The other points require a refactoring of csl-data.rnc.

---

- Bitbucket: https://bitbucket.org/bdarcus/csl-schema/issue/22
- Originally Reported By: Jakob Voss
- Originally Created At: 2010-06-22 00:29:20
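
Points 5 to 7 of the issue quoted above amount to value-space rules that could be checked mechanically on the JSON side. The following is only a sketch under assumptions: `check_date_parts` is a hypothetical helper, it looks at a single positional `[year, month, day]` list, and it encodes the tighter ranges the issue argues for (non-zero integer year, month 1..12, day 1..31) rather than anything a schema currently enforces.

```python
def _int_in_range(value, low, high):
    """True if value is an integer within low..high (inclusive)."""
    return isinstance(value, int) and low <= value <= high


def check_date_parts(parts):
    """Return a list of problems found in one [year, month?, day?] list."""
    if not 1 <= len(parts) <= 3:
        return ["expected 1 to 3 date parts"]

    problems = []
    year = parts[0]
    if not isinstance(year, int) or year == 0:
        problems.append("year must be a non-zero integer")
    if len(parts) >= 2 and not _int_in_range(parts[1], 1, 12):
        problems.append("month must be 1..12")
    if len(parts) == 3 and not _int_in_range(parts[2], 1, 31):
        problems.append("day must be 1..31")
    return problems


print(check_date_parts([2012, 9, 10]))
print(check_date_parts([2012, 13]))
print(check_date_parts([0]))
```

Because the parts are positional, a day without a month (point 7) cannot be expressed at all in this form, so that particular inconsistency disappears by construction rather than by validation.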

id should be optional in Embedded Citation Object Format · Issue #70 · citation-style-language/schema · GitHub refers to
the input format as well.
(and, somewhat related,
Rename variables, types and terms to use consistent punctuation · Issue #6 · citation-style-language/schema · GitHub )

One of the things I’d really like to see is an improved CSL test
suite. Sylvester posted a Cucumber mockup format a while back (
citeproc-ruby/features/condition/is_numeric.feature at 1c420de0f7a86b7c35782dee86ce62cbebb47ab9 · inukshuk/citeproc-ruby · GitHub
). I never had much success adding styles to the current test suite
setup, and I really would like to be able to categorize tests (e.g. on
CSL version, and on the CSL feature that is being tested). If the
infrastructure is there, I wouldn’t mind annotating the existing tests
by hand.

I’m still working on the rewrite of citeproc-ruby; I’ve made good progress over the summer and I’m currently able to test individual rendering elements in isolation. Once I move on to the point where I can work with integration and acceptance tests, I intend to continue the work in the csl-test-suite package. The plan, right now, is to use Cucumber, because I think it offers a number of features that we need, but to make it a priority for the tests to be convertible to the current JSON format so that other implementations can use the tests without having to make any changes.

Sounds great.

Rintze
