CSL roadmap

Hi all,

Now that CSL 1.0.1 is out the door, I would like to inquire what
people think should be on the CSL roadmap for the coming year or so,
and what should have the highest priority. In particular, I’m curious
about the things people would like me to work on. Please speak up!

My own thoughts:

  • I’m relatively happy with the workflow for accepting style patches.
    Having Travis CI check each pull request, and having the ability to
    run the same tests locally
    (https://github.com/citation-style-language/styles/wiki/Test-Environment)
    has made it much easier to keep errors out. We now also rely much more
    on the style authors themselves to fix style errors. Big kudos to
    Sylvester for setting up the tests, and thanks to Sebastian and
    Charles for helping out with adding styles! I just hope CSL doesn’t
    become too popular :) (the current workload is still okay).
  • I don’t think we need to rush with trying to incorporate Frank’s MLZ
    extensions to CSL into official CSL. MLZ will be a nice testing
    ground, and I’d rather wait a little to see how the MLZ styles perform
    in practice.
  • Several sites have started offering citeproc-JSON: see e.g.
    http://blog.bibsonomy.org/2012/07/feature-of-week-citation-style-language.html
    , http://www.doi.org/doi_handbook/5_Applications.html#5.4.1 and
    http://crosscite.org/cn/#sec-4-1 . I think we could improve the
    consistency of CSL style output between implementations by writing
    better documentation on input expectations. This covers:
    • date formats. Bruce seems to be a big proponent of adopting the
      Extended Date/Time Format as much as possible (EDTF, see
      http://www.loc.gov/standards/datetime/ )
    • name parsing (e.g. extracting non-dropping and dropping particles
      from two-field names)
    • field assignments (which item type should, or shouldn’t, have
      which fields). Aurimas prepared a map for Zotero:
      http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
      could standardize the fields exposed to the CSL processors among the
      different reference managers (on a per item type basis). My hope is
      that we can clean up Zotero’s metadata model in the coming year
      (tickets as compiled by the Zotero user community can be found at
      https://github.com/ajlyon/zotero-bits/issues ), and offer the result
      as a guideline for other reference managers.
    • the JSON schemas. These could use some documentation.
  • One of the things I’d really like to see is an improved CSL test
    suite. Sylvester posted a Cucumber mockup format a while back (
    https://github.com/inukshuk/citeproc-ruby/blob/1c420de0f7a86b7c35782dee86ce62cbebb47ab9/features/condition/is_numeric.feature
    ). I never had much success adding styles to the current test suite
    setup, and I really would like to be able to categorize tests (e.g. on
    CSL version, and on the CSL feature that is being tested). If the
    infrastructure is there, I wouldn’t mind annotating the existing tests
    by hand.
  • There are still quite a few open CSL tickets (
    https://github.com/citation-style-language/schema/issues ). Some have
    solutions that require a backwards incompatible release (i.e. CSL 1.1)
    and style upgrades, while other tickets have stalled due to the
    absence of good ideas or due to disagreement. I don’t expect too much
    progress here, unless people step in and reboot the discussions.
  • I agree with Bruce that we might want to revisit our release
    strategy. Going 2.5 years between releases is a bit long. I would
    favor slightly more frequent releases (e.g. 1 per year), with
    agreement on what can and what cannot end up in a 1.0.x release. We
    could also allow certain new backward compatible features as soon as
    they are approved and continuously update the spec between formal
    releases.
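
To make the date-format item above a bit more concrete, here is a minimal Python sketch of mapping a small subset of EDTF strings (plain YYYY, YYYY-MM, YYYY-MM-DD, and start/end intervals) onto the "date-parts" structure used in citeproc-JSON. The function name edtf_to_csl is hypothetical, and falling back to a "literal" date for anything unrecognized is an assumption of this sketch, not something the spec prescribes.

```python
import re

# Only the simplest EDTF forms: YYYY, YYYY-MM, YYYY-MM-DD.
_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def edtf_to_csl(text):
    """Convert a simple EDTF date or interval to a citeproc-JSON date object.

    Hypothetical helper: anything outside the supported subset is passed
    through as a "literal" date instead of being guessed at.
    """
    pieces = text.split("/")  # EDTF intervals are written start/end
    if len(pieces) > 2:
        return {"literal": text}
    parts = []
    for piece in pieces:
        match = _DATE.fullmatch(piece)
        if match is None:
            return {"literal": text}
        # Drop the month/day groups that were not present.
        parts.append([int(g) for g in match.groups() if g is not None])
    return {"date-parts": parts}
```

Documenting even a subset like this would tell data providers which date strings a processor can be expected to understand, and which ones are only passed through.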

Best,

Rintze

First, congratulations on the - as it looks so far - quite smooth release.

  • I’m relatively happy with the workflow for accepting style patches.
    Having Travis CI check each pull request, and having the ability to
    run the same tests locally
    (https://github.com/citation-style-language/styles/wiki/Test-Environment)
    has made it much easier to keep errors out. We now also rely much more
    on the style authors themselves to fix style errors. Big kudos to
    Sylvester for setting up the tests, and thanks to Sebastian and
    Charles for helping out with adding styles! I just hope CSL doesn’t
    become too popular :) (the current workload is still okay).

Agreed - I’m a little concerned about what will happen when the visual
editor goes live for real. While it writes pretty clean code, it
doesn’t validate and it does some odd things still.

  • I don’t think we need to rush with trying to incorporate Frank’s MLZ
    extensions to CSL into official CSL. MLZ will be a nice testing
    ground, and I’d rather wait a little to see how the MLZ styles perform
    in practice.

Agreed again - a lot of MLZ functionality already works with CSL once
the respective fields are added to Zotero et al.

(…)

  • field assignments (which item type should, or shouldn’t, have
    which fields). Aurimas prepared a map for Zotero:
    http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
    could standardize the fields exposed to the CSL processors among the
    different reference managers (on a per item type basis). My hope is
    that we can clean up Zotero’s metadata model in the coming year
    (tickets as compiled by the Zotero user community can be found at
    https://github.com/ajlyon/zotero-bits/issues ), and offer the result
    as a guideline for other reference managers.

This is, imho, really the top priority - with different fields/item
we’ll get inconsistent outputs - cf. e.g. publishers for journals.
Happy to help work on this.

(…)

  • There are still quite a few open CSL tickets (
    https://github.com/citation-style-language/schema/issues ). Some have
    solutions that require a backwards incompatible release (i.e. CSL 1.1)
    and style upgrades, while other tickets have stalled due to the
    absence of good ideas or due to disagreement. I don’t expect too much
    progress here, unless people step in and reboot the discussions.

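I think some of them are pretty important. My top two are:
https://github.com/citation-style-language/schema/issues/36
https://github.com/citation-style-language/schema/issues/94
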
The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

--------
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University

  • field assignments (which item type should, or shouldn’t, have
    which fields). Aurimas prepared a map for Zotero:
    http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
    could standardize the fields exposed to the CSL processors among the
    different reference managers (on a per item type basis). My hope is
    that we can clean up Zotero’s metadata model in the coming year
    (tickets as compiled by the Zotero user community can be found at
    https://github.com/ajlyon/zotero-bits/issues ), and offer the result
    as a guideline for other reference managers.
    This is, imho, really the top priority - with different fields/item
    we’ll get inconsistent outputs - cf. e.g. publishers for journals.
    Happy to help work on this.

The ball for this is really in the court of the Zotero team, though. I
don’t think you, me, Avram, Grégoire (Gracile) and others can do much
more prep work for Simon and Dan, and I’m pretty sure we’re all
available to give input once they start work on it. But it seems to be
a few months out at least (post-November).

  • There are still quite a few open CSL tickets (
    https://github.com/citation-style-language/schema/issues ). Some have
    solutions that require a backwards incompatible release (i.e. CSL 1.1)
    and style upgrades, while other tickets have stalled due to the
    absence of good ideas or due to disagreement. I don’t expect too much
    progress here, unless people step in and reboot the discussions.

I think some of them are pretty important. My top two are:
https://github.com/citation-style-language/schema/issues/36
https://github.com/citation-style-language/schema/issues/94

The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

I’m actually hoping that we might be able to come up with a consistent
solution that addresses custom fields, custom locator labels
(https://github.com/citation-style-language/schema/issues/94), and
custom identifiers
(https://github.com/citation-style-language/schema/issues/33).

A recent thought of mine was to create some sort of distinct
name-space for custom variables, e.g.:

That way you would have complete freedom to define custom variable
names, and there wouldn’t be conflicts with the ‘core’ set of official
CSL variables.

Regardless of the particular solution chosen, it might help if we make
a fresh start with the entire discussion. I think nobody will dispute
the usefulness of having custom fields, but I’m very keen to learn, in
the most precise terms possible, what implications Bruce, Dan, Simon, and
others foresee when it comes to custom variables (e.g. with regard to
syncing). That would help me understand what the considerations should
be when thinking of solutions. (a pointer would suffice as well, if
this already has been discussed)

Rintze

To stress my point on the need to discuss this in detail, I think that
the representation of custom fields in CSL styles isn’t nearly as
problematic an issue as coming to agreement regarding surrounding
issues like syncing (where I’m not an expert) and metadata storage.

Rintze

  • field assignments (which item type should, or shouldn’t, have
    which fields). Aurimas prepared a map for Zotero:
    http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
    could standardize the fields exposed to the CSL processors among the
    different reference managers (on a per item type basis). My hope is
    that we can clean up Zotero’s metadata model in the coming year
    (tickets as compiled by the Zotero user community can be found at
    https://github.com/ajlyon/zotero-bits/issues ), and offer the result
    as a guideline for other reference managers.
    This is, imho, really the top priority - with different fields/item
    we’ll get inconsistent outputs - cf. e.g. publishers for journals.
    Happy to help work on this.

The ball for this is really in the court of the Zotero team, though. I
don’t think you, me, Avram, Grégoire (Gracile) and others can do much
more prep work for Simon and Dan, and I’m pretty sure we’re all
available to give input once they start work on it. But it seems to be
a few months out at least (post-November).

  • There are still quite a few open CSL tickets (
    https://github.com/citation-style-language/schema/issues ). Some have
    solutions that require a backwards incompatible release (i.e. CSL 1.1)
    and style upgrades, while other tickets have stalled due to the
    absence of good ideas or due to disagreement. I don’t expect too much
    progress here, unless people step in and reboot the discussions.

I think some of them are pretty important. My top two are:
https://github.com/citation-style-language/schema/issues/36
https://github.com/citation-style-language/schema/issues/94

The first one is pretty major, but I’d really like to see 94 in one of
the smaller releases.

I’m actually hoping that we might be able to come up with a consistent
solution that addresses custom fields, custom locator labels
(https://github.com/citation-style-language/schema/issues/94), and
custom identifiers
(https://github.com/citation-style-language/schema/issues/33).

A recent thought of mine was to create some sort of distinct
name-space for custom variables, e.g.:

That way you would have complete freedom to define custom variable
names, and there wouldn’t be conflicts with the ‘core’ set of official
CSL variables.

Regardless of the particular solution chosen, it might help if we make
a fresh start with the entire discussion. I think nobody will dispute
the usefulness of having custom fields,

I would :)

In short, and in general, we need to put a high premium on
interoperability of data and styles.

I don’t have time to get into this in depth ATM, but we probably need
discussion of this and the previous concerns I’ve raised about
managing change in the CSL schema and spec over time.

Bruce

  • date formats. Bruce seems to be a big proponent of adopting the
    Extended Date/Time Format as much as possible (EDTF, see
    http://www.loc.gov/standards/datetime/ )
  • name parsing (e.g. extracting non-dropping and dropping particles
    from two-field names)

Name parsing should strictly be optional; I’ve written a name parser for citeproc-ruby to deal with single-field names, but because of language / cultural differences this can quickly become infeasible. In some languages, for example, even word segmentation alone is a hard problem. I’ve gone to great lengths to support names passed in a single field; IIRC Frank even handles Japanese names specifically – but I would be careful about making it mandatory for CSL processors to implement such features, as it sets the bar pretty high.
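
To illustrate why this gets hard quickly, here is a deliberately naive Python sketch of particle extraction from a two-field name. It is not the citeproc-ruby parser; the function name split_particles and the heuristic itself (lowercase words at the start of "family" form a non-dropping particle, lowercase words at the end of "given" form a dropping particle) are assumptions made only for this example.

```python
def split_particles(name):
    """Move lowercase particles of a two-field name into their own CSL fields.

    Heuristic (an assumption of this sketch): lowercase words at the start
    of "family" are a non-dropping particle; lowercase words at the end of
    "given" are a dropping particle.
    """
    family = name.get("family", "").split()
    given = name.get("given", "").split()

    # Leading lowercase words of the family name, always keeping the last word.
    i = 0
    while i < len(family) - 1 and family[i][0].islower():
        i += 1

    # Trailing lowercase words of the given name, always keeping the first word.
    j = len(given)
    while j > 1 and given[j - 1][0].islower():
        j -= 1

    parsed = {"family": " ".join(family[i:]), "given": " ".join(given[:j])}
    if i > 0:
        parsed["non-dropping-particle"] = " ".join(family[:i])
    if j < len(given):
        parsed["dropping-particle"] = " ".join(given[j:])
    return parsed
```

The heuristic depends entirely on letter case, so it says nothing useful about names written in scripts without case, and it misses particles that happen to be capitalized in the input. That is one reason to keep this kind of parsing optional.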

  • field assignments (which item type should, or shouldn’t, have
    which fields). Aurimas prepared a map for Zotero:
    http://aurimasv.github.com/z2csl/typeMap.xml . It would be great if we
    could standardize the fields exposed to the CSL processors among the
    different reference managers (on a per item type basis). My hope is
    that we can clean up Zotero’s metadata model in the coming year
    (tickets as compiled by the Zotero user community can be found at
    https://github.com/ajlyon/zotero-bits/issues ), and offer the result
    as a guideline for other reference managers.
  • the JSON schemas. These could use some documentation.

I would agree that the data/input format should receive more attention in the future.

A few months ago, I recorded some observations about the format here:
https://github.com/citation-style-language/schema/wiki/Processor-input-(JSON)

I’m still working on the rewrite of citeproc-ruby; I’ve made good progress over the summer and I’m currently able to test individual rendering elements in isolation. Once I move on to the point where I can work with integration- and acceptance-type tests, I intend to continue the work in the csl-test-suite package. The plan, right now, is to use Cucumber, because I think it offers a number of features that we need, but to make it a priority for the tests to be convertible to the current JSON format so that other implementations can use the tests without having to make any changes.

Sylvester


Name parsing should strictly be optional; I’ve written a name parser for citeproc-ruby to deal with single-field names, but because of language / cultural differences this can quickly become infeasible. In some languages, for example, even word segmentation alone is a hard problem. I’ve gone to great lengths to support names passed in a single field; IIRC Frank even handles Japanese names specifically – but I would be careful about making it mandatory for CSL processors to implement such features, as it sets the bar pretty high.

Agreed that parsing of unstructured data (dates, names) should be
optional. (The closest the spec comes to discussing unstructured data
is in its description of the is-numeric conditional.)
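
For reference, a rough Python approximation of that is-numeric rule as I read it in the CSL 1.0.1 spec: content counts as numeric if it consists solely of numbers, which may carry prefixes and suffixes and may be separated by a comma, hyphen, or ampersand. The regular expression below is my own assumption about how to encode that wording, not text taken from the spec or from any processor.

```python
import re

# One "number": digits with optional letter prefix and suffix (e.g. D2, 2b, L2d).
_NUMBER = r"[A-Za-z]*\d+[A-Za-z]*"
# Allowed separators between numbers, with or without surrounding spaces.
_SEPARATOR = r"\s*[,\-&]\s*"
_NUMERIC = re.compile(rf"{_NUMBER}(?:{_SEPARATOR}{_NUMBER})*")


def is_numeric(value):
    """Approximate the is-numeric test on the content of a variable."""
    return bool(_NUMERIC.fullmatch(str(value).strip()))
```

Under this reading "2nd" is numeric but "2nd edition" is not, which is exactly the kind of borderline case where better documentation of input expectations would help implementations agree.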

  • the JSON schemas. These could use some documentation.

I would agree that the data/input format should receive more attention in the future.

A few months ago, I recorded some observations about the format here:
https://github.com/citation-style-language/schema/wiki/Processor-input-(JSON)

Jakob Voss posted some comments a while back, too:


https://github.com/citation-style-language/schema/issues/70 refers to
the input format as well.
(and, somewhat related,
https://github.com/citation-style-language/schema/issues/6 )

I’m still working on the rewrite of citeproc-ruby; I’ve made good progress over the summer and I’m currently able to test individual rendering elements in isolation. Once I move on to the point where I can work with integration- and acceptance-type tests, I intend to continue the work in the csl-test-suite package. The plan, right now, is to use Cucumber, because I think it offers a number of features that we need, but to make it a priority for the tests to be convertible to the current JSON format so that other implementations can use the tests without having to make any changes.

Sounds great.

Rintze