PKP conference call

Hi,

Last Thursday, Rintze, Frank, and I had a conference call with Alex
Garnett and Juan Pablo Alperin of the Public Knowledge Project
(http://pkp.sfu.ca).
We wanted to explore whether (and if so, how) CSL could find an
institutional host at the PKP and what that would entail. Generally
the conversation was very positive: the PKP folks know CSL and
have actually started using it in one of their projects. They seemed
quite positive about the general prospect of providing a home to CSL.
They don’t have much developer time to offer, but said
that in the short term some advice and time for grant writing would be
possible. They said they would want to be included in some way in the
CSL decision-making process, though more in terms of knowing what’s
going on than to influence decisions (we described the process as
open and consensus-based, which they seemed fine with). As for grants,
they said, as others have, that it’s basically impossible to get
grants to cover day-to-day operations. Grant institutions want to fund
something specific and new, so we’d have to think about that. Rintze
and I came up with three areas on the spot:

  1. Specifications - while the syntax is well specified, all the little
    things that the processors do (or don’t do), like eliminating double
    spaces/punctuation etc., aren’t. They should be.
  2. Legal CSL - incorporating Frank’s modification for legal support
  3. Other CSL 1.1/2.0 developments including field updates, potential
    multilingual improvements etc.

Perhaps the biggest concern in all of this is that Rintze and I don’t
see how this is going to reduce our work (which, after all, was one of
the original reasons we started talking about this).

I’ll send a separate e-mail tomorrow with a brief proposal on the
framework for a PKP-CSL partnership, but wanted to get this out there
for both information and discussion.
Best,
Sebastian

--
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University

On Sat, Apr 20, 2013 at 4:58 PM, Sebastian Karcher <@Sebastian_Karcher> wrote:

> Perhaps the biggest concern in all of this is that Rintze and I don’t
> see how this is going to reduce our work (which, after all, was one of
> the original reasons we started talking about this).

That’s a big one.

I had an idea today, though, that might catch both objectives. Here’s
the pitch. I think it’s as original as casting the CSL editor. See
what you think about the idea, though.

CSL is a carefully designed language. The potential for CSL to become
a de facto standard for defining and automating document referencing
formats has been proven through performance: several implementations
of the language are running in the wild, and user-contributed styles
have brought the CSL Style Repository to 800+ styles covering 4000+
journals. Major projects, including Mendeley, Papers and Zotero rely
upon the language to serve a large user community, many working in
research or at the PhD level.

In the community’s drive to satisfy user needs, the focus has been on
individual styles. This has spread attention across an expanding
codebase, slowing efforts to refine and improve styles across the
archive as a whole.

This challenge can be addressed by drawing upon a latent potential for
modularity in CSL that has not heretofore played a part in style
maintenance and distribution. At the most basic level, CSL cleanly
separates four elements of style design:

  • Citation formats
  • Citation format parameters
  • Bibliography formats
  • Bibliography format parameters

Although each style in the CSL Style Repository is currently stored as
an atomic unit, each is composed of these four elements, and they can
easily be separated and remixed, resulting in a smaller base of code,
higher quality in many styles, and potential for more rapid coverage
of remaining publisher and university styles. There is deeper
potential for modularity in CSL (through a shared macro library).
Implementing this simple modular break-out in the current repository
infrastructure will make it possible to explore those avenues in
future.
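
As a rough illustration of the four-way split described above, here is a minimal sketch. It assumes that a "format" can be approximated by the `<layout>` child of `<citation>` or `<bibliography>`, and that the "parameters" are the attributes on those two elements. This is a simplification: real styles also depend on macros and locale terms, which are ignored here. The sample style and the `split_style` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"

# Hypothetical, heavily trimmed style used only for illustration.
SAMPLE_STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" class="in-text">
  <info>
    <title>Hypothetical Example Style</title>
    <id>http://example.org/styles/hypothetical-example</id>
    <updated>2013-04-20T00:00:00+00:00</updated>
  </info>
  <citation et-al-min="3" et-al-use-first="1">
    <layout prefix="(" suffix=")" delimiter="; ">
      <names variable="author"><name form="short"/></names>
    </layout>
  </citation>
  <bibliography hanging-indent="true">
    <layout suffix=".">
      <text variable="title"/>
    </layout>
  </bibliography>
</style>"""


def split_style(style_xml):
    """Split a style into citation/bibliography formats and parameters.

    Assumption: "parameters" are the attributes on <citation> and
    <bibliography>; "format" is the serialized <layout> child.
    """
    root = ET.fromstring(style_xml)
    parts = {}
    for name in ("citation", "bibliography"):
        element = root.find(f"{{{CSL_NS}}}{name}")
        if element is None:
            continue  # e.g. note styles without a bibliography
        layout = element.find(f"{{{CSL_NS}}}layout")
        parts[name] = {
            "parameters": dict(element.attrib),
            "format": (
                ET.tostring(layout, encoding="unicode")
                if layout is not None
                else ""
            ),
        }
    return parts


parts = split_style(SAMPLE_STYLE)
print(parts["citation"]["parameters"])
print(parts["bibliography"]["parameters"])
```

Storing the two halves separately is what would let a citation format from one style be paired with a bibliography format from another.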

Moving to a modular archive design would require the following:

  • Style-level test suites to confirm current style behaviour;
  • Tools for breaking out the current code base:
    • Separating current styles into citation-format and
      bibliography-format elements for separate validation;
    • Extracting and storing bibliography and citation format IDs and
      parameters on a per-style basis.
  • Tools for exploring commonalities between citation and
    bibliography formats, and merging IDs;
  • A middle layer for recombining styles from modular code and
    testing the result.
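
One possible shape for the "exploring commonalities" tooling, sketched under assumptions: each format is fingerprinted by hashing the canonical XML of its `<layout>` element, so that styles differing only in whitespace or attribute order land in the same group. The function names are hypothetical, macros are again ignored, and `ET.canonicalize` needs Python 3.8 or later. A real tool would probably need fuzzier matching than exact hashes.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

CSL_NS = "http://purl.org/net/xbiblio/csl"


def format_fingerprint(style_xml, part):
    """Hash the canonical XML of the <layout> under `part`.

    `part` is "citation" or "bibliography". Returns None when the style
    has no such element.
    """
    root = ET.fromstring(style_xml)
    layout = root.find(f"{{{CSL_NS}}}{part}/{{{CSL_NS}}}layout")
    if layout is None:
        return None
    layout.tail = None  # drop whitespace that follows the element
    canonical = ET.canonicalize(
        ET.tostring(layout, encoding="unicode"),
        strip_text=True,        # ignore indentation differences
        rewrite_prefixes=True,  # ignore namespace prefix choices
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_format(styles, part):
    """Return lists of style IDs whose `part` layouts are identical.

    `styles` maps a style ID to its XML source. Only groups with more
    than one member are returned, as candidates for merging.
    """
    groups = defaultdict(list)
    for style_id, style_xml in styles.items():
        fingerprint = format_fingerprint(style_xml, part)
        if fingerprint is not None:
            groups[fingerprint].append(style_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Each returned group is a set of styles that could share one stored format ID in the modular back-end.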

For simplicity, this back-office functionality should be masked from
users and style designers, who understand CSL styles (either when
using the CSL editor, or when directly editing style XML) as
integrated units. Accordingly, archive modularisation should be
accompanied by a maintenance layer performing three functions:

  • Automated pre-flight checks for schema validity and correct and
    complete style metadata;
  • Arbitration with the modular repo back-end, with heuristic
    identification and merger of citation and bibliography formats; and
  • User-facing and maintainer-facing UI to drive these facilities.

Frank

First, thanks to those of you that attended the call. I’ve kind of been
slammed this term.

Second, can we step back a bit and talk high-level vision? So, for example

Frank, from a user perspective, what sorts of scenarios would your proposal
enable?

Make it much easier for style editors to manage style additions and
changes? So much so that it would open up style editing to a much wider
range of users?

Something else?

Bruce

On Sat, Apr 20, 2013 at 6:07 AM, Frank Bennett <@Frank_Bennett> wrote:

> Frank, from a user perspective, what sorts of scenarios would your proposal
> enable?

Pre-flight vetting of submissions could be automated at the first
stage. The idea would be to provide something similar to Amazon
CreateSpace (but for freely distributed styles, of course):

https://www.createspace.com/Help/Index.jsp?cid=02n70000000DfLw&orgId=00D300000001Sh9

(Scroll down to the link “What are the book setup steps?”)

> Make it much easier for style editors to manage style additions and changes?
> So much so that it would open up style editing to a much wider range of
> users?

The focus is on pruning and curating code in the repository, to reduce
the burden of maintaining what’s there. A more compact code base and
an automated workflow for managing submissions would make it possible
to broaden the circle of maintainers. That seems to be a critical
objective at the moment.

The proposal is just a thought, though, as a possible
research-fundraising-friendly path to automated submission pre-flight.

Hi,

> Pre-flight vetting of submissions could be automated at the first
> stage. The idea would be to provide something similar to Amazon
> CreateSpace (but for freely distributed styles, of course):

Also, to help validate styles, we could have a web page where the
user enters the metadata and the expected output, and then compares
that with the real output.

Use case that I’ll face soon: Elsevier has 6 different styles (as far
as I recall; I don’t have the documentation handy). We (the CSL
project) have these 6 different styles. But I want to make sure that
we generate what Elsevier’s guidelines say.
This could be done on submission of the style.
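
A minimal sketch of the expected-versus-real comparison, assuming some CSL processor is available behind a callable. The `render` parameter and the `toy_render` stand-in are hypothetical; a real page would call an actual processor with the submitted style instead.

```python
import difflib


def compare_output(render, items, expected_lines):
    """Return a unified diff between expected and rendered lines.

    `render` is a hypothetical callable (item dict -> formatted string)
    standing in for a real CSL processor. An empty list means the
    rendered output matches what the user expected.
    """
    actual_lines = [render(item) for item in items]
    return list(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def toy_render(item):
    # Toy stand-in for a processor, only so the sketch runs on its own.
    return f"{item['author']} ({item['issued']}). {item['title']}."


items = [{"author": "Doe, J.", "issued": 2013, "title": "A title"}]
for line in compare_output(toy_render, items, ["Doe, J., 2013. A title."]):
    print(line)
```

Run at submission time, a non-empty diff could be shown to the submitter next to the publisher's guideline example.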

I’m always worried about the extra work this may add for the users
that submit styles. But indeed, we need some way to make
maintenance easier.

> Pre-flight vetting of submissions could be automated at the first
> stage. The idea would be to provide something similar to Amazon
> CreateSpace (but for freely distributed styles, of course):
>
> https://www.createspace.com/Help/Index.jsp?cid=02n70000000DfLw&orgId=00D300000001Sh9
>
> (Scroll down to the link “What are the book setup steps?”)

For whatever reason, I can’t get this page to work.

>> Make it much easier for style editors to manage style additions and changes?
>> So much so that it would open up style editing to a much wider range of
>> users?
>
> The focus is on pruning and curating code in the repository, to reduce
> the burden of maintaining what’s there. A more compact code base and
> an automated workflow for managing submissions would make it possible
> to broaden the circle of maintainers. That seems to be a critical
> objective at the moment.

That’s right. I’m just asking us at this point to be very explicit
about the subjects (i.e. who) we are implicitly talking about.

I think we basically need to build the technical and other
foundations, step-by-step, so that Rintze and Sebastian can step away
from this work and a) the quality of the styles remains very high, and
b) the number of styles continues to grow.

So we have a few different user roles:

  1. “style editors” (people who, for whatever reason, take on a role of
    responsibility for maintaining the evolution of a particular style for
    a wider community of end users)
  2. “style users” (aka “authors”? people who write academic
    manuscripts, using different software, and simply want the bib and
    citation formatting to “just work”)
  3. “developers” (people who might take the product of different
    projects and piece them together into something else)?

> The proposal is just a thought, though, as a possible
> research-fundraising-friendly path to automated submission pre-flight.

I think it sounds good; I just didn’t entirely understand it :-)

It also occurs to me that there are different paths forward, which are
not mutually exclusive:

  1. something like Google Summer of Code, which Sylvester mentioned and
    has participated in; seems good for pretty focused, practical projects
    (or subprojects)

  2. Foundational grants like those that have funded things like Zotero
    and the CSL editor.

My point about “institutional home” probably applies more to #2, and
is really just thinking about whose names go on the application, and
how it gets managed. Sort of how Mendeley paired with Columbia.

I, for example, could do it through my institution, but it’s a bit
awkward given my position within my institution (I’m expected to
get research grants for work that has nothing to do with technology).

Bruce

I strongly support something like this. It can be a well-defined
project, suitable for a grant, and can be implemented without too much
work.

The current (preferred) workflow for users to contribute styles is this:

  1. the user edits a style, either by hand or via the Visual Editor
  2. wanting to contribute the style, the user navigates to the styles
    repository and finds
    https://github.com/citation-style-language/styles/blob/master/CONTRIBUTING.md
  3. based on the “contributing” instructions, the user makes sure the
    style is valid CSL
    (https://github.com/citation-style-language/styles/wiki/Validation)
  4. based on the “contributing” instructions, the user makes sure the
    style follows the additional requirements we have for repository
    styles (https://github.com/citation-style-language/styles/wiki/Style-Requirements)
  5. the user creates a pull request

I tried to make our instructions as clear as possible, but while many
users manage to create pull requests, a significant fraction of those
have problems. Many pull requests have an incorrect file name (if a
style lacks a .csl extension, Travis-CI currently doesn’t recognize
it), are invalid CSL (or even invalid XML), or don’t follow our
guidelines for the style metadata. I’d rather not accept pull requests
that fail in Travis. Instead, we ask users to fix up their pull requests, which
often requires detailed instructions (see e.g.


).

I think a pre-flight tool would help with steps 3 and 4. The tool
could be standalone, or bolted onto the Visual Editor. I would still
like to have users create GitHub pull requests themselves for now.
Having a user register with GitHub and create a pull request gives us
an easy way to publicly communicate with the user. But what the tool
could do is:

  1. allow the user to copy/paste or upload a CSL style, or import it
    directly from the Visual Editor
  2. allow the user to validate the style (e.g. by incorporating
    http://simonster.github.io/csl-validator.js/ )
  3. assist the user with completing style metadata via a wizard-like
    interface. E.g. we could ask the user questions, and generate the
    required metadata from the responses (“Is this a style for a
    journal?”, “Does the journal publish in a single language?”, “Can you
    find the print and online ISSNs of the journal?”, “Is this citation
    style described online?”, etc.)
  4. pretty-print the style
  5. after completing the style, give instructions on how to submit the
    style via pull request

The tool could also cover the creation of dependent styles, in which
case steps 1 and 2 would be skipped.
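
To make the pre-flight idea concrete, here is a minimal sketch of the kind of checks such a tool might run before a pull request is created. It only covers the file extension, XML well-formedness, and the presence of a few `<info>` children. The `REQUIRED_INFO` tuple is an illustrative assumption, not the repository's actual requirements, and this is not a substitute for validating against the CSL schema.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"

# Illustrative subset only; the repository's style requirements ask for more.
REQUIRED_INFO = ("title", "id", "updated")


def preflight(filename, style_xml):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if not filename.endswith(".csl"):
        problems.append("file name must end in .csl")
    try:
        root = ET.fromstring(style_xml)
    except ET.ParseError as error:
        problems.append(f"not well-formed XML: {error}")
        return problems
    if root.tag != f"{{{CSL_NS}}}style":
        problems.append("root element is not a CSL <style>")
        return problems
    info = root.find(f"{{{CSL_NS}}}info")
    if info is None:
        problems.append("missing <info> block")
        return problems
    for name in REQUIRED_INFO:
        child = info.find(f"{{{CSL_NS}}}{name}")
        if child is None or not (child.text or "").strip():
            problems.append(f"<info> is missing <{name}>")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a wizard show the user everything that needs fixing in one pass.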

Rintze

> I agree - should I write that as our main goal into the write-up for PKP?

For the short-term, sure.

One more thing: we also need a good way to help users figure out
whether they need to create a new style, or whether a dependent would
suffice. The Visual Editor allows users to search for the desired
style output, but for journals published by publishers that use the
same citation style for multiple journals (especially those for which
we have bulk metadata, see
https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles),
we could also simply provide a few pointers, e.g.:

  1. Do you wish to create a new style for a journal?
  2. If so, is the journal published by any of the following
    publishers?: Elsevier, Springer, BioMed Central, etc.
  3. If e.g. Elsevier, does the style use “In:” or “in:” for cited book
    chapters? If “In:”, we need a dependent of “elsevier-harvard2”; if
    “in:”, a dependent of “elsevier-harvard”.

Of course, we should figure out how this ties into maintenance of our
bulk metadata set.
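
The pointers above amount to a small decision table. A minimal sketch, assuming the table is keyed on publisher and on the answer to the distinguishing question: only the two Elsevier entries come from this thread, and the `PARENT_RULES` and `suggest_parent` names are hypothetical. In practice the table would presumably be generated from the bulk metadata rather than written by hand.

```python
# (publisher, prefix used for cited book chapters) -> parent style name.
# Only these two entries are taken from the discussion above.
PARENT_RULES = {
    ("Elsevier", "In:"): "elsevier-harvard2",
    ("Elsevier", "in:"): "elsevier-harvard",
}


def suggest_parent(publisher, chapter_prefix):
    """Return the parent style for a dependent, or None if unknown.

    None means the pointers do not settle the question, so the user may
    need a new independent style or a search by desired output.
    """
    return PARENT_RULES.get((publisher, chapter_prefix))


print(suggest_parent("Elsevier", "In:"))
print(suggest_parent("Springer", "In:"))
```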

Rintze

Apologies, this has taken me forever.
Here’s a first draft for a PKP letter; comments are enabled. I won’t be
back on a computer before Saturday night/Sunday morning.
Sebastian

On Mon, Apr 22, 2013 at 3:14 PM, Rintze Zelle <@Rintze_Zelle> wrote:

ugh here’s the letter: