new blog post on citationstyles.org editor/repo vision

Fleshing out some previously discussed ideas:

http://community.muohio.edu/blogs/darcusb/archives/2010/08/19/vision-for-citationstylesorg

Bruce

Good post Bruce.

As mentioned we are transitioning to CSL v1.0 for our next release.

We have decided that shipping hundreds of styles with our client is a
waste for most users. We are moving to the following model:

  • ship with the top x number of styles (10, 15, or 20)
  • provide an interface to allow users to search a catalog for extra
    styles, and download them when needed. I believe that we will be
    driving suggestions on search results based on styles that get most
    usage. We have a lot of internal stats on this, but it would be great
    to get stats on usage from the wider community too.
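
The idea of driving search suggestions from usage can be sketched roughly as below. This is only an illustration, not Mendeley’s actual catalog code: the `StyleEntry` fields, the `search_catalog` function, and the usage numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class StyleEntry:
    """One catalog entry; the field names are illustrative only."""
    style_id: str
    title: str
    usage_count: int  # e.g. how many users have activated the style


def search_catalog(catalog, query, limit=10):
    """Return entries whose title contains the query, most used first."""
    needle = query.lower()
    matches = [entry for entry in catalog if needle in entry.title.lower()]
    # Sort by usage, descending; break ties alphabetically by title.
    matches.sort(key=lambda entry: (-entry.usage_count, entry.title))
    return matches[:limit]


# Invented sample data, purely for demonstration.
catalog = [
    StyleEntry("nature-genetics", "Nature Genetics", 310),
    StyleEntry("nature", "Nature", 5400),
    StyleEntry("apa", "American Psychological Association", 9800),
]
results = search_catalog(catalog, "nature")
```

Usage counts reported by other clients could feed the same `usage_count` field, which is where stats from the wider community would come in.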

We have built an internal catalog of styles to support this.

Regarding the style editor, from the Mendeley end we run up against
exactly those users who fall into both of the categories that you
describe as people who want to create styles, so at some point we are
going to host a version of the editor in our client. New styles are
going to be pushed back into our catalog.

I want us to be able to sync our internal catalog with a publicly
hosted catalog that exists on citationstyles.org. In that way I want
to be able to enable any good styles that get created within the
Mendeley ecosystem to get pushed out to the larger community.

You mention in your blog post that you don’t have much experience in
hosting these kinds of things. What do we need to do to make this
happen?

I’ll throw out a few options, to get the conversation started:

  • Build a citationstyles.org repository on top of a hosting service
    like Google App Engine (free and moderately scalable, but restricted
    in the kind of technology available)

  • Host it on Appengine, and look to support it by setting up a CSL
    foundation with subscriptions from partners (cheapish, but not free,
    and more open with the kind of tools that could be used)

  • Have it hosted by some university server (I don’t like this idea,
    it’s not scalable in the very long term)

  • Have a 3rd party like Mendeley host it (to be honest, I don’t like
    this idea at all, as I believe that it needs to be in some sense an
    independent group that hosts the repository)

  • Create a peer to peer protocol for discovery and sharing of
    citations, and just allow them all to live on some Bit-Torrent like
    system (again this is a bad idea)

Other ideas?

We can support the creation of such a repository by contributing
developer time.

Ian

Hi Ian,

Good post Bruce.

As mentioned we are transitioning to CSL v1.0 for our next release.

We have decided that shipping hundreds of styles with our client is a
waste for most users. We are moving to the following model:

  • ship with the top x number of styles (10, 15, or 20)
  • provide an interface to allow users to search a catalog for extra
    styles, and download them when needed.

A suggestion about a subtle, but important, detail: don’t use the verb
"download." Something like “activate” might be more appropriate. The
idea is that these styles get updated as they change. So users
shouldn’t be concerned where the style is (“on the net” or
"downloaded").

I believe that we will be
driving suggestions on search results based on styles that get most
usage. We have a lot of internal stats on this, but it would be great
to get stats on usage from the wider community too.

We have built an internal catalog of styles to support this.

Regarding the style editor, from the Mendeley end we run up against
exactly those users who fall into both of the categories that you
describe as people who want to create styles, so at some point we are
going to host a version of the editor in our client. New styles are
going to be pushed back into our catalog.

OK. It’d be great if you could start with the web-hosted-only approach
and see how that goes. Users may at some point like the idea of being
a part of a larger project.

I want us to be able to sync our internal catalog with a publicly
hosted catalog that exists on citationstyles.org. In that way I want
to be able to enable any good styles that get created within the
Mendeley ecosystem to get pushed out to the larger community.

You mention in your blog post that you don’t have much experience in
hosting these kinds of things. What do we need to do to make this
happen?

I also meant development. I don’t do PHP, and am not much of a
coder anyway.

I’ll throw out a few options, to get the conversation started:

  • Build a citationstyles.org repository on top of a hosting service
    like Google App Engine (free and moderately scalable, but restricted
    in the kind of technology available)

  • Host it on Appengine, and look to support it by setting up a CSL
    foundation with subscriptions from partners (cheapish, but not free,
    and more open with the kind of tools that could be used)

  • Have it hosted by some university server (I don’t like this idea,
    it’s not scalable in the very long term)

  • Have a 3rd party like Mendeley host it (to be honest, I don’t like
    this idea at all, as I believe that it needs to be in some sense an
    independent group that hosts the repository)

  • Create a peer to peer protocol for discovery and sharing of
    citations, and just allow them all to live on some Bit-Torrent like
    system (again this is a bad idea)

Other ideas?

I like your thinking. A few thoughts, and pieces of information …

First, the citationstyles.org site is currently hosted at CNMH, and so
is hosted by the same people that run zotero.org. This is effectively
your third option, except that the CNMH obviously has experience with
scaling (which also relies on Amazon storage, BTW).

Second, I have thought about the foundation idea as well, where
commercial partners pay some modest fee, but then effectively
advertise through the site. I guess since you came upon it
independently as well, it might actually be a good idea :)

But I worry a bit about the amount of (legal) work that a foundation
would take to set up. Would people here consider trying to work this
into an already-established foundation, like the Corporation for
Digital Scholarship that was recently set up to deal with projects at
CNMH? I’m not sure how possible this is, but just thought I’d float
the idea to see if it’s worth further discussion.

The P2P idea makes a lot of sense, and has been something I’ve thought
about from the beginning (well, really more modestly, of being able to
"subscribe" to different collections, at different repositories,
perhaps making use of Atom). Is that perhaps too ambitious to do as a
first cut? I don’t know. My main focus is enabling use cases like I
mention in that post.

We can support the creation of such a repository by contributing
developer time.

Awesome; that’s what I want to hear ;)

Two things I’m wondering about:

  • Vandalism: how can we prevent users, either maliciously or
    accidentally, from breaking styles? This would range from people trying to spam the
    system to people misunderstanding style guides and introducing incorrect
    changes.
  • Different versions of CSL: I don’t think we can get around the need to
    host different CSL versions of styles (v. 0.8, v 1.0, etc) to support
    applications that use CSL but haven’t adopted the most recent CSL version.
    To a certain extent, it might be possible to automate the up- and
    downgrading of styles using XSLT, but that is probably unlikely to cover all
    cases.

Rintze

The P2P idea makes a lot of sense, and has been something I’ve thought
about from the beginning (well, really more modestly, of being able to
"subscribe" to different collections, at different repositories,
perhaps make use of Atom). Is that perhaps too ambitious to do as a
first cut? I don’t know. My main focus is enabling use cases like I
mention in that post.

Two things I’m wondering about:

  • Vandalism: how can we prevent users, either maliciously or
    accidentally, from breaking styles? This would range from people trying to spam the
    system to people misunderstanding style guides and introducing incorrect
    changes.

If we assume that there are always going to be more users than
versions of styles, then providing social information about the usage
of a version of a style should limit the propagation of deficient
styles. At the point of contact, where users choose which version of a
style to work with, they should be given information on the usage
patterns of the styles they are looking at.

Endpoints that consume styles and that know how many of which version
are used should be encouraged to report this information.

That information could be intercepted in a man-in-the-middle attack.
However, I don’t see citation styles as being a high-value vector for
malicious attackers, so we mainly need to design against entropy
rather than evil.

In truth, most people will probably receive a style recommendation
from either their peers or tutors.

There is no way to ensure that no deficient styles are created.

  • Different versions of CSL: I don’t think we can get around the need to
    host different CSL versions of styles (v. 0.8, v 1.0, etc) to support
    applications that use CSL but haven’t adopted the most recent CSL version.
    To a certain extent, it might be possible to automate the up- and
    downgrading of styles using XSLT, but that is probably unlikely to cover all
    cases.

Surely the provider can also provide information to the user about
which version of a style they are browsing?

The P2P idea makes a lot of sense, and has been something I’ve thought
about from the beginning (well, really more modestly, of being able to
"subscribe" to different collections, at different repositories,
perhaps make use of Atom). Is that perhaps too ambitious to do as a
first cut? I don’t know. My main focus is enabling use cases like I
mention in that post.

Two things I’m wondering about:

  • Vandalism: how can we prevent users, either maliciously or
    accidentally, from breaking styles? This would range from people trying to spam the
    system to people misunderstanding style guides and introducing incorrect
    changes.

I’ve suggested previously that we introduce a policy that only
"editors" should have style editing rights.

Editorship could be defined in one of two ways:

  1. the person creating the style gets its editorship (but this leaves
    room for vandalism of sorts)

  2. an admin user can define them

  • Different versions of CSL: I don’t think we can get around the need to
    host different CSL versions of styles (v. 0.8, v 1.0, etc) to support
    applications that use CSL but haven’t adopted the most recent CSL version.
    To a certain extent, it might be possible to automate the up- and
    downgrading of styles using XSLT, but that is probably unlikely to cover all
    cases.

Depends on the timeline, doesn’t it? Why can’t we say the repo is 1.0
only once Mendeley and Zotero go live with citeproc-js? It’s an
incentive for users to upgrade, and it makes life MUCH easier for us
going forward.

Bruce

And perhaps style editors can in turn appoint other editors (though
only for that style)?

So there are global users and admins. Each style has one-or-more editors.
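
A minimal sketch of that permission model, assuming the rules discussed so far: admins may edit anything, the creator of a style becomes its first editor, and editors may appoint further editors for their own style only. The `StyleRepository` class, its method names, and the user names are hypothetical.

```python
class StyleRepository:
    """Toy permission model: global admins plus per-style editors."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.editors = {}  # style name -> set of usernames allowed to edit it

    def create_style(self, user, style):
        """Anyone may create a new style and becomes its first editor."""
        if style in self.editors:
            raise ValueError(f"style {style!r} already exists")
        self.editors[style] = {user}

    def can_edit(self, user, style):
        """Admins can edit everything; editors only their own styles."""
        return user in self.admins or user in self.editors.get(style, set())

    def appoint_editor(self, appointer, new_editor, style):
        """Let an admin or an existing editor add an editor for one style."""
        if style not in self.editors:
            raise KeyError(f"unknown style {style!r}")
        if not self.can_edit(appointer, style):
            raise PermissionError(f"{appointer!r} cannot appoint editors for {style!r}")
        self.editors[style].add(new_editor)


repo = StyleRepository(admins={"admin"})
repo.create_style("alice", "biochemistry")
repo.appoint_editor("alice", "bob", "biochemistry")
```

Here `bob` can now edit `biochemistry` but nothing else, and a user who is neither an admin nor an editor of a style cannot appoint anyone for it.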

Bruce

Vandalism: how can we prevent users, either maliciously or
accidentally, from breaking styles?

As with a wiki, you can log all changes or additions and provide an
easy way to revert them. In general I would favor systems as
permissive as possible to make it easy for people to get involved.

Regards,
Rob.

So would it be fine in your view to allow anyone to create a new style
or comment on an existing one, but to control editing rights on
already existing styles? Or are you suggesting leaving it wide open?

Bruce

Or are you suggesting leaving it wide open?

I would be tempted to leave it wide open initially: anyone can
comment, anyone can edit any style, and we only introduce more complex
permissions if problems are encountered. Log all recent changes
somewhere prominent and provide an easy way to revert undesired
changes.

I think there is a good argument in many situations to leave the
social policy to the users to manage themselves
rather than setting hard restrictions in the code.
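
The wiki-style approach, where every change is logged and can easily be reverted, could look roughly like this. It is a sketch under assumptions, not a proposal for the real storage layer: `Revision`, `StyleHistory`, and the sample users are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    content: str
    comment: str = ""


@dataclass
class StyleHistory:
    """Append-only log of every saved version of one style."""
    revisions: list = field(default_factory=list)

    def edit(self, author, content, comment=""):
        self.revisions.append(Revision(author, content, comment))

    @property
    def current(self):
        """Content of the most recent revision."""
        return self.revisions[-1].content

    def revert(self, author, to_index):
        """Restore an earlier revision by appending a copy of it.

        The log itself is never rewritten, so the bad edit stays visible.
        """
        old = self.revisions[to_index]
        self.edit(author, old.content, f"revert to revision {to_index}")


history = StyleHistory()
history.edit("alice", "<style>good</style>", "initial version")
history.edit("mallory", "<style>broken", "oops")
history.revert("alice", 0)
```

Because a revert is just another logged edit, a list of recent changes shows both the bad edit and its correction.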

Regards,
Rob.

I agree with Rob.

As long as each edited version can be associated with its authorship
chain and is versioned, and we can recommend the popular versions,
then it should be OK.

A style that is corrupted won’t be used by many people.

Ian

But what happens if someone corrupts an existing style? Wouldn’t that cause
problems if style updates are automatically pushed to users?

Rintze

On Fri, Aug 20, 2010 at 12:28 PM, Ian Mulvany <@Ian_Mulvany> wrote:

Yes, I have the same question. It does happen that people get
suggested changes wrong.

In any case, I’d suggest that the design of such a system shouldn’t
assume one or the other model, but allow it to be configured for
either. By definition that means distinguishing users and editors I
would think.

Bruce

Especially in the case of major styles that many of the dependent
styles point to, clobbering the data even for a short time could cause
problems for many people. An ideal system in my view would follow
something like the git model, in which citationstyles.org has a
repository that it considers authoritative, and most people will want
to use the most recent stable versions in that repo. It also allows
extremely easy cloning or branching and allows for pull or merge
requests when a user thinks they have something other people would
find useful, but doesn’t allow them to automatically update an
important resource. There is the possibility of actually building it
on a git repository, but I haven’t worked with large multi-user git
projects (or git libraries for various programming languages) enough
to know exactly how easy it would be to implement a pure web interface
so users aren’t required to use git to submit small changes.

Example workflow for such a system:

  1. A user goes to citationstyles.org to find a style he needs. He
    finds one with the name of his journal but notices that it doesn’t
    quite do what the journal wants.

  2. He goes ahead and clicks the edit button.

  3. It turns out the style is dependent on APA but should in fact be a
    slightly modified APA.

  4. The user fixes the style to do what he needs and saves it for his
    own use.

  5. He figures the corrected style would be useful to others, so he
    submits the edited version for inclusion in the repository along
    with a message explaining the changes. It is saved somewhere but
    not automatically pushed onto APA or the journal style.

  6. A maintainer gets notified that there is a pull request and
    examines the changes, decides whether it should be applied at all,
    and if so whether the problem was with the APA style or with the
    journal style, which should not be a strict copy of APA.
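
The workflow above can be sketched as a small review queue. This only models the bookkeeping, with no git involved, and `Submission`, `AuthoritativeRepo`, and the style names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    style: str
    author: str
    content: str
    message: str
    status: str = "pending"


@dataclass
class AuthoritativeRepo:
    styles: dict = field(default_factory=dict)  # style name -> content
    queue: list = field(default_factory=list)   # every submission ever made

    def submit(self, style, author, content, message):
        """Store an edited style for review without touching the live copy."""
        submission = Submission(style, author, content, message)
        self.queue.append(submission)
        return submission

    def review(self, submission, accept, target=None):
        """Maintainer decision; `target` lets the fix land on another style."""
        if submission.status != "pending":
            raise ValueError("submission has already been reviewed")
        if accept:
            self.styles[target or submission.style] = submission.content
            submission.status = "accepted"
        else:
            submission.status = "rejected"


repo = AuthoritativeRepo(styles={"apa": "apa rules", "some-journal": "same as apa"})
pending = repo.submit("some-journal", "user", "slightly modified apa",
                      "Journal deviates from APA in its reference list")
live_before_review = repo.styles["some-journal"]  # still the old content
repo.review(pending, accept=True)
```

The important property is that `submit` never changes `styles`; only a maintainer’s `review` does, and the maintainer chooses which style the change applies to.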

So you’re essentially suggesting github for citation styles?

Interesting.

How would you deal with style identifiers (URIs)?

Bruce

So you’re essentially suggesting github for citation styles?

For a single github-hosted project, anyway. And even if it is built on
git, that doesn’t necessarily mean git should be exposed rather than
hidden behind a simpler interface, since edits on styles should be
simple single-file commits. Github is just the most widely known
implementation I’m aware of, and I automatically think of these
problems in git terms. I’m not certain about the internals of git, but
I imagine that git branches would be the answer for letting the
website edit single files: each time a user started editing a style it
would create a new git branch for them, and a maintainer would then
have to merge that in. The cloning would come in for cases like
Mendeley or Zotero hosting a copy under their exclusive control,
periodically pulling from citationstyles.org and making pull requests
to it.

Interesting.

How would you deal with style identifiers (URIs)?

The authoritative repository would mean the links would be stable for
the most up to date version. If someone was hosting a cloned
repository they could also run a simple script to update URIs. For
edited styles that have not yet been accepted to the main repository
(a user editing a single style on the website) I’m not sure if it
would be desirable to have a stable URI for them right away or not.
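
As an illustration of the “simple script to update URIs” for a cloned repository, here is a sketch that rewrites the `<id>` inside a style’s `<info>` block. The CSL namespace is the real one, but the two base URIs and the `rebase_style_id` helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
# Serialize CSL elements without a prefix, as style files normally do.
ET.register_namespace("", CSL_NS)


def rebase_style_id(csl_xml, old_base, new_base):
    """Return the style with its info/id moved from old_base to new_base."""
    root = ET.fromstring(csl_xml)
    id_element = root.find(f"{{{CSL_NS}}}info/{{{CSL_NS}}}id")
    if (id_element is not None and id_element.text
            and id_element.text.startswith(old_base)):
        id_element.text = new_base + id_element.text[len(old_base):]
    return ET.tostring(root, encoding="unicode")


# Hypothetical URIs; the real identifier scheme was still under discussion.
style = (
    '<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">'
    "<info><id>http://citationstyles.org/styles/biochemistry</id></info>"
    "</style>"
)
mirrored = rebase_style_id(style, "http://citationstyles.org/",
                           "http://mirror.example.org/")
```

A real script would probably also need to rewrite the links that dependent styles use to point at their parent style, which this sketch ignores.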

It’s also possible that my vision is getting overly complex, and
something like an svn repo with automatically generated diffs would
solve the problem fine and eliminate some hairiness in implementation.

I really dislike SVN and the whole centralized SCM model myself. So if
it made sense to build this on top of a proper SCM system (and it may
indeed), I would definitely support the notion of building it on top
of git or hg.

Bruce

Also …

On Fri, Aug 20, 2010 at 3:58 PM, fcheslack <@fcheslack> wrote:

So you’re essentially suggesting github for citation styles?

For a single github hosted project anyway. And even if it is built on
git that doesn’t necessarily mean git should be exposed rather than
hiding its complexity, since edits on styles should be simple single
file commits. Github is just the most widely known implementation I’m
aware of, and I automatically think of these problems in git terms.
I’m not certain about the internals of git, but I imagine for the
website allowing edits of single files git branches would be the
answer, so each time a user started editing a style it would create a
new git branch for them, then a maintainer would have to merge that
in.

So is each style its own repo? If not, what is a “single file branch”?

Bruce

So is each style its own repo? If not, what is a “single file branch”?

What I was thinking was the authoritative repo would hold all the
styles, then there might be a dev clone on which a branch would be
created for each edited style. So as a logged in user I edit
biochemistry.csl and a branch named fcheslack_edit_biochemistry.csl is
created on the dev repo and stays there until a maintainer pulls it
into the authoritative repo. It might instead work better to have a
clone for each user when they edit.
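
The branch-per-edit naming could be handled by two small helpers like these. Only the `fcheslack_edit_biochemistry.csl` pattern comes from the description above; the function names and the character whitelist are assumptions, and the whitelist is deliberately stricter than git’s actual ref-name rules.

```python
import re

EDIT_MARKER = "_edit_"


def edit_branch_name(username, style_filename):
    """Build a branch name such as 'fcheslack_edit_biochemistry.csl'."""
    name = f"{username}{EDIT_MARKER}{style_filename}"
    # Conservative whitelist: replace anything outside it with a hyphen.
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


def parse_edit_branch(branch):
    """Split an edit branch into (username, style_filename), or None.

    Splits at the first marker, so usernames containing '_edit_' would
    be parsed incorrectly.
    """
    username, marker, style_filename = branch.partition(EDIT_MARKER)
    if not marker or not username or not style_filename:
        return None
    return username, style_filename


branch = edit_branch_name("fcheslack", "biochemistry.csl")
```

A maintainer tool could list the branches on the dev repo, keep those for which `parse_edit_branch` returns a result, and present them as pending edits per style.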

Again, I’ve never tried using git this way so I’m not sure how
smoothly it would work, but the important part in my mind is allowing
maintainers for the most vital styles and making it easy for users to
fork an existing style into a new one (and ideally also make it as
easy as possible to pull updates of styles and submit improvements).

If you can find time, how about throwing up a simple test/demo repository on
github?

Bruce

On Aug 20, 2010 4:36 PM, “fcheslack” <@fcheslack> wrote:

So is each style its own repo? If not, what is a “single file branch”?

What I was thinking was the authoritative repo would hold all the
styles, then there might be a dev clone on which a branch would be
created for each edited style. So as a logged in user I edit
biochemistry.csl and a branch named fcheslack_edit_biochemistry.csl is
created on the dev repo and stays there until a maintainer pulls it
into the authoritative repo. It might instead work better to have a
clone for each user when they edit.

Again, I’ve never tried using git this way so I’m not sure how
smoothly it would work, but the important part in my mind is allowing
maintainers for the most vital styles and making it easy for users to
fork an existing style into a new one (and ideally also make it as
easy as possible to pull updates of styles and submit improvements).