linking the DOI variable

I received this bug report/feature request, about the possibility of
linking DOI data with a note on how to do it:

http://code.google.com/p/citeproc-hs/issues/detail?id=28

I wonder how citeproc-js and other implementations deal with this
problem, so that citeproc-hs can adopt a similar approach.

Thanks,
Andrea

citeproc currently doesn’t create any links, not for DOIs, not even for URLs.

sorry, that’s citeproc-js

This is the best approach, and has another benefit:

http://inkdroid.org/journal/2011/04/25/dois-as-linked-data/

Bruce

Existing implementations generally just treat the output as dumb text.
I’ve mentioned before that it would be nice to evolve them to allow
structured data to get extracted from them. So a first step might be
(optionally?) linking DOIs (which as I hint in the other reply, can
pull in RDF data). Might also be nice to optionally link URLs, and if
they aren’t printed, titles.

Embedded RDFa and/or microdata would seem a longer term prospect.

Bruce

Put differently, they’re optimized for Word docs and print media; not
the (linked) web.

Bruce

Maybe you could have an extra style attribute value, in addition to italics, bold…, called ‘link’. This way you could make anything into a link. Implementations could decide to output it as an actual link or not.

Including options for linking URLs and DOIs would be nice. Ideally
they would be toggled by the respective CSL implementation.

If we went this way, we’d probably want something like:

<text link="true" macro="doi-url" prefix="doi: "/>

Bruce, why would you specify the prefix, rather than just denoting
that the node is a link and then leaving it up to the input
to provide the full URI?

Would one want to be able to have an implementation that could auto
render the core identifiers: PMID, ARXIV, ISSN, DOI, in which case
where would the sanitisation of the input be left to the implementations?

  • Ian

In the example I posted, which is NOT very well thought out, I was
assuming the full URI would be constructed in the “doi-uri” macro. So
in this example you’d have “doi:” that is plain text, and the content
that follows would be a link.

The problem I didn’t think through is how to distinguish rendered
content from link.

Bruce

Live links to DOI (and URL) are definitely a good thing, and I agree
that it’s time to get it into place. Should we settle guidelines on
how it should be controlled? That is, in HTML output …

(a) Should links always be provided? If not, then …

(b) Should processors be given separate output modes (HTML-with-links,
HTML-without-links). Alternatively …

(c) Should processors be given a toggle to turn links on and off?
Alternatively …

(d) Should there be a CSL attribute that controls whether links are provided?

It seems like (a) plus (c) would be simplest, and flexible enough to
cover needs. I think.

Frank
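As an illustration of a processor-level toggle for links, here is a minimal Python sketch. The function name render_link and its links_enabled parameter are hypothetical and are not taken from any existing processor; the sketch only shows the same text being emitted either as an HTML link or as plain text.

```python
from html import escape


def render_link(text: str, href: str, links_enabled: bool = True) -> str:
    """Render text as an HTML link, or as plain escaped text when links are off.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if not links_enabled:
        # Toggle off: emit only the visible text, escaped for HTML.
        return escape(text)
    # Toggle on: wrap the same visible text in an anchor element.
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


label = "doi:10.1111/j.1567-1364.2011.00787.x"
target = "http://dx.doi.org/10.1111/j.1567-1364.2011.00787.x"
print(render_link(label, target))
print(render_link(label, target, links_enabled=False))
```

With this shape the visible text is identical in both modes, so a style would not need to know whether the calling application has links switched on.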

I’m strongly in favor of c). This is something a user should be able
to toggle for any given style.
Also, why limit this to HTML output - any reason that can’t be done in RTF?

Except these aren’t all mutually exclusive. I tend to think C + D is
probably ideal, in part because I’m not sure how we deal with
different kinds of base URIs.

Perhaps we ought to collect use cases to figure out whether we really need D?

The one that prompted this discussion is DOIs, where a link URI gets
constructed out of the id, using a base URI dx.doi.org.

Bruce
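A rough Python sketch of constructing a link URI from a bare DOI and a base URI follows. The function name doi_to_uri is hypothetical, the base URI is the dx.doi.org resolver mentioned above, and percent-encoding the identifier is an assumption about what a careful implementation would want, not something any processor is known to do.

```python
from urllib.parse import quote

# The resolver base URI mentioned in the thread.
DOI_BASE_URI = "http://dx.doi.org/"


def doi_to_uri(doi: str, base_uri: str = DOI_BASE_URI) -> str:
    """Build a resolver link from a bare DOI (hypothetical helper)."""
    # Percent-encode characters that are unsafe in a URL path. The "/" is
    # kept as-is because it separates the DOI prefix from the suffix.
    return base_uri + quote(doi, safe="/")


print(doi_to_uri("10.1111/j.1567-1364.2011.00787.x"))
```

Taking the base URI as a parameter leaves room for other identifier types that resolve against a different base.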

Particularly for DOIs, it might be desirable to normalize the output. E.g.
it would be nice if we could render the doi field values
“http://dx.doi.org/10.1111/j.1567-1364.2011.00787.x”,
“doi:10.1111/j.1567-1364.2011.00787.x” and
“10.1111/j.1567-1364.2011.00787.x” all as
“doi:10.1111/j.1567-1364.2011.00787.x” (link title) with an href value of
“http://dx.doi.org/10.1111/j.1567-1364.2011.00787.x”.

In this case, we might want to make this parsing explicit in CSL. One could
even imagine creating a new rendering element, cs:link, for that purpose.

Rintze

This doesn’t seem like the processor’s job. All but the last one are bad
data. If anything, the client should normalize the values before passing
them through.

You’re right. Sorry for the noise. I guess I’m just used to seeing bad
metadata.

Rintze

“http://dx.doi.org/10.1111/j.1567-1364.2011.00787.x” is a valid URI;
however, it seems a valid DOI can be represented as
10.1006/rwei.1999".0001 or as doi:10.1006/rwei.1999".0001
(http://www.doi.org/handbook_2000/appendix_1.html#A1-C) and is often
represented as doi:10.1006/rwei.1999".0001 or as
doi:10.1006/rwei.1999".0001.

It seems clear to me that all three prefixes “doi:”,
“http://dx.doi.org/”, and “” are going to be submitted in the wild. If
it’s possible to formulate a solution that can handle all three inputs
gracefully that would be optimal.

  • Ian

I don’t know what you’re referring to in that handbook link. DOIs are
just the “10.” part. The “doi:” prefix and the resolver URL prefix
aren’t part of the DOI, and there’s no reason for those to be stored in
DOI fields in clients or passed around.

If Frank wants to add in parsing of those things, that’s up to him, but
that would make about as much sense as parsing “ISBN: 123456789” or a
"http://www.worldcat.org/isbn/[ISBN]" URL. It’s up to clients to pass
the processor proper data.

Agreed. Even though we have a custom processor that’s directly compiled with the app, it is indeed the way things are split on our end: the processor code just takes whatever data is handed to it at face value, with no extra processing. It’s up to my Papers-to-CSL translation layer to return a proper value upon request of a CSL variable.

It makes much more sense to do any parsing at the client level, since the data is also displayed to the user, and might as well be cleaned up for that purpose as well.

charles
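For clients that do want to clean up DOI fields before handing them to the processor, a minimal Python sketch of that step might look like the following. The function name normalize_doi and the regular expression are assumptions made for illustration; they cover only the three input forms mentioned in this thread (resolver URL, "doi:" prefix, bare DOI).

```python
import re

# Prefixes discussed in the thread, matched case-insensitively.
# Anything else is left untouched.
_DOI_PREFIX = re.compile(r"^\s*(?:https?://dx\.doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI from any of the three input forms (hypothetical helper)."""
    return _DOI_PREFIX.sub("", raw).strip()


for value in (
    "http://dx.doi.org/10.1111/j.1567-1364.2011.00787.x",
    "doi:10.1111/j.1567-1364.2011.00787.x",
    "10.1111/j.1567-1364.2011.00787.x",
):
    print(normalize_doi(value))
```

All three inputs reduce to the same bare value, so the processor only ever sees the "10." form and the same cleaned value can be shown to the user.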