How to encode and process "ahead-of-print"

We want to output citations that have an indication of an article’s status, when it comes out electronically, ahead of the print version. For example, in the AMA style, the article http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4365985/ would be cited as:

    Brown CM, Reilly A, Cole RW. A Quantitative Measure of Field Illumination [published online ahead of print March 19, 2015]. Journal of Biomolecular Techniques : JBT. July 2015:jbt.15-2602-001. doi:10.7171/jbt.15-2602-001.

The “[published online …]” bit is inserted into the title macro, when the article has the given status.

In citeproc-json, I’m thinking this could be encoded with:

status: "ahead-of-print",

But in CSL, I don’t know how to test for the particular value of a variable. None of the tests described here, http://docs.citationstyles.org/en/stable/specification.html#choose, do that. The closest I have come is to test whether or not the status variable exists:

<choose>
  <if variable='status'>
    <group prefix=' [' suffix=']'>
      <text value='published online ahead of print '/>
      <date variable="original-date" form="text"/>
    </group>
  </if>
</choose>

This will work in our system, since we don’t use status for anything else (yet), but I don’t think it would work in general.

Is there a way to do this?

Thanks!

No, and we won’t allow free-text matching on the value of a variable in CSL.
The current solution for most styles is to simply test for the presence of
a volume number or page range for journal articles, and in the absence of
both treat articles as published ahead of print, potentially testing for a
DOI to distinguish them from forthcoming works. That’s the easiest solution
and has the advantage that it works already.
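
For readers who want to see that heuristic spelled out, here is a minimal sketch in JavaScript, applied to a CSL-JSON-like item. It only illustrates the logic; it is not how any CSL processor implements it, and the function name `publicationStage` and its return labels are made up for this example.

```javascript
// Sketch of the "no volume, no page range" heuristic for journal articles.
// publicationStage and its return labels are hypothetical names.
function publicationStage(item) {
  if (item.type !== "article-journal") return "not-applicable";
  const hasVolume = item.volume !== undefined && item.volume !== "";
  const hasPage = item.page !== undefined && item.page !== "";
  if (hasVolume || hasPage) return "published";
  // Neither volume nor page: a DOI suggests the article is already online,
  // otherwise treat it as forthcoming.
  return item.DOI ? "ahead-of-print" : "forthcoming";
}

console.log(publicationStage({
  type: "article-journal",
  DOI: "10.7171/jbt.15-2602-001"
}));
```

An item that already carries something in `page` (as the example article in this thread appears to) would count as published under this rule, so a style may prefer to test `volume` alone.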

Option number two is to simply print Status, but that’s tricky. We are
doing that for articles without a date with a status in some styles. Also
possible right now.

Option three would be to require restricted vocabulary for status (the way
we do e.g. for creator or locator types, i.e. likely via dropdown menu in
GUI implementations). That likely is the cleanest solution, but it’d be a
lot of effort and it’d require significant changes not just in the specs,
but also in GUI implementations.

Thanks, Sebastian,

I think it will work if we just test for the presence of volume. I’ve modified our copy of the american-medical-association.csl, and I’ll send a pull request.

Now, the second, and larger problem: we need to encode the electronic publication date separately from issued. This is a resurrection of this thread from last year: “Multiple publication dates in citations”, [xbiblio-devel] Multiple publication dates in citations | XBib.

In addition to needing it for the NLM style, it is relevant with this “ahead-of-print” issue. For example, see this article, which is currently ahead-of-print: A Quantitative Measure of Field Illumination - PMC. The american-medical-association style for this, now, is (http://www.ncbi.nlm.nih.gov/pmc/utils/ctxp/?ids=PMC4365985&style=american-medical-association):

Brown CM, Reilly A, Cole RW. A Quantitative Measure of Field Illumination. Journal of Biomolecular Techniques : JBT. July 2015:jbt.15-2602-001. doi:10.7171/jbt.15-2602-001.

We need it to be the following, with the epub date appearing in the ahead-of-print notice:

Brown CM, Reilly A, Cole RW. A Quantitative Measure of Field Illumination [published online ahead of print March 19, 2015]. Journal of Biomolecular Techniques : JBT. July 2015:jbt.15-2602-001. doi:10.7171/jbt.15-2602-001.

I’d like to add this as a new date field in the citeproc-json format: epub-date. I tried using original-date, but that conflicts with other uses.
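
To make the proposal concrete, here is a rough sketch of what such an item might look like and how the notice could be derived from it. The `epub-date` field is the proposed addition, not part of the current format; the helper names `formatDate` and `titleWithNotice` are made up, and the date formatting is hard-coded to the AMA-like form shown above rather than taken from any style.

```javascript
const MONTHS = ["January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"];

// Format a CSL-JSON date ({"date-parts": [[year, month, day]]}) as text.
function formatDate(date) {
  const [year, month, day] = date["date-parts"][0];
  if (month === undefined) return String(year);
  if (day === undefined) return `${MONTHS[month - 1]} ${year}`;
  return `${MONTHS[month - 1]} ${day}, ${year}`;
}

// Append the notice when there is no volume and an e-publication date exists.
function titleWithNotice(item) {
  const epub = item["epub-date"]; // proposed field, assumed here
  if (item.volume || !epub) return item.title;
  return `${item.title} [published online ahead of print ${formatDate(epub)}]`;
}

const item = {
  type: "article-journal",
  title: "A Quantitative Measure of Field Illumination",
  "container-title": "Journal of Biomolecular Techniques : JBT",
  page: "jbt.15-2602-001",
  DOI: "10.7171/jbt.15-2602-001",
  issued: { "date-parts": [[2015, 7]] },
  "epub-date": { "date-parts": [[2015, 3, 19]] }
};

console.log(titleWithNotice(item));
// A Quantitative Measure of Field Illumination [published online ahead of print March 19, 2015]
```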

One of the objections you raised last year was,

Would we ever really need the epub date for citations? I’ve never seen this except when an article hasn’t been published in paper yet.

But this is exactly the use-case we are confronted with, and it seems to me that capturing the e-publication date for a citation before it’s in print is pretty important. We also need it for the NLM style (Box 60, Electronic publication before print - Citing Medicine - NCBI Bookshelf), where it is part of the citation format, even for versions of the article that appear after the print version (in other words, in my understanding, it should be preserved as part of the citation).

Later in the thread, you wrote:

The two things that actually take work is:

  1. See this through in discussion of the next spec update, including making the case that this is broadly needed for citations. I’m still unsure about that, though the NLM example is helpful. It could also be used, together with the absence of a (print) publication date, as an indicator for pre-print publication in citations, so that’d be another plus.

Actually, for our examples, even for ahead-of-print, we have both the e-publication and the issued date. As I mentioned, I’m using absence of volume to determine the ahead-of-print status.

  2. It would then also have to be taken up by reference managers and that might be more of an issue. I’m not sure how happy they’d be with two additional date fields and we’ll definitely do original date of publication.

I think it’s just one additional date field.

Can we get this process started?

Thanks,

--
Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 4AN36D-12
301-594-2842

On Fri, Jun 5, 2015 at 3:38 PM, Maloney, Christopher (NIH/NLM/NCBI) [C] wrote:

We want to output citations that have an indication of an article’s status, when it comes out electronically, ahead of the print version. [...]

Consider it started :wink:

It’s two new fields for the reference managers, because none of them afaik
have implemented original date – so it’d be original date and e-pub date
for them.

Could you explain:

Actually, for our examples, even for ahead-of-print, we have both the
e-publication and the issued date. As I mentioned, I’m using absence of
volume to determine the ahead-of-print status.

Why? Or rather, what’s the issued date for those? Is it in the future?

So, question for others is – are there any objections to an additional
e-pub date in CSL? Just to summarize, the main reasons we’d want it are that
a) It’s needed for some citation styles, most importantly NLM
b) It’s available in several metadata formats, including Google’s embedded
metadata and PubMed data.

“Sebastian Karcher” wrote:

Could you explain:

Actually, for our examples, even for ahead-of-print, we have both the e-publication and the issued date. As I mentioned, I’m using absence of volume to determine the ahead-of-print status.

Why? Or rather, what’s the issued date for those? Is it in the future?

Yes, it’s in the future.

Dear list,

while the way CSL handles date information works really well, e.g. for partial dates, I would like to understand why ISO 8601 (the datetime standard) wasn’t chosen for this. ISO 8601 handles both partial dates, e.g. “2006” or “2006-11”, and date ranges, e.g. “2006-11-01/2006-11-15”; the only limitations are cases such as quarters, etc.

I admit that the reason I bring this up is edge cases currently not reflected in citation styles (e.g. including hours and minutes), but handling of dates in ISO 8601 format also seems to be easier, given the wide support in languages and frameworks. Is it mostly “why change something that works now”, or are there other arguments that I have missed? Happy to read up on earlier messages on this topic if you provide a link.

Best,

Martin

It seems crazy that we need more than one publication date when everything is published electronically, but some publishers (e.g. Elsevier) make frequent use of ahead-of-print. I personally have a problem citing something with a date in the future, and I can imagine this will confuse other people as well, in particular if you have something like a preprint where the dates in the reference list might actually be after the publication date of the preprint. The additional electronic publication date/ahead of print date could be very helpful here. Part of the issue seems to be that we are slowly moving from publication dates that are year-only to more granular publication dates, so that these differences become more visible.

Best,

Martin

> On 8 June 2015 at 18:41, Maloney, Christopher (NIH/NLM/NCBI) [C] <@Maloney_Christopher> wrote:

This is a citeproc-js question, right? CSL doesn’t have any specified input
date format.

Yes, citeproc-js. Should I convince Frank Bennett then?

Best,

Martin> Am 08.06.2015 um 20:40 schrieb Sebastian Karcher <@Sebastian_Karcher>:

He’s around here – let’s see if he implemented this specifically or was
also following a legacy format.
Given how many different places now use citeproc-js, switching is going to
be hard, but also allowing ISO might be an option.

I think the existing format was probably adopted from what was there,
but I’m not sure. The source actually contains a parser that can
handle those string dates (and other things), and that’s what
MLZ/Juris-M uses. You can turn it on with:

 citeproc.opt.development_extensions.raw_date_parsing = true;

Nobody has ever liked the date input format. There was a long
discussion of alternatives, and EDTF was a leading candidate:

http://www.loc.gov/standards/datetime/

IIRC, the pending items when discussion tapered off in CSL were the
selection of a subset of the EDTF forms, and the drafting of a parser
(from the LoC page, it looks like .NET, Ruby and C# implementations
are now available; I don’t know if there is anything for JavaScript or
Haskell).

If CSL adopts a specific input format, and someone writes a parser and
a test suite for it, I would be happy to include it in citeproc-js.

FB

I think what you see in citeproc-js is shaped by practical decisions Zotero
made. As in, it was better to be loose with the expectations here.

But I’ve always favored defining CSL dates as EDTF. Not sure what the
implementers would think about that though, as a requirement.

If I understand correctly, we’re talking about the CSL JSON format here,
correct? If so, I don’t think it makes a lot of sense semantically to
supply dates as a complex string within the JSON format. JSON gives you all
the flexibility you need to supply date ranges, approximate dates,
date-times, time zones, whatever else you want to support. It’s just a
matter of adding some more clearly defined and directly accessible
properties. Am I missing something here?

Dear all,

thanks for the feedback. I now better understand that the date implementation in Citeproc/CSL JSON was a pragmatic decision, and that EDTF (which is close enough to ISO 8601) would be a good alternative. It really comes down to use cases, implementations and tests. For the comparison of EDTF vs. ISO 8601 I will take a closer look at EDTF, in particular who is using this format in the wild.

Aurimas, one problem with dates in JSON is that they are not a native data type, in contrast to strings, numbers, booleans, etc. You can of course define dates in your own JSON spec, but an ISO 8601 string is a reasonable alternative commonly found in JSON.

Best,

Martin

> On 9 June 2015 at 06:41, Aurimas Vinckevicius <@Aurimas_Vinckevicius> wrote:

Martin,

EDTF is currently seeking adoption (in part) into ISO 8601 by
TC154. Therefore, it would make sense to identify those features of
EDTF which are most relevant to CSL to help the effort of having them
accepted into ISO 8601. You can read more about this in the EDTF
listserv archive:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1505&L=datetime&T=0&P=1092

For what it’s worth, citeproc-ruby already supports EDTF input if you
install the edtf gem. For usage examples in the wild, I know that the
Digital Public Library of America is using EDTF.

Sylvester

Thanks Sylvester, both for the information, and for writing the EDTF gem. I am currently using plain Ruby to handle partial dates and convert them to the date-parts format that Citeproc JSON expects, so I will try your EDTF gem.
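
For illustration, that kind of conversion can be sketched in a few lines of JavaScript. This is not the parser in citeproc-js or the edtf gem; `isoToDateParts` is a hypothetical helper that only handles plain ISO 8601 calendar dates (year, year-month, year-month-day) and slash-separated ranges, with no times, time zones, or EDTF extensions.

```javascript
// Convert "2006", "2006-11", "2006-11-01" or "2006-11-01/2006-11-15"
// into the date-parts structure used in CSL JSON.
// isoToDateParts is a hypothetical helper name.
function isoToDateParts(value) {
  const parts = value.split("/").map((piece) => {
    const match = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(piece);
    if (!match) throw new Error(`Unsupported date: ${piece}`);
    // Drop the components that were not supplied, then convert to numbers.
    return match.slice(1).filter((p) => p !== undefined).map(Number);
  });
  if (parts.length > 2) throw new Error(`Unsupported range: ${value}`);
  return { "date-parts": parts };
}

console.log(JSON.stringify(isoToDateParts("2006-11")));
console.log(JSON.stringify(isoToDateParts("2006-11-01/2006-11-15")));
```

A range becomes two entries in `date-parts`, start first, which is how CSL JSON represents date ranges.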

This post (of a series of three posts) gives a good overview of EDTF use in the Digital Public Library of America:
http://vphill.com/journal/post/5690/

Best,

Martin

> On 9 June 2015 at 08:55, Sylvester Keil <@Sylvester_Keil> wrote:

No, you’re not missing anything, except that adding what you suggest does
have a cost.

Seems to me like the cost of correctly composing and parsing EDTF across a
number of citeprocs and clients in various languages is much higher.

This was my initial thought as well, but upon reflection, it seems that
any client that’s going to support even some of EDTF’s advanced features
is probably going to store the date as an EDTF string anyway. I think we
would do exactly that in Zotero. The alternative would just move all of
EDTF’s complexity to the local database schema (though some might need
to be pulled out anyway for efficient searching).

As for composing and parsing, that’s what libraries are for. But, yes,
we’ll need to find or build a good JS library.

Hi, I pushed ahead with adding epub-date to citeproc-json, since we need it in our system. I’m hoping that it can be adopted into the standard by you guys. I’ve created the pull requests for the repos that we use:

I didn’t attempt to send PRs for any of the documentation, since it’s still all subject to getting your approval.

--
Chris Maloney