Proposed change to CSL input XML specification

I have a suggestion for adding a data type to the CSL specification.

Overview

Some of the design goals of a bibliography system are (1) simplicity, (2)
comprehensiveness, (3) efficient encoding, and (4) adaptability to new
uses and contexts. I think that adding one more complex field type to
the CSL specification would improve it on all four of these goals.
CSL should add a “serial” field to encode information about serial
sources. It would apply to periodicals, magazines, newspapers, case
reporters, newsletters, academic journals, and books published in serial formats.
If adopted, CSL-JSON would support three complex field types in total: Date, Person, and Serial.

Examples

Without a serial-type field, one would have to encode a citation to a serial publication (like “Brown v. Board of Education of Topeka, Kansas, 347 U.S. 483”) in ordinary CSL fields:

347 U.S. Brown v. Board of Education of Topeka, Kansas 483

With a serial-type field, you would instead encode the information
pertaining to the serial publication location within a container that
shows the logical relationship between the fields.

Brown v. Board of Education of Topeka, Kansas 347 U.S. 483
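
Since the field labels are easy to lose in plain text, here is a rough sketch of the two encodings as Python dictionaries shaped like CSL-JSON. The `serial` key and its `publication` subfield are the names proposed in this post, not part of any released CSL-JSON schema, and `locator` is a hypothetical helper added only for illustration.

```python
# Flat encoding using ordinary CSL-JSON variables.
flat_item = {
    "type": "legal_case",
    "title": "Brown v. Board of Education of Topeka, Kansas",
    "volume": "347",
    "container-title": "U.S.",
    "page": "483",
}

# Proposed encoding (assumption: "serial" and "publication" are the
# names suggested in this proposal, not existing CSL-JSON keys).
serial_item = {
    "type": "legal_case",
    "title": "Brown v. Board of Education of Topeka, Kansas",
    "serial": [
        {"volume": "347", "publication": "U.S.", "page": "483"},
    ],
}


def locator(entry):
    """Join one serial entry in volume, publication, page order."""
    parts = [entry.get("volume"), entry.get("publication"), entry.get("page")]
    return " ".join(p for p in parts if p)


print(locator(serial_item["serial"][0]))  # prints "347 U.S. 483"
```

In the second form the three locator values cannot be separated from one another by accident, which is the point of the container.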

Benefits

The second format shows the human reader how the discrete fields of
“volume”, “publication”, and “page” are inextricably linked. The essential variables are co-located instead of being dispersed. Moreover,
it has technical advantages for encoding citation items in at least two real-world situations.

Parallel Citations

Some legal journals require parallel citations to legal resources. A parallel citation is “A reference to the same case or statute published in two or more sources” according to the Legal Dictionary.
For example, when citing a Supreme Court case, the writer may be
expected to first cite the official reporter of the Supreme Court, and
then cite the reporters published by WestLaw and LexisNexis. Likewise,
when citing a multilateral treaty,
the author should cite both a domestic reporter and an international
reporter. So, if a user of CSL-JSON were required to use parallel
citations, the “serial” type data structure would easily support it. See
how the previous citation could be extended:

Brown v. Board of Education of Topeka, Kansas 347 U.S. 483 74 S. Ct. 686 98 L. Ed. 873

The first listed publication would be assumed to be the primary publication in a “serials” type field. Without the Serial-type
data field, this information would have to be encoded in fields like
“1st volume”, “1st publication”, “1st FirstPage”, and “2nd volume”, “2nd
publication”, “2nd FirstPage”, etc. Trying to encode all that
information in ordinary CSL-JSON fields would be cumbersome while still
not being comprehensive; and it would be unfriendly to human users while
not providing flexibility for new situations and contexts.
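
To make the comparison concrete, here is a minimal sketch of how a processor might render a parallel citation from a “serial” array. The key names follow this proposal and `format_parallel` is a hypothetical function; no existing CSL processor is claimed to work this way.

```python
def format_parallel(title, serial):
    """Render the case name followed by every reporter, primary first."""
    locations = [
        " ".join(entry[k] for k in ("volume", "publication", "page") if entry.get(k))
        for entry in serial
    ]
    return title + ", " + ", ".join(locations)


brown = {
    "title": "Brown v. Board of Education of Topeka, Kansas",
    # Assumption: the first entry is treated as the primary publication.
    "serial": [
        {"volume": "347", "publication": "U.S.", "page": "483"},
        {"volume": "74", "publication": "S. Ct.", "page": "686"},
        {"volume": "98", "publication": "L. Ed.", "page": "873"},
    ],
}

print(format_parallel(brown["title"], brown["serial"]))
```

Adding a fourth reporter means appending one more entry to the array; no new field names are needed.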

Multipart Articles

Some articles are published across multiple issues of a publication. The
“serial type” field would be able to efficiently and legibly encode a
citation to a multipart article. If a citation appeared as follows:

Harlan F. Stone, The Equitable Rights and Liabilities of Strangers to a Contract (pts. 1 & 2), 18 Colum. L. Rev. 291 (1918), 19 Colum. L. Rev. 177 (1919).

It would be encoded as follows:

Stone Harlan F. The Equitable Rights and Liabilities of Strangers to a Contract (pts. 1 & 2) 18 Colum. L. Rev. 291 19 Colum. L. Rev. 177 1918 1919

The “serial” elements should be listed in the same order as their respective publication dates appear in the “issued” element, so that the first serial entry pairs with the first date, the second with the second, and so on.
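
A small sketch of that positional pairing, under some loudly stated assumptions: the `serial` key is the proposed one, `pair_parts` is a hypothetical helper, and the two entries in `date-parts` are read here as two separate dates. As far as I know, CSL-JSON today treats two `date-parts` entries as a date range, so a real specification would have to settle how multiple publication dates are stored.

```python
stone = {
    "type": "article-journal",
    "author": [{"family": "Stone", "given": "Harlan F."}],
    "title": "The Equitable Rights and Liabilities of Strangers "
             "to a Contract (pts. 1 & 2)",
    "serial": [
        {"volume": "18", "publication": "Colum. L. Rev.", "page": "291"},
        {"volume": "19", "publication": "Colum. L. Rev.", "page": "177"},
    ],
    # Assumption: one date per serial entry, in the same order.
    "issued": {"date-parts": [[1918], [1919]]},
}


def pair_parts(item):
    """Pair each serial entry with the date at the same position."""
    entries = item["serial"]
    dates = item["issued"]["date-parts"]
    if len(entries) != len(dates):
        raise ValueError("each serial entry needs a matching issued date")
    return [
        f'{e["volume"]} {e["publication"]} {e["page"]} ({d[0]})'
        for e, d in zip(entries, dates)
    ]


print(", ".join(pair_parts(stone)))
```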

Proposed specification for “Serial type” Field

A complex field of “serial” type would consist of an array of one or more
publications. Each publication would have 4 possible fields: “volume”,
“publication”, “issue”, and “page” (the first page of the item). The
“publication” and “page” fields would always be required, while “issue”
and “volume” would depend on the context and source type. From my
research, I believe that there are 3 main classes of serial publications.
I will deal with each type in turn.

  1. Non-consecutively paginated serials with volume numbers.

The first case covers serials that are published in non-consecutively
paginated volumes (such as an academic journal). In this case, the citation to
the source should include the Volume, Publication, Issue, and First Page.

  2. Consecutively paginated serials with volume numbers.

When issues within a volume continue from the pagination number of the
previous issue, identifying the issue number is often not required (or
even possible). The required fields would be Volume, Publication, and First Page.

  3. Serials that are identified only by issue, and do not track volume numbers.

Some periodicals, like newspapers, do not have volume numbers, and the issue
is identified by the date of publication. In this case, the volume
number is not required. The required fields would be Publication, Issue, and First Page.

This situation creates an interesting predicament in which the date of
publication may be duplicated within an item record. The date of
publication would be recorded in the normal date-typed “issued” field, as well as within the serial-typed
“issue” subfield. The specification could include a recommendation to
leave the “issue” subfield blank if it merely copies the information
from the “issued” field.
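
The three classes can be told apart from which optional subfields are present. The following is only a sketch of that rule: `classify_serial_entry` and its class labels are hypothetical, the sample publication names are placeholders, and an entry with neither “volume” nor “issue” is assumed to be an issue-only serial whose issue was left blank per the recommendation above.

```python
def classify_serial_entry(entry):
    """Return which of the three proposed classes an entry falls into."""
    # "publication" and "page" (first page) are always required.
    for field in ("publication", "page"):
        if not entry.get(field):
            raise ValueError(f"missing required field: {field}")
    has_volume = bool(entry.get("volume"))
    has_issue = bool(entry.get("issue"))
    if has_volume and has_issue:
        return "non-consecutive"  # class 1: volume, publication, issue, page
    if has_volume:
        return "consecutive"      # class 2: volume, publication, page
    # Class 3: no volume; "issue" may be blank if it only repeats "issued".
    return "issue-only"


journal = {"volume": "2", "publication": "Example Journal", "issue": "5", "page": "101"}
reporter = {"volume": "347", "publication": "U.S.", "page": "483"}
newspaper = {"publication": "Example Daily", "page": "A1"}

for entry in (journal, reporter, newspaper):
    print(entry["publication"], "->", classify_serial_entry(entry))
```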

Mapping Serial-Type Fields to standard CSL fields

No CSL styles or processors currently support the “serials” type. In order to make a transition possible, processors should use the following mapping to ensure compatibility between the different styles.

The elements that pertain to information that would also be located in a “serials” field are the following:

“container-title” - title of the container holding the item (e.g. the book title for a book chapter, the journal title for a journal article)

“page” - range of pages the item (e.g. a journal article) covers in a container (e.g. a journal issue)

“page-first” - first page of the range of pages the item (e.g. a journal article) covers in a container (e.g. a journal issue)

“issue” - (container) issue holding the item (e.g. “5” when citing a journal article from journal volume 2, issue 5)

“volume” - (container) volume holding the item (e.g. “2” when citing a chapter from book volume 2)

Some elements are near-misses for inclusion. I include these to ensure that proper discussion is had.

“number-of-volumes” - total number of volumes, usable for citing multi-volume books and such. Multi-volume books are not serial publications. Serial publications are defined to have indefinite length. There is no instance in which this variable would be useful for serial sources.

“collection-title” - title of the collection holding the item (e.g. the series title for a book). When a work appears within a collection of works, the title of the containing work should be encoded in this variable. However, not all citation styles support this variable. In my survey of 40 styles, only 28 supported this variable. This makes me wonder if style-creators are using the “container-title” variable for both serial publications and collections.

“edition” - (container) edition holding the item (e.g. “3” when citing a chapter in the third edition of a book). I am not aware of any serial publications that use editions.

“number” - number identifying the item (e.g. a report number). This is a tricky one, especially for legal drafters. Report numbering can look like the numbering for a serial publication. I am open to doing more research on this.

From standard CSL fields to “serial-type” CSL fields:
“container-title” ==> “publication”
“page” ==> “page”
“page-first” ==> “page”
“issue” ==> “issue”
“volume” ==> “volume”

From serial-type CSL fields to standard CSL fields:
“publication” ==> “container-title”
“page” ==> “page”
“issue” ==> “issue”
“volume” ==> “volume”

When converting information from a “serials” field to standard CSL fields, only information from the first “serials” child element should be translated.
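
As a rough illustration of the two mappings, here is one way a processor could convert in each direction. Everything here is a sketch: the function names are hypothetical, the `serial` key is the proposed one, and preferring “page-first” over “page” when both are present is my own assumption, since both map to the same subfield.

```python
SERIAL_TO_FLAT = {
    "publication": "container-title",
    "page": "page",
    "issue": "issue",
    "volume": "volume",
}
FLAT_TO_SERIAL = {
    "container-title": "publication",
    "page": "page",
    "page-first": "page",
    "issue": "issue",
    "volume": "volume",
}


def serial_to_flat(item):
    """Copy only the first serial entry into standard CSL fields."""
    out = {k: v for k, v in item.items() if k != "serial"}
    entries = item.get("serial") or []
    if entries:
        for src, dst in SERIAL_TO_FLAT.items():
            if src in entries[0]:
                out[dst] = entries[0][src]
    return out


def flat_to_serial(item):
    """Gather the standard locator fields into a one-entry serial array."""
    entry, rest = {}, {}
    for key, value in item.items():
        if key not in FLAT_TO_SERIAL:
            rest[key] = value
        elif key == "page" and "page-first" in item:
            continue  # assumption: "page-first" wins when both are present
        else:
            entry[FLAT_TO_SERIAL[key]] = value
    if entry:
        rest["serial"] = [entry]
    return rest


case = {
    "title": "Brown v. Board of Education of Topeka, Kansas",
    "serial": [
        {"volume": "347", "publication": "U.S.", "page": "483"},
        {"volume": "74", "publication": "S. Ct.", "page": "686"},
    ],
}
print(serial_to_flat(case))
```

Note that the conversion to flat fields is lossy: the second and later reporters are dropped, so a round trip does not restore them.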

Summary

The Serial-Type field would stand alongside the Person-Type field and
Date-Type field as complex fields supported by CSL. I will
summarize the proposed Serial-Type format in the following example:

vol. # Title of primary publication Issue # First Page ...
  • Thomas O’Reilly

Thanks. Since technically CSL is the citation style language and there is
no official input format (though obviously citeproc/CSL JSON as used by
citeproc-js is relevant), I think it’d be helpful to formulate how this
would look in actual CSL syntax. I’m not at all clear on that.

(/rant
Beyond that, I wish law folks would just get DOIs for their resources and
be done with this. This is ridiculous. I know – not going to happen and so
we’ll eventually have to solve this, but I’m having a really hard time
motivating myself to put effort into accommodating 19th century citation
practices. /rant)

So in the context of being generally super busy, I’m a bit overwhelmed by
the post. Thanks much for taking the time, but perhaps you can start with
first principles, and explain as briefly as possible:

What’s wrong with the current CSL formatting specification that leads you
to this solution? Perhaps an example output of what cannot now be done?

I started to read your first example, as an example, and I was not seeing
the problem you were trying to solve (unless it’s an orthogonal problem
around data representation, which is not our primary focus).

Thomas does say what this is for:

  1. Articles published across serials such as:
    Harlan F. Stone, The Equitable Rights and Liabilities of Strangers to a
    Contract (pts. 1 & 2), 18 COLUM. L. REV. 291 (1918), 19 COLUM. L. REV.
    177 (1919).
    (notice the two separate journal issues & dates for a single title)

  2. Parallel legal citations
    Czapinski v. St. Francis Hosp., Inc., 2000 WI 80, 236 Wis. 2d 316, 613
    N.W.2d 120.

I only have an approximate understanding of this, but basically 2000 WI 80, 236
Wis. 2d 316, and 613 N.W.2d 120 are three different places (“reporters”) the
case has been published, and legal citation practices (cf. my rant above)
require listing all three – hence three serials in a single citation.

Personally I think 1. is rare enough to be irrelevant. You could just cite
the above separately (as is often done) or list a date range (which CSL
already supports, even though .
2., on the other hand, is a super-common component of legal citations, so
to the extent we want to support legal citations, we have to support
parallel citations…
Frank in juris-m/csl-m does solve this differently, i.e. by automatically
"collapsing" the same case when cited in a single citation, the same way
CSL does for the same author. That’s also more in line with the data
storage model used by most upstream clients of CSL (which is one of my
major worries with Thomas’s proposal: we can put this in CSL all we want,
but if Zotero and Mendeley don’t implement a data model that can produce
this – thus making it useless to 80%+ of CSL users, what good does it do
us.)
But I’m open to being convinced that there is a compelling and feasible case
here. For me, though, the starting point would be mock-CSL syntax rather
than input data.