CSL issues

Kind of a ramble of an update on what I’m thinking about, in part for
the archives …

So I think the next release of citeproc will reflect whatever changes
are needed in CSL. I haven’t gotten much in the way of suggestions, so
I can only go on my own hunches.

Let’s look at the book example for apa-en.csl:

     <reftype name="book">
       <creator>
         <names/>
         <role>
           <prefix> (</prefix>
           <suffix>).</suffix>
         </role>
       </creator>
       <date>
         <year>
           <prefix> (</prefix>
           <suffix>) </suffix>
         </year>
       </date>
       <title font-style="italic">
         <suffix>.</suffix>
       </title>
       <origin>
         <prefix> </prefix>
         <place/>
         <publisher>
           <prefix>: </prefix>
         </publisher>
       </origin>
       <genre>
         <suffix>, </suffix>
       </genre>
       <medium>
         <prefix/>
       </medium>
       <availability>
         <prefix>, </prefix>
         <physicalLocation/>
         <url>
           <prefix>, </prefix>
         </url>
       </availability>
     </reftype>

What do I think I need to change?

  1. “date” should be changed to “dateIssued”

Beyond that I’m really left with questions about structure that reflect
my work on normalizing my MODS data and experimenting with RDF
conversion. I’ve also been thinking about what might be involved in
porting citeproc to other languages (say Python or Ruby).

The current model is hierarchical, which has certain advantages for
processing (particularly XSLT) and formatting (stuff doesn’t get
processed/formatted if it’s not there in the metadata).
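
To make the "nothing there, nothing formatted" point concrete, here is a
rough Python sketch. The template layout and the `render` function are
made up for illustration and only loosely mirror the CSL fragment above;
this is not how citeproc itself is implemented.

```python
# Hypothetical template: each node names a metadata field plus optional
# prefix/suffix and child nodes (an illustrative stand-in for the CSL above).
BOOK_TEMPLATE = [
    {"field": "creator"},
    {"field": "year", "prefix": " (", "suffix": ") "},
    {"field": "title", "suffix": "."},
    {"field": "origin", "prefix": " ", "children": [
        {"field": "place"},
        {"field": "publisher", "prefix": ": "},
    ]},
]


def render(nodes, record):
    """Render template nodes against a nested record, skipping absent fields."""
    out = []
    for node in nodes:
        value = record.get(node["field"])
        if not value:
            continue  # nothing in the metadata, so no prefix/suffix either
        if "children" in node:
            text = render(node["children"], value)
            if not text:
                continue  # the group exists but none of its parts do
        else:
            text = str(value)
        out.append(node.get("prefix", "") + text + node.get("suffix", ""))
    return "".join(out)


book = {
    "creator": "Doe, J.",
    "year": 1999,
    "title": "Some Book",
    "origin": {"place": "New York", "publisher": "Example Press"},
}
no_origin = {"creator": "Doe, J.", "year": 1999, "title": "Some Book"}
```

With `no_origin`, the whole origin group drops out along with its
leading space, which is the advantage of the hierarchy.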

Now, the problems:

Let’s say I want to be able to indicate the original publication year,
something like (Marx, 1980 [1866]). Well, that original year is
technically part of another – linked – record (or level in MODS). So
there’s really no way to do this in a purely hierarchical model if I
want to be totally consistent. Rather, I’d want to do something like:

[ ]

So now I’m mixing in a sort of flat structure with the hierarchical.
If it doesn’t bother others, though, I suppose it doesn’t bother me.
But the date issue also complicates things because, for example, it’s
mixing levels in cases like book chapters.
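
As a rough illustration of the original-year case from the processing
side, here is a small Python sketch. The record layout, the `original`
key, and both function names are assumptions for the example, not part
of CSL or MODS.

```python
def year_label(record):
    """Return e.g. '1980 [1866]' when a linked original record has a year."""
    year = str(record["issued"])
    original = record.get("original")  # hypothetical link to another record
    if original and original.get("issued"):
        year += " [" + str(original["issued"]) + "]"
    return year


def cite(record):
    """Build an author-year citation such as (Marx, 1980 [1866])."""
    return "(" + record["author"] + ", " + year_label(record) + ")"


reprint = {"author": "Marx", "issued": 1980, "original": {"issued": 1866}}
plain = {"author": "Marx", "issued": 1980}
```

Here `cite(reprint)` reaches across to the linked record for the
bracketed year, while `cite(plain)` just gives the ordinary year.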

Another issue I’ve raised earlier is where the locators
(page/volume/issue) ought to go. Right now they’re in the host, but
does this make sense? From a purely metadata standpoint, they are in
fact split across different objects (article, volume, issue, journal),
so there are no clear answers.

I guess the bigger issue is whether it would make sense to consider
moving to a flatter model, where instead of:

main
	title
	container
		title
		series
			title

… you end up with flat variables like:

title
container-title
series-title

… and a structure like:

     <reftype name="article">
       <creator alternate-sortkey="container-title">
         <names/>
       </creator>
       <dateIssued>
         <prefix> (</prefix>
         <year/>
         <month>
           <prefix>, </prefix>
         </month>
         <day>
           <prefix> </prefix>
         </day>
         <suffix>) </suffix>
       </dateIssued>
       <title>
         <suffix>, </suffix>
       </title>
       <group name="periodical">
         <container-title font-style="italic">
           <suffix>, </suffix>
         </container-title>
         <group name="series">
           <series-title/>
           <part-details>
             <seriesNumber/>
           </part-details>
         </group>
         <part-details>
           <volumeNumber/>
           <issueNumber>
             <prefix>(</prefix>
             <suffix>)</suffix>
           </issueNumber>
           <pageNumbers>
             <prefix>, </prefix>
           </pageNumbers>
         </part-details>
       </group>
       <genre>
         <suffix>, </suffix>
       </genre>
       <medium>
         <prefix> (</prefix>
         <suffix>)</suffix>
       </medium>
       <availability>
         <prefix>, available from: </prefix>
         <physicalLocation>
           <prefix>, </prefix>
         </physicalLocation>
         <url>
           <prefix>, </prefix>
         </url>
       </availability>
     </reftype>

Actually, the above could be quite different depending.
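
The step from the nested levels to flat variables like container-title
could be sketched in Python as below. The `flatten` function and the
sample record are hypothetical. I am also assuming that only the
innermost level name becomes the prefix, so two levels with the same
name would collide.

```python
def flatten(record, prefix=""):
    """Flatten nested levels into variables like 'container-title'.

    Only the innermost level name is used as the prefix, so a series nested
    inside a container yields 'series-title', not 'container-series-title'.
    """
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(flatten(value, key + "-"))
        else:
            flat[prefix + key] = value
    return flat


nested = {
    "title": "Some Article Title",
    "container": {
        "title": "Periodical Title",
        "series": {"title": "Series Title"},
    },
}
flat = flatten(nested)
```

Here `flat` ends up with exactly the keys title, container-title, and
series-title.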

Ultimately the decision depends on two issues, in order of priority:

  1. which model best represents real-world citation styles

  2. which is easier to implement in different languages (not only XSLT)

Both of these are basically practical questions. I cannot know the
answer to the first without more testing with styles, for example.
Aside from my original year example, my experiments with big styles
like APA and Chicago suggest the current model works fairly well as is.

On the second issue, I’m not terribly skilled with languages like
Python or Ruby. I could see using an array with a flat hash table and
a bunch of variables. OTOH, I could also see a more hierarchical thing
where you have stuff like:

     record = [{:title => "Some Article Title", :container => [{:title =>
       "Periodical Title", :issn => 83456365}]}]

… etc.
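
For comparison, here is roughly how the two shapes might look in
Python. This is a sketch only: it uses plain dicts rather than the
arrays of hashes above, and `lookup_nested` and the key names are made
up for the example.

```python
# The same record in the hierarchical shape and in the flat shape.
nested = {
    "title": "Some Article Title",
    "container": {"title": "Periodical Title", "issn": 83456365},
}
flat = {
    "title": "Some Article Title",
    "container-title": "Periodical Title",
    "container-issn": 83456365,
}


def lookup_nested(record, path):
    """Follow a path like ('container', 'title'); None if any step is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```

The flat shape gets by with `flat.get("container-title")`, while the
nested one needs a small path walker, but it keeps each level together
as a unit.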

BTW, FWIW, it’s struck me that a format conversion engine ought to
share huge overlap with a formatting engine like citeproc. In fact, I
started thinking about this again when writing a little Ruby script to
convert Refer to RDF. In each case they’re mapping input to output
based on some variables. The primary difference is that citeproc
configures the output on-the-fly.
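
Here is a rough Python sketch of that overlap. The Refer tags (%A, %T,
%D) are the standard ones, but the Dublin Core-style target names, the
`AUTHOR_DATE` layout, and both functions are assumptions for
illustration, not code from my script or from citeproc.

```python
# Assumed mapping from Refer tags to Dublin Core-style property names.
REFER_TO_DC = {"%A": "dc:creator", "%T": "dc:title", "%D": "dc:date"}

# Assumed output layout: (input key, prefix, suffix) in display order.
AUTHOR_DATE = [("%A", "", " "), ("%D", "(", "). "), ("%T", "", ".")]


def convert(record, mapping):
    """Conversion: rename input keys to output keys, dropping unmapped ones."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def format_record(record, layout):
    """Formatting: emit the values named by the layout, in layout order."""
    parts = []
    for key, prefix, suffix in layout:
        if key in record:
            parts.append(prefix + str(record[key]) + suffix)
    return "".join(parts)


refer = {"%A": "Doe, J.", "%T": "Some Article Title", "%D": "1999", "%X": "skip"}
```

Both functions walk the same input driven by a table of variables. The
difference is that the conversion table is fixed, while the formatting
layout would be swapped per style at run time.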

Put another way, I’m still hoping for some feedback :wink:

Bruce