csl changes

Hi,

I’m contemplating a change in the way I handle contributors in CSL.

Right now, here is an example of a book definition for APA:

     <reftype name="book">
       <creator>
         <names/>
         <role prefix=" (" suffix=")."/>
       </creator>
       <date>
         <year prefix=" (" suffix=") "/>
       </date>
       <title font-style="italic" suffix="."/>
       <origin>
         <place/>
         <publisher prefix=": "/>
       </origin>
       <genre suffix=", "/>
       <medium/>
       <availability prefix=", ">
         <physicalLocation/>
         <url prefix=", "/>
       </availability>
     </reftype>

I adopted the creator structure with role child to offer more
flexibility. However, I’m now thinking that may have been unnecessary,
and am thinking about this instead:

     <reftype name="book">
       <author primary-alternate="editor" secondary-alternate="title"/>

There are a few reasons for this:

  1. in thinking about a non-XML (and non-MODS) data model, it’s become
    apparent to me that it’s easier to deal with authors, editors and
    translators (and maybe have a fourth “contributors” to catch corner
    cases) than otherwise. I.e. simple is better (unless it’s too simple, of
    course!).

  2. it’s also simple in terms of processing (the “alternate” stuff is a
    little tricky, but it’s hard to get around)

  3. I’ve realized that I had been thinking about editors wrongly when
    designing CSL. They are not equivalent to authors, but serve in their
    place when no author is present. However, if you have, say, an edited
    collection of writings by Walter Benjamin, the editor is secondary. The
    above would capture all cases then: authored books, edited books, and
    those with both.
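
To make the fallback in point 3 concrete, here is a minimal Python sketch of how a processor might resolve the proposed `primary-alternate` / `secondary-alternate` attributes. The dict-based reference record and the function name `resolve_lead` are assumptions made up for illustration; this is not citeproc code.

```python
def resolve_lead(item, primary="author", alternates=("editor", "title")):
    """Return (field, value) for the first non-empty field.

    `item` is a plain dict standing in for a reference record
    (an assumption; the real data model is still undecided).
    """
    for field in (primary, *alternates):
        value = item.get(field)
        if value:  # skip missing or empty fields
            return field, value
    return None, None


# Hypothetical records covering the three cases named above.
authored = {"author": ["Benjamin, Walter"], "editor": ["Some Editor"],
            "title": "Collected Writings"}
edited = {"editor": ["Some Editor"], "title": "An Edited Volume"}
anonymous = {"title": "An Anonymous Pamphlet"}

print(resolve_lead(authored))   # author wins; the editor stays secondary
print(resolve_lead(edited))     # falls back to the editor
print(resolve_lead(anonymous))  # falls back to the title
```

The point of the sketch is only that the editor is consulted when no author exists, and is otherwise left for a later, secondary position in the layout.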

I have long been contemplating flattening dates, too; am not sure.

I think it’s important to keep nested structures for some things,
though, because the formatting actually assumes such.

Also, I think I will indeed split CSL from citeproc, and rename the
latter “citeproc-xsl.” I think I’m almost done with CSL, but there are
some nagging bugs in citeproc, and one big feature now supported in CSL
(reference list grouping) I have not managed to resolve.

I may also have a separate package for styles. In that case I’d have
"csl-schema", “csl-styles” and “citeproc-xsl.” Obviously if we make
any progress with ports of citeproc, this would make room for them.

Thoughts on any of the above?

Bruce

Hello Bruce and others,

Here are my (overdue) comments. As always, take them with a grain of
salt, or however that saying goes…

I would argue against flattening the dates. It would make it much
easier to localize when this is kept in the nested style.

I think that using the formatting-def structure is less clear
compared to the structure you used earlier with the “normal” tags for
each thing (author, title, etc.). I can see that the formatting-def
approach might be somewhat easier to implement (perhaps), but I
think that gain is very minimal and the easier-to-read normal tags
are better.

Btw, I should have more time to look at these kinds of things from now
on, so I hope to participate more, as well as pick up the development
of the CSL Editor for Mac OS X.

Cheers,

Johan

More information on:
http://www.geo.vu.nl/~jkool/

Here are my (overdue) comments. As always, take them with a grain of
salt, or however that saying goes…

Thanks.

I would argue against flattening the dates. It would make it much
easier to localize when this is kept in the nested style.

OK, will keep that in mind. Interestingly, ODF already localizes dates,
so I was partly anticipating not having to worry about that. But CSL
should still be independent of ODF.

I still haven’t figured out if or how exactly I might want to add CSL
to ODF, but I’m trying to bring them closer in line. One advantage of
the more generic approach is it could also be used to format other
kinds of metadata-based content, like maybe captions and such.

I think that using the formatting-def structure is less clear compared
to the structure you used earlier with the “normal” tags for each
thing (author, title, etc.). I can see that the formatting-def
approach might be somewhat easier to implement (perhaps), but I think
that gain is very minimal and the easier-to-read normal tags are
better.

There are other issues. If I use elements, those values will be fixed
and difficult to change. Also, with existing elements like origin and
location, I am making an assumption upfront about what children will be
allowed there. That may be a mistake.

Finally, using enclosure to indicate relations might also be limiting
in some cases. Part of it is implementation (in other languages, and in
GUIs), but part of it is just that people might (??) want to mix
levels in a given group.

I have no strong opinion though. Was just looking for what other people
think. So thanks.

Btw, I should have more time to look at these kinds of things from now
on, so I hope to participate more, as well as pick up the development
of the CSL Editor for Mac OS X.

Cool.

BTW, if we get a Python port, I just remembered OS X has a Python-ObjC
bridge.

Bruce

I would argue against flattening the dates. It would make it much
easier to localize when this is kept in the nested style.

I just had a thought. It would be possible to retain what you’re asking
for, Johan, while also simplifying the layout, by just changing this:

 <dates jan="January" feb="February" mar="March" apr="April" may="May"
        jun="June" jul="July" aug="August" oct="October" nov="November"
        dec="December" full-layout="year-month-day"/>

… to this:

 <dates jan="January" feb="February" mar="March" apr="April" may="May"
        jun="June" jul="July" aug="August" oct="October" nov="November"
        dec="December">

So then in the cs:item-layout element, I’d have either:

cs:year
cs:date-full
cs:month-day

…or a single cs:date element with a type attribute. I tend to prefer
the first for consistency, but OTOH I have to consider also adding
support for original (non-translated) titles and such.
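
As a rough illustration of why keeping month names and layouts in the style helps localization, here is a small Python sketch. The `names` dict mirrors the month attributes on the dates element, and `layout` mirrors a value like "year-month-day"; the function name `format_date`, the space separator, and the "day-month-year" and "month-day" layout strings are assumptions for the example, not part of the schema.

```python
MONTH_KEYS = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

# Month-name tables, one per locale (what the style's dates element would carry).
EN = dict(zip(MONTH_KEYS, ["January", "February", "March", "April", "May",
                           "June", "July", "August", "September", "October",
                           "November", "December"]))
NL = dict(zip(MONTH_KEYS, ["januari", "februari", "maart", "april", "mei",
                           "juni", "juli", "augustus", "september", "oktober",
                           "november", "december"]))


def format_date(year, month=None, day=None, *, names, layout="year-month-day"):
    """Join the available date parts in the order the layout string names them."""
    parts = {
        "year": str(year),
        "month": names[MONTH_KEYS[month - 1]] if month else None,
        "day": str(day) if day else None,
    }
    # Parts that are missing (or not named in the layout) simply drop out.
    return " ".join(parts[key] for key in layout.split("-") if parts.get(key))


print(format_date(2006, 2, 5, names=EN))
print(format_date(2006, 2, 5, names=NL, layout="day-month-year"))
print(format_date(2006, 2, 5, names=EN, layout="month-day"))
```

The same record renders differently per style or locale without the data itself changing, which is the property Johan is asking to keep.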

I think that using the formatting-def structure is less clear compared
to the structure you used earlier with the “normal” tags for each
thing (author, title, etc.). I can see that the formatting-def
approach might be somewhat easier to implement (perhaps), but I think
that gain is very minimal and the easier-to-read normal tags are
better.

Am still thinking about this one, and also Peter’s question about
whether we need to (again) allow markup for prefix and suffix content.
The only place where I think this might be needed is if the content
contains text (like ).

Bruce

OK, what about this Johan:

   <metadata-type name="book">
     <author alternate="editor"/>
     <year prefix=" (" suffix=") "/>
     <title font-style="italic" suffix="."/>
     <editor/>
     <format-group prefix="(" suffix=")">
       <publisher-place/>
       <publisher-name prefix=":"/>
     </format-group>
   </metadata-type>
   <metadata-type name="chapter">
     <author alternate="editor"/>
     <year prefix=" (" suffix=") "/>
     <title font-style="italic" suffix="."/>
     <container-title prefix=" "/>
     <editor/>
     <format-group prefix="(" suffix=")">
       <publisher-place/>
       <publisher-name prefix=":"/>
     </format-group>
   </metadata-type>

So this uses the flat structure, but with elements to indicate the
variable. It will help me to know whether your objection is to the
structure or to the syntax.

Bruce

This looks quite neat. It seems to be the most readable of what
I’ve seen so far.

Johan

This is probably even better:

   <item-layout>
      <type name="book">
         <author alternate="title"/>
         <year prefix=" (" suffix=") "/>
         <title suffix="."/>
         <publisher-place/>
         <publisher prefix=":"/>
      </type>
      <type name="chapter">
         <author/>
         <year prefix=" (" suffix=") "/>
         <title/>
         <title relation="container" suffix="."/>
         <title relation="series"/>
         <volume prefix=", "/>
         <pages prefix=", "/>
      </type>
      <type name="article">
         <author alternate="container-title"/>
         <year prefix=" (" suffix=") "/>
         <title/>
         <title relation="container" suffix="."/>
         <volume prefix=", "/>
         <pages prefix=", "/>
      </type>
   </item-layout>

The relation attribute helps a lot for the XML, though am not quite
sure how that’d work in a GUI.
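
For what it's worth, here is a minimal Python sketch of how a processor might walk such a flat layout. It assumes that an element with `relation="container"` looks up a variable named `container-title` (matching the `alternate="container-title"` value above), and that items are plain dicts; `variable_name`, `render`, the sample item, and the slightly adjusted affixes in `LAYOUT` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Adapted from the article layout above; affixes tweaked so the output reads cleanly.
LAYOUT = """
<type name="article">
   <author alternate="container-title"/>
   <year prefix=" (" suffix=") "/>
   <title suffix=". "/>
   <title relation="container"/>
   <volume prefix=", "/>
   <pages prefix=", "/>
</type>
"""


def variable_name(elem):
    """Map <title relation="container"/> to 'container-title', <title/> to 'title'."""
    relation = elem.get("relation")
    return f"{relation}-{elem.tag}" if relation else elem.tag


def render(layout_xml, item):
    out = []
    for elem in ET.fromstring(layout_xml.strip()):
        # Use the named variable, or the alternate if the variable is absent.
        value = item.get(variable_name(elem)) or item.get(elem.get("alternate", ""))
        if not value:
            continue  # a missing variable drops out together with its affixes
        out.append(elem.get("prefix", "") + value + elem.get("suffix", ""))
    return "".join(out)


item = {"author": "Doe, J.", "year": "2006", "title": "An example title",
        "container-title": "Journal of Examples", "volume": "12",
        "pages": "34-56"}
print(render(LAYOUT, item))
```

Because the layout is flat, the loop needs no recursion; the `relation` attribute is just part of the lookup key.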

Bruce

I don’t think you should worry too much about what a GUI would look
like. The important part is to have an easily understood system for
writing it in XML. If that is easy enough, even a text editor could
be the GUI. :) But seriously, pretty much everything can be put in
a GUI with some creativity.

Focus on getting it easily readable for your own parser, as well as
the human eye. If it works for that, it works for a GUI too.

Johan

OK, then, I think that’s settled.

The final issue is about the prefix/suffix thing. Should they be plain
text (as they are now), or should I allow them to have formatting
attached to them?

If the latter, it’d look like this:

      <type name="book">
         <author alternate="title"/>
         <year>
           <prefix> (</prefix>
           <suffix>) </suffix>
         </year>
         <title>
           <suffix>.</suffix>
         </title>
         <publisher-place/>
         <publisher>
           <prefix>:</prefix>
         </publisher>
      </type>

It’s more verbose and difficult to handle, but not significantly so.

For comparison, the alternative would be:

      <type name="book">
         <author alternate="title"/>
         <year prefix=" (" suffix=") "/>
         <title suffix="."/>
         <publisher-place/>
         <publisher prefix=":"/>
      </type>

Bruce

Hi,

Focus on getting it easily readable for your own parser, as well as
the human eye. If it works for that, it works for a GUI too.

I fully agree with this.

The final issue is about the prefix/suffix thing. Should they be
plain text (as they are now), or should I allow them to have
formatting attached to them?

I think it’s a good idea to allow for formatting in prefix/suffix
strings. What about the "In: " string that’s used in cases like:

Dieckmann GS, Hellmer HH (2003) The importance of sea ice: an
overview. In: Thomas DN, Dieckmann GS (eds) Sea ice - an introduction
to its physics, chemistry, biology and geology. Blackwell Science
Ltd, Oxford

This "In: " string is often printed in italics. Or is this not a
prefix/suffix string?
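
As a sketch of what is at stake, the Python below reads an affix from either syntax: a child element (which can carry its own formatting attributes) or a plain attribute (which cannot). The helper name `affix` and the `font-style` attribute on a prefix element are assumptions for illustration; only `font-style` on title appears in the examples so far.

```python
import xml.etree.ElementTree as ET


def affix(elem, name):
    """Return (text, formatting) for a prefix or suffix.

    A child element such as <prefix> wins if present; otherwise fall back
    to the attribute of the same name, which has no formatting of its own.
    """
    child = elem.find(name)
    if child is not None:
        return child.text or "", dict(child.attrib)
    return elem.get(name, ""), {}


verbose = ET.fromstring("<year><prefix> (</prefix><suffix>) </suffix></year>")
compact = ET.fromstring('<year prefix=" (" suffix=") "/>')
styled = ET.fromstring(
    '<title relation="container">'
    '<prefix font-style="italic">In: </prefix>'
    "</title>"
)

print(affix(verbose, "prefix"))  # same text as the attribute form
print(affix(compact, "prefix"))
print(affix(styled, "prefix"))   # only the element form can say "italic"
```

Both forms carry the same text; the italic "In: " case is the one the attribute form has no way to express.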

If the latter, it’d look like this:

      <type name="book">
         <author alternate="title"/>
         <year>
           <prefix> (</prefix>
           <suffix>) </suffix>

For comparison, the alternative would be:

      <type name="book">
         <author alternate="title"/>
         <year prefix=" (" suffix=") "/>

Personally, I find the first version much easier to read and to
grasp. The second option seems to involve more “screen clutter” and I
have to concentrate more in order to understand it.

The hierarchy in the first example helps me to “parse” this quickly by
eye. But it may be just me…

Best regards, Matthias

Yes, I think that allowing formatting would be the wiser thing to do.
I can well imagine scenarios where such formatting would be much
needed. I guess that the extra verbosity is just something we’d have
to live with.

Johan

This "In: " string is often printed in italics. Or is this not a
prefix/suffix string?

No, you’re right, and that’s exactly the case I’m worried about.

If the latter, it’d look like this:

      <type name="book">
         <author alternate="title"/>
         <year>
           <prefix> (</prefix>
           <suffix>) </suffix>

For comparison, the alternative would be:

      <type name="book">
         <author alternate="title"/>
         <year prefix=" (" suffix=") "/>

Personally, I find the first version much easier to read and to
grasp. The second option seems to involve more “screen clutter” and I
have to concentrate more in order to understand it.

The hierarchy in the first example helps me to “parse” this quickly by
eye. But it may be just me…

So just to be clear, you’d be happy in more ways than one if I adopt
the first option above; right?

So it seems that’s most likely to make everyone happy? It addresses
Peter’s concern, and nicely balances a lot of the other issues (file
size, programming ease, flexibility, xml consistency).

The one awkwardness it’ll introduce is that I’d now have to do:

[ ] ...

A little weird, but it’ll work.

If I hear no objections, I’ll make those changes.

I hope to post an example style to get the ports started sometime in
the next few days.

Bruce

Yes, absolutely. Personally, I find attributes very hard to read while
separate elements line up and indent nicely and are thus easier to read
and understand. And if it helps to improve flexibility as well as
future compatibility, that’s even better.

I wouldn’t think too much about XML being verbose. To me it’s more
important that the XML structure is very clear. I’m sure people will be
more tempted to adopt a particular XML structure if they are able to
grok it.

Regards, Matthias

OK, then, I modified the schema and wrote an XSLT to mostly convert the
old examples.

For now I’ve put it all here:

http://www.users.muohio.edu/darcusb/citations/csl/

Once I stabilize everything (schema, examples, directory and file
naming conventions, etc.), I’ll move it to the Sourceforge site. While
I don’t expect to make any huge changes at this point, please give me
feedback on any of the above.

The one feature I’m still working on figuring out is better
international support. I’ve been talking to a guy who deals with
Japanese texts on this.

As I said, this is a quick-and-dirty way to do an online repository
that I think has promise. In everyday use I’d like my formatter to be
able to grab the needed styles from online, and then cache them. Alf
mentioned on his blog a while back that Endnote ships with thousands of
styles, which seems kind of silly when you consider that a given author
may only ever use a handful of them.

Next step is to figure out how to create a Ruby and/or Python
CitationStyle object out of these.

Bruce

cs-citenumber = element citenumber { cs-formatting.config,
attribute superscript {"yes"}? }

cs-citenumber = element citenumber { cs-formatting.config,
attribute superscript {"yes" | "no"}? }

?

Plus, I was thinking, should there be support for shortened journal
names? Like J. Clim., where everyone then knows it is the Journal of
Climatology? It will be very tough to implement though, as I don’t think
a list of all valid shortenings even exists. Just a thought…

Johan

I think
http://wos01.isiknowledge.com/help/A_abrvjt.html
http://wos01.isiknowledge.com/help/B_abrvjt.html
etc
is the most complete list available at the moment.

alf.

cs-citenumber = element citenumber { cs-formatting.config, attribute
superscript {"yes"}? }

cs-citenumber = element citenumber { cs-formatting.config, attribute
superscript {"yes" | "no"}? }

?

It seems sort of redundant.

Plus, I was thinking, should there be support for shortened journal
names?

Well, yes, there should be an abbreviated title element in general,
which would also cover journals. Thanks.

It would also be nice to have a periodical RDF store somewhere that
included the abbreviation, and one could just link to it. But that may
not be our thing; it’s a big job. The OCLC does have a large CSV file of
periodicals though. I converted that to XML and used it for some MODS
conversions.

Bruce

I think the solution for csl is simple:

<title type="short"/>

… or maybe call the attribute “variant.”

That way the processor could look for an abbreviated title, and if not
there, default to the simple title.
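
A minimal Python sketch of that fallback, assuming the item is a plain dict and the abbreviated form is stored under a key named `short-title` (both assumptions; `pick_title` is a made-up name):

```python
def pick_title(item, variant=None):
    """Return the abbreviated title when asked for and present, else the full title."""
    if variant == "short":
        return item.get("short-title") or item["title"]
    return item["title"]


journal = {"title": "Some Full Journal", "short-title": "S. F. J."}
unabbreviated = {"title": "Another Journal"}

print(pick_title(journal, "short"))        # abbreviated form is available
print(pick_title(unabbreviated, "short"))  # falls back to the full title
print(pick_title(journal))                 # no variant requested
```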

In terms of how to handle that in data, to me it’s clear that
periodicals and such ought to be normalized as full resources/objects.
So if you’re using a RDBMS, there’s a table called “collections” which
includes periodicals. That table would then have both “title” and
"short-title" columns.
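
A rough sketch of that table using Python's built-in sqlite3 module. Only the table name and the two column names come from the description above; the `id` column, the sample rows, and the `COALESCE` query are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "short-title" needs quoting as an identifier because of the hyphen.
conn.execute(
    'CREATE TABLE collections ('
    ' id INTEGER PRIMARY KEY,'
    ' title TEXT NOT NULL,'
    ' "short-title" TEXT)'
)
conn.executemany(
    'INSERT INTO collections (title, "short-title") VALUES (?, ?)',
    [("Some Full Journal", "S. F. J."), ("Another Journal", None)],
)

# Prefer the abbreviated title, falling back to the full one when it is NULL.
rows = conn.execute(
    'SELECT COALESCE("short-title", title) FROM collections ORDER BY id'
).fetchall()
print(rows)
```

Since each periodical is one row, the abbreviation is entered once and shared by every article that points at it.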

Likewise, in an RDF representation, you’d do:

<biblio:Journal rdf:about="http://ex.net/journals#x">
  <dc:title>Some Full Journal</dc:title>
  <biblio:abbreviatedTitle>S. F. J.</biblio:abbreviatedTitle>
</biblio:Journal>

The same issue applies to corporate/organizational names.

Bruce

There are two ways to take abbreviated journal names into account.
Add an abbreviated title to the data, or let the citation processor
abbreviate the name. There are some pros and cons for each approach.

Add abbreviated title to data:

  • abbreviated title might not be present
  • only one abbreviation, different styles might use/prefer different
    abbreviations
  • much easier for the citation processor to implement

Let the citation processor abbreviate:

  • making a full list of all abbreviations is a lot of work
  • title might always be abbreviated (although maybe not always exactly
    correct) *
  • each style can define its own preferred abbreviations

*) This is the case if it’s allowed to replace strings partly, e.g.
Journal -> J.
Climatology -> Clim.
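
A minimal Python sketch of that partial replacement, assuming each style supplies its own word-to-abbreviation table. The function name `abbreviate`, the `RULES` table, and the list of dropped small words are all assumptions for illustration.

```python
def abbreviate(title, rules, drop=("of", "the", "and")):
    """Abbreviate a title word by word using a style-specific rule table."""
    words = []
    for word in title.split():
        if word.lower() in drop:
            continue  # small connecting words are left out
        # Words without a rule pass through unchanged.
        words.append(rules.get(word, word))
    return " ".join(words)


RULES = {"Journal": "J.", "Climatology": "Clim."}

print(abbreviate("Journal of Climatology", RULES))
print(abbreviate("Journal of Glaciology", RULES))  # no rule for the last word
```

The second call shows the "not always exactly correct" caveat: a word with no rule stays spelled out, so the result is only partly abbreviated.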

There are probably more things for this list. I’m curious which
method you prefer. It seems to me that Bruce prefers the first.
Matthias and I seemingly had the second option in mind.

This whole abbreviation habit is very annoying, and to be
honest I think that journals would be wise to stop using it…

Johan

There are two ways to take abbreviated journal names into account. Add
an abbreviated title to the data, or let the citation processor
abbreviate the name.

Correct.

There are some pros and cons for each approach.

Add abbreviated title to data:

  • abbreviated title might not be present
  • only one abbreviation, different styles might use/prefer different
    abbreviations

Hmm … I hadn’t thought about that one. Do styles specify how to
abbreviate journal titles? Do they include other periodicals (court
reporters, magazines, newspapers) too?

  • much easier for the citation processor to implement

Let the citation processor abbreviate:

  • making a full list of all abbreviations is a lot of work

And error prone? And consider the code involved in internationalizing
it.

  • title might always be abbreviated (although maybe not always exactly
    correct) *
  • each style can define its own preferred abbreviations

*) This is the case if it’s allowed to replace strings partly, e.g.
Journal → J.
Climatology → Clim.

There are probably more things for this list. I’m curious which method
you prefer. It seems to me that Bruce prefers the first. Matthias and
I seemingly had the second option in mind.

I have no strong opinion at the moment. I suppose my immediate question
is whether CSL needs an abbreviatedTitle element, or whether the
attribute is enough. I’ve already added the latter.

This whole abbreviation habit is very annoying, and to be
honest I think that journals would be wise to stop using it…

I agree. There are a lot of citation practices that I think are a
vestige of a time before computers. I wonder if this is one of them. I
am really reluctant to support some of those old features, like the
absolutely heinous practice in note citations of doing “op. cit.” Every
time I read a book that uses that convention I find myself frustrated.

BTW, my name abbreviation example is “Federal Bureau of Investigation”
→ “FBI”

Bruce