dates processing and csl

So there’s a discussion thread on this Zotero ticket about dates and CSL:

https://www.zotero.org/trac/ticket/888

I’d like to move it here to resolve it on the CSL end, with examples.

For sake of simplicity, let’s take two cases:

a. non-month dates: “2003, April/May” or “2005, Fall”

b. approximate dates: “c1034”

From the CSL perspective, I’d expect that if a processor sees …

… it will print those dates.

I guess right now Zotero would strip “c” and “Fall”, based in part on
the position that users will put all kinds of unexpected crap in that field.

So the question is, do we need to care about this here?

It seems to me that the answer is probably yes, if the presentation varies
by style and/or language. At minimum, the locales files need terms like
"circa" and (perhaps) the seasons, as well as “no-date” (if it’s not
already there). That way we leave it to processors to know what to do
with these sorts of dates, and they can choose to do nothing.

But I’m not sure whether we need more, and I’m not sure whether we need
the is-date attribute (which may cause unnecessary confusion for no
clear benefit). The bottom line, as I see it:

  1. Zotero ought to be less restrictive on passing through dates, and
    maybe make date parsing smarter over time to account for these cases,
    and …

  2. in CSL, we add the terms noted above, and consider again the
    practical use case that “is-date” solves. If we can’t find one, we
    remove it.

Bruce

For sake of simplicity, let’s take two cases:

a. non-month dates: “2003, April/May” or “2005, Fall”

b. approximate dates: “c1034”

Having had a look at a few of the styles that care about this (a large
number don’t, as they only relate to recent things: biochemistry and physics
journals, for example), there is a reasonable spread of formats.
For the c1304 format, I’ve seen
c. 1304
ca. 1304
circa 1304
dictated in different guides. I haven’t come across the 1304? style, but I’m
sure it’s out there. Now if you want to support those as per-style options,
then there is some translation required and some CSL support required.
It seems we need something like a <label… > for this sort of thing.
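
As a rough illustration of what "per-style options" could mean for the three forms listed above, here is a minimal Python sketch. The form names ("short", "medium", "long") and the function name are made up for this example; they are not CSL terms or attributes.

```python
# Hypothetical per-style forms of the "circa" term. The keys are invented
# for illustration and are not taken from the CSL schema or locale files.
CIRCA_FORMS = {
    "short": "c.",
    "medium": "ca.",
    "long": "circa",
}


def render_approximate_year(year: int, form: str = "short") -> str:
    """Render an approximate year using the form a given style asks for."""
    return f"{CIRCA_FORMS[form]} {year}"


# Print the same approximate year in each of the three forms.
for form in CIRCA_FORMS:
    print(render_approximate_year(1304, form))
```

The point of the sketch is only that the stored date stays the same (1304, flagged as approximate) while the style picks the wording.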

Date ranges are also used - as in 2003-2005 - I think this occurs
particularly with multi-volume sets.

I can see two ways ahead. Either the parsing and language get smarter, and so
can parse these types, and the CSL can put them back together in a suitable
format.
Or it gets dumber, and the date is treated like a string of
characters and included verbatim.
The first is probably better, the second easier.

Julian.

I haven’t come across the 1304? style, but I’m sure it’s out there.

1304? and c. 1304 do not mean the same thing.

c. 1567 (or ca. 1567 or circa 1567) means approximately 1567.

1567? means 1567 is thought to be the date but there is some uncertainty. It
does not say that the alternative is some date around 1567. The alternative
could be 1576.

[1567] means that no date was given on the publication but an editor or
subsequent commentator has added the date.

1567/8 could mean a range that includes 1567 and 1568, but more
likely means a period in which the date fell into what was called 1567 in
one place but 1568 in another.

Many combinations are possible. A work I cite is dated [c.1603?]. It had no
publication date, therefore the brackets. Scholars have reason to believe
(therefore the ?) it was published around (therefore the c.) 1603. A
competing theory is that it was published in the early 1620s.

One might say that some of the combinations are redundant. No publication
prints circa in a date, so [c.1603] might as well just say c.1603. But for
consistency the redundant brackets are sometimes used nonetheless.

In summary:

circa has to do with exactness

? has to do with certainty

/ has to do with alternate names for the same date (usually)

- has to do with range (sometimes a hyphen, better an en dash)

[ ] has to do with the source of the information
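
To make the summary above concrete, here is a minimal Python sketch of how the five markers could be kept as separate flags instead of one opaque string. The class, field names, and regular expression are assumptions for illustration only; they are not part of CSL or Zotero, and the pattern covers only the simple year forms used as examples in this thread.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualifiedDate:
    """Hypothetical record keeping each qualifier as its own field."""
    year: int
    approximate: bool = False        # circa: exactness
    uncertain: bool = False          # ?: certainty
    supplied: bool = False           # [ ]: source of the information
    alternate: Optional[str] = None  # text after "/": alternate name for the date
    range_end: Optional[int] = None  # year after a hyphen or en dash


_PATTERN = re.compile(
    r"""^(?P<open>\[)?\s*
        (?P<circa>c\.|ca\.|circa|c)?\s*
        (?P<year>\d{1,4})
        (?:/(?P<alt>\d{1,4})|\s*[-\u2013]\s*(?P<end>\d{1,4}))?
        (?P<q>\?)?\s*
        (?P<close>\])?$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_qualified_date(text: str) -> Optional[QualifiedDate]:
    """Return a QualifiedDate, or None if the text is not one of the simple forms."""
    m = _PATTERN.match(text.strip())
    # Reject unmatched input and unbalanced brackets such as "[1567".
    if m is None or bool(m["open"]) != bool(m["close"]):
        return None
    return QualifiedDate(
        year=int(m["year"]),
        approximate=m["circa"] is not None,
        uncertain=m["q"] is not None,
        supplied=m["open"] is not None,
        alternate=m["alt"],
        range_end=int(m["end"]) if m["end"] else None,
    )


print(parse_qualified_date("[c.1603?]"))
```

Because the flags are independent, a combination such as [c.1603?] sets three of them at once, which is the case a single "approximate" switch could not represent.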

John

From: xbiblio-devel-bounces@lists.sourceforge.net
[mailto:xbiblio-devel-bounces@lists.sourceforge.net] On Behalf Of Julian
Onions
Sent: Monday, January 21, 2008 7:07 AM
To: development discussion for xbiblio
Subject: Re: [xbiblio-devel] dates processing and csl

For sake of simplicity, let’s take two cases:

a. non-month dates: “2003, April/May” or “2005, Fall”

b. approximate dates: “c1034”

Having had a look at a few of the styles that care about this (a large
number don’t, as they only relate to recent things: biochemistry and physics
journals, for example), there is a reasonable spread of formats.
For the c1304 format, I’ve seen
c. 1304
ca. 1304
circa 1304
dictated in different guides. I haven’t come across the 1304? style, but I’m
sure it’s out there. Now if you want to support those as per-style options,
then there is some translation required and some CSL support required.
It seems we need something like a <label… > for this sort of thing.

Date ranges are also used - as in 2003-2005 - I think this occurs
particularly with multi-volume sets.

I can see two ways ahead. Either the parsing and language get smarter, and so
can parse these types, and the CSL can put them back together in a suitable
format.
Or it gets dumber, and the date is treated like a string of
characters and included verbatim.
The first is probably better, the second easier.

Julian.

Except in cases like “1988 [1846]” where the brackets refer to an
original publication. I suppose one could still consider the brackets
as a more generic editorial addition, though, if you consider that
librarians tend to be focused on citation data as the stuff printed on
book covers and such.

Bruce

[1567] means that no date was given on the publication but an editor or
subsequent commentator has added the date.

Except in cases like “1988 [1846]” where the brackets refer to an
original publication. I suppose one could still consider the brackets
as a more generic editorial addition, though, if you consider that
librarians tend to be focused on citation data as the stuff printed on
book covers and such.

Yes, that’s a better way to say it. Brackets are for editorial addition. It
might be that there was no printed publication date (as in my example), but
there could be, as Bruce rightly says.

I can see two ways ahead. Either the parsing and language get smarter,
and so can parse these types, and the CSL can put them back together
in a suitable format.
Or it gets dumber, and the date is treated like a string
of characters and included verbatim.
The first is probably better, the second easier.

These are not mutually exclusive.

The issue is whether we make users wait while we figure out all
parsing criteria. Right now date processing is woefully inadequate for
citing historical documents in a variety of styles and fields, and
this needs to be fixed sooner rather than later.

Dan was proposing on the ticket to test the field for existing parsing
conditions, but return the entire string if the field doesn’t parse
cleanly. This seems reasonable to me particularly since John McCaskey
and others already came up with dozens of conditions and terms that
need to be parsed.
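
The fallback described above can be sketched roughly as follows in Python. This is not Zotero's actual parser: the function name and the single year/month/day pattern are assumptions chosen to keep the example short.

```python
import re
from typing import Dict, Union

# Stand-in for "existing parsing conditions": year, optional month, optional day.
_SIMPLE_DATE = re.compile(
    r"^(?P<year>\d{4})(?:-(?P<month>\d{1,2})(?:-(?P<day>\d{1,2}))?)?$"
)


def parse_or_passthrough(field: str) -> Union[Dict[str, int], str]:
    """Return structured parts if the field parses cleanly, else the whole string."""
    text = field.strip()
    m = _SIMPLE_DATE.match(text)
    if m is None:
        # Did not parse cleanly: keep everything the user typed.
        return text
    return {name: int(value) for name, value in m.groupdict().items() if value is not None}


print(parse_or_passthrough("2003-04"))
print(parse_or_passthrough("2005, Fall"))
print(parse_or_passthrough("c1034"))
```

With this shape, new conditions can be added to the parser over time while unrecognised input such as "2005, Fall" is still printed rather than stripped.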

I hate to sound like a broken record again, but BCE dates are also
important and need to be properly parsed, localized, and sorted.

Elena

Dan was proposing on the ticket to test the field for existing parsing
conditions, but return the entire string if the field doesn’t parse
cleanly. This seems reasonable to me particularly since John McCaskey
and others already came up with dozens of conditions and terms that
need to be parsed.

Right.

My primary concerns in this sort of thing are twofold: obviously how
to deal with it on the CSL end, but also how to reliably encode it in
a structured way so that these concerns largely go away longer term. I
think being liberal in the short-run and on the CSL end, while pushing
towards smarter data modeling, parsing and encoding longer-term is
probably the most sensible approach.

I hate to sound like a broken record again, but BCE dates are also
important and need to be properly parsed, localized, and sorted.

Because this is trivial from the data-encoding and formatting end, and
I’ve said so in various places, various times. It’s the easiest of
these problems to solve. Encode like “-0444” (which is a valid
xsd:gYear, works for sorting, etc.) and add an appropriate term
(“before-christ” or “bce”) to the locales for localization. Actually
tell Zotero users to use this convention, and help them see that it
works in the UI.
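
A minimal Python sketch of that convention, under the assumption that the stored value is a signed year string such as “-0444” and that the locale supplies the era term. The locale table and function name here are hypothetical, not taken from the CSL locale files.

```python
# Hypothetical locale terms for the era suffix; in practice these would
# come from locale files rather than a hard-coded table.
ERA_TERMS = {
    "en": "BCE",
    "ru": "до н. э.",
}


def format_year(encoded: str, locale: str = "en") -> str:
    """Format a signed year string, adding the localized era term if negative."""
    year = int(encoded)
    if year < 0:
        return f"{-year} {ERA_TERMS[locale]}"
    return str(year)


# Sorting by numeric value puts BCE years first, in chronological order.
years = ["1567", "-0444", "0033", "-0100"]
sorted_years = sorted(years, key=int)

print(sorted_years)
print(format_year("-0444"))
print(format_year("-0444", "ru"))
```

Note that the sort key is the integer value; sorting the strings themselves would not order negative years correctly.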

Julian is probably right that a comprehensive solution would need to
account for different formatting around those terms, but this would be
a useful first step on the CSL end.

Bruce

I hate to sound like a broken record again, but BCE dates are also
important and need to be properly parsed, localized, and sorted.

Because this is trivial from the data-encoding and formatting end, and
I’ve said so in various places, various times. It’s the easiest of
these problems to solve. Encode like “-0444” (which is a valid
xsd:gYear, works for sorting, etc.) and add an appropriate term
(“before-christ” or “bce”) to the locales for localization. Actually
tell Zotero users to use this convention, and help them see that it
works in the UI.

Will the encoding of “444 BCE” as “-0444” be done in Zotero or in CSL?
If in CSL, what is the conditional I should use to find out whether
it’s a BCE date or not?

Of course, Russian users would enter 444 BCE as 444 до н. э. (not
sure if Cyrillic will go through but you get the idea). I would argue
against forcing non-English speakers to use BCE as a convention when
there is a perfectly valid equivalent in their own language,
particularly given that Zotero is supposed to be localized for
multiple languages.

Elena

I hate to sound like a broken record again, but BCE dates are also
important and need to be properly parsed, localized, and sorted.

Because this is trivial from the data-encoding and formatting end, and
I’ve said so in various places, various times. It’s the easiest of
these problems to solve. Encode like “-0444” (which is a valid
xsd:gYear, works for sorting, etc.) and add an appropriate term
(“before-christ” or “bce”) to the locales for localization. Actually
tell Zotero users to use this convention, and help them see that it
works in the UI.

Will the encoding of “444 BCE” as “-0444” be done in Zotero or in CSL?
If in CSL, what is the conditional I should use to find out whether
it’s a BCE date or not?

The simplest thing that should work now is that CSL doesn’t have to
know about this. The processor (in this case Zotero) should know when
it comes across a BCE date, and pass it on with the appropriate
localized string. It should also know how to sort them.

At the moment I’m seeing this as roughly analogous to personal names. CSL
doesn’t tell Zotero how to sort or display them, except in very basic
ways. This was a conscious design decision to avoid imposing
culturally-specific conventions on what I would hope could be used in
locales around the world.

We only need a conditional if in fact we need to configure how the
localized term is displayed in different styles (beyond that it might
be localized). If we need to do that, then you’re right: we need to
either follow Julian’s hunch to extend cs:label, or we need a
conditional (not sure what it would be).

Of course, Russian users would enter 444 BCE as 444 до н. э. (not
sure if Cyrillic will go through but you get the idea). I would argue
against forcing non-English speakers to use BCE as a convention when
there is a perfectly valid equivalent in their own language,
particularly given that Zotero is supposed to be localized for
multiple languages.

Agreed.

Bruce