Dates

How should I handle dates such as “First Quarter 2006” or “Summer 1995” with
CSL?

Should I simply handle this text as the month? Or do we need a separate
element?

Simon

I’ve been working on the RDF schema again and just include a
"dateSupplement" property for this. I’m not really sure the best
approach for CSL. Opinions?

My hunch is to treat it as the month, but I am open to an explicit structure.

Bruce

Hmm.

It occurs to me that we need some way to support mm/dd/yy dates, as well as
things like the Polish PN-ISO standard, which someone was posting about on
the MSDN blog:

Microsoft Word [online]. Wikipedia, The Free Encyclopedia, 2006-05-5 16:42Z
[accessed: 2006-07-23 11:59Z]. Available at:
http://en.wikipedia.org/w/index.php?title=Microsoft_Word&oldid=653001

I think this is the wrong page.

To handle all the possibilities, we need long, abbreviated, number with
leading zeroes, and number without leading zeroes forms for months, as well
as long (4 digit) and short (2 digit) forms for years and a leading zeroes
form for days.
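
The forms listed above can be sketched in Python. This is only an illustration: the form names ("long", "short", "numeric", "numeric-leading-zeros") and the function format_part are hypothetical and are not CSL syntax.

```python
MONTHS_LONG = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Assumption: the abbreviated form is simply the first three letters.
MONTHS_SHORT = [name[:3] for name in MONTHS_LONG]


def format_part(name, value, form):
    """Render one date part (year, month or day) in the requested form."""
    if name == "month":
        if form == "long":
            return MONTHS_LONG[value - 1]
        if form == "short":
            return MONTHS_SHORT[value - 1]
    if name == "year":
        if form == "long":
            return f"{value:04d}"
        if form == "short":
            return f"{value % 100:02d}"
    if name in ("month", "day"):
        if form == "numeric-leading-zeros":
            return f"{value:02d}"
        if form == "numeric":
            return str(value)
    raise ValueError(f"unsupported form {form!r} for {name!r}")


# mm/dd/yy
print("/".join([
    format_part("month", 7, "numeric-leading-zeros"),
    format_part("day", 23, "numeric-leading-zeros"),
    format_part("year", 2006, "short"),
]))
```

With these forms, the "2006-05-5" date in the citation above would be a long year, a month with leading zeroes, and a day without.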

That's one of the reasons I wanted not to deal with this in CSL and to
leave it as an implementation detail :wink:

I’d assume that if we put the date supplement in as the month, it would
replace the long and abbreviated forms, but not the number forms? Is it
worth creating a separate element just so things are clearer?

Since the link above isn’t right, I’m not exactly sure what you’re
arguing.

Bruce

It occurs to me that we need some way to support mm/dd/yy dates, as well as
things like the Polish PN-ISO standard, which someone was posting about on
the MSDN blog:

Microsoft Word [online]. Wikipedia, The Free Encyclopedia, 2006-05-5 16:42Z
[accessed: 2006-07-23 11:59Z]. Available at:
http://en.wikipedia.org/w/index.php?title=Microsoft_Word&oldid=653001

I think this is the wrong page.

That’s because it’s not the page that I’m discussing; it’s the style of that
citation. The point is that dates are formatted as “2006-05-5” and the time
is included as well, which CSL can’t presently handle. The page from which I
took that example is
http://blogs.msdn.com/joe_friend/archive/2006/07/13/664960.aspx#675641.

To handle all the possibilities, we need long, abbreviated, number with
leading zeroes, and number without leading zeroes forms for months, as well
as long (4 digit) and short (2 digit) forms for years and a leading zeroes
form for days.

That's one of the reasons I wanted not to deal with this in CSL and to
leave it as an implementation detail :wink:

I’d assume that if we put the date supplement in as the month, it would
replace the long and abbreviated forms, but not the number forms? Is it
worth creating a separate element just so things are clearer?

Since the link above isn’t right, I’m not exactly sure what you’re
arguing.

The idea is that different styles specify different ways of formatting
dates. Most use either the long or short form of the month, the day, and the
full year, which CSL already supports. Other styles, such as PN-ISO (an
example of which I gave above) and probably a good many more, use a shorter
form, such as yyyy-mm-d. If we actually want to handle things correctly, we
can’t leave the formatting style to the implementation, because the PN-ISO
standard does define exactly what the dates should look like.

Simon

OIC!

I grant the point about standard date forms, though the idea that one
should include the time for an access date is absolutely absurd!

Basically, at the data level, that means that while right now we
support xsd:date, xsd:gYear, and xsd:gYearMonth – as well as a kind
of plain text “other date” thing – we’d then need to extend that to
xsd:dateTime.
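
As a rough illustration of what supporting these data types means, the sketch below classifies a date string by its lexical form. The patterns are deliberately simplified (no negative years, no time zones on the shorter types), and the function name classify is made up for this example.

```python
import re

# Simplified lexical patterns, checked from most to least specific.
PATTERNS = [
    ("xsd:dateTime", re.compile(
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$")),
    ("xsd:date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("xsd:gYearMonth", re.compile(r"^\d{4}-\d{2}$")),
    ("xsd:gYear", re.compile(r"^\d{4}$")),
]


def classify(value):
    """Return the name of the first matching type, or "other" for free text."""
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "other"


for sample in ["2006", "2006-07", "2006-07-23", "2006-07-23T11:59:00Z", "Summer 1995"]:
    print(sample, "->", classify(sample))
```

Anything that falls through to "other" corresponds to the plain text “other date” case mentioned above.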

I don’t know about you guys, but I’m leery of doing that, though if it
needs to be done, so be it. My hunch is it’s a bad idea.

Now, a separate issue is the actual rendering of the dates.

One option is an optional attribute (maybe just use form) on date. E.g.:

<date form="w3c"/>

The value is wrong (I forget the correct one), but you get the idea.
I’m sure you realize this is a complicated issue, so I vote we rely on
existing standards wherever possible. That’s, BTW, why I earlier had
date formatting using native XSLT 2.0 commands.

Bruce

I’m making slow but steady progress on a replacement for the
Javascript CSL processor in Zotero. It’s looking simpler by the day,
and this morning I had a thought about localization of dates. You
already have explicit date formatting in there. Conceptually, at
least, what the new processor will do (which I reckon must be
something close to what the Haskell processor does – Andrea?) is
effectively to link the XML objects in the style to a library, much as
a program is linked at compile time, and execute them, with the Item
object as a data source. If you just permit the date span in the
locale area, and provide a name= attribute to it like a macro, it’s
simple to hook up. If you’ve decided what the syntax should look
like, just let me know.

Frank

OK, we need to fix this; might as well do it now (though I need people
to pay attention and implement this!) …

I’m making slow but steady progress on a replacement for the
Javascript CSL processor in Zotero. It’s looking simpler by the day,
and this morning I had a thought about localization of dates. You
already have explicit date formatting in there. Conceptually, at
least, what the new processor will do (which I reckon must be
something close to what the Haskell processor does – Andrea?) is
effectively to link the XML objects in the style to a library, much as
a program is linked at compile time, and execute them, with the Item
object as a data source. If you just permit the date span in the
locale area, and provide a name= attribute to it like a macro, it’s
simple to hook up. If you’ve decided what the syntax should look
like, just let me know.

Here’s what I think we need to move to for CSL syntax:

The forms for date would thus be:

“long” | “short” | “year”

So long and short would both refer to the localized month forms.

We probably need another attribute to toggle between full and
day-month only. Perhaps we have:

The locale portion would thus need something like:

Here’s one problem though; how do we deal with the delimiters and
their potential variation? Do we need to consider adding an attribute?

Bruce

OK, we need to fix this; might as well do it now (though I need people
to pay attention and implement this!) …

Here’s what I think we need to move to for CSL syntax:

The forms for date would thus be:

“long” | “short” | “year”

So long and short would both refer to the localized month forms.

We probably need another attribute to toggle between full and
day-month only. Perhaps we have:

The locale portion would thus need something like:

Here’s one problem though; how do we deal with the delimiters and
their potential variation? Do we need to consider adding an attribute?

If you use suffix on month and day instead of prefix on day in the
example, the delimiters work out correctly for all cases, I think?

The locale portion would thus need something like:

Here’s one problem though; how do we deal with the delimiters and
their potential variation? Do we need to consider adding an attribute?

If you use suffix on month and day instead of prefix on day in the
example, the delimiters work out correctly for all cases, I think?

Sorry, but I’m not following. Can you restate?

Bruce

The locale portion would thus need something like:

Here’s one problem though; how do we deal with the delimiters and
their potential variation? Do we need to consider adding an attribute?

If you use suffix on month and day instead of prefix on day in the
example, the delimiters work out correctly for all cases, I think?

Sorry, but I’m not following. Can you restate?

I may not have understood myself. If there is one long form
throughout a style, this works:

A bare year comes out without space, a month and year come out with a
space between them, with month, day and year there is a comma after
the day. But come to think of it, that’s the same as the current date
environment, and probably not what you had in mind.
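
The suffix idea can be sketched in a few lines of Python. This is an assumption about how such a layout might be evaluated, not the behaviour of any existing processor; the LAYOUT structure and render function are hypothetical.

```python
# Each date part carries its own suffix; parts missing from the data are skipped,
# so their suffixes disappear with them.
LAYOUT = [("month", " "), ("day", ", "), ("year", "")]


def render(date, layout=LAYOUT):
    """Join the parts that are present, each followed by its suffix."""
    out = []
    for name, suffix in layout:
        if name in date:
            out.append(str(date[name]) + suffix)
    return "".join(out)


print(render({"year": 2006}))
print(render({"month": "July", "year": 2006}))
print(render({"month": "July", "day": 23, "year": 2006}))
```

Because the comma belongs to the day, it only appears when a day is present, which gives the three cases described above.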

If a particular style uses a variant consistently, it can be included
in the style’s own locale, so that covers that case. If there are
variants within a style, giving a name to date-layout (treating it
like a macro), with the first one listed as the default would work.

I may be missing something, though. Where would the comma in
day-month-year-delimiter=", " be placed?

Frank

I may not have understood myself. If there is one long form
throughout a style, this works:

A bare year comes out without space, a month and year come out with a
space between them, with month, day and year there is a comma after
the day. But come to think of it, that’s the same as the current date
environment, and probably not what you had in mind.

If a particular style uses a variant consistently, it can be included
in the style’s own locale, so that covers that case. If there are
variants within a style, giving a name to date-layout (treating it
like a macro), with the first one listed as the default would work.

Well, the one without the name would probably be default.

I may be missing something, though. Where would the comma in
day-month-year-delimiter=", " be placed?

In the case I gave, it’d be a suffix on the (imaginary) month-day
group; in other case, it’d be a prefix.

Maybe not that elegant, but I’m searching for the simplest way to
achieve this sort of localization.

Bruce

I’ve been discussing date localization with Frank outside this mailing list,
and I’d like to propose a new upgrade path. First, I’m under the following
assumptions:

  • automatic conversion of date formats to the new form-based format is
    probably going to be hard to do reliably. Any comprehensive conversion will
    probably either require (extensive) manual curation, or will result in
    changed output in some styles, which we’ll probably want to prevent.
  • date localization is actually only desirable in a small subset of
    (generic) styles. Journal-specific styles will function just fine without
    it. Most (if not all) should use “default-locale” in any case to prevent any
    kind of localization (support for “default-locale” will be introduced with
    Frank’s citeproc-js).
  • date localization also means localization of date-formatting (and not just
    localization of date-order).

So, I’m in favor of:

  • leaving all non-generic styles as is. If people truly desire date
    localization for their style, the documentation will lead them there. It
    might be easiest to change the generic styles (only about 10) by hand.
  • from the point above, it follows that we will keep supporting the old date
    format in CSL 1.0 (and beyond). Styles can mix both formats, but only dates
    of the form will be localized.
  • the format of dates of the -form are specified in the
    locales, and can be overruled in the locale-section of the style.
  • leaving date-parts listings ordered. This is because in my proposal we apply
    formatting in the locales, which is probably often order-specific. This
    would eliminate the need for the date-part order attributes.
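
The override rule in the list above (formats come from the locale, but the style's own locale section wins) amounts to a simple lookup with a fallback. The sketch below assumes layouts are ordered lists of (date-part, suffix) pairs keyed by form; that representation and the name resolve_layout are illustrative only.

```python
# Hypothetical layouts, keyed by form, as a locale file might define them.
BASE_LOCALE = {
    "full": [("month", " "), ("day", ", "), ("year", "")],
    "year": [("year", "")],
}

# Hypothetical locale section inside a style, overriding only the "full" form.
STYLE_LOCALE = {
    "full": [("day", " "), ("month", " "), ("year", "")],
}


def resolve_layout(form, style_locale, base_locale):
    """Prefer the style's own definition; otherwise fall back to the locale."""
    if form in style_locale:
        return style_locale[form]
    return base_locale[form]


print(resolve_layout("full", STYLE_LOCALE, BASE_LOCALE))
print(resolve_layout("year", STYLE_LOCALE, BASE_LOCALE))
```

Since each layout is an ordered list, the order of date-parts is carried by the layout itself and no separate order attribute is needed.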

In addition, Frank and I deem it necessary to forbid locales to use affixes
in the date element for date formatting. I don’t think there are cases where
formatting of date-parts wouldn’t be sufficient for date localization.
Applying affixes to cs:date should be reserved to the date-call in the style
(for parentheses and other style-specific punctuation, e.g. ). Related to this, I was musing about the need for a
CSL schema for locales, in addition to the one for styles. Babelzilla
translators probably aren’t very accustomed to translating XML files, so
they can be expected to mess things up sometimes :).

Some example code of how dates could be localized:

In the style (here the English date format is overruled by the
style-locale-section):

n.d.

In the locale (only showing the full-form of date):

n.d. ---

Rintze

I’ve been discussing date localization with Frank outside this mailing list,
and I’d like to propose a new upgrade path. First, I’m under the following
assumptions:

  • automatic conversion of date formats to the new form-based format is
    probably going to be hard to do reliably. Any comprehensive conversion will
    probably either require (extensive) manual curation, or will result in
    changed output in some styles, which we’ll probably want to prevent.
  • date localization is actually only desirable in a small subset of
    (generic) styles. Journal-specific styles will function just fine without
    it. Most (if not all) should use “default-locale” in any case to prevent any
    kind of localization (support for “default-locale” will be introduced with
    Frank’s citeproc-js).
  • date localization also means localization of date-formatting (and not just
    localization of date-order).

So, I’m in favor of:

  • leaving all non-generic styles as is.

I was thinking the XSL can check to see if the style is a base style,
and only convert if it is.

If people truly desire date
localization for their style, the documentation will lead them there. It
might be easiest to change the generic styles (only about 10) by hand.

  • from the point above, it follows that we will keep supporting the old date
    format in CSL 1.0 (and beyond). Styles can mix both formats, but only dates
    of the form will be localized.
  • the format of dates of the -form are specified in the
    locales, and can be overruled in the locale-section of the style.
  • leaving date-parts listings ordered. This is because in my proposal we apply
    formatting in the locales, which is probably often order-specific. This
    would eliminate the need for the date-part order attributes.

In addition, Frank and I deem it necessary to forbid locales to use affixes
in the date element for date formatting. I don’t think there are cases where
formatting of date-parts wouldn’t be sufficient for date localization.
Applying affixes to cs:date should be reserved to the date-call in the style
(for parentheses and other style-specific punctuation, e.g. ). Related to this, I was musing about the need for a
CSL schema for locales, in addition to the one for styles. Babelzilla
translators probably aren’t very accustomed to translating XML files, so
they can be expected to mess things up sometimes :).

Care to add your suggested changes to the schema and the spec?

Bruce

Care to add your suggested changes to the schema and the spec?

Bruce

I made a patch (http://groups.google.com/group/zotero-dev/web/dates.patch).
The changes I made:

  • Instead of

styles should now use

  • Affixes are taken out of ‘formatting’, so I could make it impossible to
    specify affixes for localized date formats
  • The options to specify date-order have been removed
  • ‘year-other’ has been divided in ‘year’ and ‘other’. In the old situation,
    it was possible to set form for ‘other’, which seemed weird.
  • annotation now specifies that date-parts are ordered
  • dates in the style can be called by a form (to get date localization), or
    by specifying date-parts

I also wrote a style (http://groups.google.com/group/zotero-dev/web/test.csl),
which specifies a custom date-format, and which validates to my patched
csl.rnc.

Rintze

Care to add your suggested changes to the schema and the spec?

Bruce

I made a patch (http://groups.google.com/group/zotero-dev/web/dates.patch).
The changes I made:

OK. I didn’t really have time to check it, but committed it. Can
people please take a look at the updated “split” schema and test it,
and take a look at Rintze’s example instance?

Bruce

Good news for a start. Form attributes added to pass the previous
schema now cause an error on date-part (which I take it is correct).
So no problem so far.

Error in CSL for test: affix_PrefixFullCitationTextOnly

ELEMENT “DATE-PART” FROM NAMESPACE
"HTTP://PURL.ORG/NET/XBIBLIO/CSL" NOT ALLOWED IN THIS CONTEXT @ line
17

1 <style xmlns="http://purl.org/net/xbiblio/csl"
2 class="in-text"
3 xml:lang="en"
4 version="1.0">
5
6
7
8 2007-10-26T21:32:52+02:00
9
10
11
12
13
14
15
16
17
18
19
20
21
22

The only thing I’m not sure about is whether the current selection of date
forms still makes the most sense (full, year, month-day). I also just
discovered that for specifying the date layout in the locale-section of the
style, any date-parts can be called, which might be undesirable (e.g. for
form=“year”, you can include not only the date-part with name=“year”, but
also those with the names “month”, “day” and “other”).

Rintze

I’d like to revive the discussion of dates with a proposal that I
think covers our known use cases. It comes in five parts:

(1) Date input is fully structured
(2) Range handling is implicit
(3) n.d. handling is implicit
(4) Status field is separate from dates
(5) Formatting of fuzzy dates relies on a new term “circa” and new
date option circa=“term|punctuation”.

Re (1), a description of the structured input form is here:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#dates (Temporary
location, may be removed or copied out to another doc later.) A
sample of free-text strings that might be recognized by a client is
attached to this message.

Re (2), the processor will render any dates input with range data in
the most concise ranged form. Delimiters and spacing will be
implicit, for the time being at least. No changes necessary to the
schema.
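
To make "most concise ranged form" concrete, here is a small Python sketch. The month-day-year layout, the hyphen delimiter and the function names are assumptions for illustration; the proposal leaves delimiters and spacing implicit.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def fmt(date):
    """Render a single (year, month, day) tuple."""
    year, month, day = date
    return f"{MONTHS[month - 1]} {day}, {year}"


def render_range(start, end, delimiter="-"):
    """Print the parts shared by both ends of the range only once."""
    sy, sm, sd = start
    ey, em, ed = end
    if sy != ey:
        return f"{fmt(start)}{delimiter}{fmt(end)}"
    if sm != em:
        return f"{MONTHS[sm - 1]} {sd}{delimiter}{MONTHS[em - 1]} {ed}, {sy}"
    if sd != ed:
        return f"{MONTHS[sm - 1]} {sd}{delimiter}{ed}, {sy}"
    return fmt(start)


print(render_range((2006, 7, 23), (2006, 7, 25)))
print(render_range((2006, 7, 23), (2006, 8, 2)))
print(render_range((2005, 12, 30), (2006, 1, 2)))
```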

Re (3) and (4), this seems to be where the forum discussion of dates
has arrived.

Re (5), the option is needed to cover styles that flag an approximate
date by placing it in square brackets.
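
A minimal sketch of how the two values of the circa option might behave, assuming that "term" prefixes the localized term and "punctuation" wraps the date in square brackets. The term text and the function render_fuzzy are hypothetical.

```python
# Hypothetical locale term for approximate dates.
TERMS = {"circa": "circa"}


def render_fuzzy(date_text, is_circa, circa="term"):
    """Flag an approximate date either with a term or with punctuation."""
    if not is_circa:
        return date_text
    if circa == "term":
        return f"{TERMS['circa']} {date_text}"
    if circa == "punctuation":
        return f"[{date_text}]"
    raise ValueError(f"unknown circa option: {circa!r}")


print(render_fuzzy("1995", True, circa="term"))
print(render_fuzzy("1995", True, circa="punctuation"))
```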

Some additional details:

Automatic substitution of the “no date” term for an empty date would
only happen for the “issued” variable.
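
In code, that rule could look like the following sketch. The function name and the choice to return an empty string for other variables are assumptions; only the restriction to "issued" comes from the proposal.

```python
def render_date(variable, date_text, no_date_term="n.d."):
    """Substitute the "no date" term only when the "issued" date is empty."""
    if date_text:
        return date_text
    if variable == "issued":
        return no_date_term
    return ""


print(repr(render_date("issued", "")))
print(repr(render_date("accessed", "")))
```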

The client-side parser might screen out “in-press” and “forthcoming”
terms, to make it clear that these do not belong in the date field.
If citeproc-js grows an error-reporting mechanism, the presence of
these values could be tattled back to the client.

Thoughts?

Frank

dates.txt (1.17 KB)

I’d like to revive the discussion of dates with a proposal that I
think covers our known use cases. It comes in five parts:

(1) Date input is fully structured
(2) Range handling is implicit
(3) n.d. handling is implicit
(4) Status field is separate from dates
(5) Formatting of fuzzy dates relies on a new term “circa” and new
date option circa=“term|punctuation”.

Re (1), a description of the structured input form is here:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#dates (Temporary
location, may be removed or copied out to another doc later.) A
sample of free-text strings that might be recognized by a client is
attached to this message.

Re (2), the processor will render any dates input with range data in
the most concise ranged form. Delimiters and spacing will be
implicit, for the time being at least. No changes necessary to the
schema.

Re (3) and (4), this seems to be where the forum discussion of dates
has arrived.

Re (5), the option is needed to cover styles that flag an approximate
date by placing it in square brackets.

I still strongly suggest that we pay careful attention to the work
going on at the LoC, rather than invent our own scheme. For example,
it has “questionable” dates (“1023?”) and “approximate” (aka circa)
dates (“1023~”).

http://www.loc.gov/standards/datetime/
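
Going only by the two examples quoted above ("1023?" and "1023~"), a parser for such qualified years might be sketched as follows. This is not an implementation of the LoC work, whose details may differ; the function name and the returned structure are made up here.

```python
# Trailing markers, as in the two examples quoted above.
MARKERS = {"?": "questionable", "~": "approximate"}


def parse_qualified_year(text):
    """Split a year string into its numeric value and a set of qualifiers."""
    flags = set()
    while text and text[-1] in MARKERS:
        flags.add(MARKERS[text[-1]])
        text = text[:-1]
    return int(text), flags


print(parse_qualified_year("1023?"))
print(parse_qualified_year("1023~"))
```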

Some additional details:

Automatic substitution of the “no date” term for an empty date would
only happen for the “issued” variable.

The client-side parser might screen out “in-press” and "forthcoming"
terms, to make it clear that these do not belong in the date field.
If citeproc-js grows an error-reporting mechanism, the presence of
these values could be tattled back to the client.

Thoughts?

Only thing else I would say ATM is that I’m a little leery of the
client-side stuff, since it might be that the current approach taken
by zotero is not the best approach.

Bruce