Dates

Do we have a mechanism for handling something like
http://forums.zotero.org/discussion/7093/hebrew-dats/?

Ugh.

Isn’t the output representation just that, a representation, so that
the date itself could be stored in standard numerical form?

I doubt the problem is different for Asian languages, for example, and
it wouldn’t seem reasonable to store some dates in Kanji characters,
etc.

Bruce

Aiyeeee!

Nope, the scheme I have in place won’t handle this, and parsing out
non-Gregorian dates from free text is probably not worth the candle.
It’s not a problem with the script, but with the dating system itself.

The Jewish calendar adds an intercalary month in weird years to keep
the months roughly in sync with the seasons, because 12 lunar months
together add up to only about 354 days, well short of the 365.25-day
solar year. The pattern is reputedly cyclical, so I guess someone
somewhere has probably reduced this one to code. But the equally
important lunar calendars of various Asian countries (Japan, China and
Mongolia, and probably others, all slightly different) also use an
intercalary month, and there the yearly pattern is cast afresh, and
arbitrarily, by the relevant authorities for each year. The only way
to map dates through to another system is by referring to a massive
set of explicit conversion tables.
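
For the Hebrew calendar, the cyclical pattern is the 19-year Metonic cycle, in which seven years carry the extra month. A minimal sketch of that rule follows. The function name is hypothetical, and it only answers whether a year is a leap year, not how to convert a date into another calendar.

```python
def is_hebrew_leap_year(year: int) -> bool:
    """Return True if the given Hebrew year has 13 months.

    Uses the common arithmetic form of the 19-year cycle, in which
    years 3, 6, 8, 11, 14, 17 and 19 of each cycle are leap years.
    """
    return (7 * year + 1) % 19 < 7


# Positions of the leap years within one 19-year cycle.
leap_positions = [y for y in range(1, 20) if is_hebrew_leap_year(y)]
print(leap_positions)
```

Nothing comparable is offered here for the East Asian calendars, which, as noted, would need lookup tables rather than a formula.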

Maybe a candidate for a plugin extension to Zotero?

Meanwhile … what to do? One way of at least salvaging the field for
users who need to enter dates like this might be to allow a quote
escape on date fields, to force literal passthrough. I know it would
clutter up the data, but an explicit escape mechanism would have the
benefit of certainty.

Frank

I’d like to revive the discussion of dates with a proposal that I
think covers our known use cases. It comes in five parts:

(1) Date input is fully structured
(2) Range handling is implicit
(3) n.d. handling is implicit
(4) Status field is separate from dates
(5) Formatting of fuzzy dates relies on a new term “circa” and new
date option circa=“term|punctuation”.

Re (1), a description of the structured input form is here:
gsl-nagoya-u.net (Temporary
location, may be removed or copied out to another doc later.) A
sample of free-text strings that might be recognized by a client is
attached to this message.

Re (2), the processor will render any dates input with range data in
the most concise ranged form. Delimiters and spacing will be
implicit, for the time being at least. No changes necessary to the
schema.
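
To make "most concise ranged form" concrete, here is a rough sketch of how a processor might collapse the shared parts of a range. Everything in it is an assumption for illustration: the function names, the tuple input shape, the day-month-year ordering and the en-dash delimiter are not taken from the proposal.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def _fmt(parts):
    """Format a (year,), (year, month) or (year, month, day) tuple."""
    year, *rest = parts
    if not rest:
        return str(year)
    month = MONTHS[rest[0] - 1]
    if len(rest) == 1:
        return f"{month} {year}"
    return f"{rest[1]} {month} {year}"


def render_range(start, end, delimiter="\u2013"):
    """Render a date range, dropping parts the two ends share."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same precision")
    if start == end:
        return _fmt(start)
    # Same year and month: only the day differs.
    if len(start) == 3 and start[:2] == end[:2]:
        return f"{start[2]}{delimiter}{_fmt(end)}"
    # Same year: print the year once, at the end.
    if len(start) >= 2 and start[0] == end[0]:
        left = _fmt(start).rsplit(" ", 1)[0]
        return f"{left}{delimiter}{_fmt(end)}"
    return f"{_fmt(start)}{delimiter}{_fmt(end)}"


print(render_range((2008, 5, 3), (2008, 5, 5)))
print(render_range((2008, 5), (2008, 7)))
```

In a real processor the ordering and delimiter would presumably come from the style and locale rather than being hard-coded.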

Re (3) and (4), this seems to be where the forum discussion of dates
has arrived.

Re (5), the option is needed to cover styles that flag an approximate
date by placing it in square brackets.
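
As a sketch of what the two values of the proposed circa option could produce: the function name is hypothetical, and the default text of the "circa" term is an assumption, since a real processor would take it from the locale.

```python
def render_approximate(date_text: str, circa: str, term: str = "circa") -> str:
    """Render an approximate date according to the circa option."""
    if circa == "term":
        # Prefix the date with the localized "circa" term.
        return f"{term} {date_text}"
    if circa == "punctuation":
        # Flag the date by enclosing it in square brackets.
        return f"[{date_text}]"
    raise ValueError(f"unknown circa option: {circa!r}")


print(render_approximate("1987", "term"))
print(render_approximate("1987", "punctuation"))
```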

I still strongly suggest that we pay careful attention to the work
going on at the LoC, rather than invent our own scheme. For example,
it has “questionable” dates (“1023?”) and “approximate” (aka circa)
dates (“1023~”).

http://www.loc.gov/standards/datetime/
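
For reference, a small sketch of reading the two suffix flags quoted above. It covers only the year-level forms "1023?" and "1023~" mentioned in this thread; the function name and the returned labels are assumptions, and the actual LoC syntax may well be broader.

```python
import re

# Suffix flags as quoted in this thread.
_FLAGS = {"?": "questionable", "~": "approximate"}
_PATTERN = re.compile(r"^(\d{1,4})([?~])?$")


def parse_flagged_year(text):
    """Return (year, flag), with flag None for a plain year.

    Returns None when the text is not a bare year with an optional flag.
    """
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return int(match.group(1)), _FLAGS.get(match.group(2))


print(parse_flagged_year("1023?"))
print(parse_flagged_year("1023~"))
```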

Plus “uncertain dates”, I see, and open ranges. I think that about
covers what I’ve missed.

What is the difference between a “questionable date” and an
“approximate date”? It doesn’t seem an obvious or exact distinction.

We already covered the issue of escaping in the
forums—http://forums.zotero.org/discussion/6111/better-date-field/#Item_13—and
you agreed that escaping wasn’t necessary. Also, as far as I understand
it this would be a Zotero feature, not a processor feature, so it’s not
even really within our scope here as long as there’s a mechanism in the
processor for passing a literal.

(For what it’s worth, I did mention the Hebrew date example on the
ticket (https://www.zotero.org/trac/ticket/888) where I (and Sean)
argued for fallback to literal pass-through when the field didn’t parse
cleanly.)
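
A rough sketch of that fallback, for illustration only. The accepted input (ISO-style YYYY, YYYY-MM, YYYY-MM-DD), the function names and the "date-parts" / "literal" keys are all assumptions here, not a description of what Zotero or the processor actually does.

```python
import re
from datetime import date

_ISO = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _is_valid(parts):
    """Check month range and, for full dates, that the day exists."""
    if len(parts) >= 2 and not 1 <= parts[1] <= 12:
        return False
    if len(parts) == 3:
        try:
            date(*parts)
        except ValueError:
            return False
    return True


def parse_date_field(raw):
    """Return structured parts on a clean parse, else the raw string."""
    match = _ISO.match(raw.strip())
    if match:
        parts = [int(g) for g in match.groups() if g is not None]
        if _is_valid(parts):
            return {"date-parts": parts}
    # Anything that does not parse cleanly is passed through untouched.
    return {"literal": raw}


print(parse_date_field("2009-05-03"))
print(parse_date_field("5 Tishrei 5769"))
```

With this shape, a Hebrew or other non-Gregorian date string survives as entered, without any explicit escape character.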

We already covered the issue of escaping in the forums
(http://forums.zotero.org/discussion/6111/better-date-field/#Item_13), and
you agreed that escaping wasn’t necessary.

Quite so.

Also, as far as I understand
it this would be a Zotero feature, not a processor feature, so it’s not
even really within our scope here as long as there’s a mechanism in the
processor for passing a literal.

Mea culpa.

To explain, there was one small wrinkle that I thought might affect
how unrecognized strings are handled. Oxford, Cambridge, and the
court systems of Commonwealth jurisdictions use alternative seasons:
Michaelmas, Hilary, Easter, Trinity. These are needed for dating some
case reports.

At the moment, my input description for these just accepts them as
plain strings (whereas proper seasons are numbers, 1-4). So I was
thinking crufty characters → literal season → parsing never fails.

Thinking more carefully, though, deciding whether to pass through a
“seasons” string as free text, and deciding whether a full parse of a
date string ended in “success” or “failure” are separate questions
that depend on different factors. So the two problems do not conflict
… and as you say, that end of things can and should be the concern
of Zotero.

(Still open to comments on the input format itself.)
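
To illustrate the season input described above, here is a sketch in which numeric seasons are looked up as terms and anything else is passed through as given. The mapping of 1-4 to particular season names is an assumption, as is the function name.

```python
# Assumed numbering; the thread only says that proper seasons are 1-4.
SEASON_TERMS = {1: "Spring", 2: "Summer", 3: "Autumn", 4: "Winter"}


def render_season(season):
    """Render a numeric season via its term, or a string season as-is."""
    if isinstance(season, int):
        return SEASON_TERMS[season]
    # Alternative seasons (Michaelmas, Hilary, ...) are plain strings.
    return season


print(render_season(3))
print(render_season("Michaelmas"))
```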

(For what it’s worth, I did mention the Hebrew date example on the
ticket (https://www.zotero.org/trac/ticket/888, “is-date” should return “true” only if date parses cleanly) where I (and Sean)
argued for fallback to literal pass-through when the field didn’t parse
cleanly.)

I will provide for this now. Sorry for letting it slip.

I’m not sure what the difference is between “questionable” and
“approximate”. Under the Requirements link, they have this:

Questionable dates. E.g. 1992? would mean “possibly” the year 1992,
but not “definitely”.

Approximate dates. E.g. 1992~ would mean “approximately” the year 1992.

But that strikes me as a distinction without a difference.

MHRA has a different, but more concrete pair of requirements for
“fuzziness”, including dates:

“Any detail of publication which is not given in the book itself but
can be ascertained should be enclosed in square brackets, e.g.
‘[Paris]’, ‘[1987]’. For details that are assumed but uncertain, use
the form ‘[Paris(?)]’, ‘[1987(?)]’.”

(link courtesy of Rintze:
http://www.mhra.org.uk/Publications/Books/StyleGuide/StyleGuideV2_2.pdf)

Should we recognize these MHRA params in dates input?
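
To make the two MHRA cases concrete, a sketch of the output side only. The function and parameter names are hypothetical; whether the input data should carry these two flags at all is exactly the open question.

```python
def render_mhra(value: str, in_item: bool = True, uncertain: bool = False) -> str:
    """Render a publication detail following the MHRA passage quoted above."""
    if uncertain:
        # Assumed but uncertain detail.
        return f"[{value}(?)]"
    if not in_item:
        # Ascertained, but not given in the book itself.
        return f"[{value}]"
    return value


print(render_mhra("1987", in_item=False))
print(render_mhra("1987", uncertain=True))
```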

I’m not sure what the difference is between “questionable” and
“approximate”. Under the Requirements link, they have this:

Questionable dates. E.g. 1992? would mean “possibly” the year 1992,
but not “definitely”.

Approximate dates. E.g. 1992~ would mean “approximately” the year 1992.

But that strikes me as a distinction without a difference.

I’m in touch with the guy that is running this at the LoC. I’ll ask
him. It might have to do with some (another) odd library tradition.

MHRA has a different, but more concrete pair of requirements for
“fuzziness”, including dates:

“Any detail of publication which is not given in the book itself but
can be ascertained should be enclosed in square brackets, e.g.
‘[Paris]’, ‘[1987]’. For details that are assumed but uncertain, use
the form ‘[Paris(?)]’, ‘[1987(?)]’.”

(link courtesy of Rintze:
http://www.mhra.org.uk/Publications/Books/StyleGuide/StyleGuideV2_2.pdf)

Should we recognize these MHRA params in dates input?

No.

Bruce

Just to confirm, I don’t mean the markup, but the characteristics of
the date; (a) that a date for a resource is known, but not recorded in
the cited item itself; and (b) that a date is uncertain or
approximate. Are you saying that the data should not contain hints
concerning (a)?

I’m mainly just hoping to be able to focus on one scheme for this. I
emailed the LoC guy for feedback, but I suspect they may be compatible
concepts.

Bruce