EDTF and Date Representation

Bruce_D_Arcus1 · July 4, 2020, 11:21am

As I was thinking about this, thought I’d post it …

We have recently added EDTF as a way to represent dates in the input schema.

Open questions are which features we support, and the status of the date-parts JSON representation. If it stays long-term, we might want to reconcile them as much as possible now.

If not, this could help with documenting regardless.

Feature	EDTF	CSL JSON 1.0	Notes
Seasons	2019-21	“season” property	The JSON could also support this on the month part.
Uncertainty	2019-03-21? (uncertain) 2019-03-21~ (approximate) 2019-03-21% (both)	“circa” property	Which do we support, and how do we interpret in CSL? Biblatex treats ~ as circa.
Ranges (including open)	2018-21/2018-22 ../1900	supported, with two `date-parts` sub-arrays	Specify 0 for open-end?
Decades and centuries	12XX (13th century) 195X (1950s decade)	not supported	I’m not clear if we really need this, but could be addressed by allowing X on the year part.
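
To make the uncertainty row concrete, here is a minimal sketch of how the three whole-date qualifiers could be split off an EDTF string. The names `QUALIFIERS` and `splitQualifier` are hypothetical and not part of any CSL processor; the sketch only looks at a single trailing character, so an interval would have to be split on "/" first.

```javascript
// Hypothetical lookup table for the EDTF whole-date qualifiers:
// "?" = uncertain, "~" = approximate, "%" = both.
const QUALIFIERS = {
  '?': { uncertain: true, approximate: false },
  '~': { uncertain: false, approximate: true },
  '%': { uncertain: true, approximate: true },
};

// Split a trailing qualifier off a single EDTF date string.
// Strings without a qualifier come back with both flags set to false.
function splitQualifier(edtfDate) {
  const flags = QUALIFIERS[edtfDate.slice(-1)];
  if (flags === undefined) {
    return { date: edtfDate, uncertain: false, approximate: false };
  }
  return { date: edtfDate.slice(0, -1), ...flags };
}

console.log(splitQualifier('2019-03-21%'));
```

Keeping the two flags separate, rather than collapsing them into one "circa" value, leaves room for either interpretation discussed in the table.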

PaulStanley · July 4, 2020, 1:12pm

I’d say:

Seasons. Definitely yes. Seasons are, after all, already supported: the numbers are just different (IIRC 13=spring, 14=summer, 15=fall, 16=winter) on date parts. Luckily these don’t overlap, so one can just allow 13/21 = spring, 14/22 = summer, 15/23 = fall and 16/24 = winter, and there is no room for confusion. Processors will almost certainly normalize on input.
Uncertain, approximate, and uncertain+approximate: I’d support. I think this requires an additional test (is-approximate-date): and an additional potential value on the JSON input for that too. I doubt you need a test for is-uncertain-and-approximate: it should be possible to combine tests. But since uncertainty and approximation are different ideas, it makes sense to separate them.
Ranges. Definitely support, since they are already supported. Could you allow this on date-parts by adopting the convention that 0 means open? Even if you could, I wouldn’t: You don’t want to break date-parts, but there’s no particularly good reason to extend them.
Decades and centuries. Part of me says “don’t support”, because it’s going to be a ton of work. You will need locale-terms for centuries and decades, presumably in both long and short versions, and I suspect that the effort is not worth the very limited use. OTOH, one nearly always regrets bodging things like this. I don’t know enough about other languages to know how difficult the locale-specific stuff would be.
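
The season normalization described in the first point could look roughly like this. It is only a sketch under the assumption stated above (13 to 16 and 21 to 24 both mean spring through winter); `SEASONS`, `normalizeMonth` and `seasonName` are made-up names.

```javascript
// Season names keyed by the EDTF month-slot codes 21-24.
const SEASONS = { 21: 'spring', 22: 'summer', 23: 'fall', 24: 'winter' };

// Map the 13-16 codes onto the EDTF 21-24 codes.
// Ordinary months (1-12) and values already in 21-24 pass through unchanged.
function normalizeMonth(month) {
  return month >= 13 && month <= 16 ? month + 8 : month;
}

// Return the season name for a month-slot value, or null for a normal month.
function seasonName(month) {
  return SEASONS[normalizeMonth(month)] ?? null;
}

console.log(seasonName(13), seasonName(21)); // both name the same season
```

Because the two ranges do not overlap, a processor can normalize on input and only ever deal with one set of codes internally.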

Bruce_D_Arcus1 · July 4, 2020, 2:03pm

Ah, so you’re saying we’re actually close, with just needing a single new property, like so, and some updated documentation?

    {
      "date-parts": [
        [
          2017
        ],
        [
          0
        ]
      ],
      "approximate": true
    }

Though, it occurs to me now, that only allows the uncertainty or circa to be specified for the date as a whole. EDTF allows more fine-grained encoding; though again, am not certain we need that for this use case?

I think the same issue would apply to season ranges; it’s not clear how we could represent “2017-21/2017-22” unless we encoded the season in the month slot, as with EDTF.

PaulStanley · July 4, 2020, 3:35pm

I thought that the season was already encoded in the month slot! It is in the test suite anyway, though I think the spec is unclear about it. It encodes to 13, 14, 15, or 16, so [ [2020, 16] ] is Winter 2020. It’s certainly the reasonable thing to do, because you are not likely to have both season and month, and season really means “month as trimester”.

I think it’s reasonable to say that fine-grained control can be left to those who are willing to use EDTF. But it would be a pity if the JSON form wasn’t able to signal approximate, because I’d have thought the “c. 1900” or “1900?” forms are generally useful. I rather doubt very fine-grained control will often be needed, but if it is, the user can drop back to EDTF.

Citeproc-JS is also willing to devote a great deal of effort to parsing all manner of odd dates. That’s very kind of it, but I’d be inclined to say that it is “above and beyond the call of duty”. I’d think the standard should simply be: parse date-parts or EDTF properly, and no guarantees are required about dates in any other form: a processor may be helpful about them, or may simply pass them as text.

PS: I’d prefer “isApproximate”, because I think it’s desirable if variable names make their type explicit, but that’s a detail.

ETA: You might like to investigate whether existing styles use “uncertain-date” to mean “unknown” or “approx”. If they generally render it as “c.”, you might prefer to introduce two new flags, such as isIndeterminate and isApproximate and have isUncertain treated as if it meant isApproximate. That’s not ideal, but it might be kinder, if that’s how it’s currently mostly used.

Bruce_D_Arcus1 · July 4, 2020, 4:18pm

Ah, I guess season is for string representations.

This is why we need to annotate this schema!

Part of what we need to settle, though, is what features of EDTF we support, and what we don’t.

Looking more closely at the spec, I think we do want the Level 1 feature “Qualification of a date (complete),” but we do not want the Level 2 extension “Qualification of Individual Component.”

That would keep the two date representations consistent, and simplify implementation.

In that case, probably the sensible path is to follow biblatex, and say we support levels 0 and 1, with the same approach to uncertain dates (though we need more feedback from others on the X feature, which biblatex does support).

Even if coupled with the existing “circa”?

Actually circa = EDTF approximate, so we need a new “uncertain/isUncertain.”

njbart · July 4, 2020, 5:40pm

Right. And to facilitate conversion between EDTF and date-parts we’d probably best retire the current “global” circa flag, and introduce two new flags (circa, or to make the connection to EDTF explicit: approximate, or approx; and uncertain) for both standalone/start and end date.

Bruce_D_Arcus1 · July 4, 2020, 8:00pm

How would that work, given the current parts model is an array?

And are we certain that this is valid EDTF Level 1: “2004-02-01?/2005-02-08”? Neither that spec nor the biblatex docs include such an example, though the implicit suggestion is it is.

njbart · July 5, 2020, 12:33pm

edtf.js shows that this is valid – using node:

const edtf = require('edtf')
edtf.parse('2004-02-01?/2005-02-08').values

Output:

[
  { type: 'Date', level: 1, values: [ 2004, 1, 1 ], uncertain: true },
  { type: 'Date', level: 0, values: [ 2005, 1, 8 ] }
]

And the next example indicates, by analogy, where the up to 2 x 2 “flags” should be placed in a modified date-parts structure.

edtf.parse('2004-02-01%/2005-02-08%').values

Output:

[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  {
    type: 'Date',
    level: 1,
    values: [ 2005, 1, 8 ],
    approximate: true,
    uncertain: true
  }
]
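
As a rough illustration of where the flags could sit in a structured date, the following sketch maps one of the Date objects printed above onto a flat object. It works on plain objects shaped like that output and does not call edtf.js itself; `toStructuredDate` and the output field names are assumptions. Note that the printed `values` use zero-based months (2004-02 appears as 1), so the sketch adds one.

```javascript
// Convert a plain object shaped like the edtf.js Date output
// ({ type, level, values, approximate?, uncertain? }) into a flat
// structured date. Field names in the result are hypothetical.
function toStructuredDate(parsed) {
  const [year, month, day] = parsed.values;
  const out = { year };
  if (month !== undefined) out.month = month + 1; // values hold zero-based months
  if (day !== undefined) out.day = day;
  if (parsed.approximate) out.approximate = true;
  if (parsed.uncertain) out.uncertain = true;
  return out;
}

// The start date of '2004-02-01%/2005-02-08%', copied from the output above.
const start = {
  type: 'Date',
  level: 1,
  values: [2004, 1, 1],
  approximate: true,
  uncertain: true,
};
console.log(toStructuredDate(start));
```

Applying the same function to both items of an interval gives each end its own pair of flags.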

Bruce_D_Arcus1 · July 5, 2020, 12:43pm

Perfect.

While you’re at it, can you see how EDTF.js represents an open-end of a range? Would the values property just be null or 0, or an empty array?

njbart · July 5, 2020, 1:02pm

Sure – unknown: null; open: Infinity:

> edtf.parse('2004-02-01%/').values
[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  null
]
> edtf.parse('2004-02-01%/..').values
[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  Infinity
]

Bruce_D_Arcus1 · July 5, 2020, 1:13pm

And ../1900?

Does anyone see any downside in changing the schema along these lines?

The schema is versioned, so any legacy json will have a different schema id.

njbart · July 5, 2020, 1:14pm

> edtf.parse('../1900').values
[ Infinity, { type: 'Date', level: 0, values: [ 1900 ] } ]
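
Based only on the outputs above (null for an unknown end, Infinity for an open end), a consumer could tell the three endpoint cases apart along these lines. `classifyEndpoint` and `describeInterval` are hypothetical helper names, and the inputs are plain values rather than live edtf.js objects.

```javascript
// Classify one item of an interval's values array, following the
// convention seen in the output above: null = unknown, Infinity = open.
function classifyEndpoint(value) {
  if (value === null) return 'unknown';
  if (value === Infinity) return 'open';
  return 'date';
}

// Describe both ends of an interval's values array.
function describeInterval(values) {
  return values.map(classifyEndpoint);
}

// Shaped like the '../1900' output above.
console.log(describeInterval([Infinity, { type: 'Date', level: 0, values: [1900] }]));
```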

Bruce_D_Arcus1 · July 5, 2020, 5:34pm

I played a bit with node and edtf, which shows (per @njbart’s tests) the model has a Date object, and an Interval object, whose “values” property is an array of Date objects.

> edtf.parse('2004-02-01')
{ type: 'Date', level: 0, values: [ 2004, 1, 1 ] }
> edtf.parse('2004-02-01?')
{ type: 'Date', level: 1, values: [ 2004, 1, 1 ], uncertain: true }
> edtf.parse('2004-02-01/2005-02-08')
{
  values: [
    { type: 'Date', level: 0, values: [Array] },
    { type: 'Date', level: 0, values: [Array] }
  ],
  type: 'Interval',
  level: 0
}
> edtf.parse('2004-02-01?/2005-02-08')
{
  values: [
    { type: 'Date', level: 1, values: [Array], uncertain: true },
    { type: 'Date', level: 0, values: [Array] }
  ],
  type: 'Interval',
  level: 1
}
> edtf.parse('../2005-02-08')
{
  values: [ Infinity, { type: 'Date', level: 0, values: [Array] } ],
  type: 'Interval',
  level: 1
}
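
If we do limit support to levels 0 and 1, the `level` property in this model makes the check fairly mechanical. A minimal sketch, assuming only the `type`, `level` and `values` fields visible in the output above; `maxLevel` and `isSupported` are made-up names, and the inputs are plain objects.

```javascript
// Highest EDTF level found in a parsed Date or Interval.
// Open or unknown endpoints (Infinity, null) count as level 0.
function maxLevel(node) {
  if (node === null || typeof node !== 'object') return 0;
  const own = node.level ?? 0;
  // Only intervals nest further Date objects in their values array.
  const nested = node.type === 'Interval' ? node.values.map(maxLevel) : [];
  return Math.max(own, ...nested);
}

// True if nothing in the parsed date exceeds the given level.
function isSupported(node, limit = 1) {
  return maxLevel(node) <= limit;
}

// Shaped like the '2004-02-01?/2005-02-08' output above.
const interval = {
  values: [
    { type: 'Date', level: 1, values: [2004, 1, 1], uncertain: true },
    { type: 'Date', level: 0, values: [2005, 1, 8] },
  ],
  type: 'Interval',
  level: 1,
};
console.log(isSupported(interval));
```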

bwiernik · July 5, 2020, 6:44pm

How do uncertain and approximate play with style coding? Right now, CSL just has one is-uncertain-date="issued" condition; it doesn’t distinguish between uncertain and approximate senses of “circa”.

Are uncertain vs approximate dates rendered differently in styles, or are they just both annotated as “circa”? If there isn’t a rendering difference, then let’s just add uncertain and approximate while also retaining circa in a deprecated state. All three of these would make is-uncertain-date="issued" test true.

Bruce_D_Arcus1 · July 5, 2020, 6:52pm

You can see how biblatex handles it in the screenshot above; prefaced by “circa” for approximate, and with a “?” suffix for uncertain.

As you know, CSL currently has no such distinction; it only supports the “approximate/circa” sense.

Here’s a gist of the basic json schema of just the date object.

bwiernik · July 5, 2020, 7:51pm

Okay, so it would seem that at least these changes are needed in CSL style syntax:

- Make a new attribute is-approximate-date that covers existing circa behavior (indicates the date is circa/approximate)
- Change the behavior of the existing is-uncertain-date to cover actual uncertainty.
  1. Accompanying this, batch-change all existing styles to use is-approximate-date instead of is-uncertain-date.

As for other changes,

Date ranges

Date range support is currently tricky because circa applies to the date as a whole, which precludes things like this:

ca. 1950–1 January 2012
ca. 1884–ca. 1902

The main uses for is-uncertain-date currently are to add a “circa” term and to surround the date in brackets. It seems rather odd to require that all styles manually specify how to position circa, question mark, etc. Do these really vary that much? I suggest that we add placement of the circa term and the uncertainty question mark to locale files. (A similar change is needed to provide better support for formatting the ad/bc/ce/bce terms for era notation.)

Then, is-uncertain-date and is-approximate-date can still test globally and be true if either part of a date range is uncertain/approximate. That would permit enclosing the date in square brackets, etc. (which does seem to vary across styles).

(In the data model, we should deprecate the global circa element in favor of specifying uncertain/approximate on each range element. For legacy data, consider a global circa element as applying to both parts of a range.)
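
A sketch of how the two global tests might be evaluated under this proposal. The data shape is hypothetical: a `parts` array with per-part `approximate` and `uncertain` flags, plus an optional legacy `circa` boolean that is treated as applying to every part. None of the function names come from an existing processor.

```javascript
// Copy each range part, folding a legacy global circa flag into
// the per-part approximate flag. Data shape is an assumption.
function normalizeParts(date) {
  return date.parts.map((part) => ({
    ...part,
    approximate: part.approximate === true || date.circa === true,
    uncertain: part.uncertain === true,
  }));
}

// is-approximate-date: true if any part of the date is approximate.
function isApproximateDate(date) {
  return normalizeParts(date).some((part) => part.approximate);
}

// is-uncertain-date: true if any part of the date is uncertain.
function isUncertainDate(date) {
  return normalizeParts(date).some((part) => part.uncertain);
}

// "ca. 1950–1 January 2012": only the start is approximate.
const mixed = {
  parts: [
    { year: 1950, approximate: true },
    { year: 2012, month: 1, day: 1 },
  ],
};
console.log(isApproximateDate(mixed), isUncertainDate(mixed));
```

With the flags kept per part, the locale can decide where "ca." or "?" goes for each end, while the style-level test stays a simple any-part check.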

Decades/centuries

I don’t see a problem with supporting X notation in years. For general rendering, that won’t create any issue: just pass the Xs through and render them. If we supported these via X notation in years, it would be nice to have accompanying form="text" localizations (i.e., for centuries, translating the date into an ordinal number and adding a new term for century; for decades, adding a new term for decade and specifying in the locale how to format “1950s decade”).

Bruce_D_Arcus1 · July 5, 2020, 7:55pm

Here’s the github issue, which links back here.

Bruce_D_Arcus1 · July 5, 2020, 8:28pm

“Render them” how? As is? So “12XX”?

Or should that become “1200s” or “13th century”?

bwiernik · July 5, 2020, 8:35pm

Ideally, base on the date form:

- form="numeric"
  1. Render with Xs (e.g., 12XX)
- form="text"
  1. For centuries, render as “13th century”
  2. For decades, render as “1950s decade”
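
A rough sketch of that branching, English only. The wording of the output ("century", the "s" suffix for decades) would really come from locale terms; here it is hard-coded as an assumption, and `ordinal` and `renderMaskedYear` are hypothetical names.

```javascript
// English ordinal for a positive integer (13 -> "13th", 21 -> "21st").
function ordinal(n) {
  const rem100 = n % 100;
  if (rem100 >= 11 && rem100 <= 13) return `${n}th`;
  const suffix = { 1: 'st', 2: 'nd', 3: 'rd' }[n % 10] ?? 'th';
  return `${n}${suffix}`;
}

// Render a year that may contain X placeholders.
// form="numeric" passes the string through; form="text" spells out
// centuries (12XX) and decades (195X). Other years are returned as-is.
function renderMaskedYear(year, form) {
  if (form === 'numeric') return year;
  const century = /^(\d{2})XX$/.exec(year);
  if (century) return `${ordinal(Number(century[1]) + 1)} century`;
  const decade = /^(\d{3})X$/.exec(year);
  if (decade) return `${decade[1]}0s`;
  return year;
}

console.log(renderMaskedYear('12XX', 'numeric'), '|', renderMaskedYear('12XX', 'text'));
```

Whether the decade form should read "1950s" or carry an explicit decade term is exactly the kind of thing the locale would need to specify.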

Bruce_D_Arcus1 · July 6, 2020, 2:15pm

And here’s the PR.

github.com/citation-style-language/schema

input: Align date-parts and edtf string models

citation-style-language:v1.1 ← citation-style-language:csl-etdf-date

opened 11:46AM - 06 Jul 20 UTC

bdarcus

+128 -57

This modifies the structured date object to align with the EDTF model. It moves the qualifiers to the date object, and then defines a date range as an array of these date objects. The result is that dates can be represented as an EDTF string, as a structured date object, or as a date range array. The intention is that this structured variant will go away in time, so that in the future the only option will be the EDTF string.

Closes #300

-----

In CSL JSON 1.0, dates are either a two-level nested array ...

```js
"issued": {
  "date-parts": [
    [ 2000, 3, 15 ],
    [ 2000, 3, 17 ]
  ],
  "circa": "true"
}
```

... or a `raw` string, which is a standard EDTF-like date (note, however, that the example isn't compliant because months are not two digits, as they are in ISO 8601).

```js
"accessed": { "raw": "2005-4-12" },
"issued": { "raw": "2000-3-15/2000-3-17" }
```

In CSL JSON 1.1, we move `raw` to an explicit EDTF string option on the property itself. So in this option, the above becomes:

```js
"accessed": "2005-4-12",
"issued": "2000-3-15/2000-3-17"
```

This PR defines:

1. a date as an object, and moves the qualifiers onto the object, to make it consistent with EDTF (where you can have different qualifiers on the begin and end of a range)
2. a date range as a two-item array of date objects

It also removes `date-parts`.

A single date would be:

```js
"issued": {
  "year": "2000",
  "month": "3",
  "day": "15",
  "approximate": "true"
}
```

... and the first range example becomes this ("approximate" is the EDTF term for what we have called "circa"):

```js
"issued": [
  { "year": "2000", "month": "3", "day": "15" },
  { "year": "2000", "month": "3", "day": "17" }
]
```

Note: an open question is how to handle the mismatch between circa in 1.0 and this.

And we can then do things like season ranges (not formally supported by EDTF, but easy for us to add):

```js
"issued": [
  { "year": "2000", "month": "21" },
  { "year": "2000", "month": "22" }
]
```

This would seem a better long-term approach, one that would allow us to painlessly drop the object/array representation at some point in the future.
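
For anyone wondering what migrating 1.0 data might involve, here is a minimal sketch of a converter from the `date-parts` form to the object/array form described in the PR. It is not part of the PR; `convertDate` is a hypothetical name, values are emitted as strings to match the PR examples, and copying a global `circa` onto every date as `approximate` is only one possible answer to the open question noted above.

```javascript
// Convert a CSL JSON 1.0 date ({ "date-parts": [...], circa? }) into the
// proposed form: one date object, or an array of date objects for a range.
// Assumption: a truthy legacy circa ("true" or true) marks every part approximate.
function convertDate(legacy) {
  const approximate = legacy.circa === true || legacy.circa === 'true';
  const dates = legacy['date-parts'].map(([year, month, day]) => {
    const date = { year: String(year) };
    if (month !== undefined) date.month = String(month);
    if (day !== undefined) date.day = String(day);
    if (approximate) date.approximate = 'true';
    return date;
  });
  return dates.length === 1 ? dates[0] : dates;
}

const issued = {
  'date-parts': [
    [2000, 3, 15],
    [2000, 3, 17],
  ],
  circa: 'true',
};
console.log(JSON.stringify(convertDate(issued), null, 2));
```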
