EDTF and Date Representation

As I was thinking about this, thought I’d post it …

We have recently added EDTF to represent dates on the input schema.

Open questions are which features we support, and the status of the date-parts JSON representation. If it stays long-term, we might want to reconcile them as much as possible now.

If not, this could help with documenting regardless.

| Feature | EDTF | CSL JSON 1.0 | Notes |
|---|---|---|---|
| Seasons | 2019-21 | “season” property | The JSON could also support this on the month part. |
| Uncertainty | 2019-03-21? (uncertain); 2019-03-21~ (approximate); 2019-03-21% (both) | “circa” property | Which do we support, and how do we interpret in CSL? Biblatex treats ~ as circa. |
| Ranges (including open) | 2018-21/2018-22; …/1900 | supported, with two date-part sub-arrays | Specify 0 for open-end? |
| Decades and centuries | 12XX (13th century); 195X (1950s decade) | not supported | I’m not clear if we really need this, but could be addressed by allowing X on the year part. |

I’d say:

  • Seasons. Definitely yes. Seasons are, after all, already supported: the numbers are just different (IIRC 13=spring, 14=summer, 15=fall, 16=winter) on date parts. Luckily these don’t overlap, so one can just allow 13/21 = spring, 14/22 = summer, 15/23 = fall and 16/24 = winter, and there is no room for confusion. Processors will almost certainly normalize on input.
  • Uncertain, approximate, and uncertain+approximate: I’d support. I think this requires an additional test (is-approximate-date) and an additional potential value on the JSON input for it too. I doubt you need a test for is-uncertain-and-approximate: it should be possible to combine tests. But since uncertainty and approximation are different ideas, it makes sense to separate them.
  • Ranges. Definitely support, since they are already supported. Could you allow this on date-parts by adopting the convention that 0 means open? Even if you could, I wouldn’t: You don’t want to break date-parts, but there’s no particularly good reason to extend them.
  • Decades and centuries. Part of me says “don’t support”, because it’s going to be a ton of work. You will need locale-terms for centuries and decades, presumably in both long and short versions, and I suspect that the effort is not worth the very limited use. OTOH, one nearly always regrets bodging things like this. I don’t know enough about other languages to know how difficult the locale-specific stuff would be.
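
The season normalization suggested in the first bullet might look roughly like the sketch below. The helper name `normalizeSeason` and the choice of a 1-4 season index are assumptions made for illustration; 13-16 is the date-parts numbering recalled above and 21-24 is the EDTF numbering.

```javascript
// Hypothetical helper: map a month-slot value to a season index
// (1 = spring, 2 = summer, 3 = fall, 4 = winter).
function normalizeSeason(monthSlot) {
  if (monthSlot >= 13 && monthSlot <= 16) return monthSlot - 12; // date-parts style
  if (monthSlot >= 21 && monthSlot <= 24) return monthSlot - 20; // EDTF style
  return null; // an ordinary month (1-12) or an unsupported value
}

console.log(normalizeSeason(13)); // 1
console.log(normalizeSeason(21)); // 1
console.log(normalizeSeason(24)); // 4
console.log(normalizeSeason(3));  // null
```

Because the two ranges do not overlap, a processor could accept either numbering on input without any extra flag.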

Ah, so you’re saying we’re actually close, needing just a single new property, like so, and some updated documentation?

    {
      "date-parts": [
        [
          2017
        ],
        [
          0
        ]
      ],
      "approximate": true
    }

Though, it occurs to me now, that only allows the uncertainty or circa to be specified for the date as a whole. EDTF allows more fine-grained encoding; though again, am not certain we need that for this use case?

I think the same issue would apply to season ranges; not clear how we could represent “2017-21/2017-22” unless we encoded the season in month slot, as with EDTF.

I thought that the season was already encoded in the month slot! It is in the test suite anyway, though I think the spec is unclear about it. It encodes to 13, 14, 15, or 16, so [ [2020, 16] ] is Winter 2020. It’s certainly the reasonable thing to do, because you are not likely to have both season and month, and season really means “month as trimester”.

I think it’s reasonable to say that fine-grained control can be left to those who are willing to use EDTF. But it would be a pity if the JSON form wasn’t able to signal approximate, because I’d have thought the “c. 1900” or “1900?” forms are generally useful. I rather doubt very fine-grained control will often be needed, but if it is, the user can drop back to EDTF.

Citeproc-JS is also willing to devote a great deal of effort to parsing all manner of odd dates. That’s very kind of it, but I’d be inclined to say that it is “above and beyond the call of duty”. I’d think the standard should simply be: parse date-parts or EDTF properly, and no guarantees are required about dates in any other form: a processor may be helpful about them, or may simply pass them as text.

PS: I’d prefer “isApproximate”, because I think it’s desirable if variable names make their type explicit, but that’s a detail.

ETA: You might like to investigate whether existing styles use “uncertain-date” to mean “unknown” or “approx”. If they generally render it as “c.”, you might prefer to introduce two new flags, such as isIndeterminate and isApproximate and have isUncertain treated as if it meant isApproximate. That’s not ideal, but it might be kinder, if that’s how it’s currently mostly used.

Ah, I guess season is for string representations.

Why we need to annotate this schema!

Part of what we need to settle, though, is what features of EDTF we support, and what we don’t.

Looking more closely at the spec, I think we do want the Level 1 feature “Qualification of a date (complete),” but we do not want the Level 2 extension “Qualification of Individual Component.”

That would keep the two date representations consistent, and simplify implementation.
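
To make the Level 1 versus Level 2 boundary concrete, here is a rough sketch of a parser that accepts only a trailing qualifier on the date as a whole. It is not edtf.js; the regex, the function name `parseQualifiedDate`, and the returned shape are all assumptions for illustration, and it does no range checking on month or day.

```javascript
// Hypothetical sketch: YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by
// ?, ~ or % qualifying the whole date.
const WHOLE_DATE = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?([?~%])?$/;

function parseQualifiedDate(text) {
  const m = WHOLE_DATE.exec(text);
  if (m === null) return null; // e.g. component-level forms like 2004-06~-11
  const [, year, month, day, mark] = m;
  const parts = [year, month, day].filter((p) => p !== undefined).map(Number);
  return {
    parts, // [year, month?, day?] with 1-based months
    uncertain: mark === "?" || mark === "%",
    approximate: mark === "~" || mark === "%",
  };
}

console.log(parseQualifiedDate("2019-03-21%"));
console.log(parseQualifiedDate("2004-06~-11")); // null
```

With only whole-date qualification, the two flags map one-to-one onto two booleans per date, which is what keeps the JSON side simple.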

In that case, probably the sensible path is to follow biblatex, and say we support levels 0 and 1, with the same approach to uncertain dates (though we need more feedback from others on the X feature, which biblatex does support).

Even if coupled with the existing “circa”?

Actually circa = EDTF approximate, so we need a new “uncertain/isUncertain.”

Right. And to facilitate conversion between EDTF and date-parts we’d probably best retire the current “global” circa flag, and introduce two new flags (circa, or to make the connection to EDTF explicit: approximate, or approx; and uncertain) for both standalone/start and end date.

How would that work, given the current parts model is an array?

And are we certain that this is valid EDTF Level 1: “2004-02-01?/2005-02-08”? Neither the spec nor the biblatex docs include such an example, though the implicit suggestion is that it is.

edtf.js shows that this is valid – using node:

const edtf = require('edtf')
edtf.parse('2004-02-01?/2005-02-08').values

Output:

[
  { type: 'Date', level: 1, values: [ 2004, 1, 1 ], uncertain: true },
  { type: 'Date', level: 0, values: [ 2005, 1, 8 ] }
]

And the next example indicates, by analogy, where the up to 2 x 2 “flags” should be placed in a modified date-parts structure.

edtf.parse('2004-02-01%/2005-02-08%').values

Output:

[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  {
    type: 'Date',
    level: 1,
    values: [ 2005, 1, 8 ],
    approximate: true,
    uncertain: true
  }
]
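
As a sketch of what that could mean for a modified date-parts structure, the snippet below converts objects shaped like the output above into one object per range endpoint, each carrying its own flags. The target shape and the name `toFlaggedDate` are hypothetical. Note that the values shown above use zero-based months (February is 1), so the sketch adds 1.

```javascript
// Plain objects copied from the output above (months are zero-based there).
const parsed = [
  { type: "Date", level: 1, values: [2004, 1, 1], approximate: true, uncertain: true },
  { type: "Date", level: 1, values: [2005, 1, 8], approximate: true, uncertain: true },
];

// Hypothetical target shape: per-endpoint date-parts plus two boolean flags.
function toFlaggedDate(date) {
  const [year, month, day] = date.values;
  const parts = [year];
  if (month !== undefined) parts.push(month + 1); // back to 1-12
  if (day !== undefined) parts.push(day);
  return {
    "date-parts": parts,
    approximate: date.approximate === true,
    uncertain: date.uncertain === true,
  };
}

console.log(JSON.stringify(parsed.map(toFlaggedDate), null, 2));
```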

Perfect.

While you’re at it, can you see how EDTF.js represents an open-end of a range? Would the values property just be null or 0, or an empty array?

Sure – unknown: null; open: Infinity:

> edtf.parse('2004-02-01%/').values
[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  null
]
> edtf.parse('2004-02-01%/..').values
[
  {
    type: 'Date',
    level: 1,
    values: [ 2004, 1, 1 ],
    approximate: true,
    uncertain: true
  },
  Infinity
]

And ../1900?

Does anyone see any downside in changing the schema along these lines?

The schema is versioned, so any legacy json will have a different schema id.

> edtf.parse('../1900').values
[ Infinity, { type: 'Date', level: 0, values: [ 1900 ] } ]

I played a bit with node and edtf, which shows (per @njbart’s tests) the model has a Date object, and an Interval object, whose “values” property is an array of Date objects.

> edtf.parse('2004-02-01')
{ type: 'Date', level: 0, values: [ 2004, 1, 1 ] }
> edtf.parse('2004-02-01?')
{ type: 'Date', level: 1, values: [ 2004, 1, 1 ], uncertain: true }
> edtf.parse('2004-02-01/2005-02-08')
{
  values: [
    { type: 'Date', level: 0, values: [Array] },
    { type: 'Date', level: 0, values: [Array] }
  ],
  type: 'Interval',
  level: 0
}
> edtf.parse('2004-02-01?/2005-02-08')
{
  values: [
    { type: 'Date', level: 1, values: [Array], uncertain: true },
    { type: 'Date', level: 0, values: [Array] }
  ],
  type: 'Interval',
  level: 1
}
> edtf.parse('../2005-02-08')
{
  values: [ Infinity, { type: 'Date', level: 0, values: [Array] } ],
  type: 'Interval',
  level: 1
}
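
One wrinkle if the JSON schema were to mirror this model directly: JSON has no Infinity literal, and `JSON.stringify` turns Infinity into null, which would merge “open” with “unknown”. The sketch below shows the problem and one possible way around it; the `open` / `unknown` marker objects are purely hypothetical, not part of any schema.

```javascript
// Endpoint values as reported above: a Date-like object,
// null for an unknown end, Infinity for an open end.
const interval = [Infinity, { type: "Date", level: 0, values: [1900] }];

// Naive serialization: Infinity silently becomes null.
console.log(JSON.stringify([Infinity, null])); // [null,null]

// Hypothetical encoding that keeps the two cases apart.
function encodeEndpoint(endpoint) {
  if (endpoint === Infinity) return { open: true };
  if (endpoint === null) return { unknown: true };
  return { values: endpoint.values };
}

console.log(JSON.stringify(interval.map(encodeEndpoint)));
// [{"open":true},{"values":[1900]}]
```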

How do uncertain and approximate play with style coding? Right now, CSL just has one is-uncertain-date="issued" condition; it doesn’t distinguish between uncertain and approximate senses of “circa”.

Are uncertain vs approximate dates rendered differently in styles, or are they just both annotated as “circa”? If there isn’t a rendering difference, then let’s just add uncertain and approximate while also retaining circa in a deprecated state. All three of these would make is-uncertain-date="issued" test true.
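
In processor terms, that backwards-compatible option could be as small as the following sketch. The function name and the flat flag layout on the date object are assumptions for illustration.

```javascript
// Hypothetical evaluation of is-uncertain-date under the option above:
// any of the three flags makes the test true; "circa" is the deprecated one.
function isUncertainDateCompat(date) {
  return Boolean(date.uncertain || date.approximate || date.circa);
}

console.log(isUncertainDateCompat({ "date-parts": [[1900]], circa: true }));       // true
console.log(isUncertainDateCompat({ "date-parts": [[1900]], approximate: true })); // true
console.log(isUncertainDateCompat({ "date-parts": [[1900]] }));                    // false
```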

You can see how biblatex handles it in the screenshot above; prefaced by “circa” for approximate, and with a “?” suffix for uncertain.

As you know, CSL has no such distinction, currently; only supporting the “approximate/circa.”

Here’s a gist of the basic json schema of just the date object.

Okay, so it would seem that at least these changes are needed in CSL style syntax:

  1. Make a new attribute is-approximate-date that covers existing circa behavior (indicates the date is circa/approximate)
  2. Change the behavior of the existing is-uncertain-date to cover actual uncertainty.
    1. Accompanying this, batch-change all existing styles to use is-approximate-date instead of is-uncertain-date.

As for other changes,

Date ranges

Date range support is currently tricky because circa applies to the date as a whole, which precludes things like this:

  • ca. 1950–1 January 2012
  • ca. 1884–ca. 1902

The main uses for is-uncertain-date currently are to add a “circa” term and to surround the date in brackets. It seems rather odd to require that all styles manually specify how to position circa, question mark, etc. Do these really vary that much? I suggest that we add placement of the circa term and the uncertainty question mark to locale files. (A similar change is needed to provide better support for formatting the ad/bc/ce/bce terms for era notation.)

Then, is-uncertain-date and is-approximate-date can still test globally and be true if either part of a date range is uncertain/approximate. That would permit enclosing the date in square brackets, etc. (which does seem to vary across styles).

(In the data model, we should deprecate the global circa element in favor of specifying uncertain/approximate on each range element. For legacy data, consider a global circa element as applying to both parts of a range.)
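
A minimal sketch of those two rules, assuming a hypothetical data shape in which each range endpoint is an object with its own flags (the `endpoints` and `parts` names are invented for the example):

```javascript
// Hypothetical shape: { endpoints: [{ parts, uncertain?, approximate? }, ...], circa? }

// Legacy data: spread a global circa flag over every endpoint as "approximate".
function upgradeLegacyCirca(date) {
  const { circa, ...rest } = date;
  if (!circa) return rest;
  return {
    ...rest,
    endpoints: rest.endpoints.map((e) => ({ ...e, approximate: true })),
  };
}

// Global tests: true as soon as either endpoint carries the flag.
const isApproximateDate = (date) => date.endpoints.some((e) => e.approximate === true);
const isUncertainDate = (date) => date.endpoints.some((e) => e.uncertain === true);

const legacy = { endpoints: [{ parts: [1884] }, { parts: [1902] }], circa: true };
const upgraded = upgradeLegacyCirca(legacy);
console.log(isApproximateDate(upgraded)); // true
console.log(isUncertainDate(upgraded));   // false
```

New data could then set the flag on one endpoint only, which is what a rendering like “ca. 1950–1 January 2012” needs.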

Decades/centuries

I don’t see a problem with supporting X notation in years. For general rendering, that won’t create any issue: just pass the Xs through and render them. If we supported these via X notation in years, it would be nice to have accompanying form="text" localizations (i.e., for centuries, translating the date into an ordinal number and adding a new term for century; for decades, adding a new term for decade and specifying in the locale how to format “1950s decade”).


Here’s the github issue, which links back here.

“Render them” how? As is? So “12XX”?

Or should that become “1200s” or “13th century”?

Ideally, base on the date form:

  1. form="numeric"
    1. Render with Xs (e.g., 12XX)
  2. form="text"
    1. For centuries, render as “13th century”
    2. For decades, render as “1950s decade”
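
The rules in that list could be sketched as follows. This is an illustration only: the function names are made up, and the hard-coded English ordinal suffixes and the words “century” and “decade” stand in for what would really come from locale terms.

```javascript
// English-only ordinal helper, standing in for locale data.
function ordinal(n) {
  const rem100 = n % 100;
  if (rem100 >= 11 && rem100 <= 13) return `${n}th`;
  const suffix = { 1: "st", 2: "nd", 3: "rd" }[n % 10] || "th";
  return `${n}${suffix}`;
}

// Hypothetical renderer for a year string that may contain X placeholders.
function renderMaskedYear(year, form) {
  if (form === "numeric") return year; // pass the Xs through unchanged
  const century = /^(\d{2})XX$/.exec(year);
  if (century) return `${ordinal(Number(century[1]) + 1)} century`;
  const decade = /^(\d{3})X$/.exec(year);
  if (decade) return `${decade[1]}0s decade`;
  return year; // no X placeholders: nothing special to do
}

console.log(renderMaskedYear("12XX", "numeric")); // 12XX
console.log(renderMaskedYear("12XX", "text"));    // 13th century
console.log(renderMaskedYear("195X", "text"));    // 1950s decade
```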

And here’s the PR.