Design Principles for CSL JSON

Bruce_D_Arcus1 · July 15, 2020, 5:47pm

In the experimental solution we settled on, the result is a native JSON data structure, which is the unambiguous target for any parsing.

Aside: these days, YAML and JSON are effectively interchangeable. Our JSON schemas can validate a YAML variant.

So, despite the syntax differences, we’re saying that for CSL processors, rich text is a nested array of strings and formatted objects.

In YAML:

title:
  - A title with tex math
  - math-tex: x=y^2

This is also valid though:

title: A title
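For illustration only, here is a minimal Python sketch of how a processor might accept both shapes shown above. The helper names `title_segments` and `plain_text` are hypothetical, and the assumption that a formatted object is a single-key mapping (as in the `math-tex` example) comes from the example rather than from any finalized schema.

```python
from typing import Union

# Hypothetical types: a title is either a flat string or a list whose items
# are strings or formatting objects such as {"math-tex": "x=y^2"}.
Segment = Union[str, dict]
Title = Union[str, list]


def title_segments(title: Title) -> list:
    """Return the title as a list of segments, wrapping a flat string."""
    if isinstance(title, str):
        return [title]
    return list(title)


def plain_text(title: Title) -> str:
    """Join the text of every segment with spaces, ignoring formatting keys."""
    parts = []
    for segment in title_segments(title):
        if isinstance(segment, str):
            parts.append(segment)
        else:
            # Assumed shape: a formatting key mapping to its text content.
            parts.extend(str(value) for value in segment.values())
    return " ".join(parts)


rich = ["A title with tex math", {"math-tex": "x=y^2"}]
print(title_segments("A title"))
print(plain_text(rich))
```

Normalizing to a list up front means the rest of a processor only has to handle one shape, whichever form the input used.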

But, there’s a reason I marked it “experimental”: we need feedback.

Denis_Maier · July 16, 2020, 5:50am

So concerning these three options regarding titles and subtitles:

@Frank_Bennett what would be your perspective on these options? Given your comment above and on the rich text issue, I’d assume that your preference could be option 1, is that correct?

Frank_Bennett · July 16, 2020, 10:18am

I guess I’m curious whether this is an abstract specification (titles may [or must] be parsable into structures that validate against this schema) or something more concrete (titles may be flat strings for verbatim rendering, or structures that validate directly against this schema).

Bruce_D_Arcus1 · July 16, 2020, 10:43am

The latter, I think.

We’re adding the ability to separately specify formatting for main titles and subtitles in styles, because some styles require this.

How should a processor access those parts?

Do we need to change the input schema so a processor accesses them directly:

print(ref['subtitle'])

… or:

print(ref['title']['sub'])

… or do we require processors to parse title strings to access these:

print(parse_title(ref['title'], 'sub'))

On the input end, the last option yields no change in the input model, but would result in things like this to accommodate non-standard data for which the algorithms break:

title: "A title || a subtitle, non-standard delimiter"

Keep in mind parsing would also apply in the rich text model; I’m not sure how that would work, actually (say for a new processor targeting 1.1), because the “title string” would no longer only be a string.
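To make the third option concrete, here is a rough Python sketch of what a `parse_title` helper could look like for flat strings. It is not the citeproc-js algorithm: the default delimiter `": "` and the `'main'` part name are assumptions made for illustration, and only the `||` marker and the `'sub'` argument come from the examples above.

```python
OVERRIDE = "||"           # explicit marker, as in the YAML example above
DEFAULT_DELIMITER = ": "  # assumption: stand-in for a style-selected delimiter


def parse_title(title: str, part: str, delimiter: str = DEFAULT_DELIMITER) -> str:
    """Return the 'main' or 'sub' part of a flat title string."""
    if part not in ("main", "sub"):
        raise ValueError("part must be 'main' or 'sub'")
    # The explicit marker takes precedence over the automatic delimiter.
    separator = OVERRIDE if OVERRIDE in title else delimiter
    # partition() yields an empty tail when the separator is absent,
    # so a title without a subtitle returns "" for 'sub'.
    main, _, sub = title.partition(separator)
    return (main if part == "main" else sub).strip()


ref = {"title": "A title || a subtitle, non-standard delimiter"}
print(parse_title(ref["title"], "main"))
print(parse_title(ref["title"], "sub"))
```

Note that this only handles strings; a rich text title (a list) would need a different entry point, which is the open question raised above.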

Denis_Maier · July 16, 2020, 11:14am

It may be worth noting that the current proposal to split titles is based on current citeproc-js functionality.

Frank_Bennett · July 16, 2020, 12:00pm

The logistics do get complicated. But setting aside the rich-text issue, it would obviously be easier in the processor to just receive title and subtitle as separate fields. That’s not the shape of data in the wild, though, so parsing would have to happen somewhere. The burden would just fall on the calling application, which will adopt various solutions or not.

Bruce_D_Arcus1 · July 16, 2020, 12:15pm

The “redefine title to be main title, and add subtitle variables” option would align us with biblatex.

The title as object I’ve not seen elsewhere (except in MODS).

bwiernik · July 16, 2020, 12:21pm

The parsing rules are based on the existing citeproc-js parsing rules that are used for uppercase subtitles. The additions are (1) a style-setting to specify a set of delimiters from a list of discrete options and (2) a specified character string || to override automatic parsing (similar to the existing full/short comparison, but not requiring multiple fields).

With respect to how this intersects with rich text—I think a simple rule that parsing doesn’t cross markup boundaries would work.
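One possible reading of that rule, sketched in Python: the delimiter is only searched for inside plain string segments, and formatted objects are passed through untouched. The function name `split_rich_title`, the `": "` delimiter, and the list-of-segments shape are assumptions for illustration, not part of any agreed specification.

```python
def split_rich_title(segments, delimiter=": "):
    """Split a rich-text title into (main, sub) segment lists.

    Only plain string segments are searched for the delimiter, so a
    delimiter inside a formatted object never triggers a split.
    """
    for index, segment in enumerate(segments):
        if not isinstance(segment, str):
            continue  # formatted object: never look inside it
        head, found, tail = segment.partition(delimiter)
        if found:
            main = list(segments[:index]) + ([head] if head else [])
            sub = ([tail] if tail else []) + list(segments[index + 1:])
            return main, sub
    # No delimiter in any plain segment: the whole title is the main title.
    return list(segments), []


title = [{"math-tex": "a: b"}, " and more: the subtitle"]
main, sub = split_rich_title(title)
print(main)
print(sub)
```

Here the `": "` inside the `math-tex` object is ignored, and the split happens at the first delimiter found in a plain string.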

Denis_Maier · July 16, 2020, 1:03pm

That would be a good solution for my use cases. But, as has been said before, that won’t be without problems either.

(Also, even this might require some parsing in the processor: how would a second subtitle be handled in such a solution?)

Bruce_D_Arcus1 · July 16, 2020, 1:13pm

It’s not clear to me we should support that.

Denis_Maier · July 16, 2020, 1:19pm

Why not?
At least with the current proposal it’s possible without adding too much complexity.

Bruce_D_Arcus1 · July 16, 2020, 1:44pm

Because the current proposal adds complexity upfront.

How about we leave this question open until end of day Sunday, July 19 (extending this a bit)?

If someone wants to argue for a change from the current plan, please state your case, and which option you prefer, here.

Otherwise, we’ll go the parsing titles route.

Bruce_D_Arcus1 · July 16, 2020, 1:51pm

I mean, if I were a developer, I would want to literally know how to do it.

All languages have string splitting functions, so with a string, that’s straightforward.

Are we saying splitting only happens, for example, on the formatted string, after primary processing (if dealing with rich text, one would need to format the sub-strings for output to RTF, HTML, LaTeX, or whatever, after all)?

Denis_Maier · July 16, 2020, 2:52pm

So the parsing route is your favourite?

Bruce_D_Arcus1 · July 16, 2020, 2:54pm

No.

I’d prefer not, but I want to keep this moving, so …

I do think we need an answer to my latest question though before we actually do this.

And I’m curious what @PaulStanley thinks as a relative newcomer.

Denis_Maier · July 16, 2020, 2:57pm

Like not at all or simply not parsing?

Bruce_D_Arcus1 · July 16, 2020, 2:59pm

You asked about parsing, so that’s all I meant.

Denis_Maier · July 16, 2020, 3:08pm

Ok.
But then which one of those three options would be your favorite? (But you don’t want to argue for a change of plans?)

No doubt, feedback from other implementers would be useful. This is based on current citeproc-js behaviour, but what do @PaulStanley, @cormacrelf, @John_MacFarlane think about this? Also @asimonyi

Bruce_D_Arcus1 · July 16, 2020, 3:08pm

I don’t have a strong preference. I can see arguments either way.

But you don’t want to argue for a change of plans?

I think we should ideally base the decision on what works for styles and style authors, and for CSL developers. I have concerns about the parsing for the latter, but developers can speak for themselves.

I may change my mind based on subsequent conversation, though.

bwiernik · July 17, 2020, 2:39am

Just to clarify the main vs short title distinction, this is something that is a pretty annoying limitation for BibTeX users in fields like law that regularly use short titles that aren’t just the main title.
