Design Principles for CSL JSON

Not sure if this is the right thread for it, but it does seem to me that people building systems will inevitably face conversion challenges for string markup/structuring, simply because metadata in the wild comes in many flavors. The way that’s handled in graphic format conversion is to have a verbose target format that supports all features (PPM/PNM), and then to build tested converters into and out of that format. Thinking of the problem in that way, there is virtue in having a CSL-specified structured format to serve as the “pivot,” even if it is not directly used by implementations.
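To make the pivot idea concrete, here is a minimal Python sketch. The PivotTitle structure and the converter names are hypothetical, not anything CSL defines; the point is only that each source flavor gets its own tested importer into one verbose structure, and exporters go back out from there.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PivotTitle:
    """Hypothetical verbose pivot structure for titles (not a CSL-defined format)."""
    main: str
    subs: list[str] = field(default_factory=list)


def from_flat_string(title: str, delimiter: str = ": ") -> PivotTitle:
    # Naive importer: assumes the source put `delimiter` between main title and subtitles.
    main, *subs = title.split(delimiter)
    return PivotTitle(main=main, subs=subs)


def from_separate_fields(title: str, subtitle: str | None = None) -> PivotTitle:
    # Importer for sources that already keep the subtitle in its own field.
    return PivotTitle(main=title, subs=[subtitle] if subtitle else [])


def to_flat_string(title: PivotTitle, delimiter: str = ": ") -> str:
    # Exporter back to a single string; lossless only if `delimiter` never occurs inside a part.
    return delimiter.join([title.main, *title.subs])
```

Two differently shaped inputs ("Main Title: A Subtitle" as one string, or title and subtitle as separate fields) end up as the same PivotTitle, which is what makes round-trip testing of each converter straightforward.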

Yeah. Whatever solution is finally adopted, we need a short title in addition to, and independent of, the main part of the title.

So, what would that mean for the title-subtitle issue? CSL would have fields for both, and converters would transform their data into that structure? Or am I misunderstanding what you’re saying?

That’s what I read.

I should add that we added “sub” and “main” forms for styles in 1.1. We could still change this, but the CSL would look like:

<text variable="title" form="sub"/>

That works for two of the options (and for the object approach there would be a symmetry with this: @form would signal which key to access), but it might not be needed, or ideal, for the third (redefining title and adding subtitles).

I only add this to the discussion to give the full picture; we can adjust to whatever is best.

Just to note that calling applications also have the option of using external utilities to supply abbreviations. CSL doesn’t need to take on absolutely everything.

Using external applications is a good idea. Do you mean something like a pre-processor? But maybe even that should be mentioned in the spec somehow?

The more I think about this, the more I think we should go with option 2.

Reason:

  • it fits with the modeling we already agreed on for the styles (variable='title' @form='sub' -> ['title']['sub'])
  • it keeps the input schema simpler: no need for new subtitle variants, and we can remove the short variants (because the short title can be moved inside the title object)
  • it leaves room for easier expansion of title handling in the future, should we need it
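A rough Python sketch of the @form-to-key mapping in option 2. The nested input shape (a title object with main, sub and short keys) and the rule for the form-less full title (main and sub joined with ": ") are assumptions for illustration only, not agreed schema.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Optional


def resolve(item: Mapping[str, Any], variable: str, form: Optional[str] = None) -> Optional[str]:
    """Hypothetical lookup: <text variable="title" form="sub"/> -> item["title"]["sub"]."""
    value = item.get(variable)
    if not isinstance(value, Mapping):
        # Plain strings (or missing values) have no forms to pick from.
        return value if form is None else None
    if form is None:
        # Assumption: without @form, render main and sub joined by ": ".
        return ": ".join(part for part in (value.get("main"), value.get("sub")) if part)
    # Any @form value is just a key inside the title object.
    return value.get(form)
```

Adding a new form later (for example an abbreviation) would then mean adding a key, not a new top-level variable, which is the expansion room mentioned in the last bullet.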

That’s where I am at the moment, at least; of course I would really like to hear from more developers, so I have modified the deadline above to leave this open through Sunday to make that more likely, with the idea that we decide on Monday.

It also makes titles similar to names in structure.

Would this mean accessing the short variant only via title form="short"? If you want to do this, please keep in mind that in the current functionality, variable="title" form="short" and variable="title-short" don’t mean the same thing:

  • variable="title-short" returns the contents of the user-supplied title-short field.
  • variable="title" form="short" returns a short title either from the entry in the title-short field or an abbreviation from an abbreviation list or an automatically shortened version of the title.

The entries from an abbreviation list or automatically shortened titles might not be adequate for a certain style, so these two need to be distinguishable. Styles must still be able to test for a user-supplied short title and to access it directly, without falling back to the abbreviations.

If we went this way, yes.

Thanks for mentioning that, because I hadn’t thought about that.

Is that a CSL specification detail, or a Zotero implementation decision? A quick look at the spec suggests to me the latter, but I didn’t look closely.

Maybe we need to distinguish “short” variants and “abbreviations”?

In any case, I have a feeling we can solve this, even if I don’t have a specific proposal now. We can cross that bridge if we decide on this solution.

To be honest, I couldn’t tell. I only know it from style authoring and from the CSL-M documentation, which tells us that there’s a difference between CSL and CSL-M: the latter doesn’t render title-short when calling title form="short" for type="legal_case" (CSL-M: extensions to CSL — citeproc-js 1.1.73 documentation). Therefore, I suppose it’s a citeproc thing that might be worth considering in the schema.

Maybe, per my previous point, we add an “abbreviated” form, which can be auto-generated?

I think it’s important to keep the functionality; I have no preference about the naming :slight_smile:

Would variable="title" form="abbreviated" take over the whole functionality of the current @form="short", i.e. render @form="short" with a fallback chain via the abbreviation list and automatic shortening of the title? Or would it only render the abbreviation? If the latter, the upgrade script for existing styles would be a little more complicated.

Thinking a bit more about it, I don’t think rendering the abbreviation directly without considering @form="short" is reasonable. That would deprive the user of the possibility of overriding the (usually) application-supplied abbreviations on a per-title basis.
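The distinction between variable="title-short" and variable="title" form="short" can be sketched as two lookups. This is an illustration, not citeproc-js code: the fallback order (user value, then abbreviation list, then automatic shortening) is assumed from the descriptions in this thread, and strip_subtitle is a hypothetical stand-in for whatever shortening a processor actually applies.

```python
from typing import Callable, Mapping, Optional


def user_short_title(item: Mapping[str, str]) -> Optional[str]:
    # variable="title-short": only what the user entered, never a fallback.
    return item.get("title-short")


def short_title(
    item: Mapping[str, str],
    abbreviations: Mapping[str, str],
    auto_shorten: Callable[[str], str],
) -> Optional[str]:
    # variable="title" form="short": the user value comes first, so it overrides list abbreviations.
    if item.get("title-short"):
        return item["title-short"]
    title = item.get("title")
    if title is None:
        return None
    if title in abbreviations:
        return abbreviations[title]
    return auto_shorten(title)


def strip_subtitle(title: str) -> str:
    # Hypothetical auto-shortening rule: drop everything after the first ": ".
    return title.split(": ", 1)[0]
```

Keeping both functions means a style can still test for, and render, only the user-supplied value while other styles use the full fallback chain.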

Full circle again… We did consider this option, and I think one of the main reasons we did not go this way was input from @Dan_Stillman, who argued against changing the data structure of titles in that direction (edit: wrong link. I can’t find the discussion I was referring to.) Just a reminder here.

I personally wouldn’t mind doing this as I mostly use pandoc and this feels quite natural:

title:
  main: main title
  sub: subtitle

Or even as an array:

title:
  - main title
  - first subtitle
  - second subtitle

So, this would even give us multiple subtitles, and access would be easy enough => text variable="title" form="main" would be title[0].
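With the array shape, form resolution reduces to indexing. A minimal Python sketch; the form names "main" and "sub" come from the 1.1 style forms mentioned earlier, while returning all subtitles as a list for "sub" is an assumption.

```python
from typing import List, Optional, Sequence, Union


def resolve_array_title(title: Sequence[str], form: str) -> Union[str, List[str], None]:
    """Hypothetical accessor for a title stored as [main, sub1, sub2, ...]."""
    if form == "main":
        # text variable="title" form="main" -> title[0]
        return title[0] if title else None
    if form == "sub":
        # Any number of subtitles; how to join them is left to the style.
        return list(title[1:])
    raise ValueError(f"unsupported form: {form!r}")
```

Note that the accessor itself stays trivial; the style-dependent work (delimiters, casing) moves to rendering.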

As this removes some (most) of the complexity from citeprocs, I can see the appeal of that option.

But: The current proposal is able to satisfy these requirements:

  1. Styles vary in what delimiters are used to split (e.g., APA splits on , Chicago does not; Chicago splits on "; or," and ", or," but most other styles do not)
  2. Some styles (e.g., Chicago) normalize delimiters, others (e.g., APA) do not; there is also some variation across styles in which delimiters should be normalized.
  3. Styles vary in whether second+ subtitles use the same or a different delimiter (e.g., ";" instead of ":").

I’m not sure option 2 does so as well.
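For comparison, a rough Python sketch of what a processor could do with already-split parts, as in the array shape above. Requirements 2 and 3 map onto per-style settings; requirement 1 does not, because the data, not the style, has already decided where to split, which seems to be exactly the concern. SubtitleRules and its fields are hypothetical names, not CSL attributes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SubtitleRules:
    """Hypothetical per-style settings; illustrative names, not CSL attributes."""
    first_delimiter: str = ": "          # between main title and first subtitle
    later_delimiter: str = ": "          # before the second and later subtitles (e.g. "; ")
    capitalize_subtitles: bool = False   # uppercase the first letter of each subtitle


def render_title(parts: Sequence[str], rules: SubtitleRules) -> str:
    if not parts:
        return ""
    main, *subs = parts
    out = main
    for i, sub in enumerate(subs):
        # The style, not the data, picks the delimiter, so delimiters are normalized for free.
        delimiter = rules.first_delimiter if i == 0 else rules.later_delimiter
        if rules.capitalize_subtitles and sub:
            sub = sub[0].upper() + sub[1:]
        out += delimiter + sub
    return out
```

For example, ["Main", "first sub", "second sub"] renders as "Main: First sub; Second sub" with later_delimiter="; " and capitalize_subtitles=True.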

Two further things:

First, I don’t follow all the use case wrinkles here, so I’m asking questions, not suggesting answers.

Second, my question was based on the sense I have that “short” is maybe trying to do too much.

Reminder: the original design of title and title-short was to accommodate titles and main titles (title minus subtitle).

As far as I can tell, main title on its own is not really relevant beyond being able to normalise subtitle casing and the subtitle delimiter in most styles. Styles that require short-titles on the other hand, usually define rules for its creation that go far beyond simply stripping off subtitles.

The fallback chain in @form="short" helps avoid a pattern that would otherwise be necessary everywhere you want the title abbreviation (as said above):

<choose>
  <if variable="title-short">
    <text variable="title-short"/>
  </if>
  <else>
    <text variable="title" form="short"/>
  </else>
</choose>

The original reason for introducing title-short, as you say, will be solved differently in the new design, so I don’t think it should play a role here?

One other detail might be worth clarifying:

Are not titles and subtitles fundamentally core distinctions made by content creators?

What do you mean? And what does that imply?

Well. I’d like to say yes, but I’ve seen lots of examples where it’s not clear if the content creator even was aware of this distinction.

And for citing, that distinction is of little relevance. In the context of a citation, the whole thing is the title and the subtitle can’t be left out. For unambiguous citations it’s great to be able to distinguish main title and subtitle, but that’s all. Many (institutional) styles don’t even bother to say anything about a subtitle delimiter.