Should if/else blocks use the delimiter from a parent group?

Should the output from this be a,bcd,e or a,b,c,d,e?

<group delimiter="," >
  <text value="a" />
  <choose>
    <if position="first"> <!-- assume true -->
      <text value="b" />
      <text value="c" />
      <text value="d" />
    </if>
  </choose>
  <text value="e" />
</group>

It seems from reading the spec that it should be a,bcd,e, since <choose> is both a rendering element and a direct child element of <group> here, but the three text nodes are descendants (not children).

The cs:group rendering element must contain one or more rendering elements (with the exception of cs:layout ). cs:group may carry the delimiter attribute to separate its child elements

citeproc-js produces a,b,c,d,e, so this is probably the only real option. I’m mostly asking because, while it isn’t really in the spec, a,b,c,d,e makes sense, and would make even more sense if you allowed people to remove the surrounding <choose>. It would be good to have some clarity before doing that. This is a feature (implicit-choose) I intend to implement, which will of course save us all millions of human-hours.

<group delimiter=",">
  <text value="a" />
  <if position="first">
    <text value="b" />
    <text value="c" />
    <text value="d" />
  </if>
  <text value="e" />
</group>

(Every other programming language manages without extra delimiters surrounding their if/else chains. In fact, if and else are delimiters already, no need for more. If you have two if blocks in a row, that’s not an issue at all, they are independent and distinguishable by the compiler into two separate chains.)

A further question is whether you should get a,b,c,d,e over macro boundaries as well. That’s a bit more dubious, and citeproc-js doesn’t do it. I agree. The following produces a,bc,de,fg,h.

<macro name="Inner">
  <text value="b" />
  <text value="c" />
</macro>

<macro name="InnerWithIf">
  <choose>
    <if position="first">
      <text value="d" />
      <text value="e" />
    </if>
  </choose>
</macro>

<macro name="InsideIf">
  <text value="f" />
  <text value="g" />
</macro>

<!-- ... -->

<group delimiter="," >
  <text value="a" />
  <text macro="Inner" />
  <text macro="InnerWithIf" />
  <choose>
    <if position="first">
      <text macro="InsideIf" />
    </if>
  </choose>
  <text value="h" />
</group>

Yes, I also think that a,b,c,d,e is the only real option. The other option would be implicitly treating the choose block as a group. But I think the first option is clearly preferable. I would think about this in terms of expansion: if condition x is true =>

<group delimiter="," >
  <text value="a" />
  <text value="b" />
  <text value="c" />
  <text value="d" />
  <text value="e" />
</group>

Else:

<group delimiter="," >
  <text value="a" />
  <text value="e" />
</group>

If technically possible, getting rid of this additional level sounds like a good idea.

Concerning macro boundaries: This somehow contradicts my expansion logic statement above. It seems that macros get implicitly treated as a group. I’m not sure if this kind of behaviour is really consistent. I’d prefer to have a,b,c,d,e,f,g,h here, like expanding inner brackets first until there is nothing more to expand.
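As a sketch of what that expansion would mean for the macro example earlier in the thread (assuming position="first" tests true, and with every macro call and conditional replaced by its content), the group would effectively become:

```xml
<!-- Hypothetical fully expanded form: macro calls inlined, conditionals resolved.
     This illustrates the proposed reading, not current citeproc-js behaviour. -->
<group delimiter=",">
  <text value="a" />
  <text value="b" />  <!-- from macro "Inner" -->
  <text value="c" />
  <text value="d" />  <!-- from macro "InnerWithIf" -->
  <text value="e" />
  <text value="f" />  <!-- from macro "InsideIf" -->
  <text value="g" />
  <text value="h" />
</group>
```

Under that reading the output would be a,b,c,d,e,f,g,h rather than a,bc,de,fg,h.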

(Otherwise, if macros are treated as groups, why not allow a delimiter on macros, e.g. <macro name="macro" delimiter=", ">? But currently I think the macro content must be placed inside another group if we want this. Right? If yes, this behaviour is somewhat self-contradictory: macros are groups, and at the same time they are not.)
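A minimal sketch of that workaround, reusing the Inner macro from the earlier example. The delimiter has to come from a cs:group wrapped around the macro content, since cs:macro itself does not take a delimiter attribute:

```xml
<macro name="Inner">
  <!-- the explicit group supplies the delimiter that cs:macro cannot carry -->
  <group delimiter=",">
    <text value="b" />
    <text value="c" />
  </group>
</macro>

<!-- ... -->

<group delimiter=",">
  <text value="a" />
  <text macro="Inner" />
  <text value="h" />
</group>
```

With the inner group in place this should render a,b,c,h regardless of how the macro boundary itself is treated, because the macro content is now a single group.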

The spec you cite is actually unambiguous on this: in XML, a child element is exactly one level deep, so as per the current spec, citeproc-js’s behavior is correct in both instances (since choose and macro are the child elements).
edit: wait, citeproc-js does a,b,c,d,e for choose? That’s weird and I’d argue incorrect. Sorry, I misread the initial post.

(nvm this part then: We could discuss changing this (though the costs in revisiting styles would be substantial), but it would require a specification change and I don’t see why the change would be warranted.)

So, it would be wrong to apply the logic of macro expansion to CSL?

Concerning a,b,c,d,e: I’ve performed a quick test in the visual editor, in Zotero’s style editor, and in the latest Juris-M beta (with latest Propachi). Same result in all three cases.

As far as we know, a lot of styles are probably depending on the a,b,c,d,e behaviour. I will be able to say for sure with a bit more tooling. It would be pretty easy to create an “A/B tester” to determine if there are significant differences across the style repository from changing this. It would involve:

  • A sample library that covers all item types with typical field usage
  • A small script to execute, for each style, every cite position and record the HTML output. There’s a useful TestEngine in jest-csl that could be adapted for this purpose; similarly, citeproc-rs can already do this by itself but would need to be able to select position="ibid", etc.
  • Running it
  • Switching a flag in the code and recompiling
  • Running it again into a different file
  • git diff --no-index, our saviour for this kind of thing

If the diff when changing a,b,c,d,e -> a,bcd,e turns out to be very small, then people weren’t depending on the incorrect behaviour, they were writing explicit groups inside if blocks, which would not require rewriting if you made citeproc-js compliant OR if you changed the spec.
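To illustrate the kind of style that would be unaffected either way, here is a sketch of the opening example with an explicit group inside the if block (the inner group and its delimiter are added for illustration):

```xml
<group delimiter=",">
  <text value="a" />
  <choose>
    <if position="first">
      <!-- explicit inner group: b, c and d no longer rely on the outer delimiter -->
      <group delimiter=",">
        <text value="b" />
        <text value="c" />
        <text value="d" />
      </group>
    </if>
  </choose>
  <text value="e" />
</group>
```

This gives a,b,c,d,e whether or not the choose is transparent to the outer delimiter, so a style written like this would not show up in the diff.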

If the diff turns out to be large, then people were depending on a,b,c,d,e, which would not require rewriting if you changed the spec to match it, BUT would require rewriting if you made citeproc-js compliant. My guess is that a significant number of styles are. The strange thing about this is that the only users that would actually be affected by introducing a,b,c,d,e as the New Way… are people who are not using citeproc-js, and whose different implementation behaves according to the spec. I do not know if any such people or styles or implementations exist.

I can further confirm that pandoc-citeproc if-blocks follow citeproc-js in a,b,c,d,e, and also in the way macros produce a,bcd,e. It would appear implementations have been forced to follow citeproc-js, most likely due to a large number of styles depending on its behaviour.

P.S.

I quite like the term fragment to describe a region of CSL that isn’t a group and may include multiple elements. You could say that an if block producing bcd is at its core a group without delimiter features, whereas one that produces b,c,d is a fragment that is included inline in place of the <choose>.
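In CSL as it stands, the bcd result can be spelled out by wrapping the block content in a group with no delimiter. A sketch, assuming the fragment (a,b,c,d,e) behaviour for a bare if block:

```xml
<group delimiter=",">
  <text value="a" />
  <choose>
    <if position="first">
      <!-- delimiter-less group: its children are concatenated, and the group
           as a whole counts as a single item for the outer delimiter -->
      <group>
        <text value="b" />
        <text value="c" />
        <text value="d" />
      </group>
    </if>
  </choose>
  <text value="e" />
</group>
```

This should produce a,bcd,e; removing the inner group turns the if block back into a fragment and yields a,b,c,d,e.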

I have some other ideas about how to make macros more reusable with something akin to the concept of Transclusion aka <ng-content> in Angular or children in React. As a quick sketch to discuss some other time, mostly to explain precisely what I mean by a fragment:

<macro name="CommaSepBraces">
  <group delimiter=", " prefix="(" suffix=")">
    <content />
    <date variable="issued" />
  </group>
</macro>
...
<CommaSepBraces> <!-- funky new syntax for UpperCamelCased macro names -->
  <text value="a" />
  <text value="b" />
  <!-- inside the macro invocation is a fragment,
       included via the <content/> tag above.     -->
</CommaSepBraces>

… would produce (a, b, 12 Dec 1999).

Losing cs:choose would not do any harm, as it isn’t essential to the logical structures expressed inside it. Processors that can parse those structures in isolation can just treat cs:choose as a no-op; and for those that can’t, styles can easily be pre-processed to wrap logical statements in cs:choose.

@Denis_Maier’s suggestion to allow delimiter on cs:macro makes more sense to me than treating macro content as a bare fragment. In theory that would make macros more versatile, but if used heavily in practice, it would make them less readable. Doesn’t seem like a big win.

On the transparency of conditional boundaries (cs:choose, cs:if, cs:else-if, cs:else) to a delimiter set on a parent cs:group, that would necessarily introduce a structure to CSL that does not currently exist, and is not explicitly specified anywhere—a container boundary that blocks inheritance of a parent delimiter, but does not impose the implicit conditional logic of cs:group (i.e. within a cs:if, cs:else-if or cs:else block).

There are two arguments for doing it anyway: a pragmatic one (that processors that have followed the strict requirements of the spec may be in circulation); and a processual one (that this is the clear intent expressed in the CSL Specification, and should be respected). As @cormacrelf says, the pragmatic concern is not very strong, because a working processor will have been built against the test suite, which assumes “transparent” application of the ancestral delimiter; but that does leave the concern over process.

My memory of issues around macros and conditions was pretty vague, so I did some digging. After going over the history, I would come down on the side of amending the specification, rather than the other way around. Here is a run-down of the highlights:

  • July 3-5, 2009: There was a brief mail thread on xbiblio-devel across these dates concerning the structure of a revision to the CSL Specification, on which Rintze Zelle had begun to work. Those notes place cs:choose in the “Style Behavior” section. I myself chime in on the thread to suggest that cs:group be moved from “Style Behavior” to “Rendering Elements.” No one says anything about cs:choose.

  • October 19, 2009: Bruce D’Arcus made the first GitHub commit of the CSL 1.0 Specification. In that initial version of the spec, cs:choose is again not listed among the “Rendering Elements.”

  • February 21, 2010: Rintze Zelle moved the cs:choose section of the specification from the top of “Style Behavior” to the bottom of “Rendering Elements”, immediately below cs:group.

  • December 19, 2010: Bruce D’Arcus checked the then-current suite of CSL styles into GitHub. Although the CSL 1.0 Specification had been published with provision for a delimiter attribute on cs:group, no styles in this initial check-in made use of it. Even the most complex styles were still entirely joined by affixes at this time.

  • October 10, 2012: Charles Parnot of the Papers2 project raised an issue concerning the effect of cs:substitute under the cs:names element. While the issue is separate from that raised here, he calls attention to another portion of the spec where “children” seems to have a broad meaning, and relates it to differences in the behavior of cs:group and cs:choose:

    “The substitutions are specified as child elements of cs:substitute, and must consist of one or more rendering elements (with the exception of cs:layout). A shorthand version of cs:names without child elements, which inherits the attributes values set on the cs:name and cs:et-al child elements of the original cs:names element, may also be used. If cs:substitute contains multiple child elements, the first element to return a non-empty result is used for substitution.”

    […]

    The confusion is in the use of ‘child elements’, which could be interpreted as either strict children, or more generally descendants. The current implementation of the javascript processor applies the “strict children” rule for <group> child elements, but includes all descendants for <choose> child elements.

    (The outcome of that interaction was agreement that “children” does and should mean “descendants” in that context.)

  • July 18, 2014: Sylvester Keil pointed out a contradiction between language in the CSL Specification and a fixture in the test suite that he had encountered while working on citeproc-ruby. That discussion arose from a different part of the same passage that prompted the current thread:

    “cs:group may carry the delimiter attribute to separate its child elements, as well as affixes and display attributes (applied to the output of the group as a whole) and formatting attributes (transmitted to the enclosed elements).” [emphasis added]

    The specific issue concerned the inheritance of formatting attributes (such as font-weight="bold") by descendants. The specification language above seems to require it (“transmitted”), but the test fixture cited by Sylvester dictated that the styling be applied to the enclosing group. Everyone in the discussion agreed that the behavior was correct as tested, and that the spec should be amended. The language quoted above is still in there, though.

In light of all that (the lack of clear evidence that placing cs:choose under “Rendering Elements” was meant to affect assignment of delimiters, and evidence that spec revisions around this issue have been agreed in the past but remain outstanding), I would favor common behavior over the spec in this case. I don’t see that making cs:choose opaque to delimiter would have any clear practical advantages that weigh toward reconciling in the other direction.

Pending further discussion and consensus here on Discourse, I’ve opened an issue on the CSL Specification to suggest a couple of amendments that I think would help to avoid future confusion.

Thank you, Frank, for your statutory intention analysis aided by extrinsic materials, that really resonates with me! If only we had some kind of court imbued with the jurisdiction to rule on these matters, but I generally agree with you about amending. It’s one thing to say that and another to agree on how to characterise such a change, however.

  • Was moving choose to Rendering Elements a mistake with unintended consequences for group delimiters, and an amendment to the spec puts us back on the intended path? The best argument for an originally a,b,c,d,e intention is CSL 0.8.1 which describes groups as “group together elements with a format applied to the whole group” and then illustrates <group delimiter=": "> .... So maybe delimiter is one of these formats that apply to the whole group? Italics apply to <if> contents, so maybe delimiters do too. 0.8.1 doesn’t have the ‘rendering elements’ classification. (Of course this provides some authority for the resolution of the ‘transmitted’ issue.)
  • Was there undefined behaviour that is now being given clarity? The most convincing argument involves interpreting 0.8.1 as not taking a position at all, and the only hint in CSL 1.0/1.0.1 as not having been intended to resolve it, or considered while drafting to affect it at all. I personally think this quite reasonable.
  • Does it matter what was intended? Was it unambiguous and is that enough?
  • Finally, the reality for me is that without following citeproc-js, citeproc-rs would be incompatible with many existing styles (running it briefly over the styles repo confirms this) and so my idea of a compliant implementation involves choosing a,b,c,d,e.

For completeness, citeproc-ruby is the odd one out, and outputs a,bcd,e. I suppose that weakens the common behaviour, unless @Sylvester_Keil sees this and conforms citeproc-ruby under the sheer weight of the styles repository. I would note that my first run at this was of the a,bcd,e variety because cs:choose was among rendering elements and it fell that way because of how I structured the Element enum variants, but it felt wrong at the time and I compared it to citeproc-js immediately, even though I am only now bringing it up.

The post with the links just covers my understanding of how we got here. I’ll go along with whatever the group decides.

(If you need anything further from this end, just @ me.)

Regarding whether <choose/> is a rendering element or not, the answer is probably “kinda”. Note that the CSL 0.8.1 schema was not accompanied by any kind of specification, and if I remember correctly I was just trying to shoehorn all the parts of the CSL 1.0 schema into a logical structure when writing the specification. It’s also become clear over the years that there are many corner cases not sufficiently ironed out in the specification.

For the two cases that gave rise to this thread, I think the current citeproc-js behavior makes the most sense from the standpoint of a CSL style author.

<if/>, <else-if/> and <else/> are often introduced at a later stage in style coding to cover exceptions, and it seems preferable to me that when a style author moves previously unconditional code under the control of <choose/>, this should happen as transparently as possible. I.e., other than selecting which code block gets evaluated, <choose/> should not block inheritance that would otherwise be in place.
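A sketch of that scenario, with made-up style code (the variables and the type test are only illustrative):

```xml
<!-- Before: all three elements are direct children of the group -->
<group delimiter=", ">
  <text variable="title" />
  <text variable="publisher" />
  <text variable="publisher-place" />
</group>

<!-- After: publisher details restricted to books by a later edit -->
<group delimiter=", ">
  <text variable="title" />
  <choose>
    <if type="book">
      <text variable="publisher" />
      <text variable="publisher-place" />
    </if>
  </choose>
</group>
```

With the current citeproc-js behaviour, a book still renders as title, publisher, place after the edit. If <choose/> blocked the delimiter, publisher and place would run together unless the author also wrapped them in a group of their own.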

For macros, the main point of their existence is encapsulating code (for reuse and/or to improve the readability of a style). I think it would make macros much harder to troubleshoot if their internal delimiter use is somehow affected by a delimiter set on the <group/> parent of the <text/> element calling the macro. We have enough trouble as it is with style authors who put affixes that belong outside of the macros inside them. E.g. they might write

<macro name="genre">
  <text variable="genre" suffix=", "/>
</macro>

<!-- ... -->

<group>
  <text macro="genre" />
  <text variable="publisher" />
</group>

Delimiter inheritance across macro calls would only add to the headaches here.
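For contrast, a sketch of the same code with the separator moved out of the macro and expressed on the calling group instead (one possible fix among several):

```xml
<macro name="genre">
  <!-- no affix here: the macro only produces the genre itself -->
  <text variable="genre" />
</macro>

<!-- ... -->

<group delimiter=", ">
  <text macro="genre" />
  <text variable="publisher" />
</group>
```

Here the comma appears only when both genre and publisher produce output, and the macro can be reused elsewhere without carrying a trailing separator along.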