Syntax proposal: conditions

Several months ago, I posted an issue on the schema tracker concerning
the “none” attribute. After giving it some more thought, I’ve come up
with the fuller proposal below. The changes proposed are
backward-compatible with the current schema.

At Rintze’s suggestion I ran this past Sebastian and processor
developers before posting here for broader review. The response has
been positive: Sebastian is in favour; Sylvester has implemented the
syntax in citeproc-ruby; and Andrea and Charles have indicated that
they are ready to go forward with implementation.

Open issues are:

  • Whether to include a *-none suffix on attributes in addition to
    *-any and *-all (I left this out, Sylvester suggested including it,
    and I can’t see a problem); and
  • Constraints to be imposed by the schema.

A test that exercises the syntax is available here:

https://bitbucket.org/fbennett/citeproc-js/src/tip/tests/fixtures/local/condition_PlusMinus.txt?at=default

The proposal would introduce two changes:

  • An optional not: prefix on elements of the list argument to
    certain condition attributes; and

  • Alternative forms of several condition attributes, identified
    by an *-all or *-any suffix.

Under the proposal, conditional evaluation would take place as
follows:

(1) Each attribute argument list element is evaluated, returning
“true” or “false”;

(2) For each attribute, the results from (1) are evaluated using “all”
(if the attribute suffix is “-all”) or “any” (if the attribute
suffix is “-any”), following the rules described in the CSL
Specification. (Legacy attributes with no suffix follow the value
of the “match” attribute, or “all” if no companion “match”
attribute is present.)

(3) The “match” attribute evaluates the results from (2) using
“all” or “any” as described in the CSL Specification,
returning an overall test value of “true” or “false”.
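
The three evaluation steps above can be sketched in Python as follows. This is only an illustrative sketch, not code taken from any of the processors: the function names (`split_suffix`, `evaluate_condition`), the flat dictionary used to represent an item, and the digits-only stand-in for “is-numeric” are all assumptions made for the example. Only “all” and “any” are handled, since those are the values the steps describe.

```python
# Illustrative sketch of steps (1)-(3); all names here are hypothetical.

COMBINE = {"all": all, "any": any}

# Simplified element tests. The item is assumed to be a flat dict such as
# {"type": "webpage", "accessed": "2013"}; real processors differ.
TESTS = {
    "type": lambda item, token: item.get("type") == token,
    "variable": lambda item, token: bool(item.get(token)),
    # Assumption: "numeric" means digits only (the real rule is broader).
    "is-numeric": lambda item, token: str(item.get(token, "")).isdigit(),
}


def split_suffix(name):
    """Split 'variable-any' into ('variable', 'any'); no suffix gives None."""
    for mode in ("all", "any"):
        suffix = "-" + mode
        if name.endswith(suffix):
            return name[: -len(suffix)], mode
    return name, None


def evaluate_condition(attrs, item):
    """Evaluate the attributes of one cs:if / cs:else-if against an item."""
    match = attrs.get("match", "all")  # default when "match" is absent
    per_attribute = []
    for name, value in attrs.items():
        if name == "match":
            continue
        base, mode = split_suffix(name)
        results = []
        for element in value.split():
            # Step (1): test each list element, honouring a "not:" prefix.
            negate = element.startswith("not:")
            token = element[len("not:"):] if negate else element
            outcome = TESTS[base](item, token)
            results.append(not outcome if negate else outcome)
        # Step (2): the suffix decides; legacy attributes follow "match".
        per_attribute.append(COMBINE[mode or match](results))
    # Step (3): "match" combines the per-attribute results.
    return COMBINE[match](per_attribute)


attrs = {"type": "webpage", "variable": "not:issued", "match": "all"}
print(evaluate_condition(attrs, {"type": "webpage", "accessed": "2013"}))  # True
print(evaluate_condition(attrs, {"type": "webpage", "issued": "2012"}))    # False
```

Keeping the per-attribute results in their own list is what lets an attribute such as is-numeric-any use “any” while the enclosing “match” still uses “all”.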

This would allow two testing patterns that are not currently possible:

  • A single test can require both true and false values; and

  • Attributes can set an evaluation method (“all” or “any”) independent
    of the “match” attribute that controls inter-attribute evaluation.

This flexibility makes it possible to reduce the bulk of CSL code.
As one example, the construct below is found in several styles in the
CSL repository:

<macro name="year-date">
  <choose>
    <if type="webpage">
      <choose>
        <if variable="issued">
          <date variable="issued">
            <date-part name="year"/>
          </date>
        </if>
        <else>
          <date variable="accessed">
            <date-part name="year"/>
          </date>
        </else>
      </choose>
    </if>
    <else>
      <date variable="issued">
        <date-part name="year"/>
      </date>
    </else>
  </choose>
</macro>

The code prints the “accessed” date if the item is a
webpage and has no “issued” date, and otherwise prints the “issued”
date, regardless of item type. A nested cs:choose statement is needed,
because negative and positive conditions cannot be declared together
on the same cs:if or cs:else-if element.

With the proposed syntax, the code sample above can be rewritten as a
single cs:choose statement:

<macro name="year-date">
  <choose>
    <if type="webpage" variable="not:issued" match="all">
      <date variable="accessed">
        <date-part name="year"/>
      </date>
    </if>
    <else>
      <date variable="issued">
        <date-part name="year"/>
      </date>
    </else>
  </choose>
</macro>
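
One way to check that the flattened macro is equivalent to the nested one is to enumerate the four combinations of the two facts involved. The sketch below models each macro as a plain Python function that returns the name of the date variable that would be rendered; the function names `nested` and `flat` are made up for the illustration.

```python
from itertools import product


def nested(is_webpage, has_issued):
    """Mirror of the original macro with the nested cs:choose."""
    if is_webpage:
        if has_issued:
            return "issued"
        return "accessed"
    return "issued"


def flat(is_webpage, has_issued):
    """Mirror of <if type="webpage" variable="not:issued" match="all">."""
    if is_webpage and not has_issued:
        return "accessed"
    return "issued"


# Compare the two forms over every combination of inputs.
for is_webpage, has_issued in product([False, True], repeat=2):
    assert nested(is_webpage, has_issued) == flat(is_webpage, has_issued)
```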

Condition attributes that would accept a not: prefix on argument
elements and be given alternative *-all and *-any forms under the
proposal are the following:

* is-numeric
* is-uncertain-date
* locator
* type
* variable
* jurisdiction (MLZ only)
* page (MLZ only)

Frank

The proposal looks pretty clean, and would make cs:choose much easier
to use. No complaints from me.

Rintze

So as a first comment (will look at details later, though I have confidence
it’s well thought-out given the background work you’ve done, Frank; thanks
for that), let me explain why the syntax is the way it is. I.e. it wasn’t
an accident.

Given that CSL is an XML language, I really wanted to keep the language as
clean to process as possible using the most pure XML processing language
extant: XSLT.

So the idea was any significant CSL logic was represented in terms of
native XML structures: nodes (elements and attributes) and values.

Going this route, where the values themselves take on core logical
semantics, and where those values themselves must be processed (though
admittedly, the processing is simple here; just split on a colon and treat
as key-value), is a change in direction.

I suggest if we do go this route we have a firm logical basis on which we
do this, so that we can have a consistent answer on what’s OK, and what’s
not, going forward.

Bruce

On Mon, Apr 15, 2013 at 11:41 AM, Frank Bennett <@Frank_Bennett> wrote:

Correction:

On Mon, Apr 15, 2013 at 12:10 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

admittedly, the processing is simple here; just split on a colon and treat
as key-value …

Not so; the trailing “-all” etc proposal makes things more complicated.

Bruce

One alternative solution that relies on native XML structures would be
to introduce a new …-match attribute for each conditional, that
controls the test logic of the conditional value list. E.g.

variable="volume issue" variable-match="all"

means that the item needs to have both “volume” and “issue”. Similarly,

is-numeric="volume issue" is-numeric-match="nand"

means that “volume” and “issue” must not both be numeric: at least one
of them is non-numeric or missing.
(Frank’s is-numeric-any="not:volume not:issue" confuses me a bit; is
this supposed to act as an XOR or NAND?)
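
Under the evaluation steps in the original proposal, “any” applied to two negated elements works out to NAND rather than XOR (De Morgan's law). The short truth table below is a sketch in which plain booleans stand in for the result of the is-numeric test on each variable; it shows that the two readings only diverge when neither value is numeric.

```python
from itertools import product

# volume / issue stand for "is-numeric is true for this variable".
table = []
for volume, issue in product([False, True], repeat=2):
    # Reading of is-numeric-any="not:volume not:issue"
    any_of_nots = any([not volume, not issue])
    nand = not (volume and issue)
    xor = volume != issue
    table.append((volume, issue, any_of_nots, nand, xor))

# Columns: volume, issue, any-of-nots, NAND, XOR
for row in table:
    print(row)
```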

Frank’s unit test samples, rewritten:

      <if type="article-journal" variable="volume issue"
          variable-match="all" is-numeric="volume issue"
          is-numeric-match="nand" match="all">

It is less expressive, though. E.g. it’s not possible to cover the logic
encoded by something like

Rintze

Or, we could forgo the “not:” modifier and add …-all, …-any,
…-nand (NAND) and …-none versions of each conditional:

      <if type="article-journal" variable-all="volume issue"
          is-numeric-nand="volume issue" match="all">

Something like could be encoded with

Rintze

“Bruce D’Arcus” <@Bruce_D_Arcus1> writes:

So as a first comment (will look at details later, though I have
confidence it’s well thought-out given the background work you’ve
done, Frank; thanks for that), let me explain why the syntax is the
way it is. I.e. it wasn’t an accident.

Given that CSL is an XML language, I really wanted to keep the
language as clean to process as possible using the most pure XML
processing language extant: XSLT.

So the idea was any significant CSL logic was represented in terms of
native XML structures: nodes (elements and attributes) and values.

Going this route, where the values themselves take on core logical
semantics, and where those values themselves must be processed (though
admittedly, the processing is simple here; just split on a colon and
treat as key-value), is a change in direction.

This is quite a strong argument, indeed. I wonder if we could increase
the conditional expressiveness by just permitting boolean connectors
(and, or, xor, plus the prefix not:) inside attributes.

Something like:

<if type-all="article-journal" variable-all="volume issue" is-numeric-any="not:volume not:issue" match="all">

could be expressed:

<if type="article-journal" variable="volume and issue" is-numeric="not:volume or not:issue" match="all">

(possibly ‘or’ could be the default).

Just a thought.

andrea

Bruce: Absolutely. The original syntax is very clean and makes good
sense. This just aims to add a little more expressiveness, to cope
with some of the more complex styles that we’ve encountered in the
wild.

It looks like we have suggestions running in two directions. We could
move the “not:” condition from the argument to an attribute, making
things more readable but losing some expressiveness (Rintze); or we
could move the match logic into the argument as well, again improving
readability but with more complex semantics (Andrea).

My feeling is that the more expressiveness we have, the better, within
reasonable limits of complexity for validation (on one side) and
implementation (on the other). I think the original proposal strikes
that balance, but I’m still learning.

For maximum expressiveness, a single attribute that recognizes an
XSLT-like syntax (?) would give us everything we could possibly need,
at the cost of considerable pain on the validation side. Something
like:

condition="and(
  type(article-journal),
  variable(volume),
  variable(issue),
  or(
    not(variable(volume)),
    not(variable(issue))
  )
)"

That’s not a fresh proposal – just thinking out loud about possibilities.

If we are going to put more logic in the attributes themselves, then one might as well make it more readable and go all the way, and have the AND/OR/NOT in the attributes.

I also like the fact that indeed attributes don’t participate in the processing, and it’s all in the node and attribute names.

Now, here are more thoughts on this. My apologies if I am straying quite far from the initial discussion: these are just random thoughts, not something I strongly advocate for. Just food for thought. I want to let more knowledgeable people write the specs and make the decisions, but maybe this can give more ideas.

Part of the discussion also boils down to the issue of readability vs syntax consistency (vs brevity??): do we want the code to be easily readable? Or easily amenable to automation and optimized for fast processing? I think the choice of XML is already not aiming at readability, and the issue raised by Frank is one of readability (a concept which implicitly includes writability): to generate more complex predicates, one needs nested conditionals. It will always be a difficult balance to strike, and that points again to the need for tools on top of the language that help build this kind of predicate.

So, if we want to have more complex predicates, I would argue the syntax should bring more readability. While the ‘not:’ syntax is easy to understand, all the ‘all’, ‘match’, ‘any’ attributes are quite hard to understand. I don’t think this line is readable, for instance, and it’s really hard to understand what is going on:

<if type="article-journal" variable="volume issue" variable-match="all" is-numeric="volume issue" is-numeric-match="nand" match="all">

In a way, I prefer the current conditionals, which are nested but still easier to parse mentally.

Now, I like better the idea of new node types to allow a more readable syntax for complex predicates, similar to what Frank suggests below, but maybe go straight to what we really want:

<if type="article-journal">
    <or variable="volume issue" variable="volume" variable="issue"/>
    <text variable="volume"/>
</if>

<if type="article-journal">
    <or variable="volume"/>
    <or variable="volume"/>
    <text variable="volume"/>
</if>

<if type="article-journal">
    <and>
        <not variable="volume"/>
        <or variable="issue"/>
    </and>
    <text variable="volume"/>
</if>

with some implicit “and” when multiple attributes appear on one of these new nodes. As to operator precedence etc… I will let years of CS development make the decision. After all, logic is the mother of all computing :)

I just like how the stuff above is readable and I believe it’s very easy to process as well.

Charles

I don’t know how everyone feels about going for more robust logic (I
lean in favour, but that’s just me) …

But if there is interest in that, I like this. The only thing I would
want to add would be a wrapper around the rendering
element(s) at the end.

Frank

If we kept the “match” attribute around (and limit its scope to the
element it is used in), we wouldn’t need cs:not, and we might even get
away with forgoing cs:or. In place of Frank’s proposed cs:then (to
contain the output of the conditional), we could also nest the output
within the additional conditionals. Recasting Frank’s unit test XML:

    <choose>
      <if type="article-journal" variable="volume issue" match="all">
        <and-if is-numeric="volume issue" match="nand">
          <text value="is an ARTICLE-JOURNAL with both VOLUME and ISSUE, but one of them is non-numeric"/>
        </and-if>
      </if>

(I don’t know what English term to introduce as a value for “match” to
cover “but one of them is non-numeric”; going with “nand” for now)

      <else-if type="article-journal" variable="volume issue"
               is-numeric="volume issue" match="all">

(this one already works with current CSL)

      <else-if type="article-journal" match="none">
        <and-if variable="edition">
          <text value="is not an ARTICLE-JOURNAL, and has an EDITION"/>
        </and-if>
      </else-if>
      <else-if type="book">
        <and-if variable="edition" match="none">
          <text value="is a BOOK, but has no EDITION"/>
        </and-if>
      </else-if>

      <else-if type="chapter" variable="author" match="all">
        <text value="is a CHAPTER, and has an AUTHOR"/>
      </else-if>

(this one also already works with current CSL)

    </choose>

On Tue, Apr 16, 2013 at 2:25 AM, Charles Parnot <@Charles_Parnot> wrote:

<if type="article-journal">
    <or variable="volume issue" variable="volume" variable="issue"/>
    <text variable="volume"/>
</if>

<if type="article-journal">
    <or variable="volume"/>
    <or variable="volume"/>
    <text variable="volume"/>
</if>

<if type="article-journal">
    <and>
        <not variable="volume"/>
        <or variable="issue"/>
    </and>
    <text variable="volume"/>
</if>

with some implicit “and” when multiple attributes appear on one of these new nodes.


Rintze

I’m sorry I’m a little late to join the discussion. I’m very much in support of the idea of more expressive and powerful conditionals. As Frank mentioned, I already implemented the original proposal. I also like Andrea’s and Frank’s version that would go towards a single condition attribute – that would be more complicated to implement but probably easier for style authors to write.

What I do like about the current conditionals (and also about Frank’s proposal) is the high-level structure:

… … … …

That is to say, one clearly defined root element per conditional branch with no nested children. I think this is easy to read and easy to implement.

I am not in favor at all of adding new (optional) child elements. In my experience, dependencies between nodes always lead to less straightforward and less elegant implementations and consequently to more errors.

I also don’t think it’s fair to use XSLT as a measure for how easy it is to implement a feature cleanly unless you provide an actual XSLT implementation. If we wanted to pick one language for such measurements I would suggest to use Haskell for now, because it is a functional language and we do have an actual implementation to look at.

Sylvester

With the caveat that I’m distracted with other things, and so have not
followed this in detail …

But part of what I will say below is suggesting that the process
should be easy to follow for people who aren’t following closely.
Right now, it’s not.

I’m sorry I’m a little late to join the discussion. I’m very much in support of the idea of more expressive and powerful conditionals. As Frank mentioned, I already implemented the original proposal. I also like Andrea’s and Frank’s version that would go towards a single condition attribute – that would be more complicated to implement but probably easier for style authors to write.

What I do like about the current conditionals (and also about Frank’s proposal) is the high-level structure:

… … … …

That is to say, one clearly defined root element per conditional branch with no nested children. I think this is easy to read and easy to implement.

I am not in favor at all of adding new (optional) child elements. In my experience, dependencies between nodes always lead to less straightforward and less elegant implementations and consequently to more errors.

I also don’t think it’s fair to use XSLT as a measure for how easy it is to implement a feature cleanly unless you provide an actual XSLT implementation. If we wanted to pick one language for such measurements I would suggest to use Haskell for now, because it is a functional language and we do have an actual implementation to look at.

The objection to tying future evolution of CSL to a particular
implementation language is reasonable.

But I still think it’s reasonable to point out the current design puts
no processing logic in node values, and that the original proposal
here did. I am, I think reasonably, urging caution about changing
this.

I totally agree with this.

Back to process. I strongly suggest going forward you adopt a standard
process that starts with clearly defining the problem you’re trying to
solve, and why.

So, first big question, that I’m not sure has been explicitly
addressed: why are we contemplating this change? Who does it benefit,
and how?

This is a very good point. I think the original point was to reduce verbosity indeed. Or increase readability?

Second, what costs would come from adding a change like this? How is
it going to be folded into schema and style versioning?

Third, do the benefits really outweigh the costs?

My view, for example:

If all a new proposal does is make styles less verbose, then that’s
not a compelling enough reason to make a disruptive change.

Fully agreed.

Now, to be precise, I think the point was to increase readability by reducing verbosity. Sometimes, verbosity is a good thing and can increase readability (the opposite of obfuscation). But too much verbosity can make code harder to understand.

I take it you meant that ‘readability’ alone is not enough reason for a disruptive change either? I would tend to agree, because the main focus of CSL is not readability (and of course, that’s part of the problem with the troubles people have editing styles; but IMO that’s always a problem with code: no matter the language, it won’t be accessible to everybody).

If, OTOH, the changes also make possible new functionality, with
demonstrated need, then the rationale becomes more compelling.

Yes.

> Bruce

Bruce, I am in full agreement with your points regarding the design process.

The concerns you raised are very important (who does the change benefit and how? what are the costs? do the benefits outweigh the costs?) but the one on which most emphasis was placed in the discussion (or so it seemed to me) was the proposal’s alleged disruptiveness.

That’s what I wanted to draw attention to.

Or, more to the point: how exactly does the trailing ‘-all’ etc. make things more complicated in a meaningful way?

Currently we have something like: variable="x y z" and a processor must:

  • split the value into tokens
  • fetch data from the citation item according to those tokens
  • join the evaluated data based on the value of the ‘match’ attribute

Now, with variable-all="x y z" the processor must:

  • split the value into tokens
  • fetch data from the citation item according to those tokens
  • join the evaluated data using logical AND semantics

Why is that so disruptive as opposed to the current specification?

The main change required to implement the original proposal was actually that the conditions can no longer all be evaluated in a single iteration; instead, one iteration per attribute (variable, is-numeric etc.) is needed. This leads to a slight change in how the ‘none’ matcher must be handled to avoid double negations. In my experience this was the most disruptive aspect of the proposal, but it was not even touched on in the present discussion.

The even bigger change is probably caused by the ‘not:’ prefix (at least in the Ruby implementation that was the case).

Please don’t get me wrong; I fully agree with what you’re saying about the process in general and I realize that new features should only be added if they bring a real benefit to the language – all this should be discussed in turn.

But if we are dismissing requests because of implementation-level concerns I would like to see more examples explaining those concerns. Especially when the request comes with two concrete implementations, with unit tests, is backwards compatible (in the sense that a processor can process styles with or without the new feature using the same algorithm) and when the ensuing discussion puts alternatives on the table that personally I find to be far more disruptive than the originally requested changes, which, in my opinion strike a good balance between the two possible approaches (i.e., putting conditional logic into attribute values at the cost of more difficult validation and putting the logic into additional attributes or nodes at the cost of reduced readability and – in my personal opinion – less elegant implementations).

Sylvester

For what it’s worth, Rintze and I have been kicking around some
refinements of the later proposals off-list. I’ve started working
through some of the logic in a thorny Bluebook-compatible style, to
see how one of the more promising patterns would play out in the wild.
It will take a few days for the refactoring, and I may do some coding
to get it working, to offer test samples.

Nothing to show yet, but it’s looking good in the mockup so far. I’ll
try to post again in a week or so.

Frank

As I said earlier, in my experience all the instances where we have optional nested elements have led to very unsatisfying implementations – at least I have been unable to come up with an elegant solution (I’d be glad to hear how others have approached this problem).

As things stand, I would be very unhappy with a solution where the conditional node has to fetch information from optional child nodes in order to evaluate its own condition – the only reason I spoke up was because I saw some of the proposals going in that direction. If there is an elegant solution to this problem, all the better, but I would like to see actual code to illustrate it; that’s all I wanted to say.

We’ll see how it goes. There’s no rush, and feeling out the territory
of possibilities is good for us. From work on the mockup, I’m learning
how much flexibility is needed to flatten conditional nesting for at
least some of the more challenging requirements. Once I have untangled
some of the code, I’ll certainly check to see if the initial proposal
would do as well.

I’ve been feeling the need for extensions to conditional logic for
quite a while now, and it’s really great to see active discussion of
the issues.

Frank

I won’t argue that CSL development is sometimes a little chaotic, but
personally I like to first brain-storm about possible solutions when a
(perceived) need comes up, and then trace back to see how the somewhat
polished solutions could fit in the scheme of things. It makes it
easier to compare the old to the new. Frank and I did this a lot for
CSL 1.0 and that worked out quite well.

As feedback on Frank’s latest proposal, I disliked how the multiple
cs:condition elements weren’t separated from the following rendering
elements. I suggested that this could either be solved by introducing
a wrapping cs:conditions element for the cs:condition elements, or a
cs:then element for the rendering elements. These options can be seen
at https://gist.github.com/rmzelle/5474220 (first, Frank’s latest
proposal; second, cs:conditions; third, cs:then)

Also, I’m not currently convinced that we need nesting cs:condition
elements to get NAND functionality. If we (initially) can do without
nesting, so much the better.

Rintze

To clarify …

Bruce, I am in full agreement with your points regarding the design process.

The concerns you raised are very important (who does the change benefit and how? what are the costs? do the benefits outweigh the costs?) but the one on which most emphasis was placed in the discussion (or so it seemed to me) was the proposal’s alleged disruptiveness.

That’s what I wanted to draw attention to.

Or, more to the point: how exactly does the trailing ‘-all’ etc. make things more complicated in a meaningful way?

Currently we have something like: variable="x y z" and a processor must:

  • split the value into tokens
  • fetch data from the citation item according to those tokens
  • join the evaluated data based on the value of the ‘match’ attribute

Now, with variable-all="x y z" the processor must:

  • split the value into tokens
  • fetch data from the citation item according to those tokens
  • join the evaluated data using logical AND semantics

Why is that so disruptive as opposed to the current specification?

When I used the word “disruptive” in this context, I meant it with
respect to compatibility, and therefore style and schema versioning. I realize
that for developers, it’s not that big a deal.

Do we have a clear policy on this? Does a change like this go in a 1.1
schema and spec, and do we create 1.1 variants of all 5000+ styles?

Notwithstanding details, I think this is the core issue I see.