suggested CSL changes

Simon and I have been chatting a bit about earlier suggestions he had
for a more flexible macro-based CSL structure. The basic changes would
be:

  1. pare down the large number of elements to a small handful, and use
    attributes more
  2. get rid of the “relation” attribute and move to a more-or-less flat
    model (and so a long list of variable attributes)
  3. allow user-defined macros
  4. move away from explicit reliance on types for templates, making more
    use of the new conditional structure to achieve the same thing

This would have a number of benefits:

  1. simpler and more self-describing (easy to author styles and write
    code, and to document)
  2. more flexible
  3. more easily extended
  4. styles might be more compact and robust (because they move away
    from using types for templates)
  5. it might be more easily implemented in a GUI editor

The downsides are:

  1. it’s a change (so Simon will have to update code and styles :-))
  2. more difficult to tightly control validation (may not be so bad
    though)
  3. the flat model might have some awkward consequences

But on balance I’m fine with going in this direction. So if Simon is
fine with the changes and implementing them in Zotero, I suggest we
make the schema changes, call that CSL 1.0 pre-1, and push towards a
frozen 1.0 release ASAP.

Perhaps we commit them now as a branch in the SVN?

Anyway, here’s what a new style might look like (minus metadata):

Actually, a correction; this:

… would probably be:

Not sure about the first/subsequent handling in any case though.

Bruce

Okay, an update on the CSL changes, as I work on implementing them in
Zotero:

  1. Bruce has suggested an agents/events/text model, where dates would
    be incorporated into events. I have no real problem with this model,
    but I think “events” could be somewhat counterintuitive.

  2. Consensus is that short/long forms will be allowed on all fields;
    the parser should implement as many as possible, but need not
    implement all. Optionally, a form=“short” attribute may be applied to
    a tag referencing a macro, and the form will be inherited by
    all child elements.

  3. Conditionals will be revised, so that they take whitespace-
    delimited lists of fields and a match=“any”/match=“all” tag, allowing
    rudimentary AND/OR without making parsing difficult.
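
A minimal sketch of how a conditional like the one in item 3 might be evaluated, written in Python rather than taken from the Zotero code. The function name `condition_matches`, the dict-based reference, and the rule that an empty value counts as missing are all assumptions made for illustration; none of them is part of the proposal.

```python
def condition_matches(fields: str, match: str, reference: dict) -> bool:
    """Evaluate a conditional over a whitespace-delimited list of field names.

    Hypothetical helper: a field counts as present when the reference
    holds a non-empty value for it (an assumption of this sketch).
    """
    names = fields.split()
    present = [bool(reference.get(name)) for name in names]
    if match == "any":   # rudimentary OR
        return any(present)
    if match == "all":   # rudimentary AND
        return all(present)
    raise ValueError(f"unsupported match value: {match!r}")


# Illustrative reference data; the editor field is deliberately empty.
ref = {"author": "Doe", "issued": "1999", "editor": ""}
print(condition_matches("author editor", "any", ref))
print(condition_matches("author editor", "all", ref))
```

Splitting on whitespace and reducing with `any`/`all` keeps the parsing trivial, which seems to be the point of the design.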

One question is what to do with subsequent citations. I favor using
allowing and adding subsequent-et-al
options, since much of the citation is the same between first and
subsequent forms. Bruce’s revised schema currently uses , which I believe might result in too much
redundancy.

Simon

Okay, an update on the CSL changes, as I work on implementing them in
Zotero:

  1. Bruce has suggested an agents/events/text model, where dates would
    be incorporated into events. I have no real problem with this model,
    but I think “events” could be somewhat counterintuitive.

I’m pretty open on this. The event thing is to deal with conferences,
hearings, and so forth. The easiest way to test is with real styles
and real data.

  2. Consensus is that short/long forms will be allowed on all fields;
    the parser should implement as many as possible, but need not
    implement all. Optionally, a form=“short” attribute may be applied to
    a tag referencing a macro, and the form will be inherited by
    all child elements.

  3. Conditionals will be revised, so that they take whitespace-
    delimited lists of fields and a match=“any”/match=“all” tag, allowing
    rudimentary AND/OR without making parsing difficult.

One question is what to do with subsequent citations. I favor using
allowing and adding subsequent-et-al
options, since much of the citation is the same between first and
subsequent forms. Bruce’s revised schema currently uses , which I believe might result in too much
redundancy.

Minor correction: it uses “citation-subsequent.”

So for others that might not know, we’re experimenting with a simpler
schema and model. Instead of explicit citation and bibliography
elements, we have a generic “context” element.

... ...

So the question, here, is how to treat subsequent citations; as a
separate context, or as some variant of the citation context?

The “et-al” configuration gets folded into this because some styles
(like APA) have different rules for first and subsequent. So as I’ve
been thinking about it, each context gets its own et al flags; e.g.:

...

Simon may be right that in this case in particular, the two context
definitions would be identical save for the options.

In the other main case – note-based styles – the subsequent template
is likely to be significantly different because it is a short form. So
redundancy is less of an issue IMHO.

Related to all this is the question of how to configure the shortening
that happens with “Doe (1999)”

One idea I’d had was that’s a “citation-short” context, and that this
could tie in with the first/subsequent handling.

All of this is to say it gets a little complicated!

Any thoughts?

Bruce

PS - It’s occurred to me that simple key/values like the above option
elements can just as easily be represented as attributes of the
context element. Any opinions on this?

The “et-al” configuration gets folded into this because some styles
(like APA) have different rules for first and subsequent. So as I’ve
been thinking about it, each context gets its own et al flags; e.g.:

...

Simon may be right that in this case in particular, the two context
definitions would be identical save for the options.

This is probably true for the majority of author-date styles. We
could allow the attribute on tags, or we could
have subsequent-et-al-min and subsequent-et-al-use-first options.
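
A rough sketch of the second alternative in Python, to make the fallback behaviour concrete. Only the two `subsequent-` option names come from the discussion above; the unprefixed `et-al-min` and `et-al-use-first` keys, the helper names, the reading of "min" as the name count at which truncation starts, and the sample values are assumptions for illustration.

```python
def resolve_et_al(options: dict, position: str) -> tuple:
    """Pick the et-al settings for a citation position.

    Assumption: a "subsequent-" prefixed option overrides the plain one
    for subsequent citations and falls back to it when absent.
    """
    prefix = "subsequent-" if position == "subsequent" else ""
    et_al_min = options.get(prefix + "et-al-min", options["et-al-min"])
    use_first = options.get(prefix + "et-al-use-first",
                            options["et-al-use-first"])
    return et_al_min, use_first


def format_names(names: list, et_al_min: int, use_first: int) -> str:
    """Truncate to the first `use_first` names once `et_al_min` is reached."""
    if len(names) >= et_al_min:
        return ", ".join(names[:use_first]) + " et al."
    return ", ".join(names)


# Illustrative values only, not taken from any real style.
options = {"et-al-min": 6, "et-al-use-first": 1,
           "subsequent-et-al-min": 3, "subsequent-et-al-use-first": 1}
names = ["Doe", "Smith", "Jones"]
for position in ("first", "subsequent"):
    print(position, "->",
          format_names(names, *resolve_et_al(options, position)))
```

With this shape a single citation template serves both positions, and only the options differ.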

In the other main case – note-based styles – the subsequent template
is likely to be significantly different because it is a short form. So
redundancy is less of an issue IMHO.

True. But it doesn’t really make sense to go this route when using an
instead doesn’t make the CSL any more complicated in any
situation, but does remove redundancy in author-date styles (which
form the majority of all styles). We could also allow a citation-
subsequent context or tags in the citation context.

Related to all this is the question of how to configure the shortening
that happens with “Doe (1999)”

One idea I’d had was that’s a “citation-short” context, and that this
could tie in with the first/subsequent handling.

Are there styles that use anything besides the year (disambiguated)
in parentheses for this?

PS - It’s occurred to me that simple key/values like the above option
elements can just as easily be represented as attributes of the
context element. Any opinions on this?

Well, if we want to add the position tag to options, then obviously
this won’t work. Otherwise, it depends on the number of options we
end up having.

Simon

The “et-al” configuration gets folded into this because some styles
(like APA) have different rules for first and subsequent. So as I’ve
been thinking about it, each context gets its own et al flags; e.g.:

...

Simon may be right that in this case in particular, the two context
definitions would be identical save for the options.

This is probably true for the majority of author-date styles. We
could allow the attribute on tags, or we could
have subsequent-et-al-min and subsequent-et-al-use-first options.

I guess if we go this way I’d lean toward the second (see below).

In the other main case – note-based styles – the subsequent template
is likely to be significantly different because it is a short form. So
redundancy is less of an issue IMHO.

True. But it doesn’t really make sense to go this route when using an
instead doesn’t make the CSL any more complicated in any
situation, but does remove redundancy in author-date styles (which
form the majority of all styles). We could also allow a citation-
subsequent context or tags in the citation context.

I think we should probably choose one way and go with it. In your
proposal, we’d have something like this for a note-style:

...

Right?

Related to all this is the question of how to configure the shortening
that happens with “Doe (1999)”

One idea I’d had was that’s a “citation-short” context, and that this
could tie in with the first/subsequent handling.

Are there styles that use anything besides the year (disambiguated)
in parentheses for this?

I’m not sure; probably not. Typically this is done in applications with
a “suppress author” flag (though this is not how we must do it).

PS - It’s occurred to me that simple key/values like the above option
elements can just as easily be represented as attributes of the
context element. Any opinions on this?

Well, if we want to add the position tag to options, then obviously
this won’t work. Otherwise, it depends on the number of options we
end up having.

I think it’s a good idea to assume simple key/value data structures in
any case because it maps better to standard programming structures
(hashes), but the advantage of not using the dedicated attributes is
that it’s easier to a) maintain the schema (probably, though
trivially), and b) to parse (again trivially).

Bruce

In the other main case – note-based styles – the subsequent template
is likely to be significantly different because it is a short form. So
redundancy is less of an issue IMHO.

True. But it doesn’t really make sense to go this route when using an
instead doesn’t make the CSL any more complicated in any
situation, but does remove redundancy in author-date styles (which
form the majority of all styles). We could also allow a citation-
subsequent context or tags in the citation context.

I think we should probably choose one way and go with it. In your
proposal, we’d have something like this for a note-style:

...

Right?

Well, the would be enclosed in a / block,
but that’s the basic idea.

Speaking of which, what do you think of adding and <else-if-

tags? Obviously, this behavior can be imitated with <if
(condition) />…, but that’s less than optimal, and
these shouldn’t be hard to implement.

[…]

PS - It’s occurred to me that simple key/values like the above option
elements can just as easily be represented as attributes of the
context element. Any opinions on this?

Well, if we want to add the position tag to options, then obviously
this won’t work. Otherwise, it depends on the number of options we
end up having.

I think it’s a good idea to assume simple key/value data structures in
any case because it maps better to standard programming structures
(hashes), but the advantage of not using the dedicated attributes is
that it’s easier to a) maintain the schema (probably, though
trivially), and b) to parse (again trivially).

True, but adding a position attribute just makes them two-dimensional
hashes or an array of hashes, both of which are just as widely
available as data structures. I suppose I have a slight preference
for tags. In truth, however, it makes very little difference
to me. We just have to get this decided soon so I can finalize my
implementation.
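
To illustrate the data-structure point, here is a small Python sketch of the same options held three ways. The option names, position names, values, and the `lookup` helper are hypothetical and chosen only for the example.

```python
# Flat key/value options, as plain attributes on a context element would give.
flat_options = {"et-al-min": 6, "et-al-use-first": 1}

# With a position dimension, the same data becomes a hash of hashes...
by_position = {
    "first": {"et-al-min": 6, "et-al-use-first": 1},
    "subsequent": {"et-al-min": 3, "et-al-use-first": 1},
}

# ...or, equivalently, an array of hashes that carry the position as a key.
as_list = [
    {"position": "first", "name": "et-al-min", "value": 6},
    {"position": "first", "name": "et-al-use-first", "value": 1},
    {"position": "subsequent", "name": "et-al-min", "value": 3},
    {"position": "subsequent", "name": "et-al-use-first", "value": 1},
]


def lookup(options_list: list, position: str, name: str):
    """Find one option value in the array-of-hashes form (None if absent)."""
    for option in options_list:
        if option["position"] == position and option["name"] == name:
            return option["value"]
    return None


print(by_position["subsequent"]["et-al-min"])
print(lookup(as_list, "subsequent", "et-al-min"))
```

Either nested form answers the same question as the flat one, at the cost of one extra key.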

Simon

Simon Kornblith wrote:

Well, the would be enclosed in a / block,
Oops; yes.
but that’s the basic idea.

Speaking of which, what do you think of adding and <else-if-

tags? Obviously, this behavior can be imitated with <if
(condition) />…, but that’s less than optimal, and
these shouldn’t be hard to implement.

I was thinking about that as well.

But what’s the practical argument for why we might need it? I’m looking
for something a little more specific than “less than optimal,” in other words,
as I prefer not to add something unless we can demonstrate a real benefit.

In this case, you could do:

... ...

… or:

...

… but that hardly qualifies as what I’d call “real benefit.” But maybe I’m missing some obvious use case?

PS - It’s occurred to me that simple key/values like the above option
elements can just as easily be represented as attributes of the
context element. Any opinions on this?

Well, if we want to add the position tag to options, then obviously
this won’t work. Otherwise, it depends on the number of options we
end up having.

I think it’s a good idea to assume simple key/value data structures in
any case because it maps better to standard programming structures
(hashes), but the advantage of not using the dedicated attributes is
that it’s easier to a) maintain the schema (probably, though
trivially), and b) to parse (again trivially).

True, but adding a position attribute just makes them two-dimensional
hashes or an array of hashes, both of which are just as widely
available as data structures. I suppose I have a slight preference
for tags. In truth, however, it makes very little difference
to me. We just have to get this decided soon so I can finalize my
implementation.
Let’s just go with my hunch that the option tags are slightly better.

Bruce

It would probably only get used in situations where, if something
isn’t in the reference, something else needs to be added to a
different part of the reference. It might be useful for, e.g., , but this can be mimicked with . I suppose it’s not truly
necessary, since none of the major styles need anything like it and
the behavior can be reproduced by other means.

Simon

Simon Kornblith wrote:

It would probably only get used in situations where, if something
isn’t in the reference, something else needs to be added to a
different part of the reference. It might be useful for, e.g., , but this can be mimicked with .
Right, and I wanted to mention, too, that the advantage of the if/then
structure is that you can tie formatting not to types, but to properties
(which I believe in general is more robust).

I’d say, then, let’s not go with that. If you find some situation where
it would be really useful, let us know.

Bruce