experimental "simon" version

Here’s some notes on the sort of approach Simon suggested, something
which I’d thought of before, but avoided fully doing. I still think
substitution needs to be an attribute on author/contributor.

Note, I went back to contributor here precisely because it makes things
simpler (fewer default templates). I could imagine again doing the same
with locators, if this seemed like a better approach.

One trick if we did do this is some rules for this sort of
inheritance/override mechanism. For example, would we say that if one
of these compound elements had any child elements, then it fully
overrides the default? What about attributes? I would not expect, for
example, for defaults to typically include prefix and suffix
attributes. What happens if they do, and a local template includes font
styling?

Bruce

<?xml version="1.0" encoding="UTF-8"?>

Here’s some notes on the sort of approach Simon suggested, something
which I’d thought of before, but avoided fully doing. I still think
substitution needs to be an attribute on author/contributor.

I would suggest that, if we take this route, we add an element, or something of the like (and maybe a complementary
element, but that doesn’t seem as crucial). Otherwise, we can’t
cite author-less resources in-text.

Note, I went back to contributor here precisely because it makes things
simpler (fewer default templates). I could imagine again doing the same
with locators, if this seemed like a better approach.

I suppose that makes sense. We would get the same amount of versatility
either way, so whatever is simpler is better.

One trick if we did do this is some rules for this sort of
inheritance/override mechanism. For example, would we say that if one
of these compound elements had any child elements, then it fully
overrides the default? What about attributes? I would not expect, for
example, for defaults to typically include prefix and suffix
attributes. What happens if they do, and a local template includes font
styling?

I can come up with at least four ways of handling this:

  1. If a compound element has children, those children override the default
    automatically, but attributes on the default are ignored. In this case,
    would only take compound elements.

  2. We adopt the same stance with respect to compound elements as in 1, but
    attributes are the union of the two sets, with local attributes overriding
    default attributes. To stop a local tag from applying a prefix or suffix on
    the default, one could then use:

  3. We adopt the same stance with respect to compound elements as in 1, but
    we apply both sets of prefixes, suffixes, and other attributes. We then add
    an “override” attribute, which prevents the default attributes from
    applying:

  4. We extend the “override” attribute introduced in 3, allowing
    {“attributes” | “children” | “all”} as possible values. The problem here is
    we’d need to define some behavior if a compound element had children but no
    “override” attribute.
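
As a rough illustration of the second option only, here is a minimal Python
sketch of how a processor might combine a default element with a local one.
The function names and the attribute values are hypothetical, chosen for the
example; nothing here is part of the schema.

```python
def merge_attributes(default_attrs, local_attrs):
    """Union of both attribute sets; a local value wins on conflict."""
    merged = dict(default_attrs)
    merged.update(local_attrs)
    return merged


def resolve_children(default_children, local_children):
    """Any local children replace the default's children entirely."""
    return list(local_children) if local_children else list(default_children)


# Hypothetical default and local definitions of one compound element.
default_attrs = {"prefix": "(", "suffix": ")"}
local_attrs = {"prefix": "[", "font-style": "italic"}

merged = merge_attributes(default_attrs, local_attrs)
children = resolve_children(["name", "role"], [])
```

In this sketch the local prefix replaces the default one, the default suffix
survives, and the local font styling is added. Because the local element has
no children, the default children are kept.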

I agree that we need rules, but the exact approach doesn’t matter much to
me. All of these options are equally flexible, so it’s a question of
intuitiveness, consistency, and simplicity.

Simon

Here’s some notes on the sort of approach Simon suggested, something
which I’d thought of before, but avoided fully doing. I still think
substitution needs to be an attribute on author/contributor.

I would suggest that, if we take this route, we add an element, or something of the like (and maybe a complementary
element, but that doesn’t seem as crucial). Otherwise, we can’t
cite author-less resources in-text.

Why?

The logic for citation formatting in author-year styles is, as I am
writing in a response to Johan, fully dependent on the bibliographic
formatting. Basically, when you have an author as the first element
(which is effectively always true in author-year styles at least), you
are saying “print the field that represents the sort key” (it’s not the
key per se, because it may be formatted differently).

So from a processing standpoint you first process the reference list
and work out all the year suffixes, substitutions, etc. To format the
citations, you need to look up that information in the reference list.
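
The two-pass order described above can be sketched roughly as follows in
Python. This is an illustration under assumptions, not how citeproc is
written: the dictionary keys ("id", "name", "year") and both function names
are made up for the example.

```python
from collections import defaultdict
from string import ascii_lowercase


def assign_year_suffixes(references):
    """First pass: walk the (already sorted) reference list and give
    'a', 'b', ... to entries sharing the same name and year.
    Only handles up to 26 entries per group."""
    groups = defaultdict(list)
    for ref in references:
        groups[(ref["name"], ref["year"])].append(ref["id"])
    suffixes = {}
    for ids in groups.values():
        for i, ref_id in enumerate(ids):
            suffixes[ref_id] = ascii_lowercase[i] if len(ids) > 1 else ""
    return suffixes


def format_citation(ref, suffixes):
    """Second pass: a citation looks up what the first pass worked out."""
    return f"({ref['name']} {ref['year']}{suffixes[ref['id']]})"


references = [
    {"id": "doe1", "name": "Doe", "year": 1999},
    {"id": "doe2", "name": "Doe", "year": 1999},
    {"id": "smith1", "name": "Smith", "year": 2001},
]
suffixes = assign_year_suffixes(references)
citations = [format_citation(ref, suffixes) for ref in references]
```

The point of the sketch is only the ordering: the citation formatter cannot
run until the reference list has been processed.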

I agree that we need rules, but the exact approach doesn’t matter much to
me. All of these options are equally flexible, so it’s a question of
intuitiveness, consistency, and simplicity.

OK, shall I put a branch version of the schema in the repo to test? If
we’re all happy with it, then I can move it to the trunk.

Bruce

Clearly, we need some way to handle this case of the author-less article,
which, with the growth of the web, isn’t as rare as it used to be. I suppose
we could use some kind of element, but provides
the additional advantage that it could conceivably be useful for other
purposes.

If, after we translate a wide range of different styles into CSL, it turns
out is unnecessary, we can kill it, but, when confronted
with a case that the current schema can’t handle, I would suggest we search
for a proactive, general solution.

Larry Wall said that Perl was designed to “make the easy things easy and the
hard things possible.” I think we should look at CSL the same way. It should
make common author-date styles easy to model, but it shouldn’t prevent
someone from modeling a more idiosyncratic system.

Simon

Clearly, we need some way to handle this case of the author-less article,
which, with the growth of the web, isn’t as rare as it used to be. I suppose
we could use some kind of element, but provides
the additional advantage that it could conceivably be useful for other
purposes.

I’m just not sure how this would work though. And it should be
consistent between citation and bibliography of course. I guess, then,
you’re objecting to the alternate and secondary-alternate attributes?

FWIW, I use a lot of sources that are author-less (indeed I published a
book full of archival document and news article references, where the
citations were processed with citeproc), so it’s not like the system
isn’t designed with that in mind.

It’s just that I don’t think the substitution in citations is
independent of the bibliography (well, except for when there is no
bibliography of course, as in some note styles), and it sounds like
you’re arguing they are.

If, after we translate a wide range of different styles into CSL, it turns
out is unnecessary, we can kill it, but, when confronted
with a case that the current schema can’t handle, I would suggest we search
for a proactive, general solution.

True, but the schema I suggested would work. Processing logic for
author-date citations would be:

if author then author
else author_substitute

… where author_substitute is a parameter of the reference (not the
citation), and is determined by first looking for the “alternate” field
and second for the “secondary-alternate.”
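
Spelled out a little more, the lookup might look like the Python sketch below.
The field names passed as "alternate" and "secondary-alternate" ("title" and
"container-title" here) are placeholders for whatever a style specifies, and
the sample reference is invented.

```python
def author_substitute(reference, alternate, secondary_alternate):
    """Return the first non-empty substitute field, or None if neither
    the alternate nor the secondary-alternate field has a value."""
    for field in (alternate, secondary_alternate):
        value = reference.get(field)
        if value:
            return value
    return None


def citation_name(reference, alternate="title",
                  secondary_alternate="container-title"):
    """if author then author, else author_substitute."""
    return reference.get("author") or author_substitute(
        reference, alternate, secondary_alternate
    )


signed = {"author": "Doe", "title": "Some Article"}
unsigned = {"title": "Some Article", "container-title": "Some Newspaper"}
untitled = {"container-title": "Some Newspaper"}
```

Since the substitute is worked out from the reference alone, the citation and
the bibliography entry end up with the same name.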

Larry Wall said that Perl was designed to “make the easy things easy and the
hard things possible.” I think we should look at CSL the same way. It should
make common author-date styles easy to model, but it shouldn’t prevent
someone from modeling a more idiosyncratic system.

Absolutely agree. If you want to suggest markup for the above, as well
as address why the current attribute-based approach is inadequate, I
can add it.

Bruce

I think the current attribute-based approach is inadequate because it
doesn’t allow enough specification on title formatting. Given the CSL you
provided, how does one specify to what length the title should be truncated
(if it should be truncated at all)? What if a format specifies that there
should be ellipses after the title if it’s used in an in-text citation? It
seems to me (although I could be wrong) that there’s no way to do this in
the current CSL. We could add yet more attributes to the element,
but it seems to me that elements are the best way of handling this problem.

As I think about it, can’t handle absolutely everything
either. You’re right that the substitution is reference-specific, because,
for example, an author-less book title needs to be italicized, while an
author-less article title needs to be in quotes. One option would be to
ignore author-less books, considering that should be a very rare
circumstance, but I’d prefer a solution that could take care of all of these
issues together.

Simon

It’s worth asking now, are these requirements? Do formats ever say “you
shall shorten titles in such and such way”, and then specify different
titles be truncated in different ways?

I’ve certainly come across “use shortened title”, in which case that’s
up to either the user to provide in their data, or the software to
handle truncation (by default I’d assume stripping the subtitle).
Hence:

<title form="short"/>

If I look in Chicago – which is pretty much THE definitive citation
manual – I see stuff like this:

Oops, ahem, I’ll keep the above, but:

“The most common short form consists of the last name of the author and
the main title of the work cited, usually shortened if more than four
words …”

But it’s worth noting that the examples they give would be hard for
software to handle, and I’d prefer to leave it to users:

long: Poverty and Inequality in Latin America
short: Poverty and Inequality

Software would be rather dumb and result in "Poverty and Inequality in"
presumably.
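
To make the "dumb" behavior concrete, here is a small Python sketch of the two
mechanical strategies mentioned above: cutting at four words and stripping the
subtitle. The function names are made up, and treating ":" as the subtitle
separator is an assumption.

```python
def truncate_words(title, max_words=4):
    """Naive shortening: keep the first max_words words, nothing more."""
    return " ".join(title.split()[:max_words])


def strip_subtitle(title, separator=":"):
    """Assumed default: drop everything after the first separator."""
    return title.split(separator, 1)[0].strip()


naive = truncate_words("Poverty and Inequality in Latin America")
no_subtitle = strip_subtitle("Main Title: A Subtitle")
```

The naive cut yields "Poverty and Inequality in", which is why a user-supplied
short title seems the safer route.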

So we’re left with these questions:

  1. does CSL need to define shortening rules?
  2. if yes, can they be global? Can they, in other words, be defined
    independent of templating and substitution rules?

It’s worth noting that the short form above is for subsequent citations
common in note styles, which often do not have bibliographies. So there
you’d just do:

The substitution isn’t important, because there’s no bibliography.

In any case, I don’t think it’s a big deal to switch from using an attribute
to an element for substitution, and will try it.

Bruce

It’s worth asking now, are these requirements? Do formats ever say “you
shall shorten titles in such and such way”, and then specify different
titles be truncated in different ways?

I’ve certainly come across “use shortened title”, in which case that’s
up to either the user to provide in their data, or the software to
handle truncation (by default I’d assume stripping the subtitle).
Hence:

<title form="short"/>

If I look in Chicago – which is pretty much THE definitive citation
manual – I see stuff like this:

Oops, ahem, I’ll keep the above, but:

“The most common short form consists of the last name of the author and
the main title of the work cited, usually shortened if more than four
words …”

But it’s worth noting that the examples they give would be hard for
software to handle, and I’d prefer to leave it to users:

long: Poverty and Inequality in Latin America
short: Poverty and Inequality

Software would be rather dumb and result in “Poverty and Inequality in”
presumably.

It’s relatively simple to write a smarter shortening algorithm, but I agree
that this need not be part of the CSL specification.
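
One possible "smarter" approach, sketched in Python purely as an illustration:
cut at four words, then drop any trailing function words. The stopword list
and the function name are assumptions for the example, not a proposal for the
specification.

```python
# Hypothetical list of words a short title should not end on.
TRAILING_STOPWORDS = {
    "a", "an", "and", "the", "in", "of", "on", "for", "to", "with", "or",
}


def shorten_title(title, max_words=4):
    """Truncate to max_words, then trim dangling function words."""
    words = title.split()
    if len(words) <= max_words:
        return title
    kept = words[:max_words]
    while len(kept) > 1 and kept[-1].lower() in TRAILING_STOPWORDS:
        kept.pop()
    return " ".join(kept)


short = shorten_title("Poverty and Inequality in Latin America")
```

For the Chicago example this gives "Poverty and Inequality", but it is easy to
find titles where a rule like this still picks a poor short form.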

So we’re left with these questions:

  1. does CSL need to define shortening rules?
  2. if yes, can they be global? Can they, in other words, be defined
    independent of templating and substitution rules?

Hmm. #1 is a particularly difficult question to answer. There’s no question
that different styles have different formatting rules, but the differences
between MLA, APA, and Chicago are so obscure that it may not be worth trying
to handle all of them.

For example, Chicago says that, given an unsigned newspaper article, “the
name of the newspaper stands in place of the author.” This is a more
unlikely circumstance that we don’t necessarily have to support (and Chicago
suggests that, for most purposes, newspaper articles don’t need
bibliographic citations anyway), but on the other hand, I would conjecture
that MLA and APA would have you use the article title (although I can’t
verify this, since I don’t have the specific style guides, and the online
summaries are incomplete in this respect).

There’s also the situation of two authors with the same last name. MLA, APA,
and Chicago have you put an initial before (K. Johnson 2001), or specify the
whole name after, while CBE/CSE has you put all initials after (Johnson KQ
2001).
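
The two conventions differ only in where the initials go, as this small Python
sketch shows. The function names are made up, and the given names in the
sample call are invented to match the initials in the examples above.

```python
def initials(given_names):
    """First letter of each given name, upper-cased."""
    return [name[0].upper() for name in given_names.split()]


def initial_before(family, given_names, year):
    """First initial before the family name, as in (K. Johnson 2001)."""
    return f"({initials(given_names)[0]}. {family} {year})"


def initials_after(family, given_names, year):
    """All initials after the family name, as in (Johnson KQ 2001)."""
    return f"({family} {''.join(initials(given_names))} {year})"


before = initial_before("Johnson", "Karen Quinn", 2001)
after = initials_after("Johnson", "Karen Quinn", 2001)
```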

So, I guess this goes back to the question of whether these citations should
be absolutely perfect in all cases or just “good enough.” Honestly, after
looking at all of the rules, I’m beginning to think that no matter how well
we resolve the problem of modeling this shortening in the schema, it’s
still going to be difficult to find and implement all of the shortening
rules for each style.

Simon

For example, Chicago says that, given an unsigned newspaper article, “the
name of the newspaper stands in place of the author.” This is a more
unlikely circumstance that we don’t necessarily have to support (and Chicago
suggests that, for most purposes, newspaper articles don’t need
bibliographic citations anyway), but on the other hand, I would conjecture
that MLA and APA would have you use the article title (although I can’t
verify this, since I don’t have the specific style guides, and the online
summaries are incomplete in this respect).

Actually, I think this is a pretty easy rule to write (indeed, I’ve
been using it!):

This works because all journal articles, for example, have authors, so
the substitution doesn’t happen. And it works otherwise.

There’s also the situation of two authors with the same last name. MLA, APA,
and Chicago have you put an initial before (K. Johnson 2001), or specify the
whole name after, while CBE/CSE has you put all initials after (Johnson KQ
2001).

I’ve not implemented this feature, so hadn’t looked at it closely. For
the latter case, is this not how names are formatted in the
bibliography?

So, I guess this goes back to the question of whether these citations should
be absolutely perfect in all cases or just “good enough.” Honestly, after
looking at all of the rules, I’m beginning to think that no matter how well
we resolve the problem of modeling this shortening in the schema, it’s
still going to be difficult to find and implement all of the shortening
rules for each style.

My hunch is we’re fine to leave things as they are with the notion of a
short title, and leave it up to users and software to figure out what
that is.

Bruce