Sorting

Simon_Kornblith · July 11, 2007, 10:04pm

In my current APA sample style, I have:

This won’t work right in many cases. For most author-date styles we
need to sort first by some “citation key,” which is typically author,
editor, translator, or title, chosen in that order. (If the author
exists, we should not sort by editor.) Regardless of citation key, we
then need to sort by date. There are a few ways I can think of that
we can solve this problem. The simplest is probably:

A slightly more versatile approach is:

One final approach is:

This would rely on the specified within . While shortest, this third approach is without a
doubt the most complicated to implement.
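
To make the citation-key rule above concrete, here is a minimal sketch in JavaScript. It is not CSL syntax, and the item shape and field names (`author`, `editor`, `translator`, `title`, `year`) are assumptions for illustration only. It picks the first available field as the key, then breaks ties by date.

```javascript
// Hypothetical item shape: { author, editor, translator, title, year }.
// These field names are illustrative and not taken from any CSL schema.
const KEY_FIELDS = ["author", "editor", "translator", "title"];

// Return the first non-empty field, in priority order, as the citation key.
function citationKey(item) {
  for (const field of KEY_FIELDS) {
    if (item[field]) return item[field];
  }
  return "";
}

// Plain code-unit comparison, to keep the sketch locale-independent.
function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort by citation key first, then by year, without mutating the input.
function sortAuthorDate(items) {
  return [...items].sort((a, b) => {
    const byKey = compareStrings(citationKey(a), citationKey(b));
    return byKey !== 0 ? byKey : (a.year || 0) - (b.year || 0);
  });
}

const sorted = sortAuthorDate([
  { author: "Smith", editor: "Adams", year: 2001 },
  { editor: "Brown", year: 1999 },
  { title: "Zebra Handbook", year: 1980 },
  { author: "Smith", year: 1995 },
]);
```

Note that the item with both an author and an editor sorts under "Smith", not "Adams", because the editor is only consulted when there is no author.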

Bruce and others, any opinions?

Simon

Bruce_D_Arcus1 · July 11, 2007, 10:27pm

Simon Kornblith wrote:

In my current APA sample style, I have:

This won’t work right in many cases. For most author-date styles we
need to sort first by some “citation key,” which is typically author,
editor, translator, or title, chosen in that order. (If the author
exists, we should not sort by editor.)

Before we move on, remember that the substitution structure is
intimately bound to the sorting logic in author-date sorting. In short,
the first field in the bib entry is in fact the primary sort key; always.

I see you hint at that below …

Regardless of citation key, we then need to sort by date. There are a few ways I can think of that
we can solve this problem. The simplest is probably:

That doesn’t work for unattributed periodical articles (which often or
usually sort on the periodical title). Right?

Also might not work for anonymous documents and such (where you sort on
“Anonymous” rather than substitute).

A slightly more versatile approach is:

One final approach is:

This would rely on the specified within . While shortest, this third approach is without a
doubt the most complicated to implement.

Bruce and others, any opinions?

OK, so the problem here is that in not having an explicit author element
and moving to relying on attributes more, we make it more difficult
to allow full customization of sorting.

What happens if we don’t allow custom sorting and pre-define the
algorithms, reserving certain macro keywords (like “author”) so it works
reliably? E.g.

 <option name="sort-algorithm" value="author-date"/>

Bruce

Bruce_D_Arcus1 · July 11, 2007, 10:34pm

Actually, if we were going with a reserved macro, I’d probably call it
"creator" and leave the “author” variable.

Bruce

Simon_Kornblith · July 11, 2007, 10:51pm

According to the Chicago Manual of Style (16.83):

Successive entries by two or more authors in which only the first
author’s name is the same are alphabetized according to the
coauthors’ last names.

Brooks, Daniel R., and Deborah A. McLennan. The Nature of Diversity:
An Evolutionary Voyage of Discovery. Chicago: University of Chicago
Press, 2002.
Brooks, Daniel R., and E. O. Wiley. Evolution as Entropy. 2nd ed.
Chicago: University of Chicago Press, 1986.

This is where it gets complicated. If we are comparing names, we have
to inspect each name individually, rather than the full formatted
string. This means that the CSL parser must determine 1) what
variables/terms are used in the author macro and 2) the order in
which they are used, for any given item. We can’t just sort based on
the formatted text string.

As I reflect on it, I realize that this is the most versatile
approach there is, and there’s a good probability we’ll need this
versatility to properly implement some styles. However, from an
implementation standpoint, it’s also the most complicated option.
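
A rough sketch of what comparing names individually could look like, in JavaScript. The `{ family, given }` shape is an assumption, and so is the rule that a shorter list sorts before a longer list it is a prefix of. This only illustrates the comparison itself, not how a processor would work out which names a macro uses.

```javascript
// Names are modelled as { family, given }; this shape is an assumption.
function compareNames(a, b) {
  if (a.family !== b.family) return a.family < b.family ? -1 : 1;
  if (a.given !== b.given) return a.given < b.given ? -1 : 1;
  return 0;
}

// Compare two name lists position by position. If one list is a prefix of
// the other, the shorter list sorts first (an assumed convention).
function compareNameLists(xs, ys) {
  const n = Math.min(xs.length, ys.length);
  for (let i = 0; i < n; i++) {
    const c = compareNames(xs[i], ys[i]);
    if (c !== 0) return c;
  }
  return xs.length - ys.length;
}

// The two entries from the Chicago example above.
const brooks = { family: "Brooks", given: "Daniel R." };
const mclennan = [brooks, { family: "McLennan", given: "Deborah A." }];
const wiley = [brooks, { family: "Wiley", given: "E. O." }];

// Negative result: the McLennan entry sorts before the Wiley entry.
const order = compareNameLists(mclennan, wiley);
```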

Simon

Bruce_D_Arcus1 · July 11, 2007, 11:08pm

Simon Kornblith wrote:

According to the Chicago Manual of Style (16.83):

Successive entries by two or more authors in which only the first
author’s name is the same are alphabetized according to the
coauthors’ last names.

Brooks, Daniel R., and Deborah A. McLennan. The Nature of Diversity:
An Evolutionary Voyage of Discovery. Chicago: University of Chicago
Press, 2002.
Brooks, Daniel R., and E. O. Wiley. Evolution as Entropy. 2nd ed.
Chicago: University of Chicago Press, 1986.

This is where it gets complicated. If we are comparing names, we have
to inspect each name individually, rather than the full formatted
string.

What I do in my XSLT stylesheets is construct a sort-string for the names:

Doe, Jane; Smith, John; Jones, Sarah

So that’s irrespective of how it gets formatted. Wouldn’t that work?
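
For illustration, a sort string like the one above could be built as follows. This is a JavaScript sketch rather than the XSLT in question, and the `{ family, given }` name shape is an assumption.

```javascript
// Build a style-independent sort string from structured names.
// The { family, given } shape is assumed for illustration.
function sortString(names) {
  return names.map((n) => `${n.family}, ${n.given}`).join("; ");
}

const key = sortString([
  { family: "Doe", given: "Jane" },
  { family: "Smith", given: "John" },
  { family: "Jones", given: "Sarah" },
]);
// key is "Doe, Jane; Smith, John; Jones, Sarah"
```

One caveat with this sketch: a plain string comparison of such keys also compares the delimiter characters, so it may not always agree with comparing the names one at a time.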

One of the reasons I’ve been kicking around the notion of modeling this
in the data, and not worry about first and last names so much.

As I reflect on it, I realize that this is the most versatile
approach there is, and there’s a good probability we’ll need this
versatility to properly implement some styles. However, from an
implementation standpoint, it’s also the most complicated option.

So which solution are you now preferring?

Bruce

Simon_Kornblith · July 12, 2007, 1:52am

What I do in my XSLT stylesheets is construct a sort-string for the
names:

Doe, Jane; Smith, John; Jones, Sarah

So that’s irrespective of how it gets formatted. Wouldn’t that work?

If you format names like this and ignore the prefix/suffix/delimiter
on all elements in the macro, then this should work, although we need
to either define the behavior if there are multiple elements
in the author macro, or we need to explicitly prohibit this situation.

One of the reasons I’ve been kicking around the notion of modeling
this
in the data, and not worry about first and last names so much.

As I reflect on it, I realize that this is the most versatile
approach there is, and there’s a good probability we’ll need this
versatility to properly implement some styles. However, from an
implementation standpoint, it’s also the most complicated option.

So which solution are you now preferring?

It seems like sorting based on is the simplest
solution that gets everything right.

Simon

Bruce_D_Arcus1 · July 12, 2007, 2:58am

Simon Kornblith wrote:

What I do in my XSLT stylesheets is construct a sort-string for the
names:

Doe, Jane; Smith, John; Jones, Sarah

So that’s irrespective of how it gets formatted. Wouldn’t that work?

If you format names like this and ignore the prefix/suffix/delimiter
on all elements in the macro, then this should work, although we need
to either define the behavior if there are multiple elements
in the author macro, or we need to explicitly prohibit this situation.

But the sorting key here is irrespective of any style; it’s contingent
on the data.

Think of it this way:

All names will have a sort form and a display form. The sort form is
style-independent. The display form is not.

So I don’t think this part is really that complicated.

One of the reasons I’ve been kicking around the notion of modeling
this
in the data, and not worry about first and last names so much.

As I reflect on it, I realize that this is the most versatile
approach there is, and there’s a good probability we’ll need this
versatility to properly implement some styles. However, from an
implementation standpoint, it’s also the most complicated option.

So which solution are you now preferring?

It seems like sorting based on is the simplest
solution that gets everything right.

OK, but using pre-set sort-algorithms?

I’m not that fond of using syntax in attributes like “macro:creator”.

Bruce

Simon_Kornblith · July 12, 2007, 3:23am

Simon Kornblith wrote:

What I do in my XSLT stylesheets is construct a sort-string for the
names:

Doe, Jane; Smith, John; Jones, Sarah

So that’s irrespective of how it gets formatted. Wouldn’t that work?

If you format names like this and ignore the prefix/suffix/delimiter
on all elements in the macro, then this should work, although we need
to either define the behavior if there are multiple elements
in the author macro, or we need to explicitly prohibit this
situation.

But the sorting key here is irrespective of any style; it’s contingent
on the data.

Think of it this way:

All names will have a sort form and a display form. The sort form is
style-independent. The display form is not.

So I don’t think this part is really that complicated.

Suppose we have:

What makes it complicated is that actually
needs to be processed in a peculiar way. We need to process the title
macro, but ignore the prefix/suffix of the title element within it
(no quotes before or after the title). Alternatively, we could make
“title” a special macro as well, but I’d rather avoid this if
possible to ensure versatility (what if a user uses a macro to format
the container title?).
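
A minimal sketch of that "peculiar" processing, in JavaScript. The macro is modelled as a hypothetical list of nodes, each naming a variable with an optional prefix and suffix; this is not how CSL itself represents macros. The same macro is rendered once for display and once for sorting, and the sort pass drops the prefix/suffix.

```javascript
// Hypothetical model of a macro: a list of nodes, each naming a variable
// and optionally carrying a display-only prefix and suffix.
const titleMacro = [
  { variable: "title", prefix: "\u201C", suffix: "\u201D" },
];

// Render the macro for an item. When forSort is true, prefixes and
// suffixes are ignored, so only the bare variable values remain.
function renderMacro(nodes, item, forSort) {
  return nodes
    .filter((node) => item[node.variable] !== undefined)
    .map((node) => {
      const value = item[node.variable];
      if (forSort) return value;
      return (node.prefix || "") + value + (node.suffix || "");
    })
    .join(forSort ? " " : "");
}

const item = { title: "Evolution as Entropy" };
const display = renderMacro(titleMacro, item, false); // title in quotes
const sortKey = renderMacro(titleMacro, item, true); // bare title
```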

The second question was what if we have:

[...]

Obviously, this is a bit strange, but we either need to prevent it
from happening at the schema level, or we need to specify that we
sort on and then on the constituent fields of . The latter is not especially difficult if we
already have to process macros as above.

One of the reasons I’ve been kicking around the notion of modeling
this
in the data, and not worry about first and last names so much.

As I reflect on it, I realize that this is the most versatile
approach there is, and there’s a good probability we’ll need this
versatility to properly implement some styles. However, from an
implementation standpoint, it’s also the most complicated option.

So which solution are you now preferring?

It seems like sorting based on is the simplest
solution that gets everything right.

OK, but using pre-set sort-algorithms?

I’m not that fond of using syntax in attributes like “macro:creator”.

I think a preset sort algorithm will be fine. If it turns out not
to be, we could also use:

…a variation on the scheme I previously proposed. But we can hold
off on this unless there’s an obvious need.

Simon

Bruce_D_Arcus1 · July 12, 2007, 3:38am

Simon Kornblith wrote:

Think of it this way:

All names will have a sort form and a display form. The sort form is
style-independent. The display form is not.

So I don’t think this part is really that complicated.

Suppose we have:

What makes it complicated is that actually
needs to be processed in a peculiar way. We need to process the title
macro, but ignore the prefix/suffix of the title element within it
(no quotes before or after the title).

OIC. The problem is actual substitution macros other than names. Yes,
this is a little funky. I guess only look for the variables within the
macro then.

Alternatively, we could make “title” a special macro as well, but I’d rather avoid this if
possible to ensure versatility (what if a user uses a macro to format
the container title?).

The second question was what if we have:
[...]
Obviously, this is a bit strange, but we either need to prevent it
from happening at the schema level, or we need to specify that we
sort on and then on the constituent fields of . The latter is not especially difficult if we
already have to process macros as above.

If we have a rule that one can only use variables for sort keys, then
that should solve this problem.

We might also want to restrict such a case.

…

…a variation on the scheme I previously proposed. But we can hold
off on this unless there’s an obvious need.

OK.

Bruce

Bruce_D_Arcus1 · July 12, 2007, 6:52am

Simon Kornblith wrote:

[...]
Obviously, this is a bit strange, but we either need to prevent it
from happening at the schema level …

Perhaps we shouldn’t allow macros to be called from within macros?
Consider this:

Wouldn’t be cool for a style to be able to send a processor into an
infinite loop.

Bruce

Simon_Kornblith · July 12, 2007, 7:05am

I was thinking about this as well. Limiting macros from calling other
macros seems like an unnecessary limitation, however. While it’s
probably not possible to specify in the schema, when we see:

[...]

…we could simply remember that “foo” has already been called. If
"foo" is called again by another tag within “foo” or a
different macro called by “foo,” we can then throw an error. For this
simple example, we see:

foo called
bob called
foo called - error out

This avoids infinite loops and, since our conditionals are relatively
simple and we have no stored variables, it doesn’t restrict anything
else.

Simon

Bruce_D_Arcus1 · July 12, 2007, 7:15am

Simon Kornblith wrote:

I was thinking about this as well. Limiting macros from calling other
macros seems like an unnecessary limitation, however.

But practically speaking, how real a limitation is it? I’m finding it
hard to imagine a use case that really needs to be able to call macros
from within macros.

…we could simply remember that “foo” has already been called. If
“foo” is called again by another tag within “foo” or a
different macro called by “foo,” we can then throw an error. For this
simple example, we see:

foo called

bob called

foo called - error out

This avoids infinite loops and, since our conditionals are relatively
simple and we have no stored variables, it doesn’t restrict anything
else.

I’m not sure this is even possible in XSLT (a functional language)? At
least, I can’t think how to do it.

How would you do it in JS?

Bruce

Simon_Kornblith · July 12, 2007, 7:35am

Simon Kornblith wrote:

I was thinking about this as well. Limiting macros from calling other
macros seems like an unnecessary limitation, however.

But practically speaking, how real a limitation is it? I’m finding it
hard to imagine a use case that really needs to be able to call macros
from within macros.

As I mentioned before, I’m already doing it in the APA style:

…we could simply remember that “foo” has already been called. If
“foo” is called again by another tag within “foo” or a
different macro called by “foo,” we can then throw an error. For this
simple example, we see:

foo called

bob called

foo called - error out

This avoids infinite loops and, since our conditionals are relatively
simple and we have no stored variables, it doesn’t restrict anything
else.

I’m not sure this is even possible in XSLT (a functional language)? At
least, I can’t think how to do it.

How would you do it in JS?

Suppose there’s some function, processLayoutElements(elements,
calledMacros), that takes the children of a tag (or a
or ) and processes them recursively. Then I could have:

processLayoutElements(elements_of_layout, [])
…which, upon encountering the macro “foo,” would then call…
processLayoutElements(elements_of_foo, ['foo'])
…which, upon encountering the macro “bob,” would then call…
processLayoutElements(elements_of_bob, ['foo', 'bob'])
…which, upon encountering the macro “foo” again, would error out,
since “foo” is already in the array.

This could also be implemented with a hash table, although given that
this array would probably never have more than 2-3 items, it doesn’t
seem to matter much. I can’t see why this wouldn’t work in a standard
functional language, since the array itself isn’t changed (only
passed on), but I don’t know all of the idiosyncrasies of XSLT.
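
Here is a runnable sketch of that idea in JavaScript. The macro table and element shapes are made up for illustration, with an element carrying a `macro` property standing in for a macro call. Only the `processLayoutElements(elements, calledMacros)` signature comes from the description above.

```javascript
// Hypothetical macro table: each macro is a list of elements, and an
// element with a `macro` property is a call to another macro.
const macros = {
  foo: [{ text: "a" }, { macro: "bob" }],
  bob: [{ text: "b" }, { macro: "foo" }],
  ok: [{ text: "x" }, { macro: "leaf" }],
  leaf: [{ text: "y" }],
};

// Process elements recursively, erroring out if a macro is called while
// it is already on the chain of macros currently being processed.
function processLayoutElements(elements, calledMacros) {
  let out = "";
  for (const el of elements) {
    if (el.macro !== undefined) {
      if (calledMacros.includes(el.macro)) {
        throw new Error(`macro "${el.macro}" called recursively`);
      }
      // Pass a new array down; the caller's array is never mutated.
      out += processLayoutElements(macros[el.macro], [...calledMacros, el.macro]);
    } else {
      out += el.text;
    }
  }
  return out;
}

// A macro calling another macro is fine as long as there is no loop.
const fine = processLayoutElements([{ macro: "ok" }], []);

// The foo -> bob -> foo chain is caught instead of looping forever.
let loopError = null;
try {
  processLayoutElements([{ macro: "foo" }], []);
} catch (e) {
  loopError = e.message;
}
```

Because each call receives its own copy of the array, sibling macro calls do not see each other's entries, so calling the same macro twice in sequence (rather than recursively) is still allowed.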

Simon

Bruce_D_Arcus1 · July 12, 2007, 2:03pm

OK, let’s grant we could use the feature. But …

Simon Kornblith wrote:

…we could simply remember that “foo” has already been called. If
“foo” is called again by another tag within “foo” or a
different macro called by “foo,” we can then throw an error.

Even if I figure out how to code it in XSLT, we’re still stuck with a
style which effectively has a major bug. This code would simply be there
to avoid crashing the application or process. The styling process would
still not work correctly.

So I don’t think that’s a good solution. I think we need to document
what would be an improper macro call within a macro.

Then it might be possible to catch that condition with, say, a
Schematron assertion.

Bruce

Simon_Kornblith · July 12, 2007, 5:44pm

Well, if you want it to format, it could throw a warning instead of
erroring out and simply exit the loop. Alternatively, it would be
trivial to make Zotero do this validation before executing a CSL,
rather than on the fly, so that all macros get validated. (If we
wanted to, we could require all parsers to do this in the spec, which
should prevent anyone from accidentally releasing a style like this.
I don’t know much about XSLT, but if you have a loop structure and a
list structure, you should be able to implement this. And, since XSLT
is turing-complete, you have something resembling a loop structure
and some way of emulating a list structure.)

The basic rule is: no macro may call itself, either directly or by
calling a chain of macros that calls it. Since CSL’s macros have no
side effects and no arguments, this is equivalent to prohibiting an
infinite loop.

It’s pretty hard to create a loop like this by accident, especially
when you’re usually not getting beyond 2 levels of macros.
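
As a sketch of the validate-before-executing variant: if we first extract which macros each macro calls, the rule above becomes a cycle check on that call graph. The JavaScript below assumes a hypothetical `calls` object mapping each macro name to the names it calls. How that map is extracted from a style is left out.

```javascript
// Find a cycle in a macro call graph using depth-first search.
// `calls` maps a macro name to the names of the macros it calls
// (a hypothetical representation). Returns the cycle as an array of
// names, or null if no macro can reach itself.
function findMacroCycle(calls) {
  const state = new Map(); // unset = unvisited, 1 = on current path, 2 = done

  function visit(name, path) {
    if (state.get(name) === 2) return null;
    if (state.get(name) === 1) {
      // `name` is already on the current path: report the loop.
      return [...path.slice(path.indexOf(name)), name];
    }
    state.set(name, 1);
    for (const callee of calls[name] || []) {
      const cycle = visit(callee, [...path, name]);
      if (cycle) return cycle;
    }
    state.set(name, 2);
    return null;
  }

  for (const name of Object.keys(calls)) {
    const cycle = visit(name, []);
    if (cycle) return cycle;
  }
  return null;
}

const cyclic = findMacroCycle({ foo: ["bob"], bob: ["foo"], leaf: [] });
const acyclic = findMacroCycle({ author: ["names"], names: [], title: [] });
```

A style would pass this check only if the result is null; otherwise the returned chain could go straight into the error message.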

Simon

Bruce_D_Arcus1 · July 12, 2007, 5:54pm

Simon Kornblith wrote:

Well, if you want it to format, it could throw a warning instead of
erroring out and simply exit the loop. Alternatively, it would be
trivial to make Zotero do this validation before executing a CSL,
rather than on the fly, so that all macros get validated. (If we
wanted to, we could require all parsers to do this in the spec, which
should prevent anyone from accidentally releasing a style like this.
I don’t know much about XSLT, but if you have a loop structure and a
list structure, you should be able to implement this. And, since XSLT
is turing-complete, you have something resembling a loop structure
and some way of emulating a list structure.)

Ideally, we achieve this through validation. E.g. it is not a CSL style
unless it validates against the schema.

Now, the question is whether we can validate this …

The basic rule is: no macro may call itself, either directly or by
calling a chain of macros that calls it. Since CSL’s macros have no
side effects and no arguments, this is equivalent to prohibiting an
infinite loop.

OK, Schematron works using xpath; so anything you can express with xpath
you can validate with Schematron. Seems ideal for this circumstance.

macro calling itself; something like:

context: cs:style/cs:macro/
pattern: @name = */@macro
error message: “You cannot call a macro from within a child element.”

That’s not quite right, but close enough. In short, it’s easy.

The chain is going to take much more thought.

Bruce
