Sorting by articulars/infixes

Something popped up on the Zotero forums regarding articular-formatting
(“van”):
http://forums.zotero.org/discussion/3076/et-al-in-italic-/#Item_9

Frank’s reply touches on a related subject, which is still undefined in CSL:
the use of articulars in name sorting. There are some styles that sort on
the complete surname, including the articular (“van Dijken, Hans”), and some
that disregard the articular when sorting (“Dijken, Hans van”). I’d like to
propose a global non-localized option to toggle between these two behaviors,
perhaps “name-sort-with-infix” with boolean values (infix might be a better
term than articular, see: http://www.eogen.com/Tussen).

Can we still do this for CSL 1.0, or is it something for 1.1?

Rintze

Something popped up on the Zotero forums regarding articular-formatting
(“van”):
http://forums.zotero.org/discussion/3076/et-al-in-italic-/#Item_9

Frank’s reply touches on a related subject, which is still undefined in CSL:
the use of articulars in name sorting. There are some styles that sort on
the complete surname, including the articular (“van Dijken, Hans”), and some
that disregard the articular when sorting (“Dijken, Hans van”).

Are you SURE about this; that some styles force you to include all
articulars for sorting?

I very strongly believe that handling of articulars is really more about
the data (the name itself) than it is about particular styles. For
example, the author of the RNG book (Eric van der Vlist) has a Dutch
name that would normally treat “van der” as a non-sorting articular.
However, he’s French, and expects his name to sort on the whole string
(“van der Vlist”).

So in the data format I’ve created for citeproc-py, the articular key
is ALWAYS non-sorting, and if you expect it to be included in sorting,
you need to include it as part of the family name key.
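
To make that rule concrete, here is a minimal Python sketch. It is only an illustration: the key names ("given", "articular", "family") and the sample records are assumptions, not the actual citeproc-py schema.

```python
# Hypothetical name records. The key names are illustrative only and are
# not taken from citeproc-py.
names = [
    {"given": "Hans", "articular": "van", "family": "Dijken"},
    # Particle folded into the family name, so it takes part in sorting.
    {"given": "Eric", "family": "van der Vlist"},
    {"given": "Anna", "family": "Smith"},
]


def sort_key(name):
    """Sort on family name, then given name; the articular key is ignored."""
    return (name["family"].casefold(), name.get("given", "").casefold())


ordered = sorted(names, key=sort_key)
print([n["family"] for n in ordered])
```

With these records the order is Dijken, Smith, van der Vlist: the separate articular never affects the key, while a particle stored inside the family name does.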

I’d like to propose a global non-localized option to toggle between these two behaviors,
perhaps “name-sort-with-infix” with boolean values (infix might be a better
term than articular, see: http://www.eogen.com/Tussen).

I don’t believe “infix” is ever used in English for this example.

Can we still do this for CSL 1.0, or is it something for 1.1?

We need to first clarify the use case.

Bruce

I’d like to propose a global non-localized option to toggle between these two behaviors,
perhaps “name-sort-with-infix” with boolean values (infix might be a better
term than articular, see: http://www.eogen.com/Tussen).

I don’t believe “infix” is ever used in English for this example.

BTW, maybe “article” is more common.

Bruce

I’d like to propose a global non-localized option to toggle between these two behaviors,
perhaps “name-sort-with-infix” with boolean values (infix might be a better
term than articular, see: http://www.eogen.com/Tussen).

I don’t believe “infix” is ever used in English for this example.

BTW, maybe “article” is more common.

In the document linked below, the names for these things just follow
their names as parts of speech: “preposition” for von or de, “article”
for le or la.

http://dublincore.org/documents/1998/02/03/name-representation/

“Infix” is not that common in speech, but surely its meaning is within
reach for the reader, in a spec that makes prominent use of "prefix"
and “suffix”?

Something popped up on the Zotero forums regarding articular-formatting
(“van”):
http://forums.zotero.org/discussion/3076/et-al-in-italic-/#Item_9

Frank’s reply touches on a related subject, which is still undefined in CSL:
the use of articulars in name sorting. There are some styles that sort on
the complete surname, including the articular (“van Dijken, Hans”), and some
that disregard the articular when sorting (“Dijken, Hans van”).

Are you SURE about this; that some styles force you to include all
articulars for sorting?

Did some scratching around on sort orders (doing work avoidance, I’m
afraid), but the search space is heavily polluted by … discussion of
sorting methods used by reference manager software. Found just one
link of some use – in Wikipedia, of course. Sigh.

It’s a personal and a cultural thing, or at least it can be. Jury’s
still out on what styles might require, though.


Something popped up on the Zotero forums regarding articular-formatting
(“van”):
http://forums.zotero.org/discussion/3076/et-al-in-italic-/#Item_9

Frank’s reply touches on a related subject, which is still undefined in
CSL:
the use of articulars in name sorting. There are some styles that sort on
the complete surname, including the articular (“van Dijken, Hans”), and
some
that disregard the articular when sorting (“Dijken, Hans van”).

Are you SURE about this; that some styles force you to include all
articulars for sorting?

Pretty sure. My favorite example:
http://aem.asm.org/cgi/content/full/74/9/2766#REFERENCES
I think English styles generally sort on the whole surname (including
articular/infix/surname prefix/tussenvoegsel). Some supporting material:
http://www.van-diemen-de-jel.nl/Genea/Spelling.html
also
http://blogamundo.net/dev/2008/12/23/dutch-names-and-your-database-columns/
(one of the commenters, Wouter Bolsterlee, is a (Dutch) Gnome
contributor/translator who apparently has quite some experience with this
stuff. We could always bother him if we want more details :P).

I very strongly believe that handling of articulars is really more about
the data (the name itself) than it is about particular styles. For
example, the author of the RNG book (Eric van der Vlist) has a Dutch
name that would normally treat “van der” as a non-sorting articular.
However, he’s French, and expects his name to sort on the whole string
(“van der Vlist”).

So in the data format I’ve created for citeproc-py, the articular key
is ALWAYS non-sorting, and if you expect it to be included in sorting,
you need to include it as part of the family name key.

Maybe the sort-option should be global and localized.

Rintze

Something popped up on the Zotero forums regarding articular-formatting
(“van”):
http://forums.zotero.org/discussion/3076/et-al-in-italic-/#Item_9

Frank’s reply touches on a related subject, which is still undefined in
CSL:
the use of articulars in name sorting. There are some styles that sort
on
the complete surname, including the articular (“van Dijken, Hans”), and
some
that disregard the articular when sorting (“Dijken, Hans van”).

Are you SURE about this; that some styles force you to include all
articulars for sorting?

Pretty sure. My favorite example:
http://aem.asm.org/cgi/content/full/74/9/2766#REFERENCES
I think English styles generally sort on the whole surname (including
articular/infix/surname prefix/tussenvoegsel). Some supporting material:
http://www.van-diemen-de-jel.nl/Genea/Spelling.html
also
http://blogamundo.net/dev/2008/12/23/dutch-names-and-your-database-columns/
(one of the commenters, Wouter Bolsterlee, is a (Dutch) Gnome
contributor/translator who apparently has quite some experience with this
stuff. We could always bother him if we want more details :P).

I don’t have time to weed through this, but it seems the last link
above is not about formatting per se, but about the data, recognizing
that most data formats have no concept of an article.

I still think this is primarily a data problem, and that it’d probably
be helpful if CSL left room for some evolution on this issue.

In English, I don’t think there’s any set rule. I do think,
anecdotally, that we tend to ignore articles, though. For example,
“Alexander von Humboldt” I think usually would show up as “Humboldt,
Alexander” in a list.

But feel free to contact him.

I very strongly believe that handling of articulars is really more about
the data (the name itself) than it is about particular styles. For
example, the author of the RNG book (Eric van der Vlist) has a Dutch
name that would normally treat “van der” as a non-sorting articular.
However, he’s French, and expects his name to sort on the whole string
(“van der Vlist”).

So in the data format I’ve created for citeproc-py, the articular key
is ALWAYS non-sorting, and if you expect it to be included in sorting,
you need to include it as part of the family name key.

Maybe the sort-option should be global and localized.

Maybe; but not until we have real evidence this is needed. Zotero and
other implementations don’t even understand articles, so it seems to
me it’d be best to sort it out (no pun intended!) on that end before
bringing back suggestions of changes in CSL?

Bruce

What more evidence do you need? Do you require style guide instructions, or
are show cases sufficient? Two Dutch journals that disregard articulars when
sorting:

http://www.tijdschriftvoorpsychiatrie.nl/zoeken/download.php?id=2814
"Rijk, R.H. de, & de Kloet, E.R. (2008). Corticosteroid receptor
polymorphisms:
determinants of vulnerability and resilience. European
Journal of Pharmacology, 583, 303-311."

http://www.criminologie.nl/tvc/thema_veelplegers.pdf (e.g. page 67)
“Laan, P.H. van der & A.A.M. Essers (1990) De Kwartaalkursus en recidive,
Arnhem: Gouda Quint (WODC serie onderzoek en beleid, # 99).”

Another American journal that does sort on articulars:

http://www.ecologyandsociety.org/vol14/iss2/art7/ES-2009-2957.pdf
"van der Grift, E. A., and R. Pouwels. 2006.
Restoring habitat connectivity across transport
corridors: identifying high-priority locations for
defragmentation with the use of an expert-based
model. Pages 205–231 in J. Davenport and J. L.
Davenport, editors. The ecology of transportation:
managing mobility for the environment Springer,
Dordrecht, The Netherlands."

Rintze

Maybe; but not until we have real evidence this is needed.

What more evidence do you need? Do you require style guide instructions, or
are show cases sufficient?

No, I want to see some style guide that has unambiguous rules about
sorting with articles.

For example …

Two Dutch journals that disregard articulars when
sorting:

http://www.tijdschriftvoorpsychiatrie.nl/zoeken/download.php?id=2814
"Rijk, R.H. de, & de Kloet, E.R. (2008). Corticosteroid receptor
polymorphisms:
determinants of vulnerability and resilience. European
Journal of Pharmacology, 583, 303-311."

http://www.criminologie.nl/tvc/thema_veelplegers.pdf (e.g. page 67)
“Laan, P.H. van der & A.A.M. Essers (1990) De Kwartaalkursus en recidive,
Arnhem: Gouda Quint (WODC serie onderzoek en beleid, # 99).”

… the sorting is already what Frank and I expect, so that’s fine.

Another American journal that does sort on articulars:

http://www.ecologyandsociety.org/vol14/iss2/art7/ES-2009-2957.pdf
"van der Grift, E. A., and R. Pouwels. 2006.
Restoring habitat connectivity across transport
corridors: identifying high-priority locations for
defragmentation with the use of an expert-based
model. Pages 205–231 in J. Davenport and J. L.
Davenport, editors. The ecology of transportation:
managing mobility for the environment Springer,
Dordrecht, The Netherlands."

We can’t derive any general rules from this. What if "van der Grift"
is a case like Eric, where the details are about the individual (so it
should be formatting this way regardless of style), rather than the
style?

Bruce

Btw, I recognize you have more personal experience with Dutch name
issues. I just want confirmation that a Dutch person with an article
in their name MUST (i.e. it would violate the style guide otherwise)
effectively prepend it to their family name with some styles.

I can confirm that all the examples I gave are of ‘real’ Dutch persons (and
not the rare ambiguous kind like Erik). And clearly the American journals I
linked to want these articles prepended.

I recognize that an option might have a limited scope, and may only be
really useful for Dutch names (especially if the implementation will rely on
a number of standard articles, like “van”, “van der”, etc.).

Rintze

More links:
http://forum.citizendium.org/index.php?action=printpage;topic=2460.0

I can confirm that all the examples I gave are of ‘real’ Dutch persons (and
not the rare ambiguous kind like Erik). And clearly the American journals I
linked to want these articles prepended.

This is what’s not clear to me though; that U.S. journals explicitly
expect this. Where does your clarity on this come from? Is it just a
commonly understood expectation?

I recognize that an option might have a limited scope, and may only be
really useful for Dutch names (especially if the implementation will rely on
a number of standard articles, like “van”, “van der”, etc.).

Articles are, of course, used in a lot of languages: Spanish (“de”, as
in “de Arcos”; my family name in some distant past), and Arabic
(“bin”, as in “bin Laden”) are obvious examples.

I think this intersects with a broader discussion we need to have
about name order. We can probably say default sort order is [“family”,
“suffix”, “given”], but I’m not sure if we stop there, and I’m
not sure what happens to the display of the article part in this
form?

If we added this option, it would mean some different rule; I’m not
sure whether we’d assume the article is prepended to the family name,
or whether the part precedes the family name in the sort order.

In any case, we need to figure out all this. Adding an attribute
parameter for this is easy enough.

Bruce

Although I haven’t found any clear discussion on this in any style guide, it
seems to me most U.S. journal styles just don’t consider the articles as
being something discrete. They’re ‘just’ part of the surname. But I’ll keep
looking.

Rintze

I can confirm that all the examples I gave are of ‘real’ Dutch persons (and
not the rare ambiguous kind like Erik). And clearly the American journals I
linked to want these articles prepended.

This is what’s not clear to me though; that U.S. journals explicitly
expect this. Where does your clarity on this come from? Is it just a
commonly understood expectation?

Presumably Rintze is speaking from immediate personal experience. One
of his examples is his own article, published in an American scientific
journal, which has the references to Dutch authors sorted first on the
preposition part, then on the family name part:
http://aem.asm.org/cgi/content/full/74/9/2766#REFERENCES

There seem to be four possible cases for a style rule:

(1) always use particle in sort
(2) never use particle in sort
(3) want to use particle in sort, defer to personal pref
(4) do not want to use particle in sort, defer to personal pref

It looks like case (4) is illusory. An author who strongly prefers
that his name be sorted on the particle will just treat it as part of
his family name (as Bruce indicates). So that leaves cases (1), (2)
and (3) to deal with.

The Dutch publications he cites are of type (2) (sort on whatever each
person considers to be their family name).

Rintze’s own publication seems to be either of type (1) or of type
(3). Whichever it is, though, a (non-localized) style option is
clearly needed.

Since there are three potential cases, the safest course might be to
provide three values for use with a name-sort-with-infix option (I still
like infix!): “always”, “never”, and “prefer”.

That would leave an unresolved issue over how to signal the preference
of the named person in the data. The preference in favor of sorting
on the particle is easy, it just gets merged with the family name.
The preference against sorting on the particle is harder. Do we
need to make a “no-infix-sort” flag available in the data bundle for
each name?
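
A rough Python sketch of how the three values could behave, purely for discussion. Nothing here is defined in CSL: the option name, the "infix" key, and the per-name "no-infix-sort" flag are all hypothetical, and the sample names are made up.

```python
def infix_sort_key(name, option="never"):
    """Build a sort key under a hypothetical name-sort-with-infix option.

    name is a dict with "family" and "given", an optional "infix", and an
    optional boolean "no-infix-sort" flag (an assumed per-name opt-out).
    """
    infix = name.get("infix", "")
    if option == "always":
        use_infix = bool(infix)
    elif option == "prefer":
        # Sort on the infix unless the named person has opted out.
        use_infix = bool(infix) and not name.get("no-infix-sort", False)
    elif option == "never":
        use_infix = False
    else:
        raise ValueError(f"unknown option: {option!r}")
    family = f"{infix} {name['family']}" if use_infix else name["family"]
    return (family.casefold(), name.get("given", "").casefold())


def sorted_families(names, option):
    """Return the family names in sorted order for the given option."""
    ordered = sorted(names, key=lambda n: infix_sort_key(n, option))
    return [n["family"] for n in ordered]


# Made-up sample data.
names = [
    {"given": "Hans", "infix": "van", "family": "Dijken"},
    {"given": "Piet", "infix": "de", "family": "Zwart", "no-infix-sort": True},
    {"given": "Anna", "family": "Smith"},
]

for option in ("never", "always", "prefer"):
    print(option, sorted_families(names, option))
```

With this data, “never” gives Dijken, Smith, Zwart; “always” gives Zwart, Smith, Dijken; and “prefer” gives Smith, Dijken, Zwart. A person who wants the particle sorted under every style simply has it stored as part of the family name, so no flag is needed for that direction.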

Frank

I can confirm that all the examples I gave are of ‘real’ Dutch persons (and
not the rare ambiguous kind like Erik). And clearly the American journals I
linked to want these articles prepended.

This is what’s not clear to me though; that U.S. journals explicitly
expect this. Where does your clarity on this come from? Is it just a
commonly understood expectation?

Presumably Rintze is speaking from immediate personal experience. One
of his examples is his own article, published in an American scientific
journal, which has the references to Dutch authors sorted first on the
preposition part, then on the family name part:
http://aem.asm.org/cgi/content/full/74/9/2766#REFERENCES

Right, but all sorts of things happen in the publication chain that
are not the result of any clear set of rules. E.g. any one publication
is not any particular evidence of what the journal guidelines specify.

But that aside, this isn’t the most critical issue; I don’t have a
strong opinion on including this parameter, or not. But I do want us
to clarify the precise issues here.

There seem to be four possible cases for a style rule:

(1) always use particle in sort
(2) never use particle in sort
(3) want to use particle in sort, defer to personal pref
(4) do not want to use particle in sort, defer to personal pref

Before we move on, what do you mean by “personal pref”? Which “person”?

It looks like case (4) is illusory. An author who strongly prefers
that his name be sorted on the particle will just treat it as part of
his family name (as Bruce indicates). So that leaves cases (1), (2)
and (3) to deal with.

The Dutch publications he cites are of type (2) (sort on whatever each
person considers to be their family name).

Rintze’s own publication seems to be either of type (1) or of type
(3). Whichever it is, though, a (non-localized) style option is
clearly needed.

Since there are three potential cases, the safest course might be to
provide three values for use with a name-sort-with-infix option (I still
like infix!): “always”, “never”, and “prefer”.

Again, we need to be precise here: what “use” are you speaking of
here? Put differently, are you talking about CSL, or the data input?

That would leave an unresolved issue over how to signal the preference
of the named person in the data. The preference in favor of sorting
on the particle is easy, it just gets merged with the family name.
The preference against sorting on the particle is harder. Do we
need to make a “no-infix-sort” flag available in the data bundle for
each name?

Here you seem to be talking about the data?

I don’t really think this is a problem. If we have “Alexander von
Humboldt,” then I’d call the “von” an “article” (or just leave it out
entirely frankly), which by default means it is not included in the
sort. If the style expects to sort on articles, I’d set that in the
style.

So I see only two cases here, I guess. But it’s the end of my day, and
I’ve been focused on other things, so feel free to set me straight.

Bruce

I can confirm that all the examples I gave are of ‘real’ Dutch persons (and
not the rare ambiguous kind like Erik). And clearly the American journals I
linked to want these articles prepended.

This is what’s not clear to me though; that U.S. journals explicitly
expect this. Where does your clarity on this come from? Is it just a
commonly understood expectation?

Presumably Rintze is speaking from immediate personal experience. One
of his examples is his own article, published in an American scientific
journal, which has the references to Dutch authors sorted first on the
preposition part, then on the family name part:
http://aem.asm.org/cgi/content/full/74/9/2766#REFERENCES

Right, but all sorts of things happen in the publication chain that
are not the result of any clear set of rules. E.g. any one publication
is not any particular evidence of what the journal guidelines specify.

But that aside, this isn’t the most critical issue; I don’t have a
strong opinion on including this parameter, or not. But I do want us
to clarify the precise issues here.

There seem to be four possible cases for a style rule:

(1) always use particle in sort
(2) never use particle in sort
(3) want to use particle in sort, defer to personal pref
(4) do not want to use particle in sort, defer to personal pref

Before we move on, what do you mean by “personal pref”? Which “person”?

The person behind the name: http://kaivonfintel.org/von/

Before we move on, what do you mean by “personal pref”? Which “person”?

The person behind the name: http://kaivonfintel.org/von/

So from the CSL perspective, I think we can say one either has an
article in their name, or they don’t. Articles are never, by default,
included in sorting (though we still need to settle display rules for
them).

What you seem to be suggesting with your notion of “personal
preference” is the ability to say on a one-off basis “sort this
person’s name using the article”. I’m saying this is a non-sequitur in
my definition above; the “van der” in “Eric van der Vlist” is in fact
part of his family name, and so is not an article.

You may be worrying about how this might work in data entry, but I’d
say that’s really not our concern. I don’t think we should be
expecting CSL processors to be scanning strings and trying to deduce
what’s an article, and what’s not. It should be clear in the data.

Bruce

Before we move on, what do you mean by “personal pref”? Which “person”?

The person behind the name: http://kaivonfintel.org/von/

So from the CSL perspective, I think we can say one either has an
article in their name, or they don’t. Articles are never, by default,
included in sorting (though we still need to settle display rules for
them).

What you seem to be suggesting with your notion of “personal
preference” is the ability to say on a one-off basis “sort this
person’s name using the article”. I’m saying this is a non-sequitur in
my definition above; the “van der” in “Eric van der Vlist” is in fact
part of his family name, and so is not an article.

The other way around.

I don’t know what this means.

Bruce

On Tue, Aug 25, 2009 at 5:27 AM, Bruce D’Arcus wrote:

On Mon, Aug 24, 2009 at 4:12 PM, Frank Bennett wrote:

On Mon, Aug 24, 2009 at 11:25 PM, Bruce D’Arcus wrote:

On Sun, Aug 23, 2009 at 8:54 PM, Frank Bennett wrote:

Before we move on, what do you mean by “personal pref”? Which “person”?

The person behind the name: http://kaivonfintel.org/von/

So from the CSL perspective, I think we can say one either has an
article in their name, or they don’t. Articles are never, by default,
included in sorting (though we still need to settle display rules for
them).

What you seem to be suggesting with your notion of “personal
preference” is the ability to say on a one-off basis “sort this
person’s name using the article”. I’m saying this is a non-sequitur in
my definition above; the “van der” in “Eric van der Vlist” is in fact
part of his family name, and so is not an article.

The other way around.

I don’t know what this means.

The opposite of always. Some people insist that their name never be
sorted on the particle, regardless of publisher’s preference or
inclination.

That’s the meaning. As for the point itself, I’ve just discovered
that my trial subscription to the online CMS has not quite expired.
The relevant section is 18.69:

In alphabetizing family names containing particles, the indexer must
consider the individual’s personal preference (if known) as well as
traditional and national usages. Merriam-Webster’s Biographical
Dictionary (bibliog. 4.1) provides a safe guide; library catalogs are
another useful source. Cross-references are often advisable (see
18.16). Note the wide variations in the following list of actual names
arranged alphabetically as they might appear in an index. See also
8.7, 8.11–13.

Beauvoir, Simone de
Ben-Gurion, David
Costa, Uriel da
da Cunha, Euclides
D’Amato, Alfonse
de Gaulle, Charles
di Leonardo, Micaela
Keere, Pieter van den
Kooning, Willem de
La Fontaine, Jean de
Leonardo da Vinci
Medici, Lorenzo de’
Van Rensselaer, Stephen

Chicago occasionally deviates from Webster when a name is invariably
accompanied by a particle and thus likely to be sought by most readers
under the particle—de Gaulle, for example.
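
As a quick check on the list above, the following Python sketch confirms that the thirteen entries, exactly as written, are in plain letter-by-letter order. The comparison rule used here (keep letters only, ignore case) is an assumption made for illustration, not a statement of how CMS defines alphabetizing.

```python
# The index entries as they appear in the quoted list.
index_entries = [
    "Beauvoir, Simone de",
    "Ben-Gurion, David",
    "Costa, Uriel da",
    "da Cunha, Euclides",
    "D’Amato, Alfonse",
    "de Gaulle, Charles",
    "di Leonardo, Micaela",
    "Keere, Pieter van den",
    "Kooning, Willem de",
    "La Fontaine, Jean de",
    "Leonardo da Vinci",
    "Medici, Lorenzo de’",
    "Van Rensselaer, Stephen",
]


def letter_by_letter(entry):
    """Assumed collation: drop everything except letters, then ignore case."""
    return "".join(ch for ch in entry if ch.isalpha()).casefold()


# Scramble the list (here simply by reversing it) and sort it again.
resorted = sorted(reversed(index_entries), key=letter_by_letter)
print(resorted == index_entries)
```

So the variation in the list lies entirely in where each particle is placed within the entry; once that is fixed per name, a single mechanical sort reproduces the order.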

This suggests that Chicago-conformant styles would treat Dutch names
as in Holland, and German names as in Germany. So if no publisher
ever deviates from Chicago rules, then no option is necessary.

We still don’t know whether that is the case, of course.

Frank