Patch for particle sorting

I haven’t had time to look at this, but I’m against any distinction
among particles. A particle is a particle. If it is what you call
“non-dropping” then it is by definition part of the family name.

Does that not work?

No. “de” in the name “W. de Koning” is non-dropping, but it may be appended
in inverted names (“Koning, W. de”). So it’s not a fixed part of the family
name.

I think it adds unreasonable complexity to have typed name parts
(forget about “hints”; that’s a hack to obscure more complex
modeling*), so we might need to talk about this some more.

When may we see the behavior you note above? How common is it?

Do any other implementers have thoughts on this?

Bruce

  * in JSON:

{
  "author": {
    "particle": {
      "dropping": "true",
      "value": "de"
    }
  }
}

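Some additional examples:

http://books.google.com/books?id=SJyp_PS1rSkC&pg=PA107&lpg=PA107&dq=von+braun+particle+name&source=bl&ots=h47frFOp0H&sig=5bTuvhkyNA01Hi7H4db8P6lWhms&hl=en&ei=XneuSqT1Ho_s-AaO2PD0CA&sa=X&oi=book_result&ct=result&resnum=4

and

User Guide | 01. National Geographic Official Brand Assets | Brandfolder

On Mon, Sep 14, 2009 at 6:54 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote: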

On Mon, Sep 14, 2009 at 12:46 PM, Rintze Zelle <@Rintze_Zelle> wrote:

On Mon, Sep 14, 2009 at 6:38 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

I haven’t had time to look at this, but I’m against any distinction
among particles. A particle is a particle. If it is what you call
“non-dropping” then it is by definition part of the family name.

Does that not work?

No. “de” in the name “W. de Koning” is non-dropping, but it may be appended
in inverted names (“Koning, W. de”). So it’s not a fixed part of the family
name.

I think it adds unreasonable complexity to have typed name parts
(forget about “hints”; that’s a hack to obscure more complex
modeling*), so we might need to talk about this some more.

When may we see the behavior you note above? How common is it?


Capitalization of particles, such as de, du, la, von, varies, depending on
the preference of the individual. If this cannot be ascertained, lowercase
the particle (except for La, Le, Les in French names) with the full name or
surname only. Occasionally the particle is dropped entirely.

Vasco da Gama, da Gama
Charles de Gaulle, de Gaulle
Comte de Grasse, de Grasse
Michel de Montaigne, Montaigne
Marquis de Montcalm, Montcalm
Duc de La Rochefoucauld, La Rochefoucauld
Ludwig van Beethoven, Beethoven
Vincent van Gogh, van Gogh
Wernher von Braun, Von Braun
Friedrich von Schiller, Schiller

Rintze

The only minor issue I see is van Gogh and von Braun. All else can be
handled either by attaching the particle to the given name (a particle
per se) or the family name.

In any case, I still want to underline that it’s asking a lot of
people (developers, users) to expect them to model a name with that
sort of complexity. I personally think that a reasonable solution
should ideally work with only a family and given name property, and no
special “hinting”.

Bruce

Sorry, but I’m not following. Do you mean to say that particles attached to
the family name aren’t considered as independent objects? Then how could you
obtain results like “de Gaulle, Charles” and “Gaulle, Charles de” without
modifying your input data? (In this example, “de” is a non-dropping particle,
and some styles might require it to be appended to the name.)

Rintze

Some additional examples:

http://books.google.com/books?id=SJyp_PS1rSkC&pg=PA107&lpg=PA107&dq=von+braun+particle+name&source=bl&ots=h47frFOp0H&sig=5bTuvhkyNA01Hi7H4db8P6lWhms&hl=en&ei=XneuSqT1Ho_s-AaO2PD0CA&sa=X&oi=book_result&ct=result&resnum=4

and

User Guide | 01. National Geographic Official Brand Assets | Brandfolder

Capitalization of particles, such as de, du, la, von, varies, depending on
the preference of the individual. If this cannot be ascertained, lowercase
the particle (except for La, Le, Les in French names) with the full name or
surname only. Occasionally the particle is dropped entirely.

Vasco da Gama, da Gama
Charles de Gaulle, de Gaulle
Comte de Grasse, de Grasse
Michel de Montaigne, Montaigne
Marquis de Montcalm, Montcalm
Duc de La Rochefoucauld, La Rochefoucauld
Ludwig van Beethoven, Beethoven
Vincent van Gogh, van Gogh
Wernher von Braun, Von Braun
Friedrich von Schiller, Schiller

The only minor issue I see is van Gogh and von Braun. All else can be
handled either by attaching the particle to the given name (a particle
per se) or the family name.

Sorry, but I’m not following. Do you mean to say that particles attached to
the family name aren’t considered as independent objects?

Yes.

So from a processor standpoint, you’d get:

{ "family": "van Gogh", "given": "Vincent" }

… and:

{ "family": "Humboldt", "given": "Alexander von" }

We could then in theory adopt Frank’s suggestion that only a lowercase
component of the given name gets treated as a particle.

Or, we could be more explicit:

{ "family": "Humboldt", "given": "Alexander", "particle": "von" }

This wouldn’t work for the von Braun case in terms of changing
capitalization, but there may be other solutions?
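
To make the explicit three-field form concrete, here is a minimal Python sketch of how a processor might render such a record in normal and inverted order. The function name `render` and the flags `inverted` and `append_particle` are hypothetical, chosen only for illustration; nothing here is taken from the CSL specification or any existing processor.

```python
def render(name, inverted=False, append_particle=False):
    """Render a name dict with "family", "given" and optional "particle".

    Hypothetical helper: the flags are illustrative, not CSL options.
    """
    family = name["family"]
    given = name["given"]
    particle = name.get("particle", "")

    if not inverted:
        # Normal order: given, particle, family.
        return " ".join(p for p in (given, particle, family) if p)

    if append_particle:
        # Inverted, particle moved behind the given name.
        tail = " ".join(p for p in (given, particle) if p)
        return f"{family}, {tail}"

    # Inverted, particle kept in front of the family name.
    head = " ".join(p for p in (particle, family) if p)
    return f"{head}, {given}"


humboldt = {"family": "Humboldt", "given": "Alexander", "particle": "von"}
print(render(humboldt))                                       # Alexander von Humboldt
print(render(humboldt, inverted=True))                        # von Humboldt, Alexander
print(render(humboldt, inverted=True, append_particle=True))  # Humboldt, Alexander von
```

Note that a single untyped "particle" field cannot say whether the particle may be dropped; that is exactly the distinction under discussion.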

Then how could you obtain results like “de Gaulle, Charles” and “Gaulle, Charles de” without
modifying your input data? (In this example, “de” is a non-dropping particle,
and some styles might require it to be appended to the name.)

I’d say “de Gaulle” is his family name.

I’m just strongly suggesting that we cannot reasonably expect to
support an independent name-part “particle” and to support two
different types of these pieces.

Bruce

The same goes for all the Dutch names, e.g. our “Willem de Koning”. “de” is
never dropped, and in Dutch styles the “de” is appended to the rest of the
name, while it is prepended for the AEM-style. An excerpt from a Dutch
paper:

http://www.tijdschriftvoorpsychiatrie.nl/zoeken/download.php?id=2814

On Mon, Sep 14, 2009 at 10:23 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

Then how could you obtain results like “de Gaulle, Charles” and
“Gaulle, Charles de” without modifying your input data? (In this example,
“de” is a non-dropping particle, and some styles might require it to be
appended to the name.)

I’d say “de Gaulle” is his family name.


(De Kloet e.a. 2005; Joëls e.a. 2007)

Joëls, M., Karst, H., de Rijk, R., e.a. (2008). The coming out of the brain
mineralocorticoid receptor. Trends in Neurosciences, 31, 1-7.
Kloet, C.S. de, Vermetten, E., Heijnen, C.J., e.a. (2007). Enhanced cortisol
suppression in response to dexamethasone administration
in traumatized veterans with and without posttraumatic
stress disorder. Psychoneuroendocrinology 32, 215-226.

I’m just strongly suggesting that we cannot reasonably expect to
support an independent name-part “particle” and to support two
different types of these pieces.

Even though there are two types of particles in real life? I agree that the
simplest solution is generally preferable, but the whole point of this
exercise is to increase flexibility in name handling, which seems to demand
this level of complexity.

Rintze

Then how could you obtain results like “de Gaulle, Charles” and
“Gaulle, Charles de” without modifying your input data? (In this example,
“de” is a non-dropping particle, and some styles might require it to be
appended to the name.)

I’d say “de Gaulle” is his family name.

The same goes for all the Dutch names, e.g. our “Willem de Koning”. “de” is
never dropped

And just to be clear: this is an exception. Right?

, and in Dutch styles the “de” is appended to the rest of the
name, while it is prepended for the AEM-style. An excerpt from a Dutch
paper:

http://www.tijdschriftvoorpsychiatrie.nl/zoeken/download.php?id=2814

(De Kloet e.a. 2005; Joëls e.a. 2007)

Joëls, M., Karst, H., de Rijk, R., e.a. (2008). The coming out of the brain
mineralocorticoid receptor. Trends in Neurosciences, 31, 1-7.
Kloet, C.S. de, Vermetten, E., Heijnen, C.J., e.a. (2007). Enhanced cortisol
suppression in response to dexamethasone administration
in traumatized veterans with and without posttraumatic
stress disorder. Psychoneuroendocrinology 32, 215-226.

I suppose I’m just being dense, but how does the above demonstrate
your point? Nothing seems out of the ordinary to me. E.g. I would
expect all those names to include particles, and that they would not be
prepended to the family name (notwithstanding, perhaps, de Koning).

As I say, though, I may just be missing something. I hope so, because
we seem to be going in circles ;-)

I’m just strongly suggesting that we cannot reasonably expect to
support an independent name-part “particle” and to support two
different types of these pieces.

Even though there are two types of particles in real life? I agree that the
simplest solution is generally preferable, but the whole point of this
exercise is to increase flexibility in name handling, which seems to demand
this level of complexity.

It’s been my experience that sometimes when it seems a particular
requirement requires a complex solution, it might make sense to
reassess.

Bruce

I went over these issues very carefully with Rintze when we were
working toward the initial two-option solution (options
inverted-name-sort-order and name-display-order). He has managed to
work that down into a single-option solution that covers known use
cases, and that’s a welcome simplification. What seems to be causing
confusion is that, in any solution that addresses real-world known use
cases, the particle must be handled as a particle, in all but a tiny
number of cases. It is possible to lose sight of that if you look
only at display forms; but for each combination of option-state and
data-state, you need to consider both the display order and the sort
key arrangement as a package deal. You do need the particle as a
separate input element to get things working right for both display
and sort keys.

Note also that the examples Rintze provides do include names that
have the particle prepended, following normal Dutch typesetting
conventions, and that presumably Rintze’s intention is to cover these
cases, in the form that he gives in the examples, since that is needed
in his work environment.

I’m just strongly suggesting that we cannot reasonably expect to
support an independent name-part “particle” and to support two
different types of these pieces.

Even though there are two types of particles in real life? I agree that the
simplest solution is generally preferable, but the whole point of this
exercise is to increase flexibility in name handling, which seems to demand
this level of complexity.

It’s been my experience that sometimes when it seems a particular
requirement requires a complex solution, it might make sense to
reassess.

Unfortunately, this is a complex one, and I think Rintze has worked it
down to the simplest solution possible in CSL.

Frank

I think we’re all talking at cross-purposes unfortunately, which is
making an already complex topic impossible to manage.

The conversation today was not about whether there needs to be a
separate particle structure. It was (at least as I understand it)
about whether there need to be TWO kinds of particles.

So we have to ask ourselves about any solution like this: how likely
is it to actually be implemented?

Never mind CSL; how is this supposed to be implemented in Zotero or
Mendeley (in the internal data model and in the UI), or in some common
exchange format?

What happens if a CSL processor is using some existing legacy format for input?

Bruce

So we have to ask ourselves about any solution like this: how likely
is it to actually be implemented?

Never mind CSL; how is this supposed to be implemented in Zotero or
Mendeley (in the internal data model and in the UI), or in some common
exchange format?

Well, as discussed in the proposed spec-entry, you could make the
distinction between dropping and non-dropping particles simply based on
their location in either the given or family name fields. So there isn’t a
strict requirement to change the UI. Internally, the particles should of
course be handled separately. The “common exchange format” could either just
use the given/family name fields (so every app has to redo the particle
parsing), or it could be made a bit more intelligent and support storing of
particles in separate fields.
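
A rough Python sketch of the location-based approach described above: lowercase words at the end of the given field are treated as dropping particles, and lowercase words at the start of the family field as non-dropping particles. The function name `split_particles` and the output keys are assumptions made for illustration, and the lowercase test is a deliberately naive heuristic (it would not handle “Von Braun” or “d’Alembert”, for example).

```python
def split_particles(given, family):
    """Split particles out of plain given/family strings by position.

    Assumption: trailing lowercase words in `given` are dropping
    particles; leading lowercase words in `family` are non-dropping.
    """
    given_words = given.split()
    dropping = []
    # Peel lowercase words off the end of the given name,
    # always leaving at least one word behind.
    while len(given_words) > 1 and given_words[-1].islower():
        dropping.insert(0, given_words.pop())

    family_words = family.split()
    non_dropping = []
    # Peel lowercase words off the front of the family name,
    # always leaving at least one word behind.
    while len(family_words) > 1 and family_words[0].islower():
        non_dropping.append(family_words.pop(0))

    return {
        "given": " ".join(given_words),
        "dropping-particle": " ".join(dropping),
        "non-dropping-particle": " ".join(non_dropping),
        "family": " ".join(family_words),
    }


print(split_particles("Alexander von", "Humboldt"))
print(split_particles("W.", "de Koning"))
```

With this kind of parsing the UI can keep two plain text fields, while the processor still sees each particle as a separate element.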

What happens if a CSL processor is using some existing legacy format for input?

From what I understood, CSL processors generally won’t do much data parsing,
right (e.g. parsing of raw text date strings like “8 October 2008”)? The
same would hold for names. If names aren’t presented in separate given and
family name fields, no particle-logic is used.

Rintze

Just want to go back to this …

I haven’t had time to look at this, but I’m against any distinction
among particles. A particle is a particle. If it is what you call
“non-dropping” then it is by definition part of the family name.

Does that not work?

No. “de” in the name “W. de Koning” is non-dropping, but it may be appended
in inverted names (“Koning, W. de”). So it’s not a fixed part of the family
name.

Examples like this seem to be the tricky case.

  1. “May be appended” in what contexts? A style might require that it
    be so? If yes, what’s the language that specifies that?

  2. does this example sort on “K” or “d”?

Bruce

No. “de” in the name “W. de Koning” is non-dropping, but it may be appended
in inverted names (“Koning, W. de”). So it’s not a fixed part of the family
name.

Examples like this seem to be the tricky case.

  1. “May be appended” in what contexts? A style might require that it
    be so? If yes, what’s the language that specifies that?

This is the behavior that we’re trying to set with the demote-particle
option.

  2. does this example sort on “K” or “d”?

That depends. Copied from the earlier example:

Sort order A
(1) “de Koning”
(2) “W.”

Sort order B
(1) “Koning”
(2) “de”
(3) “W.”
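
The two sort orders can be sketched in Python as follows. This assumes the name is already split into separate fields and models the demote-particle option as a simple boolean, which is a simplification for illustration; `sort_key`, `sorted_names`, and the field names are hypothetical.

```python
def sort_key(name, demote_particle):
    """Build the list of sort key parts for one name.

    Assumption: `name` has "family", "given" and an optional
    "non-dropping-particle"; `demote_particle` is a plain boolean.
    """
    particle = name.get("non-dropping-particle", "")
    if demote_particle:
        # Sort order B: family first, then particle, then given.
        # The particle slot is kept even when empty so that keys
        # of different names stay aligned.
        return [name["family"], particle, name["given"]]
    # Sort order A: particle is part of the first key.
    first = " ".join(p for p in (particle, name["family"]) if p)
    return [first, name["given"]]


def sorted_names(names, demote_particle):
    # Case-insensitive comparison so "de Koning" is not pushed
    # behind every capitalized family name.
    return sorted(
        names,
        key=lambda n: [p.casefold() for p in sort_key(n, demote_particle)],
    )


koning = {"given": "W.", "non-dropping-particle": "de", "family": "Koning"}
print(sort_key(koning, demote_particle=False))  # ['de Koning', 'W.']
print(sort_key(koning, demote_particle=True))   # ['Koning', 'de', 'W.']
```

So the same input sorts on “d” under order A and on “K” under order B, without any change to the stored data.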

Rintze