Names with particles

There has been a flurry of activity on the Zotero forums around the
handling of name particles. The citeproc-js processor is being too
aggressive about forcing capitals on particles, and the lack of
control has caused frustration to users.

Discussion has led to a possible solution. It is simple to describe,
but after implementing it in the processor, I find that it causes
quite a few of the existing tests to break. I have made a tentative
checkin of changes to the affected tests, so that they can be reviewed
by the group.

The rules applied in the proposal are:

(1) The first character of a name particle in first position on the
first-listed name in a bibliography is forced to a capital letter.
(2) A particle not in first position is always forced to lowercase.
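
As a rough illustration only, the two rules might be sketched in Python
as follows. The function name and its two boolean flags are hypothetical
and not taken from citeproc-js, and "first position" is read here as
"the particle comes first in the printed name", which is only one
possible reading of the rule.

```python
def apply_proposed_rules(particle, first_in_name, starts_bibliography_entry):
    """Hypothetical helper: apply rules (1) and (2) to a single particle."""
    if not particle:
        return particle
    if first_in_name and starts_bibliography_entry:
        # Rule (1): force only the first character to a capital letter.
        return particle[0].upper() + particle[1:]
    if not first_in_name:
        # Rule (2): a particle not in first position is forced to lowercase.
        return particle.lower()
    # Neither rule applies (assumption): leave the particle as entered.
    return particle


print(apply_proposed_rules("de", True, True))
print(apply_proposed_rules("Van", False, False))
```

Under this reading, a particle that leads a name elsewhere in the entry
is left exactly as the user typed it.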

Three new tests provide a compact illustration of the behaviour:

https://bitbucket.org/bdarcus/citeproc-test/src/de13966c0d0c8a6fec4005b46ac6a07f3f614ff7/processor-tests/humans/name_ParticleCaps1.txt?at=default
https://bitbucket.org/bdarcus/citeproc-test/src/de13966c0d0c8a6fec4005b46ac6a07f3f614ff7/processor-tests/humans/name_ParticleCaps2.txt?at=default
https://bitbucket.org/bdarcus/citeproc-test/src/de13966c0d0c8a6fec4005b46ac6a07f3f614ff7/processor-tests/humans/name_ParticleCaps3.txt?at=default

(The ParticleCaps3 test includes a quote-escaped family name, to bind
the name particle as part of the name itself. If quote-escaping is off
the map of the specification, I can remove that data from the test.)

The affected tests are linked below. If members of the group have time
to review them, it would be a great help.

Frank

Hmm … doesn’t this necessarily suggest some spec changes? If yes, don’t
we need an issue, with usual documentation (of the use cases, etc.)?

Bruce

On Mon, Mar 18, 2013 at 8:58 PM, Frank Bennett <@Frank_Bennett> wrote:

The tests illustrate use cases.

You want me to repost this as an issue on the specification tracker?

I don’t see where we address capitalization of name-particles at all in the
specs. Which part of the specs do you think this would affect, Bruce?
Maybe we should cover this, but if I see that correctly, citeproc currently
does something that’s not in the specs either.

On Mon, Mar 18, 2013 at 7:09 PM, Frank Bennett <@Frank_Bennett> wrote:

That’s the point: if we’re changing the test suite to respond to
non-documented behavior, then we better document it.

Frank, that’d be my impulse; maybe see what Rintze has to say?

I suggest we first hammer out the behavioral details here on the list
(which seems to have better reach than the GitHub issue tracker). Once
there is consensus, I’ll open a GitHub issue to keep track of the
required textual changes to the spec.

I found some supportive information in
http://www.ntvg.nl/publicatie/huidkanker-van-smeren-tot-snijden/volledig
(an article from a Dutch medical journal that uses Vancouver).
Bibliographic entry 21:

  1. De Vijlder HC, Sterenborg HJCM, Neumann HAM, Robinson DJ, de Haas
    ERM. Light fractionation significantly improves the response of
    superficial basal cell carcinoma to aminolaevulinic acid photodynamic
    therapy: five-year follow-up of a randomized, prospective trial. Acta
    Derm Venereol. 2012;92:641-7

“De” and “de” are both Dutch non-dropping name particles, and, as
Frank has in one of his tests, only the one starting the bibliographic
entry gets capitalized.

I don’t know if things are different in other languages, though.

Rintze

I suggest we first hammer out the behavioral details here on the list
(which seems to have better reach than the GitHub issue tracker). Once
there is consensus,

That’s fine. But I still think we need to be in the habit of clearly
and consistently documenting use cases and requirements in one place,
rather than forcing people to dig through different places to
reconstruct.

Putting forward proposed spec language early, in my view, is more efficient …

I’ll open a GitHub issue to keep track of the
required textual changes to the spec.

I found some supportive information in
http://www.ntvg.nl/publicatie/huidkanker-van-smeren-tot-snijden/volledig
(an article from a Dutch medical journal that uses Vancouver).
Bibliographic entry 21:

  1. De Vijlder HC, Sterenborg HJCM, Neumann HAM, Robinson DJ, de Haas
    ERM. Light fractionation significantly improves the response of
    superficial basal cell carcinoma to aminolaevulinic acid photodynamic
    therapy: five-year follow-up of a randomized, prospective trial. Acta
    Derm Venereol. 2012;92:641-7

“De” and “de” are both Dutch non-dropping name particles, and, as
Frank has in one of his tests, only the one starting the bibliographic
entry gets capitalized.

I don’t know if things are different in other languages, though.

Right.

Bruce

Going back to the notional rules from my original note, they were:

(1) The first character of a name particle in first position on the
first-listed name in a bibliography is forced to a capital letter.

(2) A particle not in first position is always forced to lowercase.

As another data point, I have learned from Stephan De Spiegeleire that
in Belgium his name is alphabetised under “D”, and the particle is
treated as a fixed part of his family name, while publishers in the
Netherlands lowercase the “D”, and sort his name under “S”.

Particles set as a fixed part of the family name are a known issue,
handled in Zotero’s two-field input by enclosing the name in quotes.
The fact that they are not necessarily static in all contexts – that
a publisher might choose to treat one as a tussenvoegsel instead – is a
little awkward with that approach, but I don’t see how it can be made
more refined without unacceptable complexity in the UI.
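
To make the quoting convention concrete, here is a minimal Python
sketch of how a family-name field might be read. It is an assumption
for illustration, not Zotero's or citeproc-js's actual parsing code:
the function name is hypothetical, and treating leading all-lowercase
words as the particle is a simplification.

```python
def parse_family_field(field):
    """Hypothetical parser: return (non_dropping_particle, family_name)."""
    field = field.strip()
    # A name wrapped in double quotes is kept whole: the particle is
    # treated as a fixed part of the family name.
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        return "", field[1:-1]
    words = field.split()
    n = 0
    # Assumption: leading lowercase words form the particle; the last
    # word always stays in the family name.
    while n < len(words) - 1 and words[n].islower():
        n += 1
    return " ".join(words[:n]), " ".join(words[n:])


print(parse_family_field('"De Spiegeleire"'))
print(parse_family_field("van der Berg"))
```

The sketch also shows the awkwardness described above: once the name
is quoted, nothing downstream can decide to treat the "De" as a
separable particle for a publisher that wants it that way.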

While in this particular case the two rules work out okay, I think
that maybe rule (2) can be dropped. It would be useful only if this
combination is desired:

citation: (Smith, J. & Van Jones, B.)
bibliography: John Smith and Brenda van Jones

Rule (2) could produce this result (if the ambiguous "first position"
modifier in the rule is taken to mean “first position in the printed
name, wherever it occurs in the citation”). I don’t think we’ve seen
evidence this combination is ever wanted, though.

Frank

Going back to the notional rules from my original note, they were:

(1) The first character of a name particle in first position on the
first-listed name in a bibliography is forced to a capital letter.

We currently have no good way to explicitly encode this in CSL, so
implicit behavior might be the easiest way to handle it. There is
text-case="capitalize-first", but this can only be set on
cs:name-part, which would affect all names, not just the first one.
(See http://citationstyles.org/downloads/specification.html#text-case)

While in this particular case the two rules work out okay, I think
that maybe rule (2) can be dropped. It would be useful only if this
combination is desired:

citation: (Smith, J. & Van Jones, B.)
bibliography: John Smith and Brenda van Jones

I don’t think we need this. (for Dutch)

Rintze

What I had found when investigating particles was that the capitalization of particles varies. Dropping particles seem to always be lowercase, but for the non-dropping part, the rule is different for different countries.

Here is what we have, where the first string is the particle in lowercase, followed by the dropping part and then the non-dropping part. I am not 100% sure of all that, but I am sure this can be combined with existing knowledge. I don’t see much way around listing all the cases and figuring out the rules for each particle. I don’t think there can be a general rule about capitalization, I am afraid.

Charles

// spain (??)
@"al", @"al", @"",
@"dos", @"dos", @"",
@"el", @"el", @"",
@"de las", @"de", @"Las",
@"lo", @"lo", @"",
@"les", @"les", @"",

// italy (??)
@"il", @"il", @"",
@"del", @"", @"del",
@"dela", @"dela", @"",
@"della", @"della", @"",
@"dello", @"dello", @"",
@"di", @"", @"Di",
@"da", @"", @"Da",
@"do", @"", @"Do",
@"des", @"", @"Des",
@"lou", @"", @"Lou",
@"pietro", @"", @"Pietro",

// france – checked by Charles
@"de", @"", @"de",
@"de la", @"de", @"La",
@"du", @"du", @"",
@"d'", @"d'", @"",
@"le", @"", @"Le",
@"la", @"", @"La",
@"l'", @"", @"L'",
@"saint", @"", @"Saint",
@"sainte", @"", @"Sainte",
@"st.", @"", @"Saint",
@"ste.", @"", @"Sainte",

// holland
@"van", @"", @"van",
@"van de", @"", @"van de",
@"van der", @"", @"van der",
@"van den", @"", @"van den",
@"vander", @"", @"vander",
@"v.d.", @"", @"vander",
@"vd", @"", @"vander",
@"van het", @"", @"van het",
@"ver", @"", @"ver",
@"ten", @"ten", @"",
@"ter", @"ter", @"",
@"te", @"te", @"",
@"op de", @"op de", @"",
@"in de", @"in de", @"",
@"in 't", @"in 't", @"",
@"in het", @"in het", @"",
@"uit de", @"uit de", @"",
@"uit den", @"uit den", @"",

// germany / austria
@"von", @"von", @"",
@"von der", @"von der", @"",
@"von dem", @"von dem", @"",
@"von zu", @"von zu", @"",
@"v.", @"von", @"",
@"v", @"von", @"",
@"vom", @"vom", @"",
@"das", @"das", @"",
@"zum", @"zum", @"",
@"zur", @"zur", @"",
@"den", @"den", @"",
@"der", @"der", @"",
@"des", @"des", @"",
@"auf den", @"auf den", @"",

// scotland (?)
@"mac", @"", @"Mac",
// not really particles since they are always attached and not used for sorting (?)
// @"mc", @"", @"Mc",
// @"o'", @"", @"O'",

// north africa / middle east (?)
@"ben", @"", @"Ben",
@"bin", @"", @"Bin",
@"sen", @"sen", @"",

// what to do with these ??
// au
// af
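
For illustration, a table like this might be consulted along the
following lines. This is only a sketch in Python under stated
assumptions: the dictionary holds a handful of rows copied from the
list above, and the names PARTICLES and split_particle are
hypothetical, not part of any existing processor.

```python
# A few rows from the table above:
# lowercase particle -> (dropping part, non-dropping part)
PARTICLES = {
    "de la": ("de", "La"),
    "de": ("", "de"),
    "du": ("du", ""),
    "van der": ("", "van der"),
    "von": ("von", ""),
}


def split_particle(raw):
    """Hypothetical lookup: return (dropping, non_dropping) or None."""
    # Normalise case and whitespace so "De  La" matches the "de la" key.
    key = " ".join(raw.lower().split())
    return PARTICLES.get(key)


print(split_particle("De La"))
print(split_particle("VON"))
```

A particle missing from the table returns None here, leaving the
caller to decide what to do with unknown cases.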

Thanks for posting this, Charles. This is very good information to have.

After this post to zotero.org, I removed forced lowercasing for
ordinary order (non-sort order) names in citeproc-js. The processor
will leave the user’s capitalisation alone in all contexts other than
the very start of a bibliography entry:

forums.zotero.org/discussion/28457/arabic-names-with-the-particle-al/
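
A minimal Python sketch of that behaviour, using names from the Dutch
bibliography entry quoted earlier in the thread. This is not the
citeproc-js code: the function names and the (particle, family,
initials) tuple layout are assumptions made for the example, and it
assumes the particle is capitalised at the start of the entry as in
rule (1).

```python
def render_name(particle, family, initials, at_entry_start):
    """Hypothetical renderer for one name in Vancouver-like order."""
    if particle and at_entry_start:
        # Only the very start of the entry forces a capital letter.
        particle = particle[0].upper() + particle[1:]
    # Everywhere else the user's capitalisation is left untouched.
    return " ".join(p for p in (particle, family, initials) if p)


def render_author_list(names):
    """Join names; only the first one sits at the start of the entry."""
    return ", ".join(
        render_name(particle, family, initials, index == 0)
        for index, (particle, family, initials) in enumerate(names)
    )


authors = [("de", "Vijlder", "HC"), ("", "Sterenborg", "HJCM"), ("de", "Haas", "ERM")]
print(render_author_list(authors))  # De Vijlder HC, Sterenborg HJCM, de Haas ERM
```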

Frank