title casing skip words

Hi everyone,
currently, CSL specs define title casing (text-case="title") as uppercasing
the first letter of every word that isn’t on a list of “stop words”.
http://citationstyles.org/downloads/specification.html#title-case-conversion
According to the Chicago Manual of Style, which to the best of my
knowledge has the most thorough rules for title casing, all prepositions
should be lowercased.

I would like to propose (and Frank & Rintze agree) to bring the specs in
line with CMoS on this, i.e. rephrase the rule as “prepositions as well as
the following stop words…”

Current citeproc-js behavior is a hybrid, with some additional
prepositions, not listed in the specs, already included, but others missing.
On the technical side, while we may miss some prepositions and some words
may be ambiguous, I don’t think it’d be hard to find & add a list of
prepositions to title-casing.
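
As a rough illustration of the rule as described above (uppercase the first letter of every word that is not a stop word), here is a minimal sketch. The function name `title_case` and the short `STOP_WORDS` set are assumptions made up for this example; they are not the list from the specs or from citeproc-js, and the sketch ignores any position-based rules.

```python
# Illustrative stop words only; NOT the list from the CSL specs or citeproc-js.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "with", "from"}


def title_case(title: str, stop_words: set[str] = STOP_WORDS) -> str:
    """Uppercase the first letter of every word that is not a stop word."""
    result = []
    for word in title.split():
        if word.lower() in stop_words:
            # Stop words (including any prepositions on the list) stay lowercase.
            result.append(word.lower())
        else:
            # Only the first character is changed; the rest is left untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case("notes from the underground"))
```

Adding prepositions then only means extending the set that is passed in; the casing logic itself does not change.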

Are there any concerns or objections?

Sebastian
original discussion on the Zotero forums:

Aren’t “stop words” just a superset of prepositions?

Seems better to keep the current language, and specify what we mean by the
former.

Bruce

Does CMoS give a comprehensive list of words? If not, we could change
the language in the spec to something more general (e.g. a mention
that we follow CMoS on this topic), and provide a JSON file with all
desired stop-words in either the “schema” or “documentation” repo.
That would be much easier to keep up-to-date.

Rintze
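
To make the JSON-file idea concrete, here is a small sketch of how a processor might read such a list. The key name `"stop-words"` and the file layout are purely hypothetical; nothing in this thread fixes the format.

```python
import json

# Hypothetical file contents; the real file name and layout are not decided here.
EXAMPLE_JSON = '{"stop-words": ["a", "an", "the", "of", "in", "on", "to"]}'


def load_stop_words(text: str) -> set[str]:
    """Parse a JSON document and return its stop words, lowercased."""
    data = json.loads(text)
    return {word.lower() for word in data["stop-words"]}


stop_words = load_stop_words(EXAMPLE_JSON)
print(sorted(stop_words))
```

With this arrangement the list can be updated by editing the file, without touching the wording of the specs.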

I will also add that, looking at this list of English prepositions (is
this correct?), I’m not sure I accept the CMoS description.

Bruce

  1. The CMoS doesn’t provide a _de_scription of title case, it provides a
    _pre_scription. If we want correct title case according to CMoS we’ll need
    to follow it. If someone wants to put in the work to find out if other
    style manuals define other capitalization rules I’d be happy to discuss
    those, but I’ve never seen them clearly defined anywhere but in the CMoS
    (which we also follow otherwise, e.g. by always capitalizing the last word).

  2. CMoS does not give a comprehensive list of words. So while on the
    technical side, prepositions are indeed just a subset of the stop words, I
    don’t think we should prescribe what processors regard as a preposition in
    the specs - not least because that would mean changing the specs every time
    we notice a preposition we didn’t include. I like Rintze’s idea of just
    supplying them in a separate file.
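
The point about always capitalizing the last word can be sketched as follows. This is only an illustration: `title_case_edges` is a made-up name, the stop-word set is supplied by the caller (for instance from a separate file), and the sketch does not attempt to handle subtitles.

```python
def title_case_edges(title: str, stop_words: set[str]) -> str:
    """Lowercase stop words, except when they are the first or last word."""
    words = title.split()
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        if 0 < i < last and word.lower() in stop_words:
            # A stop word in the middle of the title stays lowercase.
            out.append(word.lower())
        else:
            # First word, last word, and all non-stop words get capitalized.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(title_case_edges("the world we live in", {"the", "a", "in"}))
```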

yes, I do indeed think so. Here are the full rules from CMoS, followed by
some examples.
Notice the explicit “regardless of length” in rule three and the lower
casing of “according to” in the 4th example.
As I mention in my original mail, we won’t be able to get everything right
all the time (see example 5), but that’s already the case now.

The conventions of headline style are governed mainly by emphasis and
grammar. The following rules, though occasionally arbitrary, are intended
primarily to facilitate the consistent styling of titles mentioned or cited
in text and notes:

  1. Capitalize the first and last words in titles and subtitles (but see
    rule 7), and capitalize all other major words (nouns, pronouns, verbs,
    adjectives, adverbs, and some conjunctions—but see rule 4).

  2. Lowercase the articles the, a, and an.

  3. Lowercase prepositions, regardless of length, except when they are
    used adverbially or adjectivally (up in Look Up, down in Turn Down, on in
    The

Great; thanks.

I still think the notion of a preposition in their rules may not be
entirely clear. But what seemed to be the emerging consensus on how to
deal with this should allow us to get what we need. If it were me, I’d
only include the obvious core prepositions, and examples they include
in CMoS. E.g. I would not include every word or group of words that’s
listed in Wikipedia.

I would also leave room for the possibility there are other rules on
this in the future. Certainly that’s been the case with things like
shortening number ranges.

BTW, the “stop words” phrase is just something I borrowed from
programming. It may or may not be the best phrase here.

yeah, I’m OK to take some of the more obscure prepositions out and also
remove words that can be both prepositions and something else (“regarding”,
e.g.).

If different rules for title casing emerge, we’ll need to add a
title-case-rule attribute or so, but let’s cross that bridge when we get
there. I’ll put together a suggested list of prepositions to include.

ok, here’s a suggested list of prepositions. I took the one- and two-word lists
from WP and removed anything that I could conceivably see being used in a
non-preposition way. I didn’t include three-word or outdated prepositions
(though some of these are already on the outdated side of things).
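
Since the list contains two-word prepositions, a processor has to check word pairs before single words. The following is a sketch of one way to do that, under the assumption that the list is a plain set of lowercase strings; `lowercase_prepositions` is a hypothetical name, and first/last-word handling is left out to keep the example short.

```python
def lowercase_prepositions(title: str, prepositions: set[str]) -> str:
    """Capitalize words, keeping one- and two-word prepositions lowercase."""
    words = title.split()
    out = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]).lower()
        if i + 1 < len(words) and pair in prepositions:
            # Two-word preposition such as "according to": lowercase both words.
            out.extend(pair.split())
            i += 2
        elif words[i].lower() in prepositions:
            # Single-word preposition.
            out.append(words[i].lower())
            i += 1
        else:
            out.append(words[i][:1].upper() + words[i][1:])
            i += 1
    return " ".join(out)


print(lowercase_prepositions("life according to garp", {"according to", "of"}))
```

Note that a purely list-based check like this cannot tell whether a word is actually being used as a preposition, which is why ambiguous words were left off the list.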

What’s “WP”? Should I go ahead and store these words in a JSON file in
the “documentation” repository, e.g. in a file “prepositions.json”?

Rintze

this is the list from Wikipedia (WP) narrowed down manually by me as
described.
If there are no objections against the list itself, yes, putting these in
the documentation as JSON sounds good.
Frank and other proc maintainers - any issues/wishes for implementing this?

On Mon, Aug 26, 2013 at 9:29 AM, Rintze Zelle <@Rintze_Zelle> wrote:

I will only say that I worry that this is a longer list than I’d like, and
that it may result in a fair bit of complaining from users, who get
unexpected results.

Am not sure I’m right, and I don’t feel really strongly about this; just
sayin’.

I see what you’re saying, but will note that this goes both ways and people
are just as unhappy when Zotero capitalizes things it shouldn’t (and we
have been getting complaints about that, which is what started this
thread). If we get specific complaints, we can certainly fine-tune the list
(I may have overlooked some examples of prepositional words that can be
used otherwise), but I don’t see an alternative to a pretty lengthy list.

And I plan to revise the specification to just refer to the JSON file for
the words that are assumed to be prepositions. That should make maintenance
of the list much easier.

Rintze

I’m fine with implementing the new list in citeproc-js. I’ll wait
until the JSON file is ready, but it’s no problem to set up.

Frank

I’ve set up citeproc-js to use the new list, and installed the new
version in the Zotero CSL processor patch plugin. I’ve added a few
terms that seemed to be missing, mentioned here:

https://gist.github.com/adam3smith/6326169

It would be great if anyone has time to try it out and see if there
are any obvious problems. Here’s the plugin:

http://gsl-nagoya-u.net/http/pub/zotero-processor.xpi

Frank

where is the list you’re using? We never created the JSON file, did we?

On Wed, Oct 16, 2013 at 9:29 AM, Frank Bennett <@Frank_Bennett> wrote:

I linked it in the last mail – it’s the gist you posted earlier in
the discussion.

https://gist.github.com/adam3smith/6326169

There’s no JSON file as far as I know, but I wanted to refactor the
title case code anyway, so I went ahead and used your draft list. The
changes aren’t yet live in Zotero or MLZ, it’s just out for testing so
far.

Frank