title casing

John asked awhile ago about more title-case options in CSL. I haven’t
done that because I wanted to stick to CSS/FO. Here’s what we have now:

  attribute text-transform { "none" | "lowercase" | "uppercase" | "capitalize" }?,

So I guess we’d just need one more option: maybe call it “headline”?
Any other ideas?

Does anybody know if there’s been any discussion of adding this to CSS
such that we could retain compatibility there?

The problem with all this is it depends on text processing. Automatic
capitalization just seems so error prone, maybe more so when you go
from the British (capitalize) to the US (headline) style.

Bruce

John asked awhile ago about more title-case options in CSL. I haven’t
done that because I wanted to stick to CSS/FO. Here’s what we have
now:

  attribute text-transform { "none" | "lowercase" | "uppercase" | "capitalize" }?,

So I guess we’d just need one more option: maybe call it “headline”?
Any other ideas?

Can we call one “sentence” and the other “title”? These are the names
CMS uses (8.166-8.167).

Does anybody know if there’s been any discussion of adding this to CSS
such that we could retain compatibility there?

Doubtful, given how difficult it would be to implement.

The problem with all this is it depends on text processing. Automatic
capitalization just seems so error prone, maybe more so when you go
from the British (capitalize) to the US (headline) style.

Yes. Some processors (those that use BibTeX conventions, where
putting brackets around a letter disables automatic case changing)
might be able to get this right. There will always be exceptions,
both when converting sentence to title style (e.g., iMac -> Imac) and
when converting headline to sentence style (e.g., proper nouns).
There’s no automated solution. These attributes could only be
guidelines that parsers could either attempt to observe or ignore
entirely.

Simon
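
To make the BibTeX-style convention mentioned above concrete, here is a minimal sketch of sentence-casing a title while leaving {}-protected spans alone. The function name `to_sentence_case` is made up for illustration, nested braces are not handled, and this is not how any particular processor is known to do it.

```python
import re

# Matches one non-nested {protected} span; the capturing group makes
# re.split() return the protected spans at the odd indices.
PROTECTED = re.compile(r"(\{[^{}]*\})")

def to_sentence_case(title: str) -> str:
    """Lowercase a title, keeping {braced} spans verbatim (braces dropped)."""
    pieces = []
    for i, part in enumerate(PROTECTED.split(title)):
        if i % 2 == 1:
            pieces.append(part[1:-1])   # protected: copy as typed
        else:
            pieces.append(part.lower()) # unprotected: downcase
    # Capitalize the first character only if the title starts unprotected.
    if pieces and pieces[0]:
        pieces[0] = pieces[0][0].upper() + pieces[0][1:]
    return "".join(pieces)

print(to_sentence_case("The {ABC} of Tao"))  # The ABC of tao
print(to_sentence_case("The ABC of Tao"))    # The abc of tao
```

The second call shows the failure mode under discussion: without information in the record itself, the acronym gets downcased.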

My experience with BibDesk shows that this is a pretty big issue.
It’s good to have the options you describe, but where would someone
specify that a word is an Acronym and therefore should never be
downcased?

It’s tempting to allow a user to have a list of Acronyms, but that
would never work because of ones like IS (Information Science/Systems
etc.)

The information has to be in the citation record.

One option is to force the user to {} protect acronyms in the title,
but people find that very confusing in BibDesk/BibTeX and so just {}
protect the whole title, which then breaks styles. One option that
we’ve considered is having a GUI checkbox next to the title that,
when checked, {} wraps words in ALLCAPS, but that doesn’t work for
Proper Names and wrapping all capitals seems a bit excessive.

Another option is specifying Capitalized tokens in another citation
field, but that seems like even more work for the user.

The other situation is capitalization after things like : and ? or
— but that can be handled.

If CSL is going to alter capitalization, where would it get its
citation-specific exception information from?

–J

John asked awhile ago about more title-case options in CSL. I haven’t
done that because I wanted to stick to CSS/FO. Here’s what we have
now:

  attribute text-transform { "none" | "lowercase" | "uppercase" | "capitalize" }?,

So I guess we’d just need one more option: maybe call it “headline”?
Any other ideas?

Can we call one “sentence” and the other “title”? These are the names
CMS uses (8.166-8.167).

I don’t like the idea of changing the standard CSS/FO names. Title
should retain the CSS/FO meaning (which IIRC is what you call
"sentence") and we can call the other what you want to call it.

Does anybody know if there’s been any discussion of adding this to CSS
such that we could retain compatibility there?

Doubtful, given how difficult it would be to implement.

The problem with all this is it depends on text processing. Automatic
capitalization just seems so error prone, maybe more so when you go
from the British (capitalize) to the US (headline) style.

Yes. Some processors (those that use BibTeX conventions, where
putting brackets around a letter disables automatic case changing)
might be able to get this right. There will always be exceptions,
both when converting sentence to title style (e.g., iMac -> Imac) and
when converting headline to sentence style (e.g., proper nouns).
There’s no automated solution. These attributes could only be
guidelines that parsers could either attempt to observe or ignore
entirely.

Right.

Bruce

My experience with BibDesk shows that this is a pretty big issue.
It’s good to have the options you describe, but where would someone
specify that a word is an Acronym and therefore should never be
downcased?

It’s tempting to allow a user to have a list of Acronyms, but that
would never work because of ones like IS (Information Science/Systems
etc.)

I think this is what Bookends does.

The information has to be in the citation record.

One option is to force the user to {} protect acronyms in the title,
but people find that very confusing in BibDesk/BibTeX and so just {}
protect the whole title, which then breaks styles. One option that
we’ve considered is having a GUI checkbox next to the title that,
when checked, {} wraps words in ALLCAPS, but that doesn’t work for
Proper Names and wrapping all capitals seems a bit excessive.

One idea kicking around the Zotero forums was using CSS classes for
doing what some users are asking for in notes and such: rich text. So
use a class like “species-name” for Latin species names, and specify
italic there.

That could translate well to this context because you could also reuse
semantic xhtml structure like abbr.

But there’s still the question of UI. Maybe when entering such a title
an application could flag possible words that need special treatment,
and the user could select the class or structure to apply to it?

So imagine I enter: “The ABC of Tao” or some such.

The app highlights “ABC” and the user can select an acronym option. Or
maybe it automatically applies it (b/c easy in this case) but the user
can change.
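
A rough sketch of the flagging step, assuming the heuristic is simply "any word with an uppercase letter after its first character". The function name `flag_candidates` and the heuristic itself are assumptions for illustration; a real application would need something smarter and would still leave the final call to the user.

```python
import re

def flag_candidates(title: str):
    """Return (word, offset) pairs for words that may need case protection.

    Heuristic (an assumption, not a rule from any spec): a word is flagged
    if any letter after the first is uppercase, which catches both "ABC"
    and mixed-case names like "iMac".
    """
    flagged = []
    for match in re.finditer(r"[A-Za-z]+", title):
        word = match.group()
        if any(ch.isupper() for ch in word[1:]):
            flagged.append((word, match.start()))
    return flagged

print(flag_candidates("The ABC of Tao"))  # [('ABC', 4)]
```

Ordinary proper nouns ("Tao") are not caught by this, so it only covers the easy cases.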

Another option is specifying Capitalized tokens in another citation
field, but that seems like even more work for the user.

The other situation is capitalization after things like : and ? or
— but that can be handled.

If CSL is going to alter capitalization, where would it get its
citation-specific exception information from?

Yeah, this is an ugly, ugly problem.

I think we have to allow it to be specified in the CSL, but have to leave
it to others how to implement.

Bruce

One idea kicking around the Zotero forums was using CSS classes for
doing what some users are asking for in notes and such: rich text. So
use a class like “species-name” for Latin species names, and specify
italic there.

That could translate well to this context because you could also reuse
semantic xhtml structure like abbr

Yeah, I think that’s the way to go. Classes are ideal for that.

Which citation formats, e.g. MODS, allow that though?

Agree though that it isn’t a prob of CSL as long as it can use the
classes. So basically there’d be different capitalization actions for
the different classes. Cool.

–J

MODS does not; but RDF does.

<dc:title rdf:parseType="Literal">The <x:abbr>ABC</x:abbr> of Tao</dc:title>

Bruce
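
For what it's worth, here is a small sketch of how a processor might use that markup: lowercase everything in the title except text inside abbr. The wrapper element and the namespace bound to the x: prefix (XHTML here) are assumptions added to make the fragment parseable; the snippet above does not say what x: stands for.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper and namespace bindings; only the dc:title line comes
# from the thread. The x: prefix is bound to XHTML as a guess.
DOC = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:x="http://www.w3.org/1999/xhtml">
  <dc:title rdf:parseType="Literal">The <x:abbr>ABC</x:abbr> of Tao</dc:title>
</rdf:Description>"""

ABBR = "{http://www.w3.org/1999/xhtml}abbr"
TITLE = "{http://purl.org/dc/elements/1.1/}title"

def lowercase_title(title_el):
    """Flatten a title element to text, lowercasing all but <abbr> content."""
    pieces = [(title_el.text or "").lower()]
    for child in title_el:
        text = "".join(child.itertext())
        pieces.append(text if child.tag == ABBR else text.lower())
        pieces.append((child.tail or "").lower())  # text after the child
    return "".join(pieces)

title = ET.fromstring(DOC).find(TITLE)
print(lowercase_title(title))  # the ABC of tao
```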

Nice, so Zotero is using RDF to store references?

–J

No; SQLite. They use it as the primary import/export format, though. If
you think of rdf resources as something like tables and URIs as primary
keys, it’s actually a better fit than something like MODS.

It would be smart for Mozilla to put an RDF layer on top of SQLite as
an option though. I’m not sure that extensions like Zotero ought to
need to deal with SQL and such.

Bruce

None of the CSS/FO standard names do what we want. text-transform:
capitalize sets the first letter of each word to uppercase.
Obviously, when capitalizing titles, this is not what we want to do.
We want to use title case, which has much more complicated rules than
this. If we go with the CSS names, then we’re implementing CSS
incorrectly. In addition to being ambiguously named, the CSS options
don’t do what we want.

I’m open to the idea of using a different attribute to take care of
title case/sentence case, but I believe that, if we use a CSS
property, we should implement it as it is in CSS.

Simon
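
To illustrate the gap between the CSS behaviour and title case described above, here is a quick sketch. `css_capitalize` mimics text-transform: capitalize on space-separated words; `naive_title_case` is a deliberately simplified stand-in whose small-word list is an assumption for illustration, not the CMS rules.

```python
# Illustrative small-word list; real style guides have longer lists and
# more conditions (word length, part of speech, position after a colon).
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}

def css_capitalize(text: str) -> str:
    """Uppercase the first letter of every word, leave the rest alone."""
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))

def naive_title_case(text: str) -> str:
    """Capitalize words, but keep small words lowercase mid-title."""
    words = text.split(" ")
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        if 0 < i < last and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)

print(css_capitalize("the abc of tao"))    # The Abc Of Tao
print(naive_title_case("the abc of tao"))  # The Abc of Tao
```

Neither version knows that "abc" is an acronym, which is the separate per-record problem raised earlier in the thread.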

I don’t like the idea of changing the standard CSS/FO names. Title
should retain the CSS/FO meaning (which IIRC is what you call
"sentence") and we can call the other what you want to call it.

None of the CSS/FO standard names do what we want. text-transform:
capitalize sets the first letter of each word to uppercase.
Obviously, when capitalizing titles, this is not what we want to do.
We want to use title case, which has much more complicated rules than
this. If we go with the CSS names, then we’re implementing CSS
incorrectly. In addition to being ambiguously named, the CSS options
don’t do what we want.

Right, right; my bad. I forgot there’s no “title” option in CSS. For
some reason I was thinking there was.

I’m open to the idea of using a different attribute to take care of
title case/sentence case, but I believe that, if we use a CSS
property, we should implement it as it is in CSS.

Agreed.

So I don’t care that much then how we do it.

Bruce