title casing skip words

Ah OK, great - I was confused by your wording in that last mail.
We should update the specs before this goes live, too, to prevent
confusion.
@Rintze - do you want me to pack the list into a JSON file? Do you want to
handle editing the specs?

On Wed, Oct 16, 2013 at 10:00 AM, Frank Bennett <@Frank_Bennett> wrote:

I’ll prepare the JSON file and open a pull request. Might be a day or two,
though.

Rintze

Done and ready for review:

I wonder if we should remove or add periods for all abbreviations. We
currently have “c”, “ca”, but also “v.” and “vs.”, which seems a bit
inconsistent. And we have a straight quote in “d’”, while typographically
you’d want an apostrophe. Should we just remove the straight quote?

Rintze

On Wed, Oct 16, 2013 at 12:32 PM, Rintze Zelle <@Rintze_Zelle> wrote:

I think while on the topic of title casing, it would be good to put in some
sort of way for the user to override the CSL processor’s automatic
capitalization (or, conversely, force it). I am particularly concerned with
(the exceptions to) CMoS rules 3 and 7 mentioned here
(http://xbiblio-devel.2463403.n2.nabble.com/title-casing-skip-words-tp7578629p7578635.html):

  3. Lowercase prepositions, regardless of length, except when they are used
     adverbially or adjectivally (up in Look Up, down in Turn Down, on in The On
     Button, to in Come To, etc.) or when they compose part of a Latin
     expression used adjectivally or adverbially (De Facto, In Vitro, etc.).

  7. Lowercase the second part of a species name, such as fulvescens in
     Acipenser fulvescens, even if it is the last word in a title or subtitle.

I would suggest something along the lines of wrapping the words in <span>'s
with class names “title-case” (to force title-casing of skip words as in
rule 3) and “no-title-case” (to prevent title-casing of non-skip words as
in rule 7). It seems that it would be close to impossible for a CSL
processor to figure out exactly when to follow these rules/exceptions.

On Thu, Nov 14, 2013 at 8:17 PM, Rintze Zelle <@Rintze_Zelle> wrote:

So far the CSL spec is rather format-agnostic when it comes to input. It’s
one of the reasons why citeproc-js’s support for inline rich text
formatting of titles (
http://www.zotero.org/support/kb/rich_text_bibliography) isn’t included in
the spec.

I see the use of what you’re proposing, but it is rather HTML-oriented. Are
we comfortable including something like this in the spec, or would it be
better to have a separate (sub)document that focuses on CSL input (which
could also be used to describe the CSL JSON data model)?

Rintze

I haven’t looked at this issue, but putting html in json files feels
really wrong as a general proposition.

Clearly, string formatting (mostly in titles) is necessary and is getting
implemented whether CSL specifies it or not. IMO HTML (or XML, which would
probably be more work for everyone) is the most elegant and broadly
supported approach.

If CSL really wants to remain format-agnostic in this regard, then it could
just specify that substrings can be marked (with possible nesting) for
various formatting (italics, superscript, forced title-casing, etc.) and
leave the language of the formatting up to the citeproc developers. CSL can
then go on to specify how such substrings are handled when producing
citations.

Done and ready for review:
Add stop-words.json by rmzelle · Pull Request #31 · citation-style-language/documentation · GitHub

looks good to me, thanks!

I wonder if we should remove or add periods for all abbreviations. We
currently have “c”, “ca”, but also “v.” and “vs.”, which seems a bit
inconsistent.

In line with abbreviations in locales, I’d add them, I think. My only concern
would be that processors should skip both versions of the abbreviations,
with and without period.

And we have a straight quote in “d’”, while typographically you’d want an
apostrophe. Should we just remove the straight quote?

I’m not 100% sure what the d’ is for - something like Palme d’Or? Wouldn’t
removing the apostrophe potentially cause problems then?

Sebastian

On Thu, Nov 14, 2013 at 7:17 PM, Rintze Zelle <@Rintze_Zelle> wrote:

We could define these stop words without punctuation, and the CSL processor
could do the matching after deleting any periods in titles, and replacing
any single quote marks and apostrophes by spaces. So we would have “c”,
“ca”, “v”, “vs” and “d” as stop words, and a title like “Palme d’Or vs.
Smith” would be preprocessed to “Palme d Or vs Smith”, after which the
individual words are matched against the stop word list.

Rintze
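
The preprocessing described in the message above can be sketched roughly as follows. This is only an illustration, not code from any CSL processor: the function names and the five-word STOP_WORDS set are assumptions made for the example, and a real stop-word list would be much longer.

```python
import re

# Hypothetical subset of the stop-word list, stored without punctuation.
STOP_WORDS = {"c", "ca", "v", "vs", "d"}


def preprocess(title):
    """Delete periods, then replace single quotes and apostrophes with spaces."""
    without_periods = title.replace(".", "")
    # Straight quote plus left and right typographic single quotes.
    return re.sub("['\u2018\u2019]", " ", without_periods)


def stop_word_flags(title):
    """Return (word, is_stop_word) pairs for the preprocessed title."""
    return [(word, word.lower() in STOP_WORDS) for word in preprocess(title).split()]


print(preprocess("Palme d\u2019Or vs. Smith"))
print(stop_word_flags("Palme d\u2019Or vs. Smith"))
```

Note that this only decides which words match the list. Mapping the result back onto the original, still punctuated title is left out of the sketch.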

Hi all,

Sorry for the potential thread break — I’m a new member to the list so I
can’t reply to this thread in a nice way…

My apologies in advance if this idea has previously been proposed and
decided against. I’ve only kept up with the list for a short while now.

I’m a developer for Paperpile, a new(ish) reference manager making use of
CSL and citeproc-js.

Personally, I’d love to see CSL move toward widespread support for the
solution that Bibtex has used for years, which is to allow users to
surround words with braces to protect the capitalization as-is. This would
go very far towards giving a user the power to force de-capitalization (or
capitalization) of things that they know should be a certain way in their
bibliography, despite the CSL style and/or the processor’s best attempts to
do the right thing.

It seems clear that no matter how smart one is about trying to title-case,
there will always be edge cases where a single item is improperly
capitalized. Without any user-facing way to protect these edge cases, we’re
stuck either (a) avoiding title case like the plague in CSL styles and just
using strings as-is from the input, or (b) embracing title case in the
styles and dealing with frustrated users who can’t tweak things the way
they want.

Some benefits to a bibtex-like syntax for protecting capitalization would
be:

  • It’s immediately familiar to anyone who’s ever used bibtex.
  • It’s directly interoperable with existing bibtex data.
  • It’s simple enough for many users to learn and remember. (I don’t think
    an HTML syntax would ever have this benefit.)
  • It easily generalizes to all variable outputs, not just titles.

(For context, we recently had a user contact us about incorrect
capitalization of journal names in APA style:
APA style: title-case for periodical names · Issue #759 · citation-style-language/styles · GitHub which brought
this issue to our attention.)

I think any type of HTML-based syntax would be misdirected toward
capitalization, since the issues of formatting (e.g. italics, superscript)
and capitalization are not one and the same. Formatting is specific to only
non-plain-text outputs, while capitalization is relevant no matter how a
citation is ultimately being displayed.

All that said, maybe there’s some obvious problem with this approach, or
maybe it’s something that should be left up to each processor to decide.
However, I have a feeling it would be best in the long term to have
something clearly spelled out in a processor-agnostic document to both
improve clarity and aid widespread adoption.

If there’s interest, we’d be happy to help implement this in citeproc-js.

greg

If { } are only adopted from BibTeX for preserving capitalization (though
we still have an issue of how to indicate forced capitalization, i.e. CMoS
rule 3) then I don’t have much of a problem with this. My only nit would be
mixing HTML (italics, bold, etc.) and BibTeX (capitalization) for markup
(and of course there would need to be a way to escape curly braces). If, on
the other hand, this will eventually lead to adoption of BibTeX syntax for
text formatting, then I would have to say that BibTeX syntax is far far
from straightforward. My biggest issue (which I am still lost in) is the
way capitalization is treated within different levels of brace nesting. How
do I mark up capitalization within, e.g., a \textit{ } fragment?

Excerpt from “Tame the Beast” (
http://tug.ctan.org/info/bibtex/tamethebeast/ttb_en.pdf):

the second transformation applied to a title is to be turned to lower case
(except the first character). The function named change.case$ does this job.
But it only applies to letters that are at brace depth 0, except within a
special character. In a special character, brace depth is always 0, and
letters are switched to lower case, except LaTeX commands, that are left
unmodified.

Imagine explaining that to someone who’s never actually used BibTeX…
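
For readers who have not used BibTeX, the core of the excerpt (lowercase everything at brace depth 0 except the first character) can be sketched as below. This is a deliberate simplification and not BibTeX's actual change.case$: it ignores "special characters" and LaTeX commands entirely, and the function name is made up for the example.

```python
def lowercase_outside_braces(title):
    """Lowercase letters at brace depth 0, keeping the first character as-is.

    Simplified sketch: BibTeX "special characters" and LaTeX commands
    are not handled here.
    """
    out = []
    depth = 0
    for index, char in enumerate(title):
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        if depth == 0 and index > 0:
            out.append(char.lower())
        else:
            out.append(char)  # first character, or text protected by braces
    return "".join(out)


print(lowercase_outside_braces("The {DNA} of Acipenser Fulvescens"))
```

Even this reduced version shows why every proper name has to be wrapped in braces in BibTeX data: anything outside braces loses its capitals.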

But as I say above, perhaps it makes more sense to allow each processor to
define the syntax of the markup. If a processor was to be written for
LaTeX, then it would probably make sense for it to use full BibTeX syntax
instead of a mix of HTML and BibTeX.

Just my 2 cents.
Aurimas

Some processor-specific syntax for this currently coded into citeproc-js:

<span class="nocase">qwerty</span>

The “nocase” form has the effect of squiggly braces in BibTeX (as I
understand it): it prevents changes to the case of the enclosed text
when title case or text case are applied to the field. The idea was
(and I guess I would say is) to use a markup syntax that can be easily
represented in a UI without additional levels of external parsing by
the client. As far as I know the syntax isn’t supported in the UI of
any clients out there, though, and it hasn’t seen much use for the
obvious-enough reason that it’s quite an awkward thing to type, and
distracting when displayed verbatim.

It should be a simple thing to add a spec line to the parser that
applies the same methods to squiggly-brace-enclosed text. Backslash
escaping should just work, without additional coding.

In citeproc-js, mixing “plain text” and “html” approaches isn’t
particularly a problem, but smooth operation with other tools might
require greater consistency. It would be good to have input from other
processor developers and consumers of CSL; and in the interest of data
exchange, it would be good to have a description of preferred markup
set in an adjunct to the CSL specification before making further
changes.
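
As a rough illustration of the idea discussed above (treating brace-enclosed text the same way as the “nocase” span, with backslash escaping for literal braces), here is a minimal sketch. It is not citeproc-js code: the function names are hypothetical, nesting is not supported, the title-casing is naive (no stop words), and the protecting markup is simply dropped from the output.

```python
import re

# Either the span form or an unescaped {...} group (no nesting handled).
PROTECTED = re.compile(
    r'<span class="nocase">(.*?)</span>'
    r'|(?<!\\)\{(.*?)(?<!\\)\}'
)


def split_protected(text):
    """Return (segment, is_protected) pairs in input order."""
    segments = []
    position = 0
    for match in PROTECTED.finditer(text):
        if match.start() > position:
            segments.append((text[position:match.start()], False))
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        segments.append((inner, True))
        position = match.end()
    if position < len(text):
        segments.append((text[position:], False))
    return segments


def title_case(text):
    """Capitalize each unprotected word; leave protected segments untouched."""
    parts = []
    for segment, is_protected in split_protected(text):
        if is_protected:
            parts.append(segment)
            continue
        cased = re.sub(
            r"[A-Za-z]+",
            lambda m: m.group(0)[0].upper() + m.group(0)[1:],
            segment,
        )
        # Backslash-escaped braces become literal braces in the output.
        parts.append(cased.replace("\\{", "{").replace("\\}", "}"))
    return "".join(parts)


print(title_case("habitat use of Acipenser {fulvescens}"))
```

Both markup forms end up in the same protected/unprotected segment list, which is the point of the suggestion: the casing logic does not need to know which syntax the user typed.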

The “nocase” form has the effect of squiggly braces in BibTeX (as I
understand it): it prevents changes to the case of the enclosed text
when title case or text case are applied to the field. The idea was

(Oops, sorry, typo: I should have written “when title case or sentence
case are applied to the field”.)

The “nocase” form has the effect of squiggly braces in BibTeX (as I
understand it): it prevents changes to the case of the enclosed text
when title case or text case are applied to the field. The idea was
(and I guess I would say is) to use a markup syntax that can be easily
represented in a UI without additional levels of external parsing by
the client.

That’s right. Not only represented, though, but also easily UI-ified
(think, say, a client like zotero with a context menu that allows one
to select how to treat particular sub-field text).

Any discussion of changes to this particular solution should probably
consider the goals, too.

Bruce

Just to restate our current approach to this so that we’re all on the same
page:

  • The recommendation is to store all titles in sentence case. Where
    sentence case is required, we just leave them alone (we never use
    text-case="sentence" in CSL styles).
  • Where title case is required, items can be title-cased correctly in
    almost all cases (the “almost” part is what Aurimas wants to solve with
    this). So anything we implement will mean a lot less mark-up than bibtex,
    where you have to protect every proper name etc. We only need no-casing
    exceptions for rare cases like two-word species names in English titles
    where automatic title casing fails.
  • Title casing can be turned off for non-English items using the language
    variable (http://citationstyles.org/downloads/specification.html#id87).
  • So far we’ve not been using title case much for container-titles,
    especially of periodicals, even where required, for reasons I explain here:
    https://github.com/citation-style-language/styles/issues/759#issuecomment-28864718.
    As I say in that comment, while I’m not without hesitation about this, I
    think we should reverse what was always an informal and
    not-strictly-enforced policy and generally apply title case to container
    titles where it is required. In most cases I don’t think it’s
    particularly urgent to implement, but Rintze and I will be accepting
    pull requests (or appreciate commits from those with direct access) to that
    effect.

With all that said, I think we should find a way to do this, but it’s going
to be a lot less heavy-handed than the BibTeX approach and I think that’s
very good.

I don’t have strong views on the implementation side, except that I’d like
to keep it uniform across implementations if it’s at all possible to agree,
so that data exchange becomes less of a mess.
Best,
Sebastian
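
The decision logic in the summary above can be sketched as a small function. This is a simplified sketch under a stated assumption: an item counts as English when its language field is empty or starts with "en". Real processors may also take the style's default locale into account, and the function name is hypothetical.

```python
def should_title_case(style_wants_title_case, language=None):
    """Decide whether a stored (sentence-case) title gets title-cased.

    Assumption: an empty language field, or one starting with "en",
    marks the item as English.
    """
    if not style_wants_title_case:
        return False  # sentence-case styles leave the stored title alone
    if not language:
        return True
    return language.lower().startswith("en")


print(should_title_case(True, "de"))
```

Under this scheme the only per-item markup ever needed is for the rare English titles where automatic title casing gets a word wrong.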

As far as I know the syntax isn’t supported in the UI of any clients out there, though, and it hasn’t seen much use for the obvious-enough reason that it’s quite an awkward thing to type, and distracting when displayed verbatim.

Papers allows BibTeX-like escaping for nocase support, which is also supported in our CSL engine. The strings are, however, not stored with quite the same markup, but with a custom markup similar to XML (e.g. using , , , etc…).

Charles

Thanks Sebastian for summarizing — it’s very helpful to understand more of
the decision-making behind the current state of things.

It’s also interesting to hear that citeproc-js already supports this kind
of functionality, albeit with a different syntax. Cool!

Perhaps I came across a bit too evangelical on the Bibtex-brace approach. I
wouldn’t ever argue for widespread support of anything more than the braces
to protect caps. Most other parts of LaTeX syntax, as Aurimas mentioned,
are far from ideal. :) And we’re not against a different approach if it’s
widely agreed upon.

Anyhow, this is definitely an edge case, and any movement toward a
documented solution would benefit greatly from widespread adoption and
feedback from other programs. But this is definitely on our minds, and I’m
encouraged by the possibility that supporting a single syntax within fields
could serve both our bibtex and CSL users equally well.

–greg

–047d7bdc80de3814ae04ebae9dc0
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Thanks Sebastian for summarizing =97 it's very he= lpful to understand more of the decision-making behind the current state of= things.

It's also interesting to hear that ci= teproc-js already supports this kind of functionality, albeit with a differ= ent syntax. Cool!

Perhaps I came across a bit too evangelical on the Bibt= ex-brace approach. I wouldn't ever argue for widespread support of anyt= hing more than the braces to protect caps. Most other parts of LaTeX syntax= , as Aurimas mentioned, are far from ideal. :) And we're not against a = different approach if it's widely agreed upon.

Anyhow, this is definitely an edge case, and any moveme= nt toward a documented solution would benefit greatly from widespread adopt= ion and feedback from other programs. But this is definitely on our minds, = and I'm encouraged by the possibility that supporting a single syntax w= ithin fields could serve both our bibtex and CSL users equally well.

--greg


Message:= 4
Date: Wed, 20 Nov 2013 19:43:06 -0700
From: Sebastian Karcher <= karcher@u.n= orthwestern.edu>

Subject: Re: [xbiblio-devel] title casing skip words
To: development dis=
cussion for xbiblio
=A0 =A0 =A0 =A0 <<a href=3D"mailto:xbiblio-devel@=
lists.sourceforge.net" target=3D"_blank">xbiblio-devel@lists.sourceforge.ne=
t>

Message-ID:
=A0 =A0 =A0 =A0 <CAOSYSD7rxFuBDXPiwE+Hhzmu7F=3D<a href=3D=
“mailto:7Jfw9qAtkSFXKuTGTqEQZ1Q@mail.gmail.com” target=3D"_blank">7Jfw9qAtk=
SFXKuTGTqEQZ1Q@mail.gmail.com>
Content-Type: text/plain; charset=
=3D"iso-8859-1"


Just to restate our current approach to this so that we're all on the same
page:
- The recommendation is to store all titles in sentence case. Where
sentence case is required, we just leave them alone (we never use
text-case="sentence" in CSL styles).
- Where title case is required, items can be title-cased correctly in
almost all cases (the "almost" part is what Aurimas wants to solve with
this). So anything we implement will mean a lot less mark-up than bibtex,
where you have to protect every proper name etc. We only need no-casing
exceptions for rare cases like two-word species names in English titles
where automatic title casing fails.
- Title casing can be turned off for non-English items using the language
variable (http://citationstyles.org/downloads/specification.html#id87)
- So far we've not been using title case much for container-titles,
especially of periodicals, even where required, for reasons I explain here:
https://github.com/citation-style-language/styles/issues/759#issuecomment-28864718

As I say in that comment, while I'm not without hesitation about this, I
think we should reverse what was always an informal and
not-strictly-enforced policy and generally apply title case to container
titles where it is required, though in most cases I don't think it's
particularly urgent to implement - but Rintze and I will be accepting
pull-requests (or appreciate commits from those with direct access) to that
effect.

With all that said, I think we should find a way to do this, but it's going
to be a lot less heavy-handed than the BibTeX approach and I think that's
very good.

I don't have strong views on the implementation side, except that I'd like
to keep it uniform across implementations if it's at all possible to agree,
so that data exchange becomes less of a mess.
Best,
Sebastian
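
For readers who want the approach above in concrete terms, here is a minimal Python sketch. It is not how citeproc-js or any other processor implements title casing: the function name, the abbreviated skip-word list, and the "language starts with en" test are all assumptions made for illustration, and subtitle handling is left out.

```python
# Hypothetical, abbreviated skip-word list (an assumption for this sketch;
# the real list is the JSON file under review earlier in this thread).
SKIP_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
              "in", "of", "on", "or", "the", "to"}


def title_case(title: str, language: str = "en") -> str:
    """Title-case a title stored in sentence case.

    Non-English items are returned untouched, loosely mirroring the use of
    the language variable to switch title casing off.
    """
    if not language.lower().startswith("en"):
        return title
    words = title.split(" ")
    out = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        if word.lower() in SKIP_WORDS and not first_or_last:
            # Skip words stay lowercase unless they open or close the title.
            out.append(word.lower())
        elif word[:1].islower():
            # Capitalize only the first letter; leave the rest as stored.
            out.append(word[:1].upper() + word[1:])
        else:
            # Already capitalized (e.g. a proper name): leave alone.
            out.append(word)
    return " ".join(out)


print(title_case("the life of Acipenser fulvescens"))
print(title_case("le rouge et le noir", "fr"))
```

The first call shows the rare failure discussed above: the species epithet "fulvescens" gets capitalized, which is exactly the case that needs a no-casing exception.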

On Wed, Nov 20, 2013 at 6:44 PM, Frank Bennett <@Frank_Bennett> wrote:

On Thu, Nov 21, 2013 at 8:42 AM, Frank Bennett <biercenator@gmail.com> wrote:

> On Thu, Nov 21, 2013 at 6:00 AM, Gregory Jordan <@Gregory_Jordan> wrote:
>> Hi all,
>>
>> Sorry for the potential thread break - I'm a new member to the list so I
>> can't reply to this thread in a nice way…
>>
>> My apologies in advance if this idea has previously been proposed and
>> decided against. I've only kept up with the list for a short while now.
>>
>> I'm a developer for Paperpile, a new(ish) reference manager making use of
>> CSL and citeproc-js.
>>
>> Personally, I'd love to see CSL move toward widespread support for the
>> solution that Bibtex has used for years, which is to allow users to surround
>> words with braces to protect the capitalization as-is. This would go very
>> far towards giving a user the power to force de-capitalization (or
>> capitalization) of things that they know should be a certain way in their
>> bibliography, despite the CSL style and/or the processor's best attempts to
>> do the right thing.
>>
>> It seems clear that no matter how smart one is about trying to title-case,
>> there will always be edge cases where a single item is improperly
>> capitalized. Without any user-facing way to protect these edge cases, we're
>> stuck either (a) avoiding title case like the plague in CSL styles and just
>> using strings as-is from the input, or (b) embracing title case in the
>> styles and dealing with frustrated users who can't tweak things the way they
>> want.
>>
>> Some benefits to a bibtex-like syntax for protecting capitalization would
>> be:
>>  - It's immediately familiar to anyone who's ever used bibtex.
>>  - It's directly interoperable with existing bibtex data.
>>  - It's simple enough for many users to learn and remember. (I don't think
>> an HTML syntax would ever have this benefit.)
>>  - It easily generalizes to all variable outputs, not just titles.
>>
>> (For context, we recently had a user contact us about incorrect
>> capitalization of journal names in APA style:
>> https://github.com/citation-style-language/styles/issues/759 which brought
>> this issue to our attention.)
>>
>> I think any type of HTML-based syntax would be misdirected toward
>> capitalization, since the issues of formatting (e.g. italics, superscript)
>> and capitalization are not one and the same. Formatting is specific to only
>> non-plain-text outputs, while capitalization is relevant no matter how a
>> citation is ultimately being displayed.
>>
>> All that said, maybe there's some obvious problem with this approach, or
>> maybe it's something that should be left up to each processor to decide.
>> However, I have a feeling it would be best in the long term to have
>> something clearly spelled out in a processor-agnostic document to both
>> improve clarity and aid widespread adoption.
>>
>> If there's interest, we'd be happy to help implement this in citeproc-js.
>>
>> greg
>
> Some processor-specific syntax for this currently coded into citeproc-js:
>
>    <span class="nocase">qwerty</span>
>
> The "nocase" form has the effect of squiggly braces in BibTeX (as I
> understand it): it prevents changes to the case of the enclosed text
> when title case or text case are applied to the field. The idea was

(Oops, sorry, typo: I should have written "when title case or sentence
case are applied to the field".)

> (and I guess I would say is) to use a markup syntax that can be easily
> represented in a UI without additional levels of external parsing by
> the client. As far as I know the syntax isn't supported in the UI of
> any clients out there, though, and it hasn't seen much use for the
> obvious-enough reason that it's quite an awkward thing to type, and
> distracting when displayed verbatim.
>
> It should be a simple thing to add a spec line to the parser that
> applies the same methods to squiggly-brace-enclosed text. Backslash
> escaping should just work, without additional coding.
>
> In citeproc-js, mixing "plain text" and "html" approaches isn't
> particularly a problem, but smooth operation with other tools might
> require greater consistency. It would be good to have input from other
> processor developers and consumers of CSL; and in the interest of data
> exchange, it would be good to have a description of preferred markup
> set in an adjunct to the CSL specification before making further
> changes.
>
>
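
To make the two markup options discussed above concrete, here is a small Python sketch of case protection. It is an illustration only and not citeproc-js code: the function names are hypothetical, the case function is deliberately naive (it capitalizes every word and ignores skip words), nesting and backslash escaping are not handled, and dropping the markers from the output is an assumption of this sketch.

```python
import re

# Matches <span class="nocase">...</span> or {...}.
# Assumption: no nesting and no backslash escapes, to keep the sketch short.
PROTECTED = re.compile(r'<span class="nocase">(.*?)</span>|\{(.*?)\}')


def capitalize_words(text: str) -> str:
    """Naive stand-in for a real title-casing routine: capitalize every word."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: m.group(0)[0].upper() + m.group(0)[1:],
                  text)


def apply_case_protected(title: str, casefn=capitalize_words) -> str:
    """Apply casefn to unprotected text and copy protected text verbatim."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(title):
        # Text before the protected span gets the case transformation.
        out.append(casefn(title[pos:m.start()]))
        # The protected span is emitted as typed, without its markers.
        out.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
    # Transform whatever follows the last protected span.
    out.append(casefn(title[pos:]))
    return "".join(out)


print(apply_case_protected("observing {Acipenser fulvescens} spawning"))
print(apply_case_protected(
    'observing <span class="nocase">Acipenser fulvescens</span> spawning'))
```

Both calls print "Observing Acipenser fulvescens Spawning": the two syntaxes are handled by the same code path, which is the point made above about adding brace support alongside the existing "nocase" span.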
>>
>> On Fri, Nov 15, 2013 at 4:13pm, Aurimas Vinckevicius wrote:
>>
>>>Clearly, string formatting (mostly in titles) is necessary and is getting
>>>implemented whether CSL specifies it or not. IMO HTML (or XML, which would
>>>probably be more work for everyone) is the most elegant and broadly
>>>supported approach.
>>
>>>If CSL really wants to remain format-agnostic in this regard, then it could
>>>just specify that substrings can be marked (with possible nesting) for various
>>>formatting (italics, superscript, forced title-casing, etc.) and leave the
>>>language of the formatting up to the citeproc developers. CSL can then go on to
>>>specify how such substrings are handled when producing citations.
>>
>> On Thu, Nov 14, 2013 at 9:52 PM, Bruce D'Arcus <[hidden email]> wrote:
>>>
>>> I haven't looked at this issue, but putting html in json files feels
>>> really wrong as a general proposition.
>>>
>>> On Thu, Nov 14, 2013 at 10:02 PM, Rintze Zelle <[hidden email]> wrote:
>>> > So far the CSL spec is rather format-agnostic when it comes to input.
>>> > It's one of the reasons why citeproc-js's support for inline rich text
>>> > formatting of titles (http://www.zotero.org/support/kb/rich_text_bibliography)
>>> > isn't included in the spec.
>>> >
>>> > I see the use of what you're proposing, but it is rather HTML-oriented.
>>> > Are we comfortable including something like this in the spec, or would it be
>>> > better to have a separate (sub)document that focuses on CSL input (which
>>> > could also be used to describe the CSL JSON data model)?
>>> >
>>> > Rintze
