quotes localization (was html entities)

Bruce_D_Arcus1 · June 17, 2009, 10:18pm

Yes:

http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage

Am open to other suggestions.

Bruce

Frank_Bennett · June 17, 2009, 11:00pm

Is there a use case for these locale terms?

Yes:

http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage

Am open to other suggestions.

The glyphs used for quote marks should be drawn from the locale, but
the placement of punctuation appears to be a matter of editorial
taste:

So how about this:

-- current behavior for backward compatibility, but quote mark glyphs drawn from locale -- deprecated -- explicitly invokes current behavior, quote mark glyphs drawn from locale -- places punctuation outside quotes, quote mark glyphs drawn from locale

Bruce_D_Arcus1 · June 17, 2009, 11:40pm

Is there a use case for these locale terms?

Yes:

http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage

Am open to other suggestions.

The glyphs used for quote marks should be drawn from the locale,

That was my first impulse. But another approach is just to have some
named parameters that would correspond to the options here:

http://en.wikipedia.org/wiki/Quotation_mark_glyphs

but the placement of punctuation appears to be a matter of editorial
taste:

Punctuation moved inside quotes - undocumented? - Zotero Forums

Right, but there are still, I think, locale-specific defaults
(notwithstanding the impact of dominant languages like English on
these traditions)…

So how about this:
-- current behavior for backward compatibility, but quote mark glyphs drawn from locale -- deprecated -- explicitly invokes current behavior, quote mark glyphs drawn from locale -- places punctuation outside quotes, quote mark glyphs drawn from locale

I think it’d probably be preferable to have a different attribute for this.

Bruce

Frank_Bennett · June 18, 2009, 1:09am

Is there a use case for these locale terms?

Yes:

http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage

Am open to other suggestions.

The glyphs used for quote marks should be drawn from the locale,

That was my first impulse. But another approach is just to have some
named parameters that would correspond to the options here:

http://en.wikipedia.org/wiki/Quotation_mark_glyphs

So … map the locale key (“double quotes”) to a named entity stripped
of its ampersand and semicolon, that maps through a function to a
unicode character to which it corresponds one-to-one? As opposed to
mapping the locale key to the character directly.
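
To make the two alternatives concrete, here is a minimal Python sketch. The locale key names and both tables are hypothetical, invented only for illustration; the one real API used is the standard library's `html.entities.name2codepoint`.

```python
from html.entities import name2codepoint

# Hypothetical locale tables. The key names are illustrative only;
# nothing here is taken from the CSL schema.
LOCALE_BY_ENTITY = {
    "double-quote-open": "ldquo",   # entity name without "&" and ";"
    "double-quote-close": "rdquo",
}
LOCALE_DIRECT = {
    "double-quote-open": "\u201c",
    "double-quote-close": "\u201d",
}


def glyph_via_entity(key):
    """Indirect route: locale key -> entity name -> Unicode character."""
    return chr(name2codepoint[LOCALE_BY_ENTITY[key]])


def glyph_direct(key):
    """Direct route: locale key -> Unicode character."""
    return LOCALE_DIRECT[key]
```

Both routes yield the same character, so the indirection only pays off if the entity names are wanted as a human-readable vocabulary.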

but the placement of punctuation appears to be a matter of editorial
taste:

Punctuation moved inside quotes - undocumented? - Zotero Forums

Right, but there are still, I think, locale-specific defaults
(notwithstanding the impact of dominant languages like English on
these traditions)…

So long as default behavior can be overridden, no objections there.

So how about this:
-- current behavior for backward compatibility, but quote mark glyphs drawn from locale -- deprecated -- explicitly invokes current behavior, quote mark glyphs drawn from locale -- places punctuation outside quotes, quote mark glyphs drawn from locale

I think it’d probably be preferable to have a different attribute for this.

Let me know when it hits the spec.

Sean_Takats · June 18, 2009, 1:08pm

A question I have related to the discussion on quotation marks and
locales: how should CSL handle locators that include quotes? For
example, at present an author might generate a locator like the
following:

s.v. “Piracy”

“Preface”

etc.

But these currently render in Chicago Manual of Style at the end of
the citation string as

s.v. “Piracy”.

“Preface”.

So, correct in British usage, but not in American, though this
behavior doesn’t seem to be controlled by any localization. It’s just
inserting a dumb string before the terminal punctuation. Any thoughts
on how to handle this, or should users just plan to edit such
citations manually (not the end of the world)?

Sean

Bruce_D_Arcus1 · June 18, 2009, 2:16pm

A question I have related to the discussion on quotation marks and
locales: how should CSL handle locators that include quotes?

Ugh; good question.

For example, at present an author might generate a locator like the
following:

s.v. “Piracy”

“Preface”

etc.

And what is that supposed to mean? “See the chapter with title ‘X’”?
E.g. what’s in quotes is a chapter title?

Somehow this feels kind of hackish.

But these currently render in Chicago Manual of Style at the end of
the citation string as

s.v. “Piracy”.

“Preface”.

So, correct in British usage, but not in American, though this
behavior doesn’t seem to be controlled by any localization. It’s just
inserting a dumb string before the terminal punctuation. Any thoughts
on how to handle this, or should users just plan to edit such
citations manually (not the end of the world)?

I don’t really have an opinion, but it probably wouldn’t be too hard
to scan the final string and adjust it.

Frank?

Bruce

Frank_Bennett · June 18, 2009, 7:59pm

A question I have related to the discussion on quotation marks and
locales: how should CSL handle locators that include quotes?

Ugh; good question.

For example, at present an author might generate a locator like the
following:

s.v. “Piracy”

“Preface”

etc.

And what is that supposed to mean? “See the chapter with title ‘X’”?
E.g. what’s in quotes is a chapter title?

Somehow this feels kind of hackish.

But these currently render in Chicago Manual of Style at the end of
the citation string as

s.v. “Piracy”.

“Preface”.

So, correct in British usage, but not in American, though this
behavior doesn’t seem to be controlled by any localization. It’s just
inserting a dumb string before the terminal punctuation. Any thoughts
on how to handle this, or should users just plan to edit such
citations manually (not the end of the world)?

I don’t really have an opinion, but it probably wouldn’t be too hard
to scan the final string and adjust it.

Frank?

String remangling might be the simplest way to handle all of the
punctuation-quote musical chairs issues – adopting what a Bulgarian
colleague refers to fondly as the “bigger hammer method”. It should
produce the desired result (except when it doesn’t).
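
A rough sketch of what such remangling could look like, in Python. The set of closing quote glyphs and the function names are assumptions made for illustration; this is not how any existing processor does it.

```python
import re

# Closing quote glyphs to look for. An assumed, incomplete set.
CLOSING_QUOTES = "\u201d\u2019\u00bb"

_QUOTE_THEN_PUNCT = re.compile("([" + CLOSING_QUOTES + "]+)([.,])")
_PUNCT_THEN_QUOTE = re.compile("([.,])([" + CLOSING_QUOTES + "]+)")


def punctuation_inside(text):
    """Move a period or comma that follows a closing quote to before it."""
    return _QUOTE_THEN_PUNCT.sub(lambda m: m.group(2) + m.group(1), text)


def punctuation_outside(text):
    """Move a period or comma that precedes a closing quote to after it."""
    return _PUNCT_THEN_QUOTE.sub(lambda m: m.group(2) + m.group(1), text)
```

Applied to Sean's example, `punctuation_inside` turns `s.v. “Piracy”.` into `s.v. “Piracy.”`. The "except when it doesn't" cases are easy to find: `punctuation_outside` would also move a period that belongs to an abbreviation ending a quoted title.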

Frank_Bennett · June 27, 2009, 9:36pm

Is there a use case for these locale terms?

Yes:

http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage

Am open to other suggestions.

The glyphs used for quote marks should be drawn from the locale,

That was my first impulse. But another approach is just to have some
named parameters that would correspond to the options here:

http://en.wikipedia.org/wiki/Quotation_mark_glyphs

So … map the locale key (“double quotes”) to a named entity stripped
of its ampersand and semicolon, that maps through a function to a
unicode character to which it corresponds one-to-one? As opposed to
mapping the locale key to the character directly.

but the placement of punctuation appears to be a matter of editorial
taste:

Punctuation moved inside quotes - undocumented? - Zotero Forums

Right, but there are still, I think, locale-specific defaults
(notwithstanding the impact of dominant languages like English on
these traditions)…

So long as default behavior can be overridden, no objections there.

So how about this:
-- current behavior for backward compatibility, but quote mark glyphs drawn from locale -- deprecated -- explicitly invokes current behavior, quote mark glyphs drawn from locale -- places punctuation outside quotes, quote mark glyphs drawn from locale

I think it’d probably be preferable to have a different attribute for this.

Let me know when it hits the spec.

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

I’ll state what seems sensible to me at the moment, but I’ll have no
problem if a different design is chosen by executive decision, so long
as it’s unambiguous and covers user needs:

For quote characters, use locale terms:

“</>
”</>
‘</>
’</>

For punctuation handling, use an explicit option, with some default
value or other:

If locale defaults are to be provided, then some means of specifying
configuration values in the locale needs to be provided – at the
moment, the locale contains only string values, not programming
parameters.

These items need to be covered before anything can be done for inline markup.
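
As a sketch of how the two pieces might fit together, the Python below models a locale that carries both string terms and a configuration option. The term names, the option name, and the locale values are all hypothetical placeholders, not anything in the current schema.

```python
# Hypothetical locale data: "terms" holds strings, "options" holds
# configuration parameters. All names and values are illustrative.
LOCALES = {
    "en-US": {
        "terms": {"open-quote": "\u201c", "close-quote": "\u201d"},
        "options": {"punctuation-in-quote": True},
    },
    "fr-FR": {
        "terms": {"open-quote": "\u00ab\u00a0", "close-quote": "\u00a0\u00bb"},
        "options": {"punctuation-in-quote": False},
    },
}


def quote(text, locale, trailing="", punctuation_in_quote=None):
    """Wrap text in the locale's quote glyphs and place trailing punctuation.

    punctuation_in_quote overrides the locale default when it is not None.
    """
    loc = LOCALES[locale]
    terms = loc["terms"]
    if punctuation_in_quote is None:
        punctuation_in_quote = loc["options"]["punctuation-in-quote"]
    if punctuation_in_quote:
        return terms["open-quote"] + text + trailing + terms["close-quote"]
    return terms["open-quote"] + text + terms["close-quote"] + trailing
```

With this data, `quote("Preface", "en-US", ".")` puts the period inside the closing quote, and passing `punctuation_in_quote=False` overrides the locale default for a style that wants it outside.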

Bruce_D_Arcus1 · July 1, 2009, 1:34pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

But this of course isn’t directly related to CSL.

I’ll state what seems sensible to me at the moment, but I’ll have no
problem if a different design is chosen by executive decision, so long
as it’s unambiguous and covers user needs:

For quote characters, use locale terms:

“</>
”</>
‘</>
’</>

If we settle the question above and decide we need this, this is
good. Except I’d remove the “single” and “double” notion, since that’s
likely locale-specific. Maybe “right-quote” and “right-inner-quote”?

For punctuation handling, use an explicit option, with some default
value or other:

Yup.

If locale defaults are to be provided, then some means of specifying
configuration values in the locale needs to be provided – at the
moment, the locale contains only string values, not programming
parameters.

These items need to be covered before anything can be done for inline markup.

Bruce

Sean_Takats · July 1, 2009, 2:57pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

But this of course isn’t directly related to CSL.

I’ll state what seems sensible to me at the moment, but I’ll have no
problem if a different design is chosen by executive decision, so long
as it’s unambiguous and covers user needs:

For quote characters, use locale terms:

“</>
”</>
‘</>
’</>

If we settle the question above and decide we need this, this is
good. Except I’d remove the “single” and “double” notion, since that’s
likely locale-specific. Maybe “right-quote” and “right-inner-quote”?

The Wikipedia piece uses “primary” and “secondary” (Quotation mark - Wikipedia).
And what about RTL issues?

Maybe:

“</>
”</>
“</>
”</

Frank_Bennett · July 1, 2009, 3:31pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

How would a problem arise? If a style for French sources wants to use
some other mark for quotes, it can overload the locale. If a French
user has sources with non-French quotes inlined in a title to be fed
to CSL, it will be passed through as a literal, if the style uses the
French locale, and the chosen quote character is not standard input
markup in the French locale. If we are thinking of the same forum
thread and I am remembering it correctly, that would cover the case
described by the user.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

Tag markup would be easier to implement in the processor, because you
would not need to handle courier-style quotes, which are identical for
open and close. But is Zotero going to implement markup for quotes at
the database level in the short term? If the answer is no, then we
need to handle quote characters in the processor if we want it
deployed in Zotero.
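
To illustrate why typewriter quotes are the awkward case, here is a minimal Python sketch that pairs them by simple alternation. The function name and the default glyphs are assumptions; a real processor would also have to deal with apostrophes and nesting, which this ignores.

```python
def curl_straight_quotes(text, open_q="\u201c", close_q="\u201d"):
    """Replace balanced straight double quotes with open/close glyphs.

    Since '"' is identical for open and close, the only cue is position:
    odd occurrences open, even occurrences close.
    """
    out = []
    is_open = False
    for ch in text:
        if ch == '"':
            out.append(close_q if is_open else open_q)
            is_open = not is_open
        else:
            out.append(ch)
    if is_open:
        # An odd number of quotes cannot be paired reliably.
        raise ValueError("unbalanced straight quote")
    return "".join(out)
```

An unbalanced input raises an error here rather than guessing, which is the fragility mentioned above.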

I’ll look forward to seeing how it turns out.

But this of course isn’t directly related to CSL.

I’ll state what seems sensible to me at the moment, but I’ll have no
problem if a different design is chosen by executive decision, so long
as it’s unambiguous and covers user needs:

For quote characters, use locale terms:

“</>
”</>
‘</>
’</>

If we settle the question above and decide we need this, this is
good. Except I’d remove the “single” and “double” notion, since that’s
likely locale-specific. Maybe “right-quote” and “right-inner-quote”?

If no other clear proposal emerges in the next couple of weeks, I will
go with this, amended by Sean’s suggestion to remove RL assumptions
from the attribute names.

Bruce_D_Arcus1 · July 1, 2009, 5:05pm

There are a lot of things up-in-the-air here.

Is there any indication of how Zotero intends to deal with this (is
there a ticket, for example)? AFAIK, there’s no easy way for users to
override locale definitions ATM.

Is there any information about what practical issues users are having
with quotes?

Also, as an alternative, there’s another possibility I earlier
mentioned, which is something like:

Am not sure this is a good idea; really depends on the details of the use case.

Bruce

Frank_Bennett · July 1, 2009, 10:14pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

How would a problem arise? If a style for French sources wants to use
some other mark for quotes, it can overload the locale. If a French
user has sources with non-French quotes inlined in a title to be fed
to CSL, it will be passed through as a literal, if the style uses the
French locale, and the chosen quote character is not standard input
markup in the French locale. If we are thinking of the same forum
thread and I am remembering it correctly, that would cover the case
described by the user.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

Tag markup would be easier to implement in the processor, because you
would not need to handle courier-style quotes, which are identical for
open and close. But is Zotero going to implement markup for quotes at
the database level in the short term? If the answer is no, then we
need to handle quote characters in the processor if we want it
deployed in Zotero.

I’ll look forward to seeing how it turns out.

There are a lot of things up-in-the-air here.

Is there any indication of how Zotero intends to deal with this (is
there a ticket, for example)? AFAIK, there’s no easy way for users to
override locale definitions ATM.

Is there any information about what practical issues users are having
with quotes?

Also, as an alternative, there’s another possibility I earlier
mentioned, which is something like:

Am not sure this is a good idea; really depends on the details of the use case.

http://www.witch.westfalen.de/csstest/quotes/quotes.html

Frank_Bennett · July 1, 2009, 10:28pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

How would a problem arise? If a style for French sources wants to use
some other mark for quotes, it can overload the locale. If a French
user has sources with non-French quotes inlined in a title to be fed
to CSL, it will be passed through as a literal, if the style uses the
French locale, and the chosen quote character is not standard input
markup in the French locale. If we are thinking of the same forum
thread and I am remembering it correctly, that would cover the case
described by the user.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

Tag markup would be easier to implement in the processor, because you
would not need to handle courier-style quotes, which are identical for
open and close. But is Zotero going to implement markup for quotes at
the database level in the short term? If the answer is no, then we
need to handle quote characters in the processor if we want it
deployed in Zotero.

I’ll look forward to seeing how it turns out.

There are a lot of things up-in-the-air here.

Is there any indication of how Zotero intends to deal with this (is
there a ticket, for example)?

https://www.zotero.org/trac/ticket/928
https://www.zotero.org/trac/ticket/989
https://www.zotero.org/trac/ticket/990
https://www.zotero.org/trac/ticket/1238
https://www.zotero.org/trac/ticket/1436

AFAIK, there’s no easy way for users to
override locale definitions ATM.

The glyphs applied by quotes=“true” should be controlled by the style.
Quotes inlined in database entries should conform to the language of
the source. As I wrote earlier, if those inlined quote characters are
passed through verbatim when they are not known to the language of the
citation style rendering the database source, then you get the
behavior that several users have indicated is required by multilingual
styles (see relevant links below).

There just needs to be a mechanism for fetching the quote mark glyphs
associated with the locale of the style.

Frank_Bennett · July 1, 2009, 11:01pm

…

Still need some means of identifying the quote characters appropriate
to the locale, and of specifying the correct punctuation handling
method.

Big question:

Given the discussion on the Zotero forums from one French user, who
suggested this may not be entirely a locale issue, do we really know
that we need to add this to CSL?

Or put differently, how would an implementation accommodate, say, a
French user who doesn’t want French-style quotation marks?

How would a problem arise? If a style for French sources wants to use
some other mark for quotes, it can overload the locale. If a French
user has sources with non-French quotes inlined in a title to be fed
to CSL, it will be passed through as a literal, if the style uses the
French locale, and the chosen quote character is not standard input
markup in the French locale. If we are thinking of the same forum
thread and I am remembering it correctly, that would cover the case
described by the user.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).

In data, this is why I prefer using XML:

Here's some quote

Tag markup would be easier to implement in the processor, because you
would not need to handle courier-style quotes, which are identical for
open and close. But is Zotero going to implement markup for quotes at
the database level in the short term? If the answer is no, then we
need to handle quote characters in the processor if we want it
deployed in Zotero.

I’ll look forward to seeing how it turns out.

There are a lot of things up-in-the-air here.

Is there any indication of how Zotero intends to deal with this (is
there a ticket, for example)?

#928 (enter curly quotes in the title field) – Zotero
#989 (option for quotes before comma in british styles) – Zotero
#990 (assign language to individual citation styles) – Zotero
#1238 (Localize quotation marks) – Zotero
#1436 (omit period or comma when title enclosed in quotes ends with a question mark) – Zotero

AFAIK, there’s no easy way for users to
override locale definitions ATM.

The glyphs applied by quotes=“true” should be controlled by the style.
Quotes inlined in database entries should conform to the language of
the source. As I wrote earlier, if those inlined quote characters are
passed through verbatim when they are not known to the language of the
citation style rendering the database source, then you get the
behavior that several users have indicated is required by multilingual
styles (see relevant links below).

There just needs to be a mechanism for fetching the quote mark glyphs
associated with the locale of the style.

Actually, configuration of quotes is closely related to configuration
of inline markup. If a solution can be found that covers both, that
would be splendid.

Just as a style needs to know what to do with balanced " characters,
it will need to know what decorations to apply with (say) XXX, or XXX. So
there needs to be a mapping from semantic tag names to formatting
attributes+value sets. The obvious place for those parameters to be
provided is the locale. How would you like to represent it?
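
One possible shape for such a mapping, sketched in Python. The tag names `emphasis` and `strong` are hypothetical stand-ins (the tags in the example above are not specified); the attribute names follow CSL's existing `font-style` and `font-weight` formatting attributes. Nested tags are not handled.

```python
import re

# Hypothetical mapping from semantic tag names to formatting attributes.
# In the proposal this table would be supplied by the locale.
TAG_FORMATTING = {
    "emphasis": {"font-style": "italic"},
    "strong": {"font-weight": "bold"},
}

# Matches <name>content</name>; does not support nesting.
TAG_RE = re.compile(r"<([a-z]+)>(.*?)</\1>")


def split_inline(text):
    """Split text into (chunk, formatting) runs using TAG_FORMATTING."""
    runs = []
    pos = 0
    for m in TAG_RE.finditer(text):
        if m.start() > pos:
            runs.append((text[pos:m.start()], {}))
        # Unknown tag names get no formatting applied.
        runs.append((m.group(2), TAG_FORMATTING.get(m.group(1), {})))
        pos = m.end()
    if pos < len(text):
        runs.append((text[pos:], {}))
    return runs
```

Each run can then be handed to the output formatter together with its attribute set, the same way a style-level formatting attribute would be.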

Dan_Stillman · July 1, 2009, 10:32pm

I’d be inclined to go with tag-based markup in Zotero, since I really
don’t think we want to pollute stored(/synced/displayed) data with
arbitrary inline markup, and we already have mechanisms in various
places to deal with HTML. Simon and I have discussed using a
stripped-down version of TinyMCE for title fields to handle this. My
only concern would be performance, since the TinyMCE-based rich-text
notes in 2.0 have a bit of a display lag, and that’d be more of an issue
for title fields.

A few French and German users in the Zotero forums have expressed the
need for citation-level control over quote style based on the original
language of the source, but that idea was mostly dismissed as crazy and
prohibitively difficult. If we wanted to support it, though, it could be
done using the Language field (or, rather, an improved version of it:
http://forums.zotero.org/discussion/3252/#Item_5) rather than by
modifying the actual title data in Zotero. But it might not be worth the
additional schema/processor changes that would be required.

Bruce_D_Arcus1 · July 1, 2009, 11:37pm

…

Actually, configuration of quotes is closely related to configuration
of inline markup.

Also has some connection to dates localization, which we also need to deal with.

If a solution can be found that covers both, that
would be splendid.

Just as a style needs to know what to do with balanced " characters,
it will need to know what decorations to apply with (say) XXX, or XXX. So
there needs to be a mapping from semantic tag names to formatting
attributes+value sets. The obvious place for those parameters to be
provided is the locale. How would you like to represent it?

I don’t know. We need someone (say Zotero) to commit to this before we
settle on how to do it. We also need to know how complex this mapping
needs to be (again, assuming someone implements it). It might be as
simple as:

I doubt that would be enough though.

Bruce

Bruce_D_Arcus1 · July 1, 2009, 11:39pm

…

Also, as an alternative, there’s another possibility I earlier
mentioned, which is something like:

Am not sure this is a good idea; really depends on the details of the use case.

http://www.witch.westfalen.de/csstest/quotes/quotes.html

That sort of thing covers the general case, but not how it plays out
in citation practice. Can we be confident that we can make the
assumption they are equivalent?

Bruce

Bruce_D_Arcus1 · July 1, 2009, 11:43pm

I’d be inclined to go with tag-based markup in Zotero, since I really don’t
think we want to pollute stored(/synced/displayed) data with arbitrary
inline markup, and we already have mechanisms in various places to deal with
HTML. Simon and I have discussed using a stripped-down version of TinyMCE
for title fields to handle this. My only concern would be performance, since
the TinyMCE-based rich-text notes in 2.0 have a bit of a display lag, and
that’d be more of an issue for title fields.

Any chance a lighter-weight alternative would work?

A few French and German users in the Zotero forums have expressed the need
for citation-level control over quote style based on the original language
of the source, but that idea was mostly dismissed as crazy and prohibitively
difficult.

I’m more wondering if users working in other languages are complaining
about the quotes output in the bibliographies. Or, to turn it around,
am wondering if there are emerging trends towards standardizing on
English conventions.

Bruce

Frank_Bennett · July 1, 2009, 11:43pm

…

Also, as an alternative, there’s another possibility I earlier
mentioned, which is something like:

Am not sure this is a good idea; really depends on the details of the use case.

http://www.witch.westfalen.de/csstest/quotes/quotes.html

That sort of thing covers the general case, but not how it plays out
in citation practice. Can we be confident that we can make the
assumption they are equivalent?

Need to talk to the publishers to get an answer to that one.
