quotes localization (was html entities)

Actually, configuration of quotes is closely related to configuration
of inline markup.

It also has some connection to date localization, which we also need to deal with.

If a solution can be found that covers both, that
would be splendid.

Just as a style needs to know what to do with balanced " characters,
it will need to know what decorations to apply with (say) XXX, or XXX. So
there needs to be a mapping from semantic tag names to formatting
attributes+value sets. The obvious place for those parameters to be
provided is the locale. How would you like to represent it?

I don’t know. We need someone (say Zotero) to commit to this before we
settle on how to do it. We also need to know how complex this mapping
needs to be (again, assuming someone implements it).

I’m ready to work on implementation in the processor. The open question is
support in the Zotero UI.

I’d be inclined to go with tag-based markup in Zotero, since I really don’t
think we want to pollute stored(/synced/displayed) data with arbitrary
inline markup, and we already have mechanisms in various places to deal with
HTML. Simon and I have discussed using a stripped-down version of TinyMCE
for title fields to handle this. My only concern would be performance, since
the TinyMCE-based rich-text notes in 2.0 have a bit of a display lag, and
that’d be more of an issue for title fields.

Any chance a lighter-weight alternative would work?

A few French and German users in the Zotero forums have expressed the need
for citation-level control over quote style based on the original language
of the source, but that idea was mostly dismissed as crazy and prohibitively
difficult.

I’m wondering more whether users working in other languages are complaining
about the quotes output in the bibliographies. Or, to turn it around,
whether there are emerging trends towards standardizing on
English conventions.

Quotes and formatting decorations need to be placed under the control
of the locale before multi-lingual citations can be contemplated. For
example, in an English-language title that calls for italicization, a
parenthetical English translation of the title should be italicized,
but the Japanese characters should not be (italicization can be
applied by , but it is not a Japanese typesetting convention
and looks really weird on the page).

The first step to getting there is to set formatting and
quote-handling attributes in the locale of the style. At a later
stage, that could be extended to apply decorations as appropriate to
the language of the field, if things develop in that direction.

We’re not seeing these issues at the moment because this behavior is a
core requirement in multilingual environments, so Zotero doesn’t get
used there. Personally, I would like to encourage movement toward
multi-lingual support, of course, because there is a strong demand for
it among our students.

A decision will need to be made at the UI level about how quotes will
be represented in the data. Inline markup will need to recognize
these characters and treat them as markup, so the processor needs to
know what characters are in the set, so it can identify them.
Possibilities are to require typewriter quotes everywhere (fragile,
probably not a good idea), or recognize everything (better, but
requires someone to identify what all the possible quotation marks in
the world are – if I should use the wikipedia entry linked by Bruce,
let me know).
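As a rough illustration of the "recognize everything" option, here is a minimal Python sketch that leans on Unicode general categories instead of a hand-built list. It assumes that categories Pi and Pf (initial and final quote punctuation) are a usable first approximation; the function names are made up for this example. The heuristic is known to be imperfect: typewriter quotes have to be listed by hand, low-9 marks and CJK corner brackets fall outside Pi/Pf, and the right single quote doubles as an apostrophe.

```python
import unicodedata

# Typewriter quotes are ordinary punctuation (category Po),
# so they have to be listed explicitly.
TYPEWRITER_QUOTES = {'"', "'"}

def is_quote_candidate(ch):
    """Return True if ch looks like a quotation mark (heuristic only)."""
    return ch in TYPEWRITER_QUOTES or unicodedata.category(ch) in ("Pi", "Pf")

def quote_candidates(text):
    """List the distinct quote-like characters in text, in order of first appearance."""
    seen = []
    for ch in text:
        if is_quote_candidate(ch) and ch not in seen:
            seen.append(ch)
    return seen
```

Running quote_candidates over a sample of real titles would be one way to check a candidate list against actual data before fixing the set in the processor.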

In data, this is why I prefer using XML:

Here's some quote

Tag markup would be easier to implement in the processor, because you
would not need to handle courier-style quotes, which are identical for
open and close. But is Zotero going to implement markup for quotes at
the database level in the short term? If the answer is no, then we
need to handle quote characters in the processor if we want it
deployed in Zotero.

I’ll look forward to seeing how it turns out.

There are a lot of things up in the air here.

Is there any indication of how Zotero intends to deal with this (is
there a ticket, for example)?

I’d be inclined to go with tag-based markup in Zotero, since I really don’t
think we want to pollute stored(/synced/displayed) data with arbitrary
inline markup, and we already have mechanisms in various places to deal with
HTML. Simon and I have discussed using a stripped-down version of TinyMCE
for title fields to handle this. My only concern would be performance, since
the TinyMCE-based rich-text notes in 2.0 have a bit of a display lag, and
that’d be more of an issue for title fields.

Sorry, Dan, we might be talking at cross purposes. The reference to
tag-based versus characters was specific to quotes – I think everyone
is agreed that WYSIWYG is preferable to wiki-style markup. The
question is whether quotes should be represented as tags, rather than
as Unicode characters, with me thinking out loud that it would be
simpler to allow users to enter them normally via the keyboard.

But are you suggesting that a user would override the locale-specified
quote characters that appeared around a title in bibliographic output by
editing the data value in Zotero? If so, that doesn’t make much sense to
me, because then the data would essentially be locale-specific, and we’d
need to strip or interpret those characters anywhere we wanted to
process or display the values.

For quotes that appear around titles, I don’t see why we’d need anything
in the actual data values. For quotes that appear within a title, a
tag-based approach that separates structure from presentation makes more
sense to me.

Or perhaps I’m completely misunderstanding the issue…

But are you suggesting that a user would override the locale-specified quote
characters that appeared around a title in bibliographic output by editing
the data value in Zotero? If so, that doesn’t make much sense to me, because
then the data would essentially be locale-specific, and we’d need to strip
or interpret those characters anywhere we wanted to process or display the
values.

No, sorry if what I wrote gave that impression. The current
quotes="true" markup in CSL for styles that need quotes is certainly
the right way to go for wrapping titles in quotes during bib
rendering. The next step will be to figure out how to set the
characters used for the quotes. For the outer quotes on a title (if
any) there seem to be two proposals on the table. One is to define
four locale terms, one for each of the opening and closing, single and
double quote characters. The other is to have an option in the
bibliography section that says something like , with permutations for the various
possible combinations of quote marks. Either way, the quote marks to
be used would be defined in the style, and applied by the style.
There would be no tampering with field content.

For quotes that appear around titles, I don’t see why we’d need anything in
the actual data values. For quotes that appear within a title, a tag-based
approach that separates structure from presentation makes more sense to me.

At first blush this looked attractive to me also, but I think that it
turns out to be more complicated to implement common cross-lingual
cases via this route. At the boundary between French and English, you
might have an English review of a book written in French, which uses
French quotes in the title of the review:

J. Smith, “A Review of Cholley’s «Je suis un chat»”, Literary Studies
(2000), v. 10, n. 5

I could be wrong, but I think that would be the commonly expected
behavior in this case. The inner quotes aren’t hard-wired: they would
need to flip-flop to singles if the review were cited in a style that
applies French guillemets instead of English quotes to a title. But
in the example above, the guillemets are outside the scope of English
punctuation, so they are just passed through verbatim.

Implementing this in CSL is not difficult, and it wouldn’t require any
changes in Zotero to make it work; quote characters are just entered
as part of literal text of titles in which they appear, and display as
ordinary UTF-8. Any flip-flop mangling at bib render time would be up
to the CSL processor. To make flip-flop work where it’s needed, the
set of quote marks to be recognized in input for the locale of the
style, and the hierarchy of quotes to be used in output, would need to
be fixed, either as locale terms, or using some special element for
that purpose. (Separating input from output quote character sets
would solve any lurking problems with courier-style quote marks
slipping through into typeset output.)
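To make the flip-flop idea concrete, here is a small Python sketch under stated assumptions. It is not the processor's code. The names (flip_flop, EN_INPUT, EN_OUTPUT, FR_INPUT, FR_OUTPUT) are invented for illustration, the "French" output hierarchy is a placeholder rather than a statement about French typography, and the apostrophe test (a quote character between two alphanumerics is left alone) is a crude heuristic. Input quote characters and the output hierarchy are kept separate, as suggested above.

```python
# Recognized input quotes for a hypothetical English-locale style:
# each opener maps to its closer. Typewriter quotes open and close alike.
EN_INPUT = {"“": "”", "‘": "’", '"': '"'}
EN_OUTPUT = [("“", "”"), ("‘", "’")]   # outermost level first

# Hypothetical French-locale style (illustrative values only).
FR_INPUT = {"«": "»", "“": "”", '"': '"'}
FR_OUTPUT = [("«", "»"), ("‹", "›")]

def flip_flop(title, input_pairs, output_levels, wrap=True):
    """Rewrite recognized quotes in title using the output hierarchy."""
    depth = 1 if wrap else 0   # the wrapping quotes occupy level 0
    stack = []                 # closers we are currently waiting for
    out = []
    n = len(output_levels)
    for i, ch in enumerate(title):
        prev_ch = title[i - 1] if i > 0 else " "
        next_ch = title[i + 1] if i + 1 < len(title) else " "
        if prev_ch.isalnum() and next_ch.isalnum():
            out.append(ch)     # inside a word: treat as apostrophe or letter
        elif stack and ch == stack[-1]:
            stack.pop()        # closing a recognized quote
            depth -= 1
            out.append(output_levels[depth % n][1])
        elif ch in input_pairs:
            out.append(output_levels[depth % n][0])   # opening a quote
            stack.append(input_pairs[ch])
            depth += 1
        else:
            out.append(ch)     # unrecognized characters pass through verbatim
    text = "".join(out)
    if wrap:
        text = output_levels[0][0] + text + output_levels[0][1]
    return text
```

With the English settings, the guillemets in the review title are not in the recognized set and pass through untouched. With the placeholder French settings they are recognized and demoted to the inner level, while the field content itself is never modified.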

If the inner quotes were tag markup, on the other hand, they would
need to carry a hint of the locale of the source in order to produce
the correct quotation glyphs. You would also need some blocking or
intercept mechanism to prevent the insertion of literal quote
characters via the keyboard … while somehow permitting users to
insert apostrophes … it seems like it could get pretty complicated
pretty quickly.

Or perhaps I’m completely misunderstanding the issue…

I think we’re all on the same page, it’s just uncharted territory.

No, sorry if what I wrote gave that impression. The current
quotes="true" markup in CSL for styles that need quotes is certainly
the right way to go for wrapping titles in quotes during bib
rendering. The next step will be to figure out how to set the
characters used for the quotes. For the outer quotes on a title (if
any) there seem to be two proposals on the table. One is to define
four locale terms, one for each of the opening and closing, single and
double quote characters. The other is to have an option in the
bibliography section that says something like, with permutations for the various
possible combinations of quote marks. Either way, the quote marks to
be used would be defined in the style, and applied by the style.
There would be no tampering with field content.

OK, got it. Sorry for the confusion.

I can’t think of any particular reason (e.g., from the forums) not to do
this using locale terms.

At first blush this looked attractive to me also, but I think that it
turns out to be more complicated to implement common cross-lingual
cases via this route. At the boundary between French and English, you
might have an English review of a book written in French, which uses
French quotes in the title of the review:

J. Smith, “A Review of Cholley’s «Je suis un chat»”, Literary Studies
(2000), v. 10, n. 5

I could be wrong, but I think that would be the commonly expected
behavior in this case. The inner quotes aren’t hard-wired: they would
need to flip-flop to singles if the review were cited in a style that
applies French guillemets instead of English quotes to a title. But
in the example above, the guillemets are outside the scope of English
punctuation, so they are just passed through verbatim.

Implementing this in CSL is not difficult, and it wouldn’t require any
changes in Zotero to make it work; quote characters are just entered
as part of literal text of titles in which they appear, and display as
ordinary UTF-8. Any flip-flop mangling at bib render time would be up
to the CSL processor. To make flip-flop work where it’s needed, the
set of quote marks to be recognized in input for the locale of the
style, and the hierarchy of quotes to be used in output, would need to
be fixed, either as locale terms, or using some special element for
that purpose. (Separating input from output quote character sets
would solve any lurking problems with courier-style quote marks
slipping through into typeset output.)

That makes sense, and indeed sounds like the better option. I
particularly like the “wouldn’t require any changes in Zotero” part.

Another question for the spec:

Do we assume default values for quotes, or are all locales required to
specify the glyphs? The only reason I could see for the latter is if it’s
feasible that a locale would want nil values (and so no quotes).

Bruce

Are quotes different in this regard from any other locale terms? I would
think that, across the board, we should just distinguish between empty
strings and missing strings, with the latter falling back to English.

So you’re suggesting a “yes” answer to my question?

Bruce

Are quotes different in this regard from any other locale terms? I would
think that, across the board, we should just distinguish between empty
strings and missing strings, with the latter falling back to English.

I’ve added a couple of tests, and can confirm (after fixing a small
bug that the test revealed) that the new processor behaves in this way
for all terms. So quotes can either be provided in all locales, or a
default can be provided for English, with the other locales providing
a definition only when it differs.
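The fallback rule described here (a missing term falls back to English, an empty term is honored as "no quotes") can be sketched in a few lines of Python. This is an illustration only, not the processor's implementation: the LOCALES table, the term names, the "xx" locale and get_term are all assumptions made up for the example.

```python
# Illustrative locale data. "fr" overrides only the outer quotes;
# "xx" is a made-up locale that wants no outer quotes at all.
LOCALES = {
    "en": {
        "open-quote": "“", "close-quote": "”",
        "open-inner-quote": "‘", "close-inner-quote": "’",
    },
    "fr": {"open-quote": "«", "close-quote": "»"},
    "xx": {"open-quote": "", "close-quote": ""},
}

def get_term(locale, name, default_locale="en"):
    """Look up a term; only a *missing* term falls back to the default locale."""
    terms = LOCALES.get(locale, {})
    if name in terms:          # present, even if it is the empty string
        return terms[name]
    return LOCALES[default_locale][name]
```

The membership test is the important detail: checking for a truthy value instead would treat an empty string as missing and silently reinstate the English quotes.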

So you’re suggesting a “yes” answer to my question?

Is the question whether quotes should be specified in the locale or,
alternatively be coded into the processor with a set of option values
() to modify the default
behavior on a per-style basis?

I would recommend the former, if that’s the question posed. If quotes
are specified as locale terms, it would make it simpler, at a later
stage, to adapt CSL to handle multi-lingual citations correctly (i.e.
cites with translated titles etc).

If the question is whether, when quote characters are specified in the
English locale, they also should be specified in all other locales,
the answer is that they only need to be defined when the default
should be overridden (including nil quote cases, since specifying an
empty term will override the default value and replace it with an
empty string). It does no harm to define the quote characters
explicitly in every locale, though.

(Hoping that the above is responsive.)

Frank

Is the question whether quotes should be specified in the locale or,
alternatively be coded into the processor with a set of option values
() to modify the default
behavior on a per-style basis?

No; the question was the one I asked which Dan quoted:

“Do we assume default values for quotes, or are all locales required
to specify the glyphs?”

I would recommend the former, if that’s the question posed. If quotes
are specified as locale terms, it would make it simpler, at a later
stage, to adapt CSL to handle multi-lingual citations correctly (i.e.
cites with translated titles etc).

If the question is whether, when quote characters are specified in the
English locale, they also should be specified in all other locales,
the answer is that they only need to be defined when the default
should be overridden (including nil quote cases, since specifying an
empty term will override the default value and replace it with an
empty string). It does no harm to define the quote characters
explicitly in every locale, though.

Yes. My conclusion is that quotes only get specified in the locale if
they are different from what we use in English (i.e. what we’ve been
doing). So my answer to my question is “yes.”

Bruce