Damn, the webmailer scrubbed all tags, sorry!
The RNC definition for rich text [1] implies the following rich text
elements:
abbr, b, cite, cite+class=part, i, sc, span, span+class=protect, sub,
sup
The FlipFlop parser of citeproc-js [2] supports these rich text
elements:
i, b, sup, sub, sc, span+class=nodecor, span+class=nocase, ", '
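To make the mismatch between the two lists easier to see, here is a rough sketch that compares them as sets. The element names are written in the same "+class=" notation as above, the quote characters are left out, and both sub and sup are assumed to belong to the schema list. The helper name `difference` is made up for this illustration.

```javascript
// Rich text elements from the CSL schema list (assumption: both sub and sup).
const schemaElements = new Set([
  "abbr", "b", "cite", "cite+class=part", "i", "sc",
  "span", "span+class=protect", "sub", "sup",
]);

// Rich text elements from the citeproc-js list (quote characters omitted).
const citeprocElements = new Set([
  "i", "b", "sup", "sub", "sc", "span+class=nodecor", "span+class=nocase",
]);

// Hypothetical helper: elements present in `a` but missing from `b`, sorted.
function difference(a, b) {
  return [...a].filter((name) => !b.has(name)).sort();
}

const onlyInSchema = difference(schemaElements, citeprocElements);
const onlyInCiteproc = difference(citeprocElements, schemaElements);
const shared = [...schemaElements].filter((name) => citeprocElements.has(name)).sort();

console.log("schema only:   " + onlyInSchema.join(", "));
console.log("citeproc only: " + onlyInCiteproc.join(", "));
console.log("shared:        " + shared.join(", "));
```

Under these assumptions only b, i, sc, sub, and sup are common to both; everything else differs.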
My inclusion of the sc element is a bug. I’d really prefer to stick to
a strict subset of HTML, since it can do what we need, and there are
tons of tools and software that can deal with it.
At least ‘sc’ for small caps is well defined: you can map it to HTML
and vice versa. The elements i, b, sup, and sub are also clear, so I
will focus on the rest. You wrote about use cases:
- preserve case (proper nouns and acronyms, though the latter can be
handled programmatically)
Why keep ‘abbr’? It is semantic markup, not supported by citeproc-js
anyway.
- species names
- foreign language terms
The last two can probably reasonably be handled without any semantic
markup: if the field is output as italic, switch in-field italics to
normal.
Yes, this is semantic markup, which we do not have in real data. In
practice we only have presentational markup like italics, which could
indicate species names, foreign language terms, or anything else beyond
the scope of CSL.
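The purely presentational rule mentioned above (italics inside a field flip to normal when the whole field is set in italics) could be sketched roughly as follows. This is not citeproc-js code: the function name `resolveItalics` and the segment objects are invented for illustration, and the sketch assumes literal <i>...</i> tags without nesting.

```javascript
// Hypothetical helper: split a field value on <i>...</i> and resolve the
// font style of each segment against the style of the whole field.
function resolveItalics(text, fieldIsItalic) {
  const segments = [];
  let insideTag = false;
  // The capture group keeps the tags as separate items in the result.
  for (const part of text.split(/(<\/?i>)/)) {
    if (part === "<i>") { insideTag = true; continue; }
    if (part === "</i>") { insideTag = false; continue; }
    if (part === "") continue;
    // Tagged text takes the opposite style of the surrounding field.
    const italic = insideTag ? !fieldIsItalic : fieldIsItalic;
    segments.push({ text: part, fontStyle: italic ? "italic" : "normal" });
  }
  return segments;
}

const title = "A survey of <i>Homo sapiens</i> remains";
console.log(resolveItalics(title, false)); // field rendered upright
console.log(resolveItalics(title, true));  // field rendered in italics
```

The markup stays presentational: the sketch never needs to know whether the tagged words are a species name, a foreign term, or something else.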
[this] is tricky. While I hate to admit it, maybe it can be
handled through presentation rules similar to the above? But then we
get into awkwardness like, are the following all equivalent from a
processing perspective: simple quotation marks, “smart” quotes that
use the right and left hand quote marks, the LaTeX double '' and ``
characters, not to mention all the international wrinkles?
It’s not only about titles within titles but also about other words set
in quotes within a title. I thought citeproc-js handles these by
parsing quotes (" and ') in rich text. International quotes and LaTeX
characters are out of the scope of the CSL schema; they must be
recognized and converted in a pre-parsing stage. I thought that the
elements ‘cite’ and ‘cite+class=part’ correspond to two levels of
quotes in citeproc-js.
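Such a pre-parsing stage might look roughly like the sketch below, which maps some typographic and LaTeX-style quotes to the plain " and ' characters before the rich text is parsed. The replacement table and the name `normalizeQuotes` are assumptions made for illustration; the table is far from complete, and the sketch is naive in that it also turns typographic apostrophes into ' and treats every doubled '' as a double quote.

```javascript
// Illustrative, incomplete table: [pattern, plain replacement].
// Order matters: LaTeX double quotes are handled before single characters.
const QUOTE_MAP = [
  [/``|''/g, '"'],                              // LaTeX double quotes
  [/[\u201C\u201D\u201E\u00AB\u00BB]/g, '"'],   // curly, low-9 and angle double quotes
  [/[\u2018\u2019\u201A\u2039\u203A]/g, "'"],   // curly, low-9 and angle single quotes
  [/`/g, "'"],                                  // LaTeX opening single quote
];

// Hypothetical pre-parsing step: apply every replacement in turn.
function normalizeQuotes(text) {
  return QUOTE_MAP.reduce(
    (result, [pattern, plain]) => result.replace(pattern, plain),
    text
  );
}

console.log(normalizeQuotes("``Hamlet'' and \u201EFaust\u201C"));
console.log(normalizeQuotes("\u2018inner\u2019 and `latex'"));
```

After this step the parser only has to deal with the two plain quote characters.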
But if we do that, we need to distinguish between two inline titles:
regular (normally italicized in English), and parts (normally in
quotes).
I have never heard of these “normally” rules before; they seem to apply
to English only. We only have italicized parts and parts set in quotes,
whatever they mean.
My question was about the remaining markup elements:
span or span+class=protect in CSL schema
and
span+class=nodecor or span+class=nocase in citeproc-js
There could be one element for case-preserving (which is semantic
markup), but two?
Jakob
[1] https://bitbucket.org/bdarcus/csl-schema/src/tip/csl-data.rnc
[2] https://bitbucket.org/fbennett/citeproc-js/src/tip/src/util_flipflop.js
which contains the following definition:
["<i>", "</i>", "italics", "@font-style", ["italic", "normal"], true],
["<b>", "</b>", "bold", "@font-weight", ["bold", "normal"], true],
["<sup>", "</sup>", "superscript", "@vertical-align", ["sup", "sup"], true],
["<sub>", "</sub>", "subscript", "@vertical-align", ["sub", "sub"], true],
["<sc>", "</sc>", "smallcaps", "@font-variant", ["small-caps", "small-caps"], true],
["<span class=\"nodecor\">", "</span>", "passthrough", "@passthrough", ["true", "true"], true],
["<span class=\"nocase\">", "</span>", "passthrough", "@passthrough", ["true", "true"], true],
['"', '"', "quotes", "@quotes", ["true", "inner"], "'"],
[" '", "'", "quotes", "@quotes", ["inner", "true"], '"']
--
Verbundzentrale des GBV (VZG)
Digitale Bibliothek - Jakob Voß
Platz der Goettinger Sieben 1
37073 Goettingen - Germany
+49 (0)551 39-10242
@Jakob_Voss