CSL Questions

Bruce_D_Arcus1 · July 20, 2006, 2:00pm

Though I wonder what Johan thinks. Might be more difficult to handle in
a GUI.

Bruce

Simon_Kornblith · July 20, 2006, 4:24pm

Okay, I’ve updated http://simonster.com/csl/ not to put a " " on unless it’s
explicitly specified as the prefix, but now there are places where spaces
are missing. Is this a bug in the style, or is something still wrong with
the way I’m interpreting the CSL?

Simon

Bruce_D_Arcus1 · July 20, 2006, 4:42pm

Yeah, there were missing prefixes on some of the volume and access
elements. I’ve fixed them.

Bruce

Johan_Kool2 · July 20, 2006, 5:20pm

So, basically, there’d be strings from a controlled vocabulary (
with an ‘id’ attribute: ) and arbitrary strings or
punctuation ( without an ‘id’ attribute).

Sure. That might be better.

Though I wonder what Johan thinks. Might be more difficult to handle in
a GUI.

It might be a little easier with the label attribute, but it won’t
matter very much. Just go with what seems the best from the logic of
CSL.

Just so I understand the idea for the localization so far.

We include a file with the CiteProc distribution that defines a
default translation.

Default-localization.xml:

available from
…
…

ist da auf
…
…

beschikbaar op
…
…

etc.

And add to the csl file:

komt uit

The latter overwrites the default. Can I suggest that the
Default-localization.xml gets a version and that the CSL file notes
which version its author assumed was present? If CiteProc sees
that its Default-localization.xml has a different version number from
the one mentioned in the CSL file, it can warn the user that its
localization might differ from the CSL author’s intention.
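A minimal JavaScript sketch of the override rule described here. It assumes the default localization file has already been parsed into a plain object keyed by language and term id. `defaultLocalization` and `buildTermTable` are hypothetical names, not existing CiteProc code.

```javascript
// Hypothetical parsed form of Default-localization.xml:
// language code -> term id -> localized string.
const defaultLocalization = {
  en: { "available-from": "available from" },
  de: { "available-from": "ist da auf" },
  nl: { "available-from": "beschikbaar op" },
};

// Build the effective term table for one language: start from the
// shipped defaults, then let strings defined in the CSL file win.
function buildTermTable(lang, styleOverrides = {}) {
  const defaults = defaultLocalization[lang] || {};
  return { ...defaults, ...styleOverrides };
}

// A Dutch style that prefers "komt uit" to the default wording.
const nlTerms = buildTermTable("nl", { "available-from": "komt uit" });
console.log(nlTerms["available-from"]); // komt uit
```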

Johan

Bruce_D_Arcus1 · July 20, 2006, 5:33pm

Just so I understand the idea for the localization so far.

We include a file with the CiteProc distribution that defines a
default translation.

Default-localization.xml:

available from
…
…

ist da auf
…
…

beschikbaar op
…
…

etc.

Right now, I have an XSL file that handles this:

<xsl:param name="lang">en</xsl:param>

<xsl:variable name="date-accessed">
  <xsl:choose>
    <xsl:when test="$lang = 'en'">accessed</xsl:when>
  </xsl:choose>
</xsl:variable>

<xsl:variable name="in">
  <xsl:choose>
    <xsl:when test="$lang = 'en'">In</xsl:when>
  </xsl:choose>
</xsl:variable>

But we’d talked about putting up a JSON/YAML equivalent somewhere. In
Simon’s JavaScript code, he has:

var loc = {
  and: "and",
  etAl: "et al",
  pSingle: "p.",
  pMultiple: "pp.",
  editorVerb: "Edited By",
  editorNounSingle: "Ed.",
  editorNounMultiple: "Eds.",
  translatorVerb: "Translated By",
  translatorNounSingle: "Trans.",
  translatorNounMultiple: "Trans.",
  months: ["January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December"],
  monthsAbbreviated: ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
    "Aug", "Sep", "Oct", "Nov", "Dec"],
  pagesShortSingle: "p",
  pagesShortMultiple: "pp",
  pagesLongSingle: "page",
  pagesLongMultiple: "pages"
};

E.g. JSON. Note: for convention’s sake, isn’t it better to use
underscores for these sorts of variables?

In any case, that’s a minor detail.

And add to the csl file:

komt uit

The latter overwrites the default.

Yes, that seems sensible to me. Simon?

Can I suggest that the Default-localization.xml gets a version and
that the CSL file notes which version its author assumed was present?
If CiteProc sees that its Default-localization.xml has a different
version number from the one mentioned in the CSL file, it can warn the
user that its localization might differ from the CSL author’s intention.

Would it be sensible to cross this bridge when we come to it? E.g.
don’t require declaring a version for the localization until it’s
actually needed?

Bruce

Matthias_Steffens · July 20, 2006, 6:27pm

Hi Johan,

We include a file with the CiteProc distribution that defines a
default translation.

Default-localization.xml:

available from
…
…

ist da auf
…
…

beschikbaar op
…
…

etc.

Yes, or as Bruce noted, some kind of JSON/YAML equivalent, such as:

en:
  available-from: "available from"
  …
de:
  available-from: "verfügbar via"
  …
nl:
  available-from: "beschikbaar op"
  …

And add to the csl file:

komt uit

The latter overwrites the default.

Yes, if it defines its own (local) text.

But as I understood things, the regular case (Bruce’s option 2 in one of
his previous mails today) could be this:

In that case the text with id=“available-from” would be taken from the
language file “Default-localization.xml”.

Note however, if the latter case is the standard case, then CSL files
would not be self-contained anymore – unless they contain a list of
strings at the top of the CSL file (this was my original suggestion).

So, we’re actually dealing with three possible places where strings could
be stored:

1. inside individual containers within the CSL structure
2. within a single location inside the CSL file (e.g., at the top of the
   file)
3. within a single additional localization file

I think it’s important to decide which is the default location and which
other locations are optionally allowed.

If we want CSL files to be self-contained, the default must be option 1
and/or option 2 (with 2 avoiding duplication of strings between
reftypes).
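One way to model the three locations is a lookup chain that tries the most specific source first. The sketch below assumes that order of precedence (container, then the style-level list, then the external file). The thread has not settled which location is the default, and all names here are hypothetical.

```javascript
// Look a term up in the three candidate locations, most specific first,
// and report which layer supplied it.
function lookupTerm(id, { container = {}, styleTerms = {}, externalFile = {} } = {}) {
  const layers = [
    ["container", container],
    ["style", styleTerms],
    ["external", externalFile],
  ];
  for (const [source, table] of layers) {
    if (Object.prototype.hasOwnProperty.call(table, id)) {
      return { text: table[id], source };
    }
  }
  return { text: undefined, source: null };
}

const result = lookupTerm("available-from", {
  styleTerms: { "available-from": "available at" },
  externalFile: { "available-from": "available from" },
});
console.log(result.source, result.text); // style available at
```

Reporting the source layer is optional. It might help a GUI show users which definition is in effect.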

Matthias

Simon_Kornblith · July 20, 2006, 6:30pm

Just so I understand the idea for the localization so far.

We include a file with the CiteProc distribution that defines a
default translation.

Default-localization.xml:

available from
…
…

ist da auf
…
…

beschikbaar op
…
…

etc.

Right now, I have an XSL file that handles this:

[…]

But we’d talked about putting up a JSON/YAML equivalent somewhere. In
Simon’s JavaScript code, he has:

[…]

E.g. JSON. Note: for convention’s sake, isn’t it better to use
underscores for these sorts of variables?

In any case, that’s a minor detail.

If we use JSON/YAML, we could use something at least vaguely more structured
than what my code is using right now (which is purely temporary). For
example, in JSON:

pages:{
    short:{
        single:"p",
        multiple:"p"
    }
}

And then reference these as pages.short.single, pages.short.multiple, etc. I
also see an argument for going with straight XML, as it means a bit less
work to implement, since you already need a decent XML parser to parse CSL
in the first place.
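A nested table like this could be resolved with a small path walker. The sketch assumes the terms have already been loaded into a plain object, whether from JSON, YAML, or XML. `resolveTerm` is a hypothetical helper. The short plural is shown as "pp" and the long forms are taken from the `pagesLong*` values in the current `loc` object.

```javascript
// Nested term table in the shape proposed above.
const terms = {
  pages: {
    short: { single: "p", multiple: "pp" },
    long: { single: "page", multiple: "pages" },
  },
};

// Resolve a dotted reference such as "pages.short.single" one key at a
// time; returns undefined as soon as a step is missing.
function resolveTerm(table, path) {
  return path.split(".").reduce(
    (node, key) => (node != null && typeof node === "object" ? node[key] : undefined),
    table
  );
}

console.log(resolveTerm(terms, "pages.short.multiple")); // pp
console.log(resolveTerm(terms, "pages.medium.single")); // undefined
```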

And add to the csl file:

komt uit

The latter overwrites the default.

Yes, that seems sensible to me. Simon?

The alternative is to allow the style author to supply the xml:lang in the
element, then simply override the default in place, the same way I
had previously suggested we handle everything. I slightly prefer this
approach, because it means the data is only in two places, rather than
three. Or, we could allow authors to use either of these approaches,
CSS-style, although that may just further complicate things.

Can I suggest that the Default-localization.xml gets a version and
that the CSL file notes which version its author assumed was present?
If CiteProc sees that its Default-localization.xml has a different
version number from the one mentioned in the CSL file, it can warn the
user that its localization might differ from the CSL author’s intention.

Would it be sensible to cross this bridge when we come to it? E.g.
don’t require declaring a version for the localization until it’s
actually needed?

Argh. I’d hate to put people in a situation where all the data is there, but
they can’t use it even in its default because they don’t have the newest
localization file (and thus can’t map one of the IDs), but I guess we’ll
eventually need to do this. We should try to make our first version
comprehensive so that we can reduce the number of styles that require a
specific version.

Simon

Matthias_Steffens · July 20, 2006, 6:39pm

And add to the csl file:

komt uit

The latter overwrites the default.

Yes, if it defines its own (local) text.

But as I understood things, the regular case (Bruce’s option 2 in one
of his previous mails today) could be this:

Oops, I didn’t realize the ‘’ bit in your
example and thought you were referring to a part within the CSL
structure. Sorry for the confusion!

Matthias

Bruce_D_Arcus1 · July 20, 2006, 6:49pm

If we use JSON/YAML, we could use something at least vaguely more
structured than what my code is using right now (which is purely
temporary). For example, in JSON:
example, in JSON:
pages:{
    short:{
        single:"p",
        multiple:"p"
    }
}
And then reference these as pages.short.single, pages.short.multiple, etc.

Right. I think it probably makes more sense myself.

I also see an argument for going with straight XML, as it means a bit
less work to implement, since you already need a decent XML parser to
parse CSL in the first place.

Yeah. I guess I’d think of this as implementation detail that is
largely separate from CSL. If you’d like to go with an XML localization
file, then we can use the same file, but it’s totally up to you.

And add to the csl file:

komt uit

The latter overwrites the default.

Yes, that seems sensible to me. Simon?

The alternative is to allow the style author to supply the xml:lang in
the element, then simply override the default in place, the same way I
had previously suggested we handle everything.

Note: I switched back to having an xml:lang attribute on the root,
because it has implications for the XML processing model. To an XML
parser, it means that all children are in that language by default
(i.e., unless explicitly overridden).
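This is the standard XML inheritance rule: a node's language is the nearest `xml:lang` on the node itself or on an ancestor. Here is a sketch over a hypothetical plain-object tree. A real implementation would walk its XML parser's own node objects instead.

```javascript
// Walk up toward the root; the nearest explicit xml:lang wins.
function effectiveLang(node) {
  for (let n = node; n; n = n.parent) {
    if (n.attrs && n.attrs["xml:lang"]) return n.attrs["xml:lang"];
  }
  return null; // no language declared anywhere on the ancestor chain
}

// Tiny tree equivalent to:
// <style xml:lang="en"><text/><text xml:lang="de"/></style>
const root = { name: "style", attrs: { "xml:lang": "en" }, parent: null };
const plain = { name: "text", attrs: {}, parent: root };
const german = { name: "text", attrs: { "xml:lang": "de" }, parent: root };

console.log(effectiveLang(plain)); // en
console.log(effectiveLang(german)); // de
```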

In any case, if I understand right, you’re suggesting the localization
be in the CSL file, right? As it was originally, but with the explicit
label or text element that can be overridden.

I slightly prefer this approach, because it means the data is only in
two places, rather than three. Or, we could allow authors to use
either of these approaches, CSS-style, although that may just further
complicate things.

Hmm … yeah. I think the practical question is, do strings get handled
exclusively by CSL, or do we say implementors are responsible for them?

I think for users it’s easier if they don’t have to worry about
specifying the strings in addition to choosing their id.

Can I suggest that the Default-localization.xml gets a version and
that the CSL file notes which version its author assumed was present?
If CiteProc sees that its Default-localization.xml has a different
version number from the one mentioned in the CSL file, it can warn the
user that its localization might differ from the CSL author’s intention.

Would it be sensible to cross this bridge when we come to it? E.g.
don’t require declaring a version for the localization until it’s
actually needed?

Argh. I’d hate to put people in a situation where all the data is
there, but they can’t use it even in its default because they don’t
have the newest localization file (and thus can’t map one of the IDs),
but I guess we’ll eventually need to do this. We should try to make
our first version comprehensive so that we can reduce the number of
styles that require a specific version.

Yes, if we go this route (and it seems consensus is we should) then I’d
hate to have to use versioning.

Bruce

Johan_Kool2 · July 20, 2006, 7:31pm

I also see an argument for going with straight XML, as it means a bit
less work to implement, since you already need a decent XML parser to parse
CSL in the first place.

I don’t know exactly why it was proposed to go with JSON/YAML, but I
would think XML would be easier, as the CSL file might need to contain
some additional localization, which could then be formatted like the
main localization file.

I haven’t exactly followed the original discussion about going with
JSON/YAML, but isn’t sticking to one file-format easier?

Yes, if we go this route (and it seems consensus is we should) then I’d
hate to have to use versioning.

I think you are going to have to use some versioning there. I wouldn’t
make it a required field, nor would I let CiteProc stop when the
versions don’t match. Just emit a warning so the user is alerted to
possible mistranslations.
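This proposal uses an optional field and warns without ever stopping. It could look roughly like the sketch below. The version strings and `checkLocalizationVersion` are hypothetical, and versions are compared as opaque strings rather than ordered.

```javascript
// Compare the localization version a style declares (optional) with the
// version of the installed default file. A mismatch only produces a
// warning; rendering is never stopped.
function checkLocalizationVersion(styleVersion, installedVersion, warn = console.warn) {
  if (styleVersion === undefined) return true; // optional field absent: nothing to check
  if (styleVersion !== installedVersion) {
    warn(
      `Style assumes localization version ${styleVersion}, but ${installedVersion} ` +
        "is installed; some terms may differ from what the style author intended."
    );
    return false;
  }
  return true;
}

// Collect warnings instead of printing them, e.g. to show them in a GUI.
const warnings = [];
checkLocalizationVersion("1.1", "1.0", (msg) => warnings.push(msg));
console.log(warnings.length); // 1
```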

Johan–

Bruce_D_Arcus1 · July 20, 2006, 7:40pm

Yeah, fair enough. It might indeed be that the simplest thing is to go
back to the original approach, modify it a bit, and use the same syntax
both inside and outside CSL. The outboard file might be:

Bruce

Bruce_D_Arcus1 · July 20, 2006, 8:05pm

Actually, going back to my original approach, I think roles really have
their own logic (which role is printed, and where, depends on context),
and so I’d probably want something like:

… and then:

I think, then, that probably suggests a schema pattern like:

cs-prefix = element cs:prefix { cs-text+ }
cs-text = element cs:text { text | attribute idref { token } }

… and then also allow “role” for contributors.
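Under the `cs-text` pattern above, a processor has to tell the two alternatives apart at render time. Here is a sketch. It assumes a parsed node is a plain object carrying either an `idref` or a `text` property (hypothetical shapes) and treats `cs:prefix` as a sequence of such nodes.

```javascript
// A cs:text node carries either literal text or an idref pointing into
// the term table, never both.
function renderText(node, terms) {
  if (node.idref !== undefined) {
    if (!Object.prototype.hasOwnProperty.call(terms, node.idref)) {
      throw new Error(`Unknown term id: ${node.idref}`);
    }
    return terms[node.idref]; // controlled-vocabulary string
  }
  return node.text; // arbitrary string or punctuation, printed as-is
}

const terms = { "available-from": "available from" };
const prefix = [{ idref: "available-from" }, { text: ":" }];
console.log(prefix.map((n) => renderText(n, terms)).join("")); // available from:
```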

Bruce

Johan_Kool2 · July 20, 2006, 9:38pm

I don’t see why you use “term” or “role” instead of just one element.
I used “text” below, but it could be another tag just as well. I just
thought it was rather confusing to have different tags.
I was thinking that it might be better not to use attributes. That way,
if some style wants “Edited by”, this can be done more easily.

p pp page pages Ed Eds Editor Editors Ed Edited By available from

… and then:

,

Johan–
http://www.johankool.nl/

Bruce_D_Arcus1 · July 20, 2006, 9:54pm

I don’t see why you use “term” or “role” instead of just one element.

Because they behave differently. See below …

I used “text” below, but it could be another tag just as well. I just
thought it was rather confusing to have different tags.
I was thinking that it might be better not to use attributes. That way,
if some style wants “Edited by”, this can be done more easily.

[…]

,

Yes, but that’s not exactly clearer, is it? I mean, you have an
author element that effectively says “if the editor substitute is used
in place of this, print the editor role.” But that is nowhere explicit.

In the current repo version, I at least have a rule defined globally
that saves a lot of trouble:

<contributors>
  <label position="before-unless-first" type="verb"/>
</contributors>

So it’s explicit here.

And you’re attaching different behavior to this thing (role); it’s
conditional, only printed if the actual variable used here is an
editor. This is why I originally had all this defined globally, and
then templates like:

The roles were then defined within the CSL file, and if there was no
role element for a given variable (like author) the role was empty.
That worked well and was flexible.
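The earlier global scheme could be sketched as a table keyed by contributor variable. The label strings come from Simon's `loc` object. The table shape and the `roleLabel` name are assumptions.

```javascript
// Roles defined once, globally, per contributor variable. Variables
// without an entry (such as author) get an empty role.
const roles = {
  editor: { single: "Ed.", multiple: "Eds." },
  translator: { single: "Trans.", multiple: "Trans." },
};

function roleLabel(variable, count) {
  if (!Object.prototype.hasOwnProperty.call(roles, variable)) return "";
  const role = roles[variable];
  return count > 1 ? role.multiple : role.single;
}

console.log(roleLabel("editor", 2)); // Eds.
console.log(roleLabel("author", 1) === ""); // true
```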

BTW, I just did a test with the option we’re discussing here:

templates with the (current) attribute-based approach: 37 lines
templates with the full element + explicit label approach: 145 lines

It’s all a question of trade-offs. I’d opt for consistency and clarity
over brevity, but these little details are difficult to design.

Bruce

Bruce_D_Arcus1 · July 21, 2006, 3:25am

I just had an idea (it may be a bad one, as it’s late!), which is to
keep things as they are, with one minor change, illustrated with this
example:

In other words, keep the flatter structure as is, but a) remove the
label attribute and add the text element, and b) use the new group
element to condition the output if needed.

We then continue to treat roles and locator labels as special cases
subject to the following rules:

1. both get configured globally in cs:general (as they are now)
2. locators can have an optional “exclude-label” attribute (for the
   annoying example Matthias posted), understood to be false by default
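Rule 2 amounts to a boolean flag that defaults to false. Here is a sketch. It assumes the label has already been looked up from the global configuration. `renderLocator` and `excludeLabel` are hypothetical JavaScript spellings of the proposed `exclude-label` attribute.

```javascript
// Render a locator with its globally configured label unless the style
// sets the optional exclude-label flag, which defaults to false.
function renderLocator(value, label, { excludeLabel = false } = {}) {
  return excludeLabel ? value : `${label} ${value}`;
}

console.log(renderLocator("23-45", "pp.")); // pp. 23-45
console.log(renderLocator("23-45", "pp.", { excludeLabel: true })); // 23-45
```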

How’s that? The best of all worlds? Or am I missing something obvious?

Bruce

Simon_Kornblith · July 21, 2006, 4:06pm

I agree. Our approach should be as generalized as possible. In fact, I would
recommend something even more general than what Johan suggested:

p pp page pages ...

There’s no reason “short” and “long” have to be tags. The only case in which
we need tags is that of and , because the parser must
choose which term is appropriate.
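Choosing between the singular and plural term is the one decision that rests with the processor rather than the style. Here is a sketch of that choice. The heuristic for spotting a plural locator is an assumption, not something specified in the thread, and `pageTerm` is a hypothetical name.

```javascript
// Assumption: a locator is plural if it contains a range separator
// (hyphen or en dash) or a list separator (comma or ampersand).
function isPlural(locator) {
  return /[-\u2013,&]/.test(locator);
}

// Pick the single or multiple form of a term for a given locator.
function pageTerm(locator, forms) {
  return isPlural(locator) ? forms.multiple : forms.single;
}

const shortForms = { single: "p.", multiple: "pp." };
console.log(pageTerm("45", shortForms)); // p.
console.log(pageTerm("45-52", shortForms)); // pp.
console.log(pageTerm("45, 47", shortForms)); // pp.
```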

There are at least two conceivable issues with this approach, however. The
first is that we can’t encode something like “before-unless-first,” which
the APA style uses (although I’m not quite clear on the behavior it’s
supposed to provide). The second is that, for the case we discussed of the
style that uses “p” rather than “pp,” localization is broken for the pages
term when all the data is there. The former would seem more important, but
I’d need to know more about the purpose of “before-unless-first” before I
suggest a solution.

Simon

Bruce_D_Arcus1 · July 22, 2006, 8:59pm

So it seems the list is now working again, and now we have to get back
in sync.

I don’t see why you use “term” or “role” instead of just one element.
I used “text” below, but it could be another tag just as well. I just
thought it was rather confusing to have different tags.
I was thinking that it might be better not to use attributes. That way,
if some style wants “Edited by”, this can be done more easily.

I agree. Our approach should be as generalized as possible. In fact, I
would recommend something even more general than what Johan suggested:
p pp page pages ...
There’s no reason “short” and “long” have to be tags.

That’s fine.

The only case in which we need tags is that of and
, because the parser must choose which term is appropriate.

True.

There are at least two conceivable issues with this approach, however.
The first is that we can’t encode something like “before-unless-first,”
which the APA style uses (although I’m not quite clear on the behavior
it’s supposed to provide).

It’s to avoid:

(Ed.) Doe, J. (1999) ...

… when a substitution happens.
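Read this way, `before-unless-first` is a position rule. The label normally precedes the names, but it follows them when the contributor group opens the entry, as it does after substitution. Here is a sketch under that reading. `placeLabel` and the `isFirst` flag are hypothetical.

```javascript
// "before-unless-first": put the label before the names, except when the
// contributor group opens the entry, in which case it follows them.
function placeLabel(names, label, isFirst) {
  return isFirst ? `${names} ${label}` : `${label} ${names}`;
}

// Editor substituted into the author slot at the start of an entry.
console.log(placeLabel("Doe, J.", "(Ed.)", true)); // Doe, J. (Ed.)
// Editor appearing later in the entry.
console.log(placeLabel("J. Doe", "Edited by", false)); // Edited by J. Doe
```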

The second is that, for the case we discussed of the style that uses
“p” rather than “pp,” localization is broken for the pages term when
all the data is there.

Exactly. It would almost work with my earlier approach, though, where
the strings are defined in the CSL file. So I guess one could override
with:

p p

Except, of course, whether it gets printed at all is type-dependent.

I see two options for dealing with that:

1. don’t (screw 'em if they’re going to be that insane!)
2. have a suppress-label attribute on pages (as I think I do for
   citations)

There’s a third option that may be better though (see below).

The former would seem more important, but I’d need to know more about
the purpose of “before-unless-first” before I suggest a solution.

The more I’ve been thinking about this, the more I think we ought to go
back to:

… for all contributors. I just came across these two examples …

<http://www.english.uiuc.edu/cws/wworkshop/writer_resources/citation_styles/apa/edited_collections.htm>

<http://www.english.uiuc.edu/cws/wworkshop/writer_resources/citation_styles/apa/chapter_edited_book.htm>

… which suggests that in one context the role term is uppercase, and
in the other, lowercase. It may be a typo (it certainly doesn’t make
any sense!), but having the role element explicitly there at least
gives flexibility to do that if needed, as well as to attach formatting
to it.

It is only slightly more verbose, but avoids potential problems.

But then we’re stuck with what to do about the locators (the above).
The third solution is:

To leave the label off, just don’t include the text element.

Bruce

Simon_Kornblith · July 23, 2006, 2:35am

The more I’ve been thinking about this, the more I think we ought to go
back to:

… for all contributors. I just came across these two examples …

<http://www.english.uiuc.edu/cws/wworkshop/writer_resources/citation_styles/apa/edited_collections.htm>

<http://www.english.uiuc.edu/cws/wworkshop/writer_resources/citation_styles/apa/chapter_edited_book.htm>

… which suggests that in one context the role term is uppercase, and
in the other, lowercase. It may be a typo (it certainly doesn’t make
any sense!), but having the role element explicitly there at least
gives flexibility to do that if needed, as well as to attach formatting
to it.

It is only slightly more verbose, but avoids potential problems.

It also seems like a more natural resolution of the “before-unless-first”
issue, because the editor would simply have the author style applied to it,
which would put the ed. after, rather than before.

But then we’re stuck with what to do about the locators (the above).
The third solution is:

To leave the label off, just don’t include the text element.

I think this is a good idea. It makes the behavior more configurable with
minimal added complexity.

As long as we’re on the topic of configurability, I’ve been wondering about
the section. I realize that, if we were to completely eliminate
it, the style would become much longer, and much more repetitive. To
counteract that, I wonder if we could make a tag simply define the
default behavior for each of the tags, e.g.:

Then, each time author is referenced from an , use:

If, for some specific type, the prefix has to come before the author, use:

It seems that, if at all possible, we should use a small but consistent tag
set in all sections of the document. This approach would seem to offer
greater versatility than the current element does (since
overriding is now possible at any point) but without significantly
complicating markup if the new features are unnecessary for a given style.
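This idea works like CSS cascading. Default attributes are declared once per element, and any attributes given where the element is used are merged on top. Here is a sketch. The attribute names (`nameAsSort`, `fontStyle`, `prefix`, `suffix`) and `resolveAttrs` are illustrative assumptions, not actual CSL syntax.

```javascript
// Hypothetical defaults declared once per element, CSS-style.
const defaults = {
  author: { nameAsSort: "first", prefix: "", suffix: ". " },
  title: { fontStyle: "italic", suffix: ". " },
};

// Attributes for one use of an element: declared defaults first, then
// any attributes set at the point of use, which take precedence.
function resolveAttrs(element, localAttrs = {}) {
  const base = Object.prototype.hasOwnProperty.call(defaults, element) ? defaults[element] : {};
  return { ...base, ...localAttrs };
}

// A reftype that needs a prefix before the author overrides only that.
const authorAttrs = resolveAttrs("author", { prefix: "By " });
console.log(authorAttrs);
```

The defaults object is never mutated, so each use starts from the same declared baseline.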

It would also allow us to make some structures slightly more complex, in
situations where complexity allows a more open structure. For example, we
could now make the alternate a completely separate tag without making the
style significantly more verbose, e.g.:

We would, however, need to override for in-text citations. But, we can now
handle some of what CiteProc is currently not capable of:

Adding support for author-less documents as per APA specifications is
actually relatively simple. (We do need the truncate attribute to truncate
the title when there’s no editor. I don’t know what the APA says about the
exact length we’re supposed to truncate it to, but I’ve arbitrarily chosen
three words.)

It would seem that the greater the amount of tag reuse, the simpler the
schema, the simpler the programming implementation, and the more versatile
the style language. The downside is that it might make creating new styles
more difficult, since it might require more markup, although it might also
make things easier, since the behavior is more consistent.

What do you think?

Simon

Bruce_D_Arcus1 · July 23, 2006, 9:30am

It seems that, if at all possible, we should use a small but
consistent tag set in all sections of the document. This approach would
seem to offer greater versatility than the current element does (since
overriding is now possible at any point) but without significantly
complicating markup if the new features are unnecessary for a given
style.

I agree, though the devil is always in the details, so let’s see …

It would also allow us to make some structures slightly more complex,
in situations where complexity allows a more open structure. For
example, we could now make the alternate a completely separate tag
without making the style significantly more verbose, e.g.:

So just to be clear, this would be contained in the general (or maybe
“defaults”) section, and then a default use would just do:

<author/>

…?

We would, however, need to override for in-text citations. But, we can
now handle some of what CiteProc is currently not capable of:

Hmm … yeah, that does feel like a lot of potential added markup in
this case. But on the face of it, if it saves markup and makes things
more consistent elsewhere, then it might well be fine.

For sake of argument, could you imagine implementing the above in a
nice Ajax-ified editor? This isn’t an absolute requirement; more “nice
to have.”

Adding support for author-less documents as per APA specifications is
actually relatively simple.

OK, but here’s a potential show-stopper problem:

The way the citations work now is that a citation template that has
“author” as its first element is going to use the same substitution
function as the reference entry.

What this means practically is that the substitution mechanism for
citations is type-specific, and the approach above would mean you’d need a
citation template for EVERY reftype you have in the bibliography
section. For an article, it will be the periodical title, for a
monograph, it will be either a title or “anonymous”, for chapters and
such (parts in monographs) it will be the editor, etc.
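The substitute depends on the reference type. Here is a sketch of that mapping as a single table that bibliography and citation rendering could share, instead of one citation template per reftype. The type keys, field names, and the choice of title before "anonymous" for monographs are assumptions.

```javascript
// Type-specific stand-ins for a missing author, following the examples
// above. Type keys and field names are hypothetical.
const substitutes = {
  article: (item) => item.periodicalTitle,
  monograph: (item) => item.title || "Anonymous",
  chapter: (item) => item.editors,
};

function citationAuthor(item) {
  if (item.authors) return item.authors; // no substitution needed
  const hasRule = Object.prototype.hasOwnProperty.call(substitutes, item.type);
  return hasRule ? substitutes[item.type](item) : undefined;
}

console.log(citationAuthor({ type: "article", periodicalTitle: "Nature" })); // Nature
console.log(citationAuthor({ type: "chapter", editors: "Doe, J." })); // Doe, J.
```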

So how do you solve that?

I’ve been wondering, BTW, about removing the three-type fallback system
and having instead a single required generic reftype. I’ve always
thought that unreliable, but your approach here might make that
possible.

(We do need the truncate attribute to truncate the title when there’s
no editor. I don’t know what the APA says about the exact length we’re
supposed to truncate it to, but I’ve arbitrarily chosen three words.)

Unless you have a short-title field.
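Both ideas fit in a few lines: prefer a short-title field when the record has one, and otherwise truncate the title to a fixed number of words (three, the arbitrary choice mentioned above). The field names `title`/`shortTitle` and both function names are hypothetical.

```javascript
// Keep the first maxWords whitespace-separated words of a title.
function truncateWords(title, maxWords = 3) {
  const words = title.trim().split(/\s+/);
  return words.slice(0, maxWords).join(" ");
}

// Prefer a short-title field when present; otherwise truncate.
function citationTitle(item, maxWords = 3) {
  return item.shortTitle || truncateWords(item.title, maxWords);
}

console.log(citationTitle({ title: "The Chicago Manual of Style" })); // The Chicago Manual
console.log(citationTitle({ title: "A Long Title Here", shortTitle: "Long Title" })); // Long Title
```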

It would seem that the greater the amount of tag reuse, the simpler the
schema, the simpler the programming implementation, and the more
versatile the style language.

I agree.

The downside is that it might make creating new styles more difficult,
since it might require more markup, although it might also make things
easier, since the behavior is more consistent.

What do you think?

Solve the citation problem above and then we can try it.

Bruce

Bruce_D_Arcus1 · July 23, 2006, 10:01am

Actually, one solution is to decouple substitution from templating
altogether.

So:

Then a general:

… and:

Substitution would then also be removed from the bibliographic
templates.

It may not be totally pure, but I can’t see another way at the moment
(though obviously the markup could be different).

Substitution is critical for authors/creators because these are used as
sort keys.
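The sort key for an entry is whatever ends up in the author position, so substitution has to happen before sorting. Here is a sketch of a single substitution rule, defined once and independent of the templates, used to build sort keys. The field order and names are hypothetical.

```javascript
// One global substitution rule, decoupled from templates: use the first
// listed field that is present on the item.
const substitutionOrder = ["author", "editor", "title"]; // hypothetical order

function sortKey(item) {
  for (const field of substitutionOrder) {
    if (item[field]) return String(item[field]).toLowerCase();
  }
  return "";
}

const items = [
  { author: "Smith, A.", title: "Zebras" },
  { editor: "Brown, B.", title: "Apes" },
  { title: "Anonymous Works" },
];
items.sort((a, b) => sortKey(a).localeCompare(sortKey(b)));
console.log(items.map(sortKey).join(" | ")); // anonymous works | brown, b. | smith, a.
```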

Bruce
