Chapters, volumes and editions

As far as I see it:

Option 1 - extracting the numeric value from fields is the easiest. I’ve
already written the code for Zotero and it all works, at least for the
examples I’ve tried. It allows fields which may be numeric to store
non-numeric data, or a combination, and to test and extract the relevant
parts if required. It’s backwards compatible with the old format - the CSL
additions do not change existing formatters, they just allow more control if
you need it. There is no need to upgrade databases, or tinker with scrapers
or anything.

Option 2 - I’m still not clear how this solves the issue. If the value field
is a number, you pass through the number, but if it’s not you pass through
the whole thing. So you don’t know whether to append “ed.” or “vol.” onto
the end. I don’t think it has moved us on any further. You might code it up
as






You will still end up with either a style that is not conformant to, say,
Chicago with “3 ed.” or else something like “reprint edition ed.”

Option 3 - edition_number and edition_version style would solve the problem
to a degree. I still think it will be as clumsy as option 1. How would
you format edition under this scheme? Probably something like













So it’s basically a very similar amount of formatting data to option 1,
except we don’t have the new formatting for ordinals and roman numerals. It’s
also now a solution specific to edition, so if we find weird cases of
volume, issue, chapter or number-of-volumes we have to make some new double
variables there too. It requires either upgrading databases and scrapers, or
else writing code to transform a base variable into one of these two at run
time. It’s basically an explicit version of option 2.
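To make the "transform a base variable at run time" route concrete, here is a minimal Python sketch. The function name split_edition and the splitting rule are assumptions made for illustration only; nothing in this thread fixes how a processor should divide the field. The rule assumed here: a field that is nothing but a number (optionally with an ordinal suffix and an "ed."/"edition" label) becomes edition_number, and anything else is kept whole as edition_description.

```python
import re

# Hypothetical rule: the whole field must be a number plus optional
# ordinal suffix and optional "ed."/"edition" label to count as numeric.
_NUMERIC_EDITION = re.compile(
    r"\s*(\d+)\s*(?:st|nd|rd|th)?\.?\s*(?:ed\.?|edition)?\s*",
    re.IGNORECASE,
)


def split_edition(raw):
    """Return (edition_number, edition_description) for one stored field.

    Exactly one of the two values is set; the other is None.
    """
    match = _NUMERIC_EDITION.fullmatch(raw)
    if match:
        return int(match.group(1)), None
    # Anything that is not purely numeric is passed on untouched as text.
    return None, raw.strip()


if __name__ == "__main__":
    for value in ["2nd ed.", "3", "Arden edition", "2nd revised edition"]:
        print(repr(value), "->", split_edition(value))
```

Note that mixed values such as "2nd revised edition" end up entirely in the description under this rule, which is one of the decisions an implementation would have to make.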

I guess I vote for 1, but then I would say that!

Julian.

Option 4 - extract the numeric value from fields when possible, and
otherwise, the variable “edition” is the edition as text without any
kind of ed/edition suffix. (How this gets done is Zotero’s business,
not CSL’s. It’s extraordinarily easy just to strip these suffixes out
when the bibliography is generated, but another implementation might
just specify not to enter a suffix.) falls back on the text
variable if no number is available.

Under option 1, if the edition is “revised edition” but the style
wants “revised ed.”, the edition variable’s value is “revised
edition,” and this is what appears in the bibliography. Not too big of
a deal, but not quite right. Under option 4, the edition variable’s
value would be “revised” and you’d get “revised ed.”

With implicit fallback (if no number exists, prints the same thing as
would), formatting looks like:





This has the disadvantage of introducing incompatibility with current
styles, but the advantage of being simple and producing better output,
except maybe when the style wants “edition 2,” although that could be
solved by allowing the kind of conditional that option 1 requires. In
this case, maybe would be less misleading as
far as syntax goes.

I’d definitely go for this option or option 1. Option 1 has fewer
implied rules, but this results in shorter CSLs and better output.

Simon

It’s an interesting idea - if we apply it to other fields like volume,
those would possibly need things stripping off too. I’ve just been looking
through the Chicago spec. It depends on whether we want to support all the
possible things it does.
Examples which work include:
Florence Babb, Between Field and Cooking Pot: The Political Economy of
Marketwomen in Peru, rev. ed. (Austin: University of Texas Press, 1989),
199.
Strunk, William, Jr., and E. B. White. The Elements of Style. 4th ed. New
York: Allyn and Bacon, 2000.
Anderson, J. L., and D. Richie. 1982. The Japanese film art and industry.
Exp. ed. Princeton, NJ: Princeton Univ. Press.
Weber, M., H. M. de Burlet, and O. Abel. 1928. Die Säugetiere. 2nd ed. 2
vols. Jena: Gustav Fischer.

Examples which don’t:

Fitzgerald, F. Scott. The Great Gatsby. New York: Scribner, 1925.
Reprinted with preface and notes by Matthew J. Bruccoli. New York: Collier
Books, 1992. Page references are to the 1992 edition.
Schweitzer, Albert. J. S. Bach. Translated by Ernest Newman. 1911.
Reprint, New York: Dover, 1966.

Shakespeare, William. Hamlet. Arden edition. Edited by Harold Jenkins.
London: Methuen, 1982.

You might consider those as too complex to worry about?

Julian.

My apologies in coming to this debate late–I think dates may require
a more complicated solution than editions.

It’s not really structured information when you get into things like
“Summer 2007” and “circa 1754”, though, is it? (Obviously if it were
seen as such, CSL could correctly translate “Summer”, but that
definitely seems like something we don’t want to get into, and those are
probably relatively easy examples anyway.)

We should probably come up with some Option 1 and Option 2 examples for
funkier dates–handling at least date ranges as additional structured
options–before deciding. My sense is that if the date doesn’t parse
into supported semantic fields without a remainder, it probably needs to
be passed through in its entirety without structure, replacing the whole
element, and there wouldn’t be too much point in using extra
conditionals. That would avoid issues like the one Sean mentioned in the
Zotero forums where the range part of a date range was silently
discarded.
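The "parse without a remainder, otherwise pass through whole" rule can be sketched in Python as follows. This is an illustration under assumptions, not Zotero's parser: the name parse_date, the returned dictionary keys, and the small set of accepted shapes (a year, "Month year", "Month day, year", and a year range) are all hypothetical.

```python
import re

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]

# Assumed supported shapes: "2007", "May 2007", "Nov. 30, 2007", "1990-1995".
DATE_PATTERN = re.compile(
    r"(?:(?P<month>[A-Za-z]+)\.?\s+(?:(?P<day>\d{1,2}),\s*)?)?"
    r"(?P<year>\d{4})"
    r"(?:\s*[-–]\s*(?P<end_year>\d{4}))?"
)


def month_number(name):
    """Map a full or three-letter English month name to 1..12, else None."""
    key = name.lower()
    for index, full in enumerate(MONTH_NAMES, start=1):
        if key == full or key == full[:3]:
            return index
    return None


def parse_date(raw):
    """Return structured parts only if the WHOLE string parses.

    Any remainder ("circa", "Summer", "Fri,") makes the entire string
    pass through unstructured under the "literal" key.
    """
    text = raw.strip()
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return {"literal": text}
    parts = {"year": int(match.group("year"))}
    if match.group("month"):
        month = month_number(match.group("month"))
        if month is None:
            return {"literal": text}
        parts["month"] = month
    if match.group("day"):
        parts["day"] = int(match.group("day"))
    if match.group("end_year"):
        parts["end_year"] = int(match.group("end_year"))
    return parts
```

Because the match must consume the whole string, a range such as "1990-1995" is either captured in full or passed through in full; no part of it is dropped silently.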

I really like “smart” parsing in Zotero when you don’t have to clean
up dates entered by translators, but:

I seem to remember that some translators entered date issued as “Fri,
Nov. 30, 2007”–those may present a problem for this solution because
“Fri” needs to be discarded, whereas in “circa 1754” the entire
string should be preserved.

It would be great to be able to sort dates such as “Summer 2007” in
the middle pane–“Summer” should sort before “Fall” and “September”.

Best,
Elena

Simon Kornblith wrote:

Option 4 - extract the numeric value from fields when possible, and
otherwise, the variable “edition” is the edition as text without any
kind of ed/edition suffix. (How this gets done is Zotero’s business,
not CSL’s. It’s extraordinarily easy just to strip these suffixes out
when the bibliography is generated, but another implementation might
just specify not to enter a suffix.) falls back on the text
variable if no number is available.

So, for example, the mapping of raw input to CSL variable would be
something like:

"First Edition" ==> "1"
"New Edition" ==> "New"

…?

Can we formulate that logic more clearly? Do we apply it to other
variables (dates, etc.)?
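One way to state the option 4 mapping precisely is as a small function. The Python sketch below is only an assumption about what such sanitising might look like: the name sanitise_edition, the list of ordinal words and the suffix pattern are hypothetical and not taken from Zotero.

```python
import re

# Hypothetical table of spelled-out ordinals an implementation might accept.
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}

# Trailing "edition", "edn" or "ed", with an optional full stop.
EDITION_SUFFIX = re.compile(r"\s*\b(edition|edn|ed)\b\.?\s*$", re.IGNORECASE)


def sanitise_edition(raw):
    """Map a raw edition field to the value a style would receive.

    Numbers win: "3rd ed." -> "3", "First Edition" -> "1".
    Otherwise the text is returned without its edition suffix,
    so that a style can append "ed." itself: "New Edition" -> "New".
    """
    text = EDITION_SUFFIX.sub("", raw.strip())
    digits = re.search(r"\d+", text)
    if digits:
        return digits.group()
    word = text.lower()
    if word in ORDINAL_WORDS:
        return str(ORDINAL_WORDS[word])
    return text
```

The word boundary in the suffix pattern keeps a value such as "Revised" intact, since its final "ed" is part of the word rather than a separate label.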

[aside: I wonder if we need a more generic “version” variable too?]

Bruce

Julian Onions wrote:

Examples which don’t [work with option 4]:

Shakespeare, William. /Hamlet/. Arden edition. Edited by Harold
Jenkins. London: Methuen, 1982.

So you’re saying that the number element would be hunting for a number,
and since “Arden edition” contains no such thing, under Simon’s
suggested rules, the field would be empty?

On your other note:

Option 3 - edition_number and edition_version style would solve the
problem to a degree. I still think it will be as clumsy as the option
1.

It would, with the sole advantage being that it is less ambiguous, which
was part of my problem with option 1. Note, though, that I didn’t use
“edition_version”; I used “edition_description”.

Bruce

Under the rules suggested I think this would be pruned to “Arden”, and "ed."
added on, so ending up with “Arden ed.” - which isn’t quite the same as what
Chicago specifies - but do we care (enough)?
I think we’d also end up with “Reprint ed.” and
"Reprinted with preface and notes by Matthew J. Bruccoli. ed."
again, not quite what Chicago suggests.

The edition field is going to be complex - so we can either try and find
something like a two line CSL description (option 4) that will cater for
90+% of all cases, or something more complex (like option 1) that will
perhaps get to 99% of all cases.
When all is said and done, I think few people will care about the CSL
description; they’ll just use the results, and won’t care that it takes 2, 5
or 50 lines to produce what they want.

Julian.

Well, it’s been a couple of weeks since I started this thread - does anyone
think there is any consensus, or should we just ignore the issue and see if
it really is a problem when deployed?

We had 4 proposals:

  1. A new directive which extracts numeric data and allows custom
    formatting.
  2. Pass through a number if present just for edition field, else pass the
    whole field.
  3. Have edition_number and edition_description variables
  4. Have the implementation (e.g. Zotero) sanitise the field, removing things
    like “ed.”, “edition”, “rd”, “th”, etc. to get to something that can have
    “ed.” appended without worry.

I think we’ve been through most of the pros and cons, and so far I think
only option 1 handles all the examples from Chicago, but is also the most
verbose.

I could upload the code for option 1 to Zotero as a branch if people wanted
to try it out in practice and see if it is tractable.

Julian.

Julian Onions wrote:

Well, it’s been a couple of weeks since I started this thread - does
anyone think there is any consensus, or should we just ignore the issue
and see if it really is a problem when deployed?

I’m not sure.

I could upload the code for 1. to zotero as a branch if people wanted to
try it out in practice and see if it is tractable.

I don’t think the issue is how things work on the implementation end.
It’s how it works on the style authoring end. Whatever solution (if any)
we use needs to be clear and consistent. Ideally it’s concise, but
that’s somewhat less of an issue for me.

I’m curious about other opinions, though. As I said earlier, I don’t
have a ton of time to think about this ATM.

Bruce

Julian Onions wrote:

Well, it’s been a couple of weeks since I started this thread - does
anyone think there is any consensus, or should we just ignore the issue
and see if it really is a problem when deployed?

We had 4 proposals:

  1. A new directive which extracts numeric data and allows
    custom formatting.
  2. Pass through a number if present just for edition field, else pass
    the whole field.
  3. Have edition_number and edition_description variables
  4. Have the implementation (e.g. zotero) sanitise the field, removing
    things like ed. edition. rd, th, etc to get to something that can have
    ed. appended without worry.

I think we’ve been through most of the pros and cons, and so far I think
only option 1 handles all the examples from Chicago, but is also the
most verbose.

To go back to your example for 1, you suggested:

The 3 example would be similar, but would differentiate number and text
variables:

So I don’t think 1 has any better support for Chicago than 3?

The primary reason I prefer three is that it in essence makes clear they
are two distinct fields from the perspective of CSL. It just feels a
little more intuitive to me.

I think in the context of some uncertainty about the best course, I’m
prone to accept your opinion Julian, since you’ve written the most styles.

To be clear, it would result in the following schema changes:

a. add a new cs:number element, something like:

number = element cs:number {
        attribute variable {
                "edition"
                | "volume"
                | "issue"
                | "pages"
                }
# plus whatever additional attributes
        }

b. add a new “number” attribute on the conditional structures

c. add (probably) “edition_description” to the text list.

Right? Anything else?

To be clear, then, also: this would be a backward incompatible change,
and would require changes in all existing styles (probably easy enough
with a simple script?).

Bruce

Julian Onions wrote:

Well, it’s been a couple of weeks since I started this thread - does
anyone think there is any consensus, or should we just ignore the issue
and see if it really is a problem when deployed?

We had 4 proposals:

  1. A new directive which extracts numeric data and allows
    custom formatting.
  2. Pass through a number if present just for edition field, else pass
    the whole field.
  3. Have edition_number and edition_description variables
  4. Have the implementation (e.g. zotero) sanitise the field, removing
    things like ed. edition. rd, th, etc to get to something that can have
    ed. appended without worry.

I think we’ve been through most of the pros and cons, and so far I think
only option 1 handles all the examples from Chicago, but is also the
most verbose.

To go back to your example for 1, you suggested:

The 3 example would be similar, but would differentiate number and text
variables:

So I don’t think 1 has any better support for Chicago than 3?

Yes, that is true for edition - however in this case I think 1 is more
general, as we don’t have to have a volume_description, an issue_description
and so on. I think option 3 also means more work in the processing. Zotero
would have to work out what to put in each, and what rules would work. Would
it store a single variable and separate it at process time, or split it at
storage time, so allowing editing of the different parts?

The primary reason I prefer three is that it in essence makes clear they
are two distinct fields from the perspective of CSL. It just feels a
little more intuitive to me.

I see it more as two views on the same field. Can we treat it as a number or
just text, as in many interpreted languages.

I think in the context of some uncertainty about the best course, I’m
prone to accept your opinion Julian, since you’ve written the most styles.

To be clear, it would result in the following schema changes:

a. add a new cs:number element, something like:

number = element cs:number {
        attribute variable {
                "edition"
                | "volume"
                | "issue"
                | "pages"
                }
# plus whatever additional attributes
        }

b. add a new “number” attribute on the conditional structures

Or possibly isnumber/isnumeric, which was suggested as more descriptive.

c. add (probably) “edition_description” to the text list.

I think that just complicates things myself, anyone else have opinions?

Right? Anything else?

The formatting directives - I guess they’re part of the additional
attributes.

To be clear, then, also: this would be a backward incompatible change,
and would require changes in all existing styles (probably easy enough
with a simple script?).

It would if edition_description was used; if we took the view that edition
alone is used, it would be backwards compatible. However I don’t see either
as a stumbling block. A number of the new styles don’t work correctly with
1.0.1 anyway.

Julian.

Julian Onions wrote:

yes that is true for edition - however in this case I think 1 is more
general as we don’t have to have a volume_description an
issue_description and so on. I think it also means more work in the
processing. Zotero would have to work out what to put in each, and what
rules would work. Would it store a single variable and separate it at
process time or split it at storage time so allowing editing of the
different parts.

The primary reason I prefer three is that it in essence makes clear they
are two distinct fields from the perspective of CSL. It just feels a
little more intuitive to me.

I see it more as two views on the same field. Can we treat it as a
number or just text, as in many interpreted languages.

Duck-typing for citation styling ;-)

Tell you what, can you come up with clear and concise language that I
can put in the schema documentation that explains this in terms that
both programmers and style authors can unambiguously understand?

If we can agree on that, I can integrate the changes.

I think in the context of some uncertainty about the best course, I'm
prone to accept your opinion Julian, since you've written the most
styles.

To be clear, it would result in the following schema changes:

a. add a new cs:number element, something like:

number = element cs:number {
        attribute variable {
                "edition"
                | "volume"
                | "issue"
                | "pages"
                }
# plus whatever additional attributes
        }

b. add a new "number" attribute on the conditional structures

Or possibly isnumber/isnumeric, which was suggested as more descriptive.

I missed this.

Bruce

I see it more as two views on the same field. Can we treat it as a
number or just text, as in many interpreted languages.

Duck-typing for citation styling ;-)

Could be!

Tell you what, can you come up with clear and concise language that I
can put in the schema documentation that explains this in terms that
both programmers and style authors can unambiguously understand?

I’ll have a go.

"The number markup directive matches the first number found in a field, and
returns only that component. If no number is detected, the result is empty.
A non-empty number may be subject to further formatting consisting of a form
attribute whose value may be numeric, ordinal or roman to format it as a
simple number (the default), an ordinal number (1st, 2nd, 3rd etc) or roman
(i, ii, iii, iv etc). The text-case attribute can also be applied, for
instance to capitalize the roman numerals. The other normal formatting rules
apply too (font-style, …).

When used in a conditional, number/isnumber tests if there is a
number present, allowing conditional formatting."
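
As a cross-check of the wording above, here is a minimal Python sketch of the described behaviour: take the first number in a field, return nothing if there is none, and format it as numeric, ordinal or roman. The function names are hypothetical, English ordinal suffixes are assumed, and this is not the code written for Zotero.

```python
import re

ROMAN_NUMERALS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def extract_number(field):
    """Return the first run of digits in the field as an int, or None."""
    match = re.search(r"\d+", field)
    return int(match.group()) if match else None


def is_numeric(field):
    """The conditional test: is there a number present in the field?"""
    return extract_number(field) is not None


def to_ordinal(n):
    """English ordinal: 1st, 2nd, 3rd, 4th, 11th, 12th, 13th, 21st..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def to_roman(n):
    """Lower-case roman numeral; text-case could upper-case it later."""
    out = []
    for value, letters in ROMAN_NUMERALS:
        count, n = divmod(n, value)
        out.append(letters * count)
    return "".join(out)


def format_number(field, form="numeric"):
    """Render the first number in the field, or "" if there is none."""
    n = extract_number(field)
    if n is None:
        return ""
    if form == "ordinal":
        return to_ordinal(n)
    if form == "roman":
        return to_roman(n)
    return str(n)
```

A style would pair the two pieces: test with is_numeric, then either render format_number(field, "ordinal") followed by "ed.", or fall back to the plain text of the field.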

Julian.

Julian Onions wrote:

 > I see it more as two views on the same field. Can we treat it as a
 > number or just text, as in many interpreted languages.

Duck-typing for citation styling ;-)

Could be!

Tell you what, can you come up with clear and concise language that I
can put in the schema documentation that explains this in terms that
both programmers and style authors can unambiguously understand?

I’ll have a go.

OK, I’ve added it, along with a “datatype” conditional (with options for
“number” and “date”; more flexible going forward, but still works for
this case?).

Take a look as a) I’m busy (leaving town tomorrow), and b) under the
weather. I wouldn’t be surprised if I made some stupid mistake.

Bruce

Looks good - I like the datatype idea.
I’ve fixed up a couple of minor issues.

Julian.

How do you test a variable for being a number with the datatype?
Is it

in which case how do you tie the datatype and the variable together?
I may well be missing something here!

Julian.

Julian Onions wrote:

How do you test a variable for being a number with the datatype?
Is it

in which case how do you tie the datatype and the variable together?
I may well be missing something here!

I honestly didn’t put much thought into it, so I wouldn’t read a lot
into how I implemented it. But the above seems reasonable to me. Did you
have something else in mind?

Bruce

I’m not sure how to implement it, as you could have

it seems a little confusing on a second look.
Really the datatype needs to be tied to a variable.
something like

for instance is a little confusing.
I can try and get it to work though if we agree to limit the semantics a
little so the above is not included.

Julian.

I’m not sure how to implement it, as you could have

it seems a little confusing on a second look.
Really the datatype needs to be tied to a variable.
something like

for instance is a little confusing.

That wouldn’t be valid. The value of datatype is either one or the
other; not both.

I can try and get it to work though if we agree to limit the semantics a
little so the above is not included.

Does the above address your concern?

Bruce

It makes the code slightly harder, but it’s doable. I’ll let you know if not.

Julian.