schema bug

I understand the importance of sticking to the versioning convention,
so can’t we say that this is 1.1.0, a new release that is primarily
intended as a bug fix release, but which is incidentally backwards
incompatible in a small way? The larger plans for 1.1 that were in the
1-2 year range can then be moved to 1.2. It does sometimes happen that
version numbers increment faster than we anticipate.

Avram

Given the way we’ve designed the schema (e.g. the version attribute),
the 1.x releases would suggest significant changes that would require
style and/or locales forks.

1.0.x releases are minor changes.

I would hope we don’t need too many bug fix releases, but that when we
do need them, we can manage them fairly quickly.

Bruce

Can we go back and discuss whether it’s possible to: “keep a ‘normal’ “sub
verbo” term and create a “sub-verbo” locator term”, along with “CSL
processors would have to link “sub-verbo” to “sub verbo”” (see my earlier
post).

This way any valid CSL 1.0 style will remain valid (because the locator=“sub
verbo” wouldn’t pass validation anyway). With only added functionality, this
wouldn’t be a backwards-incompatible change. It will be a little messy in the
schema, but we can clean that up in CSL 1.1.

Rintze

This is an option. But I suspect what on first blush looks like the
easier approach may actually create more headaches.

What would be the problem if we simply did as I suggest: a quick
search-and-replace on all github csl repos?

So long as implementations update their locales files, things would
continue to “just work” without interruption.

If styles themselves include the term, they’d need to be updated. But
as I said, there’s only one style (that we control) that fits that
scenario. Even if a style did need to be updated but wasn’t, the
impact on output would be trivial (in the case of this style, there’d
be a single, extra period).

Bruce

Because you brought up ordinals, I would just like to add that I think Rintze is absolutely right in that including the gender support for locales will not lead to issues for CSL 1.0 processors processing 1.0 locales. Furthermore, it is not difficult for 1.0.1 processors to handle 1.0 locales; judging from some of the citeproc-tests I believe that the proposal is implemented in citeproc-js already? I definitely had to implement gender support for some of the tests to pass.

However, there was another issue I had to address in citeproc-ruby. For illustration purposes, you can take a look at the test cases here:

https://github.com/inukshuk/citeproc-ruby/blob/master/spec/csl/locale_spec.rb

The relevant tests are for ‘#ordinalize’, lines 85 through 159. As you can see on lines 92 and 100, I would expect 3 to become 3rd, 13 to become 13th, 23 to become 23rd. This does not work with CSL 1.0 because I can only specify the ordinals 1-4 (13 would probably become 13rd here). What’s more, I would expect that in different languages all kinds of exceptions are possible which lead to similar problems.

My implementation currently addresses this as follows:

Given a number X to ordinalize, I check the locales, following the normal prioritization, for a definition of X; if X is not defined, the process is repeated for Y = X % mod where mod currently starts at 100 and is divided by 10 in every consecutive step. Therefore, to ordinalize 1155 I would try to look up the following terms until there is a match:

ordinal-1155
ordinal-55
ordinal-05
ordinal-00

To ordinalize 113 the look-up would be for:

ordinal-113
ordinal-13
ordinal-03
ordinal-00

So, as you can see, for English, I would have to define the following minimal set of ordinals to cover all cases (that I can think of right now):

ordinal-00 = ‘th’
ordinal-01 = ‘st’
ordinal-02 = ‘nd’
ordinal-03 = ‘rd’
ordinal-11 = ‘th’
ordinal-12 = ‘th’
ordinal-13 = ‘th’

In the two examples above 1155 becomes 1155th and 113 becomes 113th, whereas using a CSL 1.0 locale the algorithm would return 1155th and 113rd, which is wrong but consistent, I think, with CSL 1.0 expectations. That is to say, allowing locales to define any number as an ordinal term should not alter the behaviour of CSL 1.0 processors at all; furthermore, the results of a processor using the above algorithm with 1.0 locales are consistent with current implementations, too.
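The look-up described above can be sketched roughly as follows. This is only an illustration in Python, not the citeproc-ruby code; the names `candidate_terms`, `ordinalize` and `ENGLISH_TERMS` are made up for the example, and the term table is simply the minimal English set listed above.

```python
# Hypothetical sketch of the fallback look-up; not the citeproc-ruby implementation.

# Minimal English term set from the message above (assumed to be complete).
ENGLISH_TERMS = {
    "ordinal-00": "th",
    "ordinal-01": "st",
    "ordinal-02": "nd",
    "ordinal-03": "rd",
    "ordinal-11": "th",
    "ordinal-12": "th",
    "ordinal-13": "th",
}


def candidate_terms(number):
    """Return the term names to try, most specific first."""
    names = [f"ordinal-{number:02d}"]
    mod = 100
    while mod >= 1:
        # number % 1 is always 0, so the last candidate is ordinal-00.
        name = f"ordinal-{number % mod:02d}"
        if name not in names:
            names.append(name)
        mod //= 10
    return names


def ordinalize(number, terms):
    """Append the suffix of the first defined candidate term."""
    for name in candidate_terms(number):
        if name in terms:
            return f"{number}{terms[name]}"
    # No match at all: return the bare number.
    return str(number)


print(candidate_terms(1155))
print([ordinalize(n, ENGLISH_TERMS) for n in (3, 13, 23, 113, 1155)])
```

With the table above, 13 and 113 stop at ordinal-13 and get ‘th’, while 23 falls through to ordinal-03 and gets ‘rd’.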

One possible problem with this approach is that ordinal-00 is implicitly treated as the default ordinal; given that 0 itself is likely to occur less frequently than 4 (which acts like a default now) this is perhaps even an improvement, but given a language that has a zero ordinal ending which is not suitable as a default this quickly becomes problematic. Therefore, we could add an explicit default ordinal-xx instead of the implicit ordinal-00.

Sylvester

My apologies, of course this should mean that 1.0 processors should be able to process 1.0.1 locales (with gender definitions) without issues (not 1.0 locales obviously), because they would simply disregard the gender attribute (probably selecting the last gender to be defined as it would override the previous definitions).

Sylvester

You’re saying you want to fold in a backward-incompatible feature
addition with a bug fix release?

In this context, backward-compatibility is the ability of a CSL 1.0.1 processor to handle CSL 1.0 styles and locales, right? Then only the bug fix is backward-incompatible (as defining gender-specific ordinals in the CSL locales will always be optional).

Sylvester,

Nice. So the idea is to use a fixed algorithm, and to control it by
defining match-points in the locale files that will yield the right
result for the language? This sounds very interesting.

Frank

Exactly. This way, each locale can decide which match-points to define; for example, German would only have to define ordinal-00 = ‘.’; we could include long-forms in this process, too, so that it is up to each locale which long-forms to define. The algorithm is fixed in the sense that it assumes a base-10 number system; perhaps we could allow a locale to specify the modulus too?

Sylvester

So I am totally not following this. Can someone boil down the
implications for the 1.0.1 discussion?

This is cross-talk, not relevant to the immediate 1.0.1 discussion.
Nice solution, though. I’ll open a ticket for it in the schema
tracker.

Because Rintze suggested including gendered-ordinal support for locales in CSL 1.0.1, I wanted to point to another issue with ordinals which could be easily fixed and included in a 1.0.1 release, too.

The problem with the 1.0 schema is that it limits the ordinal definitions to “ordinal-01” through “ordinal-04”; therefore it may not be possible to transform all numbers into their correct ordinal form in all languages. In fact, it is not even possible to do so in English. As a quick example consider 13 and 23: they should become 13th and 23rd, but there is no way for a cite processor to distinguish between the two if you have only defined the ordinals 1-4: 13 and 23 would either become 13rd and 23rd, because both end with 3 and ordinal-03 is defined as ‘rd’, or they would become 13th and 23th because they are both greater than 4.

What I am suggesting is to alter the schema to allow for any number of ordinal-** definitions. Using the algorithm I proposed, it would be sufficient to define ordinals 0, 1, 2, 3, 11, 12, 13 in order to be able to render all ordinals in English (unless I missed something).

Consider this test case:

https://github.com/inukshuk/citeproc-ruby/blob/master/features/locale/ordinalize.feature

This is not legal CSL 1.0 because it contains ordinal definitions for 0, 11, 12, and 13; however, by changing the CSL to be valid (i.e., instead of ordinal-00 use ordinal-04 and remove ordinals 11, 12, 13), I don’t think it is possible to implement a processor that would produce the expected results in all cases (either 11, 12, 13 or 21, 22, 23 would be wrong).
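To make the 13 versus 23 problem concrete, here is a small Python sketch. It assumes only the four terms the 1.0 schema allows and tries the two obvious strategies mentioned earlier; the names `CSL_1_0_TERMS`, `by_last_digit` and `by_value` are hypothetical and do not correspond to any existing processor.

```python
# Hypothetical illustration; neither function is taken from a real processor.

# The only ordinal terms a valid CSL 1.0 locale can define (English values).
CSL_1_0_TERMS = {1: "st", 2: "nd", 3: "rd", 4: "th"}


def by_last_digit(number):
    """Strategy A: choose the term by the last digit, else fall back to ordinal-04."""
    suffix = CSL_1_0_TERMS.get(number % 10, CSL_1_0_TERMS[4])
    return f"{number}{suffix}"


def by_value(number):
    """Strategy B: choose the term by the whole number, else fall back to ordinal-04."""
    suffix = CSL_1_0_TERMS.get(number, CSL_1_0_TERMS[4])
    return f"{number}{suffix}"


for n in (13, 23):
    print(n, by_last_digit(n), by_value(n))
```

Strategy A gets 23 right but turns 13 into 13rd; strategy B gets 13 right but turns 23 into 23th. Neither can produce both 13th and 23rd from those four terms alone.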

The proposed change to the schema is ‘harmless’ in that it does not render CSL 1.0 styles invalid; furthermore, CSL 1.0 processors would still be able to process 1.0.1 styles as if they were 1.0 styles (although they would not pass schema validation).

As regards the issue of gendered ordinals, I’ve just added a test to illustrate a possible use case; perhaps Rintze or Frank could confirm if this is correct?

https://github.com/inukshuk/citeproc-ruby/blob/master/features/locale/gender.feature

Sylvester

I would certainly support including this in a 1.0.1 release. Currently
citeproc-js handles ordinals with an algorithm that is hard-coded to
the English convention, with special handling for ordinals 11-13.
Sylvester’s approach would make it possible to use a truly general
algorithm capable of covering all languages. It’s a big win.

Frank

Whether this can be introduced in 1.0.1 depends on how the ordinals are
defined. Currently, the ordinal index is included in the element name (e.g.
“13” in ). This is somewhat limiting for
validation because, as far as I know, you’d have to explicitly define each ordinal
in the schema (and I guess we don’t know the index range that will be
needed, although we could just pick a high number). A more flexible approach
would be , but this obviously breaks with
the convention in CSL 1.0.

Rintze

On Sat, Apr 30, 2011 at 5:44 AM, Sylvester Keil <@Sylvester_Keil> wrote:

True. A flexible approach would be much preferable over having to enumerate hundreds of ordinals.

Incidentally, are there any languages with a very inconsistent ordinal paradigm? Or languages other than English which have more than three exceptions?

Sylvester

We might want to request a sticky in the Styles category of the Zotero
forums to discuss this proposal. There are some linguists/locals there that
might know.

Rintze

On Sat, Apr 30, 2011 at 9:32 AM, Sylvester Keil <@Sylvester_Keil> wrote:

as a regular expression. Given that we currently don’t provide any
dedicated documentation for each of those values, there’s reasonable
basis for making the change.

Bruce

Ah, right. I was overlooking that “ordinal-13” is just an attribute-value,
so regular expressions should work. So I guess we can keep the existing
scheme.
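As a rough illustration of validating the term name as an attribute value with a regular expression, consider the Python sketch below. The pattern is an assumption made for the example (at least two digits after “ordinal-”), not the pattern used in the CSL schema, and `is_ordinal_term` is a hypothetical helper.

```python
import re

# Assumed pattern for illustration only: "ordinal-" followed by two or more digits.
ORDINAL_TERM = re.compile(r"ordinal-[0-9]{2,}")


def is_ordinal_term(name):
    """Return True if the whole attribute value matches the assumed pattern."""
    return ORDINAL_TERM.fullmatch(name) is not None


for value in ("ordinal-04", "ordinal-13", "ordinal-1155", "ordinal-3", "ordinal-xx"):
    print(value, is_ordinal_term(value))
```

Under this assumed pattern a single expression covers any index without enumerating each ordinal. An explicit default such as the ordinal-xx suggested earlier would need its own alternative in the pattern.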

Rintze

This seemingly “little” issue has become much more complicated, and
I’d really like input from as many implementers as possible on this
(currently Zotero has weighed in, but AFAIK, they’re the only one to
support the ‘sub verbo’ locator):

If you don’t weigh in, then don’t be surprised if you get bug reports
from your users at some point. Likewise, you could conceivably get
unexpected behavior now because of this bug.

Bruce