Pluralization of ordinary item field terms

Gracile has requested that the term associated with the "issue"
variable turn to plural when appropriate:

http://forums.zotero.org/discussion/15085/french-localization-csl-10/#Comment_74961

The cs:label node currently accepts only page or locator as variable
values; the terms for issue and volume can only be rendered via a
cs:text node. Terms rendered via cs:text only accept a “true” or
"false" value on the plural attribute; “contextual” is available only
on the cs:label node.

It looks like adding volume and issue as possible values for the
variable attribute on cs:label would cover this use case.

Frank
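
As a rough illustration of what plural="contextual" would mean once volume and issue are allowed on cs:label, here is a minimal Python sketch. The function names, the "two or more numbers" heuristic, and the English term forms are assumptions made for illustration; they are not taken from any CSL processor.

```python
import re

def is_plural(value: str) -> bool:
    """Guess whether a field value refers to more than one item.

    Heuristic (an assumption for this sketch): the value counts as plural
    when it contains at least two numbers, e.g. "3-4" or "2, 5".
    """
    return len(re.findall(r"\d+", value)) > 1

def render_label(term: dict, value: str) -> str:
    """Pick the singular or plural form of a term from the field content."""
    return term["multiple"] if is_plural(value) else term["single"]

# Hypothetical locale entry for the "issue" term.
issue = {"single": "issue", "multiple": "issues"}

print(render_label(issue, "3"))    # issue
print(render_label(issue, "3-4"))  # issues
```

With plural="true" or plural="false" on cs:text the choice is fixed in the style; the point of "contextual" is that the choice moves to the field content.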

Fine with me.

Rintze

On Wed, Nov 10, 2010 at 7:29 PM, Frank Bennett wrote: […]

Hello all,

I subscribed to the list too late to reply properly to the first two messages. Anyway, thanks for addressing this so quickly!
I would add that “folio” also has a plural short form in French (singular: fo; plural: fos). Even if the use case is less frequent than for issue, it could be useful to implement that too.

Indirectly related is the question of gender agreement of ordinal numbers. Rintze has proposed a solution to that issue: http://forums.zotero.org/discussion/15085/french-localization-csl-10/#Item_7 . It would solve the same problem in Spanish (See: http://forums.zotero.org/discussion/10859/errors-in-spanish-bibliographylocale/#Item_27 ) and in other languages probably. What do you think?

Thanks to all for your work,
Gracile

Not sure if this explanation is needed, but … Internally (in Zotero
and in citeproc-js), the “fields” shown in the plugin pinpoint list
aren’t actually variables, they’re just labels that can be applied to
the “locator” variable (which holds the value typed into the pinpoint
textbox itself). The label will adapt automatically to the user
selection, if called in CSL via the cs:label element with
variable=“locator”. In that case, it will also automatically
pluralize as appropriate to the field content, if a plural form of the
label is available in the selected locale.

Frank
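
The behavior described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the term table, the function names, and the plural heuristic are hypothetical, and only the French "folio" forms (fo, fos) come from this thread. The "section" entry is made up to show what happens when the locale has no plural form.

```python
import re

# Short-form label terms for one locale. "folio" uses the forms quoted in
# the thread; "section" is a made-up entry that lacks a plural form.
TERMS = {
    "folio": {"single": "fo", "multiple": "fos"},
    "section": {"single": "sec."},
}

def locator_label(label: str, locator: str) -> str:
    """Choose the label form for the user-selected label and locator text."""
    forms = TERMS[label]
    # Assumed heuristic: two or more numbers in the locator means plural.
    plural = len(re.findall(r"\d+", locator)) > 1
    if plural and "multiple" in forms:
        return forms["multiple"]
    return forms["single"]  # singular, or no plural form in the locale

def render_locator(label: str, locator: str) -> str:
    return f"{locator_label(label, locator)} {locator}"

print(render_locator("folio", "12"))      # fo 12
print(render_locator("folio", "12-14"))   # fos 12-14
print(render_locator("section", "3-4"))   # sec. 3-4
```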

Having sent the explanation above, I’m now quite certain that it was
not needed. Life of the party, as usual. Sorry for the extra
traffic.

On Thu, Nov 11, 2010 at 5:26 PM, G C wrote: […]

I searched the xbiblio list and Zotero forums (for “gender”, “feminine”,
“masculine”, “female” and “male”), and the only related thread I found was
this one:
http://sourceforge.net/mailarchive/message.php?msg_id=53208a5f0901070524k25391321i204d65ace07282d7%40mail.gmail.com
[[xbiblio-devel] Style-specific ordinals]

The thread discusses Romanian, which has the same gender discrimination in
assigning ordinals as French and Spanish, and includes a proposal by Frank
for handling the same issue. In his approach, (gender-specific) ordinals are
directly linked to specific variables, e.g.:


-al
-a

whereas my proposal (also see the Zotero thread) explicitly allows gender to
be assigned to any term:


édition éditions janvier

re
er

I think either way will work (with the necessary magic in citeproc-js), but
in my self-confidence I’ll proclaim I like my solution somewhat better. Any
thoughts? I’d also like to stress the existence of the January/janvier case,
which is a bit more complex, as the ordinal is actually attached to the day
instead of the month (“1er janvier”).

Rintze

I agree. Explicit is better than implicit.

Hi,

From Zen of Python?
http://www.python.org/dev/peps/pep-0020/

Or just:
carles@pinux:~$ python
[…]
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
… (some more)

I should go to sleep soon…

Thinking about the implementation of this, I think the schema change would
be very simple:

term-attributes =
  attribute form { cs-term-forms }?,
  attribute gender { "feminine" | "masculine" }?,
  attribute name { cs-terms }

A possible addition for the specification, that describes the behavior:

Some languages (e.g. French, Spanish) use gender-specific long-ordinals
(e.g. the French “premier” (masculine) and “première” (feminine)) and
ordinal suffixes (e.g. “1re” (feminine) and “1er” (masculine)). For this
reason, cs:term may carry the “gender” attribute (with values “feminine” or
“masculine”). If cs:number calls a variable (e.g. “edition”) with the “form”
attribute set to “ordinal” or “long-ordinal”, and the “gender” attribute is
set on the “long” (default) form of the matching term (in this case
“edition”), gender-specific ordinal (suffix) terms are preferentially used
when available. For cs:date, the behavior is similar, except that here the
“day” date-part in the “ordinal” form is preferentially rendered with the
appropriate gender-specific ordinal (suffix) term if a gender is set on the
rendered month term (e.g. allowing for “1er janvier”).


So in this description, there is fallback to gender-unspecific ordinals, and
the French and Spanish locales should contain both gender-specific and
unspecific ordinal terms. The only changes would be to the locales (and
optionally to cs:locale inside CSL styles), so this should work quite
cleanly.

Rintze
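
To make the proposed behavior concrete, here is a minimal Python sketch of the lookup with fallback. Everything in it is an assumption for illustration: the dictionaries stand in for locale data, the term name "month-01" and the gender-unspecific fallback suffix are placeholders, and all numbers other than 1 are collapsed to a single suffix. Only the suffixes "re" and "er" and the genders of "édition" and "janvier" follow the examples in this thread.

```python
# Hypothetical French locale data: gender set on ordinary terms.
TERM_GENDER = {"edition": "feminine", "month-01": "masculine"}

# Suffixes for the ordinal of 1, keyed by gender. The None key is the
# gender-unspecific fallback; its value here is a placeholder.
FIRST_SUFFIX = {"feminine": "re", "masculine": "er", None: "er"}

def ordinal(number: int, governing_term: str) -> str:
    """Render a number as an ordinal, preferring a gender-specific suffix."""
    gender = TERM_GENDER.get(governing_term)  # None if no gender is set
    if number == 1:
        suffix = FIRST_SUFFIX.get(gender, FIRST_SUFFIX[None])
    else:
        suffix = "e"  # simplification: one suffix for all other numbers
    return f"{number}{suffix}"

print(ordinal(1, "edition"))                # 1re
print(ordinal(1, "month-01") + " janvier")  # 1er janvier
print(ordinal(1, "volume"))                 # 1er (no gender set, fallback)
```

For dates, the gender is looked up on the month term even though the suffix is attached to the day, which is why the second call passes the month term.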

This looks very good. I’ve opened a ticket against citeproc-js, and
will have a go at implementation and testing.

Frank

A question, though, about fallback behavior. Suppose the "edition"
term is defined in long and in short form, but only the long form has
a long + masculine form defined:

editionMASC
edition
edn

When the edition variable is rendered with form=“ordinal”, should the
gender be drawn from the long form of the edition label, or from the
short form?

Frank

(Sorry, I think the middle cs:term in my example – long form with no
gender – should be deleted. Question still applies, though.)

I’m not an expert in French or Spanish, but I wrote the spec description
under the assumption that the gender of a word (in form=“long”) and its
abbreviated form (in form=“short”) are always the same. This is implicit in
the spec description, as I suggested that we only test against the "long"
form of the matching variable. This behavior seems desirable to me as a) it
will be easier to maintain the CSL locale files, as it won’t be necessary to
set the gender on both the “long” and “short” forms of a term and b)
citeproc will never encounter conflicts, e.g. in cases where the “short"
form has a different gender than the “long” form (different could also mean
"not set/undefined”, as in your example above).

Rintze

On Thu, Nov 18, 2010 at 7:24 PM, Frank Bennett wrote: […]

Hmm. After playing around with implementing this for a few minutes,
I’m starting to think that there is something to be said for my
original proposal.

There are two problems with Rintze’s approach. One is that the
"gender" attribute is being used in two different senses on cs:term.

Rintze wrote:

I’m not an expert in French or Spanish, but I wrote the spec description
under the assumption that the gender of a word (in form=“long”) and its
abbreviated form (in form=“short”) are always the same.

Yep, I’d make the same assumption. That was a question about how to
handle broken or incomplete data, since one can imagine a
masculine/feminine/neuter locale that might work okay – accidentally
or not – with some missing term variants, depending on how fallback
operates. We wouldn’t want a locale that works okay with one
processor to break when run on another. Including the fallback case
in the tests will help keep all of the processors in line.
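
The long-form rule and the incomplete-data case discussed above could be pinned down roughly as follows. This is a sketch only: the list-of-dicts representation and the function name are assumptions, the "edition" entries mirror the example earlier in the thread, and the "issue" entry is a made-up case of a locale that sets a gender on the short form alone.

```python
# Locale terms as plain dicts. A missing "form" key means the long form.
TERMS = [
    {"name": "edition", "gender": "masculine", "text": "editionMASC"},
    {"name": "edition", "form": "short", "text": "edn"},
    # Hypothetical incomplete data: gender appears only on the short form.
    {"name": "issue", "form": "short", "gender": "feminine", "text": "iss"},
]

def term_gender(name, terms=TERMS):
    """Return the gender declared on the long form of a term, if any.

    Genders on other forms are ignored, so every processor following this
    rule resolves the same gender for the same locale data.
    """
    for term in terms:
        if term["name"] == name and term.get("form", "long") == "long":
            return term.get("gender")
    return None  # no long form, or no gender on it: use unspecific ordinals

print(term_gender("edition"))  # masculine
print(term_gender("issue"))    # None
```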

Frank and I had an off-list discussion, and we think things might be easier
to implement if we use two attributes. Our new proposal is that
"gender-form" would be used to indicate different gender forms (for the
ordinal and long-ordinal terms), while “gender” would be used to specify the
gender of terms like “edition”. Examples of how this would work are given in
the tests:

http://bitbucket.org/bdarcus/citeproc-test/src/388cdeedd0cf/processor-tests/humans/number_EditionOrdinalMasculine.txt
http://bitbucket.org/bdarcus/citeproc-test/src/388cdeedd0cf/processor-tests/humans/number_DateOrdinalMasculine.txt

As Frank already indicated, the main benefit is with redefining terms in
styles using cs:locale. Terms carrying “gender-form” (see A below) would
overwrite the matching “gender-form” variant of the matching term in the CSL
locale(s), whereas terms carrying “gender” (see B) overwrite the value of
the “gender” attribute of the matching term in the CSL locale(s). This is an
important distinction, as for the ordinal and long-ordinal terms we need to
allow for a set of genders (“feminine”, “masculine” and undefined), while
for terms like “edition” we want to set a single gender. I hope this is all
clear, but otherwise Frank will probably be happy to chip in.

A)

stMASC

B)

edition
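
The difference between the two kinds of override can be sketched in Python. The data layout, the function name, and the base locale values are all assumptions made for illustration; only the attribute names "gender-form" and "gender" and the example value "stMASC" come from the proposal above.

```python
import copy

# Hypothetical base locale: ordinal terms hold a set of variants keyed by
# gender-form (None = no gender-form); ordinary terms hold a single gender.
BASE = {
    "ordinals": {"ordinal-01": {None: "st", "feminine": "stFEM"}},
    "genders": {"edition": "feminine"},
}

# Terms redefined in a style via cs:locale.
OVERRIDES = [
    {"name": "ordinal-01", "gender-form": "masculine", "text": "stMASC"},  # A
    {"name": "edition", "gender": "masculine"},                            # B
]

def apply_overrides(locale, overrides):
    """Merge style-level terms into a copy of the locale."""
    merged = copy.deepcopy(locale)
    for term in overrides:
        if "gender-form" in term:
            # A: replace only the matching variant, keep the others.
            variants = merged["ordinals"].setdefault(term["name"], {})
            variants[term["gender-form"]] = term["text"]
        elif "gender" in term:
            # B: overwrite the single gender value of the term.
            merged["genders"][term["name"]] = term["gender"]
    return merged

merged = apply_overrides(BASE, OVERRIDES)
print(merged["ordinals"]["ordinal-01"])
print(merged["genders"]["edition"])  # masculine
```

Case A adds or replaces one variant in a set, while case B replaces a single value, which is the distinction the two attribute names are meant to keep apart.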

Rintze

On Thu, Nov 18, 2010 at 9:17 PM, Frank Bennett wrote: […]

Frank Bennett-3 wrote:

Gracile has requested that the term associated with the "issue"
variable turn to plural when appropriate:

Plurals can get a bit messier if we turn to Slavic languages. I can speak
with confidence only about Czech and Russian, but similar behavior shows up
in other Slavic languages.

Here’s the basic behavior (in Czech without diacritics, Russian in ALA-LC
transliteration)
Singular:
English - Czech - Russian
1 page = 1 stranka = 1 stranitsa
page 1 = stranka 1 = stranitsa 1

Genitive singular:
2 pages = 2 stranky = 2 stranitsy
3 pages = 3 stranky = 3 stranitsy
4 pages = 4 stranky = 4 stranitsy

Genitive plural:
5 pages = 5 stranek = 5 stranits

Plural:
pages 1-2 = stranky 1-2 = stranitsy 1-2

In Czech, the noun takes the genitive plural form for all numbers 5 and up.
In Russian, however, the form is dictated by the last part of higher
numbers.
Thus:
1 => singular, 2-4 => gen. sg., 5-20 => gen. pl.,
21 => sg., 22-24 => gen. sg., 25-30 => gen. pl.,

101 => sg., 102-104 => gen. sg., 105-120 => gen. pl.,
121 => sg., 122-124 => gen. sg., 125-130 => gen. pl.
And so on.

This isn’t pretty, and it doesn’t show up at all in short forms, and it
matters only with number-of-pages and number-of-volumes (that is, when the
number modifies the noun), but it’s still sometimes required.
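
The rules above can be written down as a short Python sketch. The function and table names are hypothetical, the form labels are the ones used in this message, and the word forms are the transliterated examples given earlier.

```python
def russian_form(n: int) -> str:
    """Noun form after a count in Russian: decided by the last digits."""
    if n % 10 == 1 and n % 100 != 11:
        return "singular"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "genitive singular"
    return "genitive plural"

def czech_form(n: int) -> str:
    """Noun form after a count in Czech: 5 and up is always genitive plural."""
    if n == 1:
        return "singular"
    if n in (2, 3, 4):
        return "genitive singular"
    return "genitive plural"

PAGE = {
    "cs": {"singular": "stranka", "genitive singular": "stranky",
           "genitive plural": "stranek"},
    "ru": {"singular": "stranitsa", "genitive singular": "stranitsy",
           "genitive plural": "stranits"},
}
RULES = {"cs": czech_form, "ru": russian_form}

def count_pages(n: int, lang: str) -> str:
    return f"{n} {PAGE[lang][RULES[lang](n)]}"

print(count_pages(21, "ru"))  # 21 stranitsa
print(count_pages(22, "ru"))  # 22 stranitsy
print(count_pages(22, "cs"))  # 22 stranek
print(count_pages(111, "ru")) # 111 stranits
```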

Just for consideration.

Avram

--
View this message in context: http://xbiblio-devel.2463403.n2.nabble.com/Pluralization-of-ordinary-item-field-terms-tp5727124p5998231.html
Sent from the xbiblio-devel mailing list archive at Nabble.com.