There are, I think, some ambiguities and limitations in is-numeric that we should resolve.
Per the spec, is-numeric has the following behavior:
Tests whether the given variables (Appendix IV - Variables) contain numeric content. Content is considered numeric if it solely consists of numbers. Numbers may have prefixes and suffixes (“D2”, “2b”, “L2d”), and may be separated by a comma, hyphen, or ampersand, with or without spaces (“2, 3”, “2-4”, “2 & 4”). For example, “2nd” tests “true” whereas “second” and “2nd edition” test “false”.
I’m not exactly sure this description achieves the desired goals for the test.
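To make the ambiguity concrete, here is a minimal Python sketch of one strict reading of the spec text above: each number is a run of digits with an optional letter prefix and suffix, and numbers are joined only by a comma, hyphen, or ampersand. The function name and the regular expression are illustrative assumptions, not anything defined by the spec or by an existing processor.

```python
import re

# One number: optional letter prefix, digits, optional letter suffix ("D2", "2b", "L2d").
_NUMBER = r"[A-Za-z]*\d+[A-Za-z]*"
# Separators allowed between numbers, with or without surrounding spaces.
_SEPARATOR = r"\s*[,&-]\s*"
_STRICT = re.compile(rf"{_NUMBER}(?:{_SEPARATOR}{_NUMBER})*")


def is_numeric_strict(value: str) -> bool:
    """Hypothetical strict reading of the current is-numeric text."""
    return _STRICT.fullmatch(value.strip()) is not None


for sample in ["2nd", "2 & 4", "second", "2nd edition", "5R01HD081252-04"]:
    print(f"{sample!r}: {is_numeric_strict(sample)}")
```

Under this reading, “5R01HD081252-04” tests false because letters and digits alternate more than once within a single number. A looser reading of “prefixes and suffixes” could give a different answer.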
The main applications of is-numeric are:
- Testing for things like edition: Revised ed. versus edition: 2 to control formatting (e.g., ordinalizing and adding “ed.” for edition: 2, or adding the prefix “No.” before number: 484 or number: 5R01HD081252-04).
- Extracting the integer number content for sorting.
For sorting, this is the relevant spec text about numbers:
numbers: Number variables called via the variable attribute are returned as integers (form is “numeric”). If the original variable value only consists of non-numeric text, the value is returned as a text string.
Number variables rendered within the macro with cs:number and date variables are treated the same as when they are called via variable. The only exception is that the complete date is returned if a date variable is called via the variable attribute. In contrast, macros return only those date-parts that would otherwise be rendered (respecting the value of the date-parts attribute for localized dates, or the listing of cs:date-part elements for non-localized dates).
Intuitively, “A2” should sort before “B1”, but as I read the spec, both would be returned as integers (2 and 1), which would put “B1” first.
For rendering, here are five potential variables:
edition: 2
edition: Revised ed.
number: 484
number: Season 4, Episode 3
number: 5R01HD081252-04
The first two are really clear, and I think this was the main impetus behind is-numeric. Those easily fit the pattern:
<choose>
  <if is-numeric="edition">
    <group delimiter=" ">
      <number variable="edition" form="ordinal-short"/>
      <label variable="edition" form="short"/>
    </group>
  </if>
  <else>
    <text variable="edition"/>
  </else>
</choose>
The third is also clearly a number, and the fourth is clearly not. But I think the fifth should be treated like a number for formatting.
The case I have in mind at the moment is testing whether number is numeric to control whether “No.” is added before it. The desired output for 3-5 above is:
3. No. 484
4. Season 4, Episode 3
5. No. 5R01HD081252-04
There currently isn’t a way to get (5) to be formatted like (3) rather than like (4).
Proposal
I think a much simpler spec for is-numeric would be:
Content is considered numeric if all contained words include one or more numbers. Numbers may have prefixes and suffixes (“D2”, “2b”, “L2d”), may be separated by non-numeric characters (e.g., “5R01HD081252-04”), and may be separated by a comma, hyphen, or ampersand, with or without spaces (“2, 3”, “2-4”, “2 & 4”). Content is considered non-numeric if any word consists solely of non-numeric characters (e.g., “second”, “revised edition”, “number 3”).
I think this fits all of the needs correctly. If not, or if we would rather keep the existing is-numeric spec, I suggest we add is-numberlike to cover the above case.
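The proposed wording can be sketched in Python as follows. This is only an illustration under stated assumptions: “words” are taken to be whitespace-separated tokens, tokens made up solely of commas, hyphens, or ampersands are treated as separators rather than words, and empty content is treated as non-numeric. The names is_numeric_proposed and render_number are hypothetical.

```python
import re

# Tokens made up only of these characters are treated as separators, not words.
_SEPARATOR_ONLY = re.compile(r"[,&-]+")


def is_numeric_proposed(value: str) -> bool:
    """Hypothetical sketch: numeric if every word contains at least one digit."""
    words = [w for w in value.split() if not _SEPARATOR_ONLY.fullmatch(w)]
    # Assumption: empty or separator-only content is not numeric.
    return bool(words) and all(any(ch.isdigit() for ch in w) for w in words)


def render_number(value: str) -> str:
    """Prefix "No." only when the proposed test passes."""
    return f"No. {value}" if is_numeric_proposed(value) else value


for sample in ["484", "Season 4, Episode 3", "5R01HD081252-04"]:
    print(render_number(sample))
```

With these assumptions, items 3 and 5 from the list above receive the “No.” prefix and item 4 is rendered unchanged.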
Revised sorting text would be:
numbers: If a number variable consists solely of numbers, commas, hyphens, ampersands, and spaces, when called via the variable attribute, it is returned as the first integer before a space or punctuation (form is “numeric”). If the number variable value contains any other characters, the value is returned as a text string.
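A rough Python sketch of the revised sorting text might look like this. The function name number_sort_key is hypothetical, and two details are assumptions that the proposed text does not settle: integers are placed ahead of text strings, and text strings are compared with Python's default string ordering.

```python
import re

# Values made up only of digits, commas, hyphens, ampersands, and spaces.
_PURE_NUMERIC = re.compile(r"[0-9,&\- ]+")


def number_sort_key(value: str):
    """Hypothetical sort key for the revised sorting text."""
    first = re.search(r"[0-9]+", value)
    if first and _PURE_NUMERIC.fullmatch(value):
        # Assumption: integers sort ahead of text strings.
        return (0, int(first.group()), "")
    # Any other character present: fall back to the text string.
    return (1, 0, value)


print(sorted(["B1", "2-4", "A2", "10", "2, 3"], key=number_sort_key))
```

Because “A2” and “B1” contain letters, they fall back to text comparison, so “A2” sorts before “B1”.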