Is-numeric behavior, new is-numberlike test?

I think there are some ambiguities and limitations in is-numeric that we should resolve.

Per the spec, is-numeric has the following behavior:

Tests whether the given variables (Appendix IV - Variables) contain numeric content. Content is considered numeric if it solely consists of numbers. Numbers may have prefixes and suffixes (“D2”, “2b”, “L2d”), and may be separated by a comma, hyphen, or ampersand, with or without spaces (“2, 3”, “2-4”, “2 & 4”). For example, “2nd” tests “true” whereas “second” and “2nd edition” test “false”.

I’m not exactly sure this description achieves the desired goals for the test.
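
To make the discussion concrete, here is a minimal Python sketch of one literal reading of the spec text quoted above. The function name `is_numeric_current` and the regular expressions are my own assumptions, not anything from the spec or from an existing processor, and prefixes and suffixes are assumed to be ASCII letters only.

```python
import re

# Assumption: a "number" is a run of digits with an optional
# alphabetic prefix and an optional alphabetic suffix ("D2", "2b", "L2d").
NUMBER = r"[A-Za-z]*[0-9]+[A-Za-z]*"
# Numbers may be joined by a comma, hyphen, or ampersand, with or without spaces.
SEPARATOR = r"\s*[,&-]\s*"
PATTERN = re.compile(rf"{NUMBER}(?:{SEPARATOR}{NUMBER})*")


def is_numeric_current(value: str) -> bool:
    """Return True if the whole value matches this reading of the spec."""
    return PATTERN.fullmatch(value.strip()) is not None


for sample in ["2nd", "second", "2nd edition", "2 & 4", "5R01HD081252-04"]:
    print(sample, is_numeric_current(sample))
```

Under this reading, “2nd” and “2 & 4” test true, while “second”, “2nd edition”, and “5R01HD081252-04” test false. Actual processors may interpret the text differently.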

The main applications of is-numeric are:

  1. Testing for things like edition: Revised ed. versus edition: 2 to control formatting (e.g., ordinalizing and adding “ed.” for edition: 2, or adding the prefix “No.” before number: 484 or number: 5R01HD081252-04).
  2. Extracting the integer number content for sorting.

For sorting, this is the relevant spec text about numbers:

numbers: Number variables called via the variable attribute are returned as integers (form is “numeric”). If the original variable value only consists of non-numeric text, the value is returned as a text string.

Number variables rendered within the macro with cs:number and date variables are treated the same as when they are called via variable. The only exception is that the complete date is returned if a date variable is called via the variable attribute. In contrast, macros return only those date-parts that would otherwise be rendered (respecting the value of the date-parts attribute for localized dates, or the listing of cs:date-part elements for non-localized dates).

“A2” should sort before “B1”, which by my reading contradicts the spec.

For rendering, here are five potential variables:

  1. edition: 2
  2. edition: Revised ed.
  3. number: 484
  4. number: Season 4, Episode 3
  5. number: 5R01HD081252-04

The first two are really clear, and I think this was the main impetus behind is-numeric. Those easily fit the pattern:

<choose>
  <if is-numeric="edition">
    <group delimiter=" ">
      <number variable="edition" form="ordinal-short"/>
      <label variable="edition" form="short"/>
    </group>
  </if>
  <else>
    <text variable="edition"/>
  </else>
</choose>

The third is also clearly a number, and the fourth is clearly not a number. But I think the fifth should be treated like a number for formatting.

The case I have in mind at the moment is that I want to test whether number is numeric to control whether “No.” is added before the number. The desired output for 3-5 above is:
3. No. 484
4. Season 4, Episode 3
5. No. 5R01HD081252-04

There currently isn’t a way to get (5) to be formatted like (3) rather than like (4).

Proposal

I think a much simpler spec for is-numeric would be:

Content is considered numeric if all contained words include one or more numbers. Numbers may have prefixes and suffixes (“D2”, “2b”, “L2d”), may be separated by non-numeric characters (e.g., “5R01HD081252-04”), and may be separated by a comma, hyphen, or ampersand, with or without spaces (“2, 3”, “2-4”, “2 & 4”). Content is considered non-numeric if any word consists solely of non-numeric characters (e.g., “second”, “revised edition”, “number 3”).
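
As a rough illustration of the proposed wording, here is a minimal Python sketch. It assumes that “words” are obtained by splitting on whitespace, commas, hyphens, and ampersands, and that “numbers” means ASCII digits. The name `is_numeric_proposed` is hypothetical.

```python
import re

# Assumption: whitespace, commas, hyphens, and ampersands all act as
# word separators, so "2 & 4" yields the words "2" and "4".
SEPARATORS = re.compile(r"[\s,&-]+")


def is_numeric_proposed(value: str) -> bool:
    """Numeric if there is at least one word and every word contains a digit."""
    words = [w for w in SEPARATORS.split(value) if w]
    return bool(words) and all(re.search(r"[0-9]", w) for w in words)


examples = ["2", "Revised ed.", "484", "Season 4, Episode 3", "5R01HD081252-04"]
for sample in examples:
    print(sample, is_numeric_proposed(sample))
```

With these assumptions, “2”, “484”, and “5R01HD081252-04” test true, while “Revised ed.” and “Season 4, Episode 3” test false, which matches the desired “No.” behavior described above.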

I think this fits all of the needs correctly. If not, and we keep the existing is-numeric spec, I suggest we add an is-numberlike test to cover the above case.

Revised sorting text would be:

numbers: If a number variable consists solely of numbers, commas, hyphens, ampersands, and spaces, when called via the variable attribute, it is returned as the first integer before a space or punctuation (form is “numeric”). If the number variable value contains any other characters, the value is returned as a text string.
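
The revised sorting rule could be sketched in Python roughly as follows. The name `sort_value` is hypothetical, and “spaces” is assumed to mean the plain space character only. The sketch returns either an integer or the original string, as the proposed text describes.

```python
import re

# Assumption: "solely of numbers, commas, hyphens, ampersands, and spaces"
# means every character is an ASCII digit or one of those four characters.
ONLY_NUMERIC = re.compile(r"[0-9,&\- ]+")


def sort_value(value: str):
    """Return the first integer for purely numeric values, else the text."""
    first_int = re.search(r"[0-9]+", value)
    if first_int and ONLY_NUMERIC.fullmatch(value):
        return int(first_int.group())
    return value


print(sort_value("2-4"))
print(sorted(["B1", "A2"], key=sort_value))
```

Here “2-4” yields the integer 2, and “A2” and “B1” are both returned as text, so “A2” sorts before “B1”. The proposed text does not say how an integer key compares with a text key, so sorting a list that mixes the two would need an extra rule. In Python, comparing an `int` with a `str` raises a `TypeError`.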

@Frank_Bennett @cormacrelf @PaulStanley @Rintze_Zelle

Another alternative would be to recognize that “Season 4, Episode 3” is more like a page number for a chapter within a book than like a document number on a report, and so to store this sort of information in page and/or chapter-number instead.

I do think “Season 3, Episode 4” should actually be saved in a more locator-like structure (what have we settled on?), so that the terms can be localized. The equivalent would be a chapter in a multivolume book, right? “Volume 4, Chapter 3”?