Why term-set?

Hello,

Why are some of the terms in the locales files wrapped in term-set?
Doesn’t make it easier. I liked the previous style much better.

Johan—
http://www.johankool.nl/

That was Simon’s design.

I think a term-set refers to a collection of role or locator terms.
E.g. in those cases, you aren’t doing a simple 1:1 mapping of a label
to a term (as the simple term elements), but generally conditional
lookup. The old approach didn’t account for that, and so would have
broken I think.

I’ve not looked at it yet closely, but it seemed fine to me. What’s the
problem, and what’s the (better) solution, given my comments above?

Bruce

My problem is that I now have to figure out in which set a term
occurs. E.g. is it in locators or in roles or in months or none of
these? Before I would simply put the string “month-01-short” into a
simple function that did the look up. Very simple.

I’d say we either go the full way and do it like this:

page pages paragraph paragraph p pp ¶ ¶¶

so, do it the whole way, or go back to flat. This halfway in between
is just annoying. Perhaps it should be <long|

<single|multiple> instead.

Johan

I’m going to let Simon respond here, and see if he can address your
concerns.

Bruce

I chose the approach because it’s extensible. “Long” and “short”
won’t necessarily cover everything. We also need “verb” for contributors
(e.g., “Edited By”), which poses a possible schema issue because we
shouldn’t have “verb” for locators. It’s also conceivable that we’ll
discover we need other term-sets in the future, and defining a new term-set
is much simpler than altering the schema to accommodate a new type of term.
and are data-dependent, but term-sets are
style-dependent, so the paradigm appears consistent to me.

To implement term-set in my code, I simply added an additional "term-set"
argument to the function that retrieves locale terms. In this fashion, it’s
exceedingly simple to map something like
into a term, easier than if there were separate / tags. Rather
than doing:

var month = getTerm("month-01-short");

I do:

var month = getTerm("month-01", "months-short");

I suppose the term-set isn’t absolutely necessary for months, since no part
of the style directly references them, but I figured I might as well put
them there for consistency. If you disagree, feel free to take them out
(although there shouldn’t be any difference in complexity). We could also
change “month-01” to “01”; there’s no reason “month” has to be in the term
name once it’s in a term-set.
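
A minimal sketch of the two-argument lookup described above, assuming the locale file has already been parsed into a plain object keyed by term-set name. The `locale` object, the "months-long" set, the English month strings, and the body of `getTerm` are all illustrative assumptions, not the actual implementation.

```javascript
// Hypothetical in-memory form of a parsed locales file:
// term-set name -> term name -> text.
var locale = {
  "months-long": { "month-01": "January", "month-02": "February" },
  "months-short": { "month-01": "Jan", "month-02": "Feb" }
};

// Look up a term inside a named term-set.
// Returns undefined when the set or the term is missing.
function getTerm(name, termSet) {
  var set = locale[termSet];
  if (!set || !Object.prototype.hasOwnProperty.call(set, name)) {
    return undefined;
  }
  return set[name];
}

var month = getTerm("month-01", "months-short");
```

Because the caller always names the set, nothing ever has to search across sets to find a term.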

Hope this clears things up.

Simon

I chose the approach because it’s extensible. “Long” and “short”
won’t necessarily cover everything. We also need “verb” for contributors
(e.g., “Edited By”), which poses a possible schema issue because we
shouldn’t have “verb” for locators.

Actually, that’s no problem in RELAX NG. Just define separate patterns for each.

It’s also conceivable that we’ll
discover we need other term-sets in the future, and defining a new term-set
is much simpler than altering the schema to accommodate a new type of term.

This is important. The less we have to hard-code in the schema, the
easier things will be long-term.

and are data-dependent, but term-sets are
style-dependent, so the paradigm appears consistent to me.

To implement term-set in my code, I simply added an additional “term-set”
argument to the function that retrieves locale terms. In this fashion, it’s
exceedingly simple to map something like
into a term, easier than if there were separate / tags. Rather
than doing:

var month = getTerm("month-01-short");

I do:

var month = getTerm("month-01", "months-short");

Good of you to give the example code here. Getting rid of the “month”
makes it even easier, since you can just pull the value from the data
and do:

getTerm(month, "months-short")

I suppose the term-set isn’t absolutely necessary for months, since no part
of the style directly references them, but I figured I might as well put
them there for consistency. If you disagree, feel free to take them out
(although there shouldn’t be any difference in complexity). We could also
change “month-01” to “01”; there’s no reason “month” has to be in the term
name once it’s in a term-set.

+1

Bruce

I’ve updated locales_nl.xml to reflect the proposed changes. Would
this be an acceptable format?

Johan

<?xml version="1.0" encoding="UTF-8"?> in ibid benaderd in voorbereiding Referenties en uit pagina pagina's p pp redacteur redacteurs red reds vertaler vertalers vert verts januari jan februari feb maart maa april apr mei mei june jun juli jul augustus aug september sep oktober okt november nov december dec

Simon; thoughts?

Bruce

  1. There’s no purpose to a tag if there are no separate sets of
    terms. If we do go this route, we should get rid of it.

  2. In its current form, this specification would need an extra tag to handle
    verb roles (e.g., Edited By). While this is not too big of a deal, it
    underscores the lack of extensibility inherent in the approach.

  3. For me, at least, this approach is no simpler and probably harder to code
    than the term-set approach.

An alternative that would address these concerns is:

in
ibid


pagina
pagina’s


p
pp

And then:

I’m not completely sure of the benefits this approach has over the current
one, but I’m happy to go along with it if it would make things easier for
others.

Simon

  1. There’s no purpose to a tag if there are no separate
    sets of
    terms. If we do go this route, we should get rid of it.

  2. In its current form, this specification would need an extra tag to
    handle
    verb roles (e.g., Edited By). While this is not too big of a deal, it
    underscores the lack of extensibility inherent in the approach.

Let me see if I understand the differences. Johan’s version is:

	<term-set name="roles">
		<term name="editor">
			<long>
				<single>redacteur</single>
				<multiple>redacteurs</multiple>
			</long>
			<short>
				<single>red</single>
				<multiple>reds</multiple>
			</short>
		</term>

Yours does not have separate elements for the different forms, but
instead has compound set names? E.g.

	<term-set name="roles">
	   <term name="editor">
	     <single>redacteur</single>
	     <multiple>redacteurs</multiple>
	   </term>
	   <term name="editor-short">
	     <single>red</single>
	     <multiple>reds</multiple>
	   </term>

Is that the difference?

An alternative that would address these concerns is:

in
ibid


pagina
pagina’s


p
pp

And then:

I agree this is better for the more structured approach.

I’m not completely sure of the benefits this approach has over the
current
one, but I’m happy to go along with it if it would make things easier
for
others.

Johan, care to elaborate on why we need a more structured approach?
E.g. what are the practical benefits?

Bruce

Actually, mine defines separate term-sets which are then referenced from the
label:

editor editors translator translators ed eds tran trans

And then:

But the basic principle is the same.

The benefit I can see to the more structured approach is that if a style
tries to get a form of a term that a given locales.xml file doesn’t support,
it can use a different form of the same term, rather than rolling over to
the English version. I’m not sure how much help this is, because either way
there’s a pretty big discrepancy between the desired output and the real
output, but it’s worth some thought.
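
To make the two fallback behaviours concrete, here is a rough sketch. The data layout, the `english` and `dutch` objects, and both function names are hypothetical and only meant to show the difference in lookup order.

```javascript
// Hypothetical locale data: term name -> form -> text.
var english = { editor: { long: "editor", short: "ed" } };
var dutch = { editor: { long: "redacteur" } }; // no short form supplied

// Strategy A: if the requested form is missing, roll over to English.
function lookupRollOver(name, form) {
  var term = dutch[name];
  if (term && term[form] !== undefined) return term[form];
  return english[name] ? english[name][form] : undefined;
}

// Strategy B: if the requested form is missing, use another form
// of the same term in the same locale before trying English.
function lookupOtherForm(name, form) {
  var term = dutch[name];
  if (term) {
    if (term[form] !== undefined) return term[form];
    var forms = Object.keys(term);
    if (forms.length > 0) return term[forms[0]];
  }
  return english[name] ? english[name][form] : undefined;
}
```

Asking for the short form of "editor" gives the English abbreviation under strategy A and the Dutch long form under strategy B, so neither matches what the style actually requested.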

Simon

Hello,

To answer a few remarks/questions.

if a style tries to get a form of a term that a given locales.xml
file doesn’t support, it can use a different form of the same term,
rather than rolling over to the English version.

That would be very odd and confusing behaviour. If I see English text
appearing in my output I know what’s wrong: localization is missing.
If I see text appearing I didn’t expect, it could be either a missing
translation or a misconfigured style. That sounds very confusing to
work with.

  1. There’s no purpose to a tag if there are no separate
    sets of
    terms. If we do go this route, we should get rid of it.

  2. In its current form, this specification would need an extra tag to
    handle
    verb roles (e.g., Edited By). While this is not too big of a deal, it
    underscores the lack of extensibility inherent in the approach.

My problem is, among other things, that it is awkward to have to
construct a special string to look up a text. Perhaps we could keep
using the term-set as before, but split the name into 2 attributes.
Or perhaps put the form on term-set instead of term.

	<term-set name="roles">
	   <term name="editor" form="long">
	     <single>redacteur</single>
	     <multiple>redacteurs</multiple>
	   </term>
	   <term name="editor" form="short">
	     <single>red</single>
	     <multiple>reds</multiple>
	   </term>

This is also much more flexible, but allows simpler ways to find
something in the file. Is that an idea?

Johan

Yes, it’s an idea. I’m pretty much agnostic about which is better
though.

Any other opinions?

Bruce

if a style tries to get a form of a term that a given locales.xml
file doesn’t support, it can use a different form of the same term,
rather than rolling over to the English version.

That would be very odd and confusing behaviour. If I see English text
appearing in my output I know what’s wrong: localization is missing.
If I see text appearing I didn’t expect, it could be either a missing
translation or a misconfigured style. That sounds very confusing to
work with.

Then it seems like, from a feature standpoint, there’s no advantage to a
more structured approach over the current one.

  1. There’s no purpose to a tag if there are no separate
    sets of
    terms. If we do go this route, we should get rid of it.

  2. In its current form, this specification would need an extra tag to
    handle
    verb roles (e.g., Edited By). While this is not too big of a deal, it
    underscores the lack of extensibility inherent in the approach.

My problem is, among other things, that it is awkward to have to
construct a special string to look up a text. Perhaps we could keep
using the term-set as before, but split the name into 2 attributes.
Or perhaps put the form on term-set instead of term.

redacteur redacteurs red reds

This is also much more flexible, but allows simpler ways to find
something in the file. Is that an idea?

Again, the <term-set> is useless if you’re going to put the form on each
term.

I still don’t exactly see where the difficulty exists in implementing term
sets, and I wonder if you’ve misunderstood the concept. You should know
exactly what term-set to look in at all times from the “term-set” attribute
on <label>. (I think I only updated Chicago and APA, but it’s trivial to
add the attribute to the other styles.) You should never have to loop
through all of the sets. What makes the approach you gave above any easier
to implement? I am trying to think of an XML API that would make this
difficult, but I can’t.

Simon

Yeah, I think what he’s suggesting would mean in fact this:

… while your suggestion was just:

I agree the difference is trivial. I guess in the absence of a clear
reason why the latter is problematic, I’d stay with the second.

I suppose one advantage to the former is that the schema is then a
little simpler (because there are then fewer “name” options).

Bruce

Ok. Keep it the way it is then. It seems odd to me to be
concatenating attributes into one attribute, but well… it’s not
such a big deal that I am going to care that much about it. :)

Cheers,

Johan

I’m going to look at which of the two options is easier to implement
(and maintain) in the schema, and then decide based on that.

Bruce

I’ve changed the format to something that should satisfy Johan’s objections
while further simplifying the schema. Now, we simply have:

<term name="page">
  <single>page</single>
  <multiple>pages</multiple>
</term>
<term name="page" form="short">
  <single>p</single>
  <multiple>pp</multiple>
</term>

Labels are now:

All we need in the schema is:

attribute form { token }?

Or, if we want to formalize what’s currently in the XML file within the
schema:

attribute form { "short" | "verb" }?

As in other places in CSL (e.g., contributors and titles), form defaults to
long, then switches to short if specified. I figured that this makes more
sense if we need a term in the short form ( makes
more sense than ). If there are any complaints,
I’m willing to revert to the old version.
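
As a rough illustration of how a lookup against this format might behave, here is a sketch that assumes the <term> elements have been parsed into an array of plain objects. The `terms` array, the `plural` flag, and this version of `getTerm` are hypothetical.

```javascript
// Hypothetical parsed form of the <term> elements shown above.
// A term with no form attribute is stored with form "long".
var terms = [
  { name: "page", form: "long", single: "page", multiple: "pages" },
  { name: "page", form: "short", single: "p", multiple: "pp" }
];

// Find a term by name and form; form defaults to "long" when omitted.
// Pass plural = true to get the <multiple> text.
function getTerm(name, form, plural) {
  var wanted = form || "long";
  for (var i = 0; i < terms.length; i++) {
    if (terms[i].name === name && terms[i].form === wanted) {
      return plural ? terms[i].multiple : terms[i].single;
    }
  }
  return undefined;
}
```

Here `getTerm("page")` falls back to the long form, while `getTerm("page", "short", true)` picks the abbreviated plural.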

Simon

Thanks Simon,

I think it looks good this way. Esp. vs. .

Johan

Labels are now:

How do you indicate the short verb form of contributors?

All we need in the schema is:

attribute form { token }?

Or, if we want to formalize what’s currently in the XML file
within the schema:

attribute form { "short" | "verb" }?

All of this stuff should be formalized. If there’s a reason to allow
extension independent of explicit extension in the schema (not the case
here I believe), then we build in an extension point like so:

  1. define a pattern for the options; say “label-forms”

  2. add an extension pattern like so:

label-forms = "short" | "long" | label-forms.extension
label-forms.extension = notAllowed

A custom schema can then override the latter with custom terms.

There are few places in CSL where this is necessary though.

Bruce