CSL Group handling

As Bruce is aware, but probably most others are not, I’m working on a version of CiteProc for PHP and am just trying to understand the processing of “groups”.

The text that is confusing is (from the specifications.txt):

“In addition, cs:group acts as an conditional: if none
of the enclosed elements reference a non-empty variable or macro,
‘decorating’ cs:text elements that output verbatim text or
terms (e.g. <text value="some text"> and
<text term="editor">) are ignored.”

It says that if “none” of the enclosed elements references a non-empty variable, the decorating elements are ignored.

Shouldn’t they be ignored if they reference an empty variable?

TIA,
Ron Jerome

Basically, if all variable content within a group is empty, then
nothing gets printed. A common use case might be where you want to
print "retrieved from " a URL. Makes no sense to print that text if
there is no such variable.

We probably need to change the language of this to be more clear
(double negatives and such)? Any suggestions?

Also, I’ve previously explained that this is syntactic sugar for a
more complex cs:choose element, and included an example. Should we add
that as well (I’m guessing, though, that Frank has added such an
example to the test suite, so maybe not)?

Bruce

I agree that the current description is unclear. Would the following do?:

Note that cs:group implicitly acts as a conditional: the cs:group
element and its child elements are only processed if at least one of the
rendering elements included in the cs:group element calls a non-empty
variable (either directly or via a macro). For example,

would result in “(Published by: Company A)” when the publisher variable is
set to “Company A”, but wouldn’t generate any output when the publisher
variable is empty.
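
For implementers, the rule as worded above could be sketched roughly as follows. This is only a minimal Python illustration, not taken from any existing processor: `render_group`, the tuple-based child format, and the sample group (parentheses as affixes, a verbatim "Published by: " text, and the publisher variable) are all assumptions made up for the example.

```python
def render_group(children, item, prefix="", suffix="", delimiter=""):
    """Render a simplified group.

    children: list of ("value", text) or ("variable", name) tuples
              (a hypothetical stand-in for cs:text children).
    item:     dict mapping variable names to their content.
    """
    parts = []
    has_output = False  # set once any child calls a non-empty variable
    for kind, arg in children:
        if kind == "value":
            # Verbatim ("decorating") text never sets the flag on its own.
            parts.append(arg)
        elif kind == "variable":
            text = item.get(arg, "")
            if text:
                has_output = True
                parts.append(text)
    if not has_output:
        # No non-empty variable was called: suppress the whole group,
        # including its verbatim text and affixes.
        return ""
    return prefix + delimiter.join(parts) + suffix


# Hypothetical group modelled on the publisher example in the text.
publisher_group = [("value", "Published by: "), ("variable", "publisher")]

with_publisher = render_group(
    publisher_group, {"publisher": "Company A"}, prefix="(", suffix=")"
)
without_publisher = render_group(publisher_group, {}, prefix="(", suffix=")")
print(repr(with_publisher))     # '(Published by: Company A)'
print(repr(without_publisher))  # ''
```

Macros are left out here; under the proposed wording a macro would count if it ends up calling a non-empty variable.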


Rintze

Ok, that’s what I thought. My comment then would be this… (I didn’t search the archives on this topic, so forgive me if this has already been beaten to death)

I think group processing would be a lot more efficient if the dependent variable (or macro) were attached to the group tag as an attribute. That way I don’t have to sort through the group only to find out I don’t have anything to do.

Ron.

Basically, if all variable content within a group is empty, then
nothing gets printed. A common use case might be where you want to
print "retrieved from " a URL. Makes no sense to print that text if
there is no such variable.

Not to belabor this point, but this one is a bit tricky.

If you have something like this…

  <group delimiter=". ">
    <text macro="contributors" />
    <text macro="title" />
    <text macro="description" />
    <text macro="secondary-contributors" />
    <group delimiter=", ">
      <text macro="container-title" />
      <text macro="container-contributors" />
      <text macro="locators-chapter" />
    </group>
  </group>

Then any non-empty items (macros) are printed, but if you have something like this…

      <group>
        <text form="short" suffix=". " term="volume" text-case="capitalize-first" />
        <number form="numeric" variable="volume" />
      </group>

The first “text” object will always be non-empty, but it’s (implicitly) dependent on the second “number” object. This seems to break the rule that any non-empty group item is printed.

Am in a hurry, so will simply suggest that Frank would be a good
candidate to handle this :wink:

He’s in Asia, so will probably drop by a little later.

Hi, Jerome,

Groups are one place where things start getting interesting. As you
note, the output decision requires inspection of all variable calls
within the group, including those nested within one or more macros,
and you might reach the end of a complex structure only to discover
that nothing should be done.

The priority for the spec is to keep the style definition as compact
and readable as possible for the style designer. In several areas
(including this implicit conditional behavior of groups) this can
force design compromises or workarounds within the implementation. I
didn’t always see it this way, but given the large and unavoidable
recurrent burden borne by style maintainers, it’s right for the spec
design to come first, with implementers left to cope as best we can.

In your example above, the cs:text element is not calling a variable,
so it is what you might call a “servient” node (i.e. it serves the
same role as a cs:label element). If the “dominant” cs:number node
produces no output, the servient node output should be suppressed.

Not sure if chatter about another implementation is useful, but in
citeproc-js, group handling induced me to completely rework the way
output is produced. I originally was catenating output to a single
string while processing each node, but to handle this “postponed”
case, I shifted to a nested representation of the output elements,
from which elements can be “snipped out” at the close of processing
for a group node. The rest is pretty straightforward (or should be –
my code could use some cleanup here). Nodes which successfully call a
variable raise a “has output” flag on a stack that tracks the nesting
level of the group, and if the flag is false when the group closes,
any lingering content is removed from the output object.
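
The flag-on-a-stack approach described above might look something like the following. This is a rough Python sketch written for illustration, not the citeproc-js code: the `Renderer` class, its method names, and the hard-coded "Vol" string (standing in for the capitalized short form of the volume term) are all assumptions.

```python
class Renderer:
    """Collects output per nesting level so a group can be dropped late."""

    def __init__(self, item):
        self.item = item
        self.output = [[]]         # output[-1] belongs to the innermost open group
        self.has_output = [False]  # one "has output" flag per nesting level

    def text_term(self, term, suffix=""):
        # A servient node: it adds content but never raises the flag.
        self.output[-1].append(term + suffix)

    def variable(self, name):
        # A dominant node: it raises the flag only if the variable is non-empty.
        value = self.item.get(name)
        if value:
            self.output[-1].append(str(value))
            self.has_output[-1] = True

    def open_group(self):
        self.output.append([])
        self.has_output.append(False)

    def close_group(self, delimiter=""):
        parts = self.output.pop()
        flag = self.has_output.pop()
        if flag:
            self.output[-1].append(delimiter.join(parts))
            self.has_output[-1] = True  # the enclosing group now has output too
        # If the flag is false, the lingering parts are simply discarded.

    def result(self):
        return "".join(self.output[0])


def render_volume(item):
    """Mimics the small volume group from Ron's second example."""
    r = Renderer(item)
    r.open_group()
    r.text_term("Vol", suffix=". ")  # "Vol" is an assumed term rendering
    r.variable("volume")
    r.close_group()
    return r.result()


print(repr(render_volume({"volume": 12})))  # 'Vol. 12'
print(repr(render_volume({})))              # ''
```

Propagating the flag upward on close is what lets an outer group print when only a nested group produced variable output.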

Hope this helps,
Frank

Hi, Jerome,

(Ron: after posting I realized I had your given and family names
reversed. My apologies.)

Hi Frank,

Hmmm, sounds like I’m going down the same road, and hitting the same brick wall (ouch). I have reviewed your code (although I don’t claim to be knowledgeable in JavaScript; I live mainly in the C, C++ and PHP worlds), and I’m impressed by its completeness, but I must admit that I was having some difficulty trying to figure out what you were doing with all your stacks, queues and flags. I think I now have a better understanding of what’s going on.

In my formulation, I’ve created a class for each CSL tag, all of which are subclasses of a container class (among others). Thus any “CSL” object can “contain” any number of other “CSL” objects, and so on in a hierarchical structure the same as the CSL input file. Each class (well, to be accurate, only those which are subclasses of the “rendering_element” class) has a render() function, which calls the render() function on any of its children, and so on down the line, concatenating parts of the citation as it goes. So basically once you’ve ingested the CSL file and built the class structure, you just call render($data) on the layout object of the bibliography or citation class and it returns a fully rendered citation (or so the theory goes :-).
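
As a rough illustration of that structure (in Python rather than PHP, and with made-up class names and attributes, so not the actual CiteProc for PHP code), the recursive render() idea could be sketched like this:

```python
class RenderingElement:
    """Base container: holds child elements and renders them in order."""

    def __init__(self, *children):
        self.children = list(children)

    def render(self, data):
        # Default behaviour: concatenate whatever the children render.
        return "".join(child.render(data) for child in self.children)


class Text(RenderingElement):
    """Simplified cs:text: outputs a variable or a verbatim value."""

    def __init__(self, variable=None, value=None):
        super().__init__()
        self.variable = variable
        self.value = value

    def render(self, data):
        if self.variable is not None:
            return str(data.get(self.variable, ""))
        return self.value or ""


class Layout(RenderingElement):
    """Simplified cs:layout: joins non-empty child output with a delimiter."""

    def __init__(self, *children, delimiter="", suffix=""):
        super().__init__(*children)
        self.delimiter = delimiter
        self.suffix = suffix

    def render(self, data):
        parts = [child.render(data) for child in self.children]
        parts = [part for part in parts if part]  # skip empty output
        if not parts:
            return ""
        return self.delimiter.join(parts) + self.suffix


layout = Layout(
    Text(variable="author"),
    Text(variable="title"),
    delimiter=". ",
    suffix=".",
)
rendered = layout.render({"author": "Doe, J", "title": "A Title"})
print(rendered)  # Doe, J. A Title.
```

Because each render() returns a finished string, a parent cannot tell which part of a child's output came from a variable, which is exactly where the group rule gets awkward.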

It is actually working to a large degree, but of course there is also a fair amount of tuning left to be done.

Cheers,

Ron.

Hi, Jerome,

(Ron: after posting I realized I had your given and family names
reversed. My apologies.)

Don’t you just love author name handling :slight_smile:

In my formulation, I’ve created a class for each CSL tag, all of which are subclasses of a container class (among others). Thus any “CSL” object can “contain” any number of other “CSL” objects, and so on in a hierarchical structure the same as the CSL input file. Each class (well, to be accurate, only those which are subclasses of the “rendering_element” class) has a render() function, which calls the render() function on any of its children, and so on down the line, concatenating parts of the citation as it goes. So basically once you’ve ingested the CSL file and built the class structure, you just call render($data) on the layout object of the bibliography or citation class and it returns a fully rendered citation (or so the theory goes :-).

That’s more or less what I’m doing in my code …

http://github.com/bdarcus/citeproc-py/blob/master/citeproc.py

… except I’m using the Python ElementTree objects to hold the
rendered content, rather than rolling my own. This has the obvious
benefit that it’s easy to serialize it directly to X(HT)ML.

I’ve not tackled cs:group yet, but my earlier idea was to transform
the tree into a cs:choose tree internally, and then just use the
existing (working) function for dealing with that. That might not be
the smartest approach though :wink:
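
One way that group-to-choose rewrite might look with ElementTree is sketched below. It is only an assumption-laden illustration, not code from citeproc-py: `group_to_choose` is a hypothetical helper, the markup is un-namespaced like the snippets in this thread, and variables reached through macros are not followed.

```python
import xml.etree.ElementTree as ET


def group_to_choose(group):
    """Wrap a group in <choose><if variable="..." match="any">.

    Only variables named directly on descendants are collected; a real
    implementation would also have to expand macro calls.
    """
    variables = []
    for node in group.iter():
        # The variable attribute may list several names separated by spaces.
        for name in node.get("variable", "").split():
            if name not in variables:
                variables.append(name)
    if not variables:
        # Nothing to test against; leaving the group alone is an assumption.
        return group
    choose = ET.Element("choose")
    condition = ET.SubElement(
        choose, "if", {"variable": " ".join(variables), "match": "any"}
    )
    condition.append(group)
    return choose


url_group = ET.fromstring(
    '<group><text value="Retrieved from "/><text variable="URL"/></group>'
)
wrapped = group_to_choose(url_group)
print(ET.tostring(wrapped, encoding="unicode"))
```

The existing cs:choose handling can then decide whether the group renders at all, at the cost of walking the group once up front.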

It is actually working to a large degree, but of course there is also a fair amount of tuning left to be done.

Great news.

Bruce

Cool, I hadn’t even looked at that until now, but you’re right, it does look very similar.

Ron

Hmmm, sounds like I’m going down the same road, and hitting the same brick wall (ouch). I have reviewed your code (although I don’t claim to be knowledgeable in JavaScript; I live mainly in the C, C++ and PHP worlds), and I’m impressed by its completeness, but I must admit that I was having some difficulty trying to figure out what you were doing with all your stacks, queues and flags. I think I now have a better understanding of what’s going on.

Sometime in the middle of last year I started jonesing to get it
finished, and as a result, well … thank goodness for the triple
forces of version control, unit testing and XML validation.

Building the internal representation of the style as a properly nested
set of nodes is definitely a better idea than what I’ve currently got
in citeproc-js.

It is actually working to a large degree, but of course there is also a fair amount of tuning left to be done.

From memory, some of the more entertaining bits to watch out for are:
citation collapsing with custom joins; second-field-align;
sentence-case capitalization of terms rendered before any other
content; and per-locale punctuation/quote swapping.

Frank