Text elements

What about adding a new attribute to text elements (term, value) whose output depends on another variable? I would propose a “dependent-variable” attribute, which would have to evaluate to TRUE before the term or value is output.

Again IMHO, this would greatly simplify processing logic.

Ron.


I don’t like to be the nay-sayer in the room, but I think this might
be a risky move. The conditions that cause failure of any variable to
render can turn on conditional branching, either inside the cs:names
element (through cs:substitute), or through a proper cs:choose node.
The only failsafe way to know whether a variable was rendered is to
run the conditions and find out. For a couple of (admittedly
contrived) examples:

The following will produce output if either the “author” or the
“editor” var is present:

The following will produce output only if both the “author” and the
“editor” vars are present:

With cs:group operating as an implicit conditional based on actual
output from at least one variable, the style designer only needs to
worry about grouped joins, and the conditional logic follows without
any additional coding (and associated debugging). With 180+ styles in
circulation, keeping things simple for style maintainers ends up being
a priority.
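As a rough illustration of the implicit conditional described above, here is a minimal Python sketch. `Term`, `Variable`, `Group` and `render` are hypothetical names for a toy node model, not classes from citeproc-js or any other processor. The suppression test follows the rule as later written into the CSL 1.0 specification: a group is dropped when it calls at least one variable and every called variable is empty. Treating a suppressed nested group as an empty variable for its parent is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Toy node model (hypothetical; not any real processor's classes).

@dataclass
class Term:
    text: str                      # a localized term, e.g. "edited by"

@dataclass
class Variable:
    name: str                      # a data field, e.g. "editor"

@dataclass
class Group:
    children: List[Union["Term", "Variable", "Group"]] = field(default_factory=list)
    delimiter: str = ""

def render(node, item: dict) -> Tuple[str, bool, bool]:
    """Return (text, called_a_variable, rendered_a_variable) for a node."""
    if isinstance(node, Term):
        return node.text, False, False
    if isinstance(node, Variable):
        value = item.get(node.name, "")
        return value, True, bool(value)
    parts, called, rendered = [], False, False
    for child in node.children:
        text, c, r = render(child, item)
        called, rendered = called or c, rendered or r
        if text:
            parts.append(text)
    if called and not rendered:
        # Implicit conditional: variables were called but none produced
        # output, so the whole group (terms included) is suppressed.
        return "", True, False
    return node.delimiter.join(parts), called, rendered

edited_by = Group([Term("edited by"), Variable("editor")], delimiter=" ")
print(repr(render(edited_by, {"editor": "J. Smith"})[0]))  # 'edited by J. Smith'
print(repr(render(edited_by, {})[0]))                      # ''
```

The term "edited by" disappears along with the empty editor, while a group holding only a term is still printed. That is the conditional logic the style author gets without writing any cs:choose.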

I’ve felt a bit of frustration myself over implementing some of these
features (a trawl through the archives will show that I wasn’t nearly
so nice about my own objections…). But I think on this one there’s
a fairly sound reason for retaining the current behavior of cs:group.

That’s only my take; Bruce, Rintze and others may have other views.

Frank


No; that makes sense, and you have the most recent experience
implementing this successfully.

I guess as a general rule, I’d suggest that before people make
specific suggestions on syntax changes in CSL, they first define the
problem, and see if other implementors have already found reasonable
ways around it. For example, Ron, you write:

I would propose a “dependent-variable” attribute, which would have to evaluate to TRUE before the term or value would be output… IMHO, this would greatly simplify processing logic

Can you give an example? And does Frank’s explanation alter your thinking?

Bruce

OK, I’m still trying to wrap my head around this stuff. Here’s the example (from the Vancouver style) that is giving me some problems:

<group prefix=" " suffix=". ">
  <text term="in" suffix=": " text-case="capitalize-first" />
  <text macro="editor" />
  <text variable="container-title" />
</group>

In this case, which takes precedence and why? Both of the text elements following the term depend on a variable in the data stream.

Does that mean that if either of the text elements following the term element is blank, the entire group is discarded?

Ron.
________________________________________
From: Frank Bennett [@Frank_Bennett]
Sent: Saturday, February 20, 2010 7:14 AM
To: development discussion for xbiblio
Subject: Re: [xbiblio-devel] Text elements

On Sat, Feb 20, 2010 at 10:36 AM, Jerome, Ron <@Jerome_Ron> wrote:

What about adding a new attribute to text elements (term, value) whose
output depends on another variable? I would propose a
"dependent-variable" attribute, which would have to evaluate to TRUE before
the term or value is output.

Again IMHO, this would greatly simplify processing logic.

xbiblio-devel mailing list
xbiblio-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

I guess as a general rule, I’d suggest that before people make
specific suggestions on syntax changes in CSL, they first define the
problem, and see if other implementors have already found reasonable
ways around it. For example, Ron, you write:

I would propose a “dependent-variable” attribute, which would have to evaluate to TRUE before the term or value would be output… IMHO, this would greatly simplify processing logic

Can you give an example? And does Frank’s explanation alter your thinking?

Fair enough, but there is also a reason I’m looking at this from a processing-efficiency point of view: my code will be running on the server side, which means there could potentially be thousands of instances of CiteProc running simultaneously… To put that in perspective, picture running 1000 web browsers (or instances of Mendeley) simultaneously on your PC (or Mac :-)), all rendering a bibliography list, then imagine what that might do to the processor utilization :-)

Ron.

No; it means if both of them are blank, the entire group is discarded.
You can consider it syntactic sugar for …

… except that we don’t (now) allow that “macro” attribute on the if
element (should we?).
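Applied to the Vancouver snippet, this reading can be checked with a small Python sketch. `editor_macro` and `vancouver_in_group` are hypothetical stand-ins: the macro, as Ron describes it, appends its own "." suffix, and a space delimiter between the children is an assumption for readability. The loop asserts that the group is dropped only when both the editor and the container-title are empty, which is the same test as `<if variable="editor container-title" match="any">` (using the underlying variable, since cs:if takes no macro attribute).

```python
from itertools import product

def editor_macro(item: dict) -> str:
    """Hypothetical stand-in for the Vancouver "editor" macro, which (as Ron
    describes it) ends its output with its own "." suffix."""
    return item["editor"] + "." if item.get("editor") else ""

def vancouver_in_group(item: dict) -> str:
    """<group prefix=" " suffix=". "> with term "in", macro "editor" and
    variable "container-title"; a space delimiter is assumed."""
    called = [editor_macro(item), item.get("container-title", "")]
    if not any(called):      # every variable-calling child came back empty...
        return ""            # ...so the "In: " term is dropped as well
    return " " + "In: " + " ".join(part for part in called if part) + ". "

for editor, title in product(["", "Smith J, editor"], ["", "Nature"]):
    out = vancouver_in_group({"editor": editor, "container-title": title})
    # Same test as <if variable="editor container-title" match="any">
    assert bool(out) == bool(editor or title)
    print(repr(out))
# ''
# ' In: Nature. '
# ' In: Smith J, editor.. '
# ' In: Smith J, editor. Nature. '
```

Only the all-empty case drops the group. The third line reproduces the stray period Ron reports: the macro's "." and the group's ". " suffix both land after the editor.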

Bruce

So:

a) is this macro-handling processing efficiency issue (which is the
issue here) a practical problem now? if yes …

b) is there something that could be done to optimize this without
having to change the CSL syntax?

Bruce

Maybe there is an issue with that particular example, then, because in my case I have an editor but no container-title, which results in an extra ". " because the group suffix is applied after the editor macro, which also has a “.” suffix.

Ron.

Maybe there is an issue with that particular example, then, because in my case I have an editor but no container-title, which results in an extra ". " because the group suffix is applied after the editor macro, which also has a “.” suffix.

There are two issues with that:

  1. I’d argue that this macro is poorly designed: it should not
    include the closing period, which belongs instead on the cs:text
    element that calls that macro.

  2. we’ve previously discussed rules to clean up punctuation like
    duplicate periods. I’m not sure whether that’s dropped off the radar,
    but that should probably be addressed in the spec (preferably with a
    replacement regular expression pattern?).
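For point 2, a regular-expression cleanup pass might look like the following Python sketch. The pattern and function name are illustrative assumptions, not anything in the spec: it collapses a run of the same mark (".", ",", ";" or ":") into one and leaves mixed pairs such as ".," alone. Because it only sees the final string, a period hidden behind markup, as in `<i>Nature.</i>.`, slips through, and a deliberate "..." would be collapsed too.

```python
import re

# Illustrative pattern (an assumption): one of . , ; : followed by
# one or more copies of the same character.
DUPLICATE_PUNCT = re.compile(r"([.,;:])\1+")

def collapse_duplicate_punctuation(text: str) -> str:
    """Replace each run of a repeated punctuation mark with a single mark."""
    return DUPLICATE_PUNCT.sub(r"\1", text)

print(repr(collapse_duplicate_punctuation(" In: Smith J, editor.. ")))  # ' In: Smith J, editor. '
print(repr(collapse_duplicate_punctuation("<i>Nature.</i>.")))         # '<i>Nature.</i>.' (missed)
```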

Bruce

So:

a) is this macro-handling processing efficiency issue (which is the
issue here) a practical problem now? if yes …

I don’t have any quantitative data to support this right now, but it’s certainly possible to collect (which I will do) using a debugger/profiler such as Xdebug on the PHP code.

b) is there something that could be done to optimize this without
having to change the CSL syntax?

Anything is possible, I guess, but my point was that if some of the obvious processing logic can be handled once by the human building the CSL, then perhaps you could avoid having to figure it out over and over again in the code.

I realize that perhaps mine is a special case, but the other thing to remember with PHP is that with each page load I’m starting from scratch, as opposed to a browser app or standalone app, which can initialize all this stuff once and reuse it ad infinitum until the app is closed.

Cheers,

Ron.

I did some debugging and profiling of my Haskell implementation: the
only issues I really faced were related to disambiguation. This is the
only part that needed quite a lot of work to be optimized - and a lot
remains to be done.

But there can be language-specific problems I’m not aware of, so take
my 2 cents for what they’re worth.

Andrea

Hmmm, I haven’t even got to disambiguation yet :-( but I’ll keep that in mind.

Some preliminary runs in the profiler have both validated and invalidated my concerns :-)

Bootstrapping/initializing the object structure is taking about 90-95% of the execution time, so I guess my concerns about rendering slowdowns due to elements implicitly coupled to variables may be a non-issue. I’m going to look into caching a serialized version of the fully bootstrapped object structure, which will hopefully significantly reduce or eliminate the setup time.
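A minimal sketch of that caching idea, written in Python rather than PHP, with `pickle` standing in for PHP's `serialize()`/`unserialize()`. `bootstrap_style` is a hypothetical placeholder for the expensive setup step; here it only parses the style and indexes its macros, converting the tree to plain tuples so it pickles cleanly. The cache key is a SHA-256 hash of the style source, so an edited style gets a fresh entry, and each file is written under a temporary name and renamed into place so concurrent requests never read a half-written cache.

```python
import hashlib
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET

def to_plain(elem):
    """Convert an Element into nested (tag, attributes, children) tuples."""
    tag = elem.tag.split("}")[-1]                 # drop the CSL namespace URI
    return (tag, dict(elem.attrib), [to_plain(child) for child in elem])

def bootstrap_style(source: str) -> dict:
    """Hypothetical stand-in for the slow setup step (parse + index macros)."""
    tree = to_plain(ET.fromstring(source))
    macros = {child[1]["name"]: child for child in tree[2] if child[0] == "macro"}
    return {"tree": tree, "macros": macros}

def load_style(source: str, cache_dir: str) -> dict:
    """Return the bootstrapped style, reusing a pickled copy when one exists."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".pickle")
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)                # cache hit: skip bootstrapping
    except FileNotFoundError:
        pass
    style = bootstrap_style(source)               # cache miss: pay the cost once
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(style, fh)
    os.replace(tmp_path, path)                    # atomic swap into place
    return style
```

Only unpickle files your own process wrote: loading a tampered pickle can execute arbitrary code.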

Ron.

Maybe there is an issue with that particular example, then, because in my case I have an editor but no container-title, which results in an extra ". " because the group suffix is applied after the editor macro, which also has a “.” suffix.

There are two issues with that:

  1. I’d argue that this macro is poorly designed: it should not
    include the closing period, which belongs instead on the cs:text
    element that calls that macro.

  2. we’ve previously discussed rules to clean up punctuation like
    duplicate periods. I’m not sure whether that’s dropped off the radar,
    but that should probably be addressed in the spec (preferably with a
    replacement regular expression pattern?).

Just made a trawl through the citeproc-js code to figure out what I
did about this. Apparently I abandoned regular expressions, and ended
up checking for an immediately-preceding content character that is
identical to the first character of a suffix or a following delimiter
used for a join. So “,,” would become “,” and “..” would become “.”,
while “.,” or “.?” would get through.

If I recall correctly, one or more of the CMS flavors has an issue
with duplicate punctuation that would have been difficult to correct
in the style. Regular expressions wouldn’t work, because the
duplicate punctuation becomes visible in string form only after
decorations (italics, etc) have been applied to the content. In
citeproc-js, at least, it has to be caught earlier, when the cite is
still a nested bundle of JS objects.
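The check described above can be sketched in Python roughly as follows. This is not the citeproc-js code; `Blob`, `last_char`, `append_suffix`, `join_blobs` and `to_html` are hypothetical names for a toy model in which output is a tree of strings and decorated nodes. The key point is that `last_char` looks through the decoration, so the comparison happens before italics turn "Nature." into "<i>Nature.</i>".

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Blob:
    """Hypothetical output node: strings and child Blobs, plus a decoration
    (e.g. "italic") that is only applied when the tree is serialized."""
    children: List[Union[str, "Blob"]] = field(default_factory=list)
    decoration: str = ""

def last_char(node) -> str:
    """Last content character, looking through nested decorations."""
    if isinstance(node, str):
        return node[-1:]
    for child in reversed(node.children):
        ch = last_char(child)
        if ch:
            return ch
    return ""

def append_suffix(blob: Blob, suffix: str) -> None:
    """Append a suffix (or delimiter), dropping its first character when it
    duplicates the immediately preceding content character."""
    if suffix and last_char(blob) == suffix[0]:
        suffix = suffix[1:]
    if suffix:
        blob.children.append(suffix)

def join_blobs(blobs: List[Blob], delimiter: str) -> Blob:
    """Join sibling blobs, applying the same check to each delimiter."""
    out = Blob()
    for i, blob in enumerate(blobs):
        if i:
            append_suffix(out, delimiter)
        out.children.append(blob)
    return out

def to_html(node) -> str:
    """Serialize; decorations become markup only at this last step."""
    if isinstance(node, str):
        return node
    inner = "".join(to_html(child) for child in node.children)
    return f"<i>{inner}</i>" if node.decoration == "italic" else inner

cite = Blob([Blob(["Nature."], "italic")])
append_suffix(cite, ". ")
print(repr(to_html(cite)))   # '<i>Nature.</i> '
```

A pattern match over the serialized string would see `</i>` between the two periods and miss the duplicate; checking the nested structure first avoids that.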

The code that does the trick, complete with colorful comment, is here:

http://bitbucket.org/fbennett/citeproc-js/src/tip/src/queue.js#cl-465

Frank

OK, I’m still trying to wrap my head around this stuff. Here’s the example (from the Vancouver style) that is giving me some problems:

<group prefix=" " suffix=". ">
   <text term="in" suffix=": " text-case="capitalize-first" />
   <text macro="editor" />
   <text variable="container-title" />
</group>

In this case, which takes precedence and why? Both of the text elements following the term depend on a variable in the data stream.

Does that mean that if either of the text elements following the term element is blank, the entire group is discarded?

No; it means if both of them are blank, the entire group is discarded.
You can consider it syntactic sugar for …

… except that we don’t (now) allow that “macro” attribute on the if
element (should we?).

The cs:choose logic can be derived from the style syntax. If it will
help simplify processing, an app might refactor its styles to conform
to a more verbose, but more easily digestible, interim schema before
they go to the processor?
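One possible shape for that interim step, sketched in Python with `xml.etree.ElementTree`: wrap each cs:group that calls variables in an explicit `<choose><if variable="..." match="any">`, expanding `text macro="..."` references by hand since cs:if cannot take a macro attribute. The function names and the set of tags treated as variable-calling (`text`, `names`, `date`, `number`) are assumptions. As noted at the start of the thread, presence of a variable is not the same as rendered output (conditions inside a macro can still suppress it), so the wrapped group keeps its own implicit rule and the added cs:choose is only a cheap pre-check.

```python
import xml.etree.ElementTree as ET

# Assumed set of rendering elements whose "variable" attribute counts as a call.
RENDERING_TAGS = {"text", "names", "date", "number"}

def local(tag: str) -> str:
    """Strip a "{namespace}" prefix from an element tag."""
    return tag.split("}")[-1]

def called_variables(elem, macros, seen=frozenset()):
    """Variables a subtree calls, directly or through (recursively expanded) macros."""
    found = []
    for node in elem.iter():
        tag = local(node.tag)
        if tag in RENDERING_TAGS and node.get("variable"):
            found.extend(node.get("variable").split())
        name = node.get("macro")
        if tag == "text" and name in macros and name not in seen:
            found.extend(called_variables(macros[name], macros, seen | {name}))
    return list(dict.fromkeys(found))        # de-duplicate, keep first-seen order

def make_groups_explicit(style_root):
    """Wrap each variable-calling cs:group in <choose><if ... match="any">."""
    ns = style_root.tag[: style_root.tag.find("}") + 1]   # "" if no namespace
    macros = {m.get("name"): m for m in style_root.iter(ns + "macro")}
    for parent in list(style_root.iter()):
        for index, child in enumerate(list(parent)):
            if local(child.tag) != "group":
                continue
            variables = called_variables(child, macros)
            if not variables:
                continue                     # term-only group: nothing to test
            choose = ET.Element(ns + "choose")
            condition = ET.SubElement(choose, ns + "if",
                                      variable=" ".join(variables), match="any")
            condition.append(child)          # the group itself is kept intact
            parent[index] = choose
    return style_root
```

For the Vancouver group this yields `<if variable="editor container-title" match="any">` around the unchanged group, while a group containing only a term is left alone.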