implementation question on macros

Just to bounce some ideas around …

So I’m (very) slowly working on citeproc-py, with the goal to make it
as simple as possible to maintain, extend, and debug.

http://github.com/bdarcus/citeproc-py

The approach I’m taking is explained in this function:

def process_bibliography(style, reference_list):
    """
    With a Style and the list of References produce the FormattedOutput
    for the bibliography.
    """
    processed_bibliography = [[process_node(style_node, reference)
                               for style_node in style.bibliography.layout]
                              for reference in reference_list]

    return processed_bibliography

So I have a nested list comprehension that iterates through style
nodes and reference items, and passes them on to a process_node
function, which returns a FormattedNode object. This object merges the
style node (formatting instructions) and the reference content.

Does this make sense?

Does it make sense to consider a macro call as in turn generating a
sublist of these FormattedNode objects?

So, in other words, if you have a style with three nodes: the first
and last we’ll call ‘a’ and ‘d’. The second is a macro call that in
turn has two text nodes, which we’ll call ‘b’ and ‘c’.

My thought is the processed list looks like:

[a, [b, c], d]

Hence, to generate the final output, you flatten the list, and spit
out the pieces.
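
A minimal sketch of that flattening step, for illustration only. The
flatten helper is hypothetical (it is not taken from citeproc-py), and
plain strings stand in for FormattedNode objects:

```python
def flatten(nodes):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for node in nodes:
        if isinstance(node, list):
            # A macro call produced a sublist; descend into it.
            yield from flatten(node)
        else:
            yield node


# Strings stand in for FormattedNode objects here (an assumption).
processed = ["a", ["b", "c"], "d"]
flat = list(flatten(processed))
output = "".join(flat)
```

Because the helper recurses, it would also cope with a macro that calls
another macro, i.e. sublists nested more than one level deep.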

I ask because I haven’t even started on the difficult stuff (names and
such), but am assuming a broadly similar approach.

Bruce

How does the iteration over style.bibliography.layout work? Does it
return a flat list, with start and end nodes expressed separately, or
is there a nested recursion effect in the function that I’m not
grasping?

It looks like this would hit each node of the style for every
reference, including failed conditional spans. Is that your
intention?

How does the iteration over style.bibliography.layout work? Does it
return a flat list, with start and end nodes expressed separately, or
is there a nested recursion effect in the function that I’m not
grasping?

At this point, I’ve only implemented the most basic support
(cs:text/@variable), but the idea is that it iterates through the
layout (or a related called macro), and when it hits a rendering
element (cs:text, cs:names, etc.) and the reference item contains the
data for that variable, it returns a FormattedNode object (or, as I’m
discussing here, list of these objects).
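
As a rough sketch of the behaviour described above, and not the actual
citeproc-py code: StyleNode, its fields, and the shape of FormattedNode
are all assumptions made for the example. A variable node returns one
FormattedNode (or nothing when the reference lacks the data), while a
macro call recurses and returns a sublist:

```python
from dataclasses import dataclass, field


@dataclass
class StyleNode:
    """Hypothetical stand-in for a parsed CSL rendering element."""
    variable: str = None                           # set for cs:text/@variable
    children: list = field(default_factory=list)   # set for a macro call
    formatting: dict = field(default_factory=dict)


@dataclass
class FormattedNode:
    """Pairs the formatting instructions with the reference content."""
    formatting: dict
    content: str


def process_node(style_node, reference):
    if style_node.children:
        # Macro call: process each child and return a sublist,
        # dropping children that produced nothing.
        results = [process_node(child, reference)
                   for child in style_node.children]
        return [r for r in results if r is not None]
    value = reference.get(style_node.variable)
    if value is None:
        return None  # the reference has no data for this variable
    return FormattedNode(style_node.formatting, value)


layout = [
    StyleNode(variable="title", formatting={"font-style": "italic"}),
    StyleNode(children=[StyleNode(variable="publisher"),
                        StyleNode(variable="publisher-place")]),
]
reference = {"title": "A Title", "publisher": "A Press"}
processed = [process_node(node, reference) for node in layout]
```

Here the nesting comes from process_node calling itself for the macro,
not from the outer list comprehension, which only walks the top level
of the layout.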

It looks like this would hit each node of the style for every
reference, including failed conditional spans. Is that your
intention?

I haven’t implemented the conditional yet, but I’d likely stop the
processing and move on if the condition isn’t met.

So no, that’s not the intention.

Bruce

When you implement group, I think you’ll encounter the issue that I’ve
described.

You mean processing style nodes that I don’t need?

Given my answers to previous questions about group here, I’ll probably
treat it as more-or-less a wrapper for a choose statement. So
transform …

…into this internally:

if reference['publisher'] or reference['publisher-place']:
    # then iterate through the children

Would that solve it?

Bruce

Sorry, I should have been clearer. That was a comment on two
items. The second was the conditional branching thing. The first
(and the one to which group is relevant) is nested recursion.

The sample code is written as a straight list expansion, that recurses
over reference items, leading to a recursion over style nodes, leading
to evaluation of individual nodes. It looks like a flat list
operation, but the FormattedNode returned from the overall iteration
is going to be some sort of nested object that can be flattened into a
string. The first level is clear (iterating over reference items),
but with the remainder of the syntax as written, I don’t see how you
get nested recursion from iterating over style.bibliography.layout, or
how processing nodes front-to-back can return a nested FormattedNode
object for each reference item.

Could be there is some aggressive subclassing behind this that I’m not
grasping, but on the face of it, the syntax there looks like it won’t
produce the desired result.

Given my answers to previous questions about group here, I’ll probably
treat it as more-or-less a wrapper for a choose statement. So
transform …

…into this internally:

if reference['publisher'] or reference['publisher-place']:
    # then iterate through the children

Would that solve it?

Choice nodes don’t accept formatting, of course, so they can’t be used
as a complete drop-in replacement for group. But roughly speaking it
would have to be something like that. The tricky bit is how you get
at the variables in the children to feed to your condition, bearing
in mind that they might be buried in choice statements, groups of
their own, or a names node (or names substitute node), which might or
might not be rendered, depending on the value of the suppress-author
cite option. Coping with the choice and substitute issues means that
you can’t just test for presence of any mentioned variables on the
reference (as the syntax above does), since they might be passed over
when the nodes are fully processed.
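
One way to sidestep that problem, sketched here under stated
assumptions, is to render the children first and only then decide
whether the group survives. The render_text, render_choose, and
render_group helpers are hypothetical and deliberately ignore names,
substitute, and suppress-author:

```python
def render_text(variable):
    """Return a renderer for a hypothetical cs:text/@variable node."""
    def render(reference):
        value = reference.get(variable)
        return [value] if value else []
    return render


def render_choose(condition, if_true, if_false):
    """Return a renderer that picks exactly one branch."""
    def render(reference):
        branch = if_true if condition(reference) else if_false
        return branch(reference)
    return render


def render_group(children, delimiter=", ", prefix="", suffix=""):
    """Render the children first; emit the group only if something came out."""
    def render(reference):
        pieces = []
        for child in children:
            pieces.extend(child(reference))
        if not pieces:
            return []  # nothing rendered, so drop prefix and suffix too
        return [prefix + delimiter.join(pieces) + suffix]
    return render


# The publisher is mentioned in the style, but only in a branch that is
# not taken for a book, so testing reference['publisher'] up front would
# wrongly keep the group.
group = render_group(
    [render_choose(lambda ref: ref.get("type") == "thesis",
                   render_text("publisher"),
                   render_text("publisher-place"))],
    prefix="(", suffix=")")

book = {"type": "book", "publisher": "A Press"}
thesis = {"type": "thesis", "publisher": "A University"}
```

With these inputs the group renders nothing for the book, even though
the book has a publisher, and renders the parenthesised publisher for
the thesis. The group keeps its own prefix and suffix, which a bare
conditional would have nowhere to store.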

The direction I’m going is, for example, that a text/@variable node
returns a FormattedNode object, while text/@macro returns a list of
them. That yields the nested list.

But maybe that is wrong, and it should be a different type of object.

Subclassing the list type for text/@macro (and group, and names, and
date) is probably the way to go.

Turns out I was fine just returning a native list for the macro call
and then flattening it all out at the end.

BTW, we need to talk about cs:date and localization again soon.
We need to fix that for 1.0.

Bruce

Sounds good. It’s not important if it works, but I’m still puzzled.

If text/@macro has formatting attributes, how would they find their
way into the output? I may be slipping a concept somewhere, but on
the face of the sample, I can’t see how there would be a place to
store those (i.e. the macro seems to be expressed as a native list
object, which would strip the formatting attributes – that was why I
thought subclassing would be needed).
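
To make the concern concrete, here is a small sketch of a list subclass
that keeps the formatting of the calling node. The class body and the
render helper are assumptions for illustration, not citeproc-py code,
and only prefix and suffix are handled:

```python
class FormattedNodeList(list):
    """A list of formatted nodes that also carries its own formatting."""

    def __init__(self, nodes=(), formatting=None):
        super().__init__(nodes)
        # Attributes from the calling node, e.g. prefix/suffix on text/@macro.
        self.formatting = formatting or {}


def render(node):
    """Flatten a node to a string, applying prefix and suffix at each level."""
    if isinstance(node, FormattedNodeList):
        inner = "".join(render(child) for child in node)
        fmt = node.formatting
        return fmt.get("prefix", "") + inner + fmt.get("suffix", "")
    return str(node)


# Strings stand in for FormattedNode objects here (an assumption).
macro = FormattedNodeList(["b", "c"], formatting={"prefix": "(", "suffix": ")"})
processed = ["a", macro, "d"]
output = "".join(render(node) for node in processed)
```

A native list in place of macro would flatten to the same b and c, but
the parentheses would have nowhere to live.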

BTW, we need to talk about cs:date and localization again soon.
We need to fix that for 1.0.

Do you have a plan for the matching locale and layout markup for
dates? If I remember correctly, you proposed the layout markup
earlier, but it wasn’t made explicit what the supporting block in
locale would look like.

If text/@macro has formatting attributes, how would they find their
way into the output?

Ah, crap :(

So I’ll go back to the FormattedNodeList class.

I may be slipping a concept somewhere, but on
the face of the sample, I can’t see how there would be a place to
store those (i.e. the macro seems to be expressed as a native list
object, which would strip the formatting attributes – that was why I
thought subclassing would be needed).

Indeed.

BTW, we need to talk about cs:date and localization again soon.
We need to fix that for 1.0.

Do you have a plan for the matching locale and layout markup for
dates? If I remember correctly, you proposed the layout markup
earlier, but it wasn’t made explicit what the supporting block in
locale would look like.

No, I haven’t thought about it since then. This is why I just mentioned it.

But my goal would be the simplest solution that works (is truly
international) ;)

Bruce

Had an epiphany today and decided to ditch all this, and to use
ElementTree Element objects to deal with all of this nested data stuff
in an internal HTML + RDFa* representation. So dumping to HTML + RDFa
is just a simple tostring function call, and it can also be easily
transformed to other formats.
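
A minimal sketch of what that might look like with the standard
library's xml.etree.ElementTree. The text_node helper, the class names,
and the dc: property names are assumptions made for the example, not
the markup citeproc-py actually emits:

```python
import xml.etree.ElementTree as ET


def text_node(parent, prop, value, style=None):
    """Append a span carrying an RDFa property and optional inline styling."""
    span = ET.SubElement(parent, "span", {"property": prop})
    if style:
        span.set("style", style)
    span.text = value
    return span


reference = {"title": "A Title", "publisher": "A Press"}

entry = ET.Element("p", {"class": "bibliography-entry"})
text_node(entry, "dc:title", reference["title"], style="font-style: italic")

# A macro call becomes a nested element, so its own formatting
# attributes have somewhere to live.
macro = ET.SubElement(entry, "span", {"class": "publisher-macro"})
text_node(macro, "dc:publisher", reference["publisher"])

# Serialising the whole tree is a single call.
html = ET.tostring(entry, encoding="unicode", method="html")
```

Since the nesting is carried by the element tree itself, no separate
flattening pass or list subclass is needed before output.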

Bruce

* This is partly an experiment to see how easy it is to construct the
  structured RDFa from the CSL and the data. Might be a little ugly.

That sounds like a very useful experiment. citeproc-js does something
very odd in that phase, laying out the style as an executable list of
open, close and singleton tokens. As I’ve written before, I think it
probably would have been better to keep things as a nested expression
– but the existing format has the advantage that it’s very simple to
work with a flat token list when it comes to debugging, and it’s
easier to implement workarounds and dirty tricks. Lately, I’ve
started wondering whether it isn’t actually a sensible idea after all.
But an approach like you suggest will be much cleaner if it works.
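
For comparison, a toy version of the flat-token idea. This is only an
illustration of the general technique, not citeproc-js's actual data
structures; the token kinds and the run function are invented for the
example:

```python
# Each token is (kind, payload); the layout is a flat list, not a tree.
tokens = [
    ("open", {"prefix": "(", "suffix": ")"}),   # start of a group
    ("single", "publisher"),                     # render one variable
    ("single", "publisher-place"),
    ("close", None),                             # end of the group
]


def run(tokens, reference, delimiter=", "):
    """Execute the token list front to back, using a stack for nesting."""
    stack = [[]]    # output buffers; stack[0] is the top level
    formats = []    # formatting for each group that is currently open
    for kind, payload in tokens:
        if kind == "open":
            stack.append([])
            formats.append(payload)
        elif kind == "single":
            value = reference.get(payload)
            if value:
                stack[-1].append(value)
        elif kind == "close":
            pieces = stack.pop()
            fmt = formats.pop()
            if pieces:  # an empty group renders nothing
                stack[-1].append(
                    fmt["prefix"] + delimiter.join(pieces) + fmt["suffix"])
    return "".join(stack[0])


full = run(tokens, {"publisher": "A Press", "publisher-place": "Somewhere"})
empty = run(tokens, {})
```

The nesting that a tree would express structurally is rebuilt at run
time by the stack, which is what makes the flat list easy to step
through when debugging.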