Macro expansions

I’ve just discovered the interesting fact that macros can be called
before they are declared, while trying to load the Chicago Full Note
with Bibliography:

http://www.zotero.org/styles/chicago-fullnote-bibliography (search
for the “issued” macro)

This complicates things slightly, and I’ve started reworking a small
portion of citeproc-js to cope with it. But in the midst of the work,
it has occurred to me that it’s probably risky design to permit this
in the first place.

If the engine expects macros to be declared before they are called, a
style that puts things in the wrong sequence can be stopped with an
error that says “CSL parsing error: call to undeclared macro. Macros
must be declared before they are called. Please check the sequence of
macro definitions in the style file.”. The operator can then fix
things up by rearranging the macro blocks.
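
A minimal sketch of the ordering check described above, written in Python rather than the JavaScript of citeproc-js. It assumes the style has already been reduced to an ordered list of (macro name, called macro names) pairs. That input shape, `check_declaration_order`, and `CSLParsingError` are names invented here for illustration; none of them come from citeproc-js.

```python
from typing import Iterable, List, Tuple


class CSLParsingError(Exception):
    """Raised when a style calls a macro that has not been declared yet."""


def check_declaration_order(macros: Iterable[Tuple[str, List[str]]]) -> None:
    """Walk macro definitions in file order and reject forward references.

    `macros` is a hypothetical pre-parsed form of the style: one
    (name, called_names) pair per macro block, in document order.
    """
    declared = set()
    for name, calls in macros:
        for called in calls:
            # A macro only counts as declared once its whole block has been
            # read, so a macro that calls itself is rejected here as well.
            if called not in declared:
                raise CSLParsingError(
                    "CSL parsing error: call to undeclared macro "
                    f"'{called}' in macro '{name}'. Macros must be "
                    "declared before they are called."
                )
        declared.add(name)
```

Under this rule a circular call cannot get through, because at least one macro in any cycle is necessarily called before it is declared.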

On the other hand, if the engine permits one macro to be invoked within
another before the first is declared, circular calls are possible.
That will be caught either by a low-level loop limitation in the
platform running the engine (in Rhino, I think this would cough up a
Java error trace), or by the consumption of all available memory in
the machine running the processor. The error message in the former
case will provide a wealth of irrelevant information, and is unlikely
to point at the actual (quite simple) problem. The latter, of course
(speaking as a lawyer), could turn out to have costly and embarrassing
side effects.

Unless circular calls can be easily caught by a validator, adopting a
simple requirement that macros be declared before they are invoked in
the style (either directly or via another macro) seems like it would
be a friendly policy in the long run.

(This is an area where my lack of formal CS training might show
through; if there are use cases that would be blocked by this, I’ll
stand corrected. But on the face of it, it doesn’t look as though
anything would be lost.)

My 2 cents, anyway.

Frank

If the engine expects macros to be declared before they are called, a
style that puts things in the wrong sequence can be stopped with an
error that says “CSL parsing error: call to undeclared macro. Macros
must be declared before they are called. Please check the sequence of
macro definitions in the style file.”. The operator can then fix
things up by rearranging the macro blocks.

Does that address the following?

Bruce

Maybe I should have written, “The operator can then fix things up by
rearranging the macro blocks, or by eliminating obvious errors in the
style.” :-) It depends on the design of the processor, I guess, but
in citeproc-js that one would be caught along with the rest.

You could also keep track of the stack of running macros and give an
error if a macro already in the stack gets called again.
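
One way this suggestion could look, as a hedged Python sketch (not the code that was checked in to citeproc-js). It assumes the parsed style is available as a dict mapping each macro name to the names it calls; `expand` and `MacroLoopError` are hypothetical names.

```python
from typing import Dict, List, Optional


class MacroLoopError(Exception):
    """Raised when a macro is re-entered while it is still running."""


def expand(name: str, macros: Dict[str, List[str]],
           stack: Optional[List[str]] = None) -> List[str]:
    """Return the names of all macros reached from `name`, depth first.

    `macros` maps each macro name to the names it calls (an assumed
    stand-in for the parsed style). `stack` holds the macros that are
    currently being expanded.
    """
    stack = [] if stack is None else stack
    if name in stack:
        # The macro is already running further up the call chain.
        chain = " -> ".join(stack + [name])
        raise MacroLoopError(f"circular macro call: {chain}")
    stack.append(name)
    reached = [name]
    # Names missing from `macros` are treated as leaves in this sketch.
    for called in macros.get(name, []):
        reached.extend(expand(called, macros, stack))
    stack.pop()
    return reached
```

Calling the same macro twice from different places is not an error here; only a macro that is still on the stack triggers the exception, and the message names the whole chain so the style author can see where the loop closes.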

Simon

You could also keep track of the stack of running macros and give an error
if a macro already in the stack gets called again.

Good idea. Checked in.

Frank

2009/3/21 Simon Kornblith <@Simon_Kornblith>:

I’ve just discovered the interesting fact that macros can be called
before they are declared, while trying to load the Chicago Full Note
with Bibliography:

http://www.zotero.org/styles/chicago-fullnote-bibliography (search
for the “issued” macro)

This complicates things slightly, and I’ve started reworking a small
portion of citeproc-js to cope with it. But in the midst of the work,
it has occurred to me that it’s probably risky design to permit this
in the first place.

If the engine expects macros to be declared before they are called, a
style that puts things in the wrong sequence can be stopped with an
error that says “CSL parsing error: call to undeclared macro. Macros
must be declared before they are called. Please check the sequence of
macro definitions in the style file.”. The operator can then fix
things up by rearranging the macro blocks.

This is an interesting procedural bias… since Haskell is a
functional (and purely functional) language, there is no after nor
before, and there is no sequence: you just have a set of relations and
after parsing all of them you start evaluating them.

I would think about a CSL style in the same terms.

Leaving aside computational paradigms, and after admitting I didn’t
have the time to dig into citeproc-js, I wonder if you are
instantiating each macro with the bibliographic data as soon as you
parse the style. If yes, what happens, for instance, when
disambiguation comes into the picture? If adding names and given-names
(suppose there’s no year-suffix option set), you then need to
re-evaluate the style by setting the “disambiguate” conditional to
true. Does this work without reading the style again?

In my experience the main issue when dealing with the overall
performance of the engine comes with disambiguation.

But, once again, keep in mind I didn’t read your code carefully and
thoroughly.

Unless circular calls can be easily caught by a validator, adopting a
simple requirement that macros be declared before they are invoked in
the style (either directly or via another macro) seems like it would
be a friendly policy in the long run.

I don’t like this approach. CSL is a (functional and not a procedural)
programming language and should be expressive enough to let people
write bullshit, like infinite loops that make a computer explode. I
don’t like any kind of paternalistic approach. As a lawyer, I mean.
;-)

My 2 cents, anyway.

My 2 cents too.
Andrea

Andrea,

Thanks for your response. It’s not something I feel strongly about in
principle. It just seems like a practical issue of risk that could be
controlled at low cost through validation. I’ve doodled a few
responses below, but I just want to flag here at the top that I’m
happy to flow with the consensus on this as far as CSL is concerned.

This is an interesting procedural bias… since Haskell is a
functional (and purely functional) language, there is no after nor
before, and there is no sequence: you just have a set of relations and
after parsing all of them you start evaluating them.

I would think about a CSL style in the same terms.

I would really like to learn more about Haskell. :-)

Leaving aside computational paradigms, and after admitting I didn’t
have the time to dig into citeproc-js, I wonder if you are
instantiating each macro with the bibliographic data as soon as you
parse the style.

No, the engine is instantiated just once, before any data is read.

If yes, what happens, for instance, when
disambiguation comes into the picture? If adding names and given-names
(suppose there’s no year-suffix option set), you then need to
re-evaluate the style by setting the “disambiguate” conditional to
true. Does this work without reading the style again?

Yes and no. When the engine is instantiated, the elements and
attributes of the style are reduced to discrete functions stored on a
token list (roughly one token per XML tag). A data item is rendered
by applying the token list to the item, and returning the result as a
string. The engine itself persists across multiple renderings.
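
To make the token-list idea concrete, here is a small Python sketch. It is an illustration under assumptions, not citeproc-js code: tokens are modelled as plain functions from an item to a string, and `text_variable`, `literal`, `render_item`, and `TOKENS` are invented names.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Token = Callable[[Item], str]


def text_variable(variable: str, prefix: str = "", suffix: str = "") -> Token:
    """Build a token that renders one item variable, or nothing if absent."""
    def render(item: Item) -> str:
        value = item.get(variable, "")
        return f"{prefix}{value}{suffix}" if value else ""
    return render


def literal(value: str) -> Token:
    """Build a token that always renders the same string."""
    return lambda item: value


def render_item(tokens: List[Token], item: Item) -> str:
    """Apply every token to the item, in order, and join the results."""
    return "".join(token(item) for token in tokens)


# The token list is built once and then reused for every item.
TOKENS: List[Token] = [
    text_variable("author", suffix=", "),
    text_variable("title"),
    text_variable("issued", prefix=" (", suffix=")"),
    literal("."),
]
```

For example, `render_item(TOKENS, {"author": "Doe", "title": "A Book", "issued": "2009"})` gives `Doe, A Book (2009).`, and the same `TOKENS` list can be applied to the next item without rebuilding anything.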

Bibliographic items are filed in a registry object inside the engine
as they are seen. The registry fixes bibliography sort order (the
same machinery is used for the ephemeral sort of cites within a
citation), and identifies disambiguation partners for individual cites
based on their “un-disambiguated” form. The registry can be updated
at any time during a session (to add items and to identify deletions).

When rendering a cite that has disambiguation partners, the engine
first checks the partner set for a taint flag, to see whether it has
been resolved since the last change affecting a member of the set in
the registry. If the set is tainted (i.e. has not yet been
disambiguated), both the cite and its partners are re-rendered
repeatedly until the ambiguity is resolved (or resolution options are
exhausted). The disambiguation level discovered through re-rendering
is filed against the partner set in the registry. If the flag is
checked and the set is found to be not tainted (i.e. it has already
been disambiguated once), the style’s disambiguation resolution
functions are applied to the target cite alone, to the level filed in
the registry against the partner set.
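
The taint-flag logic in the last two paragraphs can be sketched roughly as follows. This is a toy Python model, not the citeproc-js registry: it assumes disambiguation can be expressed as an integer level passed to a render function, it ignores deletions and sorting, and `Registry`, `render_cite`, and `max_level` are hypothetical names.

```python
from typing import Callable, Dict, Set

Item = Dict[str, str]


def render_cite(item: Item, level: int) -> str:
    """Hypothetical renderer: higher levels reveal more of the given name."""
    if level >= 2:
        name = f"{item['given']} {item['family']}"
    elif level == 1:
        name = f"{item['given'][0]}. {item['family']}"
    else:
        name = item["family"]
    return f"{name} {item['year']}"


class Registry:
    """Toy registry that groups items by their un-disambiguated cite form."""

    def __init__(self, render: Callable[[Item, int], str], max_level: int) -> None:
        self.render = render
        self.max_level = max_level                 # highest level to try
        self.items: Dict[str, Item] = {}
        self.partners: Dict[str, Set[str]] = {}    # base form -> item ids
        self.level: Dict[str, int] = {}            # base form -> resolved level
        self.tainted: Set[str] = set()             # base forms needing work

    def add(self, item_id: str, item: Item) -> None:
        """File an item and taint the partner set it falls into."""
        self.items[item_id] = item
        base = self.render(item, 0)
        self.partners.setdefault(base, set()).add(item_id)
        self.tainted.add(base)

    def cite(self, item_id: str) -> str:
        """Render a cite, re-rendering its partners only if the set is tainted."""
        item = self.items[item_id]
        base = self.render(item, 0)
        if base in self.tainted:
            ids = self.partners[base]
            level = 0
            # Raise the level until all partners differ or options run out.
            while level < self.max_level and len(
                    {self.render(self.items[i], level) for i in ids}) < len(ids):
                level += 1
            self.level[base] = level
            self.tainted.discard(base)
        # Untainted sets skip the loop and reuse the level already on file.
        return self.render(item, self.level[base])
```

The repeated re-rendering happens only on the first `cite` call after a change to a partner set; later calls for any member of that set render the target item alone at the filed level.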

The initial re-renderings will involve a small performance hit when
generating citations that require disambiguation, but this is limited
to the citations affected, and the operations only need to be
performed once per change to the set.

Leaving aside computational paradigms, and after admitting I didn’t
have the time to dig into citeproc-js, I wonder if you are
instantiating each macro with the bibliographic data as soon as you
parse the style.

Like the other implementation details it’s not directly relevant to
the headline topic, but by way of explanation … in the case of
citeproc-js, macros are flattened into the token list in the
compilation phase. All that the engine sees during rendering is the
tokens/tags that they contain, plus an enclosing pair of quasi-group
tokens that handle the prefix, suffix, or font decoration attributes
of the text tag from which the macro was called. Any macros that are
defined but not called just disappear, when the build-time structures
are jettisoned. You can think of the overall system as a static
compiler, with the functions defined within citeproc-js itself as the
core library, and with the macros as an additional library to be
linked in producing the final “binary”. The engine is a
self-contained JavaScript object that no longer requires access to
citeproc-js in order to run; all you need is a JS interpreter and some
item data.
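
As a rough illustration of that compile-time flattening, here is a Python sketch under stated assumptions. Nodes are modelled as tuples, the enclosing quasi-group tokens are called `group-start` and `group-end`, and only prefix and suffix are carried over; all of these names and shapes are invented for the example and are not taken from citeproc-js.

```python
from typing import Dict, List, Tuple

# A node is ("text", variable) or ("macro", name, prefix, suffix).
Node = Tuple[str, ...]


def flatten(nodes: List[Node], macros: Dict[str, List[Node]]) -> List[Node]:
    """Inline every macro call, wrapping its body in a pair of group tokens."""
    tokens: List[Node] = []
    for node in nodes:
        if node[0] == "macro":
            _, name, prefix, suffix = node
            # The group tokens carry the attributes of the calling text tag.
            tokens.append(("group-start", prefix))
            tokens.extend(flatten(macros[name], macros))
            tokens.append(("group-end", suffix))
        else:
            tokens.append(node)
    return tokens
```

Macros are looked up by name in a dict, so the order of definition does not matter, and a macro that is never called simply never appears in the output. The sketch has no loop detection: a circular call would recurse until Python raises `RecursionError`.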

Let me make this conversation simple, then, and point out that AFAIK,
there’s no way to validate this in RELAX NG (or XSD). If I’m right,
then the only language that might be able to express this is
Schematron, which can certainly be layered into the RNG, but for what
real gain? At that point, it’d be more about documentation for
implementers, since it’s easy enough to check this in your code (per
Simon’s point).

I’d say, let’s forget about this then ;-)

This theoretical problem is the price to pay for the flexibility of
the macro structure.

Bruce

Thanks for your response. It’s not something I feel strongly about in
principle. It just seems like a practical issue of risk that could be
controlled at low cost through validation.

Let me make this conversation simple, then, and point out that AFAIK,
there’s no way to validate this in RELAX NG (or XSD). If I’m right,
then the only language that might be able to express this is
Schematron, which can certainly be layered into the RNG, but for what
real gain? At that point, it’d be more about documentation for
implementers, since it’s easy enough to check this in your code (per
Simon’s point).

I’d say, let’s forget about this then ;-)

No objection here.

So it’s okay to halt processing of a style that calls an
as-yet-undeclared macro? Or it’s wrong to do so? There’s a toggle in
my code, I just need to know whether to set it true or false. :-)

Simon’s note was about something different, I think – repeated
definitions of a macro of the same name. That’s never useful, and
catching it is just a friendly way of giving the style author a
heads-up that he or she may have made a boo-boo.

Frank

Ah, no, my bad. That was Simon’s point, wasn’t it. Will address
that. Issue closed.

Cheers,
Frank

So just to be clear:

On Sat, Mar 21, 2009 at 6:47 AM, Frank Bennett <@Frank_Bennett> wrote:

So it’s okay to halt processing of a style that calls an as-yet-undeclared macro?

No. Your code should understand the macros as effectively an unordered list.
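
Treating macros as an unordered collection might look like the following two-pass Python sketch. It assumes the style has been pre-parsed into (macro name, called macro names) pairs; `index_macros` and `undefined_calls` are hypothetical helpers, not part of any CSL processor.

```python
from typing import Dict, Iterable, List, Tuple


def index_macros(blocks: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """First pass: file every macro by name, whatever its position.

    A repeated name silently overwrites the earlier definition in this
    sketch; a real processor might prefer to report it.
    """
    return {name: calls for name, calls in blocks}


def undefined_calls(index: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Second pass: list (caller, callee) pairs whose callee is never defined."""
    return [(caller, callee)
            for caller, calls in index.items()
            for callee in calls
            if callee not in index]
```

Because calls are only resolved after every definition has been read, a call that precedes its target in the file is accepted, while a call to a macro that exists nowhere in the style is still reported.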

Bruce

Done, and loop detection fix checked in. Sorry for the static on
this; as I said, my lack of CS training sometimes causes me to puzzle
over issues at the wrong level. Thanks to everyone for your patience.

Frank

Andrea,

Thanks for your response. It’s not something I feel strongly about in
principle.

This is an interesting procedural bias… since Haskell is a
functional (and purely functional) language, there is no after nor
before, and there is no sequence: you just have a set of relations and
after parsing all of them you start evaluating them.

Let me rephrase, then: it is due to a (probably interesting) functional
bias that I’ve never happened to think about an order in the
macro definitions… :-)

I would think about a CSL style in the same terms.

I would really like to learn more about Haskell. :-)

I don’t know if this is going to boost your desire to learn Haskell,
but I wanted to learn it - and it has been quite a challenge - right
after studying (and falling in love with) JavaScript and reading the
stuff you can find here (probably you know it):

http://www.crockford.com/javascript/

Haskell is a love that made me forget, literally, any other
programming language, though. So, be careful!

(BTW, the best way to learn Haskell is to implement a simple
programming language, which usually happens to be functional, and
leads me back to the initial bias and forth to my idea of CSL)

Yes and no. When the engine is instantiated, the elements and
attributes of the style are reduced to discrete functions stored on a
token list (roughly one token per XML tag). A data item is rendered
by applying the token list to the item, and returning the result as a
string. The engine itself persists across multiple renderings.

[…]

Apart from the sort framework, very little of this has been written,
but it’s pretty straightforward, and I don’t expect any major problems
in putting it together (he said).

Thanks for this interesting overview.

My implementation reads a CSL file and transforms each element into a
(recursive, since elements may contain elements) data structure
(Element) which is passed to the evalElement function. The evaluation
occurs with access to a state (EvalState) which carries the reference
map (a map with all the CSL variables instantiated with the reference
data, read from a MODS string) and the environment (the named macros,
the map of the terms and the options, plus some empty data filled
during the evaluation process).

The evalElement function can be called (recursively) to evaluate any
chunk of a style (it is called, for instance, to evaluate the
element). Which means that evalElement is called to evaluate the
once for each cite.

The evaluation doesn’t produce a string yet, but a recursive data
structure (Output) which carries all the substitutions (elements get
substituted by the variable content) together with all the possible
disambiguated forms of all disambiguatable parts of a cite (according
to the options set). For instance a contributor is represented by the
Output constructor OContrib which, together with the produced output
([Output]) carrying the evaluated name, has all the possible forms of
the name produced by applying the disambiguation options of the style
([[Output]]).

This way disambiguation is carried out only by manipulating the output
produced by the evaluation of the element. A new evaluation
(you would call it re-rendering) would happen only if, given all the
disambiguation possibilities, a set of cites still presents some
collisions.

The disambiguated output is then processed for applying collapsing
options.

The result of the processing is a data structure (FormattedOutput)
which carries the strings a cite is made of, plus the formatting
options to be applied to each string (Formatting).

It is the rendering function (renderPlain or renderPandoc) which then
traverses the FormattedOutput to generate the final string (adding
rendering functions should be easy). This is my idea of an
output-independent citeproc.
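
The separation between formatted output and rendering could be sketched like this. The implementation described above is in Haskell; this Python version only borrows the names `FormattedOutput` and `Formatting` for illustration. The field layout, `render_plain`, and `render_markup` are assumptions made for the example and do not reflect the real types.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Formatting:
    """Formatting options attached to one chunk of output (assumed fields)."""
    prefix: str = ""
    suffix: str = ""
    italic: bool = False


@dataclass
class FormattedOutput:
    """A formatted chunk holding plain strings and nested chunks."""
    formatting: Formatting
    children: List[Union[str, "FormattedOutput"]] = field(default_factory=list)


def render_plain(node: Union[str, FormattedOutput]) -> str:
    """Keep text, prefixes and suffixes; ignore font styling."""
    if isinstance(node, str):
        return node
    body = "".join(render_plain(child) for child in node.children)
    return f"{node.formatting.prefix}{body}{node.formatting.suffix}"


def render_markup(node: Union[str, FormattedOutput]) -> str:
    """Same traversal, but mark italic chunks with asterisks."""
    if isinstance(node, str):
        return node
    body = "".join(render_markup(child) for child in node.children)
    if node.formatting.italic:
        body = f"*{body}*"
    return f"{node.formatting.prefix}{body}{node.formatting.suffix}"
```

Both renderers walk the same structure, so supporting another output format only means writing another traversal; nothing upstream of the formatted output has to change.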