Functional CSL processor?

In my opinion this is a good idea. It would be faster, probably, but
more importantly it would be easier to maintain. I started work on
something like what I imagine you have in mind but got bogged down
with other work & with trying to figure out Zotero’s internals.
Attached if you are interested.

Cool!

The best thing to do might be to write something like this from
scratch as Bruce says, as pure JavaScript, & then to see if it
couldn’t be integrated into Zotero. Certainly I found that learning
Zotero internals at the same time that you are trying to write new
code is difficult.

A couple of things:

First, I was thinking that jQuery might help on the sort of basic
parsing and XML support that E4X provides for the current Zotero code
(and in browsers, could help with additional functionality).

Second, WRT the generic vs. Zotero-specific issue, the approach
that most CSL implementations take is to define an independent data
representation and then write different input drivers to map to that,
and different output drivers to get the output (XHTML, ODF, TeX, RTF,
etc.). So, in other words, I’d expect that a rewritten cite.js (or
csl.js) file would not know anything about Zotero.
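
To make the driver idea concrete, here is a minimal sketch in plain JavaScript. Everything in it is an assumption made for illustration: the function names, the shape of the neutral record, and the Zotero-like field names (`itemType`, `creators`, `lastName`) are hypothetical and not taken from Zotero or any existing CSL implementation. Escaping of output is ignored.

```javascript
// Hypothetical sketch: none of these names come from Zotero or a CSL library.

// Input driver: maps a Zotero-like item onto a neutral reference record.
function fromZoteroLikeItem(item) {
  return {
    type: item.itemType,
    title: item.title,
    issued: item.date,
    author: (item.creators || [])
      .filter(function (c) { return c.creatorType === "author"; })
      .map(function (c) { return { family: c.lastName, given: c.firstName }; })
  };
}

// Output drivers: each one only knows how to render formatted runs.
// (No escaping is done here; a real driver would need it.)
const outputDrivers = {
  xhtml: {
    italic: function (s) { return "<i>" + s + "</i>"; },
    plain: function (s) { return s; }
  },
  tex: {
    italic: function (s) { return "\\textit{" + s + "}"; },
    plain: function (s) { return s; }
  }
};

// Processor core: sees only the neutral record and an output driver,
// so it needs no knowledge of where the data came from.
function formatReference(ref, out) {
  const names = ref.author.map(function (n) { return n.family; }).join(", ");
  return out.plain(names + " (" + ref.issued + "). ") +
    out.italic(ref.title) + out.plain(".");
}

const sample = fromZoteroLikeItem({
  itemType: "book",
  title: "A Title",
  date: "2009",
  creators: [
    { creatorType: "author", lastName: "Doe", firstName: "Jane" },
    { creatorType: "editor", lastName: "Roe", firstName: "Richard" }
  ]
});

console.log(formatReference(sample, outputDrivers.xhtml));
// prints: Doe (2009). <i>A Title</i>.
console.log(formatReference(sample, outputDrivers.tex));
// prints: Doe (2009). \textit{A Title}.
```

Supporting another source or target format would then mean adding one more input or output driver, with `formatReference` left untouched.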

Bruce

Apropos of the message I just replied to, here’s the parent message,
just in case Erik’s code is of interest to someone …

---------- Forwarded message ----------
From: Erik Hetzner xxx@gmail.com
Date: Sun, Jan 25, 2009 at 11:19 AM
Subject: Re: Functional CSL processor?
To: zotero-dev@googlegroups.com

In my opinion this is a good idea. It would be faster, probably, but
more importantly it would be easier to maintain. I started work on
something like what I imagine you have in mind but got bogged down
with other work & with trying to figure out Zotero’s internals.
Attached if you are interested.

The best thing to do might be to write something like this from
scratch as Bruce says, as pure JavaScript, & then to see if it
couldn’t be integrated into Zotero. Certainly I found that learning
Zotero internals at the same time that you are trying to write new
code is difficult.

-Erik

On Sun, Jan 25, 2009 at 4:18 AM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

On Sun, Jan 25, 2009 at 2:15 AM, Frank Bennett <@Frank_Bennett> wrote:

[…]

I’m just daydreaming on this, but would that be correct? Or is it six
of one and half a dozen of the other, in terms of performance?
[…]

cite2.js (6.87 KB)

The Zotero CSL parser is certainly not the cleanest piece of code, as
it has evolved from a much different version of CSL to its current
representation. Cleaning it up is a good idea, and I would be very
receptive to attempts to do so. I haven’t gotten around to it because
I have very little time to contribute at the moment, and at this point
the parser is fairly stable.

I don’t think either jQuery or E4X ought to be a requirement for a
functional CSL parser. The current parser’s use of the E4X predicate
filter is minimal and could be trivially avoided. The main advantage
to E4X is that the whole CSL can be manipulated as an object. If your
approach is to compile everything from the start, DOM XML should be
sufficient, although probably a little messier and more annoying.
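
As a rough illustration of the "compile everything from the start" approach, the sketch below turns a parsed style tree into nested functions once, so that rendering an item is just a function call. The node shape `{ name, attrs, children }` is an assumption (for example, something produced by walking a DOM tree), only `text`, `group` and `layout` are handled, and real CSL semantics are considerably richer than this.

```javascript
// Hypothetical sketch: compile a tiny subset of a CSL-like tree into closures.
// Assumed node shape: { name: string, attrs: object, children: array }.
function compile(node) {
  const attrs = node.attrs || {};
  const kids = (node.children || []).map(compile);
  const prefix = attrs.prefix || "";
  const suffix = attrs.suffix || "";
  let render;
  switch (node.name) {
    case "text":
      // Either look up a variable on the item or emit a literal value.
      render = attrs.variable !== undefined
        ? (item) => item[attrs.variable] || ""
        : () => attrs.value || "";
      break;
    case "group":
    case "layout":
      // Render children, drop empty results, join with the delimiter.
      render = (item) => kids
        .map((k) => k(item))
        .filter((s) => s !== "")
        .join(attrs.delimiter || "");
      break;
    default:
      throw new Error("unsupported element: " + node.name);
  }
  // Affixes are only applied when the element produced some output.
  return (item) => {
    const body = render(item);
    return body === "" ? "" : prefix + body + suffix;
  };
}

const renderEntry = compile({
  name: "layout",
  attrs: { suffix: "." },
  children: [
    { name: "text", attrs: { variable: "title" } },
    { name: "text", attrs: { variable: "issued", prefix: " (", suffix: ")" } }
  ]
});

console.log(renderEntry({ title: "A Title", issued: "2009" }));
// prints: A Title (2009).
console.log(renderEntry({ title: "A Title" }));
// prints: A Title.
```

The style is walked exactly once in `compile`; after that, no XML API (E4X, DOM, or otherwise) is needed at render time.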

The sample code that Erik wrote is very nice, and much cleaner than
the current implementation! If there is interest in continuing this
work, I would be happy to provide help/clarification where necessary.
The biggest sticking point that I foresee is the <substitute> element
of <names>. If macros are compiled independently of the CSL, then to
implement this feature properly, one must keep track of another state
besides item/citation. This can certainly be worked out, but it’s
probably better to plan out a solution in advance than to run into
this issue later in the coding process.

Simon

Unless I’m missing something, this explanation doesn’t seem to address
the tricky part of substitution in such a model. If a substitution is
made, it should prevent the variable that has been substituted from
being displayed later in the citation. If I have a macro

and a bibliography entry

<layout suffix=".">
  <text macro="author" suffix="."/>
  <text macro="issued" suffix=" "/>
  <text macro="title"/>
  <names variable="editor"><!-- ... --></names>
</layout>

then, if there is no author, the second occurrence of the editor
variable should be ignored, and the editor should only be printed
once. This requires maintaining some kind of state regarding what
should be ignored. This construction remains from the first days of
CSL, and we could replace it with conditionals, but to do so would
require 20+ extra lines of not particularly intuitive logic for
most author-date styles.
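
One way to picture the extra state is sketched below in plain JavaScript. It is an illustration under assumptions, not how the Zotero parser works: the helper names (`names`, `text`, `layout`) are hypothetical, name variables are treated as pre-formatted strings, and the first `names(...)` call merely stands in for an "author" macro that falls back to the editor.

```javascript
// Hypothetical sketch: per-item state records which variables were already
// printed as substitutes, so a later element for the same variable is skipped.
function names(variable, substitutes) {
  return function (item, state) {
    if (state.suppressed.has(variable)) return ""; // already used as a substitute
    if (item[variable]) return item[variable];
    for (const sub of substitutes || []) {
      if (item[sub] && !state.suppressed.has(sub)) {
        state.suppressed.add(sub);                 // remember the substitution
        return item[sub];
      }
    }
    return "";
  };
}

function text(variable) {
  return (item, state) => item[variable] || "";
}

function layout(parts, delimiter) {
  return function (item) {
    const state = { suppressed: new Set() };       // fresh state for every item
    return parts
      .map((p) => p(item, state))
      .filter((s) => s !== "")
      .join(delimiter);
  };
}

const entry = layout([
  names("author", ["editor"]), // stands in for the "author" macro
  text("issued"),
  text("title"),
  names("editor")              // the explicit editor element
], ". ");

console.log(entry({ editor: "Roe", issued: "2009", title: "A Title" }));
// prints: Roe. 2009. A Title
console.log(entry({ author: "Doe", editor: "Roe", issued: "2009", title: "A Title" }));
// prints: Doe. 2009. A Title. Roe
```

Because the state is created inside `layout` and passed down, the compiled pieces stay independent of one another; the cost is that every rendering function has to accept and respect that second argument.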

I don’t know if this is for me… anyway, the Haskell implementation
does print the editor variable only once. Variables are consumed once
used, and no variable can be printed twice.

the relevant bits:

evalElement :: Element -> State EvalState [Output]
evalElement el
    […]
    | Substitute (e:els) <- el = ifEmpty (consuming $ evalElement e)
                                         (getFirst els) id

Do you see ‘consuming’? It does the trick.

Hope this helps,
Andrea