names implementation

Bruce_D_Arcus1 · August 2, 2009, 10:07pm

Anyone care to comment on the best approach to implementing names
support, which:

a) includes the fact that sorting is different than display

b) accounts for the dreaded disambiguation support

Bruce

Frank_Bennett · August 2, 2009, 10:50pm

> Anyone care to comment on the best approach to implementing names
> support, which:
>
> a) includes the fact that sorting is different than display
>
> b) accounts for the dreaded disambiguation support

I’m not sure what the best approach is, but in citeproc-js, both sort
keys and trial strings for comparison during citation-based
disambiguation are generated by writing the data (a key for sort, or a
full citation for disambiguation) to the standard queue of output
“blobs”, and then flattening the output queue into a string variable
used for comparison. Then all that’s needed is a way of controlling
the shape of names (name order, number of names, etc) through some
sort of parameters fed to the method that invokes the rendering
machinery.
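
A minimal Python sketch of that idea, not the citeproc-js code itself: names are rendered into nested "blobs", and the same rendering path yields either a display string or a sort key depending on the parameters passed in. The `Blob` class, `flatten`, `render_names`, and the `sort_order` / `max_names` parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Blob:
    """One node of the output queue: a string, or a list of child blobs."""
    content: Union[str, List["Blob"]]
    prefix: str = ""
    suffix: str = ""
    delimiter: str = ""


def flatten(blob):
    """Collapse a blob tree into a plain string usable for comparison."""
    if isinstance(blob.content, str):
        body = blob.content
    else:
        body = blob.delimiter.join(flatten(child) for child in blob.content)
    return blob.prefix + body + blob.suffix


def render_names(names, *, sort_order=False, max_names=None):
    """Render names into a blob; the keyword arguments control the shape."""
    shown = names if max_names is None else names[:max_names]
    children = []
    for name in shown:
        if sort_order:
            parts, delim = [name["family"], name["given"]], ", "
        else:
            parts, delim = [name["given"], name["family"]], " "
        children.append(Blob([Blob(p) for p in parts], delimiter=delim))
    return Blob(children, delimiter="; ")


names = [
    {"family": "Doe", "given": "Jane"},
    {"family": "Smith", "given": "John"},
]
display = flatten(render_names(names))
sort_key = flatten(render_names(names, sort_order=True)).lower()
```

Here `display` comes out as "Jane Doe; John Smith" and `sort_key` as "doe, jane; smith, john": one rendering routine, two shapes.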

Names-based disambiguation can be done by maintaining a names
registry. The names registry in citeproc-js isn’t too awful, and
might be a worthwhile reference, at least:

http://bitbucket.org/fbennett/citeproc-js/src/tip/src/namereg.js

The registry is called when processing a name, to derive a formatting
parameter for each name (0, 1 or 2, for short, w/initial, or
fullname). If citation-based disambiguation is in force, that is then
used as the starting point for disambiguation. Otherwise, it’s just
used as-is.
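
To put the registry idea in concrete terms, here is a rough Python sketch. It is an assumption about how such a registry could work, not a port of namereg.js: the `NameRegistry` class and its `add` / `level` methods are hypothetical, and the rule used (family name alone, then initial, then full given name) is the simplest one that yields the 0, 1 or 2 parameter described above.

```python
from collections import defaultdict


class NameRegistry:
    """Tracks every name seen so far, grouped by family name and initial."""

    def __init__(self):
        # family name -> initial -> set of given names seen with that initial
        self._by_family = defaultdict(lambda: defaultdict(set))

    def add(self, family, given):
        self._by_family[family][given[:1]].add(given)

    def level(self, family, given):
        """Return 0 (family only), 1 (with initial) or 2 (full name)."""
        initials = self._by_family.get(family, {})
        everyone = set().union(*initials.values()) if initials else set()
        if len(everyone) <= 1:
            return 0  # the family name alone is unambiguous
        if len(initials.get(given[:1], ())) <= 1:
            return 1  # the initial is enough to tell the names apart
        return 2      # same family name and initial: need the full name


registry = NameRegistry()
for family, given in [("Doe", "Jane"), ("Doe", "John"),
                      ("Smith", "Adam"), ("Smith", "Bob"),
                      ("Jones", "Mary")]:
    registry.add(family, given)
```

With this data, `registry.level("Jones", "Mary")` is 0, `registry.level("Smith", "Adam")` is 1, and `registry.level("Doe", "Jane")` is 2.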

I may have misunderstood the question, though. Is that responsive?

Frank

Bruce_D_Arcus1 · August 3, 2009, 4:31pm

> I’m not sure what the best approach is, but in citeproc-js, both sort
> keys and trial strings for comparison during citation-based
> disambiguation are generated by writing the data (a key for sort, or a
> full citation for disambiguation) to the standard queue of output
> “blobs”, and then flattening the output queue into a string variable
> used for comparison. Then all that’s needed is a way of controlling
> the shape of names (name order, number of names, etc) through some
> sort of parameters fed to the method that invokes the rendering
> machinery.

In my original XSLT code, I had a function that generated a string for
comparison; something like:

“doe:jane;smith:john”

I was thinking of doing the same here, but CSL has evolved
considerably since I wrote that code.
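
For reference, a comparison string of that shape takes only a few lines of Python. This is a sketch, not the original XSLT function: the name `names_sort_key` is made up here, and the `family` / `given` keys are assumed as the input layout.

```python
def names_sort_key(names):
    """Build a comparison string such as "doe:jane;smith:john"."""
    return ";".join(
        f"{name['family'].lower()}:{name['given'].lower()}" for name in names
    )


key = names_sort_key([
    {"family": "Doe", "given": "Jane"},
    {"family": "Smith", "given": "John"},
])
```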

> Names-based disambiguation can be done by maintaining a names
> registry. The names registry in citeproc-js isn’t too awful, and
> might be a worthwhile reference, at least:
>
> http://bitbucket.org/fbennett/citeproc-js/src/tip/src/namereg.js

I have a hard time reading the code, which is why I was hoping to sort
it out in English.

> The registry is called when processing a name, to derive a formatting
> parameter for each name (0, 1 or 2, for short, w/initial, or
> fullname). If citation-based disambiguation is in force, that is then
> used as the starting point for disambiguation. Otherwise, it’s just
> used as-is.
>
> I may have misunderstood the question, though. Is that responsive?

Yeah, but I think I’m losing the details.

Bruce

Frank_Bennett · August 3, 2009, 11:39pm

> In my original XSLT code, I had a function that generated a string for
> comparison; something like:
>
> “doe:jane;smith:john”
>
> I was thinking of doing the same here, but CSL has evolved
> considerably since I wrote that code.

It sounds like the next step is to sort out exactly how output will be
produced – what I referred to earlier as the “output queue”, made up
of “blobs” (ill-chosen terms that I’ll admit probably have explanatory
power on a par with Klingon baby talk).

It looks like you’re instantiating the CSL style string as an
ElementTree object, and then passing the target node, and the data, to
a function that walks the tree. At each level, rendering starts by
casting a fresh ElementTree object, which is passed through to the
function that processes the next level. The freshly cast ElementTree
object returned at the end of the style tree walk will be an abstract
representation of the output. I think that’s how it’s set up; correct
me if I’m wrong.

The problem is with how to think about that output object. At the
moment you’re extending it by casting a single fresh output node,
tacking some data onto it, and returning it to the function instance
processing the parent style node. That works fine for text elements
rendering a variable, because one node in the style should produce
exactly one node in the output (basically – things get more
complicated with inline markup, but that’s for later).

Name nodes (names, name) in the style can’t be mapped one-to-one into
the output; the full output representation of a CSL names node needs
seven levels of nesting.[*] So you need to extend the output object,
but casting fresh objects and appending them explicitly in each
process_* function will get unmanageable very quickly. The thing to
do is subclass ElementTree into a “cslOutput” (or whatever) class, and
provide methods like “append”, “openlevel”, and “closelevel” on the
output object itself (a global pointer inside the object can track
which node is the current “tip”).
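
A rough Python sketch of such an output object follows. It is only an illustration under stated assumptions: for simplicity it wraps an `xml.etree.ElementTree` element rather than subclassing it, the `CslOutput` name and the tag names are made up, and the "tip" pointer is kept as a stack of open nodes.

```python
import xml.etree.ElementTree as ET


class CslOutput:
    """Output tree that remembers which node is currently open (the tip)."""

    def __init__(self, root_tag="output"):
        self.root = ET.Element(root_tag)
        self._open = [self.root]  # stack of open nodes; the last is the tip

    @property
    def tip(self):
        return self._open[-1]

    def openlevel(self, tag, **attrib):
        """Add a child under the tip and make it the new tip."""
        node = ET.SubElement(self.tip, tag, attrib)
        self._open.append(node)
        return node

    def closelevel(self):
        """Step back up to the parent of the current tip."""
        if len(self._open) == 1:
            raise ValueError("no open level to close")
        self._open.pop()

    def append(self, text, tag="text", **attrib):
        """Add a leaf node holding a string under the current tip."""
        node = ET.SubElement(self.tip, tag, attrib)
        node.text = text
        return node

    def flatten(self):
        """Concatenate all text in document order, e.g. for comparison."""
        return "".join(self.root.itertext())


out = CslOutput()
out.openlevel("names")
out.openlevel("name")
out.append("Jane", tag="given")
out.append(" ")
out.append("Doe", tag="family")
out.closelevel()
out.closelevel()
```

The process_* functions then only call `openlevel`, `append` and `closelevel`; none of them has to create or attach nodes by hand, and `out.flatten()` gives "Jane Doe" for comparison purposes.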

Personally, I have real trouble getting my head around recursive
nesting, and every time I have to deal with one of these problems, it
drives me nuts trying to figure out what’s happening with the
navigation. From XML-world you probably have a natural feel for that
sort of thing, but for me, I would probably have to start by setting
up a test suite (PyUnit, or you mentioned Nose earlier), and start
extending a test suite in parallel with work on a simple output object
class. Once that’s done, it should be easier to think about how to
generate names.

[*] The attached working note was prepared while refactoring the
citeproc-js output queue several months ago.

Frank

names_nesting.txt (1.21 KB)

Bruce_D_Arcus1 · August 3, 2009, 11:51pm

…

> It looks like you’re instantiating the CSL style string as an
> ElementTree object, and then passing the target node, and the data, to
> a function that walks the tree. At each level, rendering starts by
> casting a fresh ElementTree object, which is passed through to the
> function that processes the next level. The freshly cast ElementTree
> object returned at the end of the style tree walk will be an abstract
> representation of the output. I think that’s how it’s set up; correct
> me if I’m wrong.

No, that’s correct.

> The problem is with how to think about that output object. At the
> moment you’re extending it by casting a single fresh output node,
> tacking some data onto it, and returning it to the function instance
> processing the parent style node. That works fine for text elements
> rendering a variable, because one node in the style should produce
> exactly one node in the output (basically – things get more
> complicated with inline markup, but that’s for later).
>
> Name nodes (names, name) in the style can’t be mapped one-to-one into
> the output; the full output representation of a CSL names node needs
> seven levels of nesting.[*]

The idea behind my internal representation at the moment is not so much
to hew closely to the CSL model as to aim for an ideal HTML + RDFa
output. My thinking is that this also has benefits for processing.

For example, in RDFa, one can do J. This allows you to keep the presentation and
the raw data together, so the latter can be extracted. But it also
leaves it intact for further processing (say disambiguation).

This is the idea, at least.
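
As a hedged illustration of the RDFa point, in Python: a span can show the abbreviated form while carrying the full value in its `content` attribute. The exact markup intended in the post is not recoverable, so the `foaf:givenName` property and the helper name `given_name_span` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def given_name_span(given, *, initialize=True):
    """Build a span that displays an initial but keeps the raw given name."""
    span = ET.Element("span", {"property": "foaf:givenName", "content": given})
    span.text = f"{given[0]}." if initialize else given
    return span


span = given_name_span("Jane")
markup = ET.tostring(span, encoding="unicode")

# Later processing (say, disambiguation) can read the raw value back out.
raw_given = span.get("content")
```

The displayed text is "J.", while `raw_given` is still "Jane", so presentation and data travel together in one node.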

> So you need to extend the output object,
> but casting fresh objects and appending them explicitly in each
> process_* function will get unmanageable very quickly. The thing to
> do is subclass ElementTree into a “cslOutput” (or whatever) class, and
> provide methods like “append”, “openlevel”, and “closelevel” on the
> output object itself (a global pointer inside the object can track
> which node is the current “tip”).

Will keep this in mind.

> Personally, I have real trouble getting my head around recursive
> nesting, and every time I have to deal with one of these problems, it
> drives me nuts trying to figure out what’s happening with the
> navigation. From XML-world you probably have a natural feel for that
> sort of thing …

Yes; it’s how you work with XSLT (which has been described as LISP,
with angle brackets!).

> [*] The attached working note was prepared while refactoring the
> citeproc-js output queue several months ago.

Thanks.

Bruce
