RDFa support

cc-ing xbib-dev on reply …

After a discussion with Phillip Lord, I’ve made some changes in
citeproc-js to provide access to the itemID inside the function used
to generate the bibliography entry wrapper. One of the things this
opens a path to is the inclusion of simple RDFa structures in HTML
output as a set of hidden tags.

So just to be clear, this is because of something about the way you’ve
designed the code that means that output tokens lose association to
metadata content during processing?

Effectively, then, you need to dump a separate representation of the
bib item. This would facilitate RDFa, but also any other format
really.

But it does have the cost of requiring duplication of content.

Do I have that all right?

While not as fancy as wrapping content
in RDFa, this would still offer a basis (once an RDFa parser is sorted
out) for round-tripping citation data between the Web and reference
managers.

To anyone interested in contributing, RDFa output should be pretty
simple to implement. Everything you need to know is in a little
comment in the format.js source file of citeproc-js.

http://groups.google.com/group/citeproc-js/browse_thread/thread/8095a38f21e0327d

And if anyone has BIBO/RDFa-specific questions, there’s a Google group:

http://groups.google.com/group/bibliographic-ontology-specification-group

Bruce

cc-ing xbib-dev on reply …

After a discussion with Phillip Lord, I’ve made some changes in
citeproc-js to provide access to the itemID inside the function used
to generate the bibliography entry wrapper. One of the things this
opens a path to is the inclusion of simple RDFa structures in HTML
output as a set of hidden tags.

So just to be clear, this is because of something about the way you’ve
designed the code that means that output tokens lose association to
metadata content during processing?

It means that the itemID was not available for use in writing
unescaped text content associated with an item, but that it now is. I
don’t know what that says about the design of citeproc-js.
Implementing it required the addition of one line of code, and a small
change to another.

Effectively, then, you need to dump a separate representation of the
bib item. This would facilitate RDFa, but also any other format
really.

But it does have the cost of requiring duplication of content.

Do I have that all right?

You would need a separate record of the content anyway, since field
content may be littered with raw HTML tags, and the names data will
often not be complete in the rendered bibliography entries. But
otherwise, yes. The idea would be to dump a separate representation of
the bib item.

But for the sake of possible future implementors/contributors (i.e. for
the record), one could build an object representation of the output, such
that this would be easy to do. E.g. an intermediate representation
that combines metadata and processed output strings:

{"variable":"issued", "value":"Apr 2, 2011", "content":"2011-04-02",
"font-weight":"bold"}

You then have an output routine that could write that to HTML like:

<span
property="dc:issued"
content="2011-04-02"
style="font-weight:bold;">Apr 2, 2011</span>

It’s just that citeproc-js didn’t take that approach, and so it’s hard
to bolt on after the fact.

Right?

Not a critique; I just want people to recognize other possible approaches.

Bruce

(Bruce: sorry, reposting to include the list.)

cc-ing xbib-dev on reply …

After a discussion with Phillip Lord, I’ve made some changes in
citeproc-js to provide access to the itemID inside the function used
to generate the bibliography entry wrapper. One of the things this
opens a path to is the inclusion of simple RDFa structures in HTML
output as a set of hidden tags.

So just to be clear, this is because of something about the way you’ve
designed the code that means that output tokens lose association to
metadata content during processing?

It means that the itemID was not available for use in writing
unescaped text content associated with an item, but that it now is. I
don’t know what that says about the design of citeproc-js.
Implementing it required the addition of one line of code, and a small
change to another.
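
For anyone who wants to picture what the itemID buys you, here is a minimal sketch of an entry wrapper that appends hidden RDFa tags. It is not the citeproc-js code: `makeEntryWrapper`, `wrapEntry`, `escapeAttr`, the shape of the `items` lookup, and the `dc:title` property are all assumptions made for illustration.

```javascript
// Hypothetical sketch; none of these names come from citeproc-js.

// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// items: plain object mapping itemID -> item metadata (assumed shape).
function makeEntryWrapper(items) {
  return function wrapEntry(itemID, renderedHtml) {
    var item = items[itemID] || {};
    var hidden = "";
    // The separate record of the item is written as empty, hidden spans.
    if (item.title) {
      hidden += '<span property="dc:title" content="' +
        escapeAttr(item.title) + '"></span>';
    }
    if (item.issued) {
      hidden += '<span property="dc:issued" content="' +
        escapeAttr(item.issued) + '"></span>';
    }
    return '<div class="csl-entry" about="#' + escapeAttr(itemID) + '">' +
      renderedHtml +
      '<span style="display:none;">' + hidden + '</span>' +
      '</div>';
  };
}
```

The rendered entry string passes through untouched, and the metadata is duplicated in the hidden block, which is the cost discussed in this thread.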

Effectively, then, you need to dump a separate representation of the
bib item. This would facilitate RDFa, but also any other format
really.

But it does have the cost of requiring duplication of content.

Do I have that all right?

You would need a separate record of the content anyway, since field
content may be littered with raw HTML tags, and the names data will
often not be complete in the rendered bibliography entries. But
otherwise, yes. The idea would be to dump a separate representation of
the bib item.

But for the sake of possible future implementors/contributors (i.e. for
the record), one could build an object representation of the output, such
that this would be easy to do. E.g. an intermediate representation
that combines metadata and processed output strings:

{"variable":"issued", "value":"Apr 2, 2011", "content":"2011-04-02",
"font-weight":"bold"}

Yes, that’s what is needed. citeproc-js produces an object
representation of the output already; all that’s needed is to add a
pointer to the input item and variable name when casting output, so
that it can be picked up when the object is flattened.
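
As a rough illustration of the idea, assuming a much simpler node shape than the processor really uses: each output node carries an optional pointer to its item and variable, and the flattening pass collects those pointers while it builds the string. `makeNode` and `flatten` are hypothetical names, not citeproc-js functions.

```javascript
// Hypothetical sketch of an output tree whose nodes remember their source.

// Create a node; itemID and variable are optional pointers to the input.
function makeNode(text, itemID, variable) {
  return {
    text: text || "",
    itemID: itemID || null,
    variable: variable || null,
    children: []
  };
}

// Depth-first flatten: returns the joined string and appends one record
// to `records` for every node that points back at an input variable.
function flatten(node, records) {
  var out = node.text;
  if (node.variable) {
    records.push({
      itemID: node.itemID,
      variable: node.variable,
      value: node.text
    });
  }
  for (var i = 0; i < node.children.length; i += 1) {
    out += flatten(node.children[i], records);
  }
  return out;
}

var entry = makeNode("");
entry.children.push(makeNode("Doe, J. ", "ITEM-1", "author"));
entry.children.push(makeNode("("));
entry.children.push(makeNode("Apr 2, 2011", "ITEM-1", "issued"));
entry.children.push(makeNode(")"));

var records = [];
var text = flatten(entry, records);
// text is "Doe, J. (Apr 2, 2011)"; records holds the author and issued nodes.
```

Punctuation nodes carry no pointer, so they contribute to the string but not to the records.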

You then have an output routine that could write that to HTML like:

<span
property="dc:issued"
content="2011-04-02"
style="font-weight:bold;">Apr 2, 2011</span>

It’s just that citeproc-js didn’t take that approach, and so it’s hard
to bolt on after the fact.

Right?

It’s not that hard to do, really – making the itemID available to bib
items only took two lines of code. Providing access to individual
variables will take a little more work, because not all variables in
CSL have item content. But compared to the difficulties of getting the
rendered output right, it’s not a huge task.
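
A minimal sketch of such an output routine, taking tokens shaped like the example object quoted above. `renderToken` and `escapeHtml` are hypothetical names, and mapping every variable onto a `dc:` property is an assumption made only to keep the example short. Tokens with no variable (output that has no item content behind it) fall through without any RDFa attributes.

```javascript
// Hypothetical sketch; not part of citeproc-js.

// Escape text for use in HTML content or a double-quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Write one intermediate token as HTML, adding RDFa attributes only
// when the token points back at an input variable.
function renderToken(token) {
  var attrs = "";
  if (token.variable) {
    attrs += ' property="dc:' + escapeHtml(token.variable) + '"';
    if (token.content) {
      attrs += ' content="' + escapeHtml(token.content) + '"';
    }
  }
  if (token["font-weight"]) {
    attrs += ' style="font-weight:' + escapeHtml(token["font-weight"]) + ';"';
  }
  var body = escapeHtml(token.value);
  // Plain text with no attributes needs no wrapping element.
  return attrs ? "<span" + attrs + ">" + body + "</span>" : body;
}

var issued = renderToken({
  "variable": "issued",
  "value": "Apr 2, 2011",
  "content": "2011-04-02",
  "font-weight": "bold"
});
var separator = renderToken({ "value": ", " });
```

Here `issued` comes out as the span shown earlier in this thread, and `separator` is returned as bare text.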

Not a critique; I just want people to recognize other possible approaches.

Bruce

I think it’s more a matter of clean coding style than anything
fundamental about the design. The code of citeproc-js is a little
woolly in places, partly because I didn’t know JavaScript all that
well when I started writing, and partly because the project chased
emerging requirements in CSL 1.0 while it was being written.

The one thing I don’t like about the code is my extensive reliance on
“state.tmp” values, which are in effect global variables limited in
scope to the processor. With a proper inheritance mechanism, many of
these could go away, and that would make the program more readable.

Frank