RPC & citeproc-js

Apologies if this discussion belongs on-list, as it’s not directly
related to CSL, but I feel that some subscribers here (particularly
the Mendeley people) might have something to say regarding these issues.

Our current plan is to encode the collaborator ID as a URI, as RDF,
with the bibliographic ontology.

Ah good :)

This should satisfy your use case, although it might
be hard to figure out how to store bibliographic metadata in doc/
docx…

Yes, but I have a feeling there are ways to do this; certainly in
docx.

An interesting development in the past six months or so is that MS has
joined the ODF TC at OASIS, so there’s at least some effort at working
on interoperability between them.

The docx container format definitely specifies a way to do this, and
there is probably an API included in Office 2007 for Windows (since it
looks like some add-ins use it), but it looks as if it would be
impossible to embed metadata properly in Word 2008 for Mac. There are
no APIs for the built-in bibliography/citation support. This means
that we are quite limited in the ways we can store both citation
elements (fields) and the associated bibliographic metadata.

With respect to the former, I don’t think there’s any way around
continuing to use Word’s decrepit field/bookmark support. We have to
be able to modify citation elements while the document is open, and in
the absence of any APIs for anything else, fields (or bookmarks) are
simply the only way.

With respect to the latter, it turns out custom XML is actually
preserved across opening/saving a document, but it can’t be modified
while the document is in use. Again, there are no APIs to perform any
kind of modifications, and simply modifying the zip while the document
is open doesn’t work (any modifications just get clobbered on save).
This leaves us with a few options:

  1. Store metadata The Right Way, by running a background process that
    waits until the document is saved/closed, then adds the appropriate
    metadata to it. Assuming we don’t want this background process running
    all the time, it would be triggered by using any kind of Zotero
    functionality, and continue running until that document is closed.
    This would be very kludgy, and probably undesirably CPU-intensive.
    Additionally, if Word crashed, and a recovered document were opened
    and re-saved without using any Zotero functionality, it would be saved
    without any metadata.

  2. Store metadata in field objects. Beyond being the wrong way of
    doing things, this approach has its own disadvantages, since the
    metadata would either need to be attached to each field (thus
    significantly increasing file size) or only some fields, in which case
    deleting the first citation would also delete the associated metadata.
    The metadata would be added back if Zotero functionality was triggered
    on the same computer as it was created on, but this could be a problem
    if a document were modified without triggering Zotero, or if a
    collaborator opened and edited a document before Zotero got triggered.
    It would, however, work with both doc and docx.

  3. Store metadata in Word’s custom document properties. This would
    mean splitting things up into 255 character chunks, since that seems
    to be the length limit, and would likely be associated with some kind
    of speed hit for the initial reading of metadata. This would also work
    with both doc and docx.

  4. Store metadata as hidden text, or in some other kind of invisible
    object. I haven’t investigated this much, and it might actually lead
    to the most functional implementation, but generally, the approach
    just seems wrong.
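
For what it's worth, the chunking in option 3 could look roughly like the sketch below. It is only an illustration under assumptions: the 255-character limit is the one that seems to apply, the `ZOTERO_META_n` property names are hypothetical, and a plain dict stands in for Word's custom document properties rather than any real Word API.

```python
# Sketch of option 3: split a metadata string into chunks that fit a
# per-property length limit, then reassemble it on read.
# ASSUMPTIONS: CHUNK_LIMIT reflects the apparent 255-character limit; the
# "ZOTERO_META_n" names are hypothetical; a dict stands in for Word's
# custom document properties.

CHUNK_LIMIT = 255


def to_properties(metadata: str, prefix: str = "ZOTERO_META_") -> dict:
    """Split metadata into numbered properties of at most CHUNK_LIMIT characters."""
    chunks = [metadata[i:i + CHUNK_LIMIT] for i in range(0, len(metadata), CHUNK_LIMIT)]
    return {f"{prefix}{n}": chunk for n, chunk in enumerate(chunks, start=1)}


def from_properties(props: dict, prefix: str = "ZOTERO_META_") -> str:
    """Reassemble the metadata by reading numbered properties until one is missing."""
    parts = []
    n = 1
    while f"{prefix}{n}" in props:
        parts.append(props[f"{prefix}{n}"])
        n += 1
    return "".join(parts)
```

Reading stops at the first missing number, so a stale higher-numbered property left over from an earlier, longer save would be ignored rather than appended; the cost is one property lookup per chunk on the initial read, which is where the speed hit mentioned above would come from.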

I welcome advice regarding these options. In OpenOffice, of course, we
can just use the RDF support which you advocated and appears to be
included in OO 3.0 (although I haven’t tested it).

Simon

Apologies if this discussion belongs on-list, as it’s not directly related to CSL, but I feel that some subscribers here (particularly the Mendeley people) might have something to say regarding these issues.

No need to apologize; this is an important implementation concern.

I welcome advice regarding these options.

It would probably help to hear from MS. I forwarded your post to a
couple of key people there.

In OpenOffice, of course, we can just use the RDF support which you advocated and appears to be included in OO 3.0 (although I haven’t tested it).

It’s not in 3.0. My understanding is that it’s coming in 3.2 later this year.

Sun has a developer working on this, and he successfully integrated
Redland into OOo and added an API wrapper on top of it. He also got
all the in-package RDF stuff working. Where I believe he got delayed
was on multi-user editing details like this.

But as I said, it should be coming soon-ish.

Bruce

A reply from the Microsoft project lead for the bibliographic stuff,
forwarded with permission …

---------- Forwarded message ----------
From: Amani Ahmed amaniah@exchange.microsoft.com
Date: Thu, Apr 23, 2009 at 3:49 PM
Subject: RE: [xbiblio-devel] Storing bibliographic information in word
processing documents [was RPC & citeproc-js]
To: Bruce D’Arcus <@Bruce_D_Arcus1>
Cc: Doug Mahugh Doug.Mahugh@microsoft.com

Hi Bruce -

Sorry for taking so long to get back to you on this. I wanted to
confirm some things before responding.

You and Simon are correct that the XML for the bibliography sources
cannot be manipulated directly when the document is open. However,
you can overcome this issue by using the object model supported by
Office 2007. Our OM provides objects that represent each of the
bibliography sources in the document, the properties of which can be
retrieved
and set as needed. More details on this can be found on MSDN:
http://msdn.microsoft.com/en-us/library/bb258052.aspx

As for storing your own custom metadata with the document, you can
accomplish this by creating your own custom XML part with the document
package. You can define the schema for this custom XML part in such
a way that the ID for a particular bibliography source is stored
alongside the metadata properties associated with it. Again, this
data from the custom XML part can be manipulated through our object
model (http://msdn.microsoft.com/en-us/library/bb608618.aspx). This
solution, of course, is under the assumption that the “metadata” you
were referring to below were additional properties you wanted to store
on a bibliography source by source basis. Please correct me if I made
the wrong assumption here. :)

I’m still following up with our Mac division to confirm whether
building a solution in this manner will port over to their version of
the application. I will let you know what I hear back from them.

Please let me know if you would like any more details on the solutions
above or have further questions.

Thanks,

Amani

-----Original Message-----
From: Bruce D’Arcus [mailto:@Bruce_D_Arcus1]
Sent: Monday, April 20, 2009 1:07 PM
To: Amani Ahmed
Subject: Re: [xbiblio-devel] Storing bibliographic information in word
processing documents [was RPC & citeproc-js]

Amani – any chance you can get back to me and Simon on this issue?

A quick reply, Amani …

Hi Bruce -

Sorry for taking so long to get back to you on this. I wanted to confirm some things before responding.

You and Simon are correct that the XML for the bibliography sources cannot be manipulated directly when the document is open. However, you can overcome this issue by using the object model supported by Office 2007. Our OM provides objects that represent each of the bibliography sources in the document, the properties of which can be retrieved and set as needed. More details on this can be found on MSDN: http://msdn.microsoft.com/en-us/library/bb258052.aspx

Right, but one problem is that this model is somewhat limited.

The bigger problem is that I don’t think this API is available on the
Mac version.

As for storing your own custom metadata with the document, you can accomplish this by creating your own custom XML part with the document package. You can define the schema for this custom XML part in such a way that the ID for a particular bibliography source is stored alongside the metadata properties associated with it. Again, this data from the custom XML part can be manipulated through our object model (http://msdn.microsoft.com/en-us/library/bb608618.aspx). This solution, of course, is under the assumption that the “metadata” you were referring to below were additional properties you wanted to store on a bibliography source by source basis. Please correct me if I made the wrong assumption here. :)

So are you saying that the custom XML part would essentially be a
separate island of XML that got linked to the main bibliographic
representation through a common ID?
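
To make the question concrete, here is a rough sketch of what I mean by a separate island keyed on a shared ID. Everything in it is an assumption for illustration: the namespace URI, the `sources`/`source`/`property` element names and the `ref` attribute are made up, and this is not the schema or the object model Amani describes.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: this namespace URI and the element/attribute names are
# hypothetical; they are not part of any Microsoft or Zotero schema.
NS = "http://example.org/hypothetical/bib-metadata"


def build_custom_part(extra: dict) -> str:
    """Serialize {source_id: {property: value}} as a standalone XML document."""
    ET.register_namespace("m", NS)
    root = ET.Element(f"{{{NS}}}sources")
    for source_id, properties in extra.items():
        # "ref" carries the ID shared with the built-in bibliography source.
        source = ET.SubElement(root, f"{{{NS}}}source", {"ref": source_id})
        for name, value in properties.items():
            prop = ET.SubElement(source, f"{{{NS}}}property", {"name": name})
            prop.text = value
    return ET.tostring(root, encoding="unicode")


def lookup(xml_text: str, source_id: str) -> dict:
    """Return the extra properties stored for one source ID, or {} if absent."""
    root = ET.fromstring(xml_text)
    for source in root.findall(f"{{{NS}}}source"):
        if source.get("ref") == source_id:
            return {p.get("name"): p.text for p in source.findall(f"{{{NS}}}property")}
    return {}
```

The only link between the two representations here is the ID string, so anything that renames or deletes a source on one side without touching the other would leave the extra metadata orphaned.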

For comparison, for ODF 1.2, we’d probably be using the bibo RDF
vocabulary, which is both richer, and more flexible; a developer would
add additional properties by simply including those triples in the
same RDF/XML package. I suppose ideally, we’d be able to use the same
RDF/XML in both ODF and OOXML.

http://bibliontology.com/
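
As a rough sketch of the "just add triples" point, the following builds a small RDF/XML description of an article and drops extra properties into the same description. The `bibo:Article` class and `dcterms:title` property are real terms; the `ex:` namespace, the `describe_article` helper and the example URIs are hypothetical, and this is not the ODF 1.2 packaging itself.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"
# ASSUMPTION: a made-up namespace standing in for a developer's own terms.
EX = "http://example.org/hypothetical/terms#"


def describe_article(uri: str, title: str, extra: dict) -> str:
    """Return RDF/XML describing one bibo:Article, plus any extra properties."""
    for prefix, ns in (("rdf", RDF), ("bibo", BIBO), ("dcterms", DCTERMS), ("ex", EX)):
        ET.register_namespace(prefix, ns)
    root = ET.Element(f"{{{RDF}}}RDF")
    article = ET.SubElement(root, f"{{{BIBO}}}Article", {f"{{{RDF}}}about": uri})
    ET.SubElement(article, f"{{{DCTERMS}}}title").text = title
    # Each extra property is just one more triple about the same subject.
    for name, value in extra.items():
        ET.SubElement(article, f"{{{EX}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `describe_article("http://example.org/items/1", "A Title", {"localKey": "ABC123"})` yields a single description in which the extra property sits next to the title, with no separate schema or ID linkage needed.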

I’m still following up with our Mac division to confirm whether building a solution in this manner will port over to their version of the application. I will let you know what I hear back from them.

OK, thanks. As I said above, I expect the answer may not be
encouraging. But I’d be happy to be wrong.

Of course, some (though not all) of these issues would be solved if MS
were to simply adopt CSL as its citation styling language, rather than
XSLT. There are now over 1000 CSL styles, all of them a product of
part-time ad hoc community effort, and the CSL versions of the major
styles are, I think it’s fair to say, much smaller and simpler, but
also more compliant. XSLT just isn’t a very good language to do this
stuff in.

Bruce

Bruce, did you ever get a response re: storing metadata in Office 2008
for Mac? I’m not too optimistic, but since this is something we
probably want to implement sooner rather than later, it would be great
if you did.

Thanks,
Simon