Apologies if this discussion belongs off-list, as it’s not directly
related to CSL, but I feel that some subscribers here (particularly
the Mendeley people) might have something to say regarding these issues.
…
Our current plan is to encode the collaborator ID as a URI, as RDF,
with the bibliographic ontology.

Ah good

This should satisfy your use case, although it might
be hard to figure out how to store bibliographic metadata in doc/docx…

Yes, but I have a feeling there are ways to do this; certainly in
docx.

An interesting development in the past six months or so is that MS has
joined the ODF TC at OASIS, so there’s at least some effort at working
on interoperability between them.
The docx container format definitely specifies a way to do this, and
there is probably an API included in Office 2007 for Windows (since it
looks like some add-ins use it), but it looks as if it would be
impossible to embed metadata properly in Word 2008 for Mac. There are
no APIs for the built-in bibliography/citation support. This means
that we are quite limited in the ways we can store both citation
elements (fields) and the associated bibliographic metadata.
With respect to the former, I don’t think there’s any way around
continuing to use Word’s decrepit field/bookmark support. We have to
be able to modify citation elements while the document is open, and in
the absence of any APIs for anything else, fields (or bookmarks) are
simply the only way.
With respect to the latter, it turns out custom XML is actually
preserved across opening/saving a document, but it can’t be modified
while the document is in use. Again, there are no APIs to perform any
kind of modifications, and simply modifying the zip while the document
is open doesn’t work (any modifications just get clobbered on save).
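For illustration, here is a minimal Python sketch of what editing the zip looks like once Word has closed the document. The function name add_part and the part name customXml/item1.xml are hypothetical, chosen only for this example. The sketch just copies the package and appends one part; it does not register the part in the package's relationship or content-type files, which a proper custom XML part would presumably also need.

```python
import zipfile


def add_part(src_path, dst_path, part_name, payload):
    """Copy the package at src_path to dst_path and append one extra part.

    Hypothetical helper: it must only be run on a document that Word has
    already closed, because Word rewrites the whole package on save.
    """
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part_name:
                continue  # skip a stale copy of the part we are about to write
            # Re-use the original entry metadata so existing parts are unchanged.
            dst.writestr(info, src.read(info.filename))
        dst.writestr(part_name, payload)
```

Writing to a separate output file and swapping it in afterwards seems safer than editing in place, since a failure halfway through would otherwise leave a corrupt document.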
This leaves us with a few options:
- Store metadata The Right Way, by running a background process that
  waits until the document is saved/closed, then adds the appropriate
  metadata to it. Assuming we don’t want this background process running
  all the time, it would be triggered by using any kind of Zotero
  functionality, and continue running until that document is closed.
  This would be very kludgy, and probably undesirably CPU-intensive.
  Additionally, if Word crashed, and a recovered document were opened
  and re-saved without using any Zotero functionality, it would be saved
  without any metadata.
- Store metadata in field objects. Beyond being the wrong way of
  doing things, this approach has its own disadvantages, since the
  metadata would either need to be attached to each field (thus
  significantly increasing file size) or only some fields, in which case
  deleting the first citation would also delete the associated metadata.
  The metadata would be added back if Zotero functionality was triggered
  on the same computer as it was created on, but this could be a problem
  if a document were modified without triggering Zotero, or if a
  collaborator opened and edited a document before Zotero got triggered.
  It would, however, work with both doc and docx.
- Store metadata in Word’s custom document properties. This would
  mean splitting things up into 255 character chunks, since that seems
  to be the length limit, and would likely be associated with some kind
  of speed hit for the initial reading of metadata. This would also work
  with both doc and docx.
- Store metadata as hidden text, or in some other kind of invisible
  object. I haven’t investigated this much, and it might actually lead
  to the most functional implementation, but generally, the approach
  just seems wrong.
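To make the custom document properties option more concrete, here is a rough Python sketch of the chunking it would involve. The 255-character limit is the one mentioned above and is treated as an assumption; the property prefix ZOTERO_META_ and both function names are hypothetical. A plain dict stands in for Word's property store, so this shows only the splitting and reassembly, not any Word API.

```python
CHUNK_SIZE = 255  # assumed per-property length limit, as described above


def to_properties(metadata, prefix="ZOTERO_META_"):
    """Split a metadata string into numbered properties of at most CHUNK_SIZE characters."""
    chunks = [metadata[i:i + CHUNK_SIZE]
              for i in range(0, len(metadata), CHUNK_SIZE)]
    # Number from 1 so the reader can stop at the first missing index.
    return {f"{prefix}{n}": chunk for n, chunk in enumerate(chunks, start=1)}


def from_properties(props, prefix="ZOTERO_META_"):
    """Reassemble the metadata string by reading numbered properties in order."""
    parts = []
    n = 1
    while f"{prefix}{n}" in props:
        parts.append(props[f"{prefix}{n}"])
        n += 1
    return "".join(parts)
```

Reading has to walk the properties one by one until an index is missing, which is where the speed hit on the initial read would come from.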
I welcome advice regarding these options. In OpenOffice, of course, we
can just use the RDF support which you advocated and which appears to be
included in OO 3.0 (although I haven’t tested it).
Simon