Mendeley Word and OpenOffice plugins

Hi everyone,

I’m a software engineer at Mendeley (www.mendeley.com) and new to this
mailing list. Since Bruce has expressed interest in our software I thought
I’d give a brief overview of our current work and ideas for the future
regarding word processor integration.

I’m currently writing a plugin for OpenOffice Writer which will work
similarly to the existing MS Word plugin. It uses CSL to format the
citations and is very similar to the Zotero plugin, although unfortunately
documents will not be compatible between Mendeley and Zotero. (Bruce has a
blog post about this:
http://community.muohio.edu/blogs/darcusb/archives/2009/03/01/the-babel-of-citations
)

In the long term we would like to use an open document format which supports
the sharing of documents created with tools like Mendeley, Zotero and
Endnote. Does anyone know if OOXML is up to the job and well supported
enough yet? However, due to a need to support Word 2003 we are probably
going to use bookmarks to share citation data (document UUIDs) between Word
and OpenOffice documents the way Zotero does for the time being. This is not
ideal and if anyone knows of a better way of sharing citations please let me
know.

Currently, the cited documents’ UUIDs are stored in the document and the
metadata is retrieved from Mendeley Desktop by the plugin. Documents with
citations can be shared between different Mendeley users by placing the
cited documents in a shared group. In future, to make the documents more
portable it would be nice to store all the metadata in the document itself
(although this raises possible tricky issues of synchronisation) and
possibly the current CSL style sheet also. I notice the OpenOffice
developers are planning something like this (
http://wiki.services.openoffice.org/wiki/Bibliographic/Developer_Page/API_Enhancements#service_Metadata_and_User_Defined_Data_Access)
although again I’m not sure how far it is from being implemented and
useable.

For the moment I’m pretty busy just getting the OpenOffice plugin working
but it’s good to have a general plan for the future. If anyone has any
suggestions for what they think we should be doing regarding word processor
integration I’d love to hear them.

Steve

--
Steve Ridout
Software Engineer
Mendeley

I’m a software engineer at Mendeley (www.mendeley.com) and new to this
mailing list. Since Bruce has expressed interest in our software I thought
I’d give a brief overview of our current work and ideas for the future
regarding word processor integration.

Hi Steve.

Before I move on, an aside: you guys might take a look at the work
that Frank has been doing on a test suite. It would no doubt be
helpful for all of us to use the same tests.

I’m currently writing a plugin for OpenOffice Writer which will work
similarly to the existing MS Word plugin. It uses CSL to format the
citations and is very similar to the Zotero plugin, although unfortunately
documents will not be compatible between Mendeley and Zotero. (Bruce has a
blog post about this:
http://community.muohio.edu/blogs/darcusb/archives/2009/03/01/the-babel-of-citations)

Yeah, we really need to solve this; it’s insane that this stuff just
doesn’t work. I know a couple of people at MS that I’ll check with on
the status on their end.

In the long term we would like to use an open document format which supports
the sharing of documents created with tools like Mendeley, Zotero and
Endnote. Does anyone know if OOXML is up to the job and well supported
enough yet?

My understanding is that OOXML is more-or-less up to the task (though
the bibo RDF work I’ve helped with has a richer model), but that there
are some limitations in the API that Office 2007/2008 exposes that
make it tricky to use that. I’ll check on where this stands.

However, due to a need to support Word 2003 we are probably
going to use bookmarks to share citation data (document UUIDs) between Word
and OpenOffice documents the way Zotero does for the time being. This is not
ideal and if anyone knows of a better way of sharing citations please let me
know.

Currently, the cited documents’ UUIDs are stored in the document and the
metadata is retrieved from Mendeley Desktop by the plugin.

Yeah, this is how Zotero currently works, though I think they actually
use a local database key.

I think the trick is that either:

a) the ids need to be totally consistent across implementations
(impossible), or …

b) there needs to be additional metadata

I’ve long suggested that UUIDs should be URIs, with perhaps simple
rules like (in pseudo-code, and just for illustration):

def get_uri(item_data):
    # Prefer an explicit URI, then fall back to one built from the DOI.
    if item_data.get('uri'):
        return item_data['uri']
    elif item_data.get('doi'):
        return 'info:doi:' + item_data['doi']
    else:
        return 'whatever; you get the idea …'

Perhaps you then also add, say, a title slug (“some-title”), and a year issued.

That gives enough to identify items without reliance on any particular
database or software.
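
To make the "URI plus title slug plus year" idea a bit more concrete, here is a minimal sketch. The field names ("uri", "doi", "title", "issued"), the helper names and the slug rule are assumptions made up for illustration, not anything Mendeley or Zotero actually stores, and the "info:doi:" prefix is simply copied from the illustration above.

```python
import re


def title_slug(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def get_identifier(item_data):
    """Build a software-independent identifier record for a cited item.

    All keys here are hypothetical; "uri" is None when neither a URI
    nor a DOI is available (for example, an unpublished manuscript).
    """
    if item_data.get("uri"):
        uri = item_data["uri"]
    elif item_data.get("doi"):
        # Prefix copied from the illustration in the thread.
        uri = "info:doi:" + item_data["doi"]
    else:
        uri = None
    return {
        "uri": uri,
        "slug": title_slug(item_data.get("title", "")),
        "year": item_data.get("issued"),
    }


if __name__ == "__main__":
    print(get_identifier({"doi": "10.1000/182", "title": "Some Title", "issued": 2009}))
```

The slug and year are kept even when a URI exists, so that a second tool has something to match on if it does not recognise the URI.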

Documents with citations can be shared between different Mendeley users by placing the
cited documents in a shared group.

That’s what Zotero is planning to add soon. Makes one wonder:
shouldn’t people be able to share stuff (profiles, items, etc.)
between Mendeley and Zotero?

In future, to make the documents more
portable it would be nice to store all the metadata in the document itself
(although this raises possible tricky issues of synchronisation) and
possibly the current CSL style sheet also. I notice the OpenOffice
developers are planning something like this
(http://wiki.services.openoffice.org/wiki/Bibliographic/Developer_Page/API_Enhancements#service_Metadata_and_User_Defined_Data_Access)
although again I’m not sure how far it is from being implemented and
useable.

I think the Zotero people have been looking at this stuff too (the
data synchronization). Not sure where they are with it.

As for CSL issues, that’s related to the recent discussion of CSL
updating. I think with that sort of approach, this wouldn’t be a
problem. I would probably store just the URI for the style in the
document.

For the moment I’m pretty busy just getting the OpenOffice plugin working
but it’s good to have a general plan for the future. If anyone has any
suggestions for what they think we should be doing regarding word processor
integration I’d love to hear them.

My suggestion is just to settle on a use case to work towards: that
Mendeley and Zotero users can collaborate on the same document without
headache.

Some of my suggestions above move towards that.

Bruce

Oh, also:

What would happen if processing got implemented at the WP level, and
plug-ins only passed over (standard) metadata, which got embedded
(again in standard ways) in the document?

Bruce

Yeah, this is how Zotero currently works, though I think they actually
use a local database key.

Right, and this means that Zotero-created documents aren’t even transferable
between the same user’s different computers.

I think the trick is that either:

a) the ids need to be totally consistent across implementations
(impossible), or …

b) there needs to be additional metadata

I’ve long suggested that UUIDs should be URIs, with perhaps simple
rules like (in pseudo-code, and just for illustration):

def get_uri(item_data):
    # Prefer an explicit URI, then fall back to one built from the DOI.
    if item_data.get('uri'):
        return item_data['uri']
    elif item_data.get('doi'):
        return 'info:doi:' + item_data['doi']
    else:
        return 'whatever; you get the idea …'

Perhaps you then also add, say, a title slug (“some-title”), and a year
issued.

That gives enough to identify items without reliance on any particular
database or software.

True, I was thinking perhaps of storing all the document metadata, along with
the Mendeley UUID and any ID associated with Zotero or other software. A URI /
DOI / ISBN would be OK in most circumstances, but what if it’s a
reference to a document which hasn’t been published yet or hasn’t got a URI?

Documents with citations can be shared between different Mendeley users
by placing the
cited documents in a shared group.

That’s what Zotero is planning to add soon. Makes one wonder:
shouldn’t people be able to share stuff (profiles, items, etc.)
between Mendeley and Zotero?

We are talking about implementing some form of synching between Mendeley and
Zotero although when this’ll happen I’m not sure.

In future, to make the documents more
portable it would be nice to store all the metadata in the document
itself
(although this raises possible tricky issues of synchronisation) and
possibly the current CSL style sheet also. I notice the OpenOffice
developers are planning something like this
(
http://wiki.services.openoffice.org/wiki/Bibliographic/Developer_Page/API_Enhancements#service_Metadata_and_User_Defined_Data_Access
)
although again I’m not sure how far it is from being implemented and
useable.

I think the Zotero people have been looking at this stuff too (the
data synchronization). Not sure where they are with it.

As for CSL issues, that’s related to the recent discussion of CSL
updating. I think with that sort of approach, this wouldn’t be a
problem. I would probably store just the URI for the style in the
document.

That’s what we currently do and you’re right that it’s probably all that is
needed; the only problem would be if a user has edited the CSL file themselves
and not uploaded it to a public repository.

For the moment I’m pretty busy just getting the OpenOffice plugin working
but it’s good to have a general plan for the future. If anyone has any
suggestions for what they think we should be doing regarding word
processor
integration I’d love to hear them.

My suggestion is just to settle on a use case to work towards: that
Mendeley and Zotero users can collaborate on the same document without
headache.

We would like this too; once we get a stable OpenOffice Mendeley plugin
working we’ll look into this.

What would happen if processing got implemented at the WP level, and
plug-ins only passed over (standard) metadata, which got embedded
(again in standard ways) in the document?

At the moment, Mendeley and Zotero have different CSL formatting code:
Zotero’s is written in JavaScript and ours is in C++. If the same open
source formatting code could be used in both Word and OpenOffice CSL
formatters at the WP level that would be great since everyone could
contribute and benefit from improvements to the same code base. Not sure how
likely that is though.

2009/4/7 Bruce D’Arcus <@Bruce_D_Arcus1>

I’ve long suggested that UUIDs should be URIs, with perhaps simple
rules like (in pseudo-code, and just for illustration):

def get_uri(item_data):
    # Prefer an explicit URI, then fall back to one built from the DOI.
    if item_data.get('uri'):
        return item_data['uri']
    elif item_data.get('doi'):
        return 'info:doi:' + item_data['doi']
    else:
        return 'whatever; you get the idea …'

Perhaps you then also add, say, a title slug (“some-title”), and a year
issued.

That gives enough to identify items without reliance on any particular
database or software.

True, I was thinking perhaps of storing all the document metadata, along with
the Mendeley UUID and any ID associated with Zotero or other software. A URI /
DOI / ISBN would be OK in most circumstances, but what if it’s a
reference to a document which hasn’t been published yet or hasn’t got a URI?

Right; my for-illustration function above would need to be a little
more complex.

Documents with citations can be shared between different Mendeley users
by placing the
cited documents in a shared group.

That’s what Zotero is planning to add soon. Makes one wonder:
shouldn’t people be able to share stuff (profiles, items, etc.)
between Mendeley and Zotero?

We are talking about implementing some form of synching between Mendeley and
Zotero although when this’ll happen I’m not sure.

My suggestion to both projects is to look at what’s going on in the
social networking world, with FOAF, etc. For example, my profile at
identica can be exported as FOAF:

http://identi.ca/bdarcus

My homepage also exposes its data as FOAF (using both embedded RDFa,
and RDF/XML):

$ curl -H "Accept: application/rdf+xml" http://bruce.darcus.name/about#me

You can imagine that generalizing some of that to a site would facilitate
this sort of interop.

As for CSL issues, that’s related to the recent discussion of CSL
updating. I think with that sort of approach, this wouldn’t be a
problem. I would probably store just the URI for the style in the
document.

That’s what we currently do and you’re right that it’s probably all that is
needed; the only problem would be if a user has edited the CSL file themselves
and not uploaded it to a public repository.

Yes, all the more reason to discourage it ;-)

What would happen if processing got implemented at the WP level, and
plug-ins only passed over (standard) metadata, which got embedded
(again in standard ways) in the document?

At the moment, Mendeley and Zotero have different CSL formatting code:
Zotero’s is written in JavaScript and ours is in C++. If the same open
source formatting code could be used in both Word and OpenOffice CSL
formatters at the WP level that would be great since everyone could
contribute and benefit from improvements to the same code base. Not sure how
likely that is though.

What roadblocks do you see to that? Is there any particular opposition
on your end to open sourcing that code?

Bruce

At the moment, Mendeley and Zotero have different CSL formatting code:
Zotero’s is written in JavaScript and ours is in C++. If the same open
source formatting code could be used in both Word and OpenOffice CSL
formatters at the WP level that would be great since everyone could
contribute and benefit from improvements to the same code base. Not sure how
likely that is though.

I don’t know; long-term maintenance of these things shouldn’t be much
of a resource drain. It’s a lot of work to build an implementation,
but the feature set should start to stabilize fairly soon, as CSL
moves to 1.0. The main thing is just to ensure that implementations
can be used as drop-in replacements of one another. Is there an API
for driving the Mendeley CSL binary, so that users can verify it
against test cases for a particular schema version independently?
That would presumably be a threshold requirement for calling it CSL
1.0 conformant.
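
As a rough sketch of what "verifying against test cases independently" could look like: if each implementation is wrapped as a function taking a style and a list of items and returning the formatted citation, the same cases can be run against any of them. Everything below (the `Case` fields, the `Formatter` signature, the toy formatter and its output format) is hypothetical and only for illustration; it is not the shared test suite's format or any existing Mendeley or Zotero API.

```python
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    name: str
    style: str          # style URI or ID (hypothetical field)
    items: List[dict]   # item metadata passed to the formatter
    expected: str       # citation text the formatter should produce


# Any implementation wrapped to this (assumed) signature can be tested.
Formatter = Callable[[str, List[dict]], str]


def run_cases(formatter: Formatter, cases: List[Case]) -> List[Tuple[str, str, str]]:
    """Return (case name, expected, actual) for every case that fails."""
    failures = []
    for case in cases:
        actual = formatter(case.style, case.items)
        if actual != case.expected:
            failures.append((case.name, case.expected, actual))
    return failures


def toy_formatter(style: str, items: List[dict]) -> str:
    """Stand-in for a real CSL processor: a fixed author-date format."""
    return "(" + "; ".join(f"{i['author']} {i['year']}" for i in items) + ")"


CASES = [
    Case("single", "toy-author-date",
         [{"author": "Doe", "year": 2009}], "(Doe 2009)"),
    Case("double", "toy-author-date",
         [{"author": "Doe", "year": 2009}, {"author": "Roe", "year": 2008}],
         "(Doe 2009; Roe 2008)"),
]

if __name__ == "__main__":
    print(run_cases(toy_formatter, CASES))
```

Two implementations that both return an empty failure list for the same cases are interchangeable at least as far as those cases go.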

Frank

There isn’t an API at the moment, but this is something we’ll consider
doing, along with possibly open-sourcing the CSL formatting code. This would
take a bit of work to create a separate DLL and to remove dependencies on
the rest of the Mendeley code, but it would definitely be good to be able to
take advantage of a shared test suite. Do you think there would be much
interest in an open source C++/Qt CSL formatting library?

--
Steve Ridout
Software Engineer
Mendeley

What role does Qt play here?

I can’t really gauge how much interest there might be specifically in
terms of, say, potential code contributors. In my experience, it’s
hard to find skilled C++ programmers in this particular space.

OTOH, it certainly would be good if the code was available so that
this sort of processing could be more-or-less easily dropped into
environments like OOo, KWord, and Word. All are primarily C++
environments (and KDE also uses Qt). So I could imagine developers
from those sorts of communities might be willing to help out.

It could also show a good faith effort to tangibly contribute to the
success of CSL in ways that so far we have not seen from Mendeley*,
but which can have all kinds of not-so-direct benefits for your
efforts.

As Frank says, that’s not the only way to achieve this (the test suite
is really important, for example), but I can’t see any particular
value to you to keeping that code closed (notwithstanding the
additional work you need to get it suitable for wider use and
contribution), and a variety of potential advantages in opening it up.

Bruce

* E.g. it’s great Mendeley is using CSL, but not so great that you
  don’t (at least the last I checked) really acknowledge it, or the fact
  that you borrow a lot from Zotero’s work. There are all sorts of
  reasons this can happen that may have nothing to do with conscious
  design, but it’s still not the best approach either for you, or your
  users.

At one point I started writing an editor for creating CSL styles using Qt
since this is a great cross platform toolkit.

Due to time constraints I had to put the project on hold. But another major
reason for not easily being able to move forward was that I realized that a
formatting library would be needed. I would like to see such a library and
might be able to contribute.

Peter

Hello,

Quick introduction - I am another of the engineers working on Mendeley Desktop.

What role does Qt play here?

Qt is used for XML parsing, basic data structures (lists, stacks,
dictionaries) and string manipulation. It also provides a certain
amount of dynamic functionality which is missing in the language
itself. That is, reading and writing properties of an object given
the string name of the property.

Regards,
Robert.

2009/4/10 Peter Hedlund <@Peter_Hedlund>: