Using citeproc implementation for third-party applications

Hello list,

Some quick feedback from someone who is trying to integrate one of the existing implementations of citeproc into a PHP application. My sense is that it might be too early for that, and that I might be asking too much at this point. I have been struggling for a while now, trying the Haskell, JavaScript, and Python implementations in turn, without getting any results (I don’t know Ruby, so I shied away from that). The biggest problem is that there aren’t any tutorials that guide you from implementation to coding a simple example.

I have checked out the whole SVN tree and installed the Haskell implementation using cabal (as suggested by Bruce). The JavaScript implementation has a working test suite, which runs on my computer, but I am lost as to how I would proceed if I have, for example, a JSON data structure sent by the server to the client. Nor do I see how I would call a Rhino script on the server, which I could do from PHP through the shell. The same is true for the Haskell and Python implementations.

What would really be great is if each implementation had a simple test script (a kind of CSL-“hello world”) which can be called from the command line and which can be used as a blueprint to write your own hooks for your application. I know that writing documentation at this point is aiming at a moving target. But if there was an updated script like that, third-party developers could move with the target.

Somewhat lost,

Christian
View this message in context: http://n2.nabble.com/Using-citeproc-implementation-for-third-party-applications-tp2603902p2603902.html
Sent from the xbiblio-devel mailing list archive at Nabble.com.

biggest problem is that there aren’t any tutorials that guide you from implementation to coding a simple example.

I meant “… that guide you from installation …”

Hi, Christian,

The citeproc-js code is getting there, but it’s not ready for rollout,
and it’s true that I haven’t given much thought to what the deployment
packaging will look like. Completion of the full CSL feature set is
probably two or three months away, if things continue to go smoothly.
Andrea’s Haskell processor is your choice for an early solution, I
think.

My only JS experience is in the Rhino environment in which citeproc-js
is being written. My only server experience is with fairly primitive
stuff in Apache and fairly sophisticated stuff in Plone; I’m not a
very useful source of advice on PHP integration, I’m afraid. From
what I’ve heard in passing, though, I’d guess that tomcat would be an
appropriate server platform for it, since it would allow you to
instantiate a citeproc-js formatter for the duration of a session, as
it’s designed to work. I don’t think running it from shell for each
transaction would be a happy experience; a citeproc-js formatter has
to be built (compiled, sort of) before it’s run. The runtime instance
should be fast, but the build stage will be slow, and you would
probably run into latency issues fairly quickly.
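
The build-once, reuse-many pattern described above can be sketched in Python as follows. This is only an illustration of the idea: `build_formatter` is a hypothetical stand-in for the slow build step, not part of citeproc-js, and the output format is a toy. The cache is an in-process one from `functools.lru_cache`, so it only helps in a long-running process, which is exactly why a fresh shell call per transaction would pay the build cost every time.

```python
import functools

BUILD_CALLS = 0  # counts how many times the slow build step really runs


@functools.lru_cache(maxsize=None)
def build_formatter(style_name):
    """Hypothetical stand-in for the expensive build of a formatter."""
    global BUILD_CALLS
    BUILD_CALLS += 1

    def formatter(item):
        # Toy output only; a real processor would apply the CSL style here.
        return "[%s] %s (%s)" % (style_name, item["title"], item["year"])

    return formatter


def format_item(style_name, item):
    """Reuse the cached formatter instead of rebuilding it per request."""
    return build_formatter(style_name)(item)


if __name__ == "__main__":
    for year in ("2008", "2009"):
        print(format_item("apa", {"title": "Example", "year": year}))
    print("builds:", BUILD_CALLS)
```

Both calls in the demo share one formatter, so the build counter stays at one however many items are formatted with the same style.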

Your suggestion of a Hello World example is a good one, though, and
I’ll keep that in mind as I get closer to the finish line on this
thing.

Frank

I have checked out the whole SVN tree and installed the Haskell implementation using cabal (as suggested by Bruce). The javascript implementation has a working test suite, which runs on my computer, but I am lost how I would proceed if I have, for example, a Json datastructure sent by the server to the client. Or how I would call a rhino script on the server which I could do from PHP through the shell. The same is true for the Haskell and Python implementation.

This sort of goes back to my common API question from a while back.
Perhaps this is a way to back into this. To wit, I’ve created a wiki
page for this:

https://apps.sourceforge.net/trac/xbiblio/wiki/CSLDeployment

You need to be a project member to edit the wiki, but I’d like you,
Christian, to edit this the way you would hope it would work, and get
back to us. I’ve done that for the Python version (though quickly, and
so subject to change).

What would really be great is if each implementation had a simple test script (a kind of CSL-“hello world”) which can be called from the command line and which can be used as a blueprint to write your own hooks for your application. I know that writing documentation at this point is aiming at a moving target. But if there was an updated script like that, third-party developers could move with the target.

Agreed. Along with the test suite, we really need that for 1.0.

Bruce

The citeproc-js code is getting there, but it’s not ready for rollout,
and it’s true that I haven’t given much thought to what the deployment
packaging will look like. Completion of the full CSL feature set is
probably two or three months away, if things continue to go smoothly.
Andrea’s Haskell processor is your choice for an early solution, I
think.

Andrea’s implementation would work similarly to what I’ve suggested for
the Python version. But it would indeed be good for him to show us
exactly how it should be done.

My only JS experience is in the Rhino environment in which citeproc-js
is being written.

So for JS, there are two options: client and server.

I’m not sure of the use cases for the client end, but I’d expect that it
would work roughly like what you see in my pseudo python version,
except that the data would get loaded as JSON, and perhaps also via an
AJAX server call.

So I’d expect a function to run, say, a bibliography, but that you’re
not feeding it a traditional object (e.g. class instance), but rather
a JSON data structure.
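
To make that concrete, here is a minimal sketch in Python of a function that takes a JSON string rather than an object instance and returns a bibliography. The field names (`author`, `year`, `title`), the sort order, and the output layout are all assumptions made for illustration; nothing here applies a real CSL style.

```python
import json


def format_bibliography(json_text):
    """Return one formatted line per reference in a JSON array.

    The field names ("author", "year", "title") are assumptions made for
    this sketch; the thread does not define an input schema.
    """
    references = json.loads(json_text)
    # Sort by author, then year, as a stand-in for real CSL sorting rules.
    references.sort(key=lambda ref: (ref["author"], ref["year"]))
    return ["%s (%s). %s." % (ref["author"], ref["year"], ref["title"])
            for ref in references]


if __name__ == "__main__":
    data = json.dumps([
        {"author": "Smith, J.", "year": 2008, "title": "Second example"},
        {"author": "Doe, A.", "year": 2009, "title": "First example"},
    ])
    for line in format_bibliography(data):
        print(line)
```

The same shape would apply on the client: the caller hands over plain data, and the processor owns all knowledge of the style.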

Bruce

Hello list,

a quick feedback by someone who is trying to integrate one of the
existing implementations of citeproc into a php application. My
sense is that it might be too early for that, and that I might be
asking too much at this point. I have been struggling for a while
now, trying the haskell, javascript, and python implementation in
turn, without getting any results (I don’t know ruby, so I shied
away from that). The biggest problem is that there aren’t any
tutorials that guide you from implementation to coding a simple
example.

Actually here you can find a minimal example:
http://code.haskell.org/citeproc-hs/docs/Text-CSL.html#2

together with the haskell implementation API.

If you save it as test.hs, you can run it with:
runhaskell test.hs

or compile it to native code with:
ghc --make test.hs
and run
./test

I have checked out the whole SVN tree and installed the Haskell
implementation using cabal (as suggested by Bruce). The javascript
implementation has a working test suite, which runs on my computer,
but I am lost how I would proceed if I have, for example, a Json
datastructure sent by the server to the client. Or how I would call
a rhino script on the server which I could do from PHP through the
shell. The same is true for the Haskell and Python implementation.

What would really be great is if each implementation had a simple
test script (a kind of CSL-“hello world”) which can be called from
the command line and which can be used as a blueprint to write your
own hooks for your application. I know that writing documentation at
this point is aiming at a moving target. But if there was an updated
script like that, third-party developers could move with the target.

Somewhat lost,

You are right but, and now I’m speaking only for the Haskell code, it
is still immature and subject to rapid development. While I’m
documenting the code I need the stuff to stabilize a bit before
writing end-user documentation (otherwise it would become obsolete
very shortly).

Anyway I’m really willing to help you if you want.

Best,
Andrea

a quick feedback by someone who is trying to integrate one of the existing implementations of citeproc into a php application.

I am certainly encouraged re. the remarks on the
javascript/python/haskell implementations that came in response to
your query. Ideally, we’ll get a native citeproc-php eventually.

The haskell implementation, while not perfect, works with pandoc very
well. It is fairly trivial to use exec() on a server with these tools
if you don’t mind these “heavy” dependencies. I’ve done this, but
haven’t released anything: we don’t want our webapp to be dependent on
these (it is distributed, rather than centrally hosted).

I think Matthias and I had kicked around the idea of having a central
server that ran pandoc+citeproc-hs & bibutils in order to process
citations for other webapps. Perhaps it is still a reasonably good
idea to have some kind of central server for this in the short-term
(especially if the more-mature javascript code can be used
(server-side) instead).

–Rick

Ron Jerome mentioned he’d made some progress on a php version modeled
more on the python and ruby versions than the earlier Zotero approach.
I think he managed to get a CSL style parsed into a PHP object. But a)
that’s not actually the hard part (it’s the processing functions), and
b) he got distracted by other stuff. Would be good to get him to put
this up on GitHub or something.

Bruce

Hi everybody

Thanks for all the information. I’ll use this message to respond to all of the replies at once so that the discussion does not fragment too much.

My only JS experience is in the Rhino environment in which citeproc-js
is being written. My only server experience is with fairly primitive
stuff in Apache and fairly sophisticated stuff in Plone; I’m not a
very useful source of advice on PHP integration, I’m afraid. From
what I’ve heard in passing, though, I’d guess that tomcat would be an
appropriate server platform for it, since it would allow you to
instantiate a citeproc-js formatter for the duration of a session, as
it’s designed to work. I don’t think running it from shell for each
transaction would be a happy experience; a citeproc-js formatter has
to be built (compiled, sort of) before it’s run. The runtime instance
should be fast, but the build stage will be slow, and you would
probably run into latency issues fairly quickly.

Frank, the Java dependency is exactly the same problem that I had when I first tried Bruce’s XSLT2 implementation two years ago and it was dependent on the Saxon XSLT2 processor. The problem is that I want my application to run even on small servers, like a thinly-powered virtual server (that’s the whole point of sticking with PHP), which excludes Java application servers like Tomcat. I wonder, though: aren’t there any standalone binary JavaScript interpreters that one could use?

This sort of goes back to my common API question from awhile back.
Perhaps this is a way to back into this. To wit, I’ve created a wiki
page for this:

https://apps.sourceforge.net/trac/xbiblio/wiki/CSLDeployment

Bruce, that’s exactly what we need! Thanks. I’ll test all of the solutions posted there and will give you feedback.

So for JS, there’s two options: client and server.

I’m not sure the use cases for the client end, but I’d expect that it
would work roughly like what you see in my pseudo python version,
except that the data would get loaded as JSON, and perhaps also via an
AJAX server call.

So I’d expect a function to run, say, a bibliography, but that you’re
not feeding it a traditional object (e.g. class instance), but rather
a JSON data structure.

Bruce

Yes, I’d be interested in that, too. This would solve the latency issues. I have a bibliography application which keeps the currently selected bibliographic data in memory on the client anyway. Instead of the round trip to the server, the bibliography could be generated on the fly in the browser and written into a new window (or the system clipboard, if supported).

Actually here you can find a minimal example:
http://code.haskell.org/citeproc-hs/docs/Text-CSL.html#2

together with the haskell implementation API.

If you save it as test.hs, you can run it with:
runhaskell test.hs

or compile it to native code with:
ghc --make test.hs
and run
./test

You are right but, and now I’m speaking only for the Haskell code, it
is still immature and subject to rapid development. While I’m
documenting the code I need the stuff to stabilize a bit before
writing end-user documentation (otherwise it would become obsolete
very shortly).

Anyway I’m really willing to help you if you want.

Andrea, thank you. My problem is that I would need a binary that I can configure with command-line options. I don’t know Haskell, so I cannot write it myself. Maybe this would be a useful addition to your library that would not need much documentation? Something that can be compiled on each platform and then called like

citeproc-hs --style apa.csl --infile myreferences.txt --bibdata mybibdb.bib --format bibtex --outfile bibliography.html --format html

or something like this? The command line parameters would not need to change at all even when you refactor your code.
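
A front end of that kind might look roughly like the sketch below, written in Python rather than Haskell purely to show the interface. Everything in it is hypothetical: the program name `citeproc-cli` is made up, the two `--format` options from the proposal are split into `--bibformat` and `--outformat` so they can be told apart, and no CSL processor is actually called.

```python
import argparse


def build_parser():
    """Command-line interface modeled on the invocation proposed above."""
    parser = argparse.ArgumentParser(prog="citeproc-cli")
    parser.add_argument("--style", required=True, help="CSL style file")
    parser.add_argument("--infile", required=True, help="document with citations")
    parser.add_argument("--bibdata", required=True, help="bibliographic database")
    parser.add_argument("--bibformat", default="bibtex", help="format of --bibdata")
    parser.add_argument("--outfile", required=True, help="where to write the result")
    parser.add_argument("--outformat", default="html", help="output format")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real tool would call a CSL processor here; this sketch only echoes
    # the parsed options so the interface itself can be tried out.
    print("style=%s bibdata=%s (%s) -> %s (%s)" % (
        args.style, args.bibdata, args.bibformat, args.outfile, args.outformat))
    return args


if __name__ == "__main__":
    # Demo invocation with made-up file names; call main() with no
    # arguments to read the real command line instead.
    main(["--style", "apa.csl", "--infile", "myreferences.txt",
          "--bibdata", "mybibdb.bib", "--outfile", "bibliography.html"])
```

Because callers only see the option names, the code behind `main` could be refactored freely without breaking scripts that shell out to it.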

I am certainly encouraged re. the remarks on the
javascript/python/haskell implementations that came in response to
your query. Ideally, we’ll get a native citeproc-php eventually.

The haskell implementation, while not perfect, works with pandoc very
well. It is fairly trivial to use exec() on a server with these tools
if you don’t mind these “heavy” dependencies. I’ve done this, but
haven’t released anything: we don’t want our webapp to be dependent on
these (it is distributed, rather than centrally hosted).

Rick, this is exactly what I want to do. The Haskell implementation compiled as a binary would probably be very fast (although I have no idea how that compares to Python or Ruby). And we might not need a PHP-only citeproc at all (except of course in situations where PHP is prevented from calling executables for security reasons).

I think Matthias and I had kicked around the idea of having a central
server that ran pandoc+citeproc-hs & bibutils in order to process
citations for other webapps. Perhaps it is still a reasonably good
idea to have some kind of central server for this in the short-term
(especially if the more-mature javascript code can be used
(server-side) instead).

This would of course be fabulous, except that you probably do not want other people to use your bandwidth and CPU to process their bibliographies? ;-) It could be limited by distributing API keys (or you might want to make it a subscription service).

Ron Jerome mentioned he’d made some progress on a php version modeled
more on the python and ruby versions than the earlier Zotero approach.
I think he managed to get a CSL style parsed into a PHP object. But a)
that’s not actually the hard part (it’s the processing functions), and
b) he got distracted by other stuff. Would be good to get him to put
this up on GitHub or something.

Of course, I’d be thrilled to have a PHP-only implementation since that is what I am fluent in. But as I said, maybe it would be enough for a start to write a PHP bridge to one or more of the other implementations.

Again, thanks for all of your suggestions, and I am happy to try out everything that you throw my way.

Christian

Frank, the java dependency is exactly the same problem that I had when I first tried Bruce’s XSLT2 implementation two years ago and it was dependent on the Saxon XSLT2 processor. The problem is that I want my application to run even on small servers (like a thinly-powered virtual server) - that’s the whole point of sticking with PHP - , which excludes Java Application Servers like Tomcat etc. I wonder, though, aren’t there any standalone binary Javascript interpreters that one could use?

Yes, try Mozilla’s standalone interpreters, such as spidermonkey (they
have a newer and faster VM, though I don’t think that’s currently
packaged as a separate standalone).

Bruce, that’s exactly what we need! Thanks. I’ll test all of the solutions posted there and will give you feedback.

OK, but note: the Python one is just an example; it won’t work. I
should probably add that qualifier to the page :-)

Andrea’s example should work though.

Rick, this is exactly what I want to do. The Haskell implementation compiled as a binary would probably be very fast (although I have no idea how that compares to python or ruby). And we might not need a php-only citeproc at all (except of course in situations where PHP is prevented from calling executables for security reasons).

My guess is the Haskell version would be substantially faster than all
other versions, given that Haskell is a compiled language that offers
performance almost on par with C.

Bruce

My guess is the Haskell version would be substantially faster than all
other versions, given that Haskell is a compiled language that offers
performance almost on par with C.

Though I will say that in my experiments using python’s lxml library
(based on libxml underneath), it will parse a CSL style into a native
Python Style object essentially instantaneously. Ruby has a similar
library (Nokogiri), which I’d imagine would see similar speed.
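
For anyone curious what that parsing step looks like, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies, and the `Style` class, its fields, and the toy style document are invented for this example; they are not the actual Python implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

CSL_NS = "{http://purl.org/net/xbiblio/csl}"  # CSL XML namespace


@dataclass
class Style:
    title: str
    style_class: str
    macro_names: list


def parse_style(csl_text):
    """Read a few top-level facts from a CSL document into a Style object."""
    root = ET.fromstring(csl_text)
    title = root.findtext(CSL_NS + "info/" + CSL_NS + "title", default="")
    macros = [macro.get("name") for macro in root.findall(CSL_NS + "macro")]
    return Style(title=title, style_class=root.get("class", ""),
                 macro_names=macros)


# Toy style used only for this sketch; real styles are far larger.
SAMPLE = """<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text">
  <info><title>Toy style</title></info>
  <macro name="author"><names variable="author"/></macro>
</style>"""

if __name__ == "__main__":
    print(parse_style(SAMPLE))
```

As noted elsewhere in this thread, getting the style into an object is the easy part; the processing functions that apply it are where the work is.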

Bruce

Andrea, thank you. My problem is that I would need a binary that I can configure with command-line options. I don’t know Haskell, so I cannot write it myself. Maybe this would be a useful addition to your library that would not need much documentation? Something that can be compiled on each platform and then called like

citeproc-hs --style apa.csl --infile myreferences.txt --bibdata mybibdb.bib --format bibtex --outfile bibliography.html --format html

or something like this?

Why not use pandoc, which already has similar command line options?
From Andrea’s example:
pandoc --csl cslStyle.csl --biblio modsCollection.mods text.markdown > text.html

Andrea’s site also has instructions on building pandoc with citeproc-hs
support.

This is ready to go right now (as I said, I’ve used it). Your webapp
would need to generate ‘text.markdown’ so that pandoc knows which
references to cite, and would need to write the reference data to a
modsCollection.mods MODS XML
file. MODS XML is still one of the richest formats and has reasonably
good adoption, so it isn’t a bad idea that your app would support it
natively anyway. However, you could use bibutils to convert from the
BibTeX file you can apparently already generate if you absolutely needed to.
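
As a rough sketch of such a bridge, here is the same call made from Python's `subprocess` module; a PHP version using exec() would follow the same shape. The option names are the ones quoted in this thread and may differ in other pandoc releases, the function names are made up for the example, and `render` assumes a pandoc build with citeproc-hs support is on the PATH.

```python
import subprocess


def pandoc_command(style, biblio, source):
    """Build the argument list for the pandoc call quoted in the thread."""
    return ["pandoc", "--csl", style, "--biblio", biblio, source]


def render(style, biblio, source, target):
    """Run pandoc and write its standard output to `target`.

    Requires a pandoc build with citeproc-hs support on the PATH.
    """
    with open(target, "w", encoding="utf-8") as out:
        # An argument list (no shell) avoids quoting problems with file names.
        subprocess.run(pandoc_command(style, biblio, source),
                       stdout=out, check=True)


if __name__ == "__main__":
    # Only prints the command; call render() to actually run pandoc.
    print(" ".join(pandoc_command("cslStyle.csl", "modsCollection.mods",
                                  "text.markdown")))
```

Passing the arguments as a list rather than a single shell string matters in a webapp, since file names may come from user input.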

The Haskell implementation compiled as a binary would probably be very fast

Yes, it is.

I think Matthias and I had kicked around the idea of having a central
server that ran pandoc+citeproc-hs & bibutils in order to process
citations for other webapps.

…you probably do not want other people to use your bandwidth and CPU to process their bibliographies?

I suspect that it’d be fine (especially in the near-term). There is
(was?) a public bibutils server & there are a few small citation
processing servers self-hosted by .edus. But, as you say, we could add
restrictions if needed. The F/OSS webapps (refbase, bibliograph, etc.)
could certainly all get by with a single server to do this right now.

I don’t want to volunteer the old .edu server that I still help
maintain, unless it is the only option (e.g. others agree that it would
be useful, but nobody else volunteers).

maybe it would be enough for the start to write a php-bridge to one or more of the other implementations.

And, as above, the bridge is really trivial if you are using exec() w/
pandoc.

–Rick

Writing something like this would be trivial, except for the fact that
the html format is not supported yet (due for 0.3 and needed for the
test suite).

citeproc-hs is a library: writing C bindings should be easy (BTW there
are pandoc C bindings, so…:-). Which means that writing a PHP
extension that calls citeproc-hs directly should be easy too (even
though I’m not sure I’d be able to do it). Easier than writing a PHP
implementation from scratch I think (I’ve never read the code of the
implementation there were rumors about, even though I’d have loved
to).

The API is going to change a bit in the next few weeks (needed by the
most recent additions to CSL), but I would like to have a more stable
API along with CSL 1.0, so I would expect something like what you
would like to see could be feasible by the end of the summer,
reasonably.

Andrea

Why not use pandoc, which already has similar command line options?
From Andrea’s example:
pandoc --csl cslStyle.csl --biblio modsCollection.mods text.markdown > text.html

Andrea’s site also has instructions on building pandoc with citeproc-hs
support.

This is ready to go right now (as I said, I’ve used it). Your webapp
would need to make ‘text.markdown’ to know which references to cite and
would need to write the reference data to a modsCollection.mods MODS XML
file. MODS XML is still one of the richest formats and has reasonably
good adoption, so it isn’t a bad idea that your app would support it
natively anyway. However, you could use bibutils to convert from the
BibTeX file you can apparently already generate if you absolutely needed to.

Richard, seems like I am getting closer! Maybe that’s the way to go for a PHP bridge at this point. I’ll do some experiments, but of course, if you have a fully working example, it would be great if you could put it into the wiki.

I think Matthias and I had kicked around the idea of having a central
server that ran pandoc+citeproc-hs & bibutils in order to process
citations for other webapps.

…you probably do not want other people to use your bandwidth and CPU to process their bibliographies?

I suspect that it’d be fine (especially in the near-term). There is
(was?) a public bibutils server & there are a few small citation
processing servers self-hosted by .edus. But, as you say, we could add
restrictions if needed. The F/OSS webapps (refbase, bibliograph, etc.)
could certainly all get by with a single server to do this right now.

I don’t want to volunteer the old .edu server that I still help
maintain, unless it is the only option (e.g. others agree that it would
be useful, but nobody else volunteers).

I’d certainly be grateful if there was a server that could be used to format stuff with a REST API. I’d volunteer a (rather slow) server myself for the community, actually (a Debian Sarge virtual server) if you would take care of the installation - if a normal user account would suffice to install the stuff for the local user. You can contact me off-line about it.

Christian