citeproc-py setup?

Johan,

It may not be ready to play with yet, but how do you set up citeproc-py with a
standard Python install (not py2app), at least for local testing and playing?

Bruce

cd citeproc-py/citeproc
python citeproc.py [options]

for details:
python citeproc.py -h

basically:
python citeproc.py --csl /path/to/cslfilein --output /path/to/cslfileout

I am trying to use your cslgallery, but it doesn’t get past
“OperationalError at / (no such table: styles_style)”, even though I ran the
syncdb command. (I’m using sqlite3.)

Johan

Doh! Thanks.

When you run syncdb, it should report (to stdout) the list of tables it builds.
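If the syncdb output is unclear, one way to double-check is to ask SQLite directly which tables the database file actually contains. A minimal sketch using Python’s standard sqlite3 module; the helper names are made up for illustration, and you would connect to whatever database file your Django settings point at:

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

def has_table(conn, table_name):
    """True if the given table (e.g. 'styles_style') exists in the database."""
    return table_name in list_tables(conn)
```

For example, `has_table(sqlite3.connect("/path/to/your.db"), "styles_style")`. Note that `sqlite3.connect` silently creates an empty database if the path is wrong, which would also show up as "no tables".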

Perhaps, because I’m using the authentication and admin machinery,
it expects a multi-user db like postgres or mysql?

I’m using it with postgres, FWIW. These directions had me running
postgres nicely in about 10 minutes:

http://www.robbyonrails.com/articles/2008/01/22/installing-ruby-on-rails-and-postgresql-on-os-x-third-edition

In any case, if you have further problems, I’ll give you access to my
installation, though I probably can’t get to it until tomorrow sometime,
or Saturday.

Bruce

BTW, this …

-w …, --citeulike-username your username at CiteULike.org

… is cool!

You might consider making it more generic over time, so that one can
use different services. Perhaps have separate parameters for
“service” and “username”?

Bruce
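To make the service/username suggestion concrete, here is a minimal sketch using Python’s standard argparse module. The `--service` and `--username` flags and the list of services are hypothetical; the only option that exists in the thread is `-w`/`--citeulike-username`:

```python
import argparse

# Hypothetical list of supported services; only CiteULike is mentioned so far.
SUPPORTED_SERVICES = ["citeulike"]

def parse_service_options(argv):
    """Parse separate --service and --username options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="citeproc.py")
    parser.add_argument("--service", choices=SUPPORTED_SERVICES,
                        help="online reference service to fetch records from")
    parser.add_argument("--username",
                        help="your username at the chosen service")
    args = parser.parse_args(argv)
    # A username only makes sense together with a service.
    if args.username and not args.service:
        parser.error("--username requires --service")
    return args
```

Supporting another service would then mean extending `SUPPORTED_SERVICES` rather than adding a new `--<service>-username` flag each time.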

Hey Bruce,

I didn’t get to look at cslgallery again; maybe tomorrow evening.
That particular styles_style table was not reported during syncdb. I
am not keen on installing mysql or postgresql on my machine, but
I could use the mysql server on my website. I think I’ll just look
into the sqlite3 option more. Is there any particular reason you
recommended against it in your setup.example.py?

The CiteULike.org thing was just a quick idea. I guess it should be
possible, but I haven’t looked into it much, nor is that my priority.
My own data is in bibtex, so that’ll be the first format I want to
support. And it is a lot easier to parse than MODS. :-)

Goodnight!

Johan—
http://www.johankool.nl/

That particular styles_style table was not reported during syncdb.

Weird. You could always manually create the table.

Is there any particular reason you
recommended against it in your setup.example.py?

Not particularly; just what I said earlier about the multi-user stuff.

Let me know if you get it working, and I’ll send you the URL for my
install when I have it running again.

Bruce

I’m hoping to finally get back into the Ruby citeproc version in the
next week or so (I had a stab at this with partial success last year).
I’m glad to see a Python version underway, and if possible would like to
collaborate to harmonise the approaches. I have no particularly fixed
ideas about the implementation, but the approach with citeproc-py looks
roughly commensurable. Do you mind if we work together on this?

Regards,

Liam.

Bruce D’Arcus wrote:

Hello Liam,

I do not mind brainstorming together on how to approach and implement
our parsers. I am a bit wary of having others work on CiteProc-Py at
the moment, mostly because it is changing a lot these days, but I
don’t think you had that in mind anyway. You are of course free to
read my code and implement it in a similar manner in Ruby. I would
love to hear where you think my approach is wrong, or could be more
efficient.

I took a quick glance at the Ruby code and realized how little Ruby I
speak. One big thing that might be nice to synchronize is the
command-line options we accept. Then we can share documentation, which
is, after all, the most boring part to write.

Johan—
http://www.johankool.nl/

FWIW, while I’m no expert in either, I think Ruby and Python are more similar
than they are different. The kinds of idioms that work well in one tend to work
well in the other.

It might be worth considering coordinating on an API and tests. What
are the basic classes and associated methods, for example? What
parameters do they take?

Common documentation, of course, can flow from that.

I updated the test JSON file in the “data” directory, BTW. It ought to be
good for testing basic processing (abstracted from the parsing of import
data formats, for example)…

Bruce

BTW, looking at the citeproc.py file, here’s a pretty fundamental
issue that gets at these questions of API. This method:

```python
from xml.sax import make_parser

# Method excerpt from citeproc.py; CSLDocumentHandler is defined elsewhere there.
def loadCSL(self, csl):
    # Create an instance of the handler.
    handler = CSLDocumentHandler()
    # Create an instance of the SAX parser.
    parser = make_parser()
    # Set the content handler.
    parser.setContentHandler(handler)
    inFile = open(csl, 'r')
    # Start the parse.
    parser.parse(inFile)
    # Alternatively, we could directly pass in the file name:
    # parser.parse(csl)
    inFile.close()
    return handler.root
```

As I read this, the method takes a CSL file name. I’ve been thinking
for a while that it should instead take a URI, and that mapping to a
particular file (on the network, or cached locally) should probably
be a separate method.

e.g. I’m thinking stuff like:

c = CiteProcessor(csl="http://zotero.org/styles/apa")

… and maybe something like:

c.changeStyle("http://ex.net/syles/xyz")

BTW, for testing in Python, the doctest support might be good for
this, since you can get the API usage automatically extracted into
documentation while it serves double duty as tests.

Bruce

Oops, that’d be:

c.change_style("http://ex.net/styles/xzy")
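A minimal sketch of that API shape, assuming a style is identified by a URI and that turning the URI into XML text is a separate, swappable step (a network fetch by default, or a lookup in a local cache). The `resolver` parameter and the `fetch_style` function are hypothetical names, not part of citeproc-py:

```python
import urllib.request

def fetch_style(uri):
    """Default resolver: download the CSL document at the given URI."""
    with urllib.request.urlopen(uri) as response:
        return response.read().decode("utf-8")

class CiteProcessor:
    """Hypothetical API sketch: styles are named by URI, resolution is separate."""

    def __init__(self, csl, resolver=None):
        # resolver: any callable mapping a style URI to the CSL XML as text.
        self.resolver = resolver or fetch_style
        self.style_uri = None
        self.style_xml = None
        self.change_style(csl)

    def change_style(self, uri):
        """Switch to a different CSL style, identified by its URI."""
        self.style_xml = self.resolver(uri)
        self.style_uri = uri
```

Passing something like `resolver=local_styles.__getitem__` (a dict of cached styles) keeps tests off the network, which is one argument for keeping the URI-to-file mapping out of the constructor.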

Bruce

Another comment Johan:

In looking through the directory and class organization, and comparing
it to other Python code I’ve seen, wouldn’t it make more sense if a
file like:

 CitationCSLObject.py

… would instead be:

 csl/citation.py

…?*

I.e., drop the redundant “Object”, and create a csl directory.

That would result in a Python import statement like:

import citeproc.csl.citation

Or something like that.

Bruce

* Or even “csl/models/citation.py”.

As I read this, the method takes a CSL file name. I’ve been thinking
for a while that it should instead take a URI, and that mapping to a
particular file (on the network, or cached locally) should probably
be a separate method.

Consider that done.

It can now be either one of these things:

  • a URL of a remote XML file
  • a filename of a local XML file
  • standard input (“-”)
  • the actual XML document, as a string

This worked great:

% python citeproc/citeproc.py --csl http://www.zotero.org/styles/asa
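For reference, a dispatch over those four kinds of input might look roughly like the following. This is a hypothetical sketch, not the actual citeproc-py code; the function name and the rule “input starting with `<` is the XML document itself” are assumptions:

```python
import sys
import urllib.request

def read_csl_source(source, stdin=None):
    """Return CSL XML text from one of four kinds of input (a sketch).

    "-"                      -> read from standard input
    http:// or https:// URL  -> fetch the remote XML file
    text starting with "<"   -> treat as the XML document itself
    anything else            -> treat as a local file name
    """
    if source == "-":
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read().decode("utf-8")
    if source.lstrip().startswith("<"):
        return source
    with open(source, encoding="utf-8") as f:
        return f.read()
```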

BTW, for testing in Python, the doctest support might be good for
this, since you can get the API usage automatically extracted into
documentation while it serves double duty as tests.

Do you have a good website for me to read on this topic? I know
docstrings and I know testing, but not how to combine those.

In looking through the directory and class organization, and
comparing it to other Python code I’ve seen, wouldn’t it make more
sense that an example like: CitationCSLObject.py … would instead
be: csl/citation.py …?*

I guess I could do a global find and replace at some point.
CitationCSLObject is btw very different from Citation. The former is
the object representing the citation-tag in a CSL file, the latter is
a citation encountered in a text file. I’ll give it a thought though…

Johan—

What happens if the file identified by that URI is stored/cached
locally (say in a database)?

Do you have a good website for me to read on this topic? I know
docstrings and I know testing, but not how to combine those.

I ran across it in the context of Django. See:

http://docs.python.org/lib/module-doctest.html
http://www.djangoproject.com/documentation/0.96/testing/
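A tiny illustration of how the two combine, using a made-up `initials` helper (not citeproc-py code): the docstring examples are written as an interactive Python session, so they read as documentation, and the standard `doctest` module re-runs them and compares the output.

```python
def initials(given_names):
    """Abbreviate given names to initials, as a name formatter might.

    >>> initials("Mary Ann")
    'M. A.'
    >>> initials("Bruce D.")
    'B. D.'
    >>> initials("")
    ''
    """
    return " ".join(part[0] + "." for part in given_names.split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Running the file directly prints nothing when all examples pass; adding `-v` on the command line makes `testmod` report each check.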

I guess I could do a global find and replace at some point.
CitationCSLObject is btw very different from Citation. The former is
the object representing the citation-tag in a CSL file, the latter is
a citation encountered in a text file. I’ll give it a thought though…

Ah, right. Maybe document.citation vs. csl.citation?

Bruce

What happens if the file identified by that URI is stored/cached
locally (say in a database)?

Currently it gets fetched every time, AFAIK. If I implement a cache, it
would work just as it does for web pages: ask the webserver when the
URI was last updated and act accordingly.
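That “ask the webserver” step maps onto an HTTP conditional request: remember the `Last-Modified` header from the first download, send it back as `If-Modified-Since`, and reuse the cached copy when the server answers 304 Not Modified. A minimal in-memory sketch with the standard library, assuming the style server sends `Last-Modified`; the `StyleCache` class is hypothetical, and a real cache could just as well live in a database:

```python
import urllib.error
import urllib.request

class StyleCache:
    """In-memory cache of CSL styles keyed by URI, revalidated over HTTP."""

    def __init__(self):
        self._entries = {}  # uri -> (Last-Modified header value or None, body)

    def store(self, uri, last_modified, body):
        self._entries[uri] = (last_modified, body)

    def build_request(self, uri):
        """Create a request, made conditional if we hold a dated copy."""
        request = urllib.request.Request(uri)
        entry = self._entries.get(uri)
        if entry is not None and entry[0]:
            request.add_header("If-Modified-Since", entry[0])
        return request

    def fetch(self, uri):
        """Return the style's XML, downloading it only if it changed."""
        try:
            with urllib.request.urlopen(self.build_request(uri)) as response:
                body = response.read().decode("utf-8")
                self.store(uri, response.headers.get("Last-Modified"), body)
                return body
        except urllib.error.HTTPError as err:
            # urllib reports 304 Not Modified as an HTTPError.
            if err.code == 304 and uri in self._entries:
                return self._entries[uri][1]
            raise
```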

Johan—

I’ve finally got back to looking at citeproc-rb. It is still very minimal,
but I’ve committed the changes in any case, since it is a starting point.

I like the idea of a consistent API, test cases, command-line options and
documentation - judging from a quick review of the Python code, it seems
there is already some overlap in class names and design.

Incidentally, I couldn’t run the Python code locally:

```
$ python citeproc.py --csl …/misc/asa.csl -m …/misc/Test.mods -d …/misc/docbook-test.xml
Traceback (most recent call last):
  File "citeproc.py", line 451, in <module>
    main(sys.argv[1:])
  File "citeproc.py", line 446, in main
    k = CiteProcessor(csl, referenceSource, referenceType, documentSource, documentType, output)
  File "citeproc.py", line 115, in __init__
    cslfiles = dircache.listdir(csl)
  File "/usr/lib/python2.5/dircache.py", line 27, in listdir
    list = os.listdir(path)
OSError: [Errno 20] Not a directory: '…/misc/asa.csl'
```

Regards,

Liam.

Cool!

Note, your licensing terms are a little vague. I see this in one file …

Licensed under the same terms as CiteProc.

… but when I go to the main citeproc.rb, I see the same thing. Might
it be good to use the BSD-like Ruby license?

I like the idea of a consistent API, test cases, command-line options and
documentation …

Yes. Nice to see you filling out the tests. The real challenge will be
in coming up with good tests for the actual formatting (et al
handling, sorting and related stuff, name-handling).

Bruce

I’m happy with whichever license; I really meant the same terms as
citeproc-xsl. BSD is fine, or if there is a common license covering CSL, I’m
happy with that too.

Unfortunately I’m somewhat hazy about what the actual tests should produce;
the formatting is not my strong point. However, a common syntax for test
citation data, and some way of specifying how the citations are formatted
for testing, common to both Python and Ruby, would be ideal. YAML or XML are
probably candidates for how this information is declared. I’m happy to work
on this, but as I say, I just don’t know enough about what, say, APA
citations should look like in fine detail.
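One hypothetical shape such a shared fixture could take is sketched below, in XML because both Python and Ruby have mature XML parsers (YAML would work just as well). The element names and the `prefix`, `suffix` and `capitalization` parameters are invented for illustration, not an agreed format; on the Python side, loading it takes a few lines of `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture format: each <test> describes one small formatting check.
FIXTURE = """\
<tests>
  <test name="affixes">
    <input>Smith</input>
    <param name="prefix">(</param>
    <param name="suffix">)</param>
    <expected>(Smith)</expected>
  </test>
  <test name="capitalize-words">
    <input>the structure of scientific revolutions</input>
    <param name="capitalization">words</param>
    <expected>The Structure Of Scientific Revolutions</expected>
  </test>
</tests>
"""

def load_cases(xml_text):
    """Turn the fixture into (name, input, params, expected) tuples."""
    root = ET.fromstring(xml_text)
    cases = []
    for test in root.findall("test"):
        params = {p.get("name"): p.text or "" for p in test.findall("param")}
        cases.append((test.get("name"), test.findtext("input", ""),
                      params, test.findtext("expected", "")))
    return cases
```

The Ruby and Python test suites could then each iterate over the same cases and compare their own output with `expected`.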

Regards,

Liam.

Hello Liam,

You should now be able to run CiteProc-Py from the command line.
Previously it expected a folder of CSL style files, so that I could
easily see the output for each of them.
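The traceback above came from calling `dircache.listdir` on a single file. A minimal sketch of a check that accepts either form; it is illustrative only, not the actual change made to citeproc.py, and `collect_csl_files` is a hypothetical name:

```python
import os

def collect_csl_files(path):
    """Accept either a single CSL file or a folder of them (a sketch).

    Returns a sorted list of CSL file paths to process.
    """
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(".csl")
        )
    if os.path.isfile(path):
        return [path]
    raise FileNotFoundError("no such CSL file or directory: %s" % path)
```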

You do not need to know exactly what APA citations look like to be
able to write tests. I think it is better to test smaller things: if I
set a prefix and suffix and ask my code to add them, is that done
correctly? Or, if I set the capitalization to words, does it produce
the expected output? If you write a test for your whole code and
expect it to output APA, you cannot know at that point whether an
error is in your code’s logic or in the APA style file.
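A sketch of what such small tests might look like with Python’s standard `unittest` module. The helpers `apply_affixes` and `capitalize_words` are hypothetical stand-ins for whatever the processor actually calls, and the “no affixes on empty output” rule is an assumption; the point is that each test checks one formatting rule in isolation:

```python
import unittest

# Hypothetical helpers standing in for the processor's formatting steps.
def apply_affixes(text, prefix="", suffix=""):
    """Wrap rendered text in a prefix and suffix.

    Assumed rule: nothing (not even the affixes) is output for empty text.
    """
    if not text:
        return ""
    return prefix + text + suffix

def capitalize_words(text):
    """Capitalize the first letter of every word, leaving the rest as is."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))

class FormattingTest(unittest.TestCase):
    def test_prefix_and_suffix(self):
        self.assertEqual(apply_affixes("2008", "(", ")"), "(2008)")

    def test_no_affixes_on_empty_output(self):
        self.assertEqual(apply_affixes("", "(", ")"), "")

    def test_capitalize_words(self):
        self.assertEqual(capitalize_words("a history of citation"),
                         "A History Of Citation")
```

Saved as a module, these run with `python -m unittest <module name>`.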

As I have already indicated to Bruce off-list, if the need arises I
am willing to consider licenses for CiteProc-Py other than the
current GPLv3.

Regards,

Johan—
http://www.johankool.nl/

Oops, forgot to send this earlier (it’s consistent with Johan’s suggestion) …

Liam Magee wrote:

Unfortunately I’m somewhat hazy about what the actual tests should
produce - the formatting is not my strong point. However a common
syntax for test citation data, and some way of specifying how the
citations are formatted for testing, common to both Python and Ruby,
would be ideal. YAML or XML are probably candidates for how this
information is declared.

This is part of why I started the sample data file. I’d imagine having
tests like the following would be good:

  • name_formatting: when given a list of names and CSL parameters, does it
    output the correct order, initialization, and spacing?
  • date_formatting: when given a date and CSL parameters, does it output the
    correct order, shortening, and spacing?
  • author_date_suffix: test for the correct “2000a”; ensure the proper index
    value among a list of refs
  • et_al: when given a long author list and the CSL et-al-min and et-al-use
    parameters, does it yield the correct boolean value?
  • subsequent: when given a list of citations and an item within it, does it
    return the correct boolean value?

RSpec is nice for this kind of thing, BTW, but feel free to turn this
into standard tests.
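Turned into standard Python tests, three of these might look roughly like the sketch below. The functions are hypothetical stand-ins, and the rules they encode are simplifying assumptions: “et al.” applies once the name count reaches et-al-min, and references arrive already in bibliography order when year suffixes are assigned.

```python
import string
import unittest
from collections import Counter

def use_et_al(names, et_al_min):
    # Assumed rule: shorten the list once it has at least et_al_min names.
    return len(names) >= et_al_min

def is_subsequent(citations, position):
    # True if the item cited at `position` was already cited earlier.
    return citations[position] in citations[:position]

def year_labels(refs):
    # refs: one (author, year) pair per work, already in bibliography order.
    # Works sharing author and year get "a", "b", ... suffixes.
    totals = Counter(refs)
    used = Counter()
    labels = []
    for author, year in refs:
        key = (author, year)
        if totals[key] == 1:
            labels.append(str(year))
        else:
            labels.append(str(year) + string.ascii_lowercase[used[key]])
            used[key] += 1
    return labels

class CitationRulesTest(unittest.TestCase):
    def test_et_al(self):
        self.assertTrue(use_et_al(["A", "B", "C"], et_al_min=3))
        self.assertFalse(use_et_al(["A", "B"], et_al_min=3))

    def test_subsequent(self):
        cites = ["doe2000", "roe1999", "doe2000"]
        self.assertFalse(is_subsequent(cites, 0))
        self.assertTrue(is_subsequent(cites, 2))

    def test_author_date_suffix(self):
        refs = [("Doe", 2000), ("Roe", 1999), ("Doe", 2000)]
        self.assertEqual(year_labels(refs), ["2000a", "1999", "2000b"])
```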

There could probably be a lot more, but this ought to be a good start.

Bruce