cp-py - input, abstraction, api

On this, Johan:

-c …, --csl=… use specified csl file or URL
-m …, --mods=… use specified mods file or URL
-o …, --output=… use specified output file

I suggest you make input generic, or leave it to a wrapper script that
imports the library. So I think you want to put this user-oriented
stuff in a separate file, and have there:

-s …, --style=… use specified csl style file or URL
-d …, --data=… use specified data file or URL
-o …, --output=… use specified output file

After all, it seems to me the input might be MODS, or RDF, or an SQL
database.
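A minimal sketch of such a wrapper script, using Python's standard argparse module. The option names come from the list above; the program name and the function names build_parser and parse_args are only illustrative, and nothing here is taken from the actual citeproc-py code.

```python
import argparse


def build_parser():
    """Build the command-line parser for a hypothetical wrapper script."""
    parser = argparse.ArgumentParser(
        prog="citeproc-wrapper",  # illustrative name, not a real tool
        description="Format references with a CSL style.",
    )
    parser.add_argument("-s", "--style", required=True,
                        help="use specified csl style file or URL")
    parser.add_argument("-d", "--data", required=True,
                        help="use specified data file or URL (MODS, RDF, ...)")
    parser.add_argument("-o", "--output",
                        help="use specified output file")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace."""
    return build_parser().parse_args(argv)
```

The wrapper would then hand args.style and args.data to the library, which itself never needs to know where the data came from.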

There are two approaches to this abstraction:

  1. Simon’s approach is to map source data to native JavaScript (JSON?)
    data structures. As he explains it:

We use something like this:

‘itemType’ => “book”
‘title’ => “Computer-Mediated Communication: Human-to-Human Communication Across the Internet”
‘dateAdded’ => “2006-03-12 05:25:50”
‘dateModified’ => “2006-03-12 05:25:50”
‘publisher’ => “Allyn & Bacon Publishers”
‘year’ => “2002”
‘pages’ => “347”
‘ISBN’ => “0-205-32145-3”
‘creators’ …
  ‘0’ …
    ‘firstName’ => “Susan B.”
    ‘lastName’ => “Barnes”
    ‘creatorType’ => “author”

My only complaint about this to him was the name model they’re using.
Also, the flat “publisher” structure might be a problem.

  2. My approach in citeproc-rb is fully object-based. For one thing,
    citeproc-rb is not responsible for input. I leave that up to whoever
    wants to use the library, and they simply have to create Reference
    objects.
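To make the contrast concrete, here is a rough Python sketch of the object-based side. The class names Creator and Reference, their fields, and the helper reference_from_record are all assumptions made for illustration; they are not the citeproc-rb or citeproc-py API. The input record is a trimmed copy of Simon's example, with the creators written as a list.

```python
from dataclasses import dataclass, field


@dataclass
class Creator:
    # Hypothetical name model; field names are illustrative only.
    family_name: str
    given_name: str = ""
    role: str = "author"


@dataclass
class Reference:
    # Hypothetical Reference object that calling code would create.
    item_type: str
    title: str
    year: str = ""
    publisher: str = ""
    creators: list = field(default_factory=list)


def reference_from_record(record):
    """Build a Reference from a flat dictionary shaped like Simon's example."""
    creators = [
        Creator(
            family_name=c.get("lastName", ""),
            given_name=c.get("firstName", ""),
            role=c.get("creatorType", "author"),
        )
        for c in record.get("creators", [])
    ]
    return Reference(
        item_type=record["itemType"],
        title=record["title"],
        year=record.get("year", ""),
        publisher=record.get("publisher", ""),
        creators=creators,
    )


record = {
    "itemType": "book",
    "title": "Computer-Mediated Communication",
    "publisher": "Allyn & Bacon Publishers",
    "year": "2002",
    "creators": [
        {"firstName": "Susan B.", "lastName": "Barnes", "creatorType": "author"},
    ],
}
book = reference_from_record(record)
```

The conversion step lives outside the library, so the same Reference objects could just as well be filled from MODS, RDF, or a database query.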

I tend to think (not being an expert, mind you) that each approach makes
sense in its own context.

Simon is processing citations in a browser and the greater abstraction
I have is overkill for that context.

But I do think for languages like Python and Ruby, where the code might
be used in a lot of different contexts (desktop and web-based
applications, scripts, etc.), it makes sense to follow a strict OO
design approach.

In the end, for example, with the Ruby version (once it’s finished!),
I’d like a Rails web developer or a TeX developer to be able to pick up
the library and easily integrate it into a solution: almost-instant
citation processing support.

This is why the unit tests and API discussion are important.

Bruce

Hello Bruce et al,

How are you today? I was a bit surprised to read your e-mail, and a
bit confused too. I haven’t really looked at the tests, just am
trying to get CiteProc-py to output as would be expected, not to
recreate the structure of CSL in Python. I have been cleaning up my
code some more to allow for more file formats. It’s not final yet (at
all), but it seems to be going the right way.

I do agree that writing tests is useful, but I’d rather do that after
I feel the code has somewhat stabilized and crystallized out. If
Peter Sefton or someone at his place wants to write the tests or help
with the coding, that is of course welcome. Since I am very new to
Python (only learned it last week), any comments on the code or coding
style are of course welcome too.

Cheers,

Johan

On 22 Jul 2006, at 16:57, Bruce D’Arcus wrote:

FYI, Johan. These guys are interested in a citeproc-py. Also, in
strong unit testing :wink:

Take a look at the tests I wrote (both for Python, but more for the
Ruby version) and see if you think the API they suggest is right.

Begin forwarded message:

There is code going in - but no change to the tests. If unit-tests
don’t get done at the start then they are unlikely to get done at
all.

We have secured our funding for the ICE for Research and
Scholarship - currently writing position descriptions so it will
be at least a few weeks until we can hire people and start work.
As soon as that is sorted I can get someone to help.

Peter Sefton
Technical Manager, RUBRIC, University of Southern Queensland


http://www.johankool.nl/

How are you today? I was a bit surprised to read your e-mail, and a
bit confused too.

Sorry; I was just confused too, so trying to clarify. :wink:

I haven’t really looked at the tests, just am trying to get
CiteProc-py to output as would be expected, not to recreate the
structure of CSL in Python. I have been cleaning up my code some more
to allow for more file formats. It’s not final yet (at all), but it
seems to be going the right way.

Great!

I do agree that writing tests is useful, but I’d rather do that after
I feel the code has somewhat stabilized and crystallized out.

With testing like this, you write tests BEFORE you code. You say – in
test code of course – “if I give my code these data, I can expect the
title for the second in the sorted set to be ‘XYZ’”
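As a small illustration of a test phrased that way, assuming references are plain dictionaries and that the library exposes something like a sort_references function (a hypothetical name). In a real test-first workflow the test would be written before the function exists; a trivial implementation is included here only so the sketch runs on its own.

```python
def sort_references(references):
    """Hypothetical function under test: order by author, year, then title."""
    return sorted(references, key=lambda r: (r["author"], r["year"], r["title"]))


def test_second_title_in_sorted_set():
    # "If I give my code these data ..."
    data = [
        {"author": "Smith", "year": 1992, "title": "ABC"},
        {"author": "Jones", "year": 2002, "title": "DEF"},
        {"author": "Jones", "year": 2005, "title": "XYZ"},
    ]
    result = sort_references(data)
    # "... I can expect the title for the second in the sorted set to be 'XYZ'."
    assert result[1]["title"] == "XYZ"
```

The test says nothing about how the sorting is done internally, only what a caller should get back.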

The reason this is important is in part because it can help build up
what an API ought to look like before even coding (or finishing) it.

So say you write a bunch of code, and don’t have time to implement
certain features. If the unit tests are there, someone else (or you)
can pick it up later and finish it knowing “this is the result I am
after.”

Here were two posts on the OOoBib dev list that got me pointed in this
direction:

On Jul 22, 2006, at 11:32 AM, Johan Kool wrote:

On 1/30/06, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

I still need to wrap my head around this (particularly exactly what
kinds of methods to create tests for), but I guess the idea is to
create a file full of this sort of stuff?

Yep, write tests for the code that doesn’t exist and run them and
watch them fail. Then start filling in the code until you can get
tests to start passing. When all the tests pass you’re done!

//Ed

You typically want to test how a user will want to use your library.
You don’t want to write tests for internal stuff. If there is a
sequence of interdependent events that needs to be tested, I tend to
bundle them in an individual test method.

Testing does take time. When you first start writing them it feels
like wasted time. But I’ve found it’s an invaluable tool for figuring
out what your API should look like before you start making it…and
more importantly they serve as a safety net as you refactor stuff
later on. It’s very liberating to have a nice test suite that allows
you to go in and monkey with internals, and then run the test suite
afterwards to reassure yourself that things are working properly.

I guess I’m starting to sound religious about this. I’m not a disciple
of extreme programming, but I do think testing is one of the best
things to come out of that methodology.

//Ed

Bruce

Is it just me or do some messages not get through to the discussion
list?

Johan

I was sort of wondering. I think I sent one message yesterday, but
didn’t receive any.

Bruce

I can help with the tests. Ideally, they shouldn’t be really aimed
towards any particular use, but more general, so if I have some that
are too particular, they should be changed or removed.

A test like this is useful because it makes sure you get the sorting
and grouping right for the author-year style, but isn’t particularly
picky about how you do it internally:

 def test_reference_grouping_sorting():
     # use itertools groupby?
     assert ref[0].suffix == "a"
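One possible way to get the behaviour that test checks, following the itertools.groupby hint in the comment. This is only a sketch under assumptions: references are plain dictionaries with "author", "year" and "title" keys, and the function name assign_year_suffixes is made up for the example.

```python
from itertools import groupby
from string import ascii_lowercase


def assign_year_suffixes(references):
    """Sort references and add 'a', 'b', ... where author and year collide.

    Returns new dictionaries; the input is left untouched. Groups larger
    than 26 items are not handled in this sketch.
    """
    ordered = sorted(references, key=lambda r: (r["author"], r["year"], r["title"]))
    result = []
    # groupby only merges adjacent items, so the list must be sorted first.
    for _, group in groupby(ordered, key=lambda r: (r["author"], r["year"])):
        members = list(group)
        for index, ref in enumerate(members):
            suffix = ascii_lowercase[index] if len(members) > 1 else ""
            result.append({**ref, "suffix": suffix})
    return result


refs = assign_year_suffixes([
    {"id": "Smith1992b", "author": "Smith", "year": 1992, "title": "Zebra studies"},
    {"id": "Jones2002", "author": "Jones", "year": 2002, "title": "Field notes"},
    {"id": "Smith1992a", "author": "Smith", "year": 1992, "title": "Aardvark studies"},
])
```

With this sample data, the two Smith 1992 items come out as "a" and "b" in title order, and the Jones item gets no suffix.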

Note: I don’t much like how I do the setup though:

 def test_setup():
     data = "http://purl.org/net/darcusb/references/jarticles"
     citations = ("Smith1992a", "Smith1992b", "Jones2002")
     ref = ReferenceList(data, citations)

The data should really be native Python lists and dictionaries.

I think in the end, from the user’s perspective, we want to end up where
someone can import the library and do something like:

list = CiteProc::ReferenceList()

They then have their own code that loads “list” with Reference objects
(which is probably just a dictionary with some methods to make
accessing the data easier??), so that they can then do:

list.to_xhtml

It might even be that the ReferenceList class also stores the CSL
logic, so that you do:

list = CiteProc::ReferenceList("apa")

In the Ruby code, I have separate CitationStyle and ReferenceList
objects created, which seems awkward to me now. Not sure either way
though!
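In Python terms, the combined variant might look roughly like the sketch below. Everything in it is an assumption for discussion: the ReferenceList constructor taking a style name, the add method, and above all the body of to_xhtml, which uses one hard-coded layout in place of real CSL processing.

```python
from html import escape


class ReferenceList:
    """Hypothetical container that holds both the style name and the references."""

    def __init__(self, style="apa"):
        self.style = style      # a real version would load the CSL logic here
        self.references = []

    def add(self, reference):
        """Add one reference (here just a dictionary) supplied by the caller."""
        self.references.append(reference)

    def to_xhtml(self):
        """Return the sorted list as an XHTML fragment (placeholder formatting)."""
        ordered = sorted(self.references, key=lambda r: (r["author"], str(r["year"])))
        items = "".join(
            "<li>{} ({}). <i>{}</i>.</li>".format(
                escape(r["author"]), escape(str(r["year"])), escape(r["title"])
            )
            for r in ordered
        )
        return '<ol class="{}">{}</ol>'.format(escape(self.style, quote=True), items)
```

Usage would then be along the lines of refs = ReferenceList("apa"), a few refs.add(...) calls from the caller's own loading code, and finally refs.to_xhtml().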

Bruce