citeproc-py development

Hello,
is this the right place to discuss citeproc-py?
I assume that the sf svn is the most up-to-date development trunk?
Many thanks
Petr

I think the most up-to-date version can be found at
http://github.com/bdarcus/citeproc-py/, but Bruce D’Arcus would know for
sure. The xbiblio SourceForge SVN isn’t used anymore.

Rintze

2010/5/25 Petr Šimon <@Petr_Simon>

Thanks a lot. The sf code, however, is much more recent and much, much more
complete.
I was wondering what the current milestones are, as I would like to chip in.
Petr

Thanks a lot. The sf code, however, is much more recent …

I don’t believe that’s true. I started the bitbucket code after Johan
had put aside the svn code.

and much much more complete.

True.

They’re designed very differently. The svn code is object based, while
mine in bitbucket is more functional. I thought the code might end up
being much simpler.

I also made the decision to use an internal representation of the
output tree that is HTML + RDFa + a few other attributes. My thinking
here was this allows the default output mode to support round-tripping
of data (e.g. the formatted output embeds structured data).

Also, last I looked, the svn code is GPL, while mine is MIT.

I was wondering what the current milestones are, as I would like to chip in.

Take a look through the respective code and see which you’d like to
work on. I can help you on your questions more with the bitbucket
code, but if you strongly prefer working on the svn code, we could try
to get in touch with Johan.

BTW, here’s a simple example of how you’d run my code:

===
from citeproc import process_bibliography, format_bibliography, Style

REFS = [
    {
        "author": [
            {"family": "Doe", "given": "Jane"},
            {"family": "Smith", "given": "John"}
        ],
        "title": "Some title",
        "volume": "12",
        "issue": "3",
        "isbn": "34239845"
    },
    {
        "title": "Another Title",
        "issue": "3",
        "volume": "23"
    }
]

# here x.csl is the style file
STYLE = Style('x.csl')

BIBLIO = process_bibliography(STYLE, REFS)

format_bibliography(BIBLIO)
===

But you’re right; it doesn’t do that much ATM ;)

Bruce

Just for clarity: when code development moved away from the xbiblio SVN,
Bruce put the old SVN code in the “attic” folder. The date stamps of
these files reflect the date of the move, not the date when they were last
changed.

Rintze

Rintze, thanks for clarification.

Bruce,
I certainly find the git code approach more appealing.

I’ve just quickly tried your example, but it fails at line 218:
conditions.append(reftype == reference['type'])
KeyError: 'type'

When running on the sample data from citeproc-js/demo or the data in
citeproc-py/tests:
line 148, in substitute
children = substitute_node.getchildren()
AttributeError: 'NoneType' object has no attribute 'getchildren'
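A minimal sketch of how the two failures above typically arise, and one way to guard against them. This is not the citeproc-py code: the function names `type_matches` and `substitute_children` and the sample XML are hypothetical, chosen only to mirror the two tracebacks. `list(element)` stands in for `getchildren()`, which recent Python versions no longer provide.

```python
import xml.etree.ElementTree as ET


def type_matches(reference, reftype):
    """Return True if the reference declares the given type.

    reference["type"] raises KeyError when the key is absent;
    dict.get returns None instead, so the comparison is simply False.
    """
    return reference.get("type") == reftype


def substitute_children(names_node):
    """Return the children of a <substitute> child, or [] if there is none.

    Element.find returns None when nothing matches; using that result
    unchecked is what produces "'NoneType' object has no attribute ...".
    """
    substitute_node = names_node.find("substitute")
    if substitute_node is None:
        return []
    return list(substitute_node)


# Hypothetical sample data: one reference without a type, one with.
untyped = {"title": "Another Title", "issue": "3", "volume": "23"}
typed = {"type": "book", "title": "Some title"}

# Hypothetical style fragments: one with a <substitute> child, one without.
with_sub = ET.fromstring("<names><substitute><text/></substitute></names>")
without_sub = ET.fromstring("<names/>")
```

With these guards, a reference lacking `type` just fails every type condition, and a names element without a substitute yields an empty list rather than an exception.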

I’m at GMT+8 now, so I’ll turn in and have a closer look tomorrow.
Thanks
Petr

Rintze, thanks for clarification.

Bruce,
I certainly find the git code approach more appealing.

I’ve just quickly tried your example, but it fails at line 218:
conditions.append(reftype == reference['type'])
KeyError: 'type'

When running on the sample data from citeproc-js/demo or the data in
citeproc-py/tests:
line 148, in substitute
children = substitute_node.getchildren()
AttributeError: 'NoneType' object has no attribute 'getchildren'

Yeah, that stuff isn’t hooked up yet :)

Run the code I posted like this (where the code is in file t1.py):

$ python t1.py x.csl

BTW, also works with Python 3.

I’m at GMT+8 now, so I’ll turn in and have a closer look tomorrow.

Sure thing.

Bruce

2010/5/25 Petr Šimon <@Petr_Simon>:

Hello,
many thanks to all for your comments.
I am basically happy with the functional approach Bruce has proposed and I
shall continue in his path.
I have a concern though about the html+rdfa representation. I think that
e.g. Zotero sends RTF to OOo and Word. And I might be wrong, but I think
that the primary function of citeproc will be to provide for these two. E.g.
I’m not sure I understand the goal for the output of citeproc-js in the
demo. I would think that the whole bibliography would ideally be richtext
transferable (copy & paste into word processor). This is not possible with
the current external css style, as far as I can tell. This might be only
necessary for the hanging indent, but it seems to me it should be part of
the output and not left for the end application to format.
Do I understand correctly that citeproc is like bibtex only it needs to
provide output in more formats?
I hope someone could elucidate what’s the big picture.

I’m a bit confused about what’s 0.8 and what’s 1.0
Can I assume that:

  • the data format sent to processor in json has not changed, so examples
    remain the same
  • the styles are all 0.8, e.g. citeproc-js/style.
  • the styles in demo/loadcsl.js are 1.0.

In the demo:

  • Item 5 does not indicate the cs-type. Is that permitted? How is it handled
    in citeproc-js?

Do I understand correctly that citeproc can expect that the data contains {
type, uri?, container-uri?, contributor*, date?, variable+ } from
csl-data.rnc?

Many thanks
Petr


Hello,
many thanks to all for your comments.
I am basically happy with the functional approach Bruce has proposed and I
shall continue in his path.
I have a concern though about the html+rdfa representation. I think that
e.g. Zotero sends RTF to OOo and Word. And I might be wrong, but I think
that the primary function of citeproc will be to provide for these two. E.g.
I’m not sure I understand the goal for the output of citeproc-js in the
demo.

html+rdfa is the internal model of Bruce’s citeproc-py work. The
citeproc-js processor is a completely different piece of code.
Currently it does not speak RDFa; it just outputs plain old garden
variety XHTML.

The citeproc-js demo page is just that; it shows that the processor
works. It also allows the code to be run on multiple platforms
(browsers) for testing. Into the bargain, I uncovered a few bugs
while putting it together.

I would think that the whole bibliography would ideally be richtext
transferable (copy & paste into word processor). This is not possible with
the current external css style, as far as I can tell. This might be only
necessary for the hanging indent, but it seems to me it should be part of
the output and not left for the end application to format.

The use of an external stylesheet is intentional. RTF allows the use
of stylesheets, as does LaTeX. When integrated with Word, OO, or
LaTeX, citeproc-js will tie into the stylesheet mechanism of the
target rendering application, allowing the user to easily control
formatting of like entries throughout the document. If formatting were
all coded inline (as is done in current Zotero output), C&P works
better, but Zotero insertions into a document are inflexible and
difficult to control smoothly.
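The trade-off described above can be sketched as follows. This is an illustration only, not citeproc-js output: the class name `csl-entry`, the 2em hanging indent, and both function names are assumptions made for the example.

```python
from html import escape

ENTRY = "Doe, Jane. Some title. 12(3)."

# External stylesheet: one rule controls every entry in the document.
STYLESHEET = ".csl-entry { margin-left: 2em; text-indent: -2em; }"


def render_with_class(text, css_class="csl-entry"):
    """Emit markup that defers formatting to an external stylesheet."""
    return '<div class="{}">{}</div>'.format(css_class, escape(text))


def render_inline(text, indent_em=2):
    """Emit markup with the hanging indent hard-coded on the element."""
    style = "margin-left: {0}em; text-indent: -{0}em;".format(indent_em)
    return '<div style="{}">{}</div>'.format(style, escape(text))
```

The inline variant survives copy and paste because the formatting travels with each entry, while the class-based variant lets a single stylesheet change restyle every entry at once.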

Do I understand correctly that citeproc is like bibtex only it needs to
provide output in more formats?

Yes.

I hope someone could elucidate what’s the big picture.

I’m a bit confused about what’s 0.8 and what’s 1.0
Can I assume that:

  • the data format sent to processor in json has not changed, so examples
    remain the same

The JSON input format was worked out for communication with
citeproc-js. Andrea has also adopted it for input to the latest
citeproc-hs. Both of these processors speak CSL 1.0. The CSL 0.8
processors had no common format for input.
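As a rough sketch of what consuming such JSON input might look like in Python: the field names `author`, `family`, `given`, `title`, `volume` and `issue` follow the example posted earlier in this thread, while the `id` field, the `type` values and the helper `load_items` are assumptions made for illustration, not a statement of the agreed format.

```python
import json

# Hypothetical input: a JSON array of items, each carrying an id.
RAW = """
[
  {"id": "ITEM-1", "type": "article-journal",
   "author": [{"family": "Doe", "given": "Jane"}],
   "title": "Some title", "volume": "12", "issue": "3"},
  {"id": "ITEM-2", "type": "book", "title": "Another Title"}
]
"""


def load_items(raw):
    """Parse the JSON text and index the items by their id."""
    return {item["id"]: item for item in json.loads(raw)}


ITEMS = load_items(RAW)

# Map each item id to its declared type.
TYPES = {item_id: item["type"] for item_id, item in ITEMS.items()}
```

Because the items are plain dictionaries after parsing, the same data could be handed to any processor that accepts this shape.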

  • the styles are all 0.8, e.g. citeproc-js/style.

It looks like the styles under citeproc-js/style are not valid CSL,
but they’re meant to be CSL 1.0 (they were hand-converted to CSL 1.0,
before update-styles.sh was available).

  • the styles in demo/loadcsl.js are 1.0.

Yes.

In the demo:

  • Item 5 does not indicate the cs-type. Is that permitted? How is it handled
    in citeproc-js?

That’s an accident; it should have type article-journal. But as you
can see, it still formats just fine, which demonstrates the robustness
of CSL. :)

Do I understand correctly that citeproc can expect that the data contains {
type, uri?, container-uri?, contributor*, date?, variable+ } from
csl-data.rnc?

That’s a draft spec for input, not yet adopted. It’s not yet used for
validation. But you’re right that there should be a type var there.

2010/5/26 Petr Šimon <@Petr_Simon>:

Hello,
many thanks to all for your comments.
I am basically happy with the functional approach Bruce has proposed and I
shall continue in his path.
I have a concern though about the html+rdfa representation. I think that
e.g. Zotero sends RTF to OOo and Word. And I might be wrong, but I think
that the primary function of citeproc will be to provide for these two. E.g.
I’m not sure I understand the goal for the output of citeproc-js in the
demo.

html+rdfa is the internal model of Bruce’s citeproc-py work. The
citeproc-js processor is a completely different piece of code.
Currently it does not speak RDFa; it just outputs plain old garden
variety XHTML.

Right. I’ve noticed that. I would be more inclined to separate internal
model from output format. But I don’t want to start forking and rewriting
mindlessly :)

The citeproc-js demo page is just that; it shows that the processor
works. It also allows the code to be run on multiple platforms
(browsers) for testing. Into the bargain, I uncovered a few bugs
while putting it together.

I would think that the whole bibliography would ideally be richtext
transferable (copy & paste into word processor). This is not possible with
the current external css style, as far as I can tell. This might be only
necessary for the hanging indent, but it seems to me it should be part of
the output and not left for the end application to format.

The use of an external stylesheet is intentional. RTF allows the use
of stylesheets, as does LaTeX. When integrated with Word, OO, or
LaTeX, citeproc-js will tie into the stylesheet mechanism of the
target rendering application, allowing the user to easily control
formatting of like entries throughout the document. If formatting were
all coded inline (as is done in current Zotero output), C&P works
better, but Zotero insertions into a document are inflexible and
difficult to control smoothly.

I see. Thanks.

Do I understand correctly that citeproc is like bibtex only it needs to
provide output in more formats?

Yes.

I hope someone could elucidate what’s the big picture.

I’m a bit confused about what’s 0.8 and what’s 1.0
Can I assume that:

  • the data format sent to processor in json has not changed, so examples
    remain the same

The JSON input format was worked out for communication with
citeproc-js. Andrea has also adopted it for input to the latest
citeproc-hs. Both of these processors speak CSL 1.0. The CSL 0.8
processors had no common format for input.

  • the styles are all 0.8, e.g. citeproc-js/style.

It looks like the styles under citeproc-js/style are not valid CSL,
but they’re meant to be CSL 1.0 (they were hand-converted to CSL 1.0,
before update-styles.sh was available).

  • the styles in demo/loadcsl.js are 1.0.

Yes.

In the demo:

  • Item 5 does not indicate the cs-type. Is that permitted? How is it handled
    in citeproc-js?

That’s an accident; it should have type article-journal. But as you
can see, it still formats just fine, which demonstrates the robustness
of CSL. :)

Can you point me to the place where it’s handled? Or is it a feature-bug? ;)

Do I understand correctly that citeproc can expect that the data contains {
type, uri?, container-uri?, contributor*, date?, variable+ } from
csl-data.rnc?

That’s a draft spec for input, not yet adopted. It’s not yet used for
validation. But you’re right that there should be a type var there.

So what’s the general policy about missing information (like title, year
etc.)? Just warning like bibtex does and format whatever is available?

Petr

Hello,
many thanks to all for your comments.
I am basically happy with the functional approach Bruce has proposed and I
shall continue in his path.
I have a concern though about the html+rdfa representation. I think that
e.g. Zotero sends RTF to OOo and Word. And I might be wrong, but I think
that the primary function of citeproc will be to provide for these two. E.g.
I’m not sure I understand the goal for the output of citeproc-js in the
demo.

html+rdfa is the internal model of Bruce’s citeproc-py work. The
citeproc-js processor is a completely different piece of code.
Currently it does not speak RDFa; it just outputs plain old garden
variety XHTML.

Right. I’ve noticed that. I would be more inclined to separate internal
model from output format. But I don’t want to start forking and rewriting
mindlessly :)

This is important to understand:

The internal model is actually not exactly HTML + RDFa; rather, it’s
an HTML + RDFa ElementTree object representation.

Another path I could have chosen was to write my own object model for
the internal model. But that would require a fair bit more code, with
no benefit that I could see.

On the other hand, using this approach makes it possible to really
easily dump to a standard-compliant rich output. The intention I had
was to then subsequently write other output drivers that convert that
into whatever other format (RTF, plain HTML, TeX, etc.).

So don’t consider this a bug: it’s a cool feature ;)
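A minimal sketch of the idea, assuming nothing about the actual citeproc-py internals: one entry is built as an ElementTree element carrying RDFa attributes, and separate small "drivers" turn that tree into HTML or plain text. The property names (`dcterms:title`, `bibo:volume`), the `typeof` value and all function names here are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_entry(ref):
    """Build one bibliography entry as an element with RDFa attributes."""
    entry = ET.Element("p", {"class": "entry", "typeof": "bibo:Document"})
    title = ET.SubElement(entry, "span", {"property": "dcterms:title"})
    title.text = ref["title"]
    title.tail = ", vol. "
    volume = ET.SubElement(entry, "span", {"property": "bibo:volume"})
    volume.text = ref["volume"]
    volume.tail = "."
    return entry


def to_html(entry):
    """Output driver: serialize the tree directly as HTML + RDFa."""
    return ET.tostring(entry, encoding="unicode", method="html")


def to_plain_text(entry):
    """Output driver: drop all markup and keep only the text."""
    return "".join(entry.itertext())


def extract_data(entry):
    """Round trip: recover the structured data embedded in the output."""
    return {span.get("property"): span.text for span in entry.iter("span")}


ENTRY = build_entry({"title": "Another Title", "volume": "23"})
```

In this sketch the default serialization keeps the structured data in the markup, which is what makes the round trip possible, while other formats such as RTF or TeX would each need their own driver walking the same tree.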

So what’s the general policy about missing information (like title, year
etc.)? Just warning like bibtex does and format whatever is available?

No warnings; just format whatever’s there, and hope the style author
has made proper considerations of possible missing data.
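The policy can be sketched like this. In a real processor the layout comes from the CSL style, so the hard-coded layout and the function name `format_entry` below are assumptions made purely to show "format whatever is there, without warnings".

```python
def format_entry(ref):
    """Format the fields that are present and silently skip the rest."""
    parts = []

    # Authors: "Given Family", comma-separated; skipped if there are none.
    authors = ref.get("author", [])
    if authors:
        parts.append(", ".join(
            "{} {}".format(a.get("given", ""), a.get("family", "")).strip()
            for a in authors))

    if "title" in ref:
        parts.append(ref["title"])

    # Volume and issue: "12(3)" if both exist, otherwise whichever is there.
    volume, issue = ref.get("volume"), ref.get("issue")
    if volume and issue:
        parts.append("{}({})".format(volume, issue))
    elif volume or issue:
        parts.append(volume or issue)

    return ". ".join(parts) + "." if parts else ""


FULL = {
    "author": [{"family": "Doe", "given": "Jane"},
               {"family": "Smith", "given": "John"}],
    "title": "Some title", "volume": "12", "issue": "3",
}
PARTIAL = {"title": "Another Title", "issue": "3", "volume": "23"}
```

Here `format_entry(PARTIAL)` simply omits the author part; nothing is logged and nothing fails, so the quality of the result depends on how the layout copes with the gap.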

Bruce

2010/5/26 Petr Šimon <@Petr_Simon>:

Bruce,
One more question regarding the tests. I’m not sure what I should make of
the test files in cp-py/tests. I suppose you thought about using those
before the test files in cp-js were created?
I suppose it would make sense to use the test set from cp-js (which I assume
are actively used and maintained). I suppose Andrea is using the same files
as cp-js.
Thanks
Petr

Yes. I just removed that stuff from the citeproc-py repo.

Frank mentioned possibly moving his test repo over to my account so
that it’s in the same place as the schema. So I just forked it:

http://bitbucket.org/bdarcus/citeproc-test

If we all agree this makes sense, we can figure out how we proceed
administratively and such.

Bruce

2010/5/26 Petr Šimon <@Petr_Simon>:

Thanks. I will work with that. Petr


When Andrea chimes in and you give us write privileges, I can replace
the copy at my end with a forwarding notice.

I’ve given you admin and write privilege, and Andrea and Rintze the latter.

Bruce

When Andrea chimes in and you give us write privileges, I can replace
the copy at my end with a forwarding notice.

I’ve given you admin and write privilege, and Andrea and Rintze the latter.

I’ve scrubbed the content of the copy in my account, and replaced the
README.txt with a forwarding note.

Frank

Further to Petr Šimon’s note about the missing “type” variable (in the
loadcites.js file behind the citeproc-js demo page), that’s now been
fixed.

(I seem to have lost the original mail on this subject; apologies to
anyone who finds their mail threading is loused up by this message.)

Frank