a Haskell implementation of citeproc

Hi,

I announced to Bruce that I was going to write a Haskell
implementation of citeproc to be used with pandoc a few months ago,
but only this week I was able to start coding.

Luckily I was able to produce an almost working proof of concept -
just the internal data type structure, with the basic function to
evaluate cs-elements and expand macros - which is still enough to make
me believe I should be able to cope with the challenge, something I
was not entirely confident about.

Unfortunately I lack some documentation - and the schema does not seem
to be enough to resolve all my doubts. Moreover I must confess I’m not
that familiar with the programming languages the other implementations
are written in. Well, the Python and the Ruby code were really
inspiring, but they are incomplete. I also have to admit I have some
difficulty parsing the Zotero implementation.

I’m also a bit confused because what I read on the xbiblio.sf.net site
seems to me quite misleading if compared with the recent development
of the schema, am I right?

What I’m trying to say is that I need your help to carry on my work.
Just to give you a few examples of stuff that is not clear to me:

  1. what is the present status (and meaning) of the “class” attribute
    of the “style” element (“in-text” | “note”)?

  2. what is the meaning and usage of the “class” attribute of the
    "group" element?

  3. how is the fallback mechanism supposed to work? Is it still there?
    (see http://xbiblio.sourceforge.net/csl/)

Then I have more specific questions - and more will come along the
development path. For instance, something related to the
"if"/“else-if” elements:

  1. the “variable” attribute: the comment reads “If a given variable
    exists, this is true”. This means that the variable must have a non
    null value, right?

  2. the “is-numeric” attribute: what does "contains numeric data"
    mean? That (or “volume” | “issue” |
    “number” | “number-of-volumes”) is always true?

This is just an exemplification of the questions that arise here and
that the documentation I found seems not enough to solve. Maybe you
can point me to better docs.

By the way, I think you did a wonderful job and I have the feeling
that implementing CSL in Haskell is going to be fun and far less
complicated than I previously feared.

At the present time I am working on the internal type structure: the
CSL object model (to speak Ruby) and the Reference object. When this
part, the core, is ready I’ll start working on the input filters: CSL
and reference parsing. Since my spare time will be very limited this
summer, I think I’ll be able to produce something usable by the end of
next autumn. Still, I will share the code - with a BSD license - as
soon as it is readable, even if not usable (hopefully by the end of
the month).

Thanks for your attention.

Best regards,
Andrea

PS: and now something to introduce myself. I’m a legal scholar, which
means that I do legal research and teach in an Italian university. I
met CSL quite early in its development life, when I was developing a
wiki engine, written in PHP, which had (well, still has) a citation
markup, a bibliographic database and automatic reference and
bibliographic formatting: UniWakka. At that time I was looking for a
standardized way of dealing with bibliographies and I came to know
about Bruce’s work, which I’ve been following since then.

With that wiki I wrote a book - the wiki also had an OpenDocument
exporter. Now I do not actively develop that wiki engine any longer
and I was looking for a replacement to handle my workflow when I
found pandoc, which seems to fit my needs perfectly - if only it had
an OpenDocument filter and a bibliographic formatting engine. The
first issue has been solved recently… :wink:

  1. what is the present status (and meaning) of the “class” attribute
    of the “style” element (“in-text” | “note”)?

In-text means the citation is rendered in-line; “note” means it’s
rendered as a footnote or endnote.

  2. what is the meaning and usage of the “class” attribute of the
    “group” element?

It’s designed for examples like “(New York: ABC Books)”. I’ll try to
formalize a clearer definition.

  3. how is the fallback mechanism supposed to work? Is it still there?
    (see http://xbiblio.sourceforge.net/csl/)

There are three fallback types: “article,” “book”, and “chapter.” They
each correspond to more abstract classes of resources. From a CSL
model perspective, I suppose you could say those abstract classes
might be something like (in pseudo code):

    if container-title:
        if volume or issue:
            return article
        else:
            return chapter
    else:
        return book
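
Since the target implementation is Haskell, the pseudo code above could be transcribed roughly as follows. This is only a sketch: the `Reference` record, its three fields, and the names `Fallback` and `fallbackType` are invented for illustration and do not come from the schema or from any existing implementation.

```haskell
import Data.Maybe (isJust)

-- Hypothetical, heavily simplified reference record (illustration only).
data Reference = Reference
  { containerTitle :: Maybe String
  , volume         :: Maybe String
  , issue          :: Maybe String
  } deriving (Show)

-- The three fallback types named above.
data Fallback = Article | Book | Chapter
  deriving (Eq, Show)

-- Direct transcription of the pseudo code: a container title with a
-- volume or issue suggests an article, a container title alone a
-- chapter, and no container title at all a book.
fallbackType :: Reference -> Fallback
fallbackType ref
  | isJust (containerTitle ref) =
      if isJust (volume ref) || isJust (issue ref)
        then Article
        else Chapter
  | otherwise = Book
```

Here a variable counts as present when its field is a `Just` value, which is itself an assumption about how missing data would be modelled.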

Then I have more specific questions - and more will come along the
development path. For instance, something related to the
“if”/“else-if” elements:

  1. the “variable” attribute: the comment reads “If a given variable
    exists, this is true”. This means that the variable must have a non
    null value, right?

Yes.

  2. the “is-numeric” attribute: what does “contains numeric data”
    mean? That (or “volume” | “issue” |
    “number” | “number-of-volumes”) is always true?

Hmm … this is sort of new, and I think Julian requested it. We
probably need to formalize that better, but I think it just means that
the content of a given variable is, or perhaps can be cast to, an
integer. Consider an example like an edition, where you have “1” that
you want as “1st”, or you may have just “new edition.”
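
A minimal Haskell sketch of that reading of “is-numeric”, assuming the test is simply “the whole value is a non-empty string of decimal digits” (the exact rule is still open, as noted above). The names `isNumeric`, `ordinalSuffix` and `renderEdition` are made up for illustration and belong to no existing implementation.

```haskell
import Data.Char (isDigit)

-- Assumption: "numeric" means a non-empty string of ASCII digits only.
isNumeric :: String -> Bool
isNumeric s = not (null s) && all isDigit s

-- English ordinal suffix for a non-negative integer.
ordinalSuffix :: Integer -> String
ordinalSuffix n
  | n `mod` 100 `elem` [11, 12, 13] = "th"
  | n `mod` 10 == 1                 = "st"
  | n `mod` 10 == 2                 = "nd"
  | n `mod` 10 == 3                 = "rd"
  | otherwise                       = "th"

-- Render an edition: as an ordinal when numeric, verbatim otherwise.
-- `read` is safe here because isNumeric has already checked the digits.
renderEdition :: String -> String
renderEdition s
  | isNumeric s = s ++ ordinalSuffix (read s)
  | otherwise   = s
```

Under this assumption `renderEdition "1"` gives `"1st"`, while `renderEdition "new edition"` is returned unchanged.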

This is just an exemplification of the questions that arise here and
that the documentation I found seems not enough to solve. Maybe you
can point me to better docs.

We need to enhance the schema so these questions go away. There’s this too:

http://dev.zotero.org/csl_syntax_summary

By the way, I think you did a wonderful job and I have the feeling
that implementing CSL in Haskell is going to be fun and far less
complicated than I previously feared.

Great!

At the present time I am working on the internal type structure: the
CSL object model (to speak Ruby) and the Reference object. When this
part, the core, is ready I’ll start working on the input filters: CSL
and reference parsing. Since my spare time will be very limited this
summer, I think I’ll be able to produce something usable by the end of
next autumn. Still, I will share the code - with a BSD license - as
soon as it is readable, even if not usable (hopefully by the end of
the month).

Sounds great! Note the recent discussion about a test suite. If you
think that might be useful for you, feel free to join.

Bruce

Ooh, Andrea, can I get you to vet our support for legal citations,
both here, and in the new bibo RDF ontology?

http://bibliontology.com/

Bruce

Hi Andrea,

Thanks for the introduction, which is quite timely given I’ve been able to
resume the Ruby version in the past week after some months of inactivity.
There has been recent discussion on developing some test cases across
implementations, which you are more than welcome to join.

Regards,

Liam.

2008/6/19 Andrea Rossato <@Andrea_Rossato>:

On Thu, Jun 19, 2008 at 08:04:46AM -0400, Bruce D’Arcus wrote:

  1. what is the present status (and meaning) of the “class” attribute
    of the “style” element (“in-text” | “note”)?

In-text means the citation is rendered in-line; “note” means it’s
rendered as a footnote or endnote.

And what is the implementation supposed to do? I mean, I suppose this
is related to the use of “ibid”. That is to say, when citeproc is
called with a style belonging to the in-text class all citations are
supposed to be rendered in-line and that’s it, right?

  3. how is the fallback mechanism supposed to work? Is it still there?
    (see http://xbiblio.sourceforge.net/csl/)

There are three fallback types: “article,” “book”, and “chapter.” They
each correspond to more abstract classes of resources. From a CSL
model perspective, I suppose you could say those abstract classes
might be something like (in pseudo code):

    if container-title:
        if volume or issue:
            return article
        else:
            return chapter
    else:
        return book

Is the implementation supposed to have some predefined layouts (as
) for the citation and the bibliography elements?

That is to say, suppose you have a style with a bibliographic layout
like this:

should produce nothing for articles, etc., and the string “nothing” in
case of books. Is that right?

  2. the “is-numeric” attribute: what does “contains numeric data”
    mean? That (or “volume” | “issue” |
    “number” | “number-of-volumes”) is always true?

Hmm … this is sort of new, and I think Julian requested it. We
probably need to formalize that better, but I think it just means that
the content of a given variable is, or perhaps can be cast to, an
integer. Consider an example like an edition, where you have “1” that
you want as “1st”, or you may have just “new edition.”

Shouldn’t you use the cs-number element? Edition, volumes, etc., are
defined as carrying “numeric” information in the (numeric | ordinal |
roman) form. Or, just to make a stupid example, in the case of a title
like “123456”, should return true?

The same applies to “is-date”.

Sounds great! Note the recent discussion about a test suite. If you
think that might be useful for you, feel free to join.

A test suite will definitely be useful to anyone writing an
implementation. If I have any contributions to bring to the
discussion, I’ll definitely share them.

Thanks.

Andrea

On Thu, Jun 19, 2008 at 08:04:46AM -0400, Bruce D’Arcus wrote:

  1. what is the present status (and meaning) of the “class” attribute
    of the “style” element (“in-text” | “note”)?

In-text means the citation is rendered in-line; “note” means it’s
rendered as a footnote or endnote.

And what is the implementation supposed to do? I mean, I suppose this
is related to the use of “ibid”. That is to say, when citeproc is
called with a style belonging to the in-text class all citations are
supposed to be rendered in-line and that’s it, right?

Right. Following are all in-text examples:

author-date:  " .... (Doe, 1999)."

label: " .... [doe99]."

numeric: " .... [1]."

Keep in mind one goal of CSL is to allow switching between in-text and
note styles without modifying source.
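
For a Haskell implementation, one way to picture this is that the style class only decides where the already formatted citation ends up, while the source document stays untouched. The sketch below assumes exactly that and nothing more; `StyleClass`, `Rendered` and `placeCitation` are hypothetical names, not part of CSL or of any existing citeproc.

```haskell
-- The two values of the "class" attribute on the style element.
data StyleClass = InText | Note
  deriving (Eq, Show)

-- Hypothetical output type: text placed in the running prose, or the
-- body of a footnote/endnote.
data Rendered = Inline String | Footnote String
  deriving (Eq, Show)

-- Place an already formatted citation according to the style class.
-- Switching the style changes only this step, not the source document.
placeCitation :: StyleClass -> String -> Rendered
placeCitation InText txt = Inline txt
placeCitation Note   txt = Footnote txt
```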

  3. how is the fallback mechanism supposed to work? Is it still there?
    (see http://xbiblio.sourceforge.net/csl/)

There are three fallback types: “article,” “book”, and “chapter.” They
each correspond to more abstract classes of resources. From a CSL
model perspective, I suppose you could say those abstract classes
might be something like (in pseudo code):

    if container-title:
        if volume or issue:
            return article
        else:
            return chapter
    else:
        return book

Is the implementation supposed to have some predefined layouts (as
) for the citation and the bibliography elements?

In earlier versions of CSL (e.g. before we added macros and changed
the structure), definitions for the fallback were required. One might
argue that this fallback behavior is legacy, though, and that macros
make them less important.

That is to say, suppose you have a style with a bibliographic layout
like this:

should produce nothing for articles, etc., and the string “nothing” in
case of books. Is that right?

Yes. Note, though, going back to my point above, that the macro
feature means you can do stuff like:

...

… and then:

... ... ...

  2. the “is-numeric” attribute: what does “contains numeric data”
    mean? That (or “volume” | “issue” |
    “number” | “number-of-volumes”) is always true?

Hmm … this is sort of new, and I think Julian requested it. We
probably need to formalize that better, but I think it just means that
the content of a given variable is, or perhaps can be cast to, an
integer. Consider an example like an edition, where you have “1” that
you want as “1st”, or you may have just “new edition.”

Shouldn’t you use the cs-number element? Edition, volumes, etc., are
defined as carrying “numeric” information in the (numeric | ordinal |
roman) form. Or, just to make a stupid example, in the case of a title
like “123456”, should return true?

The same applies to “is-date”.

I need to look at this again, but if Julian or Simon are around (or
even Johan, since I think he was involved in this discussion) and can
help clarify, that’d be good.

Bruce

Bruce D’Arcus wrote:

In earlier versions of CSL (e.g. before we added macros and changed
the structure), definitions for the fallback were required. One might
argue that this fallback behavior is legacy, though, and that macros
make them less important.

Cool! I’m afraid the Haskell implementation won’t support legacy
features…:wink:

Yes. Note, though, going back to my point above, that the macro
feature means you can do stuff like:

...

… and then:

... ... ...

Well, sure, this is obvious. I love this new design (I found out
about it only when I seriously started to study the schema in order to
write the code): it makes the implementation a lot easier.

(Now I’m studying a way to avoid writing a lot of boilerplate code
for the parsing and the querying… :slight_smile:)

Andrea

Bruce D’Arcus wrote:

In earlier versions of CSL (e.g. before we added macros and changed
the structure), definitions for the fallback were required. One might
argue that this fallback behavior is legacy, though, and that macros
make them less important.

Cool! I’m afraid the Haskell implementation won’t support legacy
features…:wink:

That’s fine. But just be aware that we’ve not formally declared it as
such, so that you might end up with discrepancies between
implementations.

If any other implementors have thoughts on this issue, let us know.

(Now I’m studying a way to avoid writing a lot of boilerplate code
for the parsing and the querying… :slight_smile:)

What do you mean?

Bruce

This is just Haskell related (is there anyone who does Haskell here?):
there’s a coding technique, called Scrap Your Boilerplate (SYB), for
dealing with complex data structures, based on reflection and generic
programming.

I hope I’ll be able to employ it to (hopefully) simplify the code.
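
For anyone who has not met the technique, here is a minimal sketch of the idea using only `Data.Data` from GHC's base library rather than the full SYB library. The `Element` and `MacroName` types are invented for this example and are not the real CSL data model; `collect` is a hand-written generic query in the spirit of SYB, not a function taken from that library.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

-- Hypothetical, tiny fragment of a CSL-like element tree.
newtype MacroName = MacroName String
  deriving (Eq, Show, Data, Typeable)

data Element
  = Text String
  | MacroCall MacroName
  | Group [Element]
  deriving (Show, Data, Typeable)

-- Generic query: collect every value of type b found anywhere inside x.
-- `cast` tests the current node, `gmapQ` recurses into its immediate
-- children, whatever the constructor happens to be.
collect :: (Data a, Typeable b) => a -> [b]
collect x = maybe [] (: []) (cast x) ++ concat (gmapQ collect x)

-- All macro names referenced in a tree, with no per-constructor code.
macroNames :: Element -> [MacroName]
macroNames = collect
```

The point is that `macroNames` keeps working unchanged when new constructors are added to `Element`, which is where the boilerplate saving would come from.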

Andrea