While putting together the processor manual for citeproc-js, I made
another attempt to install pandoc, thinking to leverage the formatting
work that has gone into the CSL documentation and schema. It didn’t
work out too well for me on Ubuntu (8.10?).

The pandoc package install for my OS was broken. It turned out that I
needed a different version of Haskell to get pandoc running in the
version required to process the CSL docs, and that had to be compiled
from scratch in my environment. Given that the objective was just to
format a small plain text file into XHTML, I was surprised by the
sheer volume of material that needed to be assembled for the compile,
and by the amount of time required to build the pandoc system. The
downloaded sources were about the size of a LaTeX distribution. The
compile took close to two hours on my little laptop (IIRC). When it
failed with a link error at the final stage, I gave up (again).

Looking around, I found that reStructuredText can now support syntax
highlighting with Pygments. I cobbled together a little Python script
to grind the document (attached for reference), which works with a
standard Python installation plus the docutils and Pygments modules.
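
For anyone reading without the attachment, a script along these lines
might look roughly like the sketch below. It is not the attached
script. The directive name `pygments`, the class `SourceCode`, and the
helper `rst_to_html` are names made up for illustration, and the sketch
assumes that docutils and Pygments are both importable.

```python
from docutils import nodes
from docutils.core import publish_string
from docutils.parsers.rst import Directive, directives
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name
from pygments.util import ClassNotFound


class SourceCode(Directive):
    """Hypothetical directive that highlights its body with Pygments."""

    required_arguments = 1  # the language name, e.g. "xml"
    has_content = True

    def run(self):
        try:
            lexer = get_lexer_by_name(self.arguments[0])
        except ClassNotFound:
            lexer = TextLexer()  # unknown language: fall back to plain text
        html = highlight("\n".join(self.content), lexer, HtmlFormatter())
        # Pass the highlighted markup through to the HTML writer untouched.
        return [nodes.raw("", html, format="html")]


# "pygments" is an illustrative directive name, not a docutils built-in.
directives.register_directive("pygments", SourceCode)


def rst_to_html(text):
    """Render reStructuredText to a complete HTML page, returned as a str."""
    return publish_string(
        text,
        writer_name="html",
        settings_overrides={"output_encoding": "unicode"},
    )


SAMPLE = """\
Demo
====

.. pygments:: xml

   <style xmlns="http://purl.org/net/xbiblio/csl"/>
"""

if __name__ == "__main__":
    print(rst_to_html(SAMPLE))
```

Running the file prints the rendered page to standard output. The
highlighted markup only carries CSS class names, so a stylesheet (for
example one produced by `HtmlFormatter().get_style_defs()`) is still
needed to see any colours.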

I find myself wondering whether it would be sensible to move the
specification, upgrade notes, and pretty-printed schema to
reStructuredText format. I would be happy to carry out the
conversion. ReST is a well-defined format with plenty of
authoring support, including nice extras such as this
real-time preview tool:
http://cometdemo.lshift.net:8080/greed/welcome_document/

Bruce has commented as follows:
> I think this is a debatable [point]. The enhanced markdown that pandoc
> processes can be read without transforming it to XHTML, and ReST isn’t
> widely supported outside docutils (and hence, python).
>
> But I’m open to the idea. I’d just suggest floating it to the list to
> see if anyone else has any opinions.
>
> Also, I’d like the output to be compliant XHTML (not HTML 4).

As far as I can tell, the enhanced markdown that pandoc processes is
supported only by pandoc itself (and hence, Haskell), so it’s six of
one and half a dozen of the other as far as dedication to a single
scripting engine goes. Haskell is cool, but it’s a good deal less
common than Python, and its pipeline for package releases is either
less reliable or slower (hence the need to compile from scratch). In
two attempts, I’ve been unable to get a pandoc system going within a
reasonable amount of time, which is skirting close to red card
territory.

The fact that ReST is closely tied to Python could be seen as an
advantage. The ReST language is rigorously specified, which is what
makes it practical to build processing tools on top of it. Python
itself is cross-platform and extremely common. I don’t really see why
a tie-in to it would be a problem, or what the benefits of a similar
tie-in to Haskell would be.

Like markdown, documents formatted in ReST can be read by humans.
Various conversions are possible, and XHTML is certainly among them.
For a sampler, here’s the top hit off Google for “xhtml restructured
text output”: http://www.strangegizmo.com/products/restxsl/

Personally, I don’t really see much benefit in sticking with markdown
and pandoc, but that’s mainly on the basis of two frustrated attempts
to get a working system going. Maybe my installation woes were just a
one-off accident. Have other people had better experiences installing
pandoc?

Frank