citing lower-federal, state, and local court decisions

Are you SURE that’s something that needs configuration in CSL? I’ve
always assumed it’s up to the implementation.

If yes, then add it to the tracker.

I think we should set a deadline. A reasonable one seems to me late
summer or early autumn. We could start to think about a roadmap
(maybe you could just propose one).

What would we like in this roadmap, beyond tagging a 1.0 release of the schema?

I believe that a date for the tagging would help.

Yes, and we need a common API at the citation processing level too,
maybe, as I think you were suggesting in a thread below. But I think
this is premature. First we need to have CSL 1.0, which would be the
target for every implementation. Then we can really start setting some
higher-level rules for the implementations.

I think some of this can happen in parallel though.

Sure, but I have the feeling that the implementations, while generally
usable, are still somewhat immature for providing that word-processor
level API you were talking about. Or maybe it is just me… take the
recently added first-reference-note-number variable: that made me
rethink my highest-level function’s type signature, which, when
translated, means breaking, once again, almost everything. I still
have to implement it and, thanks to this surprise, I’m going to take a
safer approach in the future… still, I’m facing a major refactoring
of a significant portion of my code.

Ouch. I originally had a rosy notion that note numbers for
back-references could be handled entirely within the processor, but
both Bruce and Simon have provided strong arguments for why that’s not
workable. In Zotero, the note numbers will be attached to item data
by the plugins, before items are submitted to the processor. So in
citeproc-js, at least, I won’t be attempting to track those numbers.
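For concreteness, here is a sketch (in Ruby, with an invented item shape; only the first-reference-note-number key comes from the discussion) of what item data might look like when the calling application attaches the note number before submission:

```ruby
# Hypothetical item data as a plugin might submit it to a processor.
# The "first-reference-note-number" key mirrors the CSL variable name;
# the other fields and values are invented for illustration.
item = {
  "id"    => "case1",
  "type"  => "legal_case",
  "title" => "Doe v. Roe",
  # Attached by the caller (e.g. the word-processor plugin), not
  # computed by the processor itself:
  "first-reference-note-number" => 3
}

# The processor can then simply read the value when rendering a
# back-reference, rather than tracking note positions internally.
puts item["first-reference-note-number"]  # => 3
```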

Frank

I think we should set a deadline. A reasonable one seems to me late
summer or early autumn. We could start to think about a roadmap
(maybe you could just propose one).

What would we like in this roadmap, beyond tagging a 1.0 release of the schema?

Maybe not for 1.0, but for the to-do list: multi-lingual layering, with
sort keys for Chinese and Japanese (with options for roman or local
phonetic ordering).

Are you SURE that’s something that needs configuration in CSL? I’ve
always assumed it’s up to the implementation.

If yes, then add it to the tracker.

Will think about it further.

I added the following:

   ## issuing authority (for patents) or judicial authority
   ## (such as court for legal cases)
   “authority”

Thanks to everyone for comments on this.
Best,
Elena

Don’t know if it’s relevant, but the fixtures and tests under the
./std subdir of citeproc-js should be digestible by any
implementation, with appropriate function wrappers to remangle the CSL
and data fields. Not much there yet, but I’ll be adding to it
gradually as time goes on. It’s a bit of a jumble; if anyone can
think of better ways to lay things out, feel free to rearrange the
furniture.

Frank

Hi Frank (and all),

I’ve been lurking a bit on the list as I prepare to work on the Ruby
version that Liam started. I’m attempting to work these tests in as
I’m sure they will be helpful in finding some bugs and other places
where it needs work. I note that the format of the testing information
seems to be close to JSON, but the Ruby JSON parser chokes on some
things (the comment line at the start of the files, unescaped "
characters in the csl field, etc.). I’m not sure if this is because
it’s not intended to be JSON in the first place or because of
deficiencies in the Ruby parser (or leniency in other parsers). If
it’s the former I wonder if we could convert it to JSON to make it
easy to parse across languages (happy enough to help in that effort).

What do you think?

Howard

Hi Howard,

On Wed, Apr 1, 2009 at 11:08 AM, Howard Ding <@Howard_Ding> wrote:

I’ve been lurking a bit on the list as I prepare to work on the Ruby
version that Liam started. I’m attempting to work these tests in as
I’m sure they will be helpful in finding some bugs and other places
where it needs work. I note that the format of the testing information
seems to be close to JSON, but the Ruby JSON parser chokes on some
things (the comment line at the start of the files, unescaped "
characters in the csl field, etc.). I’m not sure if this is because
it’s not intended to be JSON in the first place or because of
deficiencies in the Ruby parser (or leniency in other parsers). If
it’s the former I wonder if we could convert it to JSON to make it
easy to parse across languages (happy enough to help in that effort).

What do you think?

My guess is they aren’t valid JSON (certainly attributes inside a
JSON value should use ’ rather than " to avoid that problem). Try here
to confirm:

http://www.jsonlint.com/
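To make the quoting issue concrete, here is a small Ruby sketch; the `<text variable="title"/>` fragment is an arbitrary example, not taken from the actual fixtures:

```ruby
require 'json'

# Two ways to embed an XML fragment inside a JSON string value.
# With escaped double quotes:
escaped = '{"csl": "<text variable=\\"title\\"/>"}'
# With single-quoted XML attributes (Bruce's suggestion), which need
# no escaping at all inside a JSON string:
single  = '{"csl": "<text variable=\'title\'/>"}'

puts JSON.parse(escaped)["csl"]  # => <text variable="title"/>
puts JSON.parse(single)["csl"]   # => <text variable='title'/>
```

Either form passes a strict parser; unescaped double quotes inside the string value, as in the current fixtures, will not.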

Bruce

I can’t figure out how to get the embedded XML to validate (as JSON).
I suppose another approach is to load the XML fragments from separate
files.

Bruce

Hi,

I’ve been playing with it. If I take the following steps:

  1. Strip out comments
  2. Remove newlines and tabs
  3. Escape all the quote characters for XML attributes

then all the files (except one which I think hasn’t been converted
from the old format) will
parse as JSON. I don’t think that any of this screws up the XML or
affects the tests in any way - but I haven’t confirmed that.
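The three steps above can be sketched roughly as follows (in Ruby; the `#` comment marker and the quote-escaping regex are guesses at the fixture format, not a specification):

```ruby
require 'json'

# Rough sketch of the three cleanup steps described above. Assumes
# comments are whole lines starting with "#", and that the only
# unescaped double quotes surround XML attribute values (attr="val").
def clean_fixture(raw)
  # 1. Strip out comment lines
  body = raw.lines.reject { |l| l.lstrip.start_with?("#") }.join
  # 2. Remove newlines and tabs
  body = body.gsub(/[\r\n\t]/, " ")
  # 3. Escape the double quotes around XML attribute values
  body.gsub(/=\s*"([^"]*)"/) { "=\\\"#{$1}\\\"" }
end
```

After these passes, a fixture of that shape should parse with any strict JSON library.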

Howard

Ouch; that seems excessively obnoxious. Shouldn’t we just store them
as separate files?

Bruce

I’d be perfectly happy with that - just reporting my results. Seems
perfectly reasonable given that the citations are in their own files.

Howard

Hi,

Also, I assume that the csl entries (or files if we go that way) are
supposed to be complete, valid CSL documents. For this I think they
need to at least have a skeletal info section, no?

Howard

Also, I assume that the csl entries (or files if we go that way) are
supposed to be complete, valid CSL documents.

We haven’t talked about it. I’m not sure where I stand.

For this I think they need to at least have a skeletal info section, no?

Probably. It might also need at least a citation element to be valid
with the current schema; not sure.

Bruce

Hi,

On Wed, Apr 1, 2009 at 3:59 PM, Bruce D’Arcus <@Bruce_D_Arcus1> wrote:

On Wed, Apr 1, 2009 at 4:48 PM, Howard Ding <@Howard_Ding> wrote:

Also, I assume that the csl entries (or files if we go that way) are
supposed to be complete, valid CSL documents.

We haven’t talked about it. I’m not sure where I stand.

Fair enough. I’m just trying to get a full understanding of them and
what they’re intended to be, so that I can do what I need to in order
to smack the Ruby version with them.

Howard

Sure. What does everyone else think?

  1. do we store the CSL fragments as files? I say yes.

  2. do we tweak the schema to validate them? I don’t know.

Bruce

Hi,

Also, I assume that the csl entries (or files if we go that way) are
supposed to be complete, valid CSL documents.

We haven’t talked about it. I’m not sure where I stand.

Fair enough. I’m just trying to get a full understanding of them and
what they’re intended to be, so that I can do what I need to in order
to smack the Ruby version with them.

Sure. What does everyone else think?

  1. do we store the CSL fragments as files? I say yes.

  2. do we tweak the schema to validate them? I don’t know.

Sorry for coming in late on this; it’s morning-time here, and I was
asleep during the discussion.

The files aren’t valid JSON; the unescaped line endings in the csl
element would blow things up. The way it’s all set up at the moment
is a mongrel compromise between ease of reading and spinning out new
tests, and functional separation in the file hierarchy. It’s ugly and
wants to be better.

Now that the CSL bits are expressed as entire style objects, it’s
probably right to do as Bruce suggests, and move them out to separate
files somehow. My only worry with that is that it adds one more layer
to trawl through when trying to figure out what a test actually does.
I’ve written most of the tests so far, and it takes a little
back-and-forth to come up to speed with what the moving parts are
when revisiting one of them. It would be all the more frustrating
when sorting through them for the first time. Separation between
files is particularly cumbersome when coding on a small monitor where
you can’t lay out multiple files side by side (citeproc-js is being
written on an Eee PC).

I’m easy, but I think my preferred move would be in the other
direction, pulling copies of the data file content into the tests as
well, so that each is a self-contained unit. The master files
wouldn’t be valid anything, but they would be easy to edit. I could
provide a little grinder that generates valid JSON and valid CSL from
the master files, for use by programs. With a little documentation
(!), that would save duplication of effort, and avoid confusion like
everyone experienced yesterday.

That’s my thought, anyway, but I’m easy. Would a
test-file-plus-clean-extracted-source approach be acceptable? Or are
there reasons for the files to be written and maintained separately?

Frank


Now that the CSL bits are expressed as entire style objects, it’s
probably right to do as Bruce suggests, and move them out to separate
files somehow. My only worry with that is that it adds one more layer
to trawl through when trying to figure out what a test actually does.

Yes, but:

a) reformatting the fragments to be valid JSON would make them much
harder to read visually (and so harder to “figure out what a test
actually does”), and …

b) perhaps a clever file naming convention would at least make
associating the files easier?

I’ve written most of the tests so far, and it takes a little
back-and-forth to come up to speed with what the moving parts are
when revisiting one of them. It would be all the more frustrating
when sorting through them for the first time. Separation between
files is particularly cumbersome when coding on a small monitor where
you can’t lay out multiple files side by side (citeproc-js is being
written on an Eee PC).

Cool (the eeepc part)! I was actually thinking of getting one of those.

I’m easy, but I think my preferred move would be in the other
direction, pulling copies of the data file content into the tests as
well, so that each is a self-contained unit. The master files
wouldn’t be valid anything, but they would be easy to edit. I could
provide a little grinder that generates valid JSON and valid CSL from
the master files, for use by programs. With a little documentation
(!), that would save duplication of effort, and avoid confusion like
everyone experienced yesterday.

I suppose the fixtures could be rewritten as XML. E.g.:

blah, blah, blah [...] [... csl fragment ...] [... expected result ...]

That avoids the funky problems with the XML/JSON syntax mismatch, and
allows self-contained tests; no need to write a pre-processor either.
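For illustration, such an XML fixture might look something like the following; the wrapper element names and the content are invented here, not a proposed schema (only the CSL namespace is real):

```xml
<!-- Illustrative only: csl-test, description, mode, and result are
     invented element names, not part of any agreed format. -->
<csl-test>
  <description>Simple note citation rendering a title</description>
  <mode>citation</mode>
  <csl>
    <style xmlns="http://purl.org/net/xbiblio/csl" class="note">
      <citation>
        <layout>
          <text variable="title"/>
        </layout>
      </citation>
    </style>
  </csl>
  <result>Some Expected Output</result>
</csl-test>
```

Because the CSL fragment is itself XML, it nests without any escaping, which is the main attraction of this route.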

That’s my thought, anyway, but I’m easy. Would a
test-file-plus-clean-extracted-source approach be acceptable? Or are
there reasons for the files to be written and maintained separately?

I don’t know. I’m coming at this as someone who started out as an XML
geek, so am a bit purist about these things. I think it really depends
on what Andrea, Howard, et al. have to say about it. Let’s see if
there’s a consensus?

Bruce

Trying to think of where I might look for an example, I did a search
for testing XSLT (which is effectively a more general templating
language than CSL). Here’s a hit I got:

http://www.jenitennison.com/xslt/utilities/unit-testing/index.html

Bruce

Now that the CSL bits are expressed as entire style objects, it’s
probably right to do as Bruce suggests, and move them out to separate
files somehow. My only worry with that is that it adds one more layer
to trawl through when trying to figure out what a test actually does.

Yes, but:

a) reformatting the fragments to be valid JSON would make them much
harder to read visually (and so harder to “figure out what a test
actually does”), and …

With this route, there would be two top-level hierarchies, with
identical content in different forms, say ./humans and ./machines.
The ./humans area would be easy for us to handle, the ./machines area
would be easy for our silicon friends.

b) perhaps a clever file naming convention would at least make
associating the files easier?

I’ve tried using descriptive names for the data files, which helped a
little. But they can apply to multiple tests, so when fashioning a
new one, it’s hard to know what parts of the data can be safely
changed to suit the new test without incidentally breaking other ones.

I’ve written most of the tests so far, and it takes a little
back-and-forth to come up to speed with what the moving parts are
when revisiting one of them. It would be all the more frustrating
when sorting through them for the first time. Separation between
files is particularly cumbersome when coding on a small monitor where
you can’t lay out multiple files side by side (citeproc-js is being
written on an Eee PC).

Cool (the eeepc part)! I was actually thinking of getting one of those.

I’m easy, but I think my preferred move would be in the other
direction, pulling copies of the data file content into the tests as
well, so that each is a self-contained unit. The master files
wouldn’t be valid anything, but they would be easy to edit. I could
provide a little grinder that generates valid JSON and valid CSL from
the master files, for use by programs. With a little documentation
(!), that would save duplication of effort, and avoid confusion like
everyone experienced yesterday.

I suppose the fixtures could be rewritten as XML. E.g.:

blah, blah, blah [...] [... csl fragment ...] [... expected result ...]

Could do that. Seems like it would require a lot of typing, though.

That avoids the funky problems with the XML/JSON syntax mismatch, and
allows self-contained tests; no need to write a pre-processor either.

That’s my thought, anyway, but I’m easy. Would a
test-file-plus-clean-extracted-source approach be acceptable? Or are
there reasons for the files to be written and maintained separately?

I don’t know. I’m coming at this as someone who started out as an XML
geek, so am a bit purist about these things. I think it really depends
on what Andrea, Howard, et al. have to say about it. Let’s see if
there’s a consensus?

Sure thing. Now that things are starting to fall together at my end,
I’m happy to refactor the tests in any form that is easy for everyone
to work with. If a “test source file” approach were used, the chunks
could be set off with simple plain text headers. The
prepare-for-machine script could be a little bash script that mangles
the source with sed. It’s a thought, anyway.
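As a sketch of that idea (in Ruby rather than sed, and with an invented `==== NAME ====` header convention standing in for whatever plain-text markers would actually be chosen):

```ruby
# Split a self-contained "test source file" into named chunks on
# plain-text headers. The "==== NAME ====" marker is an invented
# convention for illustration, not the actual citeproc-js format.
def split_sections(text)
  sections = {}
  current  = nil
  text.each_line do |line|
    if line =~ /\A==== (\S+) ====\s*\z/
      current = $1          # start a new named section
      sections[current] = ""
    elsif current
      sections[current] << line
    end
  end
  sections
end
```

A prepare-for-machine step would then just write each chunk (CSL, input data, expected result) out to its own clean file.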

To help move this forward, I’ve merged my work from the fbennett
branch over to trunk, so fbennett can be treated as a sandbox to try
out test formats, etc. If anyone wants to have a bash directly, feel
free.

Frank

Now that the CSL bits are expressed as entire style objects, it’s
probably right to do as Bruce suggests, and move them out to separate
files somehow. My only worry with that is that it adds one more layer
to trawl through when trying to figure out what a test actually does.

Yes, but:

a) reformatting the fragments to be valid JSON would make them much
harder to read visually (and so harder to “figure out what a test
actually does”), and …

With this route, there would be two top-level hierarchies, with
identical content in different forms, say ./humans and ./machines.
The ./humans area would be easy for us to handle, the ./machines area
would be easy for our silicon friends.

b) perhaps a clever file naming convention would at least make
associating the files easier?

I’ve tried using descriptive names for the data files, which helped a
little. But they can apply to multiple tests, so when fashioning a
new one, it’s hard to know what parts of the data can be safely
changed to suit the new test without incidentally breaking other ones.

I’ve written most of the tests so far, and it takes a little
back-and-forth to come up to speed with what the moving parts are
when revisiting one of them. It would be all the more frustrating
when sorting through them for the first time. Separation between
files is particularly cumbersome when coding on a small monitor where
you can’t lay out multiple files side by side (citeproc-js is being
written on an Eee PC).

Cool (the eeepc part)! I was actually thinking of getting one of those.

I’m easy, but I think my preferred move would be in the other
direction, pulling copies of the data file content into the tests as
well, so that each is a self-contained unit. The master files
wouldn’t be valid anything, but they would be easy to edit. I could
provide a little grinder that generates valid JSON and valid CSL from
the master files, for use by programs. With a little documentation
(!), that would save duplication of effort, and avoid confusion like
everyone experienced yesterday.

I suppose the fixtures could be rewritten as XML. E.g.:

blah, blah, blah [...] [... csl fragment ...] [... expected result ...]

Could do that. Seems like it would require a lot of typing, though.

That avoids the funky problems with the XML/JSON syntax mismatch, and
allows self-contained tests; no need to write a pre-processor either.

That’s my thought, anyway, but I’m easy. Would a
test-file-plus-clean-extracted-source approach be acceptable? Or are
there reasons for the files to be written and maintained separately?

I don’t know. I’m coming at this as someone who started out as an XML
geek, so am a bit purist about these things. I think it really depends
on what Andrea, Howard, et al. have to say about it. Let’s see if
there’s a consensus?

Here’s a thought. Casting the tests in XML would be clean, I just
worry that the amount of typing involved might discourage people from
writing them. It needs a UI – and we have one, in Zotero. If there
were a plugin to allow a developer to select a set of items from his
local database, select a style, and select whether they should be
rendered as bibliography or citations, and (possibly) enter a notional
output string, the plugin could dump out XML that’s ready to go.

I wouldn’t be up to writing that myself, but it would be really nice
to have. If there is a prospect of that emerging in the medium term,
the minor pain of refactoring the tests in XML by hand would
definitely be worth it. You could validate the bejeebers out of a
processor with very little effort.

Frank