Modular legal style support in CSL-m

I’ve opened a bookmark in the citeproc-js repo for modular legal style
support. This is an initial note on the design I’m contemplating for
it.

This is a very long post, which I’ll be using as a working note while
building out the support in the processor. Comments are welcome - but
I know that everyone is very busy.

Requirements

Legal citation formats vary across jurisdictions. Unification is not
an option for a couple of reasons:

(1) Primary legal materials have a (pretty much) well defined internal
structure that is specific to each jurisdiction, and citation forms
reflect that local structure.

(2) Legal institutions have large stakeholder populations, so local
changes to referencing patterns are expensive to implement.

CSL currently expresses citation styles in a single file. Coding
separate jurisdiction-specific citation formats into a single file is
possible, but there are obvious issues with scalability and
maintainability. On the user side, there is also a problem of
full-line supply: where a national jurisdiction’s citation forms come
in several variants, finding a style with the particular combination
of formats needed for a project becomes all the more difficult.

Proposed solution

The proposed solution is to express jurisdiction-specific citation
forms for primary legal materials (only) in separate modules that can
be loaded on demand by a CSL processor.

If successful, modular legal style support would enable tech-savvy
legal professionals in individual jurisdictions to focus on developing
CSL code that expresses their specific requirements. This would bring
more expertise to the design process, and lighten the maintenance
burden on core CSL maintainers.

Risk factors

The added complexity of modular style support touches processor
development, style designers, and users. Minimizing the impact on all
three is a concern.

Design

Essential features of the proposed design are:

(1) Legal citations are expressed using a fixed set of macro names,
each of which composes one element of a complete citation (the macro
names below are tentative):

  • juris-name
  • juris-name-short
  • juris-main
  • juris-main-short
  • juris-pinpoint-join
  • juris-pinpoint-raw
  • juris-tail
  • juris-tail-short

(2) A legal style module is composed as a standard CSL(-m) style using
only the above-named macros.

(3) The jurisdiction and variant of a legal style module is expressed
in the filename under which it is distributed:

juris-us:ak-bluebook.csl
juris-arb.cls-oscola.csl

In the examples above, the “juris-” prefix identifies the file as a
legal style module. The prefix is reserved for this purpose within the
CSL ecosystem. In the first example, “us:ak” indicates the
jurisdiction of Alaska within the United States. In the second,
“arb.cls” indicates a private arbitration. These identifiers are drawn
from the Legal Resource Registry, which is (apparently) the only set
of jurisdiction identifiers with worldwide scope that is open to
community contributions.

http://fbennett.github.com/legal-resource-registry

The “bluebook” and “oscola” name elements allow multiple modules to be
distributed for a single jurisdiction.

(4) In a first-class CSL style file that offers extensible legal style support:

(a) An attribute on cs:style optionally declares preferred legal style
module variants:

<options juris-style-prefs="babyblue,oscola" />

This is parsed to a list:

citeproc.opt.juris_style_prefs = ["babyblue", "oscola"];

(b) Primary legal materials (items with a “jurisdiction” value, which
are of item type “bill,” “legislation,” “legal_case,” “report,” or, in
CSL-m, “gazette,” “regulation,” or “hearing”) are rendered by
combining “juris-” macros exclusively, using the normal CSL semantics,
with certain fixed assumptions about joining punctuation:

This example shows that (i) the first-class style controls the join
between “juris-name” and “juris-tail” and the core of the citation;
(ii) the join between “juris-main” and “juris-pinpoint” is controlled
by the latter macro (the CSL-m validation constraint on leading space
in a prefix attribute is relaxed for this purpose).

(5) When the processor encounters a “juris-” prefixed macro while
processing an item with a “jurisdiction” value, it attempts to
retrieve a corresponding style module. Resolution is attempted using
the following data elements to cast filename candidates:

prefix = "juris";
jurisdiction = Item.jurisdiction.split(":");
variant = citeproc.opt.juris_style_prefs[i];

Matching is attempted by joining the split elements of jurisdiction to
a string identifier, then attempting to find a file match for each
variant in turn beginning with index 0, then trying the filename with
no variant, and finally attempting to find a filename with any
variant.

If a match is not found, one element is removed from the end of the
jurisdiction split, and the match against variants is attempted again.

If a match is found, the style module is instantiated, and the
"juris-" macro code embedded in the style is bypassed.

If jurisdiction elements are exhausted without a match, module
instantiation is aborted, and the embedded macro code is used.

(To avoid unnecessary overhead, a processor should attempt to match a
jurisdiction module only once during a session, caching the result as
success or failure.)
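
The resolution order in (5) can be sketched as follows. This is only an illustration under stated assumptions, not the citeproc-js implementation: the function names (parseStylePrefs, resolveModule, makeCachedResolver) and the idea of passing in a plain array of installed filenames are hypothetical, and the variant-less filename form ("juris-us.csl") is inferred from the examples in (3).

```javascript
// Hypothetical sketch of the lookup order described in (5).
// Function names and the "installed filenames" array are assumptions.

// Parse the comma-delimited preference string into a list of variant names.
function parseStylePrefs(attr) {
  return (attr || "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Return the first matching module filename, or null if none is found.
function resolveModule(jurisdiction, prefs, installed) {
  const parts = jurisdiction.split(":");
  while (parts.length > 0) {
    const base = "juris-" + parts.join(":");
    // 1. Preferred variants, in order, beginning with index 0.
    for (const variant of prefs) {
      const name = base + "-" + variant + ".csl";
      if (installed.includes(name)) return name;
    }
    // 2. The filename with no variant (assumed form).
    if (installed.includes(base + ".csl")) return base + ".csl";
    // 3. Any variant at all for this jurisdiction.
    const any = installed.find(
      (f) => f.startsWith(base + "-") && f.endsWith(".csl")
    );
    if (any !== undefined) return any;
    // No match: drop the most specific element and try again.
    parts.pop();
  }
  return null; // caller falls back to the embedded "juris-" macros
}

// Cache one result (success or failure) per jurisdiction for the session.
function makeCachedResolver(prefs, installed) {
  const cache = new Map();
  return (jurisdiction) => {
    if (!cache.has(jurisdiction)) {
      cache.set(jurisdiction, resolveModule(jurisdiction, prefs, installed));
    }
    return cache.get(jurisdiction);
  };
}

const resolve = makeCachedResolver(
  parseStylePrefs("babyblue,oscola"),
  ["juris-us-bluebook.csl", "juris-us:ak-oscola.csl"]
);
console.log(resolve("us:ak")); // juris-us:ak-oscola.csl
console.log(resolve("us:ny")); // juris-us-bluebook.csl (falls back to "us", any variant)
console.log(resolve("fr"));    // null
```

Caching null as well as hits is what keeps a failed lookup from being repeated for every item in the same jurisdiction.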

Features

The design described above has the following advantages:

  • Jurisdiction modules are valid CSL(-m) styles, and can be tested for
    accuracy and regressions with existing tools.

  • First-class styles can function without installing style modules.

  • The requirements for adding full legal support to existing styles are modest.

  • Implementation in processors is relatively simple, requiring only
    implementation of the style preference node, and a means of
    instantiating individual macros as standard executable style code
    runs.

Note that this assumes the use of standard identifiers in input data.
The most significant burden for legal style support will fall on
calling applications, which must implement user interface and
underlying code to assure that correct jurisdiction identifiers are
sent to the processor for each item.


Frank

I fully support some type of style modularization for legal styles. My
main concern with the current proposal is the heavy reliance on
(sub)string parsing, which I think is poor design.

Some thoughts:

  • I think it makes sense to modularize through macros. However, module
    files that only contain such macros should probably use a different
    extension (just “.xml”, or “.cslm”?) and be governed by a customized
    schema.
  • In connection with the previous point, I think names like
    "juris-us:ak-bluebook.csl" are a mess. Would it make sense to combine
    multiple variants in a single module file? E.g. we could have a
    "modules" subdirectory of the “styles” repo (which eliminates the need
    for the “juris-” prefix in the file name), with files named like
    "us:ak.xml". Within the file, we could define multiple variants by
    nesting the macros in a “cs:module” element with a “style” attribute.
    The root element could be “cs:modules” with a "jurisdiction"
    attribute, e.g.:
  • For specifying the desired priority of style variants (""), I would much rather see a
    structure that relies on element order, like we do for cs:key and the
    children of cs:substitute. E.g.
...
  • Finally, I think relying on macro name prefixes
    (‘macro=“juris-main”’) is also not optimal. I would much rather see a
    dedicated attribute like: .

Rintze

I was guilty of mixing specification ideas and implementation details
in the original post. A few comments below to get back on track.

> I fully support some type of style modularization for legal styles. My
> main concern with the current proposal is the heavy reliance on
> (sub)string parsing, which I think is poor design.
>
> Some thoughts:
>
>   • I think it makes sense to modularize through macros. However, module
>     files that only contain such macros should probably use a different
>     extension (just “.xml”, or “.cslm”?) and be governed by a customized
>     schema.

For style-level testing, a module should contain a cs:citation
element, so it’s not all that different from a first-class CSL style
file. It might make sense to give modules a special extension, but
distinguishing features in cs:info are probably more important (more
below).

>   • In connection with the previous point, I think names like
>     “juris-us:ak-bluebook.csl” are a mess.

I confused things with the filename formatting description. An
application installing a module would look into its metadata for some
flag saying that it is a CSL jurisdiction module, an ID, and a variant
name, and use those details to store the code somewhere from which it
can be retrieved. If it ends up stored in a file, it might have any
name, so long as the system knows how to retrieve it. An application
calling for a module, over the Web or internally, would attempt to
retrieve a compatible module based on a jurisdiction ID and a set of
variant names. The resolution mechanism might be based on filenames,
or it might be set up in some other way. The module metadata need to
be specified; the machinery for storage and retrieval are an
implementation detail.
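
As a rough illustration of metadata-driven storage and retrieval, independent of filenames, something like the following would do. Everything here is an assumption for the sake of the sketch: the class name ModuleStore and the metadata fields isJurisdictionModule, jurisdiction and variant are hypothetical, since the module metadata has not been specified yet.

```javascript
// Hypothetical sketch: field and class names are illustrative only;
// the note specifies no metadata schema.
class ModuleStore {
  constructor() {
    // jurisdiction ID -> (variant name -> module code)
    this.byJurisdiction = new Map();
  }

  // Store a module under the ID and variant found in its metadata.
  // Returns false if the metadata does not flag a jurisdiction module.
  install(meta, code) {
    if (!meta.isJurisdictionModule) return false;
    if (!this.byJurisdiction.has(meta.jurisdiction)) {
      this.byJurisdiction.set(meta.jurisdiction, new Map());
    }
    this.byJurisdiction.get(meta.jurisdiction).set(meta.variant, code);
    return true;
  }

  // Return the first module matching one of the requested variants, or null.
  retrieve(jurisdiction, variants) {
    const stored = this.byJurisdiction.get(jurisdiction);
    if (!stored) return null;
    for (const variant of variants) {
      if (stored.has(variant)) return stored.get(variant);
    }
    return null;
  }
}

const store = new ModuleStore();
store.install(
  { isJurisdictionModule: true, jurisdiction: "us:ak", variant: "bluebook" },
  "<style>...</style>"
);
console.log(store.retrieve("us:ak", ["oscola", "bluebook"]) !== null); // true
```

The caller only ever supplies a jurisdiction ID and a list of variant names, so the backing store could be files, a database, or a web service without changing the interface.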

I’ll be using the filename hack for MLZ development in the short term,
at least, but if modules eventually find their way into the CSL repo,
they can be stored and delivered in a more elegant way.

> Would it make sense to combine
> multiple variants in a single module file?

That would be a step backward, actually. One of the aims with modules
is to reduce the bloat in the current CSL-m styles (MLZ CMS is
currently 3,246 lines of code …). So keeping the modules as small as
possible is an objective. The other aim is to allow local hackers to
build and distribute jurisdiction modules for their special
requirements with a minimum of friction. So keeping modules simple
(and atomic) is another objective.

> E.g. we could have a
> “modules” subdirectory of the “styles” repo (which eliminates the need
> for the “juris-” prefix in the file name), with files named like
> “us:ak.xml”. Within the file, we could define multiple variants by
> nesting the macros in a “cs:module” element with a “style” attribute.
> The root element could be “cs:modules” with a “jurisdiction”
> attribute, e.g.:
>
>   • For specifying the desired priority of style variants (“”), I would much rather see a
>     structure that relies on element order, like we do for cs:key and the
>     children of cs:substitute. E.g.
> ...

We could do this, but so long as the only thing we’re storing is a
short list of ASCII variant names, a comma-delimited list would be
sufficient, and simpler for applications to implement.

>   • Finally, I think relying on macro name prefixes
>     (‘macro=“juris-main”’) is also not optimal. I would much rather see a
>     dedicated attribute like: .

I implemented that in the development code, then wound it back out,
for the present.

For deployment, the idea will be to add legal support to existing CSL
styles by slotting in template code that calls the module macros. That
will be simpler if there are no clashes among macro names. We could
solve the namespace problem by adding something like cs:module-macro
to partner with a module-macro attribute - or maybe there is a case
for XML namespaces. Giving modular macros their own syntax might be a
good idea, and it does have a cleaner feel to it, but on the other
hand it would essentially be a layer of syntactic sugar that a
processor converts to some prefixing scheme behind the scenes when
storing macros. The decision can probably wait; it shouldn’t affect
the work to build up the first set of modules, and changes can be made
pretty easily when the concept gets closer to adoption by CSL.
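
To make the "syntactic sugar" point concrete, here is a minimal sketch of a macro table in which a dedicated module-macro syntax is simply stored under a prefixed key. The class MacroTable, its method names and the MODULE_PREFIX constant are hypothetical; nothing here reflects actual processor code.

```javascript
// Hypothetical sketch: the class, method names and prefix constant are
// illustrative; cs:module-macro is only a possibility raised above.
const MODULE_PREFIX = "juris-";

class MacroTable {
  constructor() {
    this.macros = new Map();
  }
  // Ordinary cs:macro: stored under its own name.
  defineMacro(name, body) {
    this.macros.set(name, body);
  }
  // Dedicated module-macro syntax: stored under a prefixed key.
  defineModuleMacro(name, body) {
    this.macros.set(MODULE_PREFIX + name, body);
  }
  getMacro(name) {
    return this.macros.get(name);
  }
  getModuleMacro(name) {
    return this.macros.get(MODULE_PREFIX + name);
  }
}

const table = new MacroTable();
table.defineMacro("main", "style macro");
table.defineModuleMacro("main", "module macro");
console.log(table.getMacro("main"));       // style macro
console.log(table.getModuleMacro("main")); // module macro
console.log(table.getMacro("juris-main")); // module macro
```

The last line shows why the two approaches are interchangeable at the storage level: the dedicated syntax and the bare prefix resolve to the same entry.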

Frank

#sorting orders

I wonder if this also helps to manipulate sorting orders, other than by
number or alphabet.

One of the problems with legal styles is that most styles (as far as I am
aware) have very specific rules on how to present a bibliography: the
highest ruling court must be mentioned first (or last), then the next
highest ruling court (or circuit), and so on. There are two deciding factors
(again, as far as I am aware):
a. style
b. jurisdiction ‘base’

For example: when I look at the situation in Europe (EU member states),
it is possible that a ruling (in any given case) could be based on:

  • regional court
  • national court
  • constitutional court
  • EU Court of Justice OR European Court of Human Rights <-- HIGHEST
  • national court (in review because of the ECJ or ECHR ruling)
  • regional court (in review because of the national review of the ECJ or
    ECHR ruling)

Then you have 4 jurisdictions (EU:country:region; EU:country:supreme;
EU:country:constitutional; EU:EU:ECJ or EU:EU:ECHR). In the bibliography
the EU:EU:ECJ should be presented first (because under European law the
EU:EU:ECHR is the lower-ranking court), and so on.

Then there are also many legal systems that have special circuits or
special courts for (for example) administrative law, labour law, or social
security law, which are sometimes part of the regional court and sometimes
independent of any region, falling directly under the control of a
national or constitutional court.

I don’t see how the proposed solution helps to manipulate sorting orders
to get the right order.

#source document
The actual ruling (printed, or spoken in public) of the court is the
real source document. But that is not the (first) publication of it.
Case law magazines often publish rulings, and by doing so are the actual
first one to publish it. Without those publications you would not know
that a certain ruling was made.

Most legal styles I know distinguish the ruling from its publication.
According to some styles I know, in a bibliography you also want the
‘published’ ruling report listed under the right court (as
mentioned above), which could cause problems.

I don’t see how the proposed solution helps to put this in the right order.

Perhaps I’m wrong, but could you please help explain to me how the proposed
solution could do this? Because I think these two things are the main
reason why legal style sheets are so complicated right now.

Joël

Frank Bennett wrote on 21-2-2015 at 7:00:
> I was guilty of mixing specification ideas and implementation details
> in the original post. A few comments below to get back on track.

Joël,

The short answer is that this particular solution to the
citation-level formatting problem is based on stable, machine-readable
jurisdiction and court identifiers that can be leveraged to address
the issues you raise.

Identifiers will not solve the sorting problem on their own - as you
say, requirements vary according to style, and coding a static
hierarchy into the identifier spec would be too inflexible.
For semantic sorting, you are going to need an intermediate table of
hierarchy and precedence. Someone would need to do the work to put it
together, but a system of identifiers makes it possible to undertake
the task in a systematic way.
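
A minimal sketch of what such an intermediate table might look like, assuming items carry a jurisdiction string. The identifiers are the illustrative strings from Joël's question, not Legal Resource Registry entries, and the ranks, the function name sortByPrecedence and the item shape are all invented for the example.

```javascript
// Hypothetical sketch. The identifiers are the illustrative strings from
// the question above, not Legal Resource Registry entries, and the ranks
// are invented for the example.
const precedence = new Map([
  ["EU:EU:ECJ", 0],
  ["EU:EU:ECHR", 1],
  ["EU:country:constitutional", 2],
  ["EU:country:supreme", 3],
  ["EU:country:region", 4],
]);

// Return a new array ordered by rank; unranked items go last.
function sortByPrecedence(items, table) {
  const rank = (item) =>
    table.has(item.jurisdiction)
      ? table.get(item.jurisdiction)
      : Number.MAX_SAFE_INTEGER;
  return items.slice().sort((a, b) => rank(a) - rank(b));
}

const sorted = sortByPrecedence(
  [
    { id: "a", jurisdiction: "EU:country:region" },
    { id: "b", jurisdiction: "EU:EU:ECJ" },
    { id: "c", jurisdiction: "EU:country:supreme" },
  ],
  precedence
);
console.log(sorted.map((item) => item.id).join(",")); // b,c,a
```

A style with different ranking rules would supply a different table; neither the identifiers nor the item data would need to change.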

The association of the various official and unofficial reports of a
case, and their ranking by preference, is probably best addressed by
directional links among them. The platform I use for development
(Zotero) does not yet support those, but when they come on stream, we
can think forward to how to implement them. Again, this will involve
leveraging identifiers, and someone is going to have to do some work
on it.

I’ve made a few more specific comments and responses below.

> #sorting orders
>
> I wonder if this also helps to manipulate sorting orders, other than by
> number or alphabet.
>
> One of the problems with legal styles is that most styles (as far as I am
> aware) have very specific rules on how to present a bibliography: the
> highest ruling court must be mentioned first (or last), then the next
> highest ruling court (or circuit), and so on. There are two deciding factors
> (again, as far as I am aware):
> a. style
> b. jurisdiction ‘base’
>
> For example: when I look at the situation in Europe (EU member states),
> it is possible that a ruling (in any given case) could be based on:
>
>   • regional court
>   • national court
>   • constitutional court
>   • EU Court of Justice OR European Court of Human Rights ← HIGHEST
>   • national court (in review because of the ECJ or ECHR ruling)
>   • regional court (in review because of the national review of the ECJ or
>     ECHR ruling)
>
> Then you have 4 jurisdictions (EU:country:region; EU:country:supreme;
> EU:country:constitutional; EU:EU:ECJ or EU:EU:ECHR). In the bibliography
> the EU:EU:ECJ should be presented first (because under European law the
> EU:EU:ECHR is the lower-ranking court), and so on.
>
> Then there are also many legal systems that have special circuits or
> special courts for (for example) administrative law, labour law, or social
> security law, which are sometimes part of the regional court and sometimes
> independent of any region, falling directly under the control of a
> national or constitutional court.

Right. The ranking of authorities is system- or style-specific, and
needs to be expressed separately. You certainly wouldn’t want to
shoehorn these rankings and their permutations into a CSL style, nor
into item data: it belongs in a task-specific sorting engine that a
formatter can call upon as required.

> I don’t see how the proposed solution helps to manipulate sorting orders
> to get the right order.

So yes, it’s a separate problem; but a solution to it will depend on
the same (nascent) infrastructure as the formatting issue.

> #source document
> The actual ruling (printed, or spoken in public) of the court is the
> real source document. But that is not the (first) publication of it.
> Case law magazines often publish rulings, and by doing so are the actual
> first one to publish it. Without those publications you would not know
> that a certain ruling was made.
>
> Most legal styles I know distinguish the ruling from its publication.
> According to some styles I know, in a bibliography you also want the
> ‘published’ ruling report listed under the right court (as
> mentioned above), which could cause problems.

I don’t have a clear idea what the cites you describe look like, but
we are certainly capable of discriminating between a bare cite to a
judgment (by court, date, docket number, etc) and to a report of it
(in a nominate reporter), or a direct official publication of it (in a
vendor-neutral citation). The complexity of handling the detail of
that at jurisdiction level is exactly why modular style support is so
important. As for presenting reports in the order of the deciding
court, that is a variant of the sorting issue.

> I don’t see how the proposed solution helps to put this in the right order.
>
> Perhaps I’m wrong, but could you please help explain to me how the proposed
> solution could do this? Because I think these two things are the main
> reason why legal style sheets are so complicated right now.

Right, there is a lot of work still to be done.

So this is long enough that I don’t really have time to read it carefully.
But high level question:

On Sat, Feb 14, 2015 at 10:20 PM, Frank Bennett <@Frank_Bennett> wrote:

> Proposed solution
>
> The proposed solution is to express jurisdiction-specific citation
> forms for primary legal materials (only) in separate modules that can
> be loaded on demand by a CSL processor.

So would it be fair to say that this is basically the same idea we’ve
discussed previously in order to automate monolithic style creation, but
here instead implemented in such a way that they aren’t monolithic?

Bruce

> So this is long enough that I don’t really have time to read it carefully.
> But high level question:
>
> > Proposed solution
> >
> > The proposed solution is to express jurisdiction-specific citation
> > forms for primary legal materials (only) in separate modules that can
> > be loaded on demand by a CSL processor.
>
> So would it be fair to say that this is basically the same idea we’ve
> discussed previously in order to automate monolithic style creation, but
> here instead implemented in such a way that they aren’t monolithic?

Yes, that’s the idea exactly. The same processor logic could be
repurposed to enable one or more macro libraries, if that were
desired.