Number, DOI, keywords

Two quick notes:

Shouldn’t cs-extended_fields also include a cs-number?

Also, we perhaps need a field for doi (which can be quite handy to
include in a reference list) and a field for keywords (a list of
some kind would be very good for annotated bibliographies).

Johan

Shouldn’t cs-extended_fields also include a cs-number?

I’ve been wondering about this issue. As a first principle, I have
always disliked “number” because it’s vague. This is why I have issue
and volume. It’s in part to leave room for a document number. Using
“document” for this feels a bit weird, but is consistent.

Also, we perhaps need a field for doi (which can be quite handy to
include in a reference list)

Yes, but this gets to my bigger question: do we distinguish between
document numbers (say a report or case number) and identifiers (an
isbn)?

In any case, we probably need a generic identifier, and then perhaps
(?) specific sub-ids.

The problem with this is the list can get very long, and it’s almost
impossible to account for all the possibilities. At minimum we’d need
isbn, doi, and ???

and a field for keywords (a list of
some kind would be very good for annotated bibliographies).

Makes sense.

Bruce

svn commit -m "added identifier, document, keyword, doi, isbn fields"

How’s that for quick?

Bruce

Hi Bruce,

Shouldn’t cs-extended_fields also include a cs-number?

I’ve been wondering about this issue. As a first principle, I have
always disliked “number” because it’s vague. This is why I have issue
and volume. It’s in part to leave room for a document number. Using
“document” for this feels a bit weird, but is consistent.

It’s not that vague. Many (hard-core) scientific journals subdivide an
issue into subparts, each of which is indicated by a number. I
think that using “document” for this is very confusing.

Also, we perhaps need a field for doi (which can be quite handy to
include in a reference list)

Yes, but this gets to my bigger question: do we distinguish between
document numbers (say a report or case number) and identifiers (an
isbn)?

In any case, we probably need a generic identifier, and then perhaps
(?) specific sub-ids.

The problem with this is the list can get very long, and it’s almost
impossible to account for all the possibilities. At minimum we’d need
isbn, doi, and ???

where type could also be anything should be able to handle most cases. It would be very
good to specify the possibilities up front, as a CiteProc implementation
needs to know how to extract the information from the various biblio
file formats.

Johan

PS: Yup, that was very quick! ;-)

It actually is vague. Does “number” refer to the article, or the
issue? Answer: it refers by convention (only) to the issue.

I’ll illustrate with a story:

A while back I had a conference call with some of the guys at Nature and
related groups about enhancing the PRISM specification. They were
having a problem because PRISM has a “number” property, which they
understood to refer to the issue number. But they needed in some cases
to store article level numbers.

So they wanted to ask PRISM to add a property called "articleNumber",
which I objected to because it’s too narrow.

So if we use just “number” then I am going to explicitly define it as
the number for the reference/document.

Bruce

Well, OK, here’s the simple solution:

## Where cs:number is used as a main template field, it is understood
## to refer to the document, and so can be used for things like report
## and case numbers.
cs-number = element cs:number { cs-format }

Clear enough I think.

where type could also be anything should be able to handle most cases. It would be very
good to specify the possibilities up front, as a CiteProc implementation
needs to know how to extract the information from the various biblio
file formats.

OK, will think about this more.

Bruce

and if it’s found in here

It is understood as the number of this issue, i.e. the conventional
meaning of number. Right?

So perhaps extend the doc to:

Where cs:number is used as a main template field, it is understood
to refer to the document, and so can be used for things like report
and case numbers. Inside cs:group for class container it represents
the number used for subdivided issues.

Johan

Hmmm … another good question.

I think the answer is “no.”

Group is really just a dumb container element. I got rid of the earlier
more nested approach (where there would be group elements with meaningful
semantics, like “container,” “series,” etc.) because I felt it was more
confusing and limited.

Also, in the above example, the semantics would then be more like issue
or volume, and so there’s some ambiguity there. What happens, for
example, if you have:

… ? Are they the same thing then?

Basically, fields are understood to relate to the reference, except
where there is a modifying attribute, and these locator numbers do not
allow that sort of attribute; they are flat data properties.

Convinced?

Bruce

Convinced?

I am not so sure what you are trying to convince me of. I just think
that we really need to have a place for the field that is called number
in the conventional software and for which we haven’t yet found a
place. I don’t see how you can call it document, because that is very
confusing. Perhaps we need to call it subissue?

Johan

I just think that we really need to have a place for the field that is
called number in the conventional software and for which we haven’t yet
found a place.

This is how you’d do it:

       <type name="article">
         <author/>
         <date prefix=" (" suffix="). ">
           <year/>
         </date>
         <titles/>
         <titles prefix=", " font-style="italic" relation="container"/>





I don’t see how you can call it document, because that is very
confusing. Perhaps we need to call it subissue?

But that doesn’t work for a report or case number. They are both
“document numbers” and I am proposing to simplify it by just using
“number.” I know it’s not consistent with some other formats, but it’s
not my fault they were poorly designed :wink:

Any other opinions on how to do this right?

Bruce

This is how you’d do it.

That’s how I thought we would do it indeed. I understand your issues.
I am just wondering how you would represent the issue-number.

Johan

Sorry, am not understanding. Can you please rephrase?

Thanks,
Bruce

Sorry, am not understanding. Can you please rephrase?

Sorry, perhaps I am misunderstanding you. In the sample you gave you
use “number” exactly the way I was expecting it for a number that
relates to the issue. From your e-mails I understood you wanted it to
represent a number that relates to the reference directly, such as a
report or case number. Do I understand it right that it can actually
represent all of this, so report or case number as well as the number
in the conventional meaning? If that is so, I think that’s alright.

     <type name="article">
        <author/>
        <date prefix=" (" suffix="). ">
          <year/>
        </date>
        <titles/>
        <titles prefix=", " font-style="italic" relation="container"/>





Back to cs-identifier:

Besides doi, some other useful fields might be:

doi     : digital object identifier
pmid    : PubMed identifier
bibcode : identifier used in the Astrophysics Data System
oai     : identifier used in the Open Archives Initiative

issn    : an ISSN
eissn   : an electronic ISSN
coden   : a CODEN
isbn    : an ISBN
sici    : a SICI of a journal article, volume or issue, compliant with
          ANSI/NISO Z39.56-1996 Version 2 (see
          http://sunsite.berkeley.edu/SICI/)
bici    : a BICI for a section of a book to which an ISBN has been
          assigned, compliant with http://www.niso.org/bici.html

the above from http://www.exlibrisgroup.com/sfx_openurl_syntax.htm

Btw, this week I’ll probably only be able to respond at irregular
times, so it may take a bit longer to reply than usual.

Cheers,

Johan

From your e-mails I understood you wanted it to represent a number
that relates to the reference directly, such as a report or case
number.

Correct.

Do I understand it right that it can actually represent all of this,
so report or case number as well as the number in the conventional
meaning?

No. Otherwise you’d end up with duplicate issue numbers; right?

To be clear, we have three kinds of numbers: document (the current
context: the reference), issue and volume. Remember, default config for
volume is:

<volume>
	<number/>
</volume>

So when you use volume in a template, it’s basically switching
context (in XSLT terms) to the volume relation, and using its number.
Because the current context is by definition the document/reference, we
can sensibly shorten document number to just “number.”
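
A rough schema-level sketch of the three kinds of numbers, offered only as an illustration. The cs-volume and cs-issue pattern names, and the idea that they wrap cs:number in the schema itself, are assumptions based on the default config shown above; only cs-number and cs-format appear in this thread. The fragment is meant to sit inside the existing schema, where the cs prefix and cs-format are already defined.

```rnc
# Unqualified: the number of the document (reference) itself,
# e.g. a report or case number.
cs-number = element cs:number { cs-format }

# Qualified (assumed pattern names): inside a volume or issue the
# same element is read as that container's number.
cs-volume = element cs:volume { cs-number? }
cs-issue  = element cs:issue  { cs-number? }
```

Read this way, there is no separate "article number" or "issue number" element; the meaning comes from the context the element is used in.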

But I don’t think we can mix what are basically incompatible models.
“Number” as used in PRISM and BibTeX is just bad design, and it means
that people have to resort to hacks like articleNumber to get around
the problem.

I guess edition is still another number of sorts, though of a different
sort.

Back to cs-identifier:

Besides doi, some other useful fields might be:

[… snip …]

the above from http://www.exlibrisgroup.com/sfx_openurl_syntax.htm

OK, I need a vote on what to do here, because I don’t think it’s
sensible to have 10 or 15 elements to represent this stuff. I’ve never
seen references with anything but dois in them, but I could imagine
people might want this stuff. So:

Option 1: support only the basics (doi, isbn)
Option 2: have a generic identifier element with type attributes
Option 3: support a mix of 1 and 2

This gets us into larger design patterns. The decision can’t be made in
total isolation. If we go with 2, what other structures also warrant
this approach? What do not?
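
To make the trade-off concrete, here is a minimal sketch of how options 1 and 2 might look in the schema. The pattern names cs-doi, cs-isbn and cs-identifier and the particular type values are assumptions for illustration, not the committed definitions; cs-format is the pattern already used for cs-number earlier in the thread, and the fragment assumes the cs prefix is declared by the surrounding schema.

```rnc
# Option 1 (sketch): one dedicated element per identifier kind.
cs-doi  = element cs:doi  { cs-format }
cs-isbn = element cs:isbn { cs-format }

# Option 2 (sketch): a single generic element, with the kind
# carried by a type attribute.
cs-identifier = element cs:identifier {
  attribute type { "doi" | "isbn" | "pmid" | "sici" },
  cs-format
}
```

Option 3 would simply keep both sets of definitions side by side.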

If you’ll remember, a bit ago I had gone with the more generic
approach, but hadn’t been that happy with it. In the end, though,
flexibility may be more important than absolute purity.

Btw, this week I’ll probably only be able to respond at irregular
times, so it may take a bit longer to reply than usual.

OK, have fun!

Bruce

I dunno, I guess to answer my own question, we could say that
identifiers and locators are generic fields that require a type
attribute. That would suggest reverting a bit to something like this:

       <type name="article">
         <author/>
         <date prefix=" (" suffix="). ">
           <year/>
         </date>
         <titles/>
         <titles prefix=", " font-style="italic" relation="container"/>






I think there’s value in keeping the contributors separate, but I could
also see doing something like creator and contributor, each of which
can have role attributes. That would allow tighter validation, but also
free things up.

Don’t worry, we’ll still get 1.0 wrapped up soon enough :wink:

Bruce

OK, slept on it, and here’s what I’m now thinking:

Keep the basic fields we have. They are widely used and
user/developer-friendliness is important.

However, we want to allow for the other stuff. So why not just allow an
optional type attribute on the generic fields (contributor, locator,
identifier), and then have a pattern like this:

identifier_types = "pmid" | "sici" | identifier_types.extension
identifier_types.extension = notAllowed

The identifier_types.extension pattern is a conventional way by which
you build in customization options in RELAX NG. So the idea is, if
somebody somewhere needed to validate an extended list, they can import
the schema and override the extension to include that list. Ideally, of
course, they’d contact us and have us add it to the core schema.
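
As a rough sketch of what such a customization could look like, assuming the core schema is saved as csl.rnc (a hypothetical filename) and that the extra types wanted are "bibcode" and "oai":

```rnc
# Hypothetical customization layer. "csl.rnc" is an assumed filename
# for the core schema, not something fixed in this thread.
include "csl.rnc" {
  # Replace the empty extension hook so that two additional
  # identifier types validate alongside the core ones.
  identifier_types.extension = "bibcode" | "oai"
}
```

Because the core schema defines the hook as notAllowed, it contributes nothing to the choice until someone overrides it, so the default behaviour stays strict.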

How’s that?

I’m keeping the approach to number/issue I outlined yesterday unless I
hear Simon pipe up with a better idea. This means that “number” as you
know it from BibTeX corresponds to the “issue” number. It also means
that cs:number, when not qualified by some other container (volume or
issue), is a document number … always.

Hmm … come to think of it, I’m a little unsure of what to do with
series numbers then. I guess it’d be a generic locator with a "series"
type.

Bruce