SRU

Matthias_Steffens · May 16, 2005, 11:51pm

Hi Bruce,

[cc-ing to the xbiblio-devel list]

I’ve checked the resources available at the SRW/U home page and I
think that a preliminary (i.e. limited) implementation of a simple SRU
server is not too difficult. Especially the code &
examples given by Mike Taylor would be helpful:

http://sru.miketaylor.org.uk/

However, I would appreciate to have more syntax/implementation
examples etc. Any pointers?

Rob --it’s Matthias Steffens (cc-ing), who is affiliated with the
Bibliophile project. My understanding is that he’d start by just
supporting this specific query, so I’m wanting to standardize on the
precise syntax.

Something like:
http://localhost:8081/biblio?operation=searchRetrieve&version=1.1
&query=cite.key+any+"Smith1992a+Smith1992b+Mitchell1995a"
&recordSchema=mods
&startRecord=1
&maximumRecords=9999
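
For anyone wanting to generate such a request from a client, here is a minimal sketch of how the URL above could be assembled. The function name `build_sru_url` is made up for illustration, and `cite.key` is simply the index name proposed in this thread, not something taken from a published CQL context set. The term list is wrapped in plain double quotes, which get percent-encoded.

```python
from urllib.parse import urlencode

def build_sru_url(base_url, cite_keys, record_schema="mods",
                  start_record=1, maximum_records=9999):
    """Build an SRU 1.1 searchRetrieve URL for a list of cite keys (sketch)."""
    # CQL clause: index, relation, then a quoted, space-separated term list.
    query = 'cite.key any "{}"'.format(" ".join(cite_keys))
    params = [
        ("operation", "searchRetrieve"),
        ("version", "1.1"),
        ("query", query),
        ("recordSchema", record_schema),
        ("startRecord", start_record),
        ("maximumRecords", maximum_records),
    ]
    # urlencode percent-encodes the quotes and turns spaces into "+".
    return base_url + "?" + urlencode(params)

url = build_sru_url("http://localhost:8081/biblio",
                    ["Smith1992a", "Smith1992b", "Mitchell1995a"])
print(url)
```

The printed URL carries the same parameters as the hand-written one, with the double quotes encoded as %22.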

Parsing the query would be a breeze. Delivering a correctly formed
srw:searchRetrieveResponse would involve quite some work, though.
It’s certainly doable but I fear I won’t have the time to implement
it right away. Anyhow, speaking of the distant future, I’m really
willing to support a full SRU solution.

As a preliminary solution, would it be possible for citeproc to send
a URL query (be it SRU or whatever) and have plain MODS XML returned?
Citeproc expects MODS and refbase outputs MODS. So the most simple
solution would be if refbase could return raw MODS XML data (without
any srw:searchRetrieveResponse data wrapped around it).

Of course, I agree that in the long run support of proper standards
like SRU (and CQL) is the way to go. But for now it would be cool, if
I could just send plain MODS.

This works already. You can try it out yourself. To do so, login at:

http://polaris.ipoe.uni-kiel.de/refs/index.php

using

email: user@refbase.net
pwd: guest

then click the links below (the links won’t work if not logged in):

Return records no. 623, 21654 and 23961 in MODS XML format:

http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=^(21654|623|23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=xml

This should return 3 records in MODS XML format. Change ‘exportType’
to ‘text’ (or ‘html’) to have it rendered as plain text (or wrapped
into html):

http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=^(21654|623|23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=text
http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=^(21654|623|23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=html
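
A client that wants to build these links programmatically could do something like the following. This is only a sketch based on the parameters visible in the links above; `build_show_url` is a hypothetical helper and not part of refbase.

```python
from urllib.parse import urlencode

def build_show_url(base_url, serials, export_format="MODS XML", export_type="xml"):
    """Build a show.php export URL for a list of record serial numbers (sketch)."""
    # The "serial" parameter is a regular expression, as in the links above.
    pattern = "^(" + "|".join(str(s) for s in serials) + ")$"
    params = [
        ("serial", pattern),
        ("submit", "Export"),
        ("exportFormatSelector", export_format),
        ("exportType", export_type),
    ]
    # urlencode escapes the regex metacharacters and encodes the space as "+".
    return base_url + "?" + urlencode(params)

url = build_show_url("http://polaris.ipoe.uni-kiel.de/refs/show.php",
                     [21654, 623, 23961])
print(url)
```

Note that the space in "MODS XML" comes out as "+" rather than "%20"; both are commonly decoded to a space in a query string.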

Querying of cite keys is supported as well but you must be logged in
as a regular user to do so (since cite keys are unique to every user).

Btw, ‘show.php’ supports also other output formats (RIS, Endnote &
Bibtex via bibutils) and querying of many other fields. Here’s an
example:

Return all database entries (in Endnote format wrapped into HTML)
where the title field contains “situ” and where the author field
contains “mock”, excluding any duplicate records:

http://polaris.ipoe.uni-kiel.de/refs/show.php?title=situ&author=mock&without=dups&submit=Export&exportFormatSelector=Endnote&exportType=html

This should return 2 records…

Now, to gain citeproc integration I would just need to convert a
citeproc query into a refbase query (similar to the above) and
incorporate the returned results. As discussed before, another
pathway would be to directly send a MODS XML file to citeproc and
display the returned results.

I’m eager to do this…

Matthias

Matthias_Steffens · May 17, 2005, 1:04pm

Hi Bruce,

However, I would appreciate to have more syntax/implementation
examples etc. Any pointers?

Rob Sanderson and Mike Taylor are really the guys to ask about
documentation on SRU/W. Maybe this is another place to start?

http://www.loc.gov/z3950/agency/zing/srw/sru-simple.html

Thanks, those examples as well as Rob’s PDF on SRW-1.1[1] are
definitely helpful.

[1] http://srw.cheshire3.org/SRW-1.1.pdf

As a preliminary solution, would it be possible for citeproc to
send a URL query (be it SRU or whatever) and have plain MODS XML
returned? Citeproc expects MODS and refbase outputs MODS. So the
most simple solution would be if refbase could return raw MODS
XML data (without any srw:searchRetrieveResponse data wrapped
around it).

Yes, but it would be preferable if we could get it close as
possible to SRU (for practical reasons, I don’t want to have to
support a different query protocol for every database).

That’s understandable.

Querying of cite keys is supported as well but you must be logged
in as a regular user to do so (since cite keys are unique to
every user).

Hmm … I think this is a problem, and I don’t just mean for
citeproc.

There’s two issues:

I see no way to get the XSLT processor to be able to login.

With regard to refbase, a user is normally requested to login before
he can access features like export, etc. Anyhow, I agree that a SRU
service should work without requiring a user to log into the database.

Perhaps instead there might be another way to bind a citekey to a
user in the url so the data can be returned without being logged in?

Yes, that’s possible. For another user-specific field refbase does
already allow every user (logged in or not) to query this field if a
‘userID’ parameter is given in the query URL.

However, I’d prefer if querying of user-specific cite keys is only
allowed to users who are logged in and then only allow them to query
their own cite keys. There are other truly unique identifiers which
ensure unique record identity on a database-wide or even global basis
(see below).

the bigger issue which is that citekeys aren’t a good solution
in a multi-user context (I use them, so am not throwing stones; am
just remembering they’re less-than-ideal).

Yes, I agree. This is why refbase allows a user to only access his
own cite keys (and only when logged in).

To uniquely identify records on a database-wide level refbase uses
simply a record serial number. Our database users include these
serial numbers within the body text along with a preformatted
citation string (like Steffens et al 2004a {1234}). Unlike citeproc
refbase doesn’t offer to reformat these citation strings, though, but
offers to extract all citations from a text and build an appropriate
references list from it.

To uniquely identify records on a global basis I’d prefer the DOI
identifier (if available). refbase supports DOIs and I plan to allow
a SRU interface to query for these DOI numbers. OpenURLs would be
another good candidate as truly unique identifier, plus it offers
better backwards compatibility than the DOI system (IIRC).

I think Mike, Rob and I concluded that an ideal solution would code
multiple ids, and allow the database to resolve them.

Yes, that sounds reasonable. The database could prefer truly unique
identifiers (DOI & OpenURL) if present and may otherwise fall back to
database- or user-specific identifiers.
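
To make the fallback order concrete, here is a rough sketch. The field names (`doi`, `openurl`, `serial`, `cite_key`) and the sample DOI are placeholders chosen for illustration; they are not refbase's actual column names.

```python
def pick_identifier(record):
    """Return (kind, value) for the most widely unique identifier present."""
    # Global identifiers first, then database-wide, then user-specific.
    for kind in ("doi", "openurl", "serial", "cite_key"):
        value = record.get(kind)
        if value:
            return kind, value
    raise KeyError("record has no usable identifier")

with_doi = {"doi": "10.1000/example", "serial": 1234, "cite_key": "Steffens2004a"}
without_doi = {"serial": 1234, "cite_key": "Steffens2004a"}

print(pick_identifier(with_doi))
print(pick_identifier(without_doi))
```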

BTW, did you manage to get it working on the commandline with the
included DocBook sample?

Not yet (I’m not in front of my machine right now but I will report
back when I’ve tried it out).

Matthias

Matthias_Steffens · May 17, 2005, 2:06pm

Hi Mike & Rob,

thanks for your comments! I’m already convinced about SRW/U :-),
the problem is more one of finding the development time for implementation.

To uniquely identify records on a global basis I’d prefer the DOI
identifier (if available). […] OpenURLs would be another good
candidate as truly unique identifier

I really think that database IDs, and DOIs, are solving a different
problem from the one that we face here. All I want is the ability to
type
\cite{WilsonSereno1998}
into my document and know what it will be resolved to. If I have to
type
\cite{doi:3244.1327/324326hjg2132}
instead, then … well, it’s just not a solution.

Ah, yes, now I understand. Seems I was only thinking in terms of the
database, not in terms of the user.

If cite keys are required to be user-specific and a SRU client sends
a userID along with its query, then these cite keys could be resolved
easily by the database to get truly unique IDs (like DOI or OpenURL).

If I understand Rob correctly, he’s suggesting the same:

On 17 May 2005 at 14:27 +0100 Mike Taylor wrote:

On 17 May 2005 at 14:42 +0100 Dr Robert Sanderson wrote:

As far as citekeys go, I don’t think there’s a reasonable solution
which isn’t user customised. If the UI is to include a short non
globally unique string, then it’s not globally unique. Pithy, I
know, but that’s the way of it.

A user profile of citekey to article unique id seems to be the most
appropriate solution.

Regards, Matthias

Bruce_D_Arcus1 · May 17, 2005, 12:46am

However, I would appreciate to have more syntax/implementation
examples etc. Any pointers?

Rob Sanderson and Mike Taylor are really the guys to ask about
documentation on SRU/W. Maybe this is
another place to start?

http://www.loc.gov/z3950/agency/zing/srw/sru-simple.html

I don’t have my bookmarks handy here

As a preliminary solution, would it be possible for citeproc to send
a URL query (be it SRU or whatever) and have plain MODS XML returned?
Citeproc expects MODS and refbase outputs MODS. So the most simple
solution would be if refbase could return raw MODS XML data (without
any srw:searchRetrieveResponse data wrapped around it).

Yes, but it would be preferable if we could get it close as possible
to SRU (for practical reasons, I don’t want to have to support a
different query protocol for every database).

Querying of cite keys is supported as well but you must be logged in
as a regular user to do so (since cite keys are unique to every user).

Hmm … I think this is a problem, and I don’t just mean for citeproc.
I blogged about the ID issue about six months ago, and Mike Taylor, Rob
Sanderson and I chatted about it at length somewhat more recently (in
February IIRC).

There’s two issues:

1. I see no way to get the XSLT processor to be able to login. Perhaps
   instead there might be another way to bind a citekey to a user in the
   url so the data can be returned without being logged in?
2. the bigger issue which is that citekeys aren’t a good solution in a
   multi-user context (I use them, so am not throwing stones; am just
   remembering they’re less-than-ideal).

I think Mike, Rob and I concluded that an ideal solution would code
multiple ids, and allow the database to resolve them.

This starts to get complicated though, but IIRC an SRU-like solution can
make it easier.

This should return 2 records…

Now, to gain citeproc integration I would just need to convert a
citeproc query into a refbase query (similar to the above) and
incorporate the returned results. As discussed before, another
pathway would be to directly send a MODS XML file to citeproc and
display the returned results.

I’m eager to do this…

Cool!

BTW, did you manage to get it working on the commandline with the
included DocBook sample?

You may have noted I included an SRU example. I’ve actually not tried
it, but the biblioref elements actually point to ISBNs in the LoC
database, which is accessible over SRU.

Perhaps I ought to get that working tomorrow as a demo!

Bruce

On Tue, 17 May 2005 01:51:03 +0200, “Matthias Steffens” <@Matthias_Steffens> said:

Mike_Taylor · May 17, 2005, 7:55am

Date: Mon, 16 May 2005 20:46:47 -0400
From: Bruce D’Arcus <@Bruce_D_Arcus1>

However, I would appreciate to have more syntax/implementation
examples etc. Any pointers?

Rob Sanderson and Mike Taylor are really the guys to ask about
documentation on SRU/W.

That’s me – hi! (Er. I’m not Rob, though.)

As a preliminary solution, would it be possible for citeproc to
send a URL query (be it SRU or whatever) and have plain MODS XML
returned? Citeproc expects MODS and refbase outputs MODS. So the
most simple solution would be if refbase could return raw MODS XML
data (without any srw:searchRetrieveResponse data wrapped around
it).

Yes, but it would be preferable if we could get it close as possible
to SRU (for practical reasons, I don’t want to have to support a
different query protocol for every database).

I’m sorry that SRU forces you to deal with this extra layer of XML;
but I’m sure you’ll easily see how much more flexibility that gives
you, and how that might well come in useful down the line. The
srw:searchRetrieveResponse element gives the server a way to wrap
multiple records and a place to include diagnostics, session data,
extra response data, etc. You don’t have to use all (or any) of this,
but by committing to SRU you’re making it available for more
sophisticated subsequent versions of your client.

the bigger issue which is that citekeys aren’t a good solution in
a multi-user context (I use them, so am not throwing stones; am just
remembering they’re less-than-ideal).

This is indeed a big problem if you’re referring to the same thing
that I think you are. Let me check whether you are. Consider three
papers:

Janensch, W.  1929a.  Material und Formengehalt der Sauropoden
in der Ausbeute der Tendaguru-Expedition.  Palaeontographica
(Suppl. 7) 2:1-34.

Janensch, W.  1929b.  Die Wirbelsaule der Gattung
Dicraeosaurus. Palaeontographica (Suppl. 7) 2: 39-133.

Janensch, W.  1929c.  Magensteine bei Sauropoden der
Tendaguru-Schichten. Palaeontographica (Suppl. 7) 2:135-144.

If one paper cites the first two, then it will call them Janensch1929a
and Janensch1929b; if another paper cites the second and third, it
will call them Janensch1929a and Janensch1929b. So the citation
keys have two different meanings to the different papers, and the
second of the three papers has two different citation keys.

Is that the non-globality problem you’re referring to?
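
The clash can be reproduced in a few lines. This is only an illustrative sketch: `assign_keys` is a made-up helper that hands out a/b/c suffixes in the order a citing paper lists its references, which is one common convention and not the only one.

```python
from collections import defaultdict
from string import ascii_lowercase

def assign_keys(cited):
    """Assign Author+Year+letter keys within one citing paper, in list order."""
    counters = defaultdict(int)
    keys = {}
    for author, year, title in cited:
        n = counters[(author, year)]
        counters[(author, year)] += 1
        keys[title] = f"{author}{year}{ascii_lowercase[n]}"
    return keys

material = ("Janensch", 1929, "Material und Formengehalt der Sauropoden")
wirbel = ("Janensch", 1929, "Die Wirbelsaule der Gattung Dicraeosaurus")
magen = ("Janensch", 1929, "Magensteine bei Sauropoden")

paper_one = assign_keys([material, wirbel])   # cites the first two
paper_two = assign_keys([wirbel, magen])      # cites the second and third

# The same article gets two different keys...
print(paper_one[wirbel[2]], paper_two[wirbel[2]])
# ...and the same key points at two different articles.
print(paper_one[material[2]], paper_two[wirbel[2]])
```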

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ “Any sufficiently complicated C program contains an ad-hoc,
informally-specified, bug-ridden, slow implementation of half
of Common Lisp” – Greenspun’s Tenth Rule of Programming.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/

Bruce_D_Arcus1 · May 17, 2005, 12:54pm

The issue is pretty simple really: one user might have Janensch1929a
for the natural language id (e.g. citekey) and another might have
janensch29. We really need a more general solution, and perhaps we
ought to think about how to best achieve that within an SRU framework?

Bruce

Mike_Taylor · May 17, 2005, 1:17pm

Date: Tue, 17 May 2005 08:54:16 -0400
From: Bruce D’Arcus <@Bruce_D_Arcus1>

the bigger issue which is that citekeys aren’t a good solution
in a multi-user context (I use them, so am not throwing stones; am
just remembering they’re less-than-ideal).

This is indeed a big problem if you’re referring to the same thing
that I think you are. Let me check whether you are. Consider
three papers:

Janensch, W. 1929a. Material und Formengehalt der Sauropoden
in der Ausbeute der Tendaguru-Expedition. Palaeontographica
(Suppl. 7) 2:1-34.

Janensch, W. 1929b. Die Wirbelsaule der Gattung
Dicraeosaurus. Palaeontographica (Suppl. 7) 2: 39-133.

Janensch, W. 1929c. Magensteine bei Sauropoden der
Tendaguru-Schichten. Palaeontographica (Suppl. 7) 2:135-144.

If one paper cites the first two, then it will call them
Janensch1929a and Janensch1929b; if another paper cites the second
and third, it will call them Janensch1929a and Janensch1929b. So
the citation keys have two different meanings to the different
papers, and the second of the three papers has two different
citation keys.

The issue is pretty simple really: one user might have Janensch1929a
for the natural language id (e.g. citekey) and another might have
janensch29.

Right, OK, you don’t even need the complexity of multiple potentially
identical citations to raise the problem. Although of course trivial
solutions just say something like “always use the first author’s
surname with the first letter capitalised followed by the four-digit
year and, if necessary, a discriminator” – and they are tripped up
by the scenario I outlined.

We really need a more general solution, and perhaps we ought to
think about how to best achieve that within an SRU framework?

Surely – surely! – someone has already faced and solved this
problem? They must have! Mustn’t they? It seems much too core a
problem for us to be facing up to now, in the 21st century.

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ “I never make predictions and I never will” – Paul Gascoigne.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/

Mike_Taylor · May 17, 2005, 1:27pm

Date: Tue, 17 May 2005 15:04:36 +0200
From: Matthias Steffens <@Matthias_Steffens>

the bigger issue which is that citekeys aren’t a good solution
in a multi-user context (I use them, so am not throwing stones; am
just remembering they’re less-than-ideal).

To uniquely identify records on a database-wide level refbase uses
simply a record serial number. Our database users include these
serial numbers within the body text along with a preformatted
citation string (like Steffens et al 2004a {1234}). Unlike citeproc
refbase doesn’t offer to reformat these citation strings, though,
but offers to extract all citations from a text and build an
appropriate references list from it.

To uniquely identify records on a global basis I’d prefer the DOI
identifier (if available). refbase supports DOIs and I plan to allow
a SRU interface to query for these DOI numbers. OpenURLs would be
another good candidate as truly unique identifier, plus it offers
better backwards compatibility than the DOI system (IIRC).

I really think that database IDs, and DOIs, are solving a different
problem from the one that we face here. All I want is the ability to
type
\cite{WilsonSereno1998}
into my document and know what it will be resolved to. If I have to
type
\cite{doi:3244.1327/324326hjg2132}
instead, then … well, it’s just not a solution.

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ “People will accept your ideas much more readily if you tell
them Benjamin Franklin said it first” – David H. Comins.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/

Dr_Robert_Sanderson · May 17, 2005, 1:37pm

Rob Sanderson and Mike Taylor are really the guys to ask about

Hi

As a preliminary solution, would it be possible for citeproc to send
a URL query (be it SRU or whatever) and have plain MODS XML returned?

It’s probably pretty easy to create a couple of templates to wrap the data
in – one for the response and one for each record.
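
As a rough illustration of the two templates (one for the response, one per record), here is a sketch using Python's standard library. The namespace `http://www.loc.gov/zing/srw/` and the element names follow my reading of the SRW/U 1.1 response format; the record schema identifier is an assumption, so check both against the specification before relying on them. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"    # assumed SRW/U 1.1 namespace
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("srw", SRW_NS)
ET.register_namespace("mods", MODS_NS)

def srw(tag):
    """Qualify a tag name with the SRW namespace."""
    return "{%s}%s" % (SRW_NS, tag)

def wrap_record(mods_element, position, schema="info:srw/schema/1/mods-v3.0"):
    """Per-record template: wrap one MODS element in an srw:record."""
    record = ET.Element(srw("record"))
    ET.SubElement(record, srw("recordSchema")).text = schema  # assumed identifier
    ET.SubElement(record, srw("recordPacking")).text = "xml"
    ET.SubElement(record, srw("recordData")).append(mods_element)
    ET.SubElement(record, srw("recordPosition")).text = str(position)
    return record

def wrap_response(mods_elements, start_record=1):
    """Response template: wrap all records in srw:searchRetrieveResponse."""
    response = ET.Element(srw("searchRetrieveResponse"))
    ET.SubElement(response, srw("version")).text = "1.1"
    # Simplification: assumes every hit is returned in this one response.
    ET.SubElement(response, srw("numberOfRecords")).text = str(len(mods_elements))
    records = ET.SubElement(response, srw("records"))
    for offset, mods in enumerate(mods_elements):
        records.append(wrap_record(mods, start_record + offset))
    return response

mods_xml = ('<mods xmlns="http://www.loc.gov/mods/v3">'
            '<titleInfo><title>Example title</title></titleInfo></mods>')
response = wrap_response([ET.fromstring(mods_xml)])
print(ET.tostring(response, encoding="unicode"))
```

A server that already emits MODS would only need to feed its existing records through something like `wrap_response`; diagnostics and extra response data can be added later.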

I see no way to get the XSLT processor to be able to login. Perhaps
instead there might be another way to bind a citekey to a user in the
url so the data can be returned without being logged in?

SRU works via an authentication token included in each request. How this
token is acquired is out of the scope of the protocol, so you’re free to
do it however you like.

the bigger issue which is that citekeys aren’t a good solution in a
multi-user context (I use them, so am not throwing stones; am just
remembering they’re less-than-ideal).

Looking at OpenURL as a way of identifying (rather than searching for)
items might be useful, or at least of interest.

Feel free to ask about whatever WRT SRW/U you need to know

Rob

   ,'/:.          Dr Robert Sanderson (@Dr_Robert_Sanderson)
 ,'-/::::.        http://www.csc.liv.ac.uk/~azaroth/

,‘–/::(@)::. Dept. of Computer Science, Room 805
,’—/::::::::::. University of Liverpool
____/:::::::::::::.
I L L U M I N A T I Cheshire3 IR System: http://www.cheshire3.org/

Dr_Robert_Sanderson · May 17, 2005, 1:42pm

From: Bruce D’Arcus <@Bruce_D_Arcus1>
Rob Sanderson and Mike Taylor are really the guys to ask about
That’s me – hi! (Er. I’m not Rob, though.)

That’s me

extra response data, etc. You don’t have to use all (or any) of this,
but by committing to SRU you’re making it available for more
sophisticated subsequent versions of your client.

And possibly equally importantly, you’re making your service available for
use by generic tools and toolkits rather than very specific ones. It
makes it interoperable with services at OCLC, the Library of Congress, etc
etc etc.

As far as citekeys go, I don’t think there’s a reasonable solution which
isn’t user customised. If the UI is to include a short non globally
unique string, then it’s not globally unique. Pithy, I know, but that’s
the way of it.

A user profile of citekey to article unique id seems to be the most
appropriate solution.

Rob


Bruce_D_Arcus1 · May 17, 2005, 1:49pm

Indeed! This is exactly the tension.

Bruce

Matthias_Steffens · May 17, 2005, 4:16pm

Date: Tue, 17 May 2005 16:06:29 +0200
From: Matthias Steffens <@Matthias_Steffens>

If cite keys are required to be user-specific and a SRU client
sends a userID along with its query, then these cite keys could
be resolved easily by the database to get truly unique IDs (like
DOI or OpenURL).

Ye-es. But I don’t think that’s enormously helpful, since it
requires the server side to hold information that’s specific to the
client (or, rather, the user that the client represents).

That’s correct. While this may work well for bibliographic databases
(as developed by the members of the bibliophile initiative) I
understand that it may not work equally well in other applications.

You could go a long way by using something like “Janensch1929p39”
(where 39 is the first page number).

That sounds reasonable. Personally I’m using
(or something similar) since
it’s more intuitive than page number to me. But, obviously, that
wouldn’t work in a multiple user environment since every user could
come up with another word from the title.

Try this. When you look up a citation key of the form
, the database returns an exact hit if it has one, but
says “be more specific” if it has more than one, in which case you
have to try again with the “p” suffix. If that has multiple
hits, then you need to make a yet more specific citation key:
append the initials to get “Janensch1929p39W”.

Sounds clever. But what if a user did previously search for
“Janensch1929” and it returned an exact hit. Now he uses this cite
key in all of his documents. After five years another record appears
in the database that would have the same identifier (“Janensch1929”).
That would require the user to correct all of his documents which may
not be feasible after all. In order to be truly useful cite keys
should remain unique no matter what happens.

Matthias

Matthias_Steffens · May 17, 2005, 5:00pm

Yep, that would be a good rule. Still, it would only work within the
scope of one database. What if there are two or more databases
where different Janensch articles from 1929 were identified as
"Janensch1929"?

Matthias

Matthias_Steffens · May 17, 2005, 6:05pm

Well, I fear that there are cases where even “Janensch1929p39W” isn’t
unique. In fact, any cite key may fail that does not include all of
the important bibliographic source info.

E.g., I can imagine that in, say, year 2000 there were two authors
named “William Miller” whose articles started on page 39. I.e.,
“Miller2000p39W” wouldn’t be truly unique. It would require at least
the addition of a journal and a volume identifier to solve this
case.

In our database we have the same problem with naming of files that
are associated with a given database entry. We use the DOI if
available, otherwise we use file names like

Angel1994Nature367p126.pdf
Thomas+Dieckmann2004Science276p394.pdf
AdamsEtal2001MarBiol138p281.pdf

However, while these names may be unique (who knows if they really
are!?) they are ugly when used as cite keys within a document – but
they are still better than a DOI number, IMHO.
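
For what it's worth, the naming pattern can be sketched like this. The rule for authors (one surname as-is, two joined with "+", three or more shortened to the first plus "Etal") is inferred from the three examples above, and the co-author names in the last call are invented placeholders.

```python
def file_name(surnames, year, journal, volume, first_page, ext="pdf"):
    """Build a file name like Angel1994Nature367p126.pdf (inferred pattern)."""
    if len(surnames) == 1:
        authors = surnames[0]
    elif len(surnames) == 2:
        authors = "+".join(surnames)
    else:
        # Three or more authors: keep only the first one.
        authors = surnames[0] + "Etal"
    return f"{authors}{year}{journal}{volume}p{first_page}.{ext}"

print(file_name(["Angel"], 1994, "Nature", 367, 126))
print(file_name(["Thomas", "Dieckmann"], 2004, "Science", 276, 394))
# "CoauthorB" and "CoauthorC" are placeholders, not the real author list.
print(file_name(["Adams", "CoauthorB", "CoauthorC"], 2001, "MarBiol", 138, 281))
```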

Matthias

Dr_Robert_Sanderson · May 17, 2005, 8:47pm

A user profile of citekey to article unique id seems to be the most
appropriate solution.
Can you explain this Rob in jargon-free language?

Each user has a map between their own citation key (eg Sanderson05) and
the globally unique identifier for the article
(doi://dlib.org/sanderson/05/03/sandersonLevanYoung03 or whatever)
and the system does the mapping at search time.
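
In code, the idea might look roughly like this. It is a sketch only: the class name, user ids, and identifiers are all made up, and a real service would keep the profiles in the database rather than in memory.

```python
class CiteKeyProfiles:
    """Per-user map from a personal cite key to a globally unique id (sketch)."""

    def __init__(self):
        self._profiles = {}  # user id -> {cite key -> global id}

    def bind(self, user_id, cite_key, global_id):
        self._profiles.setdefault(user_id, {})[cite_key] = global_id

    def resolve(self, user_id, cite_key):
        """Return the global id for this user's key, or None if unknown."""
        return self._profiles.get(user_id, {}).get(cite_key)

profiles = CiteKeyProfiles()
# Two users name the same article differently (placeholder identifiers).
profiles.bind("rob", "Sanderson05", "id:example-article-1")
profiles.bind("mike", "sanderson2005", "id:example-article-1")
# The same key can mean a different article for another user.
profiles.bind("bruce", "Sanderson05", "id:example-article-2")

print(profiles.resolve("rob", "Sanderson05"))
print(profiles.resolve("bruce", "Sanderson05"))
```

The lookup happens at search time, so the document itself only ever contains the short personal key.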

Rob


Mike_Taylor · May 17, 2005, 3:18pm

Date: Tue, 17 May 2005 16:06:29 +0200
From: Matthias Steffens <@Matthias_Steffens>

thanks for your comments! I’m already convinced about SRW/U :-),
problems are more to find the development time for implementation.

Well, this is good. You’ll find the time, SRU is such fun!

To uniquely identify records on a global basis I’d prefer the DOI
identifier (if available). […] OpenURLs would be another good
candidate as truly unique identifier

I really think that database IDs, and DOIs, are solving a different
problem from the one that we face here. All I want is the ability to
type
\cite{WilsonSereno1998}
into my document and know what it will be resolved to. If I have to
type
\cite{doi:3244.1327/324326hjg2132}
instead, then … well, it’s just not a solution.

Ah, yes, now I understand. Seems I was only thinking in terms of the
database, not in terms of the user.

Right.

If cite keys are required to be user-specific and a SRU client sends
a userID along with its query, then these cite keys could be
resolved easily by the database to get truly unique IDs (like DOI or
OpenURL).

Ye-es. But I don’t think that’s enormously helpful, since it requires
the server side to hold information that’s specific to the client (or,
rather, the user that the client represents).

If I understand Rob correctly, he’s suggesting the same:

I don’t think so:

Date: Tue, 17 May 2005 14:42:26 +0100 (BST)
From: Dr Robert Sanderson <@Dr_Robert_Sanderson>

As far as citekeys go, I don’t think there’s a reasonable solution
which isn’t user customised. If the UI is to include a short non
globally unique string, then it’s not globally unique. Pithy, I
know, but that’s the way of it.

I don’t know. You could go a long way by using something like
“Janensch1929p39” (where 39 is the first page number). DOIs are a
kind of ID that is truly globally unique, but the price you pay for
that is total opacity; Janensch1929p39 is a “nearly unique” ID that is
easy to remember, type, and indeed make up on the spot. That’s a darned
useful combination of properties, and we would be silly to let the
absence of Total Guaranteed Uniqueness blind us to that.

Try this. When you look up a citation key of the form ,
the database returns an exact hit if it has one, but says “be more
specific” if it has more than one, in which case you have to try again
with the “p” suffix. If that has multiple hits, then you need
to make a yet more specific citation key: append the initials to get
“Janensch1929p39W”. You could make a global database of every article
ever published anywhere and still get very, very few clashes using
this simple scheme. I for one would very much appreciate access to
that database!
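
A toy version of that lookup, to show the three levels of specificity. This is a sketch under assumptions: the record fields and the `lookup` function are invented for illustration, and the two Miller records are hypothetical.

```python
def lookup(key, records):
    """Return ("hit", record), ("be more specific", None) or ("not found", None)."""
    def candidate_keys(r):
        base = f"{r['surname']}{r['year']}"          # e.g. Janensch1929
        paged = f"{base}p{r['first_page']}"          # e.g. Janensch1929p39
        return {base, paged, paged + r["initials"]}  # e.g. Janensch1929p39W

    hits = [r for r in records if key in candidate_keys(r)]
    if len(hits) == 1:
        return ("hit", hits[0])
    if not hits:
        return ("not found", None)
    return ("be more specific", None)

records = [
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 1},
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 39},
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 135},
    # Hypothetical pair that only the initials can tell apart.
    {"surname": "Miller", "initials": "W", "year": 2000, "first_page": 39},
    {"surname": "Miller", "initials": "J", "year": 2000, "first_page": 39},
]

print(lookup("Janensch1929", records)[0])
print(lookup("Janensch1929p39", records)[0])
print(lookup("Miller2000p39", records)[0])
print(lookup("Miller2000p39W", records)[0])
```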

A user profile of citekey to article unique id seems to be the most
appropriate solution.

That would not be bad. Then instead of having to maintain my own
local bibliography file that says:

ZOOM:
  authors:
    - firstname: Mike
      surname: Taylor
    - firstname: Sebastian
      surname: Hammer
    - firstname: Ashley
      surname: Sanders
    - firstname: Adam
      surname: Dickmeiss
    - firstname: Rob
      surname: Sanderson
    - firstname: Aaron
      surname: Lav
  title: ZOOM: The Z39.50 Object-Orientation Model, v1.4
  year: 2004
  url: http://zoom.z3950.org/

I could just say

ZOOM: doi:1234.5678/foobarbaz

The problem is that DOIs are not free to come by, in general. Which
makes them a bad choice of unique ID for some purposes. Maybe
identifier URIs would be better, since anyone can have one. But then
we need to be sure people don’t confuse them with actionable URLs in
the case of web-based articles such as the ZOOM AAPI.

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ I celebrated as only we English know how, by popping into
an office supplies shop on the way home and buying flat-pack
furniture.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/

Bruce_D_Arcus1 · May 17, 2005, 2:11pm

However, I’d prefer if querying of user-specific cite keys is only
allowed to users who are logged in and then only allow them to query
their own cite keys.

The details really come down to the user experience. The above works
in the case of the user who wants to access the web interface and print
records. I don’t think it works that well (?) for the user working on
a document and just wanting it to be automatically – and transparently
– formatted.

Let’s see if we can be concrete:

We want to build the OpenOffice bib interface around these standards.
So let’s imagine that happens, and that RefBase supports SRU and that
then can become a plug-in for OOo.

How does authenticated communication between OOo and RefBase then
happen? The user would then be viewing the RB DB through an OOo GUI.

BTW, Mike gave us a LaTeX citation example. It’s worth noting perhaps
that XSLT 2.0 makes it possible to parse non-XML documents.

I think Mike, Rob and I concluded that an ideal solution would code
multiple ids, and allow the database to resolve them.

Yes, that sounds reasonable. The database could prefer truly unique
identifiers (DOI & OpenURL) if present and may otherwise fall back to
database- or user-specific identifiers.

Yes, that’s the idea.

Bruce

Bruce_D_Arcus1 · May 17, 2005, 2:20pm

Can you explain this Rob in jargon-free language?

Bruce

Mike_Taylor · May 17, 2005, 4:26pm

Date: Tue, 17 May 2005 18:16:09 +0200
From: Matthias Steffens <@Matthias_Steffens>

You could go a long way by using something like “Janensch1929p39”
(where 39 is the first page number).

That sounds reasonable. Personally I’m using
(or something similar) since
it’s more intuitive than page number to me. But, obviously, that
wouldn’t work in a multiple user environment since every user could
come up with another word from the title.

Agreed on both counts: the descriptive word is nicer, but I think that
"nearly unique"ness is worth trying for, which makes first-page a
better bet.

Try this. When you look up a citation key of the form
, the database returns an exact hit if it has one, but
says “be more specific” if it has more than one, in which case you
have to try again with the “p” suffix. If that has multiple
hits, then you need to make a yet more specific citation key:
append the initials to get “Janensch1929p39W”.

Sounds clever.

Why, thank you!

But what if a user did previously search for “Janensch1929” and it
returned an exact hit. Now he uses this cite key in all of his
documents. After five years another record appears in the database
that would have the same identifier (“Janensch1929”). That would
require the user to correct all of his documents which may not be
feasible after all. In order to be truly useful cite keys should
remain unique no matter what happens.

Perhaps. We could have the database remember which was the first
Janensch 1929 paper it was told about, and have that one remain in use
whenever plain “Janensch1929” is used?
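
A sketch of that rule, assuming the database keeps a simple table of short keys. `KeyRegistry` and the record ids are hypothetical.

```python
class KeyRegistry:
    """Bind each short key to the first record registered under it (sketch)."""

    def __init__(self):
        self._first = {}  # short key -> id of the first record seen

    def register(self, short_key, record_id):
        # setdefault keeps an existing binding, so later papers never displace it.
        return self._first.setdefault(short_key, record_id)

    def resolve(self, short_key):
        return self._first.get(short_key)

registry = KeyRegistry()
registry.register("Janensch1929", "record-1")
# Five years later another Janensch 1929 paper is added.
registry.register("Janensch1929", "record-2")

print(registry.resolve("Janensch1929"))
```

Documents that already cite plain “Janensch1929” keep resolving to the first record; the later paper has to be cited with a more specific key.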

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ “I know if I had one of those [Pterosaur crests], women would
be throwing themselves at me, convinced of my sexual prowess
and reproductive fitness, and men would shrink from me, sure
that my strength greatly exceeded theirs” – Chris Bennett.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/

Mike_Taylor · May 17, 2005, 5:29pm

Date: Tue, 17 May 2005 19:00:09 +0200
From: Matthias Steffens <@Matthias_Steffens>

But what if a user did previously search for “Janensch1929” and it
returned an exact hit. Now he uses this cite key in all of his
documents. After five years another record appears in the database
that would have the same identifier (“Janensch1929”). That would
require the user to correct all of his documents which may not be
feasible after all. In order to be truly useful cite keys should
remain unique no matter what happens.

Perhaps. We could have the database remember which was the first
Janensch 1929 paper it was told about, and have that one remain in
use whenever plain “Janensch1929” is used?

Yep, that would be a good rule. Still, it would only work within the
scope of one database. What if there are two or more databases
where different Janensch articles from 1929 were identified as
“Janensch1929”?

Hmm. I think you’re right.

So are you saying we need to use “Janensch1929p39W” from the get-go?

(If so, then “JanenschW1929p39” would be less offensive.)

/| ___________________________________________________________________
/o ) / Mike Taylor <@Mike_Taylor> http://www.miketaylor.org.uk
)v_/\ “I saw one of the Democrat delegates, a middle-aged man, driving
a bumper car while talking on his cellphone. It’s time for the
Revolution” – Dave Barry.–
Listen to free demos of soundtrack music for film, TV and radio
http://www.pipedreaming.org.uk/soundtrack/
