sru url issues

Not sure if this is my bug, or Saxon’s. Any ideas?

FODC0005: Invalid URI {http://polaris.ipoe.uni-kiel.d…} - base
{file:/Users/darcusb/Projects/c…}
Error on line 115 of file:/Users/darcusb/Projects/citeproc/xsl/citeproc.xsl:
FODC0005: Failed to load document
http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=bib.citekey%20any%20"Ackley1990SnowCover"&operation=searchRetrieve&recordSchema=mods&recordPacking=xml&startRecord=1&maximumRecords=9999&x-info-2-auth1.0-authenticationToken=email=user@refbase.net

Bruce

Your URL works well for me if I paste it into a browser.

Have you tried escaping the quotes (" -> %22)?

Does the error also occur if you try a very simple (but still valid)
query like this one:

http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=1

Matthias

No. Try it with wget. It seems there’s something weird with
redirecting?? Browsers handle it, but not wget or saxon.

Bruce

Could the dot in “1.1” again be the culprit, such that the server
only sees “http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1”?

Have you tried escaping the dots?

Matthias

Using GNU Wget 1.9.1 this works for me:

wget -O srutest.xml "http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=1"

If I don’t specify an output file name explicitly I do get “Can’t
write to file … (file name too long)” errors, though. Output to
standard output also works for me:

wget -O - "http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=1"

Matthias

The problem is the quote, which apparently is not valid in URIs, and
Saxon is pretty strict about that. For some reason, I’m not getting
any results in the output, though. Must explore some more.
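
For anyone who would rather build the URL programmatically than escape it by hand, here is a minimal Python sketch. It assumes the same endpoint and parameters as the URL above; the helper name `sru_url` is made up for illustration, and the authentication token parameter is left out because it is unclear how the server wants its embedded `=` and `@` encoded. The point is only that the double quotes come out as `%22` and the spaces as `%20`, while the dot in `1.1` needs no escaping.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the thread; sru_url is a hypothetical helper name.
BASE = "http://polaris.ipoe.uni-kiel.de/refs/sru.php"

def sru_url(cite_keys, base=BASE):
    """Build an SRU searchRetrieve URL with a percent-encoded CQL query."""
    # CQL query of the form: bib.citekey any "key1 key2 ..."
    cql = 'bib.citekey any "{}"'.format(" ".join(cite_keys))
    params = {
        "version": "1.1",
        "query": cql,
        "operation": "searchRetrieve",
        "recordSchema": "mods",
        "recordPacking": "xml",
        "startRecord": "1",
        "maximumRecords": "9999",
    }
    # quote_via=quote encodes spaces as %20 (not '+') and '"' as %22
    return base + "?" + urlencode(params, quote_via=quote)

url = sru_url(["Ackley1990SnowCover"])
print(url)
```

Passing several cite keys in the list puts them all inside one quoted `any` clause, separated by encoded spaces.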

BTW, would be nice if you output in utf-8.

Bruce

Try this:

wget -O - "http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=bib.citekey%20any%20%3DAckley1990SnowCover%3D&operation=searchRetrieve&recordSchema=mods&recordPacking=xml&startRecord=1&maximumRecords=9999&x-info-2-auth1.0-authenticationToken=email=user@refbase.net"

Bruce

The database admin can choose whether he wants to store data in
utf-8 or in latin1. If it’s utf-8, then all output will be utf-8,
otherwise it will be latin1. So output is currently based on the
admin’s choice and we don’t attempt to convert data on the fly right
now.

AFAIK, the XML specification requires that all parsers support the
UTF-8 and UTF-16 encodings; ISO-8859-1 is not mandatory, but most
parsers support it. So would it pose any problems for you if you
receive latin1-encoded data?
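
A consumer that wants UTF-8 regardless of the admin's choice can transcode on its side. A minimal Python sketch, assuming the response carries a correct encoding declaration; the sample record and the name in it are made up for illustration and are not actual refbase output:

```python
import xml.etree.ElementTree as ET

# Hypothetical latin1-encoded response; \u00f6 is "o with umlaut".
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<title>Br\u00f6cker</title>"
).encode("iso-8859-1")

# The parser honours the declared encoding and returns a Unicode string.
title = ET.fromstring(latin1_doc).text

# Re-encode as UTF-8 for downstream tools that expect it.
utf8_bytes = title.encode("utf-8")
print(title, utf8_bytes)
```

If the declaration is missing or wrong, the parser will assume UTF-8 and may fail on latin1 bytes, so this only helps when the server labels its output correctly.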

Thanks, Matthias

How is this supposed to work? Maybe your URL got garbled? Your email
came thru with the cite key being enclosed by ‘%3D’ which is the
equals sign:

bib.citekey any =Ackley1990SnowCover=

This isn’t valid CQL, is it? Using ‘%22’ would have worked, though:

wget -O - "http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=bib.citekey%20any%20%22Ackley1990SnowCover%22&operation=searchRetrieve&recordSchema=mods&recordPacking=xml&startRecord=1&maximumRecords=9999&x-info-2-auth1.0-authenticationToken=email=user@refbase.net"
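
A quick way to check which escape is which is to decode both query strings. This small Python sketch uses only the standard library; the variable names are just for illustration:

```python
from urllib.parse import unquote

# %3D is the equals sign, %22 is the double quote.
with_3d = unquote("bib.citekey%20any%20%3DAckley1990SnowCover%3D")
with_22 = unquote("bib.citekey%20any%20%22Ackley1990SnowCover%22")

print(with_3d)
print(with_22)
```

The first decodes to the query with equals signs around the cite key, the second to the intended quoted form.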

At least it works here. In fact, the originally given query also works
for me, using wget, lynx, or any browser:

wget -O - "http://polaris.ipoe.uni-kiel.de/refs/sru.php?version=1.1&query=bib.citekey%20any%20%22Ackley1990SnowCover%20Broecker1997Science%20Granskog2004Baltic%20Mock2002Hydrobiologia%20Simstich2003MarMicropaleontol%20Thomas2003SeaIce%22&x-info-2-auth1.0-authenticationToken=email=user@refbase.net"

Matthias

Oops, was using the wrong escape.

Anyway, now … it works! Here’s the output:

===========

CiteProc Test

Introduction

A citation with page number detail: (Broecker 1997, 23–24).

References

Broecker, WS 1997. Thermohaline Circulation, the achilles heel of our
climate system: Will man-made CO[sub:2] upset the current balance?,
Science278: 1582–88available from: Andrea Lorenz
(alorenz@ipoe.uni-kiel.de), .

============

BTW, aside from some punctuation problems I need to look at (probably
in the CSL file) this points out the problem with our inconsistent use
of the location element. I suggest maybe it’d be better for you to
use recordInfo for these data? If you read the MODS docs, they
define location as:

“location” identifies the institution, repository holding the resource
or a remote location in form of a URL from which it is available.

The key part is the last bit, and is meant to suggest either online
resources (think of a newspaper article with a url) or physical
locations of unpublished manuscripts.

Question: do you want me to include these two simple refbase xslt and
xml files in the distribution (as examples), or not?

Bruce

Anyway, now … it works! Here’s the output:

Cool! That’s good news.

BTW, aside from some punctuation problems I need to look at (probably
in the CSL file) this points out the problem with our inconsistent use
of the location element. I suggest maybe it’d be better for you to
use recordInfo for these data? If you read the MODS docs, they
define location as:

“location” identifies the institution, repository holding the resource
or a remote location in form of a URL from which it is available.

That’s exactly how we use it, IMHO. In our case, a (possibly remote)
person with their own collection of physically available papers
represents a repository holding the resource. Anyone interested in
getting a copy can then contact this person (e.g. via email). I don’t
see how we abuse the meaning of the location element.

The key part is the last bit, and is meant to suggest either online
resources (think of a newspaper article with a url) or physical
locations of unpublished manuscripts.

The text you quoted seems definitely more generic to me. Of course,
if your interpretation is true, we should change it (and I’ll happily
do so). But I’d like to have this confirmed first.

From the explanation of elements for recordInfo at the MODS site I
don’t see an element where our “physically-available-from-person”
information would fit.

Question: do you want me to include these two simple refbase xslt and
xml files in the distribution (as examples), or not?

Yes, I’d welcome this!

Matthias

OK. So in that case, it would be appropriate to have the “available
from John Doe doe@example.net” in the citation.

I was thinking you were using it to store information about the user’s
copy or something.

Bruce

Maybe we (or at least I) still have a misunderstanding here, since I
don’t really see any difference between your last sentence and my
statement above. The “physically available paper” is indeed only a
copy of the article, not the original (say, in a public library).

But the important point is that, in the scope of the database (quite
often representing an inventory of all articles physically
available within one institution), such a copy is the physical
location from where yet another copy can be retrieved. It is very
important to know where one can get the particular paper.

However, when citing records, people wouldn’t want this information
to occur along with the citation. This is different from the case of
a newspaper article being available from http://…

I’m happy to put this information elsewhere if there’s another place
that makes sense.

Thanks again, Matthias

I think you rightly point out some ambiguity in this. Do you feel like
posting a question about it to the MODS list? People there might have
some ideas of how to handle this.

It might be worth noting that location does have an authority
attribute, whereby in theory you could do:

… and citeproc could just not process locations with an authority
attribute. That seems a little awkward perhaps.
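
The markup example itself is not preserved here, but the idea can be sketched in Python. Everything concrete below is an assumption made for illustration: the authority value `refbase-holder`, the sample record, and the helper name `citable_locations` are hypothetical, and the record is simplified so that `location` holds plain text.

```python
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"

# Hypothetical, simplified MODS record: one holder location tagged with
# an authority attribute, one plain URL location.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <location authority="refbase-holder">John Doe (doe@example.net)</location>
  <location>http://example.org/article.html</location>
</mods>"""

def citable_locations(mods_xml):
    """Return the text of location elements that carry no authority attribute."""
    root = ET.fromstring(mods_xml)
    return [
        loc.text
        for loc in root.iter(MODS_NS + "location")
        if loc.get("authority") is None
    ]

print(citable_locations(SAMPLE))
```

A formatter following this rule would print the URL location in the citation and silently skip the holder entry.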

Bruce

I think you rightly point out some ambiguity in this. Do you feel
like posting a question about it to the MODS list? People there
might have some ideas of how to handle this.

Yes, I can try but I’m not sure if I’ll be able to understand their
answers. Quite often this list is just too techy for me. :-/

It might be worth noting that location does have an authority
attribute, whereby in theory you could do:

… and citeproc could just not process locations with an
authority attribute. That seems a little awkward perhaps.

Well, it would be fine for me.

Matthias

I’d basically just explain the issue, and then ask “how should I code
this?”

If you prefer, I can do it.

Bruce

Yes, I can try but I’m not sure if I’ll be able to understand
their answers. Quite often this list is just too techy for me.

I’d basically just explain the issue, and then ask “how should I
code this?”

Ok, haven’t found time to post a message yet.

If you prefer, I can do it.

Well, I’d appreciate that! :-)

And please let me know if you encounter any other problems with the
refbase MODS XML output. Your testing has already helped us quite a
bit to fine-tune our MODS output, which I appreciate!

Thanks, Matthias