One of the most useful scrapers that I have running locally is for
committee transcripts of the Japanese National Diet. It’s a hack at
the moment: it depends on a modified schema and on invalid CSL. I would
like to find a way to make it compatible with CSL and move it into
Zotero main.

The archive of transcripts covers all committee hearings from 1945 to
the present. It is among the largest unified repositories of such
text on the Web. Apart from being an important resource for research
into law, policy and politics, it has a role as test data for NLP
tools that operate on Japanese. So it’s just one site, but it’s an
important one.

In the archive, records are identified by:

    legislative body (upper or lower house), then by
    legislative session, then by
    committee name, then by
    a sequence number identifying the committee session.

The transcript of an individual meeting should be treated as a
document unit. Currently, my scraper returns this information in the
following Zotero fields:

    legislativeBody
    session
    committee
    meetingNumber
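
To make the shape of a scraped item concrete, here is a minimal Python sketch of one transcript record using the four field names above. The class name `DietTranscript`, the value types, and the sample values are assumptions for illustration; this is not the scraper's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class DietTranscript:
    """One committee meeting transcript, treated as a single document unit.

    Field order follows the archive's identification hierarchy, so the
    generated ordering compares body, then session, committee, and meeting.
    """
    legislativeBody: str   # upper or lower house
    session: int           # legislative session number (assumed numeric)
    committee: str         # committee name
    meetingNumber: int     # sequence number of the committee session


# Hypothetical sample values, not taken from the archive.
records = [
    DietTranscript("House of Representatives", 171, "Committee on Legal Affairs", 2),
    DietTranscript("House of Representatives", 171, "Committee on Legal Affairs", 1),
]

# Sorting uses the hierarchical key, so meeting 1 comes before meeting 2.
first = sorted(records)[0]
```

Because the four fields together identify a meeting, the frozen dataclass can also serve as a dictionary key or set member when de-duplicating scraped items.
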
As far as I can tell, none of these fields are represented in CSL.
They could be mapped, but I’m not sure what the correct mapping would
be. Perhaps:
    legislativeBody = collection-title
        (e.g. Japanese National Diet)
    session = collection-number
        (e.g. 171st session)
    chamber = container-title
        (e.g. House of Representatives)
    committee = title
        (e.g. Committee on Legal Affairs)
    meetingNumber = number
        (e.g. 2nd meeting)
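
The tentative mapping above can be written down as a small lookup table, which makes it easy to try alternatives. This is only a Python sketch: the function name `to_csl` and the sample item are hypothetical, no CSL item type is chosen, and `chamber` is treated as a separate field only because it appears in the mapping (it is not in the scraper's field list).

```python
# Tentative Zotero-field -> CSL-variable mapping, as proposed in the post.
# "chamber" is not among the fields the scraper currently returns, so
# treating it as its own field is an assumption.
FIELD_TO_CSL = {
    "legislativeBody": "collection-title",
    "session": "collection-number",
    "chamber": "container-title",
    "committee": "title",
    "meetingNumber": "number",
}


def to_csl(item: dict) -> dict:
    """Rename mapped fields to CSL variable names; skip absent fields."""
    return {
        csl_name: item[field]
        for field, csl_name in FIELD_TO_CSL.items()
        if field in item
    }


# Hypothetical item using the example values from the post.
item = {
    "legislativeBody": "Japanese National Diet",
    "session": "171",
    "chamber": "House of Representatives",
    "committee": "Committee on Legal Affairs",
    "meetingNumber": "2",
}
csl_item = to_csl(item)
```

Keeping the mapping in one table means a different choice (for example, moving the committee name off `title`) is a one-line change.
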
Not sure, though. Does that seem sensible?
Frank