inline markup wiki entry

Bruce_D_Arcus1 · April 30, 2009, 12:25pm

To gather information on the inline stuff, I’ve created a wiki page.

https://apps.sourceforge.net/trac/xbiblio/wiki/InlineMarkup

We could use some more examples.

Bruce

Frank_Bennett · May 3, 2009, 3:49am

To gather information on the inline stuff, I’ve created a wiki page.

https://apps.sourceforge.net/trac/xbiblio/wiki/InlineMarkup

We could use some more examples.

I’ll try to add to the list sometime soon.

In citeproc-js, I’m thinking of implementing inline formatting as a
facility configurable at runtime, with no fixed set of tags, to help
folks cope while this issue develops. This will give a boost to
people like Rintze (and probably many others) who have adopted private
markup schemes in their Zotero databases. If the tags are
configurable, it can be adapted to produce visuals from a semantic
markup scheme, and passing semantic markup through for special
purposes (RDFa) should be no problem, when things have settled down.

Bruce_D_Arcus1 · May 3, 2009, 2:19pm

Can you add a specific proposal to the wiki?

Also, on RDFa, the main issue will be keeping track of URIs for
different objects. But I think it a really good idea for people to
think about this.

Frank_Bennett · May 7, 2009, 11:31pm

Can you add a specific proposal to the wiki?

Also, on RDFa, the main issue will be keeping track of URIs for
different objects. But I think it a really good idea for people to
think about this.

I have implemented a mechanism for arbitrary wiki-style in-field
markup in citeproc-js that may help reduce the barriers to support for
semantic markup. I’ve added a proposal to the wiki that attempts to
explain how I think it might be useful, with a link to the citeproc-js
sources and a reference to the rpc-stuff demo.

Frank

Bruce_D_Arcus1 · May 8, 2009, 12:01am

Frank,

Can you add a specific proposal to the wiki?

Also, on RDFa, the main issue will be keeping track of URIs for
different objects. But I think it a really good idea for people to
think about this.

I have implemented a mechanism for arbitrary wiki-style in-field
markup in citeproc-js that may help reduce the barriers to support for
semantic markup. I’ve added a proposal to the wiki that attempts to
explain how I think it might be useful, with a link to the citeproc-js
sources and a reference to the rpc-stuff demo.

I’m having a hard time understanding exactly what you’re proposing,
and a) am really busy, and b) am running into this problem:

$ ./client.sh
./results/config.result: No such file or directory
wget exited with error … is server.py running?

The server seems to be running, but am not really sure.

So aside from resolving this particular problem, can you just boil
down the proposal?

Bruce

Frank_Bennett · May 8, 2009, 12:46am

Frank,

Can you add a specific proposal to the wiki?

Also, on RDFa, the main issue will be keeping track of URIs for
different objects. But I think it a really good idea for people to
think about this.

I have implemented a mechanism for arbitrary wiki-style in-field
markup in citeproc-js that may help reduce the barriers to support for
semantic markup. I’ve added a proposal to the wiki that attempts to
explain how I think it might be useful, with a link to the citeproc-js
sources and a reference to the rpc-stuff demo.

I’m having a hard time understanding exactly what you’re proposing,
and a) am really busy, and b) am running into this problem:

$ ./client.sh
./results/config.result: No such file or directory
wget exited with error … is server.py running?

The server seems to be running, but am not really sure.

Sorry about that. I’ve fixed the script; the next time you check, you can
get it with pull and update.

So aside from resolving this particular problem, can you just boil
down the proposal?

The idea is that a Zotero user should be able to go to Preferences and
register a set of wiki markup strings for use with their database,
associating each with a Zotero presentational function and an
alternate (@font-style:italic/@font-style:normal,
@quotes:true/@squotes:true, etc.). Multiple markup strings can be
associated with the same presentational element.
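
A minimal sketch of such a registry, written in Python for brevity and not taken from the citeproc-js sources. The marker strings, the `MARKERS` table and `parse_inline` are hypothetical names chosen for illustration, and the sketch assumes markers are properly nested. Each marker toggles its attribute, and switches to the alternate value when the surrounding text already carries the primary one.

```python
import re

# Hypothetical user registry: marker -> (attribute, primary value, alternate).
MARKERS = {
    "''": ("font-style", "italic", "normal"),
    "//": ("font-style", "italic", "normal"),  # second marker, same element
    "**": ("font-weight", "bold", "normal"),
}


def parse_inline(text, markers=MARKERS):
    """Return a list of (string, formatting dict) runs for one field."""
    # Longest markers first so a longer marker is never split by a shorter one.
    alternation = "|".join(
        re.escape(m) for m in sorted(markers, key=len, reverse=True)
    )
    runs, stack = [], []
    for token in re.split("(" + alternation + ")", text):
        if token in markers:
            if stack and stack[-1] == token:
                stack.pop()          # closing marker
            else:
                stack.append(token)  # opening marker
        elif token:
            state = {}
            for marker in stack:
                attr, primary, alternate = markers[marker]
                # Flip to the alternate when the primary is already active.
                state[attr] = alternate if state.get(attr) == primary else primary
            runs.append((token, state))
    return runs
```

With this table, `parse_inline("a //b ''c'' d// e")` marks "b " and " d" as italic and "c" as normal, because the inner marker is nested inside a run that is already italic.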

This gets wiki visual markup going in a way that plays well with
legacy databases (I’m thinking especially of Rintze here) – and since
the local markup conventions are known to the processor, for RDF
export, say, each marker can be converted to a semantic markup form.
The exporting user would have to take care that the markup in the
entries being exported actually lines up with the semantic
association, but if that small pitfall is acceptable, this allows you
to convert existing stores, and stores that people (inevitably) build
with visual-only markup embedded in their records, into a standard
form of semantic markup suitable for wider circulation and
collaborative projects.

I hope that’s clear. The key point is that having a configurable
syntax in Zotero would bring some happiness to people on both sides of
the semantic/visual discussion, and open a low-cost path to building
shared databases with true semantic markup in them.

Frank

Bruce_D_Arcus1 · May 8, 2009, 1:37pm

…

The idea is that a Zotero user should be able to go to Preferences and
register a set of wiki markup strings for use with their database,
associating each with a Zotero presentational function and an
alternate (@font-style:italic/@font-style:normal,
@quotes:true/@squotes:true, etc.). Multiple markup strings can be
associated with the same presentational element.

Right; I get that part. But a) citeproc-js can and should be agnostic
about the application, and b) I just wasn’t sure how this would work in
practice.

This gets wiki visual markup going in a way that plays well with
legacy databases (I’m thinking especially of Rintze here) – and since
the local markup conventions are known to the processor, for RDF
export, say, each marker can be converted to a semantic markup form.

OK, but how?

The exporting user would have to take care that the markup in the
entries being exported actually lines up with the semantic
association, but if that small pitfall is acceptable, this allows you
to convert existing stores, and stores that people (inevitably) build
with visual-only markup embedded in their records, into a standard
form of semantic markup suitable for wider circulation and
collaborative projects.

I hope that’s clear. The key point is that having a configurable
syntax in Zotero would bring some happiness to people on both sides of
the semantic/visual discussion, and open a low-cost path to building
shared databases with true semantic markup in them.

Yes, but let’s forget about Zotero for a moment. Let’s say someone
wants to build out your server example and have citations and
bibliographies done as a web service. How would this work? What input
would they get? How does it get mapped to a particular output?

Also, are you saying we offer no attempt to define any standard
semantic micro-structures?

I’m sure you’ve thought about this; it just hasn’t translated into
text that my brain can process

Bruce

Rintze_Zelle · May 8, 2009, 2:50pm

Also, are you saying we offer no attempt to define any standard
semantic micro-structures?

Your main (or even only?) goal for using semantic markup is to get correct
presentational markup between different styles, right? Which is needed when
there is some semantic-dependent variation in the desired presentational
markup between styles?

I was wondering whether it would make sense to separate semantic and
presentational markup somewhat, at least for the chemical/genetic
nomenclature, thus allowing for more coarse semantics. This

L-[methyl-14C]methionine

would require much less detailed semantic information than

<chemical-enantiomere>L</chemical-enantiomere>-[<chemical-sidegroup>methyl</chemical-sidegroup>-<chemical-isotope>14C</chemical-isotope>]methionine

and would allow chemical-markup to be handled differently from e.g.
quote-markup.

Rintze

Bruce_D_Arcus1 · May 8, 2009, 2:56pm

Whoa; just to be clear, I am NOT suggesting inventing a new XML
micro-language for this. Am suggesting using (X)HTML.

Am busy, but a quick example (i.e., the details are surely wrong) could
theoretically be:

L-[methyl-14c]

One could also imagine standard presentational classes (“italic”, “bold”, etc.).

Bruce

Rintze_Zelle · May 8, 2009, 3:12pm

Whoa; just to be clear, I am NOT suggesting inventing a new XML
micro-language for this.

Neither was I (sorry for my botched attempt to explain myself).

Am suggesting using (X)HTML.

Am busy, but a quick example (i.e., the details are surely wrong) could
theoretically be:

L-[methyl-14c]

One could also imagine standard presentational classes (“italic”, “bold”,
etc.).

My point was more that, if all you need is to distinguish chemistry markup
as a group from, e.g., quote markup (which I think is enough in almost all
cases), I wouldn’t want to put the significant burden on the user of going
into the nitty-gritty details of how to properly mark up the chemical names
in the titles of his/her library semantically. If you look at the title of a
paper, a layman can see how it is formatted presentationally, but it would
require a significant amount of knowledge about chemistry to see how that
presentational markup represents semantic information. Just putting the
whole chemical name in a tag would be enough, and purely presentational tags
could be used inside the tagged chemical name to identify the individual
parts that require markup. It would also greatly reduce the total number
of classes that are required.
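
A rough sketch of how a processor might consume that coarse form, assuming the (X)HTML direction suggested earlier in the thread. The `chemical-name` class, the choice of `i` and `sup` as the inner presentational tags, the sample title and the `chemical_names` helper are all hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical field value: one coarse semantic wrapper, with purely
# presentational tags inside it.
FRAGMENT = (
    '<p>Uptake of <span class="chemical-name">'
    "<i>L</i>-[<i>methyl</i>-<sup>14</sup>C]methionine</span> in yeast</p>"
)


def chemical_names(xhtml):
    """Return (plain text, inner tag names) for each chemical-name span."""
    root = ET.fromstring(xhtml)
    found = []
    for span in root.iter("span"):
        if span.get("class") != "chemical-name":
            continue
        text = "".join(span.itertext())
        inner = [el.tag for el in span.iter() if el is not span]
        found.append((text, inner))
    return found
```

The processor only has to recognise one class to treat the name as a unit; the parts inside it carry no semantics and can be rendered as they stand.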

Rintze

Bruce_D_Arcus1 · May 8, 2009, 3:27pm

Yes, that’s fine. I agree any semantic content should be optional.

I’m just wondering, if we agree on that, what the optional content
should be. It sounds like Frank was saying we don’t care; totally up
to the user.

Bruce

Frank_Bennett · May 8, 2009, 11:39pm

…

The idea is that a Zotero user should be able to go to Preferences and
register a set of wiki markup strings for use with their database,
associating each with a Zotero presentational function and an
alternate (@font-style:italic/@font-style:normal,
@quotes:true/@squotes:true, etc.). Multiple markup strings can be
associated with the same presentational element.

Right; I get that part. But a) citeproc-js can and should be agnostic
about the application, and b) I just wasn’t sure how this would work in
practice.

This gets wiki visual markup going in a way that plays well with
legacy databases (I’m thinking especially of Rintze here) – and since
the local markup conventions are known to the processor, for RDF
export, say, each marker can be converted to a semantic markup form.

OK, but how?

The exporting user would have to take care that the markup in the
entries being exported actually lines up with the semantic
association, but if that small pitfall is acceptable, this allows you
to convert existing stores, and stores that people (inevitably) build
with visual-only markup embedded in their records, into a standard
form of semantic markup suitable for wider circulation and
collaborative projects.

I hope that’s clear. The key point is that having a configurable
syntax in Zotero would bring some happiness to people on both sides of
the semantic/visual discussion, and open a low-cost path to building
shared databases with true semantic markup in them.

Yes, but let’s forget about Zotero for a moment. Let’s say someone
wants to build out your server example and have citations and
bibliographies done as a web service. How would this work? What input
would they get? How does it get mapped to a particular output?

Also, are you saying we offer no attempt to define any standard
semantic micro-structures?

I’m sure you’ve thought about this; it just hasn’t translated into
text that my brain can process

Clarity is not my strong suit.

I’ll doodle up an illustration of the workflow I have in mind and put
it on the wiki. We do care that the data end up with a uniform markup
scheme when it is shared between users; the idea is to create a
funnel, so that diverse markup patterns can be migrated via the
processor to that unified scheme.

Coming soon …

Frank_Bennett · May 11, 2009, 12:34am

I’ve begun building a set of illustrations on the wiki:

XBib download | SourceForge.net

I will carry forward and lay out how the workflow of a configurable
wiki syntax mechanism would work, but there’s enough there to
illustrate the problem that I see emerging with the introduction of
inline markup.

The next step can be summarized very simply, as analogous to the mess
with code pages and other character encoding hacks, when the time came
to map them onto Unicode. A CSL inline markup processor essentially
needs to be able to do the same thing, but mapping from arbitrary ad
hoc user-defined markup arrangements to a standard scheme. If it has
that ability, you can get out in front of the problem and provide a
smooth migration path.
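
The mapping step can be sketched roughly as follows, in Python for brevity. This is an illustration under stated assumptions, not the citeproc-js code: `normalize`, the two marker tables and the `title` class name are hypothetical, and the target form (an XHTML `span` with a `class`) is only one of the candidates raised in this thread.

```python
import html
import re


def normalize(text, local_markers):
    """Rewrite paired local markers as <span class="..."> elements.

    local_markers maps one user's marker strings to shared class names.
    Assumes markers are properly nested and local_markers is not empty.
    """
    alternation = "|".join(
        re.escape(m) for m in sorted(local_markers, key=len, reverse=True)
    )
    out, stack = [], []
    for token in re.split("(" + alternation + ")", text):
        if token in local_markers:
            if stack and stack[-1] == token:
                stack.pop()
                out.append("</span>")
            else:
                stack.append(token)
                out.append('<span class="%s">' % local_markers[token])
        else:
            # Escape ordinary text so it is safe inside markup.
            out.append(html.escape(token, quote=False))
    if stack:
        raise ValueError("unclosed marker: %r" % stack[-1])
    return "".join(out)


# Two users with different private conventions (both hypothetical).
USER_A = {"''": "title"}
USER_B = {"__": "title"}
```

Here `normalize("On ''Hamlet'' & co", USER_A)` and `normalize("On __Hamlet__ & co", USER_B)` produce the same string, which is the funnel effect: different local habits, one shared form.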

The result will not be perfect uniformity, but if you have a
demonstrated mechanism in place for producing uniformity, and a model
for what it should look like, then you can forestall vendors from
deploying proprietary markup if and when semantic markup becomes the
Next Big Thing (and they can be trusted to do exactly that). For an
egregious example involving our friends at Thomson Reuters, see West
Pub. Co. v. Mead Data Cent., Inc., 616 F. Supp. 1571 (D. Minn. 1985),
aff’d, 799 F.2d 1219 (8th Cir. 1986), cert. denied, 479 U.S. 1070 (1987):

Westlaw - Wikipedia

(As I’ve said before, not nice people.)

More coming, but that’s the line of reasoning.

Frank

Bruce_D_Arcus1 · May 11, 2009, 1:00pm

I’ve begun building a set of illustrations on the wiki:

XBib download | SourceForge.net

I will carry forward and lay out how the workflow of a configurable
wiki syntax mechanism would work, but there’s enough there to
illustrate the problem that I see emerging with the introduction of
inline markup.

Yes.

The next step can be summarized very simply, as analogous to the mess
with code pages and other character encoding hacks, when the time came
to map them onto Unicode. A CSL inline markup processor essentially
needs to be able to do the same thing, but mapping from arbitrary ad
hoc user-defined markup arrangements to a standard scheme. If it has
that ability, you can get out in front of the problem and provide a
smooth migration path.

But it remains an open question whether the normalization happens at
the CSL stage, and whether users really get the choice to invent
their own markups.

For sake of argument, applications like Zotero could invent schemes
that only work through the GUI. Imagine a contextual-menu item
populated with a preset number of class items, with the ability of
the user to add new ones. Maybe those can get optional keyboard
bindings.

That offers the same advantages to users, but allows us to say “we
only accept inline markup of X form.” It also solves data
import/export problems.

…

More coming, but that’s the line of reasoning.

Yeah, I’m not yet seeing the precise proposal vis-a-vis CSL

Bruce

Frank_Bennett · May 11, 2009, 1:39pm

I’ve begun building a set of illustrations on the wiki:

XBib download | SourceForge.net

I will carry forward and lay out how the workflow of a configurable
wiki syntax mechanism would work, but there’s enough there to
illustrate the problem that I see emerging with the introduction of
inline markup.

Yes.

The next step can be summarized very simply, as analogous to the mess
with code pages and other character encoding hacks, when the time came
to map them onto Unicode. A CSL inline markup processor essentially
needs to be able to do the same thing, but mapping from arbitrary ad
hoc user-defined markup arrangements to a standard scheme. If it has
that ability, you can get out in front of the problem and provide a
smooth migration path.

But it remains an open question whether the normalization happens at
the CSL stage,

I thought I made that one clear. At the moment, it’s a choice of
Zotero+Mendeley+pandoc, or
Word+OpenOffice+WordForMac+various-scripting-languages, or the CSL
processor. Which of those three categories has a single mailing list
via which you can request comments from all of the developers involved?

and whether users really get the choice to invent
their own markups.

They already have.

For sake of argument, applications like Zotero could invent schemes
that only work through the GUI. Imagine a contextual-menu item
populated with a preset number of class items, with the ability of
the user to add new ones. Maybe those can get optional keyboard
bindings.

That would be great, but not all CSL deployments have a graphical
UI and pulldown menus.

That offers the same advantages to users, but allows us to say “we
only accept inline markup of X form.” It also solves data
import/export problems.

If the processor is fully configurable for markup, any engine
that wanted to control it in that way could do so. But if the processor
is fully configurable, it’s not necessary to impose restrictions that strand
users who have legacy markup schemes in their existing databases.

…

More coming, but that’s the line of reasoning.

Yeah, I’m not yet seeing the precise proposal vis-a-vis CSL

All things with time. I’m kind of busy myself at the moment.

Frank

Bruce_D_Arcus1 · May 11, 2009, 2:17pm

…

But it remains an open question whether the normalization happens at
the CSL stage,

I thought I made that one clear. At the moment, it’s a choice of
Zotero+Mendeley+pandoc, or
Word+OpenOffice+WordForMac+various-scripting-languages, or the CSL
processor. Which of those three categories has a single mailing list
via which you can request comments from all of the developers involved?

Well, we could say “CSL accepts X” and leave it to the application to
create that representation.

I’m not saying I’m advocating it; just that it’s an obvious option.

and whether users really get the choice to invent
their own markups.

They already have.

For sake of argument, applications like Zotero could invent schemes
that only work through the GUI. Imagine a contextual-menu item
populated with a preset number of class items, with the ability of
the user to add new ones. Maybe those can get optional keyboard
bindings.

That would be great, but not all CSL deployments have a graphical
UI and pulldown menus.

Right, but does it matter?

I definitely prefer writing in emacs and using something like pandoc,
so I’m not forgetting that use case; just saying that it can be up to
Andrea what he expects. He just needs to know how it should interact
with CSL.

That offers the same advantages to users, but allows us to say “we
only accept inline markup of X form.” It also solves data
import/export problems.

If the processor is fully configurable for markup, any engine
that wanted to control it in that way could do so. But if the processor
is fully configurable, it’s not necessary to impose restrictions that strand
users who have legacy markup schemes in their existing databases.

But you would admit that adding this kind of configuration (any,
really) has costs, right? While one or two people have requested it,
for example, we don’t allow people to invent arbitrary new variables
and types in CSL because doing so would impose a lot of pain.

In any case, I’ll have to wait to see your proposal.

More coming, but that’s the line of reasoning.

Yeah, I’m not yet seeing the precise proposal vis-a-vis CSL

All things with time. I’m kind of busy myself at the moment.

Same here. No rush on my end. We’ve still got a couple months to figure it out.

Bruce

Frank_Bennett · May 11, 2009, 9:19pm

Okay, I’m starting to get this into focus. The flip-flop code that
I’ve written could as easily form part of a pre-processing chain in
Zotero for the normalization of legacy data. I don’t want this
sub-mechanism to be a distraction from the task of settling the core
markup, so I’ll move the “Proposal #2” content to the citeproc-js
wiki.

Frank

Frank_Bennett · May 12, 2009, 1:12am

I’ve begun building a set of illustrations on the wiki:

XBib download | SourceForge.net

I will carry forward and lay out how the workflow of a configurable
wiki syntax mechanism would work, but there’s enough there to
illustrate the problem that I see emerging with the introduction of
inline markup.

Yes.

The next step can be summarized very simply, as analogous to the mess
with code pages and other character encoding hacks, when the time came
to map them onto Unicode. A CSL inline markup processor essentially
needs to be able to do the same thing, but mapping from arbitrary ad
hoc user-defined markup arrangements to a standard scheme. If it has
that ability, you can get out in front of the problem and provide a
smooth migration path.

But it remains an open question whether the normalization happens at
the CSL stage, and whether users really get the choice to invent
their own markups.

For sake of argument, applications like Zotero could invent schemes
that only work through the GUI. Imagine a contextual-menu item
populated with a preset number of class items, with the ability of
the user to add new ones. Maybe those can get optional keyboard
bindings.

Looking back, I confess to getting confused at this point. If the purpose of
semantic classes is to allow an element to be rendered differently in
different styles, how does the style know what presentation to associate
with a user-defined element? That will need to be known at some point
in the chain, and for RTF, it needs to be known to the CSL processor.

Wouldn’t it be better to offer a set of semantic labels for a few elements
that either have special significance for indexing and linking (like chemical
names) or which may render differently in different styles (like titles),
and a set of purely presentational tags, and leave it at that? Then if
a user gets stuck needing to render something differently for different
publications, they can work around it short term with presentational markup,
and when they complain you can add a fresh semantic tag to CSL to cover
their use case.

That would give you a predictable central store of known semantic elements,
which style authors could map to appropriate presentation. The
semantic → presentation mapping should be controlled at the style
level, it seems to me.
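
A minimal sketch of that arrangement, in Python for brevity. The label names, the two style names and `presentation_for` are hypothetical; nothing here reflects an agreed CSL label set.

```python
# Hypothetical central store of known semantic labels, each with a
# default presentation.
DEFAULT_PRESENTATION = {
    "title": {"font-style": "italic"},
    "chemical-name": {},
}

# Hypothetical per-style overrides, written by style authors.
STYLE_OVERRIDES = {
    "style-a": {},
    "style-b": {"title": {"quotes": "true"}},
}


def presentation_for(label, style):
    """Look up how a given style presents a known semantic label."""
    if label not in DEFAULT_PRESENTATION:
        # Labels outside the central store are rejected, not guessed at.
        raise KeyError("unknown semantic label: %s" % label)
    overrides = STYLE_OVERRIDES.get(style, {})
    return overrides.get(label, DEFAULT_PRESENTATION[label])
```

A style that says nothing about a label gets the default, a style that overrides it gets its own presentation, and a label that is not in the central store fails loudly, which is the point at which a new semantic tag would be requested.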

Simon_Kornblith · May 12, 2009, 2:48am

I haven’t been following this thread too closely, but I think the
restricted label set approach is best. I can think of ways of
implementing an extensible approach elegantly in Zotero, but I don’t
really think it’s worth the effort. For one, what’s not covered by a
restricted approach at any given time would likely be fringe use
cases, and the UI implementation cost would likely outweigh its
utility. Additionally, if the URIs (or identifiers of another sort)
that are used to specify semantic labels aren’t normalized, they’re
not particularly useful anyway. In either case, it seems absolutely
vital that CSL specify some normative set of labels along with default
presentational styles (and maybe associated markup syntax).

Simon
