Markup in titles

Maloney_Christopher · February 12, 2014, 2:58pm

Is there any allowance in the citeproc-json format or in any of the tools to deal with articles that have markup in titles? For example, here is an article with a sup element in the title, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC26831/.

I suspect that the markup is just dropped, but wanted to double check. Has it been discussed before? I searched the mailing list archives, with no luck.

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842

Sebastian_Karcher · February 12, 2014, 3:16pm

citeproc-js - and hence CSL JSON - accepts HTML markup for subscript,
superscript, italics, bold, and small caps:
These: http://www.zotero.org/support/kb/rich_text_bibliography
get passed on literally to citeproc, i.e. your example should ideally have:
"title": "Solutions of a Lagrangian system on T<sup>2</sup>",
which is, I see, what’s already in the XML output from PMC. I’ll look at
implementing that on the Zotero import side.
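
For illustration, here is a minimal sketch of what such a CSL JSON item might look like, with the superscript carried as a sup tag inside the title string. Only the title comes from this thread; the "id" and "type" values are placeholders assumed for the example.

```python
import json

# Hypothetical minimal CSL JSON item. Only "title" is taken from the thread;
# "id" and "type" are illustrative placeholders.
item = {
    "id": "example-item",
    "type": "article-journal",
    "title": "Solutions of a Lagrangian system on T<sup>2</sup>",
}

# The markup travels as ordinary characters inside the JSON string.
serialized = json.dumps([item], ensure_ascii=False, indent=2)
print(serialized)
```

The tag is not structured data here; it is just text that the processor is expected to recognize later.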

Maloney_Christopher · February 12, 2014, 4:24pm

Thanks for the quick response.

So, it looks like this is a pseudo-HTML format that only supports a limited set of tags, and no character entity references, right? Is this the complete set of elements: <i>, <b>, <sup>, <sub>, and <span style="font-variant:small-caps;">?

I did some testing with citeproc-json, and it seems to handle it surprisingly well. Here are the results of my tests converting into MLA in HTML format:

‘πr² & pies are round.’ => ‘πr ² & Pies Are Round’
’^{’ => ‘’

’^{’ => ‘’

‘^ij’ => ‘^ij’}}

But it means (as I guess you all are probably aware) that there are certain strings that cannot appear in one of these fields. For example, if I wanted to talk about the literal string “<i>j</i>” in my abstract, I don’t think there’s any way it could be represented, is there?

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842

Sebastian_Karcher · February 12, 2014, 4:46pm

> So, it looks like this is a pseudo-HTML format that only supports a
> limited set of tags, and no character entity references, right? Is this
> the complete set of elements: <i>, <b>, <sup>, <sub>, and
> <span style="font-variant:small-caps;">?

citeproc-js also accepts for legacy reasons, though we advise against
using it.

> But it means (as I guess you all are probably aware) that there are
> certain strings that cannot appear in one of these fields. For example, if
> I wanted to talk about the literal string “<i>j</i>” in my abstract, I
> don’t think there’s any way it could be represented, is there?

It has never come up, but you can use a backslash to escape HTML tags, so
that an escaped tag around j renders as literal text rather than as markup.
You can escape backslashes with double backslashes. This isn’t heavily
tested and I don’t know to what degree escaping via backslash is
“officially” supported, but it works if you need it.

Sebastian_Karcher · February 12, 2014, 4:47pm

And yes to this:
"So, it looks like this is a pseudo-HTML format, that only supports the
limited set of tags, and no character entity references, right"
citeproc-js handles these tags individually; it doesn’t run an HTML parser
or anything like that.
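
A rough sketch of what "handling tags individually" can mean in practice: scan for a fixed list of known tags and treat everything else, including stray ampersands and less-than signs, as plain text. This is not citeproc-js's actual code. The function name `split_markup`, the tag list, and the exact escape rule (a backslash placed before "<") are all assumptions made for the example.

```python
import re

# Tags taken from the list discussed in this thread; anything else is text.
# Illustrative scanner only, not citeproc-js's implementation.
KNOWN_TAGS = [
    "<i>", "</i>", "<b>", "</b>", "<sup>", "</sup>", "<sub>", "</sub>",
    '<span style="font-variant:small-caps;">', "</span>",
]

# Match an escaped backslash, an escaped "<", or one of the known tags.
TOKEN_RE = re.compile(
    r"\\\\|\\<|" + "|".join(re.escape(tag) for tag in KNOWN_TAGS)
)


def split_markup(text):
    """Split text into ("tag", s) and ("text", s) pieces."""
    pieces = []
    buffer = []
    pos = 0
    for match in TOKEN_RE.finditer(text):
        buffer.append(text[pos:match.start()])
        token = match.group(0)
        if token == "\\\\":
            buffer.append("\\")  # escaped backslash becomes a literal one
        elif token == "\\<":
            buffer.append("<")   # escaped "<" stays literal, so no tag forms
        else:
            if "".join(buffer):
                pieces.append(("text", "".join(buffer)))
            buffer = []
            pieces.append(("tag", token))
        pos = match.end()
    buffer.append(text[pos:])
    if "".join(buffer):
        pieces.append(("text", "".join(buffer)))
    return pieces


print(split_markup("T<sup>2</sup>"))
print(split_markup("a < b & c"))
```

Because only exact tag strings are matched, an input like "a < b & c" needs no entity escaping, which is the practical benefit of this approach. The cost is the ambiguity raised above: a literal tag string can only appear if some escape convention is agreed on.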

Maloney_Christopher · February 12, 2014, 5:00pm

Great, thanks!

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842

Bruce_D_Arcus1 · February 12, 2014, 5:23pm

As you might guess, there are some tricky trade-offs here. We’re
trying to be practical.

Maloney_Christopher · February 12, 2014, 5:28pm

Yes, I’m aware of the tradeoffs, and the motivation for doing things this way: mainly so as not to force users to enter every ampersand as “&amp;” and every less-than sign as “&lt;”.

But I’m also aware of how tricky things can get when you invent your own markup format that looks a lot like html, but isn’t. And I know that a lot of other devs aren’t aware of these issues, so I thought I’d mention them.

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842

From my testing, it looks like citeproc-json does a really good job.

Sebastian_Karcher · February 12, 2014, 5:03pm

While I have you here: do you know if the way the superscript is handled
in the PMC XML record is the way it would generally appear in PubMed XML?
What other HTML tags should we expect there?

Bruce_D_Arcus1 · February 12, 2014, 5:38pm

Not to mention there’s broad unicode support.

Maloney_Christopher · February 12, 2014, 6:14pm

Yes, you do have me! In PMC, we store article titles in JATS XML, http://jatspan.org/niso/publishing-1.1d1/#p=elem-article-title, which allows inline markup, and, of course, is well-formed XML.
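
As a sketch of how an importer might turn a JATS title into the pseudo-HTML form discussed in this thread: the mapping table and the function name `jats_title_to_csl` are assumptions for illustration, and the sample XML string is hand-written rather than copied from a PMC record.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from JATS inline elements to the tags named in this thread.
JATS_TO_CSL = {
    "italic": ("<i>", "</i>"),
    "bold": ("<b>", "</b>"),
    "sup": ("<sup>", "</sup>"),
    "sub": ("<sub>", "</sub>"),
    "sc": ('<span style="font-variant:small-caps;">', "</span>"),
}


def jats_title_to_csl(element):
    """Flatten an element's mixed content into a CSL-style title string."""
    parts = [element.text or ""]
    for child in element:
        # Unknown elements keep their text but lose their markup.
        open_tag, close_tag = JATS_TO_CSL.get(child.tag, ("", ""))
        parts.append(open_tag + jats_title_to_csl(child) + close_tag)
        parts.append(child.tail or "")
    return "".join(parts)


# Hand-written sample input, not an actual PMC record.
title_xml = (
    "<article-title>Solutions of a Lagrangian system on "
    "T<sup>2</sup></article-title>"
)
title = jats_title_to_csl(ET.fromstring(title_xml))
print(title)
```

Note that the XML parser decodes entities such as `&amp;` on the way in, so the resulting string contains bare ampersands, which is what the pseudo-HTML format expects.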

PubMed usually drops the markup. I think there is work afoot to get rich text into the PubMed titles and abstracts, but I’m not sure of the status. I’ve seen people here suggesting these kinds of pseudo-HTML fields, and I’m always warning them of the dangers, so that’s where I’m coming from.

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842