Markup in titles

Is there any allowance in the citeproc-json format or in any of the tools to deal with articles that have markup in titles? For example, here is an article with a sup element in the title, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC26831/.

I suspect that the markup is just dropped, but wanted to double check. Has it been discussed before? I searched the mailing list archives, with no luck.

Chris Maloney
NIH/NLM/NCBI (Contractor)
Building 45, 5AN.24D-22
301-594-2842

citeproc-js - and hence CSL JSON - accepts HTML markup for subscript,
superscript, italics, bold, and small caps:
These: http://www.zotero.org/support/kb/rich_text_bibliography
get passed on literally to citeproc, i.e. your example should ideally have:
"title": "Solutions of a Lagrangian system on T<sup>2</sup>",
which is, I see, what’s already in the XML output from PMC. I’ll look at
implementing that on the Zotero import side.
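
For illustration, a minimal sketch of what such an item could look like when serialized as CSL JSON. Only the title comes from this thread; the `id` and `type` values are placeholders, and the surrounding list is just the usual way items are bundled, not something specified here.

```python
import json

# Hypothetical CSL JSON item: only the title is taken from the thread,
# "id" and "type" are placeholder values for illustration.
item = {
    "id": "example-item",
    "type": "article-journal",
    "title": "Solutions of a Lagrangian system on T<sup>2</sup>",
}

# json.dumps does not escape "<" or ">", so the tags are written verbatim;
# ensure_ascii=False keeps any non-ASCII characters readable.
serialized = json.dumps([item], ensure_ascii=False, indent=2)
print(serialized)
```

The point of the sketch is only that the markup travels inside the plain string value of the field; no separate structure is needed.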

Thanks for the quick response.

So, it looks like this is a pseudo-HTML format, that only supports the limited set of tags, and no character entity references, right? Is this the complete set of elements: <i>, <b>, <sub>, <sup>, and <span>?

I did some testing with citeproc-json, and it seems to handle it surprisingly well. Here are the results of my tests converting into MLA in HTML format:

‘πr2 & pies are round.’ => ‘πr 2 & Pies Are Round’
’ => ‘<sup>’
’ => ‘<sup/>’
ij’ => ‘ij

But it means (as I guess you all are probably aware) that there are certain strings that cannot appear in one of these fields. For example, if I wanted to talk about the literal string “<i>j</i>” in my abstract, I don’t think there’s any way it could be represented, is there?

Chris Maloney

In reply to: Sebastian Karcher, Wednesday, February 12, 2014 10:17 AM (Subject: Re: [xbiblio-devel] Markup in titles), replying to Maloney, Christopher (NIH/NLM/NCBI) [C], Wed, Feb 12, 2014 at 7:58 AM

xbiblio-devel mailing list
xbiblio-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

So, it looks like this is a pseudo-HTML format, that only supports the
limited set of tags, and no character entity references, right? Is this
the complete set of elements: <i>, <b>, <sub>, <sup>, and <span>?

citeproc-js also accepts for legacy reasons, though we advise against
using it.

But it means (as I guess you all are probably aware) that there are
certain strings that cannot appear in one of these fields. For example, if
I wanted to talk about the literal string “<i>j</i>” in my abstract, I
don’t think there’s any way it could be represented, is there?

It has never come up, but you can use backslash to escape html tags, i.e.
\<i>j\</i> renders as "<i>j</i>". You can escape backslashes with double
backslashes. This isn’t heavily tested and I don’t know to what degree
escaping via backslash is “officially” supported, but it works if you need
it.
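
A rough sketch of how a producer of CSL JSON might apply that escaping before handing a field to the processor. The function name, the tag list, and the choice to escape both opening and closing tags are assumptions made for illustration; as noted above, the backslash behavior itself is not heavily tested.

```python
import re

# Tags assumed to be interpreted as markup (taken from the list discussed
# in this thread); anything else is left untouched.
_TAG = re.compile(r"</?(?:i|b|sub|sup|span)\b[^>]*>")


def escape_literal_markup(text: str) -> str:
    """Backslash-escape tags that should show up literally in the output."""
    # Double existing backslashes first so they are not mistaken for escapes.
    text = text.replace("\\", "\\\\")
    # Then put a single backslash in front of every recognized tag.
    return _TAG.sub(lambda match: "\\" + match.group(0), text)


print(escape_literal_markup("the literal string <i>j</i>"))
```

A bare less-than sign that is not part of a recognized tag, as in "x < y", is left alone by this helper.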

and yes to this:
"So, it looks like this is a pseudo-HTML format, that only supports the
limited set of tags, and no character entity references, right"
citeproc-js handles these individually, it doesn’t run a html parser or
anything like that.
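
To make the "handles these individually" idea concrete, here is a small sketch of a renderer that recognizes only a fixed list of tags and treats every other character as literal text. It is not citeproc-js's actual code; the tag list and the function name are assumptions based on the formats named in this thread.

```python
import html
import re

# Only these exact tags count as markup. The list is an assumption based on
# the formats named in the thread, not citeproc-js's real tag table.
_KNOWN_TAGS = re.compile(
    r'(</?(?:i|b|sub|sup)>|<span style="font-variant:small-caps;">|</span>)'
)


def render_field(value: str) -> str:
    """Pass recognized tags through and HTML-escape everything else."""
    # With one capturing group, re.split alternates text and matched tags,
    # so odd indices hold the tags and even indices hold plain text.
    parts = _KNOWN_TAGS.split(value)
    rendered = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            rendered.append(part)
        else:
            rendered.append(html.escape(part, quote=False))
    return "".join(rendered)


print(render_field("πr<sup>2</sup> & pies are round."))
```

Because there is no real parser, an unrecognized tag such as `<em>` simply comes out escaped as text, and a bare ampersand never has to be typed as an entity.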

Great, thanks!

Chris Maloney

In reply to: Sebastian Karcher, Wednesday, February 12, 2014 11:48 AM (Subject: Re: [xbiblio-devel] Markup in titles), following his message of Wed, Feb 12, 2014 at 9:46 AM

As you might guess, there are some tricky trade-offs here. We’re
trying to be practical.

Yes, I’m aware of the tradeoffs, and the motivation for doing things this way: mainly so as not to force users to enter every ampersand as “&amp;” and every less-than sign as “&lt;”.

But I’m also aware of how tricky things can get when you invent your own markup format that looks a lot like html, but isn’t. And I know that a lot of other devs aren’t aware of these issues, so I thought I’d mention them.

Chris Maloney

From my testing, it looks like citeproc-json does a really good job.

while I have you here - do you know if the way the superscript is handled
in the PMC XML record is the way this would generally appear for PubMed XML?
What other html tags should we expect there?

Not to mention there’s broad unicode support.

Yes, you do have me! In PMC, we store article titles in JATS XML, http://jatspan.org/niso/publishing-1.1d1/#p=elem-article-title, which allows inline markup, and, of course, is well-formed XML.
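
As a sketch of what an importer could do with such a title, the snippet below flattens a JATS `<article-title>` element into the rich-text string form discussed earlier in this thread. The mapping covers only a handful of JATS inline elements and the function name is made up for illustration; elements not in the mapping lose their markup but keep their text.

```python
import xml.etree.ElementTree as ET

# A few JATS inline elements mapped to the rich-text tags discussed in this
# thread. The mapping is illustrative and far from complete.
_TAG_MAP = {
    "sup": ("<sup>", "</sup>"),
    "sub": ("<sub>", "</sub>"),
    "italic": ("<i>", "</i>"),
    "bold": ("<b>", "</b>"),
    "sc": ('<span style="font-variant:small-caps;">', "</span>"),
}


def jats_title_to_csl(element: ET.Element) -> str:
    """Flatten an <article-title> element into a CSL JSON title string."""
    pieces = [element.text or ""]
    for child in element:
        # Unknown elements fall back to empty tags: markup dropped, text kept.
        open_tag, close_tag = _TAG_MAP.get(child.tag, ("", ""))
        pieces.append(open_tag + jats_title_to_csl(child) + close_tag)
        # Text that follows the child still belongs to the parent element.
        pieces.append(child.tail or "")
    return "".join(pieces)


title = ET.fromstring(
    "<article-title>Solutions of a Lagrangian system on T<sup>2</sup></article-title>"
)
print(jats_title_to_csl(title))
```

Since the XML parser decodes entity references, an `&amp;` in the JATS source ends up as a plain ampersand in the resulting string, which matches the no-entities convention of the target format.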

PubMed usually drops the markup. I think there is work afoot to get rich text into the PubMed titles and abstracts, but I’m not sure of the status. I’ve seen people here suggesting these kinds of pseudo-HTML fields, and I’m always warning them of the dangers, so that’s where I’m coming from.

Chris Maloney

In reply to: Sebastian Karcher, Wednesday, February 12, 2014 12:04 PM (Subject: Re: [xbiblio-devel] Markup in titles), replying to Maloney, Christopher (NIH/NLM/NCBI) [C], Wed, Feb 12, 2014 at 10:00 AM