East Asian author names and white space for separation

CSL treats names written in East Asian writing systems (Korean, Chinese, Japanese) differently: delimiters like ", " between family name and given name are collapsed. This is basically a good idea, since it reflects the traditional way of writing these names.

Still, an option to separate East Asian names (with whitespace and/or symbols like ・) is vital.

  1. Many styles require such a separation.
  2. Korean names are most often separated this way.
  3. Western names transcribed into Japanese (in Katakana) are normally written with “・” in between; sometimes they need just whitespace.

These are examples that come to mind at the moment.

As a workaround, I tried adding whitespace, even East Asian double-spaced whitespace, before the given name, but it is collapsed.

A center dot “・” does appear, but although the input is double-spaced, it is rendered as if single-spaced, which does not look good and makes it difficult to tell family name and given name apart. And anyway, it is a dirty workaround.

The defaults are great, but the cases I described are common now, and I wish to have the option to set a separator in East Asian names.

I am writing several CSL styles for Japanese journals at the moment, but I do not see any workaround.

@Frank_Bennett How is this handled in CSLm?

Since there is no answer yet, let me add my experience from using JurisM. It is the same as I described for Zotero.

I checked with the Nihon Shakaigaku Gakkai (author-name) style: there is neither whitespace nor a nakaten.

Sorry, I can’t really follow. Can you give a few examples? What’s happening at the moment, and what is the desired result?

I can’t read any of those writing systems, but perhaps you can use romanization to illustrate the problem.

Thanks for looking into it.

Take my name as an example. My Western name is Maria Shinoto. It comes from Japanese “篠藤真理亜”, which is transliterated correctly as “Shinotō Maria”. But there are also syllabic scripts, so my name is sometimes transcribed as シノトウ・マリア etc. Western names that have no Japanese original are most often transcribed in this form.

I can send you examples in Hangeul and Chinese as well, if you need them. The situation is a bit easier, since in Chinese every name is transliterated into the same writing system as the Chinese one, and in Korean, it is common to write any name in Hangeul – although there may be cases where the names in Chinese characters are preferred for East Asian names, but we may set this aside as a rare case that can be solved with the experience from the Japanese case for the moment.

Now to my problem:

Traditionally, the Chinese characters of a name are written without distinction of family name and given name, without separation and in the order “family name > given name”. But it is also common to separate both with whitespace, and sometimes this is even necessary. Foreign names are separated with the “・” (nakaten) in Japanese; in Hangeul and Chinese they are written like the East Asian names.

Zotero/JurisM or the CSL processor distinguishes Western names from East Asian names automatically: it puts the parts of East Asian names together without whitespace or nakaten. Here are examples:

  • CSL:
    • 篠藤真理亜
    • シノトウマリア
  • Standard solution:
    • 篠藤 真理亜
    • シノトウ・マリア

So, there are two parts I would like to ask for:

  1. Keep the separation between names written in Western and Far Eastern writing systems,
    • I.e. when Western given names are initialized or separated by a colon from the family name, keep East Asian names as they are.
  2. Give an opportunity to address given names and family names separately and to set a separator in the CSL for East Asian names.
    • Things get complicated in Japanese, because we should get the chance to set a different separator (the nakaten) when the name is written in Katakana / syllables (I made this distinction by looking at the Unicode code points when writing for Biblatex, but this was some time ago). But if this is difficult, having the opportunity to separate all East Asian names in one way would already be a huge improvement, and one may set the few nakaten manually for the time being.
    • There should be a choice for the separator, since there are several forms of whitespace that differ from standard to standard, mainly double spaced (zenkaku) and normal spaced (hankaku).
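
The two requests above can be sketched in Python as follows. This is only an illustration of the desired rule, not how citeproc or any CSL processor works: the function `join_name`, its parameters, and the Unicode ranges used to recognize the scripts are assumptions made for this example.

```python
NAKATEN = "\u30fb"        # katakana middle dot "・"
ZENKAKU_SPACE = "\u3000"  # full-width (ideographic) space
HANKAKU_SPACE = "\u0020"  # ordinary half-width space


def is_katakana(ch):
    # Katakana block, which also contains the long-vowel mark.
    return 0x30A0 <= ord(ch) <= 0x30FF


def is_east_asian(ch):
    # Rough script test by code point (assumption: these blocks suffice).
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def join_name(family, given, cjk_separator=ZENKAKU_SPACE,
              katakana_separator=NAKATEN):
    """Join family and given name with a script-dependent separator."""
    if not given:
        return family
    letters = [ch for ch in family + given if not ch.isspace()]
    if not letters or not all(is_east_asian(ch) for ch in letters):
        # Western script: stand-in for the normal "Family, Given" rendering.
        return f"{family}, {given}"
    if all(is_katakana(ch) for ch in letters):
        # Name written entirely in Katakana: use the nakaten.
        return f"{family}{katakana_separator}{given}"
    # Any other East Asian name: use the separator chosen by the style.
    return f"{family}{cjk_separator}{given}"
```

With these assumptions, `join_name("篠藤", "真理亜", cjk_separator=HANKAKU_SPACE)` gives 篠藤 真理亜, `join_name("シノトウ", "マリア")` gives シノトウ・マリア, and `join_name("篠藤", "マリア", cjk_separator="")` reproduces the traditional collapsed form 篠藤マリア.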

Thanks for listening and for looking into it!
Maria

PS Just for the record, correct writing of my Japanese name is not all Kanji, but rather 篠藤マリア. :slightly_smiling_face:

I have a feeling we’re going to need input from @Frank_Bennett, given that he understands all this (the complexity of international naming including Asian names, and implementation details in both CSL and Zotero/Juris-M) better than any of us.

I can’t figure out how to translate your “I would like to ask for” items into specific CSL schema and/or specification changes, which is really what we need.

For example, what code or specification is responsible for which pieces? I get the sense that some is CSL-specific, but not all.

Thank you so much. It is good to know that people care.

I will try to help in any way possible, and I know a lot of international colleagues who are keen to have such a system for their work.

I am not a programmer, but I can help with anything XML related and I can at least read some code.

If you know how to read a relevant CSL file section for name rendering, and give us a short before and after of what you think you need, that might help.

OK, I can look into it this evening. If you have a special CSL to look into, just send me the link.

Otherwise I could start with one of these two:


Chicago Manual of Style 17th edition (author-date)

(木下 2020)

木下尚子. 2020. ‘埋葬と装身習俗から見た広田遺跡 ―下層期の3~5世紀を中心に’ Maisō to sōshin shūzoku kara mita Hirota iseki: Kasōki no 3 kara 5 seiki o chūshin ni. In 広田遺跡の研究 ー 人の形質・技術・移動 Hirota iseki no kenkyū: Hito no keishitsu, gijutsu, idō, edited by 木下尚子, 281–327. 平成29年度〜令和元年度科学研究費補助金基盤研究(B)研磨穿孔報告書. 熊本: 熊本大学.



日本社会学会 (author-date, Japanese)

(木下 2020)

木下尚子,2020,「埋葬と装身習俗から見た広田遺跡 ―下層期の3~5世紀を中心に」 Maisō to sōshin shūzoku kara mita Hirota iseki: Kasōki no 3 kara 5 seiki o chūshin ni木下尚子編『広田遺跡の研究 ー 人の形質・技術・移動』 Hirota iseki no kenkyū: Hito no keishitsu, gijutsu, idō熊本大学,平成29年度〜令和元年度科学研究費補助金基盤研究(B)研磨穿孔報告書281–327.


I am aware that we need to develop our own styles for Japanese journals and a multilingual book we are editing at the moment, but looking into both of these styles was helpful in the beginning. OK this way?

Yes, either of those would work.

Great.

You will get my first comments tomorrow morning.

Since I am a beginner in such an environment, I think I will make a lot of mistakes. Whenever necessary, do not hesitate to point me to mistakes or errors.

I looked into some CSL regarding the two Japanese entries in my library. I hope it helps a bit; I am still exploring. You can download the examples from the link below until Friday, 24 April. If you need the link again, just tell me.

Both entries with all data in my JurisM database (including the variants) are in the two .png screenshots (jurism-entry-screenshot.png).

The main data, but not the variants, can be exported into various formats. I just added the “japanese-multilingual-RIS.ris”, which was the easiest to read and shows how the name subfields are addressed independently even with CJK names.

Then I exported the bibliographical entries formatted in Chicago 17th edition with author-date. Nobody would expect the variants to appear, but the CJK is collapsed, so I assume this is set at some stage before the CSL. The other format is “nihon shakai gakkai”; this is the only one that does address variants, but it seems it only accepts the Rōmaji, no English translation or Katakana.

I commented on these files in the file “comments-shinoto-2020-04-20.rtf”.

After that comment I looked into the CSL of the Chicago style and the nihon-shakaigakkai style. For your convenience I added both CSL files, and I also extracted most of the macros that are concerned with names. The Chicago style shows, of course, that it is unaware of Japanese or other East Asian scripts; still, it collapses the CJK names. These files are not interesting in themselves; the interesting one is the file derived from the nihon-shakaigakkai style, style-nihon-shakaigaku-parts.txt. I have added comments here and there after the extracted parts in that file, and it seems that the style determines whether to use original Japanese (zenkaku) or something else (hankaku) according to the locale, which seems to be set with the language field of the entry.

I may be wrong.

Furthermore, I assume that the collapse of the CJK script happens on the citeproc level, and that it may be possible to address the variants according to the locale/language field, but I do not understand how. This would need a lot of trial and error, and maybe asking here is the easier way?

Thanks a lot for listening!

Download archive with examples

OK, I’m pasting that txt file content below, in case others can help sort it out, and for the record.

  <macro name="contributors-半角">
    <group delimiter=". ">
      <group delimiter=" ">
        <choose>
          <if context="alternative" match="none"> <!-- shinoto: does this address the variants, so that here there is no variant? -->
            <names variable="author">
              <name and="text" name-as-sort-order="first" sort-separator=", " delimiter=", " delimiter-precedes-last="contextual"/> <!-- shinoto: This looks like only for Western contributors, in Chicago style it ignores the delimiters etc. for CJK names -->
              <label form="short" plural="contextual" prefix=" "/>
              <substitute>
                <names variable="editor"/>
                <names variable="translator"/>
                <text macro="title-半角"/>
              </substitute>
            </names>
          </if>
          <else>
            <names variable="author">
              <name and="text" delimiter=", " delimiter-precedes-last="contextual"/> <!-- shinoto: looks Western again -->
              <label form="short" plural="contextual" prefix=" "/>
              <substitute>
                <names variable="translator">
                  <name and="text" delimiter=", " delimiter-precedes-last="contextual"/>
                  <label form="short" plural="contextual" prefix=", "/>
                </names>
                <names variable="editor"/>
                <text macro="title-半角"/>
              </substitute>
            </names>
          </else>
        </choose>
        <text macro="recipient"/>
      </group>
    </group>
  </macro>
  <macro name="contributors-全角"> <!-- shinoto: This seems to be for Eastern names -->
    <group delimiter=". ">
      <group delimiter=" ">
        <choose>
          <if context="alternative" match="none">
            <names variable="author">
              <name sort-separator=", " delimiter="・" delimiter-precedes-last="always"/> <!-- shinoto: But the formatting is Western again, and it is different in the result -->
              <label form="short" plural="never"/>
              <substitute>
                <names variable="editor"/>
                <names variable="translator"/>
                <text macro="title-全角"/>
              </substitute>
            </names>
          </if>
          <else>
            <names variable="author">
              <name sort-separator=", " delimiter="・" delimiter-precedes-last="always"/> <!-- shinoto: Separates Eastern authors -->
              <label form="short" plural="never"/>
              <substitute>
                <names variable="translator"/>
                <names variable="editor"/>
              </substitute>
            </names>
            <text macro="recipient"/>
          </else>
        </choose>
        <text macro="recipient"/>
      </group>
    </group>
  </macro>

...
 
...
    
    <!-- shinoto: these two macros define the citation in the text, they separate CJK and Western names -->
    <names variable="author"> 
    
    <macro name="layout-citation-全角">
      <group delimiter=" ">
        <text macro="contributors-short-全角"/>
        <text macro="layout-citation-year-page"/>
      </group>
    </macro>
    <macro name="layout-citation-半角">
      <group delimiter=" ">
        <text macro="contributors-short-半角"/>
        <text macro="layout-citation-year-page"/>
      </group>
    </macro>
...
      <!-- shinoto: this separates the Japanese (locale = "ja"), which I assume is done with the "language" field / language variable in Zotero/JurisM/CSL -->

  <citation et-al-min="3" et-al-use-first="1" disambiguate-add-year-suffix="true" disambiguate-add-names="true" disambiguate-add-givenname="true" givenname-disambiguation-rule="primary-name">
    <layout prefix="(" suffix=")"  delimiter="; " locale="ja">
      <text macro="layout-citation-全角"/>
    </layout>
    <layout prefix="(" suffix=")" delimiter="; ">
      <text macro="layout-citation-半角"/>
    </layout>
  </citation>
  
  <bibliography hanging-indent="true" et-al-min="11" et-al-use-first="7" subsequent-author-substitute="&#8212;&#8212;&#8212;" entry-spacing="0" name-delimiter=", ">
    <sort>
      <key macro="contributors-半角"/><!-- shinoto: This seems to address the Rōmaji variant of the name, which is used for sorting the entries alphabetically -->
      <key macro="year-sort"/>
    </sort>
    <layout locale="ja">
...
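
The locale-based choice between the two layouts that the comments in the excerpt describe can be sketched in Python as follows. This only illustrates the assumption made in those comments, namely that the entry's language field decides which layout is used. It is not the processor's actual code: the function `pick_layout` and the matching on the primary language subtag are hypothetical.

```python
# Macro names taken from the style excerpt above.
LOCALIZED_LAYOUTS = {"ja": "layout-citation-全角"}
DEFAULT_LAYOUT = "layout-citation-半角"


def pick_layout(item_language, localized_layouts=LOCALIZED_LAYOUTS,
                default_layout=DEFAULT_LAYOUT):
    """Return the layout macro for an entry, given its language field."""
    if item_language:
        # Assumption: only the primary subtag counts, so "ja-JP" matches "ja".
        primary = item_language.replace("_", "-").split("-")[0].strip().lower()
        if primary in localized_layouts:
            return localized_layouts[primary]
    # Entries without a language, or with an unlisted one, use the default.
    return default_layout
```

Under these assumptions an entry with language "ja" or "ja-JP" would be rendered with the 全角 layout, and every other entry, including one with an empty language field, with the 半角 layout.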

Do I understand you correctly that this is now for the record, and there is no interest in solving the problem? What about the other comments?

I really hope for progress in this matter; as soon as it is possible to address the variants of the fields and the subfields of names in CJK, I can go on with CSL for Japanese and multilingual papers.

No, I’m hoping someone else can weigh in, as I still don’t understand it; what the current situation is, where the problems are, what the options to fix are, who’s responsible, etc., etc.

I suspect @Frank_Bennett is the only one that will understand, but there may be others here.

I see.

Thank you for your patience.

I am so glad I found the JurisM version of CSL because this is the first time after decades – really – that I found someone who really understands the matter. My colleagues still write their bibliographies by hand without exception, and I hope that there is a chance now, since there seems to be just one last small step to go.

Have you tried contacting JurisM, which of course is Frank’s project? He’s often ahead of where we are, particularly on internationalization issues.

Yes, I contacted him via the mailing list, but there was no answer. I do not know any other way to contact him. I would love to get into contact with him.

I can imagine that he is very busy at the moment; we are all struggling to get online teaching done these days…

I think the combination of the crisis and a big project he seems to be working on probably explains the delay.

Didn’t you get his answer? He sent you a message via the list. (Albeit just telling you that he’ll contact you.)