
Committee on Japanese Materials (CJM)

Council on East Asian Libraries (CEAL)


Annual Session Minutes for 2014 - Philadelphia

Thursday, March 27, 2014


Philadelphia Marriott Downtown
Marriott Liberty Ballroom C

Digital resource development for Japanese Studies:
our opportunities and challenges


This session was organized jointly by CJM and NCC.


Digital resources of Japanese texts from a viewpoint of Digital Humanities

Kiyonori Nagasaki - International Institute for Digital Humanities, Tokyo


Prof Nagasaki began by outlining his long involvement with Digital Humanities in a number of institutions and projects leading to the establishment of the International Institute for Digital Humanities. He emphasized the importance of digitizing Japanese content as the foundation for digital humanities and introduced key large-scale (e.g. NDL, NIHU, CiNii) and medium/small-scale digital resources (e.g. National Institute of Japanese Language and Linguistics, National Institute of Japanese Literature, and university databases). The large-scale resources tend to be managed by more general organizations and are more sustainable over time. They tend to have simple search functions and offer image files rather than text data. Medium/small-scale resources are operated by research groups or specialized organizations and can offer specific search functions, metadata, and text data. However, their long-term sustainability may be less reliable.

Prof Nagasaki gave an overview of the development of Digital Humanities in Japan. It began in the 1950s with the establishment of the Mathematical Linguistic Society of Japan, continued through the spread of kanji/kana computing in the 1960s and 1970s and the emergence of Digital Humanities communities in the 1980s, and culminated in the rapid expansion that followed the birth of the internet. He highlighted the role of the Special Interest Group – Computer and Humanities (SIG-CH), which has organized quarterly workshops since 1989. There have been over 800 presentations from 300 researchers, with a strong IT orientation. Recently the number of participating humanities scholars has increased, reflecting the development of Digital Humanities.

Prof Nagasaki outlined the main types of Digital Humanities research:
Digital Humanities researchers in Japan face a number of difficulties:
Prof Nagasaki asked the East Asian studies library community to assist by:
Prof Nagasaki concluded by outlining a number of recent initiatives in the Digital Humanities:
Prof Nagasaki recommended that anyone interested in the state of Digital Humanities in Japan read Digital Humanities Monthly.


“What’s cooking?: two projects at LC with Japanese partners”

Eiichi Itō (Library of Congress)


Mr Itō introduced two current initiatives at the Library of Congress:
  1. Digitization of the Pre-WWII Japanese censorship collection
  2. Digitization and transcription of a Tale of Genji manuscript

The Asian Division of the Library of Congress has been engaged in a number of collaborative digitization projects over the last 10 years. In 2004 the International Research Center for Japanese Studies (Nichibunken) digitized 1,500 Kabuki prints, the first illustrated edition of the Tale of Genji (1654) and 4 Nara ehon manuscripts. Since 2010 LC has been working with the National Diet Library (NDL) to digitize its Pre-WWII Japanese censorship collection, and with the National Institute of Japanese Language and Linguistics (NINJAL) to digitize a 16th-century manuscript of the Tale of Genji.

Pre-WWII Japanese censorship collection:

The project involves 1,300 censored books, each with notes by officials of the Naimushō indicating the reason for the censorship. Although other copies of the books may exist in Japan, the censors’ annotations make LC’s copies unique. This, together with the poor condition of many of the books, made them a priority for digitization. Negotiations with NDL began in 2009 and an agreement was signed the following year. Digitization is being carried out in three phases: March 2013, March 2014 and March 2015.

Tale of Genji Project with NINJAL:

In 2008 LC acquired a pre-1537 manuscript of the Tale of Genji. Researchers from NINJAL visited LC in 2010 and produced full transcriptions. LC has put digital images of 3 volumes in a page-turner viewer on its website, linked to the LC catalog. NIJL has mounted digital images with searchable transcriptions of the text; these are also linked to the LC catalog record.

Mr Itō concluded by explaining that the projects had presented challenges in terms of resources (funding, staff time and expertise), copyright, and questions of governing law. He stressed the importance of sharing and preserving collections and the need to collaborate with other institutions to support study and research.


Panel discussion


For the panel discussion Prof Nagasaki and Mr Itō were joined by Dr Ikki Ōmukai of the National Institute of Informatics (NII), one of the key architects of CiNii and an expert on the semantic web.

Mr Itō began by welcoming the launch by NII of the Japan Institutional Repositories Online (JAIRO) Cloud. He added that he believed the JADH/NDL Hondeji/翻デジ project was an excellent way to give scholars access to a large corpus of Japanese texts.

Prof Nagasaki explained that Hondeji was a voluntary system through which scholars contribute text. Data can be provided in TEI format. Michiko Ito asked why Hondeji was fully searchable, whereas in Asahi Kikuzo II, for example, OCR’d text could only be searched using keywords. Prof Nagasaki explained that in Hondeji the text was first OCR’d and then retyped so that it would be searchable.

In answer to a question from Mari Suzuki, Prof Nagasaki said that FineReader was the best tool for OCR of Japanese text. Shirin Eshghi asked whether Hondeji was crowd-sourced or reviewed. Prof Nagasaki explained that it was a voluntary project and there was no quality assurance or proofreading beyond what individual contributors did for their own texts before submitting them. Dr Ōmukai added that all data was recorded and stored, so that if problems arose an earlier version of a text could be restored.

Vickey Bestor noted that Mr Itō’s presentation showed that a number of browsers could be used to view the digitized Tale of Genji, but she had heard that many projects in Japan support only a limited number of browsers (e.g. Internet Explorer but not Chrome). She wondered whether Japanese institutions could be encouraged to support a wider range of browsers to improve accessibility. Mr Itō replied that there had been long discussions with NIJL about this issue and that NIJL was committed to ensuring that its software could be used with as many browsers as possible.

Setsuko Noguchi noted that the Daizōkyō database was very popular with faculty and students and asked Prof Nagasaki whether he had any usage statistics. He replied that Japan and the US are the principal users and that the database has a total of 200,000 users per month.


Minutes taken by:
Minutes coded by: Adam H. Lisbon (May 6th, 2015)