Annual Session Minutes for 2014 - Philadelphia
Thursday, March 27, 2014
Philadelphia Marriott Downtown, Liberty Ballroom C
Digital resource development for Japanese Studies: our opportunities and challenges
This session was organized jointly by the Committee on Japanese Materials (CJM) and the North American Coordinating Council on Japanese Library Resources (NCC).
Digital resources of Japanese texts from a viewpoint of Digital Humanities
Kiyonori Nagasaki - International Institute for Digital Humanities, Tokyo
Prof Nagasaki began by outlining his long involvement with Digital Humanities in a number of institutions and projects leading to the establishment of the International Institute for Digital Humanities. He emphasized the importance of digitizing Japanese content as the foundation for digital humanities and introduced key large-scale (e.g. NDL, NIHU, CiNii) and medium/small-scale digital resources (e.g. National Institute of Japanese Language and Linguistics, National Institute of Japanese Literature, and university databases). The large-scale resources tend to be managed by more general organizations and are more sustainable over time. They tend to have simple search functions and offer image files rather than text data. Medium/small-scale resources are operated by research groups or specialized organizations and can offer specific search functions, metadata, and text data. However, their long-term sustainability may be less reliable.
Prof Nagasaki gave an overview of the development of Digital Humanities, from the 1950s and the establishment of the Mathematical Linguistic Society of Japan, through the spread of computing using kanji/kana in the 1960s-1970s and the emergence of Digital Humanities communities in the 1980s, to the rapid expansion that followed the birth of the internet. He highlighted the role of the Special Interest Group – Computer and Humanities (SIG-CH), which has organized quarterly workshops since 1989; there have been over 800 presentations from 300 researchers, with a strong IT orientation. Recently the number of humanities scholars involved has increased, reflecting the development of Digital Humanities.
Prof Nagasaki outlined the main types of Digital Humanities research:
- Archiving
  - Making digital cultural resources
  - Providing digital cultural resources
- Analyzing
  - Analyzing digital cultural resources
  - Analyzing human (or research) activities in the digital age
- Representing
  - Making exhibitions using digital cultural resources
He then turned to challenges facing the field:
- Funding is getting tighter; once project funding stops, any resources created tend to disappear.
- There is a need to survey global/North American trends and standards for digital cultural resources.
- There is a need to provide useful information in English.

Possible responses include:
- Creating a system for evaluating digital resources.
- Helping researchers obtain information about research trends and standards in the West at an early stage of their projects, while it is still possible to change them.
- Helping them write useful information in English and gathering it on, for example, a wiki site.
He gave examples of related activities:
- The SAT Daizōkyō Text Database, a database of Buddhist texts begun in 1984 which by 2008 contained over 100 million characters of text.
- Proposing additional kanji and Siddham characters missing from Unicode to ISO/IEC 10646.
- The Japanese Association for Digital Humanities (JADH) has begun Hondeji 2014 (翻デジ2014), with support from the NDL, to transcribe its digital images so that users can search the Japanese text.
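As background to why such character proposals matter: a character absent from Unicode cannot be represented in standard text data at all, and even already-encoded rare kanji often sit outside the Basic Multilingual Plane, where older BMP-only software fails. A minimal Python sketch, using 𠮷 (U+20BB7, from CJK Unified Ideographs Extension B) as a stand-in example:

```python
# 𠮷 (U+20BB7) lives outside the Basic Multilingual Plane, in CJK
# Unified Ideographs Extension B, which was added to ISO/IEC 10646
# through the same kind of proposal process described above.
ch = "\U00020BB7"  # 𠮷

codepoint = ord(ch)
print(f"U+{codepoint:04X}")   # U+20BB7
print(codepoint > 0xFFFF)     # True: beyond the BMP

# In UTF-16 such a character needs a surrogate pair (4 bytes),
# which is why BMP-only software cannot handle it.
utf16 = ch.encode("utf-16-be")
print(len(utf16))             # 4
```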
“What’s cooking?: two projects at LC with Japanese partners”
Eiichi Itō (Library of Congress)
Mr Itō introduced two current initiatives at the Library of Congress:
- Digitization of the Pre-WWII Japanese censorship collection
- Digitization and transcription of a Tale of Genji manuscript
The Asian Division of the Library of Congress has been engaged in a number of collaborative digitization projects over the last ten years. In 2004 the International Research Center for Japanese Studies (Nichibunken) digitized 1,500 Kabuki prints, the first illustrated edition of the Tale of Genji (1654), and four Nara ehon manuscripts. Since 2010, LC has been working with the National Diet Library to digitize its Pre-WWII Japanese censorship collection, and with the National Institute for Japanese Language and Linguistics (NINJAL) on digitizing a 16th-century manuscript of the Tale of Genji.
Pre-WWII Japanese censorship collection:
The project involves 1,300 censored books, each with notes by officials of the Naimushō indicating the reason for the censorship. Although other copies of the books may exist in Japan, the censors’ annotations make LC’s copies unique. This, together with the fact that many are in poor condition, made them a priority for digitization. Negotiations with NDL began in 2009 and an agreement was signed the following year. Digitization is being carried out in 3 phases in March 2013, March 2014 and March 2015.
Tale of Genji Project with NINJAL:
In 2008 LC acquired a pre-1537 manuscript of the Tale of Genji. Researchers from NINJAL visited LC in 2010 and produced full transcriptions. LC has put digital images of three volumes in a page-turner viewer on its website, linked to the LC catalog. NINJAL has mounted digital images with a searchable transcription of the text; this is also linked to the LC catalog record.
Mr Itō concluded by explaining that the projects had presented challenges in terms of resources (funding, staff time, and expertise), copyright, and governing-law issues. He stressed the importance of sharing and preserving collections and the need to collaborate with other institutions to contribute to study and research.

Panel discussion
For the panel discussion Prof Nagasaki and Mr Itō were joined by Dr Ikki Ōmukai of the National Institute of Informatics (NII), one of the key architects of CiNii and an expert on the semantic web.
Mr Itō began by welcoming the launch by NII of the Japan Institutional Repositories Online (JAIRO) Cloud. He also said he believed that the JADH/NDL Hondeji (翻デジ) project was an excellent way to give scholars access to a large corpus of Japanese texts.
Prof Nagasaki explained that Hondeji was a voluntary system for scholars to contribute text. Data can be provided in TEI format. Michiko Ito asked why Hondeji was fully searchable whereas it was only possible to search OCR’d text in, for example, Asahi Kikuzo II using keywords. Prof Nagasaki explained that in Hondeji the text was first OCR’d and then retyped so as to be searchable.
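The point about TEI is that an encoded transcription stores the text itself alongside markup, so the full text can be extracted and searched by keyword, unlike a page image. A hypothetical minimal sketch (not Hondeji's actual schema; real TEI documents use the TEI namespace and a fuller header):

```python
import xml.etree.ElementTree as ET

# A minimal TEI-style fragment (hypothetical, for illustration only).
tei = """<TEI>
  <teiHeader><fileDesc><titleStmt>
    <title>Sample transcription</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>源氏物語は日本の古典文学である。</p>
  </body></text>
</TEI>"""

root = ET.fromstring(tei)
# Extract all transcribed text from the body, making it searchable.
body_text = "".join(root.find("text/body").itertext())
print("源氏物語" in body_text)  # True: a simple keyword search works
```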
In answer to a question from Mari Suzuki, Prof Nagasaki said that FineReader was the best tool to OCR Japanese text. Shirin Eshghi asked if Hondeji was crowd-sourced or if it was reviewed. Prof Nagasaki explained that it was a voluntary project and there was no quality assurance or proof-reading beyond what individual contributors did for their own texts before submitting them. Dr Ōmukai added that all data was recorded and stored so if there were problems, an earlier version of a text could be restored.
Vickey Bestor mentioned that Mr Itō's presentation showed that a number of browsers could be used to view the digitized Tale of Genji, but she had heard that many projects in Japan support only a limited number of viewers (e.g. Internet Explorer but not Chrome) and wondered if Japanese institutions could be encouraged to widen the range of browsers to improve accessibility. Mr Itō replied that there had been long discussions with NINJAL about this issue, and NINJAL was committed to ensuring that its software could be used with as many browsers as possible.
Setsuko Noguchi noted that the Daizōkyō database was very popular with faculty and students and asked Prof Nagasaki if he had any usage statistics. He replied that Japan and the US are the principal users and there are a total of 200,000 users per month.