Minna de Honkoku: Learning-driven Crowdsourced Transcription of 
Pre-modern Japanese Earthquake Records

Yuta Hashimoto (yhashimoto1984@gmail.com), National Museum of Japanese History, Japan and Yasuyuki Kano (kano@rcep.dpri.kyoto-u.ac.jp), Kyoto University, Japan and Ichiro Nakanishi (ichiro@kugi.kyoto-u.ac.jp), Kyoto University, Japan and Junzo Ohmura (ohmura1204@yahoo.co.jp), Bukkyo University, Japan and Yoko Odagi (odagi@kugi.kyoto-u.ac.jp), Kyoto University, Japan and Kentaro Hattori (hattori@kueps.kyoto-u.ac.jp), Kyoto University, Japan and Tama Amano (tama@npo-kikou.com), Kyoto University, Japan and Tomoyo Kuba (tomoyokuba@gmail.com), Kyoto University, Japan and Haruno Sakai (sakai.haruno.36r@gmail.com), Tokyo Metropolitan Library

1. Introduction

In the last decade, crowdsourcing has become a major technique for transcribing large volumes of historical manuscripts. The volunteers of Transcribe Bentham have transcribed more than 19,000 pages of manuscripts written by Jeremy Bentham (Causer and Wallace 2012). More than 480,000 pages of weather observations from 19th-century US Government Arctic logbooks were transcribed by 4,730 people through the Old Weather project (Eveleigh et al. 2013).

However, managing a crowdsourcing project remains a major challenge for humanities scholars. The following practical difficulties arise:

  1. The need to draw public attention to the project successfully.
  2. The need to encourage participants’ long-term involvement.
  3. The tasks requiring crowdsourcing in humanities studies (e.g. transcribing ancient handwritten manuscripts) are often difficult for untrained participants.

In the case of Japanese studies, the last difficulty is particularly crucial. Because of the drastic change in the writing system that occurred at the end of the 19th century, 99% of modern Japanese people are unable to read kuzushiji, the classical calligraphic renderings of Japanese characters that were common in both publishing and handwriting. As a result, crowdsourcing has never been successfully applied to pre-modern Japanese materials.

However, humanities scholars can use education to draw the attention of a large number of people, promote their long-term participation, and train them to tackle difficult tasks. The fundamental idea in this paper is to develop a crowdsourcing system embedded in a collaborative learning environment that enables learners to conduct crowdsourced tasks as a part of their learning with their peers.

Minna de Honkoku 3 (https://honkoku.org/) is a crowdsourced transcription project for pre-modern Japanese earthquake records, developed on the basis of this idea by the members of the Historical Earthquake Study Group (HESG) at Kyoto University. In this paper, we briefly describe the aim, materials, approach, and results of Minna de Honkoku.

2. The background and aim of the project

HESG is a joint group of seismologists and historians at Kyoto University, including the authors, who have been studying pre-modern earthquake records for seismic research and disaster prevention. Since instrumental observation of earthquakes in Japan began only after the end of the 19th century, transcribing written records is required for studying past earthquakes. Therefore, Japanese seismologists have developed an extensive collaboration with historians and archivists.

However, the number of records to be transcribed is vast and cannot be handled by a small group of scholars. This prompted the members of HESG to think of using crowdsourcing for transcribing historical earthquake records.

We set the first goal of our project, Minna de Honkoku, as transcribing all 114 books from the Ishimoto Collection, which is composed of historical earthquake records collected by the seismologist Mishio Ishimoto (1893-1940) and digitized by the Earthquake Research Institute (ERI), Tokyo University. The number of pages per book ranges from 14 to 268, and the 114 books contain 6,386 pages in total. Each digital image in the collection contains two pages, as shown in Figure 1.

Figure 1. An example of two digitized pages in a book from the Ishimoto Collection

The challenge and our approach

The biggest challenge of our project is to crowdsource the reading of kuzushiji, which is illegible to most modern Japanese people other than trained experts. Our approach to this challenge is to design our crowdsourcing system as an online learning environment where participants can learn kuzushiji by transcribing the earthquake records collaboratively.

More specifically, Minna de Honkoku integrates crowdsourcing with online learning in the following two ways:

  • Collaboration with a mobile learning app: Minna de Honkoku collaborates with KuLA (Kuzushiji Learning App), a mobile learning app for reading kuzushiji that was developed by one of the authors (Hashimoto 2017) and has been downloaded 85,000 times since its release in 2016 (see Figure 2). After completing a set of basic lessons for reading kuzushiji, KuLA users are invited to Minna de Honkoku as an opportunity to acquire more practical training by transcribing actual materials from pre-modern Japan. They can thus begin participating in the project as a continuation of their learning.
  • Collaborative learning through distributed proofreading: Transcribing kuzushiji correctly is quite difficult, and beginners usually make many mistakes. For quality control of transcriptions, Minna de Honkoku uses the “distributed proofreading” approach adopted by Project Gutenberg (Newby 2003), but with an educational purpose. When you finish transcribing an image from a book in the transcription editor of Minna de Honkoku (see Figure 3), your transcription is shared with and reviewed by other participants on a timeline that shows user activities in real time (see Figure 4). When another participant corrects your transcription, you receive a notification with the feedback, informing you of the mistakes you made and the corrections (see Figures 5 and 6).
Figure 2. Screenshots of KuLA
Figure 3. Transcription editor of Minna de Honkoku
Figure 4. The timeline view of user activities
Figure 5. The notification panel
Figure 6. Corrections made by another participant (added texts are colored in green and deleted texts in red)
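
The correction view in Figure 6 can be approximated by a character-level diff between a submitted transcription and its reviewed version. The following is a minimal sketch in Python using the standard difflib module; it is not the implementation used by Minna de Honkoku, and the function names, the bracket markup standing in for the red and green colouring, and the sample strings are all assumptions made for illustration.

```python
import difflib


def diff_corrections(original: str, corrected: str):
    """Return (op, text) pairs, where op is 'keep', 'delete', or 'add'."""
    # autojunk=False keeps every character significant, which matters for
    # long transcriptions with frequently repeated characters.
    matcher = difflib.SequenceMatcher(a=original, b=corrected, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", original[i1:i2]))
            continue
        if i1 < i2:  # text present only in the original (shown in red)
            ops.append(("delete", original[i1:i2]))
        if j1 < j2:  # text present only in the correction (shown in green)
            ops.append(("add", corrected[j1:j2]))
    return ops


def render(ops) -> str:
    """Render the diff as plain text: [-deleted-] and {+added+}."""
    parts = []
    for op, text in ops:
        if op == "delete":
            parts.append("[-" + text + "-]")
        elif op == "add":
            parts.append("{+" + text + "+}")
        else:
            parts.append(text)
    return "".join(parts)


# Hypothetical strings, not taken from the Ishimoto Collection.
submitted = "大地震ゆり候"
reviewed = "大地震ゆる候"
print(render(diff_corrections(submitted, reviewed)))
# prints: 大地震ゆ[-り-]{+る+}候
```

A character-level comparison suits this material because a typical beginner's error is a single misread character, so the feedback points at exactly the glyph that was corrected.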

3. The results

The website of Minna de Honkoku was launched on January 10, 2017. The transcription of the 114 books (6,386 pages) from the Ishimoto Collection was completed on May 31, 2017. Thus, our initial goal was achieved less than five months after the project launch. We then extended our goal by adding another 223 books held at ERI. As of November 2017, 271 of the 337 books (9,254 of 9,716 pages), including those from the Ishimoto Collection, have been transcribed by volunteers. A total of 3.12 million characters have been transcribed.

A total of 3,457 people have registered an account, and 285 of them have transcribed at least one character on the website. While we were unable to include all registered users in the transcription process, a small number of regular volunteers have eagerly contributed to the project: 35 users have transcribed more than 10,000 characters, and 6 of them more than 100,000.
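
The figures reported in this section can be put in proportion with some simple arithmetic. The sketch below uses only the counts and dates given above; the derived percentages and the pages-per-day rate are computed here for illustration and are not figures reported by the project, and the variable names are our own.

```python
from datetime import date


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as of November 2017, taken from the text above.
books_done, books_total = 271, 337
pages_done, pages_total = 9_254, 9_716
registered, transcribers = 3_457, 285

book_rate = pct(books_done, books_total)      # share of books transcribed
page_rate = pct(pages_done, pages_total)      # share of pages transcribed
active_rate = pct(transcribers, registered)   # registered users who transcribed

# Initial goal: 6,386 pages between launch and completion of the collection.
days = (date(2017, 5, 31) - date(2017, 1, 10)).days
pages_per_day = round(6_386 / days, 1)

print(book_rate, page_rate, active_rate)  # prints: 80.4 95.2 8.2
print(days, pages_per_day)                # prints: 141 45.3
```

Read this way, roughly one registered user in twelve transcribed any text, yet the Ishimoto Collection was still completed at an average of about 45 pages per day.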

4. The background and motivations of the participants

To understand the backgrounds and motivations of the participants, we administered an online questionnaire via Google Forms between March 8 and May 13, 2017. We obtained responses from 64 participants. The following is a brief summary of the questionnaire results:

  • 70% of respondents (45 people) are KuLA users.
  • We asked the respondents to choose the reasons for their participation from 12 pre-defined choices (up to three selections were allowed). The most frequently selected reasons are as follows:
    1. “Transcribing historical manuscripts is fun” (70%, 45 choices).
    2. “I can learn from other participants’ transcriptions and reviews” (50%, 32 choices).
    3. “I can contribute to seismic research and disaster prevention through the project” (44%, 28 choices).

The results above suggest the following: (1) KuLA works effectively as an “entrance” to Minna de Honkoku, and (2) the possibilities of collaborative learning greatly motivate the participants, although the most powerful motivation is the enjoyment gained from transcribing.

5. Conclusion

In this paper, we have described the background, aim, approach, and results of Minna de Honkoku, a crowdsourced transcription project for historical earthquake records of pre-modern Japan. It has often been said that crowdsourced transcription of pre-modern Japanese materials is not possible because reading kuzushiji is too difficult for untrained volunteers. However, our learning-centered approach appears to have achieved considerable success. The same approach may also be applicable in other countries where changes in writing systems have made historical manuscripts difficult to read.

Lastly, the desire to learn is one of the most fundamental characteristics of human beings, and fulfilling it is one of the important roles of a scholar as a teacher. We therefore believe that considering academic crowdsourcing in the context of education will bring beneficial outcomes.


Appendix A

Bibliography
  1. Causer, T., and Wallace, V. (2012). Building a volunteer community: results and findings from Transcribe Bentham. Digital Humanities Quarterly, 6(2).
  2. Eveleigh, A., et al. (2013). “I want to be a Captain! I want to be a Captain!”: Gamification in the Old Weather Citizen Science Project. Proceedings of the First International Conference on Gameful Design, Research, and Applications. ACM.
  3. Hashimoto, Y., et al. (2017). The Kuzushiji Project: Developing a Mobile Learning Application for Reading Early Modern Japanese Texts. Digital Humanities Quarterly, 11(1).
  4. Newby, G. B., and Franks, C. (2003). Distributed proofreading. In Proceedings of the 2003 Joint Conference on Digital Libraries (pp. 361-363). IEEE.

Notes
  3. The literal translation of Minna de Honkoku in English is “Transcribe with everyone.” A video tutorial of Minna de Honkoku in English is available at: https://www.youtube.com/watch?v=iX5xN4vZeao.