An Easy-to-use Data Analysis and Visualization Tool for Studying Chinese Buddhist Literature

Jen-Jou Hung (jenjou.hung@gmail.com), Dharma Drum Institute of Liberal Arts, Taiwan

Among efforts to digitize ancient Chinese texts, the digitization of Buddhist scriptures is regarded as relatively complete and fruitful. The Chinese Buddhist Electronic Text Association (CBETA) has made the Chinese electronic Tripiṭaka collection widely available for many years and provides a rich platform for the study of Chinese Buddhist texts. As of the 2016 version (CBETA 2016), more than 210 million Chinese characters are freely and publicly available in digital form through the efforts of CBETA.

The digital age has given us tools for surveying Buddhist texts at a larger scale than before, and text analysis techniques have proved useful in many studies of Buddhist literature (Hung 2010, Bingenheimer 2017). With this in mind, our team used these new tools to create a digital research environment tailored to the needs of research in Buddhist studies (and beyond). To this end, we established the CBETA Research Platform (http://cbeta-rp.dila.edu.tw/?lang=en). The platform provides high-quality digital content from the CBETA corpus, combined with relevant reference materials based on the latest findings. Additionally, we implemented tools for quantitative analysis, with the ultimate goal of creating a digital research platform that will assist scholars in their study of Chinese Buddhist texts or their underlying Indian origins.

1. CBETA Research Platform

The system architecture of the CBETA Research Platform is shown in Fig. 1. We have integrated the full text of the CBETA corpus with the Tripiṭaka catalogue, bibliographic databases, Buddhist dictionaries, and authority databases of persons and places to form the backend database of the CBETA Research Platform. We then created tools to assist researchers in reading, searching, and analyzing Buddhist literature.

Fig. 1. The system architecture of the CBETA Research Platform

2. Concordance Search and Analysis

Concordance Search and Analysis is the first quantitative analysis tool implemented in the CBETA Research Platform (see Note 1). It is a tool for gaining deeper insight into search results from the CBETA corpus. It allows users to aggregate search results along different dimensions (by text category, by date and dynasty, by authors and translators) and to compare the results of multiple search terms.

2.1. Start a New Analysis

Concordance Search and Analysis first requires users to enter the keywords they want to compare and to specify the search scope.

Fig. 2. The start page of the Concordance Search and Analysis system

2.2. Data

The system retrieves the complete search results for each keyword and stores them together in the system cache. On the data page, users can examine the complete list of matches and delete unwanted records from the result set.

Fig. 3. The data page of the Concordance Search and Analysis system
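
The following is a minimal sketch of how a per-keyword result set might be held and pruned, under stated assumptions. The `Match` record layout, the `ResultCache` class, and all sample values are hypothetical illustrations and do not describe the platform's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One concordance hit (hypothetical record layout)."""
    keyword: str
    text_id: str   # placeholder identifier, not a real CBETA text number
    fascicle: int
    line: str      # the line of text containing the keyword


class ResultCache:
    """Holds the matches for every keyword of one analysis session."""

    def __init__(self):
        self._results = {}

    def store(self, keyword, matches):
        """Keep the complete search result for one keyword."""
        self._results[keyword] = list(matches)

    def matches(self, keyword):
        """Return a copy of the current result set for one keyword."""
        return list(self._results.get(keyword, []))

    def delete(self, keyword, unwanted):
        """Drop records marked as unwanted; return how many were removed."""
        current = self._results.get(keyword, [])
        kept = [m for m in current if m not in unwanted]
        self._results[keyword] = kept
        return len(current) - len(kept)


# Placeholder data standing in for real search results.
first = Match("涅槃", "text-A", 1, "placeholder line containing 涅槃")
second = Match("涅槃", "text-B", 3, "another placeholder line with 涅槃")

cache = ResultCache()
cache.store("涅槃", [first, second])
removed = cache.delete("涅槃", {second})
```

Keeping the pruned list per keyword means that any later statistics are computed only from the records the user chose to keep.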

2.3. Analysis

The system allows users to aggregate search results along different dimensions (by text category, by date and dynasty, by authors and translators) and to compare the results of multiple search terms. Figs. 4, 5 and 6 show the analysis results for two synonyms, 泥洹 (ní huán) and 涅槃 (niè pán), along these three dimensions.

Fig. 4. Statistics of keywords in different text categories

Fig. 5. Statistics of keywords with different translators

Fig. 6. Statistics of keywords in different dynasties
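
The aggregation described above can be sketched as a simple grouped count. This is only an illustration under stated assumptions: the record layout, the `aggregate` function, and the dimension names are hypothetical, and the sample values are placeholders rather than counts from the CBETA corpus.

```python
from collections import Counter, defaultdict

# Hypothetical hit records: (keyword, category, dynasty, translator).
# All values are placeholders, not data from the CBETA corpus.
HITS = [
    ("泥洹", "category-1", "dynasty-A", "translator-X"),
    ("泥洹", "category-1", "dynasty-A", "translator-Y"),
    ("涅槃", "category-1", "dynasty-B", "translator-Z"),
    ("涅槃", "category-2", "dynasty-B", "translator-Z"),
    ("涅槃", "category-2", "dynasty-A", "translator-X"),
]

# Position of each dimension inside a hit record.
DIMENSIONS = {"category": 1, "dynasty": 2, "translator": 3}


def aggregate(hits, dimension):
    """Count hits per keyword within each value of the chosen dimension."""
    index = DIMENSIONS[dimension]
    table = defaultdict(Counter)
    for hit in hits:
        table[hit[index]][hit[0]] += 1
    return {value: dict(counts) for value, counts in table.items()}


by_dynasty = aggregate(HITS, "dynasty")
by_translator = aggregate(HITS, "translator")
```

Because every keyword is counted within the same grouping, the resulting tables can be placed side by side to compare how two synonyms are distributed.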

The system offers several settings for the statistical range. Users can thus observe the broader usage of a keyword from a large-scale view and, at the same time, trace a particular phenomenon back to the source text for identification and further research.

Fig. 7. Statistics of keywords in different texts from the Eastern Jin Dynasty (C.E. 317-420)

Fig. 8. Statistics of keywords in different fascicles of 長阿含經 (Dīrghāgama)

If we click the point representing fascicle 3 of 長阿含經 (Dīrghāgama) in Fig. 8, we see the sentences in the text that actually contain the keywords.

Fig. 9. Sentences that actually contain keywords in fascicle 3 of the Dīrghāgama
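
The drill-down from per-fascicle counts to the passages themselves might look like the sketch below. It assumes the fascicle text is available as a plain string; the function names, the `width` parameter, and the sample strings are hypothetical, and the strings are placeholders rather than quotations from the Dīrghāgama.

```python
def count_by_fascicle(fascicles, keywords):
    """Return {fascicle_no: {keyword: occurrences}} for the given texts."""
    return {
        number: {keyword: text.count(keyword) for keyword in keywords}
        for number, text in fascicles.items()
    }


def contexts(text, keyword, width=5):
    """Return each occurrence with up to `width` characters on either side."""
    found = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        found.append(text[max(0, start - width):end + width])
        start = text.find(keyword, end)
    return found


# Placeholder strings standing in for fascicle text; not real quotations.
FASCICLES = {
    1: "甲乙丙丁",
    3: "甲乙泥洹丙丁戊己涅槃庚辛泥洹",
}

counts = count_by_fascicle(FASCICLES, ["泥洹", "涅槃"])
snippets = contexts(FASCICLES[3], "泥洹", width=2)
```

The counts correspond to the overview chart, while the snippets correspond to the passages shown once a single fascicle is selected.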

The system also provides a "prefix and suffix analysis" feature, which allows users to quickly view statistics on the characters immediately before and after a keyword.

Fig. 10. Prefix and suffix analysis of keywords
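
A prefix and suffix count of this kind can be sketched as follows, assuming the analysis looks only at the single character on each side of an occurrence. The function name and the sample string are hypothetical; the string is a made-up illustration, not a passage from the corpus.

```python
from collections import Counter


def prefix_suffix_counts(text, keyword):
    """Count the character immediately before and after each occurrence."""
    prefixes, suffixes = Counter(), Counter()
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        if start > 0:                 # no prefix at the start of the text
            prefixes[text[start - 1]] += 1
        if end < len(text):           # no suffix at the end of the text
            suffixes[text[end]] += 1
        start = text.find(keyword, start + 1)
    return prefixes, suffixes


# Made-up sample string for illustration only.
SAMPLE = "入涅槃。般涅槃時，入涅槃"

prefixes, suffixes = prefix_suffix_counts(SAMPLE, "涅槃")
```

This sketch treats punctuation as an ordinary neighbouring character; whether the platform skips punctuation is not stated in the text, so a filter would be a further assumption.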

In addition, the spatial analysis function uses a GIS system to display the locations of the texts containing the keywords, which allows users to compare the use of keywords geographically.

Fig. 11. Spatial analysis of keywords


Appendix A

Bibliography
  1. Bingenheimer, M., Hung, J., and Hsieh, C. (2017) Stylometric Analysis of Chinese Buddhist texts – Do different Chinese translations of the Gaṇḍavyūha reflect stylistic features that are typical for their age? Journal of the Japanese Association for Digital Humanities, 2(1): 1-30.
  2. CBETA. (2016) CBETA Chinese Electronic Tripiṭaka Collection. Available at: http://www.cbeta.org/cbreader/help/index_e.htm (Accessed: 11 July 2017).
  3. Hung, J., Bingenheimer, M., and Wiles, S. (2010) Quantitative Evidence for a Hypothesis regarding the Attribution of early Buddhist Translations. Literary and Linguistic Computing, 25(1): 119-134.
Notes
  1. In addition to Concordance Search and Analysis, the CBETA Research Platform provides a user-friendly reading interface (the CBETA Online Reader, http://CBETAOnline.dila.edu.tw) for accessing texts and reference materials from the backend database.