Bridging the Gap: Digital Humanities and the Arabic-Islamic Corpus

Dafne Erica van Kuppevelt (Netherlands eScience Center), E.G. Patrick Bos (Netherlands eScience Center), A. Melle Lyklema (Utrecht University), Umar Ryad (University of Leuven), Christian R. Lange (Utrecht University) and Janneke van der Zwaan (Netherlands eScience Center)

1. Introduction

Despite some pioneering efforts in recent times, the longue durée analysis of conceptual history in the Islamic world remains largely unexplored. Researchers of Islamic intellectual history still tend to study a canon of texts made available by earlier Western scholars of the Islamic world, who selected them largely for their relevance to Western theories, concepts and ideas. Indigenous conceptual developments and innovations are therefore insufficiently understood, particularly the transition from premodern to modern thought in Islam.

What, then, are the silenced continuities, transformations and major fault lines in Arabic-Islamic discourses? The Islamic tradition offers a vast textual corpus for exploring this question from a longue durée perspective, but its very breadth poses substantial problems for the individual scholar seeking to survey the literature by traditional methods. In the last decade, vast collections of digitized classical Arabic texts have become available online (Muhanna 2016, pp. 11-64). This marks the “beginning of what could become a methodological revolution in the fields of Arabic and Islamic Studies”, as noted by Peralta and Verkinderen in the very first edited volume on Digital Humanities and the Arabic-Islamic corpus (Muhanna 2016, p. 199).

This paper presents ongoing research that uses state-of-the-art Digital Humanities approaches and technologies to make pioneering forays into the vast corpus of digitized Arabic. It does so through three case studies, each of which examines a separate genre of Arabic and Islamic literary history.

2. Case studies

(1) Islamic law: This case study analyzes the corpus of digitally available (Sunni) legal works (furu’ al-fiqh) from premodern to modern times (ca. 150 digitized works with ca. 75 million words, extracted from the OpenITI corpus [1]) to investigate longue durée shifts in the concepts and idioms employed in Muslim juridical discourse. The scholarly questions pursued relate to the history of the senses and of sense perception in the Islamic world and, more broadly, to the history of the human body. The Digital Humanities methods applied to this corpus will include topic modelling (focused on the five senses) and computer-supported statistical analysis in historical perspective, that is, comparing legal teachings across the fourteen centuries of Islamic law.
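The statistical comparison in historical perspective can be illustrated with a deliberately simple sketch. It counts how often words associated with each sense occur per 10,000 tokens, grouped by the Hijri century of the author's death. The sense word lists, the `(death_year_ah, tokens)` input format and the function names are hypothetical; the texts are assumed to be stemmed already, and this is plain frequency counting rather than the topic modelling the case study plans to use.

```python
from collections import Counter, defaultdict

# Hypothetical, illustrative sense vocabulary (stemmed forms).
# A real study would rely on a curated, lemmatized word list.
SENSE_TERMS = {
    "sight": {"بصر", "نظر", "رؤية"},
    "hearing": {"سمع"},
    "smell": {"شم", "رائحة"},
    "taste": {"ذوق", "طعم"},
    "touch": {"لمس", "مس"},
}


def century_of(death_year_ah: int) -> int:
    """Map a death year (AH) to its Hijri century, e.g. 179 -> 2."""
    return (death_year_ah - 1) // 100 + 1


def sense_frequencies(works):
    """works: iterable of (death_year_ah, list_of_stemmed_tokens).

    Returns {century: {sense: occurrences per 10,000 tokens}}.
    """
    hits = defaultdict(Counter)  # century -> Counter of sense hits
    totals = Counter()           # century -> total number of tokens
    for year, tokens in works:
        century = century_of(year)
        totals[century] += len(tokens)
        for tok in tokens:
            for sense, terms in SENSE_TERMS.items():
                if tok in terms:
                    hits[century][sense] += 1
    # Skip centuries without any tokens to avoid dividing by zero.
    return {
        c: {s: 10_000 * hits[c][s] / totals[c] for s in SENSE_TERMS}
        for c in sorted(totals)
        if totals[c]
    }
```

Normalizing per 10,000 tokens keeps centuries with very different amounts of surviving text comparable; raw counts would mostly reflect corpus size.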

(2) Modern Islamic proselytizing literature: This case study analyzes a largely neglected corpus of Arabic texts on Islamic missionary activity (da’wa) written between the 19th and 21st centuries (approx. 500 titles). The analysis will focus on identifying continuities and changes in the key concept of da’wa and in the discursive idioms used to express it, and on identifying, graphing and visualizing the transnational networks involved in the discourses on da’wa.
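One common way to graph such networks is a co-occurrence network: people, places and organizations become nodes, and two nodes are linked when they are mentioned in the same text. The minimal sketch below assumes that entity extraction has already produced a set of entities per text; the input format, the placeholder entity names and the function names are hypothetical, and the project's actual network method may differ.

```python
from collections import defaultdict
from itertools import combinations


def build_network(texts):
    """texts: mapping title -> set of entities mentioned in that text.

    Two entities are linked when they co-occur in a text; the edge
    weight counts how many texts they share.
    """
    weights = defaultdict(int)
    for entities in texts.values():
        # Sorting gives each undirected edge one canonical (a, b) key.
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree(weights):
    """Number of distinct neighbours for each entity."""
    deg = defaultdict(int)
    for a, b in weights:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)
```

Counting shared texts as edge weights distinguishes recurring connections from one-off mentions.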

(3) Arabic poetry: This case study will investigate the digital corpus of Arabic poetry (an estimated 2.5 billion words, extracted from the OpenITI corpus). Poetry is an especially apt corpus for studying the history of the senses and of sense perception in the Islamic world. Which senses did Arabic poets favor over the centuries? What semantic fields does Arabic poetry construct around, for example, the sense of vision, and how do these contrast with legal constructions of vision?
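One simple way to approach the semantic field around a sense is a collocation count: collect the words that appear within a fixed window of sense-related terms. The sketch assumes tokenized, normalized text; the window size, the decision to exclude the target terms themselves and the function name are illustrative choices, not the project's method.

```python
from collections import Counter


def collocates(tokens, targets, window=4):
    """Count words occurring within `window` tokens of any target word.

    Target words themselves are not counted as collocates.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in targets:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in targets:
                    counts[tokens[j]] += 1
    return counts
```

For example, `collocates(tokens, {"بصر", "نظر"})` would gather the neighbours of two vision-related words; computing the same counts on the poetry and legal corpora is one way to compare how each genre frames vision.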

3. Method

Most Digital Humanities research projects have focused on Western Europe and the Americas, leaving a gap between state-of-the-art Digital Humanities tools and the Arabic text corpus. Many current initiatives in Arabic Digital Humanities seek to teach programming languages to humanities scholars. We pursue a different strategy to move Arabic Digital Humanities forward: developing a freely accessible, user-friendly interface to Digital Humanities technology, based on existing software.

The development of the technology is at an early stage, and we aim to present a first version of an Arabic-specific Digital Humanities toolkit at the conference. The toolkit integrates existing tools for stemming and morphological analysis of Arabic, such as the Khoja stemmer (Khoja, Garside and Knowles, 2001), the Tashaphyne stemmer [2] and the AlKhalil morphological analyzer (Boudchiche et al., 2017). We will use the SAFAR software (Jaafar and Bouzoubaa, 2015) to compare these libraries and integrate the most relevant tools into a pipeline for humanities research. The resulting tagged datasets will be made available in an existing search engine, such as BlackLab [3]. All software developed for this paper is published as open source [4].
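To give a flavour of the preprocessing such a pipeline performs, the sketch below normalizes Arabic spelling, applies a toy light stemmer that strips a few common prefixes and suffixes, and measures how often two stemmers agree on a word list. The affix lists, the minimum stem length and all function names are illustrative assumptions; this is neither the Khoja nor the Tashaphyne algorithm, and it does not reproduce SAFAR's comparison framework.

```python
import re

# Arabic short-vowel and related marks (U+064B-U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")

# Illustrative affix lists, longest first so that e.g. "وال" wins over "ال".
PREFIXES = ("وال", "بال", "فال", "كال", "لل", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def normalize(word: str) -> str:
    """Strip diacritics and tatweel; unify alef variants to bare alef."""
    word = DIACRITICS.sub("", word)
    return ALEF_VARIANTS.sub("\u0627", word)


def light_stem(word: str, min_len: int = 3) -> str:
    """Toy light stemmer: remove at most one prefix and one suffix,
    never leaving fewer than `min_len` letters."""
    w = normalize(word)
    for p in PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= min_len:
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= min_len:
            w = w[: -len(s)]
            break
    return w


def agreement(stem_a, stem_b, words) -> float:
    """Share of words for which two stemmers return the same stem."""
    if not words:
        return 0.0
    same = sum(stem_a(w) == stem_b(w) for w in words)
    return same / len(words)
```

Light stemming removes affixes without recovering the root; root extraction (as in the Khoja stemmer) and full morphological analysis (as in AlKhalil) go further.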

We will present the development of the Arabic-specific Digital Humanities toolkit, including the challenges that arise in developing text mining tools specific to Arabic, together with proposed solutions. We will also present early findings from the three case studies.

Bibliography

  1. Boudchiche, M., Mazroui, A., Ould Abdallahi Ould Bebah, M., Lakhouaja, A. and Boudlal, A. (2017) ‘AlKhalil Morpho Sys 2: A robust Arabic morpho-syntactic analyzer’, Journal of King Saud University - Computer and Information Sciences. Elsevier, 29(2), pp. 141–146. doi: 10.1016/J.JKSUCI.2016.05.002.
  2. Jaafar, Y. and Bouzoubaa, K. (2015) ‘Arabic Natural Language Processing from Software Engineering to Complex Pipeline’, in 2015 First International Conference on Arabic Computational Linguistics (ACLing). IEEE, pp. 29–36. doi: 10.1109/ACLing.2015.11.
  3. Khoja, S., Garside, R. and Knowles, G. (2001) ‘An Arabic tagset for the morphosyntactic tagging of Arabic’, in Corpus Linguistics. Lancaster University. Available at: (Accessed: 24 April 2018).
  4. Muhanna, E. (ed.) (2016) The Digital Humanities and Islamic & Middle East Studies. Berlin, Boston: De Gruyter. doi: 10.1515/9783110376517.

Notes

[1] Romanov, M., OpenITI.
[2] Zerrouki, T., Tashaphyne: Arabic light stemmer.