Revitalizing Wikipedia/DBpedia Open Data by Gamification -SPARQL and API Experiment for Edutainment in Digital Humanities

Go Sugimoto (go.sugimoto@oeaw.ac.at), Austrian Academy of Sciences, Austria

1. Introduction

The Linked Open Data (LOD) community is growing in Digital Humanities (DH). Important datasets are being published in RDF, and SPARQL endpoints have been progressively created in many cultural heritage organizations (Edelstein et al., 2013). However, the use of those datasets in real research is still not prevalent. Although there are several DH projects (de Boer et al., 2016), SPARQL query exploitation is often limited to small technology-savvy communities (Lincoln, 2017). The situation is better for less complicated Application Programming Interfaces (APIs) that return XML or JSON. However, Sugimoto (2017b) points to the need for API standardization and easier data reuse for ordinary users. In a broader context, the underuse of data, tools, and infrastructures seems to be a common phenomenon in DH. For example, the use of the Virtual Language Observatory in CLARIN is rather low (Sugimoto, 2017a). In the case of SPARQL endpoints, there could be several reasons for the limited use:

• Lack of awareness of existence

• Lack of skills to use SPARQL

• Open data is too narrow in scope

• Insufficient computing performance for practical use

• Interdisciplinary research is not widely exercised

It is a pity that the benefits of Open Data are only partially realized although the data is available. To address this, the author has experimented with Wikipedia/DBpedia to explore the potential use and revitalization of Open Data inside and outside the research community.

2. Revitalization of Wikipedia/DBpedia by gamification

The choice of Wikipedia/DBpedia is motivated by the above-mentioned issues. The broad scope of their datasets would solve the problem of DH datasets being too specific to be used by third-party researchers, or of researchers not knowing how to use the data and/or what to do with them (Edmond and Garnett, 2014; Orgel et al., 2015). In addition, interdisciplinary research could be more easily adopted, using a more comprehensive yet relatively detailed level of knowledge.

The keyword of this project's approach is gamification. To showcase a social benefit of Open Data and DH, gamification could act as a catalyst to connect scholars with increasingly demanding public consumers. Kelly and Bowan (2014) stated that limited attention has been paid to digital games until recently, although this is changing rapidly (see Hacker, 2015). Although there are a few projects such as Cross Cult, which uses elaborate semantic technologies (Daif et al., 2017), this article contributes to this discourse from a web innovation perspective in a simplified DIY project environment.

The game developed for the project is quite simple. It is a quiz that requires users to guess the age of a randomly selected person (born between 1700 and 2002) by looking at a portrait of that person (Figure 1). Naturally, the age of a person in a particular image is provided neither by Wikipedia nor by DBpedia. It is, in fact, calculated programmatically by comparing the birthdate with the date of the image. Random selection of data is sometimes costly in terms of processing, but it is the key to developing a game application. The application is intended for fun and thus includes all types of contemporary persons such as politicians, athletes, musicians, actors, and businesspersons. In addition, the inclusion of historical figures is very important in DH in that the user would learn history.
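The selection and age calculation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the query shape, the random-OFFSET strategy, the `max_offset` value, and the function names `build_person_query` and `age_on` are all assumptions. Only the terms `dbo:Person`, `dbo:birthDate`, and `dbo:thumbnail` come from the public DBpedia ontology. The image date is assumed to be obtained separately from the image metadata, since DBpedia does not supply it.

```python
import random
from datetime import date

# Hypothetical query template: one person with a birth date and a thumbnail,
# born between 1700 and 2002. A random OFFSET is one assumed way to pick a
# "random" row; the paper does not state how the selection is implemented.
QUERY_TEMPLATE = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?birth ?image WHERE {{
  ?person a dbo:Person ;
          dbo:birthDate ?birth ;
          dbo:thumbnail ?image .
  FILTER (?birth >= "1700-01-01"^^xsd:date && ?birth <= "2002-12-31"^^xsd:date)
}}
LIMIT 1 OFFSET {offset}
"""


def build_person_query(max_offset=10000, rng=random):
    """Return a SPARQL query string with a randomly chosen OFFSET."""
    offset = rng.randrange(max_offset)
    return QUERY_TEMPLATE.format(offset=offset)


def age_on(birth, image_date):
    """Return the person's age in full years on the day the image was made."""
    if image_date < birth:
        raise ValueError("image date precedes birth date")
    years = image_date.year - birth.year
    # Subtract one year if the birthday had not yet occurred in the image year.
    if (image_date.month, image_date.day) < (birth.month, birth.day):
        years -= 1
    return years
```

A random offset avoids sorting the whole result set, which is one plausible way to limit the processing cost mentioned above, although it does require knowing roughly how many rows match.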

Figure 1 Quiz to guess the age of a person found in a Wikipedia article

When the user cannot guess the age, there is a help function. A hint section uses a face detection API of IBM Watson, which suggests the estimated age and gender of the person in the image by machine learning. Finally, the game is extended into another quiz to guess the nationality of a person. Indeed, any interesting data in Wikipedia/DBpedia can be used for gamification, and the method is easily adoptable.

3. Potential for Citizen Science

Reflecting criticism of Linked Data quality, Daif et al. (2017) reckon that human supervision is needed to manage the data. In our case, the application is sometimes unable to calculate the age of a person for several reasons related to metadata quality. For instance, data may be non-numeric (“16th century”) (Figure 2), malformed (not ISO compliant: “05/11/88”), confusing (the creation date of the digital image is used instead of that of the analogue image), inaccurate, wrong, or missing, resulting in an error message. This is normally regarded as an optimization problem of the code. However, it is possible to take advantage of this error: when it occurs, it signals a data quality problem. Users are therefore persuaded to follow the provided links to Wikipedia/DBpedia, where they can double-check the original data (Figure 3). This scenario creates a dual possibility. In other words, the application can be used as:

• A curation tool for Wikipedia/DBpedia for existing active editors of Wikipedia

• A tool to transform normal users into new curators of Wikipedia

Although this scenario has not been realized because of the project setting, the impact on data curation could be considerable if users were able to correct data. Not only would Wikipedia benefit from corrected and/or added data, but DBpedia would also be improved, leading to higher-quality datasets in this LOD magnet and affecting hundreds of applications worldwide. In this way, the application opens up the potential to crowdsource the curation of Wikipedia/DBpedia. The success of crowd data curation has been proven in DH (see Brinkerink (2010) and NYPL Labs).
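The idea of turning a date error into a curation prompt can be sketched as follows. This is an assumed illustration rather than the application's code: the function names `classify_date` and `curation_message`, the status labels, and the `article_url` parameter are hypothetical. Only problems visible in the string itself (missing, non-numeric, or non-ISO values) are detected; dates that are confusing, inaccurate, or wrong but well-formed still need human review.

```python
import re
from datetime import date

# Strict ISO 8601 calendar date, e.g. "1988-11-05".
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Values made only of digits and common separators, e.g. "05/11/88".
DIGITS_AND_SEPARATORS = re.compile(r"[\d\s/.\-]+")


def classify_date(raw):
    """Return (status, parsed_date) for a raw metadata date string."""
    if raw is None or not raw.strip():
        return ("missing", None)
    text = raw.strip()
    if ISO_DATE.fullmatch(text):
        try:
            return ("ok", date.fromisoformat(text))
        except ValueError:
            # Right shape but impossible date, e.g. "1988-02-30".
            return ("malformed", None)
    if DIGITS_AND_SEPARATORS.fullmatch(text):
        return ("malformed", None)
    # Anything containing words, e.g. "16th century".
    return ("non_numeric", None)


def curation_message(status, article_url):
    """Return a prompt inviting the user to check the source, or None if ok."""
    if status == "ok":
        return None
    return (
        f"The date could not be used ({status}). "
        f"Please check the original data: {article_url}"
    )
```

For example, `classify_date("16th century")` yields the status `non_numeric`, which `curation_message` turns into a prompt carrying the link to the article.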

Figure 2 Wikimedia metadata displaying “16th century”

Figure 3 The game persuades users to improve Wikipedia

4. Conclusion

In conclusion, this article presents an experimental case study that mixes gamification (entertainment) with data-driven research (education) and the possibility of data curation (crowdsourcing), showcasing cutting-edge technologies such as SPARQL and a Deep Learning API with the help of Open Data in the framework of DH. It also shows the potential for a new digital research ecosystem between humanities research and digital technologies, connecting various stakeholders including humanities researchers and the public.


Appendix A

Bibliography
  1. Boer, V. de, Penuela, A. M. and Ockeloen, C. J. (2016). Linked Data for Digital History: Lessons Learned from Three Case Studies. Anejos de La Revista de Historiografía, (4): pp. 139–62.
  2. Brinkerink, M. (2010). Waisda? Video Labeling Game: Evaluation Report. Images for the Future – Research Blog http://research.imagesforthefuture.org/index.php/waisda-video-labeling-game-evaluation-report/index.html (accessed 12 April 2018).
  3. Daif, A., Dahroug, A., López-Nores, M., Gil-Solla, A., Ramos-Cabrer, M., Pazos-Arias, J. J. and Blanco-Fernández, Y. (2017). Developing Quiz Games Linked to Networks of Semantic Connections Among Cultural Venues. Metadata and Semantic Research. (Communications in Computer and Information Science). Springer, Cham, pp. 239–46 doi:10.1007/978-3-319-70863-8_23.
  4. Edelstein, J., Galla, L., Li-Madeo, C., Marden, J., Rhonemus, A. and Whysel, N. (2013). Linked Open Data for Cultural Heritage: Evolution of an Information Technology. http://www.whysel.com/papers/LIS670-Linked-Open-Data-for-Cultural-Heritage.pdf (accessed 24 April 2018).
  5. Edmond, J. and Garnett, V. (2014). Building an API is not enough! Investigating Reuse of Cultural Heritage Data. LSE Impact Blog http://blogs.lse.ac.uk/impactofsocialsciences/2014/09/08/investigating-reuse-of-cultural-heritage-data-europeana/ (accessed 27 February 2018).
  6. Hacker, P. (2015). The Games Art Historians Play: Online Game-based Learning in Art History and Museum Contexts. The Chronicle of Higher Education Blogs: ProfHacker https://www.chronicle.com/blogs/profhacker/the-games-art-historians-play-online-game-based-learning-in-art-history-and-museum-contexts/61263 (accessed 12 April 2018).
  7. Kelly, L. and Bowan, A. (2014). Gamifying the museum: Educational games for learning | MWA2014: Museums and the Web Asia 2014 https://mwa2014.museumsandtheweb.com/paper/gamifying-the-museum-educational-games-for-learning/ (accessed 12 April 2018).
  8. Lincoln, M. (2017). Using SPARQL to access Linked Open Data. Programming Historian https://programminghistorian.org/lessons/graph-databases-and-SPARQL (accessed 12 April 2018).
  9. NYPL Labs. What's on the menu? http://menus.nypl.org/about (accessed 12 April 2018).
  10. Orgel, T., Höffernig, M., Bailer, W. and Russegger, S. (2015). A metadata model and mapping approach for facilitating access to heterogeneous cultural heritage assets. International Journal on Digital Libraries, 15(2–4): pp. 189–207 doi:10.1007/s00799-015-0138-2.
  11. Sugimoto, G. (2017a). Number game -Experience of a European research infrastructure (CLARIN) for the analysis of web traffic. CLARIN Annual Conference 2016. Aix-en-Provence, France: CLARIN ERIC and Laboratoire Parole et Langage and Laboratoire des Sciences de l’Information et des Systèmes (LSIS) and Aix-Marseille Université and Centre National de la Recherche Scientifique (CNRS) https://hal.archives-ouvertes.fr/hal-01539048 (accessed 17 November 2017).
  12. Sugimoto, G. (2017b). Battle Without FAIR and Easy Data in Digital Humanities. Metadata and Semantic Research. (Communications in Computer and Information Science). Springer, Cham, pp. 315–26 doi:10.1007/978-3-319-70863-8_30.