More Than “Nice to Have”: TEI-to-Linked Data Conversion

Constance Crompton (constance.crompton@uottawa.ca), University of Ottawa, Canada and Michelle Schwartz (michelle.schwartz@ryerson.ca), Ryerson University, Canada

For developers of TEI-based projects, linked data is often much-desired but nonessential: an added output that would be nice to have, but that is not critical to the ultimate success of the project. The recent growth of interest in linked open data in the context of the TEI (including the revitalization of ADHO’s LOD SIG and the TEI’s Ontologies SIG) is, however, a promising sign of our field’s engagement with linked data and of our readiness to join international efforts to produce and publish it (Huber et al.; Pattuelli et al.; Lehmann et al.; Shadbolt et al.; Hellmann et al.). Currently, linked data makes up only 1% of the web, and much of that 1% is used for commercial rather than scholarly purposes (Simpson and Brown). Converting existing digital humanities data into linked data offers humanities scholars an opportunity to intervene in the semantic web as it is being built. It allows the power of the semantic web to be harnessed for more than commercial ends, and offers rich, readily accessible information about the central research topic of the liberal arts: the human record. The underlying assumption of the semantic web is the same as that of humanities research: we can never assume ourselves to be in a full state of knowledge, since new information may always come to light. Creating and exposing linked data from the many existing authoritative TEI projects could enable scholars to embrace linked cultural data at scale. But what is the path to success? Our poster reflects on the technical and institutional challenges of linked data creation and proposes a workflow and toolset for creating linked data from TEI.

Despite calls in the digital humanities for compatibility between TEI and linked data (Simpson and Brown; Ciotti and Tomasi), scholars have yet to develop best practices for creating linked data from richly encoded TEI resources. For many projects, producing linked data is an ancillary goal: gratifying to achieve, but secondary to the encoding itself, or necessary only to facilitate aggregation. We propose the development of XSLT-backed tools to convert and connect otherwise incommensurable data sets. The tools will require human checks, since mapping each project’s particular usage of hierarchical TEI elements onto existing ontologies (including CIDOC-CRM, FOAF, SKOS, schema.org, dcterms, and others) is rarely one-to-one. Furthermore, the historical primary source material that the TEI lets encoders represent so diligently requires significant contextualization, since the conditions of its production were often underpinned by historical worldviews that may today be read as racist, sexist, ableist, or homophobic. Without machine- and human-readable contextualization, historic intents, biases, and worldviews may be reified by the inferencing that linked data permits. The ideal outcome would instead be an understanding of those worldviews, without valourization. We are testing our tools and workflow against data sets that present exactly these challenges: four sample TEI-based data sets of Atlantic cultural production, covering manuscripts, books, periodicals, biographies, art works, legislation, places, and events, and together representing 45,000 entities. The data spans four hundred years, two regions (Europe and the Americas), five religions, and three languages, each with its own historical and contextual specificity. The upcoming phases of our work will involve testing the tools against more diverse TEI data sets.
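The idea of a crosswalk that still requires human checks can be illustrated briefly. The following is a minimal Python sketch, not the XSLT-backed tooling proposed here: it assumes a hypothetical mapping of three children of a TEI person element onto FOAF and schema.org properties, a placeholder base IRI (http://example.org/person/), and the hypothetical function names convert and literal. It emits N-Triples for elements it can map and queues everything else for a human decision rather than guessing.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical crosswalk (an assumption for illustration only):
# local name of a TEI child of <person> -> ontology property IRI.
CROSSWALK = {
    "persName": "http://xmlns.com/foaf/0.1/name",
    "birth": "http://schema.org/birthDate",
    "death": "http://schema.org/deathDate",
}


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def convert(tei_xml, base="http://example.org/person/"):
    """Return (triples, review_queue) for every TEI <person> in the document."""
    root = ET.fromstring(tei_xml)
    triples, review = [], []
    for person in root.iter(TEI_NS + "person"):
        pid = person.get(XML_ID)
        if pid is None:
            # Without a stable identifier we cannot mint a subject IRI
            review.append(("person", "missing xml:id"))
            continue
        subject = f"<{base}{pid}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{FOAF_PERSON}> .")
        for child in person:
            local = child.tag.replace(TEI_NS, "")
            prop = CROSSWALK.get(local)
            if prop is None:
                # No one-to-one mapping: queue for a human decision
                review.append((pid, local))
                continue
            # Prefer a normalized @when value over free-text content
            value = child.get("when") or "".join(child.itertext()).strip()
            if value:
                triples.append(f"{subject} <{prop}> {literal(value)} .")
    return triples, review


SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1">
    <persName>Jane Doe</persName>
    <birth when="1850-03-01">1 March 1850</birth>
    <occupation>printer</occupation>
  </person>
</listPerson>"""

triples, review = convert(SAMPLE)
print("\n".join(triples))
print("needs review:", review)
```

On the invented sample record, the sketch yields three triples (a type assertion, a name, and a birth date) and one review item, ("p1", "occupation"), because the assumed crosswalk deliberately has no mapping for occupation. Dates are emitted as untyped literals here; a fuller converter would likely type them (for example as xsd:date or xsd:gYear) and attach the source context needed to avoid reifying historical worldviews.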
We are especially interested in the poster format, as we are keen to solicit feedback from peers on the balance between granularity and generality in the representation of people, places, time, and cultural production as linked data.


Appendix A

Bibliography
  1. Ciotti, F., Tomasi, F. (2016). Formal Ontologies, Linked Data, and TEI Semantics. Journal of the Text Encoding Initiative. https://doi.org/10.4000/jtei.1480
  2. Hellmann, S., et al. (2014). Knowledge Base Creation, Enrichment and Repair, in: Auer, S., Bryl, V., Tramp, S. (Eds.), Linked Open Data — Creating Knowledge Out of Interlinked Data, Lecture Notes in Computer Science. Springer International Publishing, pp. 45–69.
  3. Huber, J., Sztyler, T., Noessner, J., Murdock, J., Allen, C., Niepert, M. (2014). LODE: Linking Digital Humanities Content to the Web of Data. IEEE/ACM Joint Conference on Digital Libraries. http://arxiv.org/abs/1406.0216
  4. Pattuelli, M.C., Miller, M., Lange, L., Fitzell, S., Li-Madeo, C. (2013). Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project. The Code4Lib Journal.
  5. Shadbolt, N., O’Hara, K., Berners-Lee, T., Gibbins, N., Glaser, H., Hall, W., Schraefel, M.C. (2012). Linked Open Government Data: Lessons from Data.gov.uk. IEEE Intelligent Systems 27, 16–24. https://doi.org/10.1109/MIS.2012.23
  6. Simpson, J., Brown, S. (2014). Inference and Linking of the Humanist’s Semantic Web, in: Implementing New Knowledge Environments. Presented at the Building Partnerships to Transform Scholarly Publishing, Whistler, BC.