A Comprehensive Image-Based Digital Edition Using CEX: A fragment of the Gospel of Matthew

Janey Capers Newland (janeycapers.newland@furman.edu), Furman University, United States of America and Emmett Baumgarten (emmett.baumgarten@furman.edu), Furman University, United States of America and De'sean Markley (desean.markley@furman.edu), Furman University, United States of America and Jeffrey Rein (jeffrey.rein@furman.edu), Furman University, United States of America and Brienna Dipietro (brienna.dipietro@furman.edu), Furman University, United States of America and Anna Sylvester (anna.sylvester@furman.edu), Furman University, United States of America and Brandon Elmy (brandon.elmy@furman.edu), Furman University, United States of America and Summey Hedden (summey.hedden@furman.edu), Furman University, United States of America

This poster (with accompanying downloadable dataset and application) will demonstrate as a proof-of-concept a comprehensive image-based publication and analysis of a text bearing artifact, [catalog number redacted for anonymous review], a hitherto unpublished 10th Century palimpsest fragment of the Gospel According to Matthew. The fragment contains most of the “Parable of the Sower”.

In editing this text, we sought to be as comprehensive as possible, capturing:

  • Natural light and UV images, both overview images and details
  • A diplomatic transcription of the overwritten text of Matthew and any legible characters from the undertext
  • A word tokenization of the diplomatic transcription, mapping to each token:
    • a normalization
    • editorial status
    • lexical status
    • morphology, part of speech, and syntactic relations
    • alignment to the image data
  • A character tokenization, aligned to the image data
  • An edition of the whole Gospel according to Matthew from a critical edition, for comparison and context
  • Translations aligned to the text
  • Editors’ comments

In publishing it, we sought simplicity, longevity, and clarity. While we use TEI XML as a format for capturing an initial transcription, the overlapping analytical categories, many-to-many alignments of text and image, and open ended possibilities for commentary precluded implementing a coherent data model fully in XML. At the same time, we wanted a concise and integrated publication.

By using the CEX format 1 , a plain text, self-documenting format based on the CITE/CTS architecture, we are able to bring together these many levels of analysis in a form that is at once disaggregated, with each scholarly primitive explicitly and unambiguously citable, while still united in a single file. CEX allows us to distribute a fully integrated dataset in the form of a single plain text file and a single directory of images.

Our publication, a CEX file and a directory of images, is technology-agnostic readable by humans, but also able to serve as the data for an end-user application. We will describe, and have available for download and on USB thumbdrives, a lightweight, zero-configuration single page web application (SPA), fully usable offline, that integrates the data and images for this publication for end-users. 2

Finally, we will outline the low cost, low technology, collaborative work behind the digitization and editing of this manuscript fragment: off-the-shelf cameras, simple handheld UV lighting, readily available FOSS software.

We believe that this work will be of interest to the international Digital Humanities research community both as a new publication of a Byzantine Greek text and as a demonstration of a replicable and sustainable combination of technology and workflow. We think this approach provides a compelling alternative to XML or RDF editions and complex database-driven end user applications, offering advantages both on the back end (a flexible, scalable, and self-documenting format for implementing diverse data models), and on the front end (lightweight and portable presentation for readers). At the same time the data we present as CEX and images is easily transferrable to other standard formats. 3

All project data is under version control in a public GitHub repository, and licensed under a CC-BY license. All source code is under either a GPL or MIT public license.

Notes
1.

CEX (CITE Exchange Format) is a plain text format for capturing data about texts and collections, based on the CITE/CTS architecture and developed by C. Blackwell ( Homer Multitext), T. Köntges (University of Leipzig), and N. Smith ( Homer Multitext). For implementations and projects using CEX, see: T. Köentges, (Meletē)ToPān (topic modelling environment): https://thomask81.github.io/ToPan/; C. Blackwell, N. Smith, CEX Library (Scala): https://github.com/cite-architecture/cex; C. Blackwell, N. Smith, CEX Dataset Repository: https://github.com/cite-architecture/citedx

2.

This application is based on the ScalaJS implementation of “CITE App” by C. Blackwell and N. Smith: https://github.com/cite-architecture/CITE-App

3.

Existing code libraries for working with CEX include a microservice framework that delivers textual and other data from CEX files as JSON objects, via HTTP requests (see https://github.com/cite-architecture/scs-akka ) and libraries that export CEX data into other formats, such as 2-column tabular data or 82XF (see https://github.com/cite-architecture/scm ).