Research Environment for Ancient Documents (READ)

Andrew Glass (asg@uw.edu), Microsoft Corp., University of Washington and Stephen White (stephenawhite57@gmail.com), Stephen White - Italy and Ian McCrabb (ian@prakas.org), University of Sydney, Prakas

The Research Environment for Ancient Documents (READ) is an integrated open-source web platform for epigraphical and manuscript research. It may be configured as the underlying engine for a text repository or as a complementary research toolset for an existing repository. The defining innovation of this software is the atomization of text into orthographic subunits (as opposed to lines or words). This enables mapping across all layers of textual analysis, from the factual (the location of a character on a surface) through the contestable (the transcription of a character) to the purely interpretive (a semantic annotation). This data architecture enables:

  • The integration of physical, textual, and interpretive aspects of research
  • The transformation of conventional editing practice into optimized workflows
  • Granular attribution of components of a text, which allows for alternative interpretations and flexible collaboration
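
As a rough illustration of this layering, the sketch below models the three layers as linked records. It is not READ's actual schema: the class names (Segment, Syllable, Token, Annotation), their fields, the helper regions_for, and the sample values are all hypothetical and chosen only to show how an interpretive note can be traced back to regions on an image.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels (assumed convention)


@dataclass
class Segment:
    """Factual layer: where a written unit sits on an image of the surface."""
    image: str
    box: Box


@dataclass
class Syllable:
    """Contestable layer: one reading of an orthographic subunit."""
    text: str
    attributed_to: str
    segment: Optional[Segment] = None


@dataclass
class Token:
    """A word assembled from orthographic subunits."""
    syllables: List[Syllable]

    @property
    def text(self) -> str:
        return "".join(s.text for s in self.syllables)


@dataclass
class Annotation:
    """Interpretive layer: a note attached to a token, with its own attribution."""
    target: Token
    note: str
    attributed_to: str


def regions_for(token: Token) -> List[Box]:
    """Follow a token down to the image regions of its linked subunits."""
    return [s.segment.box for s in token.syllables if s.segment is not None]


# Placeholder data, not taken from any actual edition.
ka = Syllable("ka", "editor A", Segment("plate1.jpg", (10, 20, 30, 40)))
ga = Syllable("ga", "editor A", Segment("plate1.jpg", (45, 20, 30, 40)))
word = Token([ka, ga])
note = Annotation(word, "possible personal name", "editor B")

print(word.text, regions_for(note.target))
```

Because each record carries its own attribution, a second editor could supply a different reading of one subunit without touching the segment below it or the annotations above it.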

This poster outlines the workflows and outputs supported by version 1.1 (2017) of READ. The first release is optimized for use with Indic languages using akṣara-based writing systems (abugidas). We will demonstrate the platform using documents in the Gāndhārī language. We will also demonstrate that READ can be generalized to support other languages and writing systems, e.g., Aramaic, Chinese, English, Italian, and Mayan.

The core workflows of READ are:

  1. Creating a new item for study and inputting a text transcription
    A researcher creates a new item in READ, adds basic metadata, and enters a transcription of the item as free text. Once the transcription is entered, the researcher can immediately access two types of reports: a wordlist generated from the text, and alternate presentations of the text edition (diplomatic, reconstructed, and hybrid). These reports are available via READ’s web interface, as well as in the following downloadable export formats: HTML, RTF, and TEI (EpiDoc).
  2. Uploading images of the source text and linking to the text transcription
    READ provides tools to mark segment boundaries around the graphical units of the writing system depicted in the images. These segments are then automatically linked to the transcription entered in step 1. At this point the researcher can view the edition side by side with the image using synchronized scrolling provided by READ’s web interface. In addition, the researcher can access a paleographic report generated from the image segments using the linked transcription. The TEI (EpiDoc) export includes the image as a facsimile element. All image segments can be exported as distinct files for paleographic processing using external tools.
  3. Creating a text glossary by adding lexicographical data to the generated wordlist
    The researcher uses tools provided by READ to add lexicographical data to the wordlist that was generated in step 1. At this point, a glossary can be generated and exported (HTML, RTF) or viewed in READ’s web interface. Also, the edition viewer in READ’s web interface integrates glossary data in flyouts associated with each word in the edition.
  4. Completing the glossary
    The researcher views the glossary created in step 3 in READ’s web interface and adds compound analysis to any compounds occurring in the text. At this point, glossary generation includes cross-reference entries for compound members.
  5. Annotating the edition
    The researcher uses annotation tools provided by READ to add footnotes and tags to the edition. The researcher can add text-structural information as well as textual parallels, translations, and alternate transliteration forms. These annotations can be viewed in the web interface and as footnotes in exported RTF and HTML output.
  6. Cubing the edition
    A researcher can integrate alternate editions of the same text using tools provided by READ. Any alternate editions so integrated will be linked to the same image added in step 2. Alternate editions can be viewed side by side in READ’s web interface to support comparison.
  7. Sharing the research
    READ has been designed as a collaborative tool from the outset. Researchers can choose to share visibility and editing rights to any of the elements in their work. Work can also be published in mutable and immutable forms via the READ viewer interface, as well as in full-text editions in TEI, HTML, and RTF, which can be opened in common word processing and desktop publishing applications.
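
To make the wordlist report of step 1 more concrete, the following minimal sketch derives a wordlist with occurrence references from a free-text transcription. It assumes words are separated by whitespace and ignores editorial sigla; the function name build_wordlist and the sample lines are hypothetical, and READ's own tokenization is not described here and may differ considerably.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_wordlist(lines: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each distinct word to its (line, position) occurrences, both 1-based."""
    occurrences: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # Assumption: whitespace marks word boundaries in the transcription.
        for position, word in enumerate(line.split(), start=1):
            occurrences[word].append((line_no, position))
    # Return a plain dict with the words in sorted order.
    return dict(sorted(occurrences.items()))


# Placeholder transcription, not taken from any actual edition.
transcription = ["kaga ba", "ba kaga ba"]
wordlist = build_wordlist(transcription)

for word, places in wordlist.items():
    print(word, places)
```

Keeping the occurrence references alongside each word is what would allow a later step, such as adding lexicographical data in step 3, to point from a glossary entry back to every place the word occurs in the edition.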

The READ project began in 2013 and has been funded by Ludwig-Maximilians-Universität, Munich, Germany; the University of Washington, Seattle, USA; Université de Lausanne, Switzerland; the University of Sydney, Australia; and the Prakaś Foundation, Sydney, Australia.