Getting to Grips with Semantic and Geo-annotation using Recogito 2

Leif Isaksen (l.isaksen@exeter.ac.uk), University of Exeter, United Kingdom and Gimena del Río Riande (gdelrio.riande@gmail.com), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina and Romina De León (rdeleon@conicet.gov.ar), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina and Nidia Hernández (nidiahernandez@conicet.gov.ar), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

This workshop introduces Recogito 2, a tool developed by Pelagios Commons that enables the annotation of geographic place references in texts, images and data through a user-friendly online platform. Perhaps its most notable feature is that it lets users produce semantic data without working directly with formal languages, while still allowing them to export their annotations as valid RDF, XML and GeoJSON.

The availability of born-digital data and digitised collections is changing the way we study and understand the humanities. This wealth of information has even greater research potential when semantic links can be established and relationships between entities made explicit. The work of Pelagios Commons has shown that connecting historical data through their common references to places (expressed as URIs stored in gazetteers) is a particularly powerful approach: information about material culture, archaeological excavations, ancient texts and related scholarship can be connected and cross-referenced through the geodata.
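The linking principle described above can be illustrated with a minimal sketch, assuming each record simply carries a gazetteer URI for the place it refers to. The datasets, record ids and the `link_by_place` helper are hypothetical; only the two Pleiades URIs (Roma and Athenai) are real identifiers, used here for illustration.

```python
from collections import defaultdict

# Real Pleiades URIs, used for illustration; the records below are hypothetical.
ROMA = "https://pleiades.stoa.org/places/423025"
ATHENAI = "https://pleiades.stoa.org/places/579885"

texts = [
    {"id": "text-1", "place": ROMA},
    {"id": "text-2", "place": ATHENAI},
]
finds = [
    {"id": "coin-7", "place": ROMA},
    {"id": "inscription-3", "place": ROMA},
]

def link_by_place(*datasets):
    """Group record ids from any number of datasets under their shared place URI."""
    index = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            index[record["place"]].append(record["id"])
    return dict(index)

links = link_by_place(texts, finds)
print(links[ROMA])  # ['text-1', 'coin-7', 'inscription-3']
```

Because the URI rather than the spelling of the name acts as the join key, records that call the same place "Roma", "Rome" or "Rom" still end up linked.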

Producing semantic annotations usually requires some knowledge of digital technologies such as RDF, ontologies and/or text encoding. These techniques can act as a barrier for users who are not already familiar with Semantic Web theory. The Recogito annotation tool aims to facilitate the creation and publication of Linked Open Data by substantially reducing some commonly encountered obstacles. Since Recogito was first developed in 2014, the community-oriented philosophy behind Pelagios Commons has made users active agents in shaping its functionality and interface. A dedicated forum on the Pelagios Commons website gathers feedback and suggestions, while Recogito's code is openly available on GitHub, where its more technical aspects are discussed. After a year of intensive redevelopment from the ground up, Recogito 2 was launched in December 2016 and now has almost 1,500 registered users. Introductory documentation is available in English, Spanish, German and Italian, and the interface itself was translated into multiple languages in February 2018.
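To show the kind of formal structure that such a tool hides from users, here is a sketch of a single place annotation expressed in the W3C Web Annotation Data Model (JSON-LD serialisation). This is not claimed to be Recogito's actual export format; the annotation and document URIs and the `make_place_annotation` helper are hypothetical, and only the Pleiades URI is real.

```python
import json

def make_place_annotation(annotation_id, document_uri, exact, prefix, suffix, place_uri):
    """Link a quoted toponym in a text to a gazetteer URI, following the
    W3C Web Annotation Data Model (serialised as JSON-LD)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "identifying",
        # The body is the gazetteer entry that identifies the place.
        "body": place_uri,
        # The target locates the toponym in the text by quoting it in context.
        "target": {
            "source": document_uri,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }

anno = make_place_annotation(
    "https://example.org/annotations/1",            # hypothetical
    "https://example.org/texts/sample-itinerary",   # hypothetical
    "Roma", "from ", " to",
    "https://pleiades.stoa.org/places/423025",
)
print(json.dumps(anno, indent=2))
```

Writing even this one annotation by hand requires knowing the vocabulary, the selector types and the JSON-LD context, which is precisely the barrier a point-and-click interface removes.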

Recogito now supports additional standards for both images (such as IIIF) and texts (TEI export). This allows researchers to use the annotation tool as either a starting point or an intermediate step in a workflow for producing semantic annotations that can then be built upon with other technologies. While the initial release already enabled collaboration among users, Recogito 2 offers a more refined set of options for managing the degree of collaboration, from private annotations that only their creator can access, to collaborative and public ones that anyone can see and download. These options encourage collaboration but leave users free to choose the degree of openness that best suits their materials at different stages of research.

Originally conceived for data related to the ancient world, Recogito 2 has become a valuable tool for annotating many other kinds of historical and modern sources, especially (but not only) those containing geographical information. Recogito 2 facilitates the annotation of any named entity. Where applicable, place references can be resolved against a number of aligned digital gazetteers, including Pleiades for the ancient world and GeoNames for the modern world. Although geographical information is its principal focus, Recogito 2 also allows references to people and events to be annotated (currently without semantic resolution), and users can add tags and comments to disambiguate annotations and refine later searches. Two colour-coding options make it easy to distinguish the different kinds of annotation (places, people or events) or the different statuses of geographic annotations.
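Resolution against several aligned gazetteers can be sketched as an ordered lookup, assuming a simple "first match wins" rule with the ancient gazetteer consulted first. The in-memory dictionaries, the `resolve` function and the status labels are hypothetical stand-ins; a real tool queries the gazetteers themselves and lets the user confirm or correct each match. The Pleiades and GeoNames URIs shown are real.

```python
# Hypothetical stand-ins for gazetteer indexes (real URIs, toy coverage).
PLEIADES = {"roma": "https://pleiades.stoa.org/places/423025"}
GEONAMES = {"rome": "https://sws.geonames.org/3169070/"}

# Consulted in order: ancient gazetteer first, then modern.
GAZETTEERS = [("Pleiades", PLEIADES), ("GeoNames", GEONAMES)]

def resolve(name, entity_type="place", gazetteers=GAZETTEERS):
    """Return the first gazetteer match for a place name.

    People and events are annotated but never resolved, mirroring the text
    above. The "resolved"/"unresolved" labels are illustrative only."""
    gazetteer, uri = None, None
    if entity_type == "place":
        key = name.strip().lower()
        gazetteer, uri = next(
            ((g, index[key]) for g, index in gazetteers if key in index),
            (None, None),
        )
    return {
        "name": name,
        "type": entity_type,
        "gazetteer": gazetteer,
        "uri": uri,
        "status": "resolved" if uri else "unresolved",
    }

print(resolve("Roma"))                        # matched in Pleiades
print(resolve("Rome"))                        # matched in GeoNames
print(resolve("Caesar", entity_type="person"))  # kept, but unresolved
```

A status field of this kind is what a colour-coding scheme for geographic annotations could key on.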

This workshop walks participants through all stages of using Recogito 2 to annotate different types of source documents: from uploading a file to the online platform, through annotation, to downloading the annotations in the available data formats. More specifically, the workshop will show practical examples of:

Annotation of sources in text format
Attendees will learn how to benefit from Recogito's automatic recognition of named entities and how to refine its results manually. They will create annotations from scratch, and check or modify those identified by Recogito. The geo-annotations produced on the text can then be plotted on a digital map through a user-friendly visualisation mode, in which each place is displayed in proportion to the number of annotations it has received. A pop-up window links each place to all of its annotations in the same document, and users can browse each annotation in a short, essential context or view it in the full text.
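The proportional display described above can be sketched as follows. The exact scaling rule Recogito uses is not stated here; this sketch assumes that a marker's area (not its radius) grows with the annotation count, a common cartographic choice that avoids visually exaggerating frequent places. The `marker_radii` function and the `max_radius` parameter are hypothetical.

```python
import math
from collections import Counter

def marker_radii(place_uris, max_radius=20.0):
    """Map each place URI to a marker radius whose area scales with its
    annotation count (the radius grows with the square root of the count)."""
    counts = Counter(place_uris)
    if not counts:
        return {}
    top = max(counts.values())
    return {uri: max_radius * math.sqrt(n / top) for uri, n in counts.items()}

ROMA = "https://pleiades.stoa.org/places/423025"
ATHENAI = "https://pleiades.stoa.org/places/579885"

# One entry per annotation: Roma annotated four times, Athenai once.
radii = marker_radii([ROMA, ROMA, ROMA, ROMA, ATHENAI])
print(radii[ROMA], radii[ATHENAI])  # 20.0 10.0
```

With this rule a place annotated four times gets a marker with four times the area of a place annotated once, but only twice the radius.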

Annotation of images and tables
After beginning with text files, attendees will move on to the semantic annotation of images. Maps are especially well suited to geo-annotation, but Recogito 2 can also be used to annotate other types of image, such as photographs or even textual sources in the form of digitised manuscripts. Users will upload images to the Recogito platform and select, transcribe, annotate and georesolve toponyms within them. Workshop attendees will also see how Recogito can import tabular (CSV) data, such as that derived from spreadsheets, databases or gazetteers, and then annotate or align it.
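Before or after importing a table, it can help to check which rows are already aligned to a gazetteer. The sketch below assumes a hypothetical CSV layout with a place-name column and a `uri` column that is empty until the row has been aligned. This layout and the `split_alignment` helper are illustrative assumptions, not Recogito's required import schema.

```python
import csv
import io

# Hypothetical CSV layout: "uri" stays empty until a row is aligned.
SAMPLE = """name,uri
Roma,https://pleiades.stoa.org/places/423025
Athenai,
"""

def split_alignment(csv_text, uri_column="uri"):
    """Separate rows that already carry a gazetteer URI from those that don't."""
    aligned, pending = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "or ''" guards against short rows, where DictReader fills in None.
        if (row.get(uri_column) or "").strip():
            aligned.append(row)
        else:
            pending.append(row)
    return aligned, pending

aligned, pending = split_alignment(SAMPLE)
print([row["name"] for row in pending])  # ['Athenai']
```

The pending rows are the ones that still need manual alignment.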

Exporting data from Recogito
Finally, participants will learn how to export data from Recogito in a variety of formats suitable for visualisation and analysis in other tools, such as spreadsheets, databases or GIS.
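As an example of reusing an export elsewhere, the sketch below flattens a GeoJSON FeatureCollection (RFC 7946) into CSV for a spreadsheet or GIS. It relies only on standard GeoJSON structure. The property names `title` and `annotations` and the sample features are assumptions for illustration, not Recogito's documented export schema, so check a real export before adapting it.

```python
import csv
import io
import json

# Illustrative FeatureCollection; property names are assumptions.
EXPORT = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [12.4839, 41.8947]},
         "properties": {"title": "Roma", "annotations": 4}},
        {"type": "Feature", "geometry": None,
         "properties": {"title": "Atlantis", "annotations": 1}},
    ],
})

def points_to_csv(geojson_text):
    """Write one CSV row per Point feature; unlocated features are skipped."""
    collection = json.loads(geojson_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title", "annotations", "lon", "lat"])
    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if not geometry or geometry.get("type") != "Point":
            continue
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        props = feature.get("properties") or {}
        writer.writerow([props.get("title", ""), props.get("annotations", ""), lon, lat])
    return out.getvalue()

print(points_to_csv(EXPORT))
```

Keeping longitude and latitude in separate, clearly named columns avoids the axis-order confusion that often arises when GeoJSON coordinates are pasted into tools expecting latitude first.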

To get the most out of the workshop, participants are invited to bring their own data and documents to annotate. Recogito currently offers the best support for ancient and modern sources (in most languages). Materials from other periods can also be annotated, but the level and quality of georesolution may vary. The workshop will provide sample texts, imagery and data for attendees without their own datasets, show examples of annotations of different kinds of sources, and discuss their specific challenges. Throughout the workshop there will be opportunities for participants to discuss how Recogito 2 might support their own research.

Visualising and contextualising geographical information within documents can be an important step towards a deeper understanding of their content, potentially highlighting phenomena that would otherwise be difficult to identify. It is also an effective way of engaging students with historical texts and collections. Recogito 2 is designed to make the production of semantic annotations easy and intuitive, opening the door of the Linked Open Data ecosystem to a wide range of users, including those without prior experience of semantic technologies.