Distant Viewing with Deep Learning: An Introduction to Analyzing Large Corpora of Images
This tutorial provides a hands-on introduction to the use of deep learning techniques in the study of large image corpora. The TensorFlow and Keras libraries within the Python programming language are used to facilitate this analysis. No prior programming experience is required.
Image analysis tasks covered in the tutorial include object detection, facial recognition, image similarity, and image clustering. We will make three open-access image corpora (historic photographs, still frames from moving images, and scanned works of art) available in order to test these methods. Alternatively, participants may bring and use an image dataset of interest to them. At the conclusion of the tutorial, participants will have created an interactive website running locally on their machines. This website will provide tools for analyzing their selected dataset. Additional instructions for making the website publicly available will be provided.
Audience and Number of Participants
This tutorial is aimed at scholars who work with visual materials who want to integrate DH methods into their analysis of image corpora. Our tutorial is based off of lectures notes used in a non-major, undergraduate-level course at the University of Richmond. It is accessible to participants with little to no programming background. However, as the tutorial will focus on the methods behind image processing rather than low-level coding, it will also be interesting and useful for experienced programmers new to image processing.
Following the large number of participants at the AVinDH SIG sponsored Workshop in Montreal for DH20167 and our popular tutorial at DH2016 in Krakow, we expect the workshop participation to be equally popular with somewhere between 15 and 25 participants.
Taylor Arnold is Assistant Professor of Statistics at the University of Richmond. A recipient of grants from the NEH and ACLS, Arnold’s research focuses on computational statistics, text analysis, image processing, and applications within the humanities. His first book
Humanities Data in R (Springer, 2015) explores four core analytical areas applicable to data analysis in the humanities: networks, text, geospatial data, and images. His second book, the forthcoming
Lauren Tilton is Assistant Professor of Digital Humanities in the Department of Rhetoric and Communications at the University of Richmond and a member of Richmond’s Digital Scholarship Lab. Her current book project focuses on participatory media in the 1960s and 1970s. She is the Co-PI of the project
Participatory Media, which interactively engages with and presents participatory community media from the 1960s and 1970s. She is also a director of
Photogrammar, a web-based platform for organizing, searching and visualizing the 170,000 photographs from 1935 to 1945 created by the United States Farm Security Administration and Office of War Information (FSA-OWI). She is the co-author of
Humanities Data in R (Springer, 2015). She is co-chair of the American Studies Association’s Digital Humanities Caucus.
In this three hour tutorial we plan to spend the first 15 minutes getting all participants set up with the software and datasets required for the tutorial. The tutorial participants will we able to work on any reasonably recent version of Windows, macOS, or Linux. All of the software is free and open source. The remained of the workshop will consist of two 75-minutes sessions with a 15 minute break between them.
Each of the two 75-minute sessions will consist of working collectively through “labs” formatted as IPython notebooks. Participants will have the option of using one of three pre-compiled datasets during the workshop depending on their interests:
- historic photographs
- still frames from moving images
- scanned works of art
Alternatively, tutorial participants may alternatively work with their own collection of images.
The first session will focus on describing the potential difficulties of working with image data and explaining how deep learning can be used to address several of these challenges. Working at a conceptual level we will work through the following tasks:
- how to structure a large collection of images as files on a computer
- how to load images into Python as multidimensional arrays
- the concepts behind applying neural networks to image data
- code for projecting images into the penultimate layer of the YOLOv4 neural network
- methods for visualizing the output projects from the neural networks
The second session will focus on how the features detect in the first session can be used to annotate higher level features and measure the similarity between images. Specifically:
- the application of image projections to image similarity metrics
- the application of image projections to object detection
- the application of image projections to face detection
In the final 30 minutes, we will discuss how these techniques ultimately can be used to address humanities questions. This will culminate in running Python code that will output the constructed annotations as an interactive website running locally on each user’s computer. This will open up further possibilities for extending the methods of the tutorial without the need for an extensive programming background.
The tutorial will require an overhead projector and internet access for all participants. We will forward information for downloading the required software ahead of time, however, to reduce the load on the internet on the day of the tutorial. Similarly, we will bring USB drives with the image corpora in case the internet connection speed makes manual downloading large files prohibitively slow.
- Arnold, T. and Tilton, C. (2015).
Humanities Data in R. New York, NY: Springer.
- Arnold, T., Kane, M., and Lewis, B. (2017).
A Computational Approach to Statistical Learning. New York, NY: CRC Press.