Bridging Digital Humanities Internal and Open Source Software Projects through Reusable Building Blocks

Rebecca Sutton Koeser (rebecca.s.koeser@princeton.edu), Center for Digital Humanities @ Princeton University, United States of America and Benjamin W Hicks (bhicks@princeton.edu), Center for Digital Humanities @ Princeton University, United States of America

Software development is often an integral aspect of Digital Humanities projects.  By working to generalize and build small modules or utilities targeting specific needs rather than large-scale systems, DH software developers have the capacity to generate tools with greater potential for scholarly reuse, which should enable more rapid development on future projects, and allow developers to focus on innovative work. This poster demonstrates a case study of modular software developed as part of ongoing DH projects.

There is a tendency among some institutions, particularly libraries, to adopt existing large-scale Open Source Software solutions and adapt them for local needs; but as Hector Correa points out, this approach results in skipping the work of thinking carefully about users and local needs (Correa, 2017).  If large-scale software solutions developed by coalitions of libraries are problematic (Princeton University Library Systems, 2017) where needs are at least similar, even where content structures or workflows differ, this problem is redoubled for research software, which is much more likely bespoke to a particular problem.  As Correa argues, single-purpose software is less complex and easier to understand and manage; and understanding the logic of code is crucial for research that is based on or otherwise makes use of software (Koeser, 2015).

Applying best practices from software development such as modular design can mitigate these problems through an emphasis on delivering working components of software and focusing on simplicity of purpose—a single, well-honed and balanced knife rather than a multi-tool with every imaginable attachment.  This approach is consistent with the design philosophy from one of the greatest success stories of modern open-source software, UNIX and its derivatives (Raymond, 2003).

There are certainly possible drawbacks and concerns about this approach.  It may require more effort, and perhaps different skills, to create, release, and manage independent software packages or modules.  According to Glass’ Facts and Fallacies of Software Engineering, it is “three times as difficult to build reusable components as single use components” (Glass, 2003: 49). In our case, when new software modules were being developed and extended in tandem with an existing software project, finalizing a new release of that project involved releasing and publishing multiple software modules.  There is also a danger of generalizing too soon; another familiar rule of thumb in software is that you have to do something three times before you know how to generalize it properly (Glass, 2003).

As a case study, our poster will present an overview of the software written for two annotation projects that were developed at the same time. “Derrida’s Margins” analyzes the work of Jacques Derrida through references in De la grammatologie and corresponding annotations in the books he cited. “The Winthrop Family on the Page” examines a community of readers connected through books over time via annotations.  This software ecosystem includes two project codebases (Koeser et al., 2018; Koeser and Hicks, 2018a) that make use of four new reusable components (Koeser and Hicks, 2018b; Koeser, 2018b), two of which (Koeser, 2018a; Koeser and Hicks, 2018c) were adapted from the “Readux” codebase (Koeser et al., 2017), which was previous developed at Emory University. In the process, we also used and made minor updates to a related, pre-existing module (Koeser, 2018c).

For each of these tools, a use case emerged in one project which could be generalized for other projects, with potential for broader reuse. As an example, “viapy”—a Python module for searching and providing VIAF data to a web framework—was adapted from previous work, and first existed as code for one of the annotation projects, but it proved generalizable.  In fact, it proved easier to extract as a reusable component rather than duplicate; one project team discovered a bug that had previously gone undetected, and creating a reusable package allowed us to correct the problem once for both projects. Likewise, code for storing and displaying annotations from the Readux project was ripe for repackaging as a general module because of its relatively direct purpose despite the different intellectual aims of these projects. However, these codebases also contain similar, potentially reusable functionality that is not yet ready for generalization.

These projects provide a view into the ongoing process of balancing customized solutions to DH projects with generalizing focused portions of functionality. Modular design aimed at ‘doing one thing and doing it well’ offers the possibility of creating an ecosystem of reusable packages that are widely useful and applicable, and can participate in a larger community of open source and other DH software research.


Appendix A

Bibliography
  1. Correa, H. (2017). Build your own software Hector Correa http://hectorcorrea.com/blog/build-your-own-software/70 (accessed 28 November 2017).
  2. Glass, R. L. (2003). Facts and Fallacies of Software Engineering. Addison-Wesley Professional.
  3. Koeser, R. S. (2015). Trusting Others to ‘Do the Math’. Interdisciplinary Science Reviews, 40(4): 376–92 doi:10.1080/03080188.2016.1165454. http://dx.doi.org/10.1080/03080188.2016.1165454 (accessed 29 June 2016).
  4. Koeser, R. S. (2018a). Django-Annotator-Store: Django Application to Act as an Annotator.js 2.x Annotator-Store Backend. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/django-annotator-store.
  5. Koeser, R. S. (2018b). Viapy: VIAF via Python. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/viapy.
  6. Koeser, R. S., Glover, K., Li, Y., Varner, J. and Thomas, A. (2017). Readux: Django Web Application to Display, Annotate, and Export Digitized Books in a Fedora Commons Repository. JavaScript Emory Center for Digital Scholarship https://github.com/ecds/readux.
  7. Koeser, R. S. and Hicks, B. W. (2018a). Winthrop-Django: Django Web Application for the Winthrop Family on the Page Project. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/winthrop-django.
  8. Koeser, R. S. and Hicks, B. W. (2018b). Django-Pucas: Django App to Streamline CAS Auth and Populate User Attributes from LDAP. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/django-pucas.
  9. Koeser, R. S. and Hicks, B. W. (2018c). Djiffy: Django Application to Index and Display IIIF Manifests for Books. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/djiffy.
  10. Koeser, R. S., Hicks, B. W., Glover, K. and Budak, N. (2018). Derrida-Django: Django Web Application for Derrida’s Margins. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/derrida-django.
  11. Koeser, R. S. (2018). Piffle: Python Library for Generating and Parsing IIIF Image API URLs. Python Center for Digital Humanities at Princeton https://github.com/Princeton-CDH/piffle.
  12. Princeton University Library Systems (2017). Valkyrie Princeton University Library Systems by Pulibrary https://pulibrary.github.io/2017-07-06-valkyrie (accessed 28 November 2017).
  13. Raymond, E. S. (2003). Art of Unix Programming, The. Addison-Wesley Professional http://proquest.safaribooksonline.com/book/operating-systems-and-server-administration/unix/0131429019.