Mid-Range Reading: Manifesto Edition

Grant Wythoff (grant.wythoff@gmail.com), Pennsylvania State University, United States of America; Alison Booth (ab6j@virginia.edu), University of Virginia, United States of America; Sarah Allison (sallison@loyno.edu), Loyola University New Orleans, United States of America; and Daniel Shore (Daniel.Shore@georgetown.edu), Georgetown University, United States of America

1. Overview

This panel intervenes in debates about interpretative methods that are often lumped under “reading,” and often measured by metaphors of scale, from close to distant. Mining data in vast corpora promises to transform literary history, and all scholars in the humanities rely upon online materials and tools. Yet many humanists stand aloof from DH because of its presumed hyperbolic claims, its apparent blurring of the detailed artifact (the domain of humanities), and to some, its collusion, post-critique, with neo-liberal globalization. Four panelists, collaborating for the first time, have encountered provocative concepts in each other’s work that moderate such stark oppositions between the humanist and the computational. The panelists’ previous studies have demonstrated the “payoff” or mutual instruction of DH and other recognized standards of scholarship. At the same time, in meticulous capture of language, style, form, and cultural production, the panelists highlight the limits that some champions of algorithms might want to leap in a single bound. Technological approaches to literary studies require highly curated corpora and modulation, often excision, of noisy results. Each paper addresses the loss inherent in categories and models, and the gain in tracing discarded, fuzzy, or inaccessible data. While our fields span centuries of Anglophone culture, our work advocates diversity, women’s history, and the DH community’s values of open access and collaborative technological innovation.

Our papers address disruptions as well as continuities in observational scale as the tools and materials shift. Each panelist speaks from experience with a different dataset and her or his innovative approach to interpretation, touching on both language and technology. The first two speakers propose forms of mid-range reading to describe imaginative and interpretive leaps that scholars make between individual documents/texts and broader social forces; the second two address the reductions and abstractions that are necessary to the research project, themes common to all papers. As an archeologist of technologies, Wythoff rediscovers the concept of the gadget as an instance of human-inanimate interaction mirrored in DH. Booth expands on her response in PMLA to Franco Moretti’s Distant Reading, highlighting typologies as well as specific textual features in biographical nonfiction that enforce communal narratives. Allison, co-author on Stanford Lit Lab pamphlets associated with distant reading, proposes reductive reading, or explicit acknowledgment of necessary simplification, even of such ambitious problems as the nature of fictionality, which has been differently framed in studies by Piper, Underwood, and Eliot. While concepts of scale pervade claims for methods, Shore offers the approaches of construction grammar and corpus linguistics for particular insights into abstractions and categorizations. Shore, like Allison, calls on us to acknowledge the motivated reductions that are necessary to the research process. Our talks reflect on the history of technology and biographical representation, the forms of fiction and nonfiction, and the preconditions of selection and labeling of data—enduring issues in the humanities that become more telling with the expanding digital capacity to “read” at large and at speed.

Grant Wythoff, Tacit computing and method in the humanities

Humanistic research has always involved imaginative and interpretive leaps from the person to "the social," from the text to "the historical." Think for instance of the Annales school and its emphasis on the history of collective mentalities, or how Foucault described "discourse" by reverse-engineering historical ways of constituting knowledge. Today, however, with the availability of big data, many of these forms of humanistic interpretation have become second nature. The search for broad cultural formations is implicit in the earliest steps we take in a research project, from keyword searches to frequency analyses. To what degree are certain kinds of historical argumentation baked into these mundane, day-to-day research activities, and what other kinds of cultural formations might we be overlooking?

In my current book project, Gadgetry: A History of Techniques, I reconstruct the history of a discourse on technology. The book focuses on the many kinds of objects that were described as "gadgets" across the twentieth century, from dashboard gauges to atomic bombs, can-openers to smartphones. While “gadget” can be a placeholder for any kind of object, even imaginary ones, I argue that its evolving application to particular tools and techniques reveals important lessons about our relationship to technology.

In this book, I explore the user's imagination of how their gadgets work. For example, a single iPhone contains over half the elements of the periodic table, extracted from almost every continent on the planet and compressed into a thin slab that allows the user to dip her toes into a river of collective affect generated by the social network of everyone she's ever met. This is a fantastically science-fictional experience that is now part of our everyday lives. But the emergence of new digital cultures, political movements, and forms of intimacy is predicated on the unique habits each user adopts in order to understand these complex gadgets.

For this book, I text mine archives of novels, magazines, and newspapers in order to explore the distinctly vernacular philosophies (the media theories from below) that emerge from users and their everyday practices. Using databases like the Corpus of Historical American English, Historical American Newspapers, and the Media History Digital Library, I proceed by collecting as many instances of the word "gadget" as possible and plugging them into categories of my own making based on how the term is applied: is the gadget handmade or mass produced, seen as important or a trinket, does the word refer to the entirety of the tool or a component within it, and so on. Because I have hand-coded this "dataset" and designated the categories into which I sort each instance of the word myself, the portrait that emerges of a discourse on technology could be described as entirely of my own making, as opposed to algorithmically generated. But what really is the distance between these two categories of interpretation? In this talk, I will compare my digital methods to other methods throughout the history of the humanities that have attempted to paint a portrait of collective feeling.
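The hand-coding procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and labels below are invented placeholders, not data or categories from the book project.

```python
from collections import Counter

# Hypothetical hand-coded records: each stands for one occurrence of the
# word "gadget", with labels assigned by the researcher, not by an algorithm.
instances = [
    {"year": 1918, "source": "newspaper", "made": "handmade", "scope": "whole tool"},
    {"year": 1936, "source": "magazine", "made": "mass-produced", "scope": "component"},
    {"year": 1936, "source": "novel", "made": "handmade", "scope": "whole tool"},
    {"year": 1954, "source": "magazine", "made": "mass-produced", "scope": "whole tool"},
]


def tally(records, field):
    """Count how often each hand-assigned label occurs for one category."""
    return Counter(r[field] for r in records)


def tally_by_decade(records, field):
    """Group the same counts by decade to see how usage shifts over time."""
    by_decade = {}
    for r in records:
        decade = (r["year"] // 10) * 10
        by_decade.setdefault(decade, Counter())[r[field]] += 1
    return by_decade
```

Even in this toy form, every count depends on a prior interpretive decision about which label an instance receives, which is the sense in which the resulting portrait is of the researcher's own making.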

Alison Booth, Mid-Range reading: typologies, events, and discourse in a network of women’s biographies

Although many investigate fictionality, scholars have attended much less to nonfiction and biography than to imaginative forms such as novels or film. Digital humanities (DH) expand the scale of literary history while building on existing maps of period, genre, and notable authors, with finding aids shaped by previous scholarship. Thus Andrew Piper’s impressive textual analysis, “Fictionality,” neglects life narrative. Collective Biographies of Women (CBW) accesses a corpus of 1270 English-language biographical collections published across centuries, in a feminist historical study of a “hidden collection” of nonfiction. CBW developed before Google Books glimmered on the horizon; we worked with WorldCat and analogue materials to rediscover such publications as Noted Negro Women (1893). Reversing the usual DH phases, I published the book before collaborating on an online resource. What could we learn about the trends in gender ideology already constructed by biographers and publishers, publication data, and contents? Biography is a model (i.e. reduction) of a life within networks of typologies based on social difference. Distant reading is not best adapted to ramifications within curated corpora, where there is no mystery of author or genre. We capture the distinctive form and rhetoric of biography (and changing meaning of words such as “noble”) in relation to such scenarios as inter-class contact or recognition of genius. Sentiment analysis or word vectors developed for large corpora of novels or newspapers would miss the mark. The actual dynamics of gender representation, for example, can hardly be captured as a grammatical binary or by rates of male or female agents per 300 words, while nationality is a shifting attribute across geopolitical and individual transformations.

This paper extends Booth’s “Mid-range reading: not a manifesto” and builds on the findings from CBW’s method of mid-range reading as well as from the typologies and networks of women in the CBW database. CBW researchers are tagging discourse in biographies, such as first-person plural and plural proper names, and quantifying the distribution of types of events across versions of the same person or occupational types. Both scales of reading and typologies press upon ethics as well as epistemology: how to classify the individual text, or the character/person. Attention must be paid, yet cognition and knowledge depend on generalizations. CBW has focused on sets of books that document the ways women’s lives have been typologically interpreted. Our “sample corpora” range from all the books that include a short life of the saintly Victorian nurse, Sister Dora, to the distinct set of books that feature the famous adventuress, Lola Montez; other networks cluster around Queen Cleopatra, Frances Trollope, African Americans, women in medicine, Latinas, presenters (publishers, biographers), and others among the 8500 persons. A method we call mid-range reading uses the Biographical Elements and Structure Schema (BESS), a stand-aside XML schema (not TEI editing within the text file) that links element types (of stage of life, events, discourse, persona description, topos) to numbered paragraphs. BESS analyses, then, measure rates and distributions of element types across versions of lives sorted typologically by the contents of interrelated books. In 2018 we will obtain TEI files of remaining texts, with non-consumptive use of the copyright materials, through the HathiTrust Research Center.

Becoming in this sense an archive as well as a testing ground for narrative theory of biography and network analysis across centuries of representation of women, CBW can demonstrate the comparative rewards of large-scale textual analysis and mid-range reading, and add to the understanding of biographical representation in many forms.
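As a rough illustration of what measuring rates and distributions of element types from stand-aside markup might involve, consider the following Python sketch. The XML element names, attributes, and values are assumptions invented for this example; they are not the actual BESS schema, and the sample is not CBW data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-aside annotation for one version of a life. The file
# points at numbered paragraphs instead of marking up the text itself.
# Element and attribute names are invented; this is NOT the real BESS schema.
SAMPLE = """
<annotations text="example-version-1" paragraphs="4">
  <element type="event" value="nursing" para="1"/>
  <element type="event" value="illness" para="3"/>
  <element type="discourse" value="first-person plural" para="1"/>
  <element type="topos" value="self-sacrifice" para="3"/>
  <element type="stageOfLife" value="end" para="4"/>
</annotations>
"""


def element_rates(xml_text):
    """Return annotations per paragraph for each element type."""
    root = ET.fromstring(xml_text)
    n_paragraphs = int(root.get("paragraphs"))
    counts = Counter(el.get("type") for el in root.iter("element"))
    return {etype: n / n_paragraphs for etype, n in counts.items()}


def paragraphs_by_type(xml_text):
    """Return the paragraph numbers where each element type occurs."""
    root = ET.fromstring(xml_text)
    where = {}
    for el in root.iter("element"):
        where.setdefault(el.get("type"), []).append(int(el.get("para")))
    return where
```

Running both functions over several versions of the same life would give comparable rates and positions for each element type, which is the kind of comparison the paragraph above describes.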

Sarah Allison, Harnessing Pegasus: On Setting Reasonable Limits

This paper takes up a theoretical question in digital humanities practice: how we understand the borders or boundaries of projects. “Reductive reading” is my term for critical methods that call attention to how they subordinate, or reduce, textual complexity. I argue that the explicit way in which DH research acknowledges this act of simplification creates an ethos of critical frankness. As Stephen Ramsay argues, code must “assert its utter lack of neutrality with candor, so that the demonstrably non-neutral act of interpretation can occur.” “Harnessing Pegasus” focuses on the poignant question of setting limits. How do researchers establish the right distance from the texts under consideration, or reduce the scope of their inquiry? Here, I consider how researchers set limits in three projects that aim to understand what we might take to be the constitutive feature of the novel: fictionality.

It is axiomatic that the most irritating questions after a talk--but often also the best--are those that deal with a project’s limits. Researchers announce what they have done, and the members of the audience say, “Ah, but why didn’t you do something else?” This practice can help establish that one has taken a reasonable approach to a legitimate question in the field or open up future possibilities for research. It can also bring home the importance of narrowing one’s approach in order to answer a specific question, as in “We didn’t do something else. We did the thing we did.” In sharing work publicly, researchers are called to account for the boundaries they have set--or, as it is often framed, that they have been forced to set.

It is the latter attitude that interests me here, the moment when the scalar ambitions of distant reading meet pragmatic reality and intellectual justification. Mid-range reading leaves space to account for both. In this paper, I will consider three approaches to fictionality in literary history: by Andrew Piper in Cultural Analytics, by Ted Underwood, Michael L. Black, Loretta Auvil, and Boris Capitanu in their work on genre in the HathiTrust Digital Library, and by Simon Eliot in his bibliographic work on trends in publishing, 1800-1914. In considering the way each project treats its limitations, I seek to create connections--bridges--across them. How do their definitions of fictionality intersect with Catherine Gallagher’s theoretical treatment of the topic, and what does that tell us about nonfictionality? In each of these three studies, non-fiction is represented by a discrete collection of texts. How does limiting the generic canon change the way we understand fictionality?

Dan Shore, Other than Scale

This paper explores the limits of the concept of scale in digital inquiry. Quantitative scholars in particular have naturally chosen scale as what sets their approach apart from other established methods. They speak of the computer as a “macroscope” that permits “macroanalysis.” Scholars counted things before computers, but computers let them count and compute lots of things. Contrasting themselves with close readers, distant readers propose, with the help of machines, to step back from the page to see more and see bigger. Claims of scalar difference are often quite quantitatively precise. Instead of offering a reading of a single novel, distant readers study the titles of 7,000 British novels from 1740-1850, or ask how not to read a million books, or search through the 60,237 full texts in EEBO TCP I and II. For nearly all quantitative analyses of texts, the authors could tell the reader exactly how many words they count in how many documents, in light of sophisticated metrics and models.

Talk of scale in the digital humanities has not been simply ill advised. In spite of quantitative precision, we don’t really know what we talk about when we talk about scale. Individual texts are much bigger than is usually acknowledged. Even when bag-of-words approaches are forthright about discarding word order and syntax, they rarely itemize what they are discarding. What has been characterized as an increase in scale can be more accurately described as the sacrifice of one sort of information for another. The point is not to oppose reductionism, but to be fully aware of what is being reduced.
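A small Python sketch can make the point about bag-of-words reduction concrete. It is an illustration only, with invented example sentences and a deliberately naive whitespace tokenizer: two sentences with opposite meanings yield the same bag of words, and only a representation that keeps some word order (here, bigrams) tells them apart.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; order is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Count adjacent token pairs, which retain a little word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


a = "the dog bit the man"
b = "the man bit the dog"

same_bag = bag_of_words(a) == bag_of_words(b)   # identical word counts
same_bigrams = bigrams(a) == bigrams(b)         # pairs such as ("dog", "bit") differ
```

Here `same_bag` is true and `same_bigrams` is false: the bag of words has traded away the information about who bit whom.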

Scalar conceptualization of digital tools and methods has tended to crowd out other, non-scalar distinctions. Some, like experimental design, theories of evidence, and falsifiability (an account of what it would mean to be wrong) should be more prominent in the conversation. I’ll focus on concepts - abstraction, categorization, hierarchy - that are central to meaning and linguistic creativity across languages. Here I turn to the insights of construction grammar and corpus linguistics to suggest further possibilities for investigation. The bigram “thought leader” is two words, but it is also a single compound noun, the meaning of which can’t be fully predicted from the meaning of its parts. How big is it? An abstract construction like “Once upon a time… [] and they lived happily ever after” may be only ten words, and yet as big as the fairy tale that fills its blank. How long is it? The relevant distinctions in these examples are not scalar in any simple sense, and the methods for understanding them cannot be captured by distance or proximity. I start with linguistic examples at the level of the utterance, propose a few ways forward for qualitative and quantitative inquiry, and close by suggesting how the non-scalar distinctions at work in construction grammar might be relevant for specifically literary questions such as genre and narrative form.


Appendix A

Bibliography
  1. Allison, Sarah. Reductive Reading: A Syntax of Victorian Moralizing. Baltimore: Johns Hopkins University Press, forthcoming 2018.
  2. Allison, Sarah. “Other People’s Data: Humanities Edition,” Cultural Analytics, Dec. 8, 2016. http://culturalanalytics.org/2016/12/other-peoples-data-humanities-edition/
  3. Bode, Katherine. “The Equivalence of ‘Close’ And ‘Distant’ Reading; Or, toward a New Object for Data-Rich Literary History.” Modern Language Quarterly 78, no. 1 (March 1, 2017): 77–106, https://doi.org/10.1215/00267929-3699787 .
  4. Booth, Alison. How to Make It as a Woman: Collective Biographical History from Victoria to the Present. Chicago: University of Chicago Press, 2004.
  5. Booth, Alison. “Mid-Range Reading: Not a Manifesto.” PMLA 132: 3 (May 2017): 620-27.
  6. Burguiere, Andre. The Annales School: An Intellectual History. Trans. Jane Marie Todd. Ithaca, NY: Cornell University Press, 2009.
  7. Eliot, Simon. “Some Trends in Book Publishing, 1800-1914” in John O. Jordan and Robert L. Patten (eds.), Literature in the Marketplace. Cambridge: Cambridge University Press, 2003.
  8. Eliot, Simon, and Jonathan Rose, eds. A Companion to the History of the Book. Malden, MA: Wiley-Blackwell, 2009.
  9. Gallagher, Catherine. Nobody’s Story: The Vanishing Acts of Women Writers in the Marketplace, 1670-1820. Berkeley: U of California P, 1994.
  10. Gallagher, Catherine. “The Rise of Fictionality.” The Novel. Ed. Franco Moretti, Vol. 1. Princeton: Princeton UP, 2006. 336-63.
  11. Goldberg, Adele E. Constructions at Work: The Nature of Generalization in Language. New York: Oxford UP, 2006.
  12. Goldberg, Adele E. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: U of Chicago P, 1995.
  13. Hancher, Michael. “Re: Search and Close Reading,” in Debates in the Digital Humanities 2016. University of Minnesota Press, 2016. 118–38. http://conservancy.umn.edu/handle/11299/181603 .
  14. Langacker, Ronald W. Cognitive Grammar: A Basic Introduction. Oxford: Oxford UP, 2008.
  15. Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. “Idioms,” Language 70 (1994): 491–538.
  16. Piper, Andrew. “Fictionality.” Journal of Cultural Analytics, December 20, 2016. https://doi.org/10.22148/16.011 .
  17. Robertson, Stephen, and Lincoln Mullen. “Digital History & Argument White Paper – Roy Rosenzweig Center for History and New Media.” November 13, 2017. https://rrchnm.org/argument-white-paper/.
  18. Shore, Daniel. Cyberformalism: Histories of Linguistic Forms in the Digital Archive. Baltimore: Johns Hopkins UP, forthcoming 2018.
  19. Shore, Daniel. “Shakespeare’s Constructicon,” Shakespeare Quarterly 66.2 (2015): 113-136.
  20. Smith, Barbara Herrnstein. “What Was Close Reading? A Century of Method in Literary Studies,” Minnesota Review 87 (2016): 57–75.
  21. Underwood, Ted. “Distant Reading and the Blurry Edges of Genre.” The Stone and the Shell. 22 Oct. 2014.
  22. Underwood, Ted. “Understanding Genre in a Collection of a Million Volumes, Interim Report.” Figshare. https://dx.doi.org/10.6084/m9.figshare.1281251
  23. Wythoff, Grant. Gadgetry: A History of Techniques, in progress.
  24. Wythoff, Grant. The Perversity of Things: Hugo Gernsback on Media, Tinkering, and Scientifiction. University of Minnesota Press, 2016.