Exploring Big and Boutique Data through Laboring-Class Poets Online

Cole Daniel Crawford (cole_crawford@fas.harvard.edu), Harvard University, United States of America

Though quantitative methods are becoming increasingly common within the humanities, few researchers readily describe their primary texts as data. Most prefer to see their objects of study as contextually situated, socially constructed entities with independent value that resist complete digital representation. Miriam Posner (2015) argues that for many humanities researchers, describing an artifact as data implies “that it exists in discrete, fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; and that someone else performing the same operations on the same data will come up with the same results.” Defined this way, digital artifacts and metadata seem simultaneously to insist on particular interpretations and to be bereft of deeper meaning outside an aggregate state, thereby resisting the hermeneutic methodologies that form the core of humanistic inquiry.

This position stems from understanding data primarily through a big data mindset. As corporations, governments, and universities have increasingly embraced data analytics to address business problems, the essential qualities of big data (large volume, high velocity, and heterogeneous variety) have created the illusion among many that such datasets can perfectly model an imperfect and unpredictable world, gaining credibility simply by growing in volume. The computational authority of big data is persuasive because it presents a seemingly objective, number-driven way of knowing reality: an epistemology of the database, predicated on scale, comprehensiveness, and reproducibility.

While an immense and complete archive possesses an undeniable allure (Manovich, 2012; Kaplan, 2015), there is still value in examining individual records and investigating the intangible stories and data points that hide in database gaps or reside outside databases entirely. I use Cheryl Ball et al.’s (Forthcoming) term “boutique data” to emphasize the ongoing importance of small, localized, partial, and qualitative datasets to the humanities research process. I frame boutique data as both a thing (a boutique dataset) and a theoretical approach to data-intensive work in the humanities. While big data are often automatically generated, boutique data are manually curated: subjective, created capta as opposed to given data (Drucker, 2011). Big data hide the work and decisions that drive data processing, while boutique data foreground the hidden labor and assumptions that shape data. Big data fit information into a predetermined mold, while boutique data models are built from the bottom up. Where a big data mindset treats gaps in data coverage as a corrupting null to be fixed, a boutique approach to data sees these gaps not as empty voids but as evocative absences worth further investigation. In this presentation, I will examine both the successes and failures of a boutique approach to data through a case study of Laboring-Class Poets Online and speculate about possible future improvements to the project.

The texts and histories studied by scholars of laboring-class culture are riddled with gaps. Since the publication of E. P. Thompson’s The Making of the English Working Class over fifty years ago, researchers have increasingly viewed laboring-class poets and their writing as subjects worthy of scholarly inquiry. Rather than portraying proletarian writers as isolated anomalies or novelties, as George Thomson did when he characterized Robert Burns as a “heav’n taught ploughman” in his famous obituary for the Scottish bard, modern critics acknowledge that working-class writing was a significant, widespread phenomenon. However, while some British laboring-class poets such as Burns or John Clare have achieved near-canonical status, most of these writers remain obscure. Information on their lives and access to their writing remain scarce and scattered, hindering research on both their personal histories and their poetry.

Laboring-Class Poets Online (LCPO) addresses this gap by aggregating biographical and bibliographical information about the more than 2,000 British laboring-class poets who published between 1700 and 1900, and about the texts they produced. LCPO draws on collaborative research initially collected over several decades by an international, distributed team of researchers and presented as biographical entries in A Database of British and Irish Labouring-Class Poets and Poetry (Goodridge, 2017). LCPO transforms these freeform biographical snippets into structured, web-accessible records. This structure facilitates a prosopographic approach to British working-class literary studies. Lawrence Stone (1972) defines prosopography as “the investigation of the common background characteristics of a group of actors in history by means of a collective study of their lives.” This methodological shift from the study of individual biographies to collective biographical and bibliographic patterns enables a more comprehensive understanding of laboring-class literary production during a period of great social and economic change. Users can ask questions about laboring-class literature holistically and map trends and themes, including the impact of industrialization; the role of religion as a vehicle for literacy and a source of aesthetic influence; the tension between increased urbanization and a celebration of regional identity, often demonstrated through writing in dialect; the transformation of the publishing industry and the role of patronage and subscription publishing; the growth of literary miscellanies and magazine publishing; and the influence of organized labor movements (e.g., Chartism or Christian Socialism) on laboring-class artistic expression. Scholars can investigate emigration patterns, education levels, labor engagement, health outcomes, poets’ occupations, and interactions with the criminal justice and social relief systems. Publications can similarly be filtered and searched by typical facets such as publication date, author, or location, but also by subscription list, patronage, cost, or print run size.
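To make faceted filtering over structured records concrete, here is a minimal Python sketch; it is not LCPO's implementation. The `Publication` fields, the `filter_publications` helper, and the sample records are all hypothetical, chosen only to show how freeform entries, once broken into typed fields such as subscription publication, patron, or print run, can be queried alongside conventional facets like date.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: these field names are illustrative assumptions,
# not LCPO's actual data model.
@dataclass
class Publication:
    title: str
    author: str
    year: int
    location: str
    subscription_list: bool = False   # published by subscription?
    patron: Optional[str] = None      # named patron, if recorded
    print_run: Optional[int] = None   # None when the size is unknown

def filter_publications(pubs, **facets):
    """Return publications matching every facet.

    A facet value is either compared for equality or, if callable,
    applied as a predicate (useful for date ranges or "is recorded").
    """
    def matches(pub):
        for name, wanted in facets.items():
            value = getattr(pub, name)
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True
    return [p for p in pubs if matches(p)]

# Placeholder records used only to demonstrate the filter.
pubs = [
    Publication("Poems A", "Poet A", 1787, "Edinburgh",
                subscription_list=True, print_run=600),
    Publication("Poems B", "Poet B", 1821, "London", patron="Patron X"),
    Publication("Poems C", "Poet C", 1845, "Manchester"),
]

subscribed = filter_publications(pubs, subscription_list=True)
eighteenth_century = filter_publications(pubs, year=lambda y: 1700 <= y < 1800)
patronized_after_1800 = filter_publications(
    pubs, year=lambda y: y >= 1800, patron=lambda p: p is not None)
```

Leaving `print_run` as `None` rather than guessing a number keeps an unknown value distinct from a recorded one, which matters for the partial coverage typical of boutique data.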

Users can interact with aggregate data through numerous data visualizations including geographic maps that show poet and publication locations; timelines of individual lives or major events which shaped the working classes; and network graphs that display connections between writers based on correspondence, personal relationships, or literary influence. Each of these visual forms encourages users to shuttle back and forth between individual records and aggregate analysis. Users can also create collections of content for further interpretation and analysis, correct mistakes in poet entries, or contribute new data to the website. All data presented through Laboring-Class Poets Online are freely available for download or access via a REST API.
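The movement between aggregate and individual views can be illustrated with a small network sketch in plain Python. The edge list, poet names, and relationship labels below are placeholders rather than LCPO records, and `build_graph` is a hypothetical helper; the point is only that the same typed connections support both an aggregate ranking and a drill-down into one poet's ties.

```python
from collections import defaultdict

# Placeholder edges: (poet, poet, relationship type). Not LCPO data.
edges = [
    ("Poet A", "Poet B", "correspondence"),
    ("Poet A", "Poet C", "literary influence"),
    ("Poet B", "Poet D", "personal relationship"),
    ("Poet A", "Poet D", "correspondence"),
]

def build_graph(edges, kinds=None):
    """Build an undirected adjacency map, optionally keeping only some edge types."""
    graph = defaultdict(set)
    for a, b, kind in edges:
        if kinds is None or kind in kinds:
            graph[a].add(b)
            graph[b].add(a)
    return graph

graph = build_graph(edges)

# Aggregate view: rank poets by number of connections (degree),
# breaking ties alphabetically.
ranking = sorted(graph, key=lambda poet: (-len(graph[poet]), poet))

# Individual view: drill down into one poet's connections.
neighbours_of_a = sorted(graph["Poet A"])

# A filtered network containing correspondence only.
letters = build_graph(edges, kinds={"correspondence"})
```

Keeping the relationship type on each edge, rather than collapsing everything into a single "connected" flag, lets a user switch between correspondence, personal, and influence networks without rebuilding the underlying records.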

This information is vital for scholars of working-class writing and culture, but it is also an instance of boutique humanities data (capta): a small, collaboratively and manually curated dataset of several thousand entities extracted during ongoing research. While scholars routinely use context to interpret data points in historical documents, databases and computational methods typically lack this capability. Uncertainty is embedded in historical sources, but databases often strip away ambiguity to perform the computational functions that make them worthwhile. By taking a boutique approach to historical and literary information, LCPO retains much of this ambiguity and offers insight into how humanities researchers can accommodate a complex understanding of space and time as continuously unfolding events.
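One way a dataset can keep temporal uncertainty rather than erase it is to store an approximate date as an interval together with its original wording, and to answer range queries with three outcomes instead of two. The following Python sketch is an assumption about how this could work for the temporal case, not a description of LCPO's data model; `FuzzyYear`, `Match`, and the bounds chosen for “c. 1790” are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Match(Enum):
    CERTAIN = "certain"
    POSSIBLE = "possible"
    NO = "no"

@dataclass(frozen=True)
class FuzzyYear:
    """A year known only within bounds, e.g. 'c. 1790' stored as 1785-1795.

    The bounds assigned to qualifiers such as 'c.' are a cataloguing
    assumption (hypothetical here), recorded rather than hidden.
    """
    earliest: int
    latest: int
    source_text: str  # the original wording, kept alongside the bounds

    def within(self, start: int, end: int) -> Match:
        # Whole interval inside the query range: certainly in range.
        if start <= self.earliest and self.latest <= end:
            return Match.CERTAIN
        # No overlap at all: certainly out of range.
        if self.latest < start or self.earliest > end:
            return Match.NO
        # Partial overlap: the evidence allows either answer.
        return Match.POSSIBLE

birth = FuzzyYear(1785, 1795, "c. 1790")
exact = FuzzyYear(1801, 1801, "1801")
```

A conventional date field would force “c. 1790” into a single year, so a query for 1790 to 1799 would return a misleadingly confident yes; the interval returns `Match.POSSIBLE` and leaves the judgment to the researcher.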

Appendix A

  1. Ball, C., Graban, T. S. and Sidler, M. (Forthcoming). The Boutique is Open: Data for Writing Studies. In Rice, J. and McNely, B. (eds), Networked Humanities: Within and Without the University. Parlor Press.
  2. Drucker, J. (2011). Humanities Approaches to Graphical Display. Digital Humanities Quarterly, 5(1). http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html
  3. Goodridge, J. (ed). (2017). A Database of British and Irish Labouring-Class Poets and Poetry, 1700-1900.
  4. Kaplan, F. (2015). A Map for Big Data Research in Digital Humanities. Frontiers in Digital Humanities, 2. doi: 10.3389/fdigh.2015.00001. http://journal.frontiersin.org/article/10.3389/fdigh.2015.00001/abstract
  5. Manovich, L. (2012). Trending: The Promises and Challenges of Big Social Data. In Gold, M. (ed), Debates in the Digital Humanities. University of Minnesota Press.
  6. Posner, M. (2015). Humanities Data: A Necessary Contradiction. Miriam Posner’s Blog. http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/
  7. Stone, L. (1972). Prosopography. In Gilbert, F. and Graubard, S. (eds), Historical Studies Today. New York.