The Archive as Collaborative Learning Space

Natalia Ermolaev, Princeton University, United States of America, and Mark Saccomano, Columbia University, United States of America

The Serge Prokofiev Archive as Data

In 2014, the Columbia University Rare Book & Manuscript Library (RBML) acquired a unique archival collection. The Serge Prokofiev Archive, which holds materials related to the twentieth-century Russian composer Sergei Prokofiev (1891-1953), contains more than 17,500 diverse items: music manuscripts, letters, financial documents, scores, concert programs, notebooks, monographs, articles, journals, photographs, audio and visual recordings, and ephemera in original, photocopy, and digital formats. The archive was first established in 1994 at Goldsmiths College, London (Mann, 2008). Over the twenty years in which it grew, a complex and intricate item-level descriptive apparatus evolved alongside it. By the time the collection came to Columbia, the archival items were accompanied by hundreds of metadata files in formats such as spreadsheets, Word documents, text files, PDFs, EndNote databases, Access databases, MARC records, and various XML encodings. Our poster describes how we, an archivist and a digital humanities researcher, curated, explored, and analyzed this dense and diverse body of data.

Our first step was to satisfy the immediate need of funders and stakeholders: making records of the Prokofiev Archive publicly available through the finding aid on the RBML website. Though the goal was clearly defined (records in XML using Columbia’s EAD (Encoded Archival Description) schema and style guide), the process was complex. Records from Goldsmiths differed in both structure and content depending on the item catalogued. For example, data about books was captured in EndNote and MARC, while information about music manuscripts was kept in Excel spreadsheets, and correspondence records were in an Access database. We worked to transform all data into XML, and then ran customized XSLT transformations to generate standard EAD. However, what we gained in standardization we lost in information richness: this custom EAD schema did not allow us to encode elements at the level of granularity we had in the original records. Significant scholarly information was lost. In addition, the conventional finding aid interface limits the user’s options for exploring a large archival collection: content is presented in blocks of narrative and long lists of items, and search and browse functions are organized by series and sub-series, which does not allow for easy cross-collection discovery.
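The loss of granularity described above can be illustrated with a minimal sketch. It is written in Python rather than XSLT, and it is not the transformation that was actually run: the source field names, the mapping table, and the helper `record_to_ead_component` are all hypothetical. Only the element names `c`, `did`, `unittitle`, and `unitdate` come from EAD. The point it shows is that any source field with no slot in the target mapping is silently dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source field names to EAD <did> child elements.
FIELD_TO_EAD = {"title": "unittitle", "date": "unitdate"}


def record_to_ead_component(record):
    """Build an item-level EAD <c> element from one source record (a dict).

    Returns the element and the list of source fields that had no EAD target.
    """
    component = ET.Element("c", {"level": "item"})
    did = ET.SubElement(component, "did")
    dropped = []
    for field, value in record.items():
        target = FIELD_TO_EAD.get(field)
        if target is None:
            # No slot in this mapping, so the information is lost.
            dropped.append(field)
            continue
        ET.SubElement(did, target).text = value
    return component, dropped


# Invented sample record, standing in for one spreadsheet row.
sample = {
    "title": "Example manuscript title",
    "date": "1925",
    "annotation_language": "French",
}
component, dropped = record_to_ead_component(sample)
ead_xml = ET.tostring(component, encoding="unicode")
print(ead_xml)
print("Fields without an EAD target:", dropped)
```

In this sketch the `annotation_language` field survives in the source record but never reaches the EAD output, which is the kind of detail a richer target schema or a parallel data export would need to preserve.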

Thus, our next task was to find alternatives for the analysis and representation of the Serge Prokofiev Archive. We decided to pivot our approach and, moving away from EAD, transformed the structured XML into a series of CSV files that could be manipulated with various data analysis and visualization tools. Not surprisingly, both the process and our results deepened our understanding of the archive and of Prokofiev’s work and legacy: an alluvial chart of the music manuscript series, for example, showed patterns in the way Prokofiev used multiple languages for different types of annotations and markings as he wrote his scores; a map using location data of Prokofiev’s letters revealed his correspondence with Russian-American composers who had emigrated to China; and a network graph using metadata about the secondary literature on Prokofiev (books, journal articles) showed surprising connections between editors and authors in Soviet and Western publications.
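The XML-to-CSV step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the actual conversion pipeline: the element names (`letters`, `letter`, `sender`, `place`, `date`), the `id` attribute, and the sample values are invented for the example. Missing elements become empty cells, so every row has the same columns and can be loaded directly into a mapping or charting tool.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are hypothetical.
SAMPLE_XML = """
<letters>
  <letter id="L001">
    <sender>Example Sender</sender>
    <place>Paris</place>
    <date>1927-03-14</date>
  </letter>
  <letter id="L002">
    <sender>Example Sender</sender>
    <place>Moscow</place>
  </letter>
</letters>
"""

COLUMNS = ["id", "sender", "place", "date"]


def xml_to_rows(xml_text):
    """Flatten each <letter> element into a dict with one key per column."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for letter in root.iter("letter"):
        row = {"id": letter.get("id", "")}
        for column in COLUMNS[1:]:
            # findtext returns None when the child element is absent.
            row[column] = (letter.findtext(column) or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping a fixed column list, rather than deriving columns from whatever each record happens to contain, is a deliberate choice in this sketch: it makes gaps in the metadata visible as empty cells instead of hiding them.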

Our experience demonstrated the value of creative engagement with archival data: through experimentation and play, the Serge Prokofiev Archive became a site of collaborative research and learning. Our work was guided by two important conceptual shifts in the library and archives profession. The first is the “Collections as Data” movement, which encourages reframing the digital object as data (Padilla, 2016). The second is the move away from locating value exclusively in the objects of a collection and toward the impact collections have on people and communities. In Kate Theimer’s notion of “archives as platform,” tools and technologies help users interact with archives in creative ways that add value to their lives and experiences. Work that takes place “behind the scenes” (Theimer, 2014) by archivists and their collaborators helps define the archive as a dynamic cross-disciplinary learning space.

Appendix A

  1. Noëlle Mann, “The Serge Prokofiev Archive in London - A Complex Story,” Fontes Artis Musicae, Vol. 55, No. 3 (July-September 2008), pp. 543-547.
  2. Thomas Padilla, “On a Collections as Data Imperative,” conference report, Collections as Data: Stewardship and Use Models to Enhance Access, Library of Congress, Washington, DC, September 27, 2016.
  3. Kate Theimer, “The Future of Archives is Participatory: Archives as Platform, or A New Mission for Archives,” April 3, 2014.