Metadata Challenges to Discoverability in Children’s Picture Book Publishing: The Diverse BookFinder Intervention
Although 37% of the United States population is non-white, 90% of the books published for children during the last twenty-one years contain no multicultural content. This discrepancy has been called “The Diversity Gap” (Erlick, 2015) and, more starkly, the “Apartheid of Children’s Literature” (Myers, 2014). Data gathered by the Cooperative Children’s Book Center since 1985 show that the representation gap has barely shifted in thirty years, with books by and about non-white people hovering between 10% and 14% of total children’s book production (CCBC 2017). Panels and initiatives about diversity in book publishing have not actually produced more books by and about non-white people. Because so few such books exist, discoverability is a significant challenge for parents, librarians, and teachers seeking picture books that depict the lives of non-white children.
As currently practiced in the North American book industry, “diversity” usually tallies “how many” rather than delving into the lived experience of non-white people. The Diverse BookFinder [DBF], a database and metadata project sponsored by Bates College and funded by the Institute of Museum and Library Services [IMLS], asks: how can metadata help tackle these entangled problems? Can we build a network of information about children’s picture books that trains users to search for and discover books using complex concepts related to their own communities, rather than using race or ethnicity as the sole marker?
Our long paper:
- Surveys the problem of whiteness as the de facto point of view in children’s books, and the populist social media movements that resist this phenomenon;
- Examines the limitations of current metadata for K-3 books about non-white children;
- Presents Diverse BookFinder as a strategic disruption of current metadata practices;
- Conveys the pedagogical value of Diverse BookFinder in academic and public settings.
Readers have created massively participatory online movements to prompt the predominantly white book publishing industry to publish more books by and about non-white people (Low, “Diversity Baseline Survey,” 2016). #WeNeedDiverseBooks, #1000BlackGirlBooks, and #OwnVoices originated as Twitter hashtags and then converted their social media capital (likes, shares, reposts, followers) into an array of recommendation services: published anthologies, book finder apps, a short story contest, even a granting agency. Populist interventions are welcome and useful, but by themselves they cannot remedy the problem of classifying existing books with metadata that reinscribe white privilege.
One intervention in these human and machinic systems is Diverse BookFinder, a catalog curated and coded by humans. Metadata and recommendation systems are not neutral: they operationalize cultural assumptions that their creators may not have intended or even be aware of. Critical code studies, the scholarship of platforms and software that reads computer source code hermeneutically, has charted useful ground in exploring how metaphors of containment and layers, for example, rationalize logics of racial exclusion (McPherson 2012). Many metadata schemas for books derive from the physical structures that libraries and classrooms use to organize books for readers. These systems create fixed and singular ways of relating items, which construct contextualized exclusion (Drabinski 2013). The cataloging systems most common in the United States, including Library of Congress Classification and the Dewey Decimal Classification, are centered in whiteness and maleness and reinforce the otherness of diverse titles. The separation of topics on women and gender (including queerness) from broader topics such as literature or history, for example, reinforces the notion that women and queers contribute only secondarily to history and literature. This segregation recurs across the various forms of metadata, where difference is replicated and continually defined by whiteness.
Exclusion is not confined to physical spaces such as libraries. Algorithmic “overfitting” is the phenomenon in which recommendations are drawn from a narrow spectrum of a user’s interests. Overfitting “can occur when a user is trying to be helpful by providing explicit feedback only about the content s/he strongly likes. This leads to the creation of a very specific model that knows the exact user preferences but is unable to detect any other types of interesting items since the user has not shown any interest in it” (Kunaver & Poztlz, 2017, 156). Under such a system, a typical user would have difficulty finding diverse titles online if those books did not already match their past search behavior. Inconsistencies in how metadata are applied compound this problem.
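To make the overfitting mechanism concrete, here is a minimal Python sketch of a content-based recommender that builds a user profile solely from titles the user explicitly liked. The catalog, titles, tags, and the `build_profile` and `recommend` functions are hypothetical illustrations, not DBF code or any model surveyed by Kunaver & Poztlz; scoring by tag overlap is one simple assumption among many possible designs.

```python
from collections import Counter

# Hypothetical catalog: each title is described only by its subject tags.
CATALOG = {
    "Title A": {"trucks", "bedtime"},
    "Title B": {"trucks", "construction"},
    "Title C": {"bedtime", "family"},
    "Title D": {"family", "immigration", "Puerto Rican"},
    "Title E": {"folklore", "Lakota"},
}


def build_profile(liked):
    """Count tags across only the titles the user explicitly liked."""
    profile = Counter()
    for title in liked:
        profile.update(CATALOG[title])
    return profile


def recommend(liked, k=3):
    """Rank unseen titles by how often their tags occur in the profile."""
    profile = build_profile(liked)
    scored = [
        (title, sum(profile[tag] for tag in tags))
        for title, tags in CATALOG.items()
        if title not in liked
    ]
    # Highest score first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # Titles sharing no tags with the profile score 0 and are dropped.
    return [pair for pair in scored[:k] if pair[1] > 0]


print(recommend(["Title A"]))  # → [('Title B', 1), ('Title C', 1)]
```

A user who has liked only Title A is never shown Title D or Title E, because neither shares a tag with the profile: the model knows that user's preferences exactly and can surface nothing outside them. Sparse or inconsistent tags on diverse titles would widen such gaps.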
Metadata sometimes contain errors that hide or misrepresent books, or fail to classify the types of information most relevant to communities seeking books about non-white children. Such books serve as “mirrors and windows”: they reflect back, or “mirror,” one’s own lived experience in the faces, bodies, customs, and cultural milieu they depict, and they open “windows” onto cultures different from one’s own (Bishop 1990). Such books develop myriad literacies beyond reading comprehension, including conflict resolution, tolerance for the unfamiliar, and awareness of cultures beyond one’s own. When used in the classroom as an intervention toward intergroup contact, diverse picture books can foster intercultural understanding among children (Aronson et al. 2016). Unfortunately, existing metadata do not account for the intricacies of diverse titles, so these books remain difficult to identify or locate comprehensively. Hand-coding is one remedy for this discoverability problem.
Without a controlled vocabulary, books are simply not findable. There is no eschewing metadata; there is only writing better metadata, and theorizing the best practices that writers of descriptive metadata should follow so as not to reinscribe racist stereotypes and cultural marginalization. The purpose of the DBF vocabulary is not to undo prior standards, each of which is problematic in many ways, but to contribute to the broader representation of diverse books and to fill in the information gaps. Systematic search engine optimization (SEO) work is underway to add terms as they are used by the communities represented; for example, a user may enter “Boricua” into DBF and retrieve results about Puerto Rican characters. The goal is to write metadata that reflect the lived experience of the people the books depict.
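One way to picture the community-language work described above is a crosswalk from self-descriptors to the controlled tags applied during coding, consulted at query time. This Python sketch is hypothetical: the `COMMUNITY_TERMS` mapping, the sample records, and the `expand_query` and `search` functions are illustrative assumptions, not DBF's actual SEO implementation. The “Diné” entry is added only as a second example of a community self-name (Diné is the Navajo people's name for themselves).

```python
import unicodedata

# Hypothetical crosswalk: community self-descriptor -> controlled tag.
COMMUNITY_TERMS = {
    "boricua": "Puerto Rican",
    "diné": "Navajo",
}

# Hypothetical hand-coded records.
BOOKS = [
    {"title": "Book 1", "tags": {"Puerto Rican", "family"}},
    {"title": "Book 2", "tags": {"Navajo", "folklore"}},
    {"title": "Book 3", "tags": {"Chinese American", "food"}},
]


def normalize(term):
    """Compose accents consistently and ignore case for matching."""
    return unicodedata.normalize("NFC", term.strip()).casefold()


# Normalize the crosswalk keys once so lookups match normalized queries.
CROSSWALK = {normalize(k): v for k, v in COMMUNITY_TERMS.items()}


def expand_query(query):
    """Return the normalized query plus any controlled tag it maps to."""
    key = normalize(query)
    terms = {key}
    if key in CROSSWALK:
        terms.add(normalize(CROSSWALK[key]))
    return terms


def search(query):
    """Return titles whose tags match the query or its mapped tag."""
    terms = expand_query(query)
    return [
        book["title"]
        for book in BOOKS
        if terms & {normalize(tag) for tag in book["tags"]}
    ]


print(search("Boricua"))  # → ['Book 1']
```

Keeping the crosswalk separate from the coded tags means new community terms could be added without re-coding any books.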
When books are entered into the Diverse BookFinder, they go through a multi-step hand-coding process that compiles the metadata commonly missing from other sources. Characters are coded for racial and/or ethnic identity and gender, and each book is coded for setting, with additional tags such as tribal nation, immigration status, or religion where applicable. Most books fall into one of nine categories that capture the message each book conveys. The categories include Beautiful Life, Oppression, Cross-group, Biography, Race/Culture Concepts, Folklore, Incidental [ensemble or background characters of color], and Informational [factual content unrelated to race or culture]. These categories emerged from a rigorous grounded-theory analysis of commonalities across picture book stories. This analysis shows that Beautiful Life, stories about the experience of a particular racial or cultural group, dominates diverse book publishing, yet this message is rarely captured in existing metadata outside of DBF. African Americans are most likely to be depicted in situations of oppression; Native Americans are disproportionately represented in “folklore”; and Hispanic and Latinx people are underrepresented in picture books generally. DBF has engaged students at all levels and at several institutions in thinking about representation and in participating in research to better understand the role of picture books in children’s development.
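The coding scheme described above can be sketched as a simple record type. In this hypothetical Python model, the field names, the `Category` enum, and the `category_counts` helper are assumptions for illustration rather than DBF's actual schema; the enum lists only the eight categories named in the text, and the sample records are invented, not DBF data.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Category(Enum):
    """Message categories named in the text (hypothetical encoding)."""
    BEAUTIFUL_LIFE = "Beautiful Life"
    OPPRESSION = "Oppression"
    CROSS_GROUP = "Cross-group"
    BIOGRAPHY = "Biography"
    RACE_CULTURE_CONCEPTS = "Race/Culture Concepts"
    FOLKLORE = "Folklore"
    INCIDENTAL = "Incidental"
    INFORMATIONAL = "Informational"


@dataclass
class Character:
    race_ethnicity: List[str]
    gender: str


@dataclass
class BookRecord:
    title: str
    setting: str
    category: Category
    characters: List[Character] = field(default_factory=list)
    # Optional tags, applied only where relevant to the book.
    tribal_nation: Optional[str] = None
    immigration_status: Optional[str] = None
    religion: Optional[str] = None


def category_counts(records, group):
    """Tally categories for books with at least one character from `group`."""
    return Counter(
        record.category
        for record in records
        if any(group in c.race_ethnicity for c in record.characters)
    )


# Invented example records, for illustration only.
records = [
    BookRecord("Book 1", "urban", Category.BEAUTIFUL_LIFE,
               [Character(["Puerto Rican"], "girl")]),
    BookRecord("Book 2", "rural", Category.FOLKLORE,
               [Character(["Native American"], "boy")],
               tribal_nation="Lakota"),
]
print(category_counts(records, "Puerto Rican"))
```

Run over a full hand-coded catalog, tallies of this kind could support comparisons such as how often each group appears in each category.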
It is pedagogically valuable to give students direct experience of how imprecise book metadata impede discoverability. In a lab exercise designed by Bell and implemented by Inman Berens, master’s students retrieved metadata for two books in the DBF database from three venues: a publisher website, a retailer, and a library catalog. Students discovered significant errors in the metadata and notable variability from venue to venue. The ensuing class discussion allowed students to trace how human classification errors and legacy systems such as Library of Congress Subject Headings interoperate with machinic processes. The students then reviewed a Library of Congress copyright form that our student-run trade press, Ooligan Press, had submitted, and discovered ambiguities in the form’s language that had prompted misclassification of the press’s just-released young adult novel. The exercise drove home that automated processes are framed by human judgment.
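The comparison students carried out by hand can be sketched in a few lines of Python: gather one book's metadata from each venue and report every field whose values disagree or are missing. The three venue types come from the exercise above, but the field names, values, and the `field_discrepancies` function are hypothetical.

```python
MISSING = "<missing>"


def field_discrepancies(records):
    """Map each field to per-venue values when venues disagree or omit it."""
    fields = set()
    for record in records.values():
        fields.update(record)
    report = {}
    for name in sorted(fields):
        values = {venue: record.get(name, MISSING)
                  for venue, record in records.items()}
        # Keep the field only if at least two venues differ.
        if len(set(values.values())) > 1:
            report[name] = values
    return report


# Hypothetical metadata for one picture book as listed by three venues.
venues = {
    "publisher": {"title": "Example Book", "audience": "Ages 4-8",
                  "subject": "Puerto Ricans -- Fiction"},
    "retailer": {"title": "Example Book", "audience": "Ages 3-5",
                 "subject": "Family life"},
    "library": {"title": "Example Book",
                "subject": "Puerto Ricans -- Fiction"},
}

for name, values in field_discrepancies(venues).items():
    print(name, values)
```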
The Diverse BookFinder is unique precisely for the level of human labor that goes into data entry and book coding. Correcting the inconsistencies and inaccuracies in book metadata, and adding information to each book’s record, could not be done by machine. This process bridges gaps in metadata, helps users identify many more diverse titles than an average search would, and provides new insights into which stories dominate picture books. As public scholarship, the project seeks to move the diverse books discussion beyond a simple focus on numbers to consider content and impact as well, translating research findings so that they are accessible and useful.
- Aronson, K. M., Stefanile, C., Matera, C., Nerini, A., Grisolaghi, J., Romani, G., …Brown, R. (2016). Telling tales in school: extended contact interventions in the classroom. Journal of Applied Social Psychology, 46, 229–241. doi: 10.1111/jasp.12358
- Bishop, R. S. (1990). Windows and mirrors: children’s books and parallel cultures. In M. Arwell and A. Klein (Eds.), California State University San Bernardino Reading Conference: 14th Annual Conference Proceedings (pp. 11-20). San Bernardino, CA: CSUSB Reading Conference. Retrieved from https://files.eric.ed.gov/fulltext/ED337744.pdf#page=11
- Cooperative Children’s Book Center (CCBC). (2017). Children’s Books By and About People of Color Published in the United States. Retrieved from
- Drabinski, E. (2013). Queering the catalog: queer theory and the politics of correction. The Library Quarterly: Information, Community, Policy, 83(2), 94-111. doi.org/10.1086/669547
- Erlick, H. (2015, March 5). The diversity gap in children’s publishing, 2015 (blog post). Lee and Low Books: The Open Book Blog. Retrieved from
- Jackson, Chris. (2017). Diversity in Book Publishing Doesn’t Exist — But Here’s How It Can (blog post). Retrieved from
- Kunaver, M., & Poztlz, T. (2017). Diversity in recommender systems: A survey. Knowledge-Based Systems, 123, 154-162.
- Low, Jason T. (2016, January 26). Where is the Diversity in Publishing? The 2015 Diversity Baseline Survey Results (blog post). Lee and Low Books: The Open Book Blog. Retrieved from http://blog.leeandlow.com/2016/01/26/where-is-the-diversity-in-publishing-the-2015-diversity-baseline-survey-results/
- McPherson, Tara. (2012). Why Are the Digital Humanities So White? or Thinking the Histories of Race and Computation. In M. K. Gold (Ed.), Debates in the Digital Humanities. Retrieved from http://dhdebates.gc.cuny.edu/debates/text/29
- Myers, Christopher. (2014, March 15). The Apartheid of Children’s Literature. The New York Times.