Seinfeld at The Nexus of the Universe: Using IMDb Data and Social Network Theory to Create a Digital Humanities Project

Cindy Conaway (, SUNY Empire State College, United States of America and Diane Shichtman (, SUNY Empire State College, United States of America

This Digital Humanities project is an interdisciplinary project effort that uses the lens of, and data from, the U.S. TV show 
Seinfeld to explore questions about television and other media. 
has significant cultural influence over other media, but what is its
reach, meaning the many other media items cast and crew worked on, also known as the
overlap? We are starting with data from the Internet Movie Database (IMDb). This makes this project somewhat different from other Digital Humanities projects as we’re using an existing database rather than primary sources. An associate professor of media studies, accustomed to conducting critical analysis of television shows, and an associate professor of information systems, more used to working with non-media studies data, are working to populate a relational database, to use quantitative analysis, and a social science theory–social network theory, particularly “Small Worlds” theory–to explain trends in media industries, including questions of genre, gender, race, and age in entertainment businesses.

Seinfeld (NBC 1989-1998) was a US-based half-hour, multi-camera, situation comedy, one of several that featured stand-up comics in stories similar to their own lives. Although it ended nearly 20 years ago, it

heavily influences TV shows of today, including “hangout” sitcoms, one-camera comedies featuring conversation and digression, and antihero dramas. Journalist Jennifer Keishen Armstrong writes in the bestselling
that the show “snuck through the network system to become a hit that changed TV’s most cherished rules; from then on, antiheroes would rise to prominence, unique voices would invade the airwaves, and the creative forces behind shows would often gain as much power and fame as the faces in front of the cameras” (Armstrong, 2016). It’s a singularly important show for a variety of reasons.

has significant cultural impact on other shows and movies, but what we wanted to know is, what is its ’reach’? Reach is defined as other media that texts cast and crew from


worked on before, during, and after their appearance(s) on the show. Such texts exist in every media type (movies, video games, web-based media). When two media items share cast/crew, we look for overlap.

Dr. Conaway worked on the project for two years, using cut and paste and Excel spreadsheets for items and people, before involving Dr. Shichtman, who has created a relational database that may be searched. We first used MySql and an Amazon Web Services server, have recently shifted to the college’s virtual machine and the Oracle database management system. We involved two students in a grant funded practicum in the Fall term as well.

Our research revealed that the 1551 cast/crew had worked on over 32,500 other discrete media texts, starting in 1936, and with many texts still on the air today, often with an overlap of more than one. Nearly every televison series, TV movie, and TV special we could think of included overlap. Only recently, in “peak TV”—in which there are over 500 scripted TV shows in production this year alone, in addition to reality, sports, and news shows (many of which also have overlap)—are we seeing well-known US TV series with no overlap. Our research found that although most were US-based, there were media items from over 60 countries.

Social network theory would help us answer some questions. As Duncan Watts writes in
Six Degrees: the Science of a Connected Age
, “Affiliation networks . . . are . . . networks of overlapping cliques, locked together via the comembership of individuals in multiple groups” (Watts, 2004). Small worlds theory discusses how networks of people influence each other, and each others’ connections.

Questions include, what genres did the cast/crew, presumably chosen for a common comic sensibility, work on other than comedy? What genres included the most cast/crew? What genres have less overlap, none at all, and what might be some reasons for that? What is the importance of gender, race, and age?

We looked for other, similar projects that used IMDb and found that there were few that did. Some computer scientists had used IMDb to trace the overlaps among actors involved in ’adult‘ films in the database as an example of a ’small world‘ environment. Media History scholars had traced ’race films‘ that ended before our database started, and Digital Humanities scholars used it to look at patterns of exhibition of films or specifically how Australians worked together, but not to examine how cast circulated among media.

IMDb, it turns, out, is a challenging tool for this purpose.
Deb Verhoeven, Associate Dean of Engagement and Innovation of the University of Technology Sidney, who has done a lot of Digital Humanities work on Australian films explained in 2012 that IMDb consists of “elaborated sets of lists” created by fans, writing:

Accordingly, the primary users of filmographic catalogues are not cinema historians, information managers, analytical filmographers, or cinema scholars, but members of the public, film buffs, students and so on who are content to navigate these databases using the small number of structured search fields provided. (Verhoeven, 2012)

IMDb, which started in the early 1990s, is very robust, and provides information for free download using Python, but is not usable ’as is.’ Entries may be misleading, incomplete, or unclear, with genres in particular organized in unhelpful ways. The Downloadable information includes the full cast and some types of crew members, but not others. In addition, the fields of the two faculty members made shared vocabulary difficult, and getting complete and clean data that could be turned into tables and graphs meant conducting additional research outside of IMDb, and reorganizing the data significantly from the way Dr. Conaway initially tagged it. SUNY Empire State College also lacks the structures that many institutions have for conducting Digital Humanities work.

However, we have been able to create some early data visualizations that will show a microcosm of how the US entertainment industry works for various types of actors and crew members, using specifically the data from television programs. We’ve compared
Seinfeld’s numbers of actors and crew to that of other shows, analyzed how the media items break down by genre, and visualized how women’s careers wax and wane in different patterns from men’s careers. In the future we will do the same for subgenres, actors of color, and actors of various age groups.

Appendix A

  1. Armstrong, J.K., 2016.
    Seinfeldia: How a Show about Nothing Changed Everything. Simon and Schuster.
  2. Gold, M.K. 2012.

    Debates in the Digital Humanities
    . University of Minnesota Press.
  3. Gold, M.K. and Klein, L.F., 2014.

    Debates in the Digital Humanities
    . University of Minnesota Press.
  4. Lavery, D. And Dunne, S.L. 2006.
    Seinfeld, Master Of its Domain. New York: Continuum.
  5. Verhoeven, D.
    New cinema history and the computational turn. Beyond Art, Beyond Humanities, Beyond Technology: A New Creativity, Proceedings Of the World Congress Of Communication and the Arts Conference, University Of Minho, Portugal. 2012
  6. Watts, D.J. 2004.
    Six Degrees: The Science Of a Connected Age. WW Norton & Company.

Leave a Comment