To Catch a Protagonist: Quantitative Dominance Relations in German-Language Drama (1730–1930)

Frank Fischer (frafis@gmail.com), National Research University Higher School of Economics, Moscow, Russian Federation und Peer Trilcke (trilcke@uni-potsdam.de), University of Potsdam und Christopher Kittel (contact@christopherkittel.eu), University of Graz und Carsten Milling (cmil@hashtable.de), National Research University Higher School of Economics, Moscow, Russian Federation und Daniil Skorinkin (dskorinkin@hse.ru), National Research University Higher School of Economics, Moscow, Russian Federation

1. Related Studies

The idea that "quantitative dominance relations" represent an "important parameter for the central or peripheral position of a character" in a drama has been established by Pfister in his crucial structuralist monograph on the analysis of drama (Pfister 1997, p. 227). Digital-empirical studies after Pfister have tested different approaches to provide quantitative descriptions of the dramatis personae. Moretti's suggestion to tie the detection of the protagonist to the network analytical criterion of average distance (Moretti 2011, p. 4) was rejected as too simplistic (Trilcke 2013, p. 204), although this network-analytical approach was taken up by numerous studies. In this vein, Jannidis et al. (2016) not only calculated quantitative measures for the frequency of a character's appearance, but also the weighted degree to determine the accuracy of the identification of main characters. Moretti himself has adjusted his approach conceptually, insofar as he has shifted the focus from the 'protagonist' to a relationally defined concept of 'centrality'. He also emphasised the tension between different criteria (like word space and character space) not as a deficit, but as a productive factor of a multidimensional quantitative analysis of the dramatis personae (Moretti 2013, pp. 5–9). Moretti's basic ideas – the productive multidimensionality of quantitative analysis and the insight into the relational conceptualisation of quantitative character classification – have been taken up by Algee-Hewitt (2017), who worked with two network-analytical centrality measures (betweenness centrality and eigenvector centrality) and tried to examine the quantitative distribution of the cast of a play.

2. Goal and Procedure

We will conceptually discuss and complement available approaches to the quantitative description of characters in dramatic texts and test them on the basis of a corpus of 465 German-language dramas. The aim in theoretical and conceptual terms is to gain a better understanding of the dimensions of quantitative character analysis and to present diachronic and typological insights into quantitative dominance relations in German-language drama from 1730 to 1930. The subject of the analyses is the DLINA corpus (Fischer & Trilcke 2015). The data is calculated using the Python tool "dramavis", which has been supplemented for this purpose with new analysis modules (Kittel & Fischer 2017).

In the first step, we examine the multidimensionality of quantitative descriptions as determined by Moretti (chapter 3). In a second step, we will take up Algee-Hewitt's (2017) proposed approach of working with quartiles and discuss it on the basis of the data from the DLINA corpus (chapter 4). In the third step, we present an approach that describes the quantitative distribution of characters in a play (section 5).

3. The Correlation of Count-Based and Network-Based Rankings of Characters

The above-mentioned approaches have made use of various measures for the quantitative description of dramatic characters, which can be divided in two groups: count-based measures, such as the number of words expressed by a character, and network-based measures, mostly centrality measures. According to Moretti 2013 (cf. also Jannidis et al. 2016), these two descriptive 'dimensions' can differ considerably.

In order to systematically describe the extent of this deviation, we calculate eight values for each character of the 465 dramas of our corpus, three count-based measures (number of scenes a character appears in, number of speech acts, number of spoken words) and five network-related measures (degree, weighted degree, betweenness centrality, closeness centrality, eigenvector centrality). For each measurement a ranking is created. The rankings are then merged into two meta-rankings: one count-based and one network-based. The two meta-rankings are then combined into an overall ranking.

To determine the deviation between the two meta-rankings, we calculate the ranking correlation coefficient Spearmans Rho and check how strongly the two meta-rankings correlate with each other for all dramas of our corpus (fig. 1).

Spearmans Rho for the correlation of count-based and network-based measures.
Figure 1. Spearmans Rho for the correlation of count-based and network-based measures.

Complete congruence of the meta-rankings is an exception. In fact, the different measures capture different 'dimensions' of the quantitative character hierarchy. In order to better understand these dimensions, five dramas (marked by the green dots in fig. 1) are examined in more detail and discussed in this paper (see figs. 2, 4, 6, 8, 10 for rankings, figs. 3, 5, 7, 9, 11 for the network graphs of these dramas).

Rankings for Lessing's
Figure 2. Rankings for Lessing's "Emilia Galotti" (1772) – Spearmans Rho: 0,917.
Network graph for Lessing's
Figure 3. Network graph for Lessing's "Emilia Galotti" (1772).
Rankings for Iffland's
Figure 4. Rankings for Iffland's "Das Erbtheil des Vaters" (1802) – Spearmans Rho: 0.806.
Network graph for Iffland's
Figure 5. Network graph for Iffland's "Das Erbtheil des Vaters" (1802).
Rankings for Schiller's
Figure 6. Rankings for Schiller's "Die Jungfrau von Orleans" (1801) – Spearmans Rho: 0.672.
Network graph for Schiller's
Figure 7. Network graph for Schiller's "Die Jungfrau von Orleans" (1801).
Rankings for Anzengruber's
Figure 8. Rankings for Anzengruber's "Der Meineidbauer" (1871) – Spearmans Rho: 0.442.
Network graph for Anzengruber's
Figure 9. Network graph for Anzengruber's "Der Meineidbauer" (1871).
Rankings for Wedekind's
Figure 10. Rankings for Wedekind's "Franziska" (1912) – Spearmans Rho: -0.222.
Network graph for Wedekind's
Figure 11. Network graph for Wedekind's "Franziska" (1912).

As it turns out, the deviations usually affect characters at the bottom of the hierarchy. Here the network-related measures are particularly sensitive to types of clustering in secondary scenes of a drama (figs. 8 and 9), which has even more severe effects if a drama is quantitatively dominated by very few characters (figs. 10 and 11). On the other hand, both meta-rankings are very similar for the quantitatively dominant characters (top 1 and top 1 or 2). These observations can serve as an argument that the multidimensional description is less relevant to discuss protagonists, but rather for the characterisation of quantitative dominance relations within the cast as a whole.

4. Percentage of Quantitative Dominant Characters

Algee-Hewitt 2017 made a suggestion for the characterisation of the quantitative distribution of a cast, albeit with a continuing focus on quantitatively dominant characters. Working with an English-language drama corpus of several thousands of plays, he calculated the eigenvector centrality of the characters and then calculated the percentage of characters located in the upper quartile of the distribution. We have reproduced this test with our corpus – for all measures mentioned above. As an example, the box plots show the values for the eigenvector centrality (fig. 12) as well as for the count-based measure 'number of words' (fig. 13).

Percentage of characters in the upper quartile according to their eigenvector centrality, grouped by decades. Blue line: median.
Figure 12. Percentage of characters in the upper quartile according to their eigenvector centrality, grouped by decades. Blue line: median.
Percentage of upper quartile characters according to number of words, grouped by decades. Blue line: median.
Figure 13. Percentage of upper quartile characters according to number of words, grouped by decades. Blue line: median.

It is interesting to note that the values we calculated for eigenvector centrality (fig. 12) are significantly higher than the values presented by Algee-Hewitt (for the period of time covered by our corpus the median is lower than 0.15). A comparison of fig. 12 and fig. 13 shows that the network-related measures usually locate a larger percentage of characters in the upper quartile. From a network-analytical point of view, a drama tends to be dominated by several characters. The two curves also tend to follow the same course, especially regarding the big flattening curve in the decades after 1740. What we can see there is the reduction of the percentage of quantitatively dominant characters ('main characters') and the emergence of quantitatively less dominant characters ('secondary characters', 'atmospheric characters').

5. More Detailed Distribution Analysis

With a focus on quantitatively dominant 'main characters', Algee-Hewitt's approach attempts to describe types of distribution of the dramatis personae and thus to identify 'dominance relations' via distribution analyses. Following analyses of the 'small world' phenomenon in drama (Trilcke et al. 2016), we propose to extend this approach and develop a typology of quantitative distribution of characters in dramatic texts. To this end, we take the above-mentioned eight quantitative measures, calculate the distribution of character values across deciles and subject this distribution to a regression analysis (tests on linear, exponential, quadratic and power-law distribution; typologisation according to the regression line with the highest coefficient of determination). Examples in figs. 14 to 19 show the results for three of the dramas discussed above, for a network-related measure (weighted degree) and a count-based measure (number of words).

Lessing's
Figure 14. Lessing's "Emilia Galotti" (1772) – decile distribution of weighted degree, y-axis: number of characters.

Lessing's
Figure 15. Lessing's "Emilia Galotti" (1772) – decile distribution of number of words, y-axis: number of characters.
Anzengruber's
Figure 16. Anzengruber's "Der Meineidbauer" (1871) – decile distribution of weighted degree, y-axis: number of characters.

Anzengruber's
Figure 17. Anzengruber's "Der Meineidbauer" (1871) – decile distribution of number of words, y-axis: number of characters.
Schiller's
Figure 18. Schiller's "Die Jungfrau von Orleans" (1801) – decile distribution of weighted degree, y-axis: number of characters.

Schiller's
Figure 19. Schiller's "Die Jungfrau von Orleans" (1801) – decile distribution of number of words, y-axis: number of characters.

These typologies are calculated for all eight values and for all 465 dramas – we will present the results for the corpus as a whole at the conference. With our approach, a more precise, multidimensional description of typical quantitative dominance relations in drama will be possible. The increase in the number of less dominant characters observed on the basis of our quartile analysis (figs. 12–13) will be described with more precision and supplemented by a more differentiated examination of types of 'middle characters'.

6. Summary

This talk brings together several approaches for the quantitative analysis of characters in literary texts, discusses the potential of a multidimensional description beyond top characters (protagonists) and suggests an approach for typologising quantitative dominance relations within the cast of a drama.


Appendix A

Bibliography
  1. Algee-Hewitt, Mark (2017): "Distributed Character: Quantitative Models of the English Stage, 1500–1920". Digital Humanities 2017. Conference Abstracts. McGill University & Université de Montréal, 119–121. URL: < https://dh2017.adho.org/abstracts/103/103.pdf >.
  2. Fischer, Frank / Trilcke, Peer (2015): "Introducing dlina Corpus 15.07 (Codename: Sydney)". dlina blog, 20 June 2015. URL: < https://dlina.github.io/Introducing-DLINA-Corpus-15-07-Codename-Sydney/ >.
  3. Kittel, Christopher / Fischer, Frank (2017): "dramavis (v0.4)". GitHub, September 2017. URL: < https://github.com/lehkost/dramavis >.
  4. Jannidis, Fotis / Reger, Isabella / Krug, Markus / Weimer, Lukas / Macharowsky, Luisa / Puppe, Frank (2016): "Comparison of Methods for the Identification of Main Characters in German Novels". Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, Kraków, 578–582.
  5. Moretti, Franco (2011): Network Theory, Plot Analysis (= Literary Lab Pamphlet 2), May 2011. URL: < http://litlab.stanford.edu/LiteraryLabPamphlet2.pdf >.
  6. Moretti, Franco (2013): "Operationalizing": or, the function of measurement in modern literary theory (= Literary Lab Pamphlet 6), December 2013. URL: < https://litlab.stanford.edu/LiteraryLabPamphlet6.pdf >.
  7. Pfister, Manfred (1997): Das Drama. Theorie und Analyse. 9th ed. Munich: Fink.
  8. Trilcke, Peer (2013): "Social Network Analysis (SNA) als Methode einer textempirischen Literaturwissenschaft". Ajouri, Philip / Mellmann, Katja / Rauen, Christoph (eds.): Empirie in der Literaturwissenschaft. Münster: mentis, 201–247.
  9. Trilcke, Peer / Fischer, Frank / Göbel, Mathias / Kampkaspar, Dario (2016): "Theatre Plays as 'Small Worlds'? Network Data on the History and Typology of German Drama, 1730–1930". Digital Humanities 2016. Conference Abstracts. Jagiellonian University & Pedagogical University, Kraków, 385–387.