Visualising The Digital Humanities Community: A Comparison Study Between Citation Network And Social Network

Jin Gao (jin.gao.13@ucl.ac.uk), UCL Centre for Digital Humanities, Department of Information Studies, University College London, United Kingdom and Julianne Nyhan (j.nyhan@ucl.ac.uk), UCL Centre for Digital Humanities, Department of Information Studies, University College London, United Kingdom and Oliver Duke-Williams (o.duke-williams@ucl.ac.uk), UCL Centre for Digital Humanities, Department of Information Studies, University College London, United Kingdom and Simon Mahony (s.mahony@ucl.ac.uk), UCL Centre for Digital Humanities, Department of Information Studies, University College London, United Kingdom

1. Introduction

An understanding of the intellectual and social structures of Digital Humanities (DH) has been sought by many scholars; some have pointed to the potential usefulness of quantitative methods in such analyses (McCarty, 2003; Terras et al., 2013). A few existing studies have applied quantitative methodologies to analyse publication, conference and social media data (e.g. Nyhan and Duke-Williams, 2014; Weingart, 2016; Grandjean, 2016). This study not only incorporates such approaches but extends them by integrating new analysis and visualisation methods into the wider study of DH’s intellectual and social structures.

In this paper, we will introduce research on the citation and social network of DH that is giving rise to new understandings of the field’s community structure; scholarly interactions; disciplinary development; and formal/informal communication channels. The citation network of Author Co-Citation Analysis (ACA) comprises 22,321 cited authors across 52,823 cited references from the three core DH journals, while the social network of Twitter Co-Retweet Analysis comprises 3,160 Twitter users and 5,929,609 tweets. To the best of our knowledge, this study is the first to combine bibliometric and social network methods to visualise and compare the DH communities and to uncover their histories. This research contributes to ongoing discussions and debates about the DH knowledge and community structures (Gold and Klein, 2016).

2. Data Analysis

2.1. Citation network

The Author Co-Citation network study was presented at DH2017, and this paper extends that earlier analysis. For clarity, we give a brief overview of it here. The network was constructed from the citation data of three core DH journals up to December 2016: Computers and the Humanities (CHum), Digital Scholarship in the Humanities (DSH) and Digital Humanities Quarterly (DHQ). In total, 2,582 articles with 52,823 cited references were collected (see Figure 1).

Figure 1. Number of articles collected each year (1966-2016)

Using the fractional non-self-citation count and the exclusive co-citation count (Zhao and Strotmann, 2008), we calculated the weights of the nodes and links respectively, and visualised them with the software VOSviewer 1.6.7 (van Eck and Waltman, 2010). An author name disambiguation method (Strotmann et al., 2009) was applied, and 22,321 unique cited authors were identified. Where possible, other pertinent information was collected (e.g. authors' full names, gender and country of affiliation). By counting how often two authors are cited together, ACA reveals the intellectual structure of DH, along with its influential topics and groups of scholars (Figure 2).

Figure 2. DH Author Co-Citation network
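
As a rough illustration of the two weights described above, the following Python sketch computes them from a toy input. It is not the study's actual pipeline. The function name `author_cocitation`, the input layout, and the precise counting rules are assumptions made for illustration: a cited work with n authors gives each author 1/n of a citation, a citation is skipped for any cited author who is also an author of the citing article, and two authors are counted as co-cited only when they appear in different cited works of the same article.

```python
from collections import defaultdict
from itertools import combinations


def author_cocitation(citing_articles):
    """Return (node_weight, link_weight) for a list of citing articles.

    Each article is a dict with 'authors' (names of the citing authors) and
    'references' (one list of author names per cited work). This input
    layout is hypothetical.
    """
    node_weight = defaultdict(float)  # fractional non-self-citation count
    link_weight = defaultdict(int)    # exclusive co-citation count

    for article in citing_articles:
        citing = set(article["authors"])
        refs_by_author = {}  # cited author -> indices of the works citing them

        for ref_idx, ref_authors in enumerate(article["references"]):
            ref_authors = list(dict.fromkeys(ref_authors))  # drop duplicates
            if not ref_authors:
                continue
            share = 1.0 / len(ref_authors)  # assumed fractional rule: 1/n each
            for author in ref_authors:
                if author in citing:  # assumed definition of a self-citation
                    continue
                node_weight[author] += share
                refs_by_author.setdefault(author, set()).add(ref_idx)

        # Count each pair at most once per citing article, and only if the
        # two authors can be found in two different cited works.
        for a, b in combinations(sorted(refs_by_author), 2):
            if len(refs_by_author[a] | refs_by_author[b]) > 1:
                link_weight[(a, b)] += 1

    return dict(node_weight), dict(link_weight)
```

Under these assumptions, two co-authors of a single cited work receive node weight from that work but no link between them, unless one of them is also cited through a different work in the same article.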

2.2. Twitter network

Given DH’s early adoption and active use of Twitter (Ross et al., 2011), previous studies have explored the field’s scholarly communications and community on Twitter (e.g. Quan-Haase et al., 2015; Grandjean, 2016). This study introduces a new approach (co-retweet) to visualise the DH social network.

We selected all the Twitter users followed by the Twitter accounts of the Alliance of Digital Humanities Organisations (ADHO) and its member organisations (see http://adho.org/). Because the DH community is dynamic and interdisciplinary, selecting users by their account descriptions is often difficult and subjective. In contrast, being followed by a DH organisation offers a more reliable and representative indication of membership in the community. A total of 3,160 unique users were collected, along with all of their approximately 6 million tweets (5,929,609), from 21/03/2006, when Twitter was created, up to 22:00 (GMT) on 5/11/2017 (see Figure 3).

Figure 3. Number of tweeting users collected each year

As with the citation network, the Twitter user Co-Retweet network was constructed from two measures: the number of non-self-retweets a user received (non-self-retweet count), which gives the weight of each node, and the number of identical tweets that two users both retweeted (co-retweet count), which gives the weight of each link. The resulting network visualisations are shown in Figure 4.

Figure 4. DH Twitter Co-Retweet network
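
The two co-retweet measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study. The function name `co_retweet_network` and the `(retweeter, original_author, tweet_id)` record layout are hypothetical, and two details are assumptions: repeated retweets of the same tweet by the same user are counted once, and self-retweets are left out of the link counts as well as the node counts.

```python
from collections import defaultdict
from itertools import combinations


def co_retweet_network(retweets):
    """Return (node_weight, link_weight) from retweet records.

    Each record is a (retweeter, original_author, tweet_id) tuple; this
    layout is hypothetical.
    """
    node_weight = defaultdict(int)           # non-self-retweet count per author
    retweeters_by_tweet = defaultdict(set)   # tweet_id -> users who retweeted it

    for retweeter, author, tweet_id in retweets:
        if retweeter == author:
            continue  # self-retweet: ignored (assumption applies to links too)
        if retweeter in retweeters_by_tweet[tweet_id]:
            continue  # same user retweeting the same tweet again: count once
        retweeters_by_tweet[tweet_id].add(retweeter)
        node_weight[author] += 1

    # Every tweet retweeted by both users adds one to the weight of their link.
    link_weight = defaultdict(int)
    for users in retweeters_by_tweet.values():
        for a, b in combinations(sorted(users), 2):
            link_weight[(a, b)] += 1

    return dict(node_weight), dict(link_weight)
```

Because a link only requires two users to have retweeted the same tweet, a single widely shared tweet connects every pair of its retweeters at once.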

3. Results and comparison

In the citation network, we identified distinct topic-based clusters of researchers with backgrounds in information studies and historical literature; in linguistics; in statistical text analysis; in early concordance projects; and in biotech-influenced text analysis. In contrast, the co-retweet network exhibits grouping based on language and region, with clusters of scholars in North America, in Australia and in the UK, and clusters with Francophone, Germanophone and Hispanophone backgrounds.

The Twitter clusters are closely connected, whereas the clusters in the citation network are more loosely linked. This makes sense, as topics of study are generally more specific and less likely to change, whereas users on social media probably share a wider range of interests. The citation network is based on formal communications, and it can take years to accumulate enough citations to form a link between two scholars. The Twitter network, however, is built from more informal interactions: as soon as two users retweet the same tweet, a link between them appears in the network.

By visualising both networks over different time periods, this study also presents the topics, disciplines and countries involved, and shows how the networks have formed and developed over time.

As shown in Figure 5, we divided the 51 years (1966-2016) of historical citation data into five periods (Hockey, 2004) for visualisation. The citation clusters went through isolation (1966-1970); connection (1971-1985); consolidation (1986-1990); the development of sub-fields (1991-2005); and the expansion of new specialties (2006-2016). Over time, the most cited topics moved from concordance construction to computational linguistics, and then to information and historical literature studies.

Figure 5. DH Author Co-Citation networks in five periods
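
The five-period split can be expressed as a simple lookup, sketched below in Python. The period boundaries and labels come from the text above. The helper names `period_for` and `split_by_period`, the `year` field, and the choice to assign each record by the publication year of the citing article are assumptions made for illustration.

```python
# Period boundaries and labels as given in the text (inclusive years).
PERIODS = [
    (1966, 1970, "isolation"),
    (1971, 1985, "connection"),
    (1986, 1990, "consolidation"),
    (1991, 2005, "sub-fields development"),
    (2006, 2016, "new specialties expansion"),
]


def period_for(year):
    """Return the label of the period containing `year`, or None if outside."""
    for start, end, label in PERIODS:
        if start <= year <= end:
            return label
    return None


def split_by_period(articles):
    """Group article records (dicts with a hypothetical 'year' key) by period."""
    groups = {label: [] for _, _, label in PERIODS}
    for article in articles:
        label = period_for(article["year"])
        if label is not None:
            groups[label].append(article)
    return groups
```

Each group could then be turned into its own network, giving one visualisation per period.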

As shown in Figure 6, DH Twitter users began to form co-retweet connections in 2009. The network then went through the following stages: the beginning of connection (2010); multi-region connection (2010); the Anglophone cluster moving to the centre (2011); the Francophone cluster developing (2012); the North American and UK clusters separating (2013); the Germanophone cluster appearing (2014); the Australian cluster appearing (2015); the Hispanophone cluster emerging (2016); and density continuing to move towards the North American cluster (2017). Over time, the network visualisations show density moving from the European clusters towards the North American cluster.

Figure 6. DH Twitter Co-Retweet networks in different years

4. Discussion and conclusion

This study is the first to contribute to DH history and community studies by visualising and comparing bibliometric and social networks. It also introduces a new network approach (co-retweet) for studying communications on social media, which could support wider social network and data visualisation studies.

As we will discuss, network studies offer powerful but partial ways of studying the aspects of communities that are amenable to quantitative methods. We do not present the visualisations included in this paper as normative representations of the DH “community” or “communities”. Nevertheless, when used with caution, network studies can shed new light on important aspects of the historical formation of DH.

The study has methodological limitations. For example, because the research subjects (cited authors and retweeting users) are not the same group of people, although they overlap considerably, obvious differences between the two networks are to be expected. In addition, citation lag time has been taken into account. Other practical methods of identifying and studying the DH Twitter communities could also be applied.


Appendix A

Bibliography
  1. Davis, L.S., Johns, S.A., Aggarwal, J.K. (1979). Texture Analysis Using Generalized Co-Occurrence Matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1, 251–259. https://doi.org/10.1109/TPAMI.1979.4766921
  2. Gold, M.K. and Klein, L.F. (2016). Debates in the digital humanities: 2016. University of Minnesota Press, Minneapolis London.
  3. Grandjean, M. (2016). A social network analysis of Twitter: Mapping the digital humanities community. Cogent Arts & Humanities, 3:1171458. https://doi.org/10.1080/23311983.2016.1171458
  4. Hockey, S. (2004). The History of Humanities Computing, in: Schreibman, S., Siemens, R., Unsworth, J. (2004). A Companion to Digital Humanities. Blackwell Publishing Ltd, Malden, MA, USA, pp. 1–19.
  5. McCarty, W. (2003). Humanities Computing, in: Encyclopedia of Library and Information Science. Marcel Dekker, New York.
  6. Nyhan, J. and Duke-Williams, O. (2014). Joint and multi-authored publication patterns in the Digital Humanities. Literary and Linguistic Computing, 29:387–399. https://doi.org/10.1093/llc/fqu018
  7. Quan-Haase, A., Martin, K., McCay-Peet, L. (2015). Networks of Digital Humanities Scholars: The Informational and Social Uses and Gratifications of Twitter. Big Data & Society, 2. https://doi.org/10.1177/2053951715589417
  8. Ross, C., Terras, M., Warwick, C., Welsh, A. (2011). Enabled Backchannel: Conference Twitter Use by Digital Humanists. Journal of Documentation, 67:214–237. https://doi.org/10.1108/00220411111109449
  9. Strotmann, A., Zhao, D., Bubela, T. (2009). Author name disambiguation for collaboration network analysis and visualization. Proceedings of the American Society for Information Science and Technology, 46:1–20. https://doi.org/10.1002/meet.2009.1450460218
  10. Terras, M.M., Nyhan, J., Vanhoutte, E. (2013). Defining Digital Humanities: A Reader. Ashgate Publishing Company, Farnham, Surrey, England: Burlington, VT.
  11. van Eck, N.J. and Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84:523–538. https://doi.org/10.1007/s11192-009-0146-3
  12. Weingart, S.B. (2016). dh quantified: A Review of Quantitative Analyses of the Digital Humanities. the scottbot irregular: data are everywhen.
  13. Zhao, D. and Strotmann, A. (2008). All-author vs. first-author co-citation analysis of the Information Science field using Scopus. Proceedings of the American Society for Information Science and Technology, 44:1–12. https://doi.org/10.1002/meet.1450440262