A Tool to Visualize Data on Scientific Performance in the Czech Republic
The poster introduces a project to develop a visualization application for a unique data source on Czech sciences. Information Register of R&D Results (RIV) is the Czech Republic’s inventory of the outputs of basic and applied research since 1992. Although it is potentially an important source of data for analyses of various aspects of the intellectual organization and publication culture in Czech sciences, this particular data source has earned itself a pejorative nickname – “a coffee grinder” – for its central role in purely mechanistic science evaluation in the country.
By employing text-mining technique that are standard in the digital humanities and by getting inspiration from visualization platforms such as Voyant Tools (Sinclair and Rockwell 2012), the project aims to contribute to the shift in the Czech narrative of science evaluation from the exclusively bibliometric perspective to a more diverse one. For example, the hope is that the visual display of the plethora of topics that are discussed in the research outputs registered in RIV will implicitly criticize the myopic vision in which all disciplines are leveled to the singular measure of the number of publications. The latter system is not only intellectually dubious, but it has had documented adverse effects on the quality of research results. Crucially, it stimulates institutions as well as individuals to prioritize quantity over quality (Good et al. 2015; Grančay, Vveinhardt, and Šumilo 2017).
The ill-fated usage of the RIV data to mold nationwide fiscal policies for scientific research reminds us that data analytics is not necessarily a neutral enterprise. A proper treatment of the data is a matter that confronts a data analyst with questions on the borderline of ethics. Although it is perfectly feasible in technical terms, we wish to discourage users from attempts to track individuals researchers; instead we offer features that display institutional or disciplinary dimensions of the data (see Figure 1). Furthermore, the web application will provide a module to visualize textual information from the register. Textual strings, such as abstracts and keywords, have been part and parcel of the recorded entries, but have only served thus far as mere search terms. Meanwhile, the utility of textual data has been demonstrated in studies that strive to map the intellectual organization and relationships within and between disciplines (Leydesdroff 1989; Moody 2004).
Figure 1. Using RIVVIZ to visualize a trend in the publication frequency of research outputs in the “J” (journal) category of the Information Register of R&D Results for the discipline “Philosophy and Religion” [note: the data are only a sample used in the development version]
The target group of the application are the researchers themselves. Namely, the textual module is intended to serve their needs by providing an overview of the trending topics in research or to identify institutions working on similar problems. The specialist user sub-group is envisaged to come from the fields focusing on social and other studies of science. The accessibility of visualized data and the simplicity of the interface can also attract journalists or other members of the public. The prospective users are also likely to be recruit from among the stakeholders in scientific policy-making and management who may wish to gain quick insights into the quantitatively assessed rates of output per research institutions or funding bodies.
The RIVVIZ application is developed in the R language and deployed on the R Server platform using the standard Shiny library. The data are imported from the publicly available repository of the Czech Research, Development and Innovation Information System. The internal setup is also fairly straightforward, relying predominately on the Tidyverse collection of packages, with ggplot2 library being the primary engine for visualization tasks. The underlying principles of the “grammar of graphics”(Wickham 2009) are particularly suitable for programming a user-oriented environment that allows for a control over a wide range of visualization parameters.
Giving the users more choices should help to make them more engaged with the application, although there is a trade-off between user-friendliness and complexity. Reasonable defaults can partially alleviate this dilemma. The user engagement will be important for the future application development (Galey and Ruecker 2010). In the case of visualization schemes, locking users in a single – no matter how aesthetically pleasing – perspective is problematic. The apparent self-explanatory style and transparent communication of images may draw attention away from the complex and multifaceted nature of the data by making some of their aspects more easily accessible than others (Drucker 2011).
- Drucker, J. (2011). Humanities Approaches to Graphical Display. Digital Humanities Quarterly (DHQ), 5(1).
- Galey A. and Ruecker, S. (2010). How a Prototype Argues. Literary and Linguistic Computing, 25 (4): 405-424.
- Good, B., Vermeulen, N., Tiefenthaler, B. and Arnold, E. (2015). Counting Quality? The Czech Performance-Based Research Funding System. Research Evaluation 24 (2): 91–105.
- Grančay, M., Vveinhardt, J. and Šumilo, Ē. (2017). Publish or Perish: How Central and Eastern European Economists Have Dealt with the Ever-Increasing Academic Publishing Requirements 2000–2015. Scientometrics 111 (3): 1813– 37.
- Leydesdroff, L. (1989). Words and Co-Words as Indicators of Intellectual Organization. Research Policy 18 (4): 209–223.
- Moody, J. (2004). The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999. American Sociological Review 69 (2): 213–238.
- Sinclair, S., Rockwell, G. and the Voyant Tools Team. (2012). Voyant Tools (web application).
- Wickham, H. (2009). Ggplot2: Elegant Graphics for Data Analysis. Dordrecht: Springer.