Flexibility and Feedback in Digital Standards-Making: Unicode and the Rise of Emojis

S. E. Hackney (s.hackney@pitt.edu), University of Pittsburgh, United States of America

Background

The infrastructures that we use to navigate the world often become invisible as they become indispensable (Bowker and Star, 2000). However, critical examination of information systems is necessary to understand their implicit biases, and the ways that they invite some types of engagement and restrict others. Structures of power continue to be replicated in the ways that technologies are deployed in our lives (Noble, 2016; Tufekci, 2016), and the inability to access and assess the standards which make digital communication possible risks the uncritical perpetuation of those power structures (Drabinski, 2013). Moments of rupture, when an established system takes on a new facet with unintended consequences, can be important moments of visibility, in which we are able to reveal the system's ideological foundations and the ways that its users adapt their own behaviors to it, or push back against its uncomfortable constraints (Raley, 2006; Marino, 2007). The introduction of emojis to the Unicode Standard, and their widespread adoption between 2006 and 2017, is one such moment of transition.

Scholars of standards and standardization argue that the input of users is necessary for a standard to meet the needs of those users (Foray, 1994). While the process of adding content to the Unicode Standard remains rigid, the unicode.org website provides an explicit record of the development and evolution of the face that Unicode presents to its users. The site can therefore be read as a text which reveals the contemporary state of Unicode and the cultural ideologies which shape it.

Methodology

While major language- and script-based additions are made with each update to the Unicode Standard, my analysis focuses on changes to the unicode.org website, and its role as an intermediary document between the Consortium, the Standard itself, and everyday users. The introduction of emojis in various updates to the Standard has resulted in changes to the content and structure of the unicode.org website that reflect an increased engagement with end users, which I argue is the result of the increased semantic value of emoji characters for the user [1], as compared to an individual character in a language's written script. It is my intention, through this analysis, to describe the types of changes that happen to the governing body and public documents of Unicode as major changes happen to the Standard itself.

I create a timeline of the dates of major updates to the Unicode Standard since its introduction in 1991, using the official release dates maintained by the Unicode Consortium. I then cross-reference this timeline with the rollout of each new version by the major platforms [2], with a particular emphasis on updates featuring new emoji characters, beginning with Unicode 6.0 in 2010 [3].

With this timeline in mind, I scrape the unicode.org domain using Python and the Beautiful Soup library [4] to collect the URLs of all the unique pages under the parent domain, as well as a table of links between those pages. This table serves as a source-target list for the creation of a network visualization of the unicode.org domain, using the network visualization software Gephi [5]. This process is repeated using archived versions of the unicode.org site, available from the Internet Archive’s Wayback Machine [6], resulting in several structural snapshots of the unicode.org website over time, which can then be overlaid and compared to one another to note particular areas of change within the site.
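The link-collection and comparison steps described above can be sketched roughly as follows. This is an illustrative sketch, not the script used in this study: it substitutes Python's standard-library `html.parser` for Beautiful Soup so that it runs without third-party packages, it works on HTML that has already been downloaded, and the names `LinkCollector`, `internal_edges`, `edges_to_csv`, and `compare_snapshots` are hypothetical. The `Source` and `Target` column headers are the ones Gephi's spreadsheet import looks for in an edge table.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_edges(page_url, html, domain="unicode.org"):
    """Return sorted (source, target) pairs for links that stay inside the domain."""
    collector = LinkCollector()
    collector.feed(html)
    edges = set()
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so each page is one node.
        target, _fragment = urldefrag(urljoin(page_url, href))
        host = urlparse(target).hostname or ""
        if host == domain or host.endswith("." + domain):
            edges.add((page_url, target))
    return sorted(edges)


def edges_to_csv(edges):
    """Serialize edges as a Source,Target table suitable for import into Gephi."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


def compare_snapshots(old_edges, new_edges):
    """Return (added, removed) edges between two structural snapshots."""
    old, new = set(old_edges), set(new_edges)
    return sorted(new - old), sorted(old - new)
```

Running `internal_edges` over each page of a live or archived copy of the site, and passing two resulting edge lists to `compare_snapshots`, would give a first indication of where links were added or removed between snapshots.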

Additionally, using points of change within the site structure as a guide, I collect and code page content data to reflect the type of changes made to those pages during each major update. This coding is done on two axes: the first labels each change as being content- or structure-based (e.g., adding text or links to a page, respectively), and the second designates which aspect of the Standard and/or Consortium is being addressed by the change. Examples of this second type of labelling would be “Emoji,” “Membership,” “Meta-Documentation,” or “Language Scripts.” This coding is done in two phases: an initial survey of the data in order to formally create labelling categories, and then a closer examination of the updates to apply those labels.
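One way to picture the two-axis coding scheme is as a small record type, sketched below. This is an assumption about how the coded data might be stored, not the instrument used in this study: the names `ChangeKind`, `CodedChange`, and `tally` are hypothetical, and the topic labels are only the four examples given above, since the full set of categories is to be established in the first coding phase.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    """First axis: what sort of change was made to the page."""
    CONTENT = "content"      # e.g. text added to a page
    STRUCTURE = "structure"  # e.g. links added to a page


# Second axis: example labels from the text. The final categories would
# come out of the initial survey phase, so this set is a placeholder.
TOPIC_LABELS = {"Emoji", "Membership", "Meta-Documentation", "Language Scripts"}


@dataclass(frozen=True)
class CodedChange:
    page_url: str
    unicode_version: str
    kind: ChangeKind
    topic: str

    def __post_init__(self):
        # Reject labels that were not defined during the survey phase.
        if self.topic not in TOPIC_LABELS:
            raise ValueError(f"unknown topic label: {self.topic!r}")


def tally(changes):
    """Count coded changes per (Unicode version, kind, topic) combination."""
    return Counter((c.unicode_version, c.kind.value, c.topic) for c in changes)
```

Keeping the two axes as separate fields means the counts can later be grouped by either axis alone, for instance to see how the share of “Emoji” changes shifts from one update to the next.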

Discussion and next steps

This research project addresses issues of digital infrastructure from a unique angle: one that considers the socially-constructed nature of technology, as well as the meta-narrative of maintenance and upkeep of a system that has become crucial to our ability to communicate in a digital world. Through analysis of the secondary documents relating to the Unicode Standard, it is possible to gain invaluable insights into the ways that knowledge is organized collectively and continuously, as well as the embedded values that shape who can access and influence that knowledge.

This case study will provide a foundation for more expansive examination of systems of digital infrastructure. It is a starting point for further analysis of the adoption and adaptation of Unicode (and emojis in particular), and a framework for examining other forms of scaffolding which uphold the content of digital spaces.


Appendix A

Bibliography
  1. Bowker, G. C., and Star, S. L. (2000). Sorting things out: Classification and its consequences. Cambridge: MIT Press.
  2. Drabinski, E. (2013). Queering the catalog: queer theory and the politics of correction. The Library Quarterly 83(2): 94-111. doi:10.1086/669547
  3. Foray, D. (1994). Users, standards and the economics of coalitions and committees. Information Economics and Policy, 6(3): 269-293.
  4. Marino, M. C. (2007, December 4). Critical code studies. Electronic Book Review. Retrieved from http://electronicbookreview.com/thread/electropoetics/codology
  5. Noble, S. U. (2016). A future for intersectional black feminist technology studies. The Scholar & Feminist Online, 13.3-14.1. Retrieved from http://sfonline.barnard.edu/traversing-technologies/safiya-umoja-noble-a-future-for-intersectional-black-feminist-technology-studies/0/
  6. Raley, R. (2006). Code.surface || Code.depth. Dichtung Digital. Retrieved from http://www.dichtung-digital.org/2006/01/Raley/index.htm
  7. Tufekci, Z. (2016, June). Machine intelligence makes human morals more important. [Video file]. Retrieved from https://www.ted.com/talks/zeynep_tufekci_machine_intelligence_makes_human_morals_more_important

Notes
  1. A notable exception to this semantic shift is written Chinese, which is already a semantic-character-based language, as opposed to syllable- or alphabet-based, as are the rest of the world’s major languages. Thomas S. Mullaney gives a thorough historical analysis of the implication of this on text-encoding technologies in The Chinese Typewriter (MIT Press, 2017).
  2. https://unicode.org/emoji/format.html#col-vendor lists the major “vendors” of emojis, or platforms with proprietary visual displays of emojis. These vendors are Apple, Google, Twitter, Facebook, Facebook Messenger, Windows, and Samsung.
  3. While the first major batch of emojis were incorporated into Unicode in 2010, and the first official “Emoji 1.0” release was in 2015, work has been done within the standard since late 2006 to consider the addition and management of emoji-like characters within Unicode, hence the specific 2006-2017 emphasis of this research. (https://www.unicode.org/reports/tr51/#Introduction)
  4. https://www.crummy.com/software/BeautifulSoup/
  5. http://gephi.io
  6. https://web.archive.org/