Quechua Real Words: An Audiovisual Corpus of Expressive Quechua Ideophones

Jeremy Browne (jeremy_browne@byu.edu), Brigham Young University, United States of America and Janis Nuckolls (cvd6262@gmail.com), Brigham Young University, United States of America

Introduction

Ideophones, sometimes called “mimetics” (Akita, 2009) or “expressives” (Diffloth, 1976), are expressions that communicate sensory aspects of the physical world, such as sound (i.e., onomatopoeia), movement, and color, or cognitive and emotional states (e.g., “ta-da” in English). Although most linguistic description of and inquiry into ideophones has focused on vocal expressions, gestures are integrated with ideophonic utterances in some languages. The analysis of these gestures and their symbolism may augment scholars’ understanding of the target language, including how native speakers mentally represent their environment.

In this paper, we describe a web-based tool, Quechua Real Words, used by linguists who study ideophones at [institution] to catalog and study multimedia representations of gestured ideophones as performed by native speakers of Pastaza Quichua. Research based on this tool is yielding new insight into the target language’s aesthetics, especially regarding the non-arbitrariness of gestured signs. We also discuss the relationship between this tool and other digital humanities efforts.

Project Background

The indigenous people of eastern Ecuador speak Pastaza Quichua (PQ), a dialect of Northern Quechua. Descended from the language of the Incan civilization, Quechua is still spoken by as many as 10 million people in the Andes region stretching from Ecuador in the north to Argentina in the south. In 2015, [second author], a linguistics professor at [institution], led a group of student researchers who spent one semester in Ecuador recording and appreciating the indigenous culture and language. This team filmed over a hundred hours of interviews with PQ speakers, including thousands of examples of PQ ideophonic gestures.

The team returned to [institution] daunted by the scope of archival work that lay between their raw footage and their research goals. In consultation with [institution]’s Office of Digital Humanities, they constructed a WordPress-based website that facilitated their archival activities, accelerated their research, and opened their work to a global audience.

The Website

Quechua Real Words uses custom content types within WordPress and simple data entry forms that allow students and professors with little computer experience to record ideophones and link entries to specific segments of recorded videos. The project’s footage is hosted on YouTube for simplicity and accessibility, and the data entry form only requests the video segment’s URL and start and stop times.

Quechua Real Words ideophone recording form.
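
The three values the form collects (a video URL, a start time, and a stop time) are enough to address one segment of a longer recording. The following is a minimal sketch, not the site's actual code: the `VideoSegment` shape and both function names are hypothetical, and the 11-character video ID pattern is an assumption about YouTube's current ID format. It relies only on the `start` and `end` parameters that YouTube documents for its embedded player.

```typescript
// Hypothetical shape of what the entry form collects for one video example.
interface VideoSegment {
  url: string;          // YouTube URL pasted by the researcher
  startSeconds: number; // where the ideophone example begins
  stopSeconds: number;  // where it ends
}

// Pull the video ID out of common YouTube URL forms
// (watch?v=ID, youtu.be/ID, /embed/ID). Assumes 11-character IDs.
// Returns null when no ID is found.
function youTubeId(url: string): string | null {
  const match = url.match(/(?:youtu\.be\/|[?&]v=|\/embed\/)([A-Za-z0-9_-]{11})/);
  return match !== null && match[1] !== undefined ? match[1] : null;
}

// Build an embed URL limited to the segment. The player's `start` and
// `end` parameters take whole seconds, so round outward to avoid
// clipping the example.
function embedUrl(segment: VideoSegment): string {
  if (segment.stopSeconds <= segment.startSeconds) {
    throw new Error("stop time must come after start time");
  }
  const id = youTubeId(segment.url);
  if (id === null) {
    throw new Error("not a recognizable YouTube URL");
  }
  const start = Math.floor(segment.startSeconds);
  const end = Math.ceil(segment.stopSeconds);
  return `https://www.youtube.com/embed/${id}?start=${start}&end=${end}`;
}
```

Storing only a URL and two timestamps keeps the form simple for contributors with little computer experience, while leaving the footage itself on YouTube.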

The entry form includes two other important features. First, the researcher may classify each ideophone by one or more “sensory modalities” (e.g., color, haptic, movement). Second, each scholar, whether professor or student, may add their name to the list of the entry’s contributors.

Once an ideophone is saved, it immediately appears on two indexes: the list of all ideophones, and the list of ideophones by modality. The first index allows researchers to look up specific ideophones, while the second promotes synthetic exploration where relationships between apparently unrelated ideophones can be made clear.
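
The modality index can be thought of as a simple grouping of entries by their tags. The sketch below is illustrative only: the `IdeophoneEntry` shape, the function name, and the placeholder words are hypothetical, and on the real site the grouping is presumably handled by WordPress itself rather than by code like this.

```typescript
// Hypothetical, simplified view of a saved ideophone entry.
interface IdeophoneEntry {
  word: string;
  modalities: string[]; // e.g. "color", "haptic", "movement"
}

// Group words under every modality they are tagged with, so that an
// entry with two modalities appears in both lists. Each list is
// sorted alphabetically.
function indexByModality(entries: IdeophoneEntry[]): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const entry of entries) {
    for (const modality of entry.modalities) {
      const words = index[modality] ?? [];
      words.push(entry.word);
      index[modality] = words;
    }
  }
  for (const modality of Object.keys(index)) {
    const words = index[modality];
    if (words !== undefined) {
      words.sort();
    }
  }
  return index;
}

// Placeholder words, not real PQ ideophones.
const sampleIndex = indexByModality([
  { word: "word-b", modalities: ["movement", "sound"] },
  { word: "word-a", modalities: ["movement"] },
]);
```

Because an entry is listed under each of its modalities, ideophones that look unrelated in the alphabetical index end up side by side here.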

Each ideophone page displays the pronunciation (in IPA format), definition, and other information one would expect from a traditional dictionary entry. It also shows a text description of the ideophone’s paralinguistic qualities and one or more videos of native speakers expressing the ideophone in candid conversation. These videos are segments of longer YouTube videos, and the segments may be looped, paused, and replayed. (Such functionality is not native to YouTube’s standard embedded player, so the site’s video player is a custom JavaScript component that connects to YouTube’s published API.)

An ideophone page from Quechua Real Words.
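
Looping a segment of a longer video comes down to checking the playback position and jumping back to the segment's start when it runs past the end. The following is a rough sketch under stated assumptions, not the site's actual player: `SeekablePlayer` and `keepInsideSegment` are hypothetical names, though `getCurrentTime` and `seekTo` match methods that YouTube's IFrame Player API exposes on its player object.

```typescript
// The minimal subset of a player this sketch relies on. YouTube's IFrame
// Player API provides methods with these names and argument orders.
interface SeekablePlayer {
  getCurrentTime(): number;
  seekTo(seconds: number, allowSeekAhead: boolean): void;
}

// Intended to be called periodically (for example from setInterval)
// while the video is playing. If playback has left the segment, jump
// back to its start. Returns true when a jump was performed.
function keepInsideSegment(
  player: SeekablePlayer,
  startSeconds: number,
  stopSeconds: number
): boolean {
  const now = player.getCurrentTime();
  if (now < startSeconds || now >= stopSeconds) {
    player.seekTo(startSeconds, true);
    return true;
  }
  return false;
}
```

Keeping the check separate from the timer and from the real player object makes it possible to exercise the looping logic with a stand-in player, without loading any video.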

At the supervising professor’s insistence, each ideophone page displays a “How to Cite” section with a citation in the Linguistic Society of America’s preferred format. To recognize the collaborative nature of the website, the credited parties in the citation include everyone who contributed to the entry, even students.

Research Potential

During the first two years of its existence, [second author] used Quechua Real Words for research published in a special issue of the Canadian Journal of Linguistics ([second author], 2017), in three presentations at international conferences ([second author], 2015a; [second author], 2015b; [second author], 2014), and in two invited book chapters ([second author], in press; [second author], in press). Additionally, the website’s content will inform an upcoming monograph ([second author], in preparation).

These publications focus on contextually rich methods of understanding PQ ideophones, comparing specific gestures and intonations between speakers and contexts, and discovering how the ideophones are integrated with, rather than distinct from, the language’s verbal aspects. As Akita and Tsujimura (2016) point out, the goal is to seek typological generalizations for ideophones rather than to consider them in isolation. [Second author] seeks to extend these integrative studies and semantic generalizations beyond the vocal utterances into the gestured space.

Quechua Real Words as a Model for DH Collaboration

When [second author] proposed this website to the Office of Digital Humanities, [she/he] had little notion that it would lead to such a level of scholarly productivity. It was only as [she/he] saw how the site could function that [she/he] began to grasp its potential. Similarly, [first author], the digital humanist who crafted the website, overlooked its potential because, quite frankly, the technology behind Quechua Real Words is rudimentary by the standards of most DH centers.

Perhaps [first author]’s estimation was clouded by the fact that DH as a field has favored text-based literary analysis over multimedia research. Despite the work of the ARTeFACT project (Coartney & Wiesner, 2009) and a few others who have considered digital analysis of performing arts, DH has contributed much less to the analysis of video interactions, such as these ideophones, than it has to the analysis of written text. Garrard, Haigh, and de Jager (2011) demonstrate the status quo for dealing with nonverbal communication in DH research: “…the recording and representation of various types of paralinguistic feature in transcription is somewhat idiosyncratic, and thus unreliable, suggesting that they should be removed in the interests of consistency.”

This lack of emphasis on paralinguistic and nonverbal communication is in spite of those features’ apparent value. “The nonverbal channel carries important information about emotional expressions… Systems that combine multiple modalities usually outperform single-modality systems in recognizing emotional” (Truong, Westerhof, Lamers, & de Jong, 2014). Unfortunately, even Truong et al. restricted their valuation of nonverbal channels to prosodic qualities such as timing and rhythm; they did not address issues of body language or gestures.

Regardless of why [first author] overlooked the website’s potential, [she/he] has since changed how [she/he] evaluates potential collaborative DH projects. [She/He] now evaluates the usefulness of the tools, websites, and other resources [she/he] would develop relative to the target discipline rather than relative to the state of the art within DH. This new approach has already proven fruitful ([first author], 2017).

Future Plans

While [second author] continues to leverage Quechua Real Words for [her/his] scholarship, [first author] has combed the DH literature to discover methods of extending the site’s capacity. One DH project that could guide this work is that of Paquette-Bigras and Forest (2014), who attempted to build a descriptive vocabulary for dance movements. A similar effort to construct a vocabulary for describing non-vocal expressions may reveal yet-unnoticed relationships between expressive gestures. This would require intense, non-automated markup of the gestures, but the Quechua Real Words website and the student-involved structure of [second author]’s courses would facilitate it. Such detailed modeling of the gestures would extend the modality-based clustering currently available on the website to include form-based clustering of the gestures.

Additionally, we are working with [institution’s library] to add Quechua Real Words to its federated search databases. This will make the site more discoverable to scholars and students throughout the world.


Appendix A

Bibliography
  1. Akita, K. (2009). A grammar of sound-symbolic words in Japanese: Theoretical approaches to iconic and indexical properties of mimetics. PhD dissertation, Kobe University.
  2. Akita, K. & Tsujimura, N. (2016). Mimetics. In T. Kageyama & H. Kishimoto (eds), Handbook of Japanese Lexicon and Word Formation, pp. 133–160. Berlin: De Gruyter Mouton.
  3. Coartney, J. S. & Wiesner, S. L. (2009). Performance as digital text: Capturing signals and secret messages in a media-rich experience. Literary and Linguistic Computing, 24(2), pp. 153–160. https://doi.org/10.1093/llc/fqp012
  4. Diffloth, G. (1976). Expressives in Semai. Oceanic Linguistics Special Publications, No. 13, Austroasiatic Studies Part I, pp. 249–264.
  5. Garrard, P., Haigh, A., & de Jager, C. (2011). Techniques for transcribers: Assessing and improving consistency in transcripts of spoken language. Literary and Linguistic Computing, 26(4), pp. 389–405. https://doi.org/10.1093/llc/fqr018
  6. Paquette-Bigras, E. & Forest, D. (2014). A Vocabulary of the Aesthetic Experience for Modern Dance Archives. Paper presented at DH 2014, Lausanne, Switzerland.
  7. Truong, K. P., Westerhof, G. J., Lamers, S. M. A., & de Jong, F. (2014). Towards modeling expressed emotions in oral history interviews: Using verbal and nonverbal signals to track personal narratives. Literary and Linguistic Computing, 29(4), pp. 621–636. https://doi.org/10.1093/llc/fqu041
  8. [The following references will be added following double-blind review:]
    [first author], 2017
    [second author], 2014
    [second author], 2015a
    [second author], 2015b
    [second author], 2017
    [second author], in press
    [second author], in press
    [second author], in preparation