Exposing the hidden vocal channel: Analysis of vocal expression
Pietrowicz, Mary B.
Description
- Title
- Exposing the hidden vocal channel: Analysis of vocal expression
- Author(s)
- Pietrowicz, Mary B.
- Issue Date
- 2017-12-04
- Director of Research
- Karahalios, Karrie
- Hasegawa-Johnson, Mark
- Doctoral Committee Chair(s)
- Karahalios, Karrie
- Hasegawa-Johnson, Mark
- Committee Member(s)
- Hockenmaier, Julia
- McDonough, Jerome
- Cole, Jennifer S.
- Levow, Gina-Anne
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Signal processing
- Vocal expression
- Paralingual
- Emotion
- Voice quality
- Prosody
- Dimensional analysis
- Voice analytics
- Human-computer interaction (HCI)
- Perception
- Oral history
- Speech
- Speech processing
- Abstract
- This dissertation explored perception and modeling of human vocal expression, and began by asking what people heard in expressive speech. To address this fundamental question, clips from Shakespearian soliloquy and from the Library of Congress Veterans Oral History Collection were presented to Mechanical Turk workers (10 per clip), who were asked to provide 1-3 keywords describing the vocal expression in the voice. The resulting keywords described prosody, voice quality, nonverbal quality, and emotion in the voice, along with conversational style and personal qualities attributed to the speaker. More than half of the keywords described emotion, and these were wide-ranging and nuanced. In contrast, keywords describing prosody and voice quality reduced to a short list of frequently repeating vocal elements. Given this description of perceived vocal expression, a 3-step process was used to model the vocal qualities that listeners most frequently perceived. This process included 1) an interactive analysis across each condition to discover its distinguishing characteristics, 2) feature selection and evaluation via unequal variance sensitivity measurements and examination of means and 2-sigma variances across conditions, and 3) iterative, incremental classifier training and validation. The resulting models performed at 2-3.5 times chance. More importantly, the analysis revealed a continuum relationship across whispering, breathiness, modal speech, and resonance, and revealed multiple spectral sub-types of breathiness, modal speech, resonance, and creaky voice. Finally, latent semantic analysis (LSA) applied to the crowdsourced keyword descriptors enabled organic discovery of the expressive dimensions present in each corpus, and revealed relationships among perceived voice qualities and emotions within each dimension and across the corpora. The resulting dimensional classifiers performed at up to 3 times chance, and a second study presented a dimensional analysis of laughter. This research produced a new way of exploring emotion in the voice, and of examining relationships among emotion, prosody, voice quality, conversation quality, personal quality, and other expressive vocal elements. For future work, this perception-grounded fusion of crowdsourcing and LSA can be applied to anything humans can describe, in any research domain.
- Graduation Semester
- 2017-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/99356
- Copyright and License Information
- Copyright 2017 Mary Pietrowicz
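The abstract above describes applying latent semantic analysis (LSA) to the crowdsourced keyword descriptors in order to discover expressive dimensions in each corpus. The dissertation's actual pipeline, preprocessing, and parameters are not given in this record; the following is only a minimal, hypothetical sketch of LSA over per-clip keyword sets, using scikit-learn's TfidfVectorizer and TruncatedSVD with made-up example descriptors and an arbitrary number of components.

```python
# Hypothetical sketch: LSA over crowdsourced keyword descriptors.
# NOT the dissertation's actual pipeline; the descriptors, component
# count, and preprocessing here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Each "document" is the set of keywords workers gave for one audio clip.
clip_descriptors = [
    "breathy sad soft whisper",
    "resonant confident loud proud",
    "creaky tired slow hesitant",
    "fast excited happy bright",
]

# Bag-of-words over keywords, weighted by TF-IDF.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(clip_descriptors)

# Truncated SVD on the term-document matrix is the classic LSA step.
lsa = TruncatedSVD(n_components=2, random_state=0)
clip_dims = lsa.fit_transform(X)  # clips projected onto latent dimensions
terms = vectorizer.get_feature_names_out()

# Inspect which keywords load most heavily on each latent dimension.
for i, component in enumerate(lsa.components_):
    top = component.argsort()[::-1][:4]
    print(f"dimension {i}:", [terms[j] for j in top])
```

In a setup like this, the keywords that load together on a latent dimension suggest which perceived qualities co-occur in listeners' descriptions; clips can then be positioned along those dimensions via their projections.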
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Computer Science