Speech classification and lexical semantic modeling via self-supervision and knowledge transfer
Harvill, John Bowman
Permalink
https://hdl.handle.net/2142/124335
Description
- Title
- Speech classification and lexical semantic modeling via self-supervision and knowledge transfer
- Author(s)
- Harvill, John Bowman
- Issue Date
- 2024-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Doctoral Committee Chair(s)
- Hasegawa-Johnson, Mark
- Committee Member(s)
- Ahuja, Narendra
- Hockenmaier, Julia
- Shomorony, Ilan
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Self-supervision, knowledge transfer, speech classification, lexical semantics
- Abstract
- The field of speech and natural language processing has experienced dramatic progress over the past decade due to a major paradigm shift. Instead of using training data for a target task only, modern speech and text applications rely on pretraining as the first step. After pretraining a model on a certain task, or potentially multiple tasks, the knowledge that was learned can be transferred to a downstream task and lead to large performance gains. Given that labeled data is much more challenging to collect than raw speech or text, the most explosive growth in the field has come from discovering effective ways to perform pretraining in a self-supervised fashion. By cleverly manipulating a raw speech waveform or raw text, it is possible to learn an immense amount of information without requiring annotations from humans. In this dissertation, I explore several speech and text tasks that benefit from self-supervision and knowledge transfer. For speech, I demonstrate that for both the stutter detection and device arbitration problems, tailored self-supervised pretraining schemes can be developed that lead to significant performance gains compared to relying on labeled data only. For stutter detection, I propose the idea of creating artificial stuttered speech from healthy speech and using it for pretraining. I also show that knowledge of whether stuttering occurs somewhere within a window of several seconds of speech audio can be used to learn the location of stuttering to a much finer degree via multiple instance learning. For device arbitration, I show that contrastive learning and autoencoding can both create useful representations of acoustic information that improve the ability of an arbitration system to determine which voice assistant is closest to a user. In the text domain, I explore lexical semantic modeling, exemplification modeling, and Automatic Speech Recognition (ASR) error detection and correction. As with the speech tasks, I find that all text-based tasks can be improved via knowledge transfer, self-supervision, or a combination of the two. For lexical semantic modeling, I propose a graph-based solution and find that knowledge from many languages is required to perform well on any single language. For exemplification modeling, I propose an autoencoding technique that can effectively isolate information related to the contextual meaning of a target polysemous word and generate new, diverse sentences using that word with the intended meaning. For ASR error detection and correction, I show that significant errors can be detected with a high degree of accuracy by combining knowledge from both a sentence-level semantic encoder and a Large Language Model (LLM), and I highlight the existence of statistical bias within correction and detection models.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 John Harvill
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)