Multimodal spoken unit discovery with paired and unpaired modalities
Wang, Liming
Permalink
https://hdl.handle.net/2142/121497
Description
Title
Multimodal spoken unit discovery with paired and unpaired modalities
Author(s)
Wang, Liming
Issue Date
2023-07-12
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark
Doctoral Committee Chair(s)
Hasegawa-Johnson, Mark
Committee Member(s)
Smaragdis, Paris
Schwing, Alexander
Fleck, Margaret
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
acoustic unit discovery
low-resource speech recognition
unsupervised speech recognition
multimodal learning
self-supervised learning
language acquisition
Abstract
This thesis addresses the challenge of low-resource speech recognition by formulating it as a multimodal learning problem. The goal is to build a multimodal spoken unit discovery system that does not require any textual transcripts. Instead, it leverages speech together with semantically related multimodal signals, such as paired images, unpaired text, and unpaired sign language videos. To this end, this thesis proposes several novel algorithms based on neural networks and probabilistic graphical models. Further, it provides theoretical insights and empirical evidence validating the efficacy of multimodal signals for spoken unit discovery.