Multimodal spoken unit discovery with paired and unpaired modalities

Wang, Liming

Permalink

https://hdl.handle.net/2142/121497

Description

Title

Multimodal spoken unit discovery with paired and unpaired modalities

Author(s)

Wang, Liming

Issue Date

2023-07-12

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark

Doctoral Committee Chair(s)

Hasegawa-Johnson, Mark

Committee Member(s)

Smaragdis, Paris
Schwing, Alexander
Fleck, Margaret

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

acoustic unit discovery
low-resource speech recognition
unsupervised speech recognition
multimodal learning
self-supervised learning
language acquisition

Abstract

This thesis addresses the challenge of low-resource speech recognition by formulating it as a multimodal learning problem. The goal is to build a multimodal spoken unit discovery system that does not require any textual transcripts. Instead, it leverages speech and semantically related, multimodal signals such as paired images, unpaired text and unpaired sign language videos. To this end, this thesis proposes several novel algorithms based on neural networks and probabilistic graphical models. Further, it provides theoretical insights and empirical evidence to validate the efficacy of multimodal signals for spoken unit discovery.

Graduation Semester

2023-08

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/121497

Owning Collections

Graduate Dissertations and Theses at Illinois (primary)

Graduate Theses and Dissertations at Illinois
