A theory of (almost) zero resource speech recognition

Bharadwaj, Sujeeth Subramanya

Permalink

https://hdl.handle.net/2142/78343

Description

Title

A theory of (almost) zero resource speech recognition

Author(s)

Bharadwaj, Sujeeth Subramanya

Issue Date

2015-03-31

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark A.

Doctoral Committee Chair(s)

Hasegawa-Johnson, Mark A.

Committee Member(s)

Levinson, Stephen E.
Liang, Feng
Smaragdis, Paris

Department of Study

Electrical and Computer Engineering

Discipline

Electrical and Computer Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2015-07-22T22:16:25Z

Keyword(s)

Speech recognition
Unsupervised learning
PAC-Bayesian theory
Language Modeling
Acoustic Event Detection
anomaly detection

Abstract

Automatic speech recognition (ASR) has matured into a commercially successful technology, enabling voice-based interfaces for smartphones, smart TVs, and many other consumer devices. Its success, however, is still limited to languages such as English, Japanese, and German, for which vast amounts of labeled training data are available. For most other languages, it is prohibitively expensive to 1) collect and transcribe the speech data required to learn good acoustic models; and 2) acquire adequate text to estimate meaningful language models. A theory of unsupervised and semi-supervised techniques for speech recognition is therefore essential. This thesis focuses on HMM-based sequence clustering and examines acoustic modeling, language modeling, and applications beyond the components of an ASR system, such as anomaly detection, from the vantage point of PAC-Bayesian theory. The first part of this thesis extends standard PAC-Bayesian bounds to address the sequential nature of speech and language signals. A novel algorithm, based on sparsifying the cluster assignment probabilities with a Rényi entropy prior, is shown to provably minimize the generalization error of any probabilistic model (e.g., HMMs). The second part examines application-specific loss functions such as cluster purity and perplexity. Empirical results on a variety of tasks (acoustic event detection, class-based language modeling, and unsupervised sequence anomaly detection) confirm the practicality of the theory and algorithms developed in this thesis.

Graduation Semester

2015-5

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Electrical and Computer Engineering
