Semi-supervised learning for acoustic and prosodic modeling in speech applications
Huang, Jui Ting
Permalink
https://hdl.handle.net/2142/32006
Description
- Title
- Semi-supervised learning for acoustic and prosodic modeling in speech applications
- Author(s)
- Huang, Jui Ting
- Issue Date
- 2012-06-27T21:24:08Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark A.
- Doctoral Committee Chair(s)
- Hasegawa-Johnson, Mark A.
- Committee Member(s)
- Cole, Jennifer S.
- Huang, Thomas S.
- Levinson, Stephen E.
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Semi-Supervised Learning
- Speech Recognition
- Acoustic Modeling
- Prosodic Modeling
- Abstract
- Enormous amounts of audio recordings of human speech are essential ingredients for building reliable statistical models for many speech applications, such as automatic speech recognition and automatic prosody detection. However, most of this speech data goes unused because it lacks transcriptions. The goal of this thesis is to use untranscribed (unlabeled) data to improve the performance of models trained using only transcribed (labeled) data. We propose a unified semi-supervised learning framework for the problems of phone classification, phone recognition, and prosody detection. The proposed approach is particularly useful when recognition performance is limited by the amount of transcribed data. In the first part of the thesis, we investigate semi-supervised training of Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs), the common probabilistic models of acoustic features in state-of-the-art continuous-density HMM-based speech recognition systems. Specifically, we propose a family of semi-supervised training criteria that reflect reasonable assumptions about labeled and unlabeled data. Both generative and discriminative training criteria are explored, and one important proposal of this thesis is to preserve the power of discriminative training criteria by using measures computed on unlabeled data as regularizers of the supervised training objective. Methods are described for optimizing these criteria, and phone classification experiments show that the criteria reliably give improvements over their supervised versions, which use only labeled data. We then extend the proposed semi-supervised training criteria to the phone recognition problem. This problem is novel in the area of semi-supervised learning because there has been little research on the use of unlabeled data in sequence labeling problems.
We develop lattice-based approaches for model optimization involving both transcribed and untranscribed speech utterances. Phone recognition experiments show that a maximum mutual information criterion regularized by negative conditional entropy measured on unlabeled data reliably gives better results than other semi-supervised training methods. In the second part of the thesis, we propose to exploit unlabeled data for the task of automatic prosodic event detection. Prosody annotation is even harder to obtain than orthographic text transcription; it usually requires expert knowledge of phonetics and linguistics. Therefore, we aim to reduce the annotation effort required to build an automatic prosodic event detector. We show that the mixture model can discover classes when labeled data are available from only one of the two classes, and we develop a learning algorithm for unsupervised prosodic boundary detection.
- Graduation Semester
- 2012-05
- Permalink
- http://hdl.handle.net/2142/32006
- Copyright and License Information
- Copyright 2012 Jui Ting Huang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois