Semi-supervised learning for acoustic and prosodic modeling in speech applications

Huang, Jui Ting

Content Files

Huang_JuiTing.pdf

Permalink

https://hdl.handle.net/2142/32006

Description

Title

Semi-supervised learning for acoustic and prosodic modeling in speech applications

Author(s)

Huang, Jui Ting

Issue Date

2012-06-27T21:24:08Z

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark A.

Doctoral Committee Chair(s)

Hasegawa-Johnson, Mark A.

Committee Member(s)

Cole, Jennifer S.
Huang, Thomas S.
Levinson, Stephen E.

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2012-06-27T21:24:08Z

Keyword(s)

Semi-Supervised Learning
Speech Recognition
Acoustic Modeling
Prosodic Modeling

Abstract

Enormous amounts of recorded human speech are essential ingredients for building reliable statistical models for many speech applications, such as automatic speech recognition and automatic prosody detection. However, most of these speech data go unused because they lack transcriptions. The goal of this thesis is to use untranscribed (unlabeled) data to improve the performance of models trained using only transcribed (labeled) data. We propose a unified semi-supervised learning framework for the problems of phone classification, phone recognition, and prosody detection. The proposed approach is particularly useful when recognition performance is limited by the amount of transcribed data.

In the first part of the thesis, we investigate semi-supervised training of Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs), which are the common probabilistic models of acoustic features in state-of-the-art continuous-density HMM-based speech recognition systems. Specifically, we propose a family of semi-supervised training criteria that reflect reasonable assumptions about labeled and unlabeled data. Both generative and discriminative training criteria are explored, and one important proposal of this thesis is to retain the power of discriminative training criteria by using measures computed on unlabeled data as regularizers of the supervised training objective. Methods are described for optimizing these criteria, and phone classification experiments show that the criteria reliably improve on their supervised counterparts, which use only labeled data. We then extend the proposed semi-supervised training criteria to the phone recognition problem. This problem is novel in the area of semi-supervised learning because there is little research on the use of unlabeled data in sequence labeling problems. We develop lattice-based approaches for model optimization that involves both transcribed and untranscribed speech utterances. Experiments on phone recognition show that a maximum mutual information criterion regularized by negative conditional entropy measured on unlabeled data reliably gives better results than other semi-supervised training methods.

In the second part of the thesis, we propose to exploit unlabeled data for the task of automatic prosodic event detection. Prosody annotation is even harder to obtain than orthographic text transcription; it usually requires expert knowledge of phonetics and linguistics. Therefore, we aim to reduce the annotation effort needed to build an automatic prosodic event detector. We show that the mixture model is capable of class discovery when labeled data are available from only one of the two classes, and we develop a learning algorithm for unsupervised prosodic boundary detection.
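
The entropy-regularized discriminative criterion mentioned in the abstract can be illustrated with a toy sketch. This is not the implementation from the thesis. It assumes a simplified setting in which each class is modeled by a single one-dimensional Gaussian (rather than a GMM or HMM), the discriminative term is the conditional log-likelihood of the labeled data, and the regularizer is the conditional entropy of the class posteriors on the unlabeled data. The function names, the `params` layout, and the weight `alpha` are hypothetical choices made for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def posteriors(x, params):
    """Class posteriors p(y | x) from priors and 1-D Gaussian likelihoods.

    params is a list of (prior, mean, var) tuples, one per class
    (an assumed layout, not taken from the thesis).
    """
    logs = [math.log(p) + log_gauss(x, m, v) for p, m, v in params]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def objective(labeled, unlabeled, params, alpha):
    """Labeled conditional log-likelihood minus alpha times unlabeled entropy.

    labeled is a list of (x, class_index) pairs; unlabeled is a list of x.
    Larger values are better: the first term rewards correct posteriors on
    labeled data, the second rewards confident posteriors on unlabeled data.
    """
    supervised = sum(math.log(posteriors(x, params)[y]) for x, y in labeled)
    uncertainty = sum(entropy(posteriors(x, params)) for x in unlabeled)
    return supervised - alpha * uncertainty

# Two equally likely classes centered at -1 and +1 with unit variance.
params = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
score = objective([(-1.2, 0), (0.9, 1)], [-0.8, 0.1, 1.5], params, alpha=0.5)
```

In this sketch, setting `alpha` to zero recovers the purely supervised criterion, while a positive `alpha` penalizes parameter settings that leave unlabeled points close to the decision boundary, where the posteriors are near uniform and the entropy is highest.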

Graduation Semester

2012-05

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
