Pronunciation modeling for large vocabulary speech recognition

Kantor, Arthur

Permalink

https://hdl.handle.net/2142/18276

Description

Title

Pronunciation modeling for large vocabulary speech recognition

Author(s)

Kantor, Arthur

Issue Date

2011-01-14T22:43:36Z

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark A.

Doctoral Committee Chair(s)

Hasegawa-Johnson, Mark A.

Committee Member(s)

Roth, Dan
Fleck, Margaret M.
Livescu, Karen

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2011-01-14T22:43:36Z

Keyword(s)

automatic speech recognition (ASR)
Large-Vocabulary Continuous Speech Recognition (LVCSR)
Pronunciation modeling
Conversational speech recognition

Abstract

The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of units used to define the pronunciations of words. Others model pronunciation implicitly, using long-duration acoustic context to classify the spoken pronunciation unit more accurately. This thesis is a study of the relative ability of the acoustic and pronunciation models to capture pronunciation variability in a nearly state-of-the-art conversational telephone speech recognizer. Several methods are tested, each designed to improve the modeling accuracy of the recognizer. Some of the experiments result in a lower word error rate, but many do not, apparently because, in different ways, the accuracy gained by one part of the recognizer comes at the expense of accuracy lost in, or transferred from, another part. Pronunciation variability is modeled with two approaches: from above, with explicit pronunciation modeling, and from below, with implicit pronunciation modeling within the acoustic model. Both approaches make use of long-duration context: explicitly, by considering long-duration pronunciation units, and implicitly, by having the acoustic model consider long-duration speech segments. Some pronunciation models address the pronunciation variability problem by introducing multiple pronunciations per word to cover more of the variants observed in conversational speech. However, this can increase the confusability between words. This thesis studies the relationship between pronunciation perplexity and lexical ambiguity, which has informed the design of the explicit pronunciation models presented here.

Graduation Semester

2010-12

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
