FSM-Based Pronunciation Modeling using Articulatory Phonological Code

Hu, Chi

Permalink

https://hdl.handle.net/2142/16726

Description

Title

FSM-Based Pronunciation Modeling using Articulatory Phonological Code

Author(s)

Hu, Chi

Issue Date

2010-08-20T17:56:03Z

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark A.

Department of Study

Electrical and Computer Engineering

Discipline

Electrical and Computer Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2010-08-20T17:56:03Z

Keyword(s)

articulatory phonology
speech production
speech gesture
finite state machine

Abstract

According to articulatory phonology, the gestural score is an invariant speech representation. Though the timing schemes, i.e., the onsets and offsets, of the gestural activations may vary, the ensemble of these activations tends to remain unchanged, informing the speech content. "Gestural pattern vector" (GPV) has been proposed to encode the instantaneous gestural activations that exist across all tract variables at each time. Therefore, a gestural score with a particular timing scheme can be approximated using a GPV sequence. In this work, we propose a pronunciation modeling method that uses a finite state machine (FSM) to represent the invariance of a gestural score. Given the "canonical" gestural score of a word with a known activation timing scheme, the plausible activation onsets and offsets are recursively generated and encoded as a weighted FSM. An empirical measure is used to prune out gestural activation timing schemes that deviate too much from the "canonical" gestural score. Speech recognition is achieved by matching the recovered gestural activations to the FSM-encoded gestural scores of different speech contents. In particular, the observation distribution of each GPV is modeled by an artificial neural network and Gaussian mixture tandem model. These models are used together with the FSM-based pronunciation models in a Bayesian framework. We carry out pilot word classification experiments using synthesized data from one speaker. The proposed pronunciation modeling achieves over 90% accuracy for a vocabulary of 139 words with no training observations, outperforming direct use of the "canonical" gestural score.

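The relation between a gestural score and its GPV sequence, as described in the abstract, can be illustrated with a minimal sketch. This is not the thesis implementation: the tract variable names, the frame timings, and the helper functions `gpv_at`, `gpv_sequence`, and `shift` are assumptions made up for illustration. The sketch only shows that shifting the onset and offset of one activation leaves the set of activations unchanged while changing the resulting GPV sequence.

```python
from itertools import groupby

# Hypothetical toy gestural score: tract variable -> list of (onset, offset)
# frame intervals, with the offset exclusive. Names and timings are
# illustrative assumptions, not values from the thesis.
CANONICAL = {
    "LA": [(0, 4)],    # lip aperture
    "TTCD": [(3, 8)],  # tongue tip constriction degree
    "GLO": [(6, 10)],  # glottis
}
TRACT_VARIABLES = sorted(CANONICAL)  # fixed ordering of the GPV components


def gpv_at(score, t):
    """Binary activation vector across all tract variables at frame t."""
    return tuple(
        int(any(on <= t < off for on, off in score[tv]))
        for tv in TRACT_VARIABLES
    )


def gpv_sequence(score, n_frames):
    """Frame-by-frame GPVs with consecutive repeats collapsed."""
    frames = [gpv_at(score, t) for t in range(n_frames)]
    return [gpv for gpv, _ in groupby(frames)]


def shift(score, tv, d_on, d_off):
    """Copy of the score with one tract variable's onsets/offsets moved."""
    out = dict(score)
    out[tv] = [(on + d_on, off + d_off) for on, off in score[tv]]
    return out


if __name__ == "__main__":
    shifted = shift(CANONICAL, "TTCD", 1, 1)
    print(gpv_sequence(CANONICAL, 10))
    print(gpv_sequence(shifted, 10))
```

In this toy case the shifted score no longer contains the GPV in which the LA and TTCD activations overlap, so the two GPV sequences differ even though both scores hold the same three activations. A model that accepts several such sequences for one word, rather than only the canonical one, is what the FSM encoding in the abstract is meant to provide.
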
Graduation Semester

2010-08

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Electrical and Computer Engineering