Multimodal Fusion With Applications to Audio-Visual Speech Recognition
Chu, Stephen Mingyu
Permalink
https://hdl.handle.net/2142/80819
Description
Title
Multimodal Fusion With Applications to Audio-Visual Speech Recognition
Author(s)
Chu, Stephen Mingyu
Issue Date
2003
Doctoral Committee Chair(s)
Huang, Thomas S.
Department of Study
Electrical Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
Differences in the characteristics of the intermodal couplings in audio-visual speech recognition and in multichannel biometrics defy a universal fusion method for both applications. For audio-visual speech modeling, we propose a novel sensory fusion method based on the coupled hidden Markov models (CHMMs). The CHMM framework allows the fusion of two temporally coupled information sources to take place as an integral part of the statistical modeling process. An important advantage of the CHMM-based fusion method lies in its ability to model asynchronies between the audio and visual channels. We describe two approaches to carry out inference and learning in CHMMs. The first is an exact algorithm derived by extending the forward-backward procedure used in hidden Markov model (HMM) inference. The second method relies on the model transformation strategy that maps the state space of a CHMM onto the state space of a classic HMM, and therefore facilitates the development of sophisticated audio-visual speech recognition systems using existing infrastructures. For multichannel biometrics, we introduce a general formulation based on the late integration paradigm and address the environmental robustness issue through multichannel fusion. Based on this formulation, two effective approaches to carry out environment-adaptive decision fusion are developed: the environmental confidence weighting method and the optimal channel weighting method.
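The model transformation strategy mentioned in the abstract maps the coupled state space of a CHMM onto the state space of a single HMM so that standard HMM tools can be reused. Below is a minimal sketch of that product state-space construction, assuming two coupled chains whose next-state distributions each condition on both previous states; the state counts and matrix names (Na, Nv, A_audio, A_video) are illustrative assumptions, not the dissertation's actual configuration.

import numpy as np

# Sketch of mapping a two-chain coupled HMM onto an ordinary HMM whose
# states are the Cartesian product of the audio and visual states.
# All sizes and matrices here are made up for illustration.

Na, Nv = 3, 2                      # audio-chain and visual-chain state counts
rng = np.random.default_rng(0)

def random_conditional(n_from, n_to):
    """Row-stochastic matrix: P(next state | joint previous state)."""
    m = rng.random((n_from, n_to))
    return m / m.sum(axis=1, keepdims=True)

# Coupled transitions: each chain's next state depends on BOTH previous states.
A_audio = random_conditional(Na * Nv, Na)   # P(a_t | a_{t-1}, v_{t-1})
A_video = random_conditional(Na * Nv, Nv)   # P(v_t | a_{t-1}, v_{t-1})

def product_index(a, v):
    """Map an (audio, visual) state pair to a single composite-HMM state."""
    return a * Nv + v

# Composite transition matrix over the Na*Nv product states.
A_composite = np.zeros((Na * Nv, Na * Nv))
for a_prev in range(Na):
    for v_prev in range(Nv):
        src = product_index(a_prev, v_prev)
        for a_next in range(Na):
            for v_next in range(Nv):
                dst = product_index(a_next, v_next)
                A_composite[src, dst] = A_audio[src, a_next] * A_video[src, v_next]

# Each row still sums to 1, so standard HMM forward-backward or Viterbi
# routines can be run on the composite chain unchanged.
assert np.allclose(A_composite.sum(axis=1), 1.0)

Because the composite transition matrix is row-stochastic over the product states, existing single-stream HMM infrastructure can decode or train the coupled model without modification, which is the practical advantage the abstract highlights.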