Lipreading with convolutional and recurrent neural network models
Zhu, Tianyilin
Permalink
https://hdl.handle.net/2142/97763
Description
Title
Lipreading with convolutional and recurrent neural network models
Author(s)
Zhu, Tianyilin
Issue Date
2017-04-24
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Lipreading
Convolutional neural network
Abstract
Lipreading is speech recognition from visual information alone. This thesis has two goals: to classify silence versus speech, and to recognize the triphone spoken by a talking head, using neural network classification models that take only video as input.
Two neural network architectures are developed and tested on the AVICAR dataset: a convolutional neural network (CNN) model with a fully connected classification layer, and a recurrent neural network (RNN) model that pairs a convolutional front end with one long short-term memory (LSTM) layer to classify a sequence of inputs. In both models, the convolutional layers serve as feature extractors.
The performance of each model is evaluated experimentally, and the network structures and preprocessing pipeline are described in detail.
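The data flow of the two architectures named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the layer sizes, the 3x3 kernels, the global max pooling, the random untrained weights, the synthetic 8x8 "frames", and all function names below are assumptions made purely for illustration. It only shows how shared convolutional features could feed either a fully connected classifier (one label per frame) or an LSTM (one label per frame sequence).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv_features(frame, kernels):
    """Valid 2D convolution + ReLU + global max pooling: one feature per kernel.
    (The pooling choice is an assumption, not taken from the thesis.)"""
    h, w = len(frame), len(frame[0])
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        best = 0.0  # starting at 0 applies the ReLU floor before pooling
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                s = sum(k[a][b] * frame[i + a][j + b]
                        for a in range(kh) for b in range(kw))
                best = max(best, s)
        feats.append(best)
    return feats

def lstm_step(x, h, c, W):
    """One LSTM step; W maps gate name -> weights over [x; h] (biases omitted)."""
    z = x + h  # list concatenation of input and previous hidden state
    i = [sigmoid(v) for v in matvec(W["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(W["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(W["g"], z)]  # candidate cell update
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical sizes, chosen only to keep the example small.
N_KERNELS, HIDDEN, N_CLASSES = 4, 6, 3
kernels = [rand_matrix(3, 3) for _ in range(N_KERNELS)]
W_fc = rand_matrix(N_CLASSES, N_KERNELS)  # CNN model: FC layer on conv features
W_lstm = {gate: rand_matrix(HIDDEN, N_KERNELS + HIDDEN) for gate in "ifog"}
W_out = rand_matrix(N_CLASSES, HIDDEN)    # RNN model: classifier on final LSTM state

def cnn_classify(frame):
    """CNN-style model: classify a single frame."""
    return argmax(matvec(W_fc, conv_features(frame, kernels)))

def rnn_classify(frames):
    """RNN-style model: run conv features through an LSTM, classify the sequence."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for frame in frames:
        h, c = lstm_step(conv_features(frame, kernels), h, c, W_lstm)
    return argmax(matvec(W_out, h))

# Synthetic stand-in for a 5-frame clip of 8x8 grayscale mouth crops.
video = [[[random.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
frame_labels = [cnn_classify(f) for f in video]  # one label per frame
sequence_label = rnn_classify(video)             # one label for the whole clip
```

Because the weights are random, the predicted labels carry no meaning; the point is the structure, in which both models reuse the same convolutional feature extractor and differ only in what consumes its output.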