Multimodal semantic learning with context-correlated speeches and images
Wang, Liming
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis from outside the university, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/100040
Description
Title
Multimodal semantic learning with context-correlated speeches and images
Author(s)
Wang, Liming
Contributor(s)
Hasegawa-Johnson, Mark
Issue Date
2018-05
Keyword(s)
speech-to-image retrieval
multimodal learning
language acquisition
under-resourced automatic speech recognition
Abstract
Automatic speech recognition (ASR) technologies have been successfully applied to most of the world's major languages. However, ASR performs poorly on under-resourced languages such as Mboshi, because those languages lack standardized orthographies and/or manually transcribed labels for training an ASR system. This work presents an unsupervised machine learning approach to help develop speech technology for under-resourced languages. Our algorithm imitates the human early language acquisition (LA) process using speech and context-correlated images. We compared different speech features and found features that outperform vanilla mel-frequency cepstral coefficients (MFCCs) for our multimodal speech-to-image (Sp2Im) retrieval task.
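
To make the retrieval setup concrete, the sketch below shows one simple way speech-to-image retrieval can be framed: an utterance is encoded as a mean-pooled MFCC vector and candidate images are ranked by cosine similarity. This is an illustrative assumption, not the method from the thesis; the use of librosa, the mean-pooling, and the premise that speech and image vectors have already been projected into a shared embedding space are all hypothetical choices made only for demonstration.

    # Minimal sketch (not the thesis's method): rank images by cosine
    # similarity to a pooled MFCC representation of an utterance.
    # Assumes both modalities already live in a shared embedding space.
    import numpy as np
    import librosa

    def speech_embedding(wav_path, sr=16000, n_mfcc=13):
        """Load an utterance and mean-pool its MFCC frames into one vector."""
        audio, _ = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, T)
        return mfcc.mean(axis=1)                                    # shape (n_mfcc,)

    def retrieve_images(speech_vec, image_vecs, top_k=5):
        """Rank candidate image vectors by cosine similarity to the speech vector."""
        s = speech_vec / (np.linalg.norm(speech_vec) + 1e-8)
        imgs = image_vecs / (np.linalg.norm(image_vecs, axis=1, keepdims=True) + 1e-8)
        scores = imgs @ s
        return np.argsort(-scores)[:top_k], scores

    # Hypothetical usage (random image vectors, for shape checking only):
    # s = speech_embedding("utterance.wav")
    # ranking, scores = retrieve_images(s, np.random.randn(100, s.shape[0]))

In the thesis setting, the comparison of speech features would amount to swapping the MFCC encoder above for an alternative feature extractor while keeping the retrieval step fixed.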