Computational Models for Binaural Sound Source Localization and Sound Understanding
Li, Danfeng
Permalink
https://hdl.handle.net/2142/80822
Description
Title
Computational Models for Binaural Sound Source Localization and Sound Understanding
Author(s)
Li, Danfeng
Issue Date
2003
Doctoral Committee Chair(s)
Levinson, Stephen E.
Department of Study
Electrical Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
As one of humans' primary sensors, the auditory system plays an important role in language acquisition. Computational models for binaural sound source localization and sound source understanding are proposed in this thesis. The models build a fundamental auditory system for a mobile robot that will automatically learn language through multisensory inputs and interaction with the external environment. A hypothesis-driven approach is followed for the localization model. Using only binaural inputs, it enables three-dimensional (3D) localization by combining multiple cues. Two binaural localization cues, interaural time differences (ITDs) and interaural intensity differences (IIDs), and one monaural localization cue, spectral cues, are extracted from the input sounds. A Bayes rule-based hierarchical framework is applied for decision making. Simulations show the effectiveness of the model. A robust ITD estimation algorithm is introduced and implemented on the robot. Satisfactory results are achieved in real-world environments. A multimodal learning scheme is proposed that uses vision to realize autonomous learning of 3D binaural localization, so no human instructors need to be involved. A generic model is presented for sound source understanding. No labelled training data is required to build the model. A histogram is employed as the sound representation, which preserves the time-varying characteristics of sound. Histogram intersection is used as the similarity measure between different sounds. The model is successfully applied to content-based audio information retrieval and automatic audio indexing systems.
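As a rough illustration of the similarity measure named in the abstract, the following is a minimal sketch of histogram intersection between two sound histograms. The abstract does not say how the histograms are built or normalized, so the function name `histogram_intersection`, the plain-list bin representation, and the normalization of each histogram to unit sum are all assumptions made for this example, not the thesis's implementation.

```python
def histogram_intersection(h1, h2):
    """Return the intersection of two histograms as a score in [0, 1].

    Assumption: each histogram is a sequence of non-negative bin counts
    over the same bins. Both are normalized to sum to 1 before comparison,
    so the score is 1.0 for identical shapes and 0.0 for disjoint ones.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total1, total2 = sum(h1), sum(h2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("histograms must contain at least one count")
    # Sum, bin by bin, the mass that the two normalized histograms share.
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


# Hypothetical bin counts for a query sound and a stored sound.
query = [2, 2, 0, 0]
stored = [0, 1, 1, 0]
score = histogram_intersection(query, stored)
```

In a retrieval setting, a score like this could be computed between a query sound and each stored sound, with the highest-scoring entries returned first.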