Non-speech Acoustic Event Detection Using Multimodal Information
Huang, Po-Sen
Permalink
https://hdl.handle.net/2142/29841
Description
- Title
- Non-speech Acoustic Event Detection Using Multimodal Information
- Author(s)
- Huang, Po-Sen
- Issue Date
- 2012-02-06
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark A.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Acoustic Event Detection
- Optical Flow
- Hidden Markov Models
- Multistream Hidden Markov Models
- Coupled Hidden Markov Models
- Gaussian Mixture Models
- Support Vector Machines
- Sensor Fusion
- Footstep Detection
- Person Detection
- Abstract
- Non-speech acoustic event detection (AED) aims to recognize events relevant to human activities that are associated with audio information. Much previous research has focused on a restricted set of highlight events and has relied heavily on ad-hoc detectors for those events. This thesis focuses on using multimodal data to make non-speech acoustic event detection and classification more robust without requiring expensive annotation. Specifically, the thesis emphasizes designing suitable feature representations for different modalities and fusing their information properly. Two cases are studied in this thesis: (1) Acoustic event detection in a meeting-room scenario using single-microphone audio cues and single-camera visual cues. Non-speech event cues often exist in both audio and video, but not necessarily in a synchronized fashion. We jointly model audio and visual cues to improve event detection using multistream hidden Markov models (HMMs) and coupled HMMs (CHMMs). Spatial pyramid histograms based on optical flow are proposed as a generalizable visual representation that does not require training on labeled video data (see the illustrative sketch after the Copyright field below). In a multimedia meeting-room non-speech event detection task, the proposed methods outperform previously reported systems that leverage ad-hoc visual object detectors and sound localization information obtained from multiple microphones. (2) Multimodal feature representation for person detection at border crossings. Based on the phenomenology of the differences between humans and four-legged animals, we propose an enhanced autocorrelation pattern for feature extraction from seismic sensors and an exemplar selection framework for acoustic sensors. We also propose using temporal patterns from ultrasonic sensors. We perform decision and feature fusion to combine the information from all three modalities. Experimental results show that the proposed methods improve the robustness of the system.
- Graduation Semester
- 2011-12
- Permalink
- http://hdl.handle.net/2142/29841
- Copyright and License Information
- Copyright 2011 Po-Sen Huang
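The abstract above names spatial pyramid histograms of optical flow as the training-free visual representation for the meeting-room task. The following is a minimal, hypothetical sketch of that kind of feature, not code from the thesis: it assumes OpenCV's Farneback dense optical flow and NumPy histograms, and the function name, pyramid grid sizes, bin count, and normalization are illustrative choices rather than the thesis's actual parameters.

```python
# Hypothetical sketch (not from the thesis): spatial pyramid histograms of
# dense optical flow as a training-free visual feature. Grid sizes, bin
# count, and normalization are illustrative assumptions.
import cv2
import numpy as np

def flow_pyramid_histogram(prev_gray, next_gray, grids=(1, 2, 4), bins=8):
    """Concatenate magnitude-weighted flow-orientation histograms over a
    spatial pyramid of 1x1, 2x2, and 4x4 cells (by default)."""
    # Dense optical flow between two consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # angle in radians
    h, w = mag.shape
    feats = []
    for g in grids:                      # one pyramid level = a g x g grid
        for i in range(g):
            for j in range(g):
                ys, ye = i * h // g, (i + 1) * h // g
                xs, xe = j * w // g, (j + 1) * w // g
                cell_ang = ang[ys:ye, xs:xe].ravel()
                cell_mag = mag[ys:ye, xs:xe].ravel()
                # Magnitude-weighted orientation histogram for this cell.
                hist, _ = np.histogram(cell_ang, bins=bins,
                                       range=(0.0, 2.0 * np.pi),
                                       weights=cell_mag)
                feats.append(hist / (hist.sum() + 1e-8))  # L1-normalize
    # Feature length = bins * sum(g * g for g in grids)
    return np.concatenate(feats)
```

A per-frame feature of this kind could be stacked over a short temporal window and used as the visual stream alongside the audio stream in a multistream HMM or CHMM; the specific windowing and stream-weighting choices are described in the thesis itself.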
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Dissertations and Theses - Electrical and Computer Engineering