Removing redundancy in speech by modeling forward masking
Xie, Song
Permalink
https://hdl.handle.net/2142/46743
Description
- Title
- Removing redundancy in speech by modeling forward masking
- Author(s)
- Xie, Song
- Issue Date
- 2014-01-16T18:00:56Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Allen, Jont B.
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Speech Recognition
- Forward Masking
- Perceptual Feature
- Abstract
- Researchers have worked on fundamental phoneme decoding since the 1920s. There are two approaches to the problem, the first being automatic speech recognition (ASR). Even though many state-of-the-art analysis methods, such as LPC, STFT, and MFCC, have been used in speech recognition, the performance of ASR has reached a plateau and the speech decoding problem remains unresolved. Recently, the Human Speech Recognition (HSR) group at the University of Illinois conducted research aimed at improving our understanding of human speech recognition. Based on Articulation Index theory, they first developed a tool named the AI-gram, which displays the audible components of speech sounds. Using this tool, they discovered that a small set of speech features in the AI-gram, which they named primary cues, can account for speech sound identification, and they proposed a method called the three-dimensional deep search (3DDS) to extract those primary cues. However, speech masking, especially forward masking, should be considered and modeled when extracting the primary cues. In this thesis, we propose a forward masking model and integrate it into the AI-gram. The model consists of an RC feedback loop, a comparison operator, and a delay. For every speech input, the model multiplies the input by a frequency-dependent gain map, representing the current status of the cochlear outer hair cells (OHC), to obtain the output. This gain map modifies the AI-gram according to the forward masking model. We conduct two simulations to verify the model. In the first, we modify speech sounds according to the forward masking model at SNR = 12 dB. In the second, we modify the f103 /tA/ token at SNR = 15, 6, 0, -3 dB. In both simulations we observe that, while onsets are preserved, a large amount of energy in the AI-gram is removed. We then listen to and compare the original and modified speech sounds. The results show only subtle differences in the quality of the modified sounds, indicating that the forward masking model does a good job of removing the masked speech features. One might conclude from these simulations that the FM model removes redundancy in the AI-gram that is naturally masked by the cochlea.
- Graduation Semester
- 2013-12
- Copyright and License Information
- Copyright 2013 Song Xie
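The forward-masking mechanism summarized in the abstract — an RC feedback loop that tracks masker energy per frequency band, and a comparison operator that attenuates components falling below the decayed masker level — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the thesis's actual implementation: the function name, time constant `tau`, frame step `dt`, and attenuation factor `atten` are assumptions, and the model's delay element is omitted for brevity.

```python
import numpy as np

def forward_masking_gain(spec, dt=0.01, tau=0.1, atten=0.1):
    """Hypothetical forward-masking gain map (illustrative sketch only).

    spec  : 2-D array (frequency bands x time frames) of band energies
    dt    : frame step in seconds (assumed value)
    tau   : RC time constant of the masker decay (assumed value)
    atten : gain applied to components masked by preceding energy
    """
    decay = np.exp(-dt / tau)           # per-frame decay of the RC loop
    state = np.zeros(spec.shape[0])     # masker state, one per frequency band
    gain = np.ones_like(spec)
    for t in range(spec.shape[1]):
        state = state * decay           # RC feedback loop: masker decays over time
        frame = spec[:, t]
        masked = frame < state          # comparison operator: below decayed masker?
        gain[masked, t] = atten         # attenuate masked (redundant) components
        state = np.maximum(state, frame)  # strong energy resets the masker state
    return gain

# Usage: multiply the AI-gram by the gain map, as the abstract describes.
# An onset followed by weaker trailing energy keeps its onset (gain 1.0)
# while the trailing frames are attenuated.
spec = np.array([[1.0, 0.5, 0.5, 0.5, 0.5]])
gain = forward_masking_gain(spec)
modified = spec * gain
```

Note how the onset frame passes through unchanged while the weaker frames that follow it fall below the decaying masker state and are attenuated — matching the abstract's observation that onsets are preserved while masked energy is removed.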
Owning Collections
- Dissertations and Theses - Electrical and Computer Engineering
- Graduate Dissertations and Theses at Illinois (Primary)