Removing redundancy in speech by modeling forward masking
Xie, Song
Permalink
https://hdl.handle.net/2142/46743
Description
- Title
- Removing redundancy in speech by modeling forward masking
- Author(s)
- Xie, Song
- Issue Date
- 2014-01-16T18:00:56Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Allen, Jont B.
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Speech Recognition
- Forward Masking
- Perceptual Feature
- Abstract
- Researchers have worked on fundamental phoneme decoding since the 1920s. There are two approaches to the problem, the first being automatic speech recognition (ASR). Even though many state-of-the-art analysis methods, such as LPC, STFT, and MFCC, have been used in speech recognition, the performance of ASR has reached a plateau and the speech decoding problem remains unresolved. Recently, the Human Speech Recognition (HSR) group at the University of Illinois conducted research aimed at improving our understanding of human speech recognition. Based on Articulation Index theory, they first developed a tool named the AI-gram, which displays the audible components of speech sounds. Using this tool, they discovered that a small set of speech features in the AI-gram, which they named primary cues, can account for speech sound identification, and they proposed a method called the three-dimensional deep search (3DDS) to extract those primary cues. However, speech masking, especially forward masking, should be considered and modeled when extracting the primary cues. In this thesis, we propose a forward masking model and integrate it into the AI-gram. The model consists of an RC feedback loop, a comparison operator, and a delay. For every speech input, the model multiplies the input by a frequency-dependent gain map, representing the current status of the cochlear outer hair cells (OHC), to obtain the output. This gain map modifies the AI-gram according to the forward masking model. We conduct two simulations to verify the model. In the first, we modify speech sounds according to the forward masking model at SNR = 12 dB. In the second, we modify the f103 /tA/ token at SNR = 15, 6, 0, -3 dB. In both simulations we observe that, while onsets are preserved, a large amount of energy in the AI-gram is removed. We then listen to and compare the original and modified speech sounds. The results show only subtle differences in the quality of the modified sounds, indicating that the forward masking model does a good job of removing the masked speech features. One might conclude from these simulations that the FM model removes redundancy in the AI-gram that is naturally masked by the cochlea.
- Graduation Semester
- 2013-12
- Copyright and License Information
- Copyright 2013 Song Xie
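The forward-masking mechanism summarized in the abstract — an RC feedback loop that tracks masker energy per frequency band, and a comparison operator that attenuates components falling below the decayed masker level — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the thesis's actual implementation: the function name, time constant `tau`, frame step `dt`, and attenuation factor `atten` are assumptions, and the model's delay element is omitted for brevity.

```python
import numpy as np

def forward_masking_gain(spec, dt=0.01, tau=0.1, atten=0.1):
    """Hypothetical forward-masking gain map (illustrative sketch only).

    spec  : 2-D array (frequency bands x time frames) of band energies
    dt    : frame step in seconds (assumed value)
    tau   : RC time constant of the masker decay (assumed value)
    atten : gain applied to components masked by preceding energy
    """
    decay = np.exp(-dt / tau)           # per-frame decay of the RC loop
    state = np.zeros(spec.shape[0])     # masker state, one per frequency band
    gain = np.ones_like(spec)
    for t in range(spec.shape[1]):
        state = state * decay           # RC feedback loop: masker decays over time
        frame = spec[:, t]
        masked = frame < state          # comparison operator: below decayed masker?
        gain[masked, t] = atten         # attenuate masked (redundant) components
        state = np.maximum(state, frame)  # strong energy resets the masker state
    return gain

# Usage: multiply the AI-gram by the gain map, as the abstract describes.
# An onset followed by weaker trailing energy keeps its onset (gain 1.0)
# while the trailing frames are attenuated.
spec = np.array([[1.0, 0.5, 0.5, 0.5, 0.5]])
gain = forward_masking_gain(spec)
modified = spec * gain
```

Note how the onset frame passes through unchanged while the weaker frames that follow it fall below the decaying masker state and are attenuated — matching the abstract's observation that onsets are preserved while masked energy is removed.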
Owning Collections
- Dissertations and Theses - Electrical and Computer Engineering
- Graduate Dissertations and Theses at Illinois (Primary)