The benefits of acoustic perceptual information for speech processing systems
He, Di
Permalink
https://hdl.handle.net/2142/104888
Description
- Title
- The benefits of acoustic perceptual information for speech processing systems
- Author(s)
- He, Di
- Issue Date
- 2019-04-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Deming
- Doctoral Committee Chair(s)
- Chen, Deming
- Committee Member(s)
- Hasegawa-Johnson, Mark
- Wong, Martin
- Lim, Boon Pang
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Date of Ingest
- 2019-08-23T20:00:08Z
- Keyword(s)
- ASR
- AED
- Acoustic Landmark
- Auditory Roughness
- IoT
- FPGA
- MTL
- CTC
- Abstract
- The frame-synchronized framework has dominated many speech processing systems, such as ASR and AED systems targeting human speech activities. These systems give little consideration to the science behind speech and treat the task as simple statistical classification. The framework also assumes that every feature vector is equally important to the task. However, preliminary experiments in this study found evidence that some concepts from speech perception theory, such as auditory roughness and acoustic landmarks, can act as heuristics for these systems and benefit them in multiple ways. Findings on acoustic landmarks suggest that treating every frame equally might not be optimal. In some cases, landmark information can improve system accuracy by highlighting the more significant frames, or improve acoustic model accuracy through training with MTL. Further investigation found experimental evidence suggesting that acoustic landmark information can also benefit end-to-end acoustic models trained with CTC loss. With the help of acoustic landmarks, CTC models can converge with less training data and achieve a lower error rate. For the first time, positive results for acoustic landmarks were collected on a mid-size ASR corpus (WSJ). The results indicate that audio perception information can benefit a broad range of audio processing systems.
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Copyright and License Information
- Copyright 2019 Di He
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Electrical and Computer Engineering