Speech enhancement using deep dilated CNN

Qian, Kaizhi

Speech enhancement using deep dilated CNN

Qian, Kaizhi

Permalink

https://hdl.handle.net/2142/101644

Description

Title

Speech enhancement using deep dilated CNN

Author(s)

Qian, Kaizhi

Issue Date

2018-05-22

Director of Research (if dissertation) or Advisor (if thesis)

Hasegawa-Johnson, Mark

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

speech enhancement
convolutional neural network
beamforming

Abstract

In recent years, deep learning has achieved great success in speech enhancement. However, there are two major limitations regarding existing works. First, the Bayesian framework is not adopted in many such deep-learning-based algorithms. In particular, the prior distribution for speech in the Bayesian framework has been shown useful by regularizing the output to be in the speech space, and thus improving the performance. Second, the majority of the existing methods operate on the frequency domain of the noisy speech, such as spectrogram and its variations. We propose a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which directly operates on raw audio samples. It adopts the recently announced WaveNet, which is shown to be effective in modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech. Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech model guided beamforming algorithms are able to recover natural sounding speech, but the speech models tend to be oversimplified to prevent the inference from becoming too complicated. On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with variable number of input channels. Also, deep learning approaches introduce a lot of errors, particularly in the presence of unseen noise types and settings. We have therefore proposed an enhancement framework called DeepBeam, which combines the two complementary classes of algorithms. DeepBeam introduces a beamforming filter to produce natural sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DeepBeam is able to produce clean, dry and natural sounding speech, and is robust against unseen noise.

Graduation Semester

2018-08

Type of Resource

text

Permalink

http://hdl.handle.net/2142/101644

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Speech enhancement using deep dilated CNN

Qian, Kaizhi

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Log In