End-to-end non-negative auto-encoders: a deep neural alternative to non-negative audio modeling

Venkataramani, Shrikant

End-to-end non-negative auto-encoders: a deep neural alternative to non-negative audio modeling

Venkataramani, Shrikant

Permalink

https://hdl.handle.net/2142/108451

Description

Title

End-to-end non-negative auto-encoders: a deep neural alternative to non-negative audio modeling

Author(s)

Venkataramani, Shrikant

Issue Date

2020-07-07

Director of Research (if dissertation) or Advisor (if thesis)

Smaragdis, Paris

Doctoral Committee Chair(s)

Smaragdis, Paris

Committee Member(s)

Singer, Andrew
Hasegawa-Johnson, Mark
Kim, Minje

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Non-negative autoencoders, non-negative matrix facttorization, source separation, end-to-end

Abstract

Over the last decade, non-negative matrix factorization (NMF) has emerged as one of the most popular approaches to modeling audio signals. NMF allows us to factorize the magnitude spectrogram to learn representative spectral bases that can be used for a wide range of applications. With the recent advances in deep learning, neural networks (NNs) have surpassed NMF in terms of performance. However, these NNs are trained discriminatively and lack several key characteristics like re-usability and robustness, compared to NMF. In this dissertation, we develop and investigate the idea of end-to-end non-negative autoencoders (NAEs) as an updated deep learning based alternative framework to non-negative audio modeling. We show that end-to-end NAEs combine the modeling advantages of non-negative matrix factorization and the generalizability of neural networks while delivering significant improvements in performance. To this end, we first interpret NMF as a NAE and show that the two approaches are equivalent semantically and in terms of source separation performance. We exploit the availability of sophisticated neural network architectures to propose several extensions to NAEs. We also demonstrate that these modeling improvements significantly boost the performance of NAEs. In audio processing applications, the short-time fourier transform~(STFT) is used as a universal first step and we design algorithms and neural networks to operate on the magnitude spectrograms. We interpret the sequence of steps involved in computing the STFT as additional neural network layers. This enables us to propose end-to-end processing pipelines that operate directly on the raw waveforms. In the context of source separation, we show that end-to-end processing gives a significant improvement in performance compared to existing spectrogram based methods. Furthermore, to train these end-to-end models, we investigate the use of cost functions that are derived from objective evaluation metrics as measured on waveforms. We present subjective listening test results that reveal insights into the performance of these cost functions for end-to-end source separation. Combining the adaptive front-end layers with NAEs, we propose end-to-end NAEs and show how they can be used for end-to-end generative source separation. Our experiments indicate that these models deliver separation performance comparable to that of discriminative NNs, while retaining the modularity of NMF and the modeling flexibility of neural networks. Finally, we present an approach to train these end-to-end NAEs using mixtures only, without access to clean training examples.

Graduation Semester

2020-08

Type of Resource

Thesis

Permalink

http://hdl.handle.net/2142/108451

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering

End-to-end non-negative auto-encoders: a deep neural alternative to non-negative audio modeling

Venkataramani, Shrikant

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Electrical and Computer Engineering

Log In