Study on speech emotion recognition based on deep learning
Guan, Haozhong
Permalink
https://hdl.handle.net/2142/117682
Description
Title
Study on speech emotion recognition based on deep learning
Author(s)
Guan, Haozhong
Issue Date
2022-12-05
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Speech emotion recognition
Convolutional neural network
ResNet50
Abstract
Speech emotion recognition (SER) is closely tied to everyday life and has the potential to bring substantial changes and improvements to how people interact with technology. The continued development of artificial intelligence together with SER will bring new breakthroughs to human-machine interaction. Studying SER therefore has important theoretical value and research significance.
This thesis reviews the current state of speech emotion recognition and identifies its open problems and challenges. Building on a summary of the key SER techniques, a ResNet50 convolutional neural network (CNN) model for speech emotion recognition is constructed, and recognition experiments and analysis are carried out. The main work is as follows:
The speech emotion description models, the overall SER pipeline, the preprocessing of speech signals, and methods for extracting emotional feature parameters are summarized. The time-domain waveforms and spectrogram characteristics of speech carrying different emotions are analyzed, and a recognition scheme combining spectrogram extraction with a CNN is adopted.
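For concreteness, a minimal sketch of such a spectrogram front end is given below, assuming the librosa toolkit; the sampling rate, frame length, and hop size are illustrative assumptions rather than values taken from the thesis.

import librosa
import numpy as np

def extract_log_spectrogram(wav_path, sr=16000, n_fft=512, hop_length=160):
    """Load a speech file and compute a log-magnitude spectrogram.

    The sampling rate and frame/hop sizes here are illustrative
    assumptions, not parameters reported in the thesis.
    """
    signal, _ = librosa.load(wav_path, sr=sr)
    # Short-time Fourier transform -> complex spectrogram
    stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
    # Log-compressed magnitude, treated as a 2-D "image" input to the CNN
    log_spec = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
    return log_spec  # shape: (n_fft // 2 + 1, n_frames)

The resulting time-frequency image is what the CNN consumes in place of raw waveform samples.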
In this thesis, a CNN model is constructed based on a residual network: it uses the ResNet50 architecture with bottleneck blocks and consists of 49 convolutional layers and one fully connected layer. Through the residual network's "shortcut connections," the output is expressed as the sum of the input and a learned nonlinear transformation, which mitigates vanishing and exploding gradients during backpropagation and allows the deep network to be trained effectively.
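The bottleneck structure with a shortcut connection can be illustrated with the standard ResNet building block. The PyTorch sketch below follows the conventional ResNet50 bottleneck design (1x1, 3x3, 1x1 convolutions plus an identity or projection shortcut) and is not claimed to reproduce the thesis implementation exactly.

import torch.nn as nn

class Bottleneck(nn.Module):
    """Standard ResNet bottleneck block: the output is F(x) + x,
    where F is a 1x1 -> 3x3 -> 1x1 convolutional transformation."""
    expansion = 4

    def __init__(self, in_channels, channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv3 = nn.Conv2d(channels, channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # projects x when shapes differ

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        # Shortcut connection: add the (possibly projected) input back in,
        # which keeps gradients flowing through the deep stack.
        return self.relu(out + identity)

A full 50-layer model built from such blocks can also be instantiated directly, for example via torchvision.models.resnet50 with the number of emotion categories as the output size; the single fully connected layer then maps the pooled features to emotion classes.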
Efficient speech emotion recognition is realized on the IEMOCAP and Emo-DB datasets. The results show that the constructed ResNet50 CNN model achieves recognition accuracies of 69.12% on IEMOCAP and 85.92% on Emo-DB. Compared with other deep learning models, the proposed ResNet50 CNN model is simple and efficient.