Speech interruption detection for live-streaming audio
Xin, Jin
Permalink
https://hdl.handle.net/2142/110308
Description
Title
Speech interruption detection for live-streaming audio
Author(s)
Xin, Jin
Contributor(s)
Patel, Sanjay
Issue Date
2021-05
Keyword(s)
speech interruption detection
live streaming
support vector machine
k-nearest neighbor
multilayer perceptron
mean opinion score
Abstract
Conversation is a fundamental human activity in which multiple people naturally take turns beginning and ending their speech. An interruption occurs when one speaker talks over another, whether intentionally or unintentionally. Frequent interruptions can significantly degrade the conversational experience and vastly reduce its efficiency.
Interruptions happen more frequently in live-streamed audio calls with significant internet delays. Detecting interruptions during a conversation can help live-streaming companies that care about their quality of service; it can also support audio preprocessing and labeling for speech-to-text models, as well as estimating the conflict level in a debate. This project aims to assess the quality of interrupted speech in live-streaming audio. The interruption detection task was divided into two steps: generating a simulated interrupted-speech audio dataset, and building machine learning models for interruption detection. The dataset was created synthetically by concatenating and overlapping speech recordings with varying interruption times and latency times.
Interruption detection performance was evaluated with a k-nearest neighbor classifier, a support vector machine classifier, and a multilayer perceptron model. Each model takes an array representing a 0.5 s audio segment as input and predicts whether interrupted speech is present in that segment. The results show that the SVM model is highly effective at detecting interrupted speech in conversational audio, achieving 92.61% accuracy under cross-validation on the training data and 72.62% on unseen data.
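The abstract does not reproduce the dataset-generation code, but the described procedure (overlapping the tail of one utterance with the head of another to simulate an interruption under latency, then labeling each 0.5 s segment) can be sketched roughly as follows. The sample rate, the `make_interrupted` helper, and the exact way latency and overlap combine are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

SR = 16_000               # assumed sample rate (not stated in the abstract)
SEG = SR // 2             # 0.5 s segment length, as described

def make_interrupted(a, b, latency_s, overlap_s, sr=SR):
    """Hypothetical generator: concatenate utterances `a` and `b`,
    letting `b` start `overlap_s` seconds before `a` ends (after a
    simulated network latency of `latency_s` seconds), and label each
    0.5 s segment 1 if it contains overlapped (interrupted) speech."""
    latency = int(latency_s * sr)
    overlap = int(overlap_s * sr)
    start_b = len(a) + latency - overlap       # b begins before a ends
    out = np.zeros(max(len(a), start_b + len(b)), dtype=np.float32)
    out[: len(a)] += a
    out[start_b : start_b + len(b)] += b
    ov_lo, ov_hi = start_b, len(a)             # overlapped region
    n_seg = len(out) // SEG
    labels = np.zeros(n_seg, dtype=int)
    for i in range(n_seg):
        s, e = i * SEG, (i + 1) * SEG
        if s < ov_hi and e > ov_lo:            # segment intersects overlap
            labels[i] = 1
    return out, labels

# Usage with two 1 s dummy signals, 0.1 s latency, 0.3 s overlap:
mix, y = make_interrupted(np.ones(SR, np.float32),
                          np.ones(SR, np.float32), 0.1, 0.3)
```

Each labeled 0.5 s segment (or features extracted from it) could then be fed to an off-the-shelf classifier such as scikit-learn's `SVC` for the detection step reported in the abstract.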