Regularization for dysarthric speech recognition and telemedicine applications
Harvill, John
Permalink
https://hdl.handle.net/2142/115395
Description
- Title
- Regularization for dysarthric speech recognition and telemedicine applications
- Author(s)
- Harvill, John
- Issue Date
- 2022-04-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark A
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Regularization
- Dysarthric Speech
- COVID-19 Detection
- Respiratory Rate Estimation
- Abstract
- A common problem when training neural networks is overfitting, in which a trained model makes high-quality predictions on training data but fails to generalize to unseen samples. Many generic regularization techniques exist for avoiding overfitting, such as weight regularization, dropout, and noise injection. In this work, we explore domain- and problem-specific techniques for data augmentation and pretraining, two other forms of regularization. For many tasks, one of the major challenges to avoiding overfitting is the relatively small amount of task-specific labeled data available. Data augmentation seeks to alleviate this problem by artificially creating more data that can be used during training. Although the generated data is not real, it has been shown empirically to improve model performance when used during training. Model pretraining is another technique; it takes advantage of a large amount of unlabeled data from the target domain to learn useful features. After pretraining, the model can be fine-tuned for the under-resourced task. The problems we explore in this work are Automatic Speech Recognition (ASR) of dysarthric speech, estimation of respiratory rate from breathing audio, and prediction of COVID-19 status from speech, breathing, and cough audio. All three of these problems are relatively under-resourced compared to many other current deep learning problems and can uniquely benefit from task-specific regularization techniques. We explore voice conversion as a method for augmenting an existing dataset of dysarthric speech (UA Speech). After training a voice conversion system to transform healthy speech so that it sounds like dysarthric speech, we generate a large amount of artificial dysarthric speech data from healthy speech.
We also propose a data augmentation technique for periodic data that permutes frequency channels; it can augment a small breathing audio dataset so that the dataset can be used to train a neural network to estimate respiratory rate. Finally, we apply an audio pretraining technique to cough, speech, and breathing audio modalities for the prediction of COVID-19 status in patients. We find that pretraining improves performance on a small labeled dataset of COVID-19 positive and negative patients.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 John Harvill
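The abstract mentions an augmentation for periodic data that permutes frequency channels. The sketch below is only an illustration of that general idea, not the thesis's implementation: the function name `permute_frequency_channels`, the list-of-rows spectrogram layout, and the use of a uniformly random shuffle are all assumptions made for this example. It shows why such a permutation can leave a rate label intact: each channel's time series is copied unchanged, so periodicity along the time axis is preserved.

```python
import random


def permute_frequency_channels(spectrogram, seed=None):
    """Return a copy of `spectrogram` with its frequency channels shuffled.

    Hypothetical helper for illustration only. `spectrogram` is a list of
    rows, one per frequency channel, each a list of per-frame values.
    Only the order of the rows changes; the time axis is untouched, so any
    periodicity across frames is preserved within every channel.
    """
    rng = random.Random(seed)
    order = list(range(len(spectrogram)))
    rng.shuffle(order)  # random permutation of channel indices
    return [list(spectrogram[i]) for i in order]


# Toy input: 4 channels x 6 frames, each channel repeating every 3 frames.
spec = [[c * 10 + (t % 3) for t in range(6)] for c in range(4)]
augmented = permute_frequency_channels(spec, seed=0)
```

Calling the function with different seeds yields different channel orderings of the same recording, each of which could serve as an additional training example under the assumption that the target (here, the period) does not depend on channel order.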
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois