End-to-end modeling for code-switching automatic speech recognition
Zhang, Feiyu
Permalink
https://hdl.handle.net/2142/124616
Description
- Title
- End-to-end modeling for code-switching automatic speech recognition
- Author(s)
- Zhang, Feiyu
- Issue Date
- 2024-05-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Speech recognition, code-switching, end-to-end, embeddings
- Abstract
- End-to-end deep neural networks have become the state-of-the-art architecture for many tasks in the field of Automatic Speech Recognition (ASR). For code-switched speech, however, dataset scarcity remains a persistent challenge. Given the difficulty of collecting code-switched corpora, such deep neural network-based systems typically struggle to reach the accuracy of monolingual ASR systems. In this study, we present a simple yet efficient end-to-end ASR system built on an attention-based encoder-decoder framework and specifically engineered to address the complexities of code-switched speech, evaluated on an English-Mandarin code-switched dataset. To overcome the dataset constraints, our approach leverages attention mechanisms, enhancing the model's ability to focus on relevant linguistic features across different languages. We integrate BERT-multilingual and wav2vec 2.0 models to enrich the system's language understanding and acoustic processing capabilities. These integrations allow the model to capture nuanced language variations and phonetic subtleties inherent in code-switched speech. The results indicate a relatively low Mixed Error Rate (MER), demonstrating the model's effectiveness in decoding complex code-switched speech. Our findings show that combining neural network architectures with sophisticated language models improves ASR systems' adaptability in multilingual settings. We also discuss the potential of incorporating syntactic knowledge into language models to further leverage linguistic information.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Feiyu Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
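
Illustrative note: the abstract describes integrating wav2vec 2.0 acoustic modeling with multilingual BERT text embeddings. Below is a minimal sketch of that kind of feature extraction using publicly available HuggingFace checkpoints. It is not the thesis implementation; the checkpoint names (facebook/wav2vec2-base-960h, bert-base-multilingual-cased), the placeholder waveform, and the toy code-switched sentence are assumptions made only for illustration.

import torch
from transformers import (
    Wav2Vec2Model,
    Wav2Vec2FeatureExtractor,
    BertModel,
    BertTokenizer,
)

# Acoustic side: wav2vec 2.0 maps raw 16 kHz audio to frame-level features.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
acoustic_model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

# Text side: multilingual BERT embeds mixed English/Mandarin transcripts.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
text_model = BertModel.from_pretrained("bert-base-multilingual-cased")

waveform = torch.zeros(16000)  # placeholder: one second of silence at 16 kHz (assumption)
audio_inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    acoustic_feats = acoustic_model(**audio_inputs).last_hidden_state  # (1, frames, 768)

transcript = "我想 order 一杯 coffee"  # toy English-Mandarin code-switched sentence (assumption)
text_inputs = tokenizer(transcript, return_tensors="pt")
with torch.no_grad():
    text_feats = text_model(**text_inputs).last_hidden_state  # (1, tokens, 768)

# In an attention-based encoder-decoder ASR system, the decoder would attend over
# acoustic_feats, while embeddings like text_feats could inform the language-modeling side.
print(acoustic_feats.shape, text_feats.shape)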