Withdraw
Loading…
Enforcing constraints for multi-lingual and cross-lingual speech-to-text systems
Ni, Junrui
Loading…
Permalink
https://hdl.handle.net/2142/113928
Description
- Title
- Enforcing constraints for multi-lingual and cross-lingual speech-to-text systems
- Author(s)
- Ni, Junrui
- Issue Date
- 2021-12-09
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark Allan
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- speech-to-text
- multilingual
- cross-lingual
- end-to-end training
- Abstract
- The recent development of neural network-based automatic speech recognition (ASR) systems has greatly reduced the state-of-the-art phone error rates in several languages. However, when an ASR system trained on one language tries to recognize speech from another language, such a system usually fails, even when the two languages come from the same language family. The above scenario poses a problem for low-resource languages. Such languages usually do not have enough paired data for training a moderately-sized ASR model and thus require either cross-lingual adaptation or zero-shot recognition. Due to the increasing interest in bringing ASR technology to low-resource languages, the cross-lingual adaptation of end-to-end speech recognition systems has recently received more attention. However, little analysis has been done to understand how the model learns a shared representation across languages and how language-dependent representations can be fine-tuned to improve the system’s performance. We compare a bi-lingual CTC model with language-specific tuning at earlier LSTM layers to one without such tuning. This is to understand if having language-independent pathways in the model helps with multi-lingual learning and why. We first train the network on Dutch and then transfer the system to English under the bi-lingual CTC loss. After that, the representations from the two networks are visualized. Results showed that the consonants of the two languages are learned very well under a shared mapping but that vowels could benefit significantly when further language-dependent transformations are applied before the last classification layer. These results can be used as a guide for designing multilingual and cross-lingual end-to-end systems in the future. However, creating specialized processing units in the neural network for each training language could yield increasingly large networks as the number of training languages increases. It is also unclear how to adapt such a system to zero-shot recognition. The remaining work adapts two existing constraints to the realm of multi-lingual and cross-lingual ASR. The first constraint is cycle-consistent training. This method defines a shared codebook of phonetic tokens for all training languages. Input speech first passes through the speech encoder of the ASR system and gets quantized into discrete representations from the codebook. The discrete sequence representation is then passed through an auxiliary speech decoder to reconstruct the input speech. The framework constrains the reconstructed speech to be close to the original input speech. The second constraint is regret minimization training. It separates an ASR encoder into two parts: a feature extractor and a predictor. Regret minimization defines an additional regret term for each training sample as the difference between the losses of an auxiliary language-specific predictor with the real language I.D. and a fake language I.D. This constraint enables the feature extractor to learn an invariant speech-to-phone mapping across all languages and could potentially improve the model's generalization ability to new languages.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113928
- Copyright and License Information
- Copyright 2021 Junrui Ni
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…