Learning shared semantic space for speech-to-text translation
Han, Chi
This item is only available for download by members of the University of Illinois community.
Permalink
https://hdl.handle.net/2142/120363
Description
Title
Learning shared semantic space for speech-to-text translation
Author(s)
Han, Chi
Issue Date
2023-04-13
Advisor
Ji, Heng
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Speech-to-Text Translation
Natural Language Processing
Representation Learning
Abstract
End-to-end speech translation (ST) has far-reaching implications and numerous potential applications, making it an area of significant interest and impact. Despite its importance, ST has traditionally been treated as a separate task, failing to fully leverage the rapid advancements in its closely related sibling, text machine translation (MT). This separation is due to the modality gap, which results from the different representations of text and audio inputs, rendering MT data and end-to-end models incompatible with their ST counterparts. In light of this challenge, we present Chimera, a novel approach designed to bridge the representation gap between these two modalities. Chimera achieves this by projecting audio and text features onto a common semantic representation, effectively unifying the MT and ST tasks. Consequently, Chimera enhances the performance on ST benchmarks, such as MuST-C and Augmented Librispeech, setting new state-of-the-art results. More specifically, Chimera attains a 27.1 BLEU score on the MuST-C EN-DE benchmark, improving the existing state-of-the-art by a substantial margin of +1.9 BLEU. Further experimental analyses substantiate that the shared semantic space indeed facilitates the exchange of common knowledge between the MT and ST tasks. We discovered identifiable semantic regions within the shared joint speech-text encoding space, highlighting the effective integration of both modalities. By plotting neural activation maps between parallel speech and text, we were able to visualize the convergence of semantic information, further demonstrating the success of our approach in bridging the modality gap and fostering a more robust understanding of the underlying linguistic structures. This finding paves the way for augmenting training resources across modalities and opens up new avenues for exploration in the field of speech translation.
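The core idea described above, projecting audio and text encoder outputs into one shared semantic space so that paired inputs align, can be illustrated with a minimal sketch. This is not the thesis's actual Chimera architecture; all module names, dimensions, and the InfoNCE-style contrastive objective here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSemanticProjector(nn.Module):
    """Illustrative sketch: map speech and text features into a common space.

    Hypothetical layer names and sizes; the real model's encoders and
    projection scheme are described in the thesis itself.
    """
    def __init__(self, speech_dim=512, text_dim=512, shared_dim=256):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)

    def forward(self, speech_feats, text_feats):
        # Mean-pool variable-length sequences into one vector per utterance,
        # then L2-normalize so similarity is a cosine score.
        s = F.normalize(self.speech_proj(speech_feats).mean(dim=1), dim=-1)
        t = F.normalize(self.text_proj(text_feats).mean(dim=1), dim=-1)
        return s, t

def alignment_loss(s, t, temperature=0.07):
    # InfoNCE-style contrastive loss: each speech embedding should be
    # closest to its own transcript's text embedding within the batch.
    logits = s @ t.T / temperature
    labels = torch.arange(s.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with random features standing in for encoder outputs.
torch.manual_seed(0)
model = SharedSemanticProjector()
speech = torch.randn(4, 100, 512)  # (batch, audio frames, feature dim)
text = torch.randn(4, 20, 512)     # (batch, tokens, feature dim)
s, t = model(speech, text)
loss = alignment_loss(s, t)
```

Minimizing such an alignment objective is one common way to pull the two modalities onto a shared manifold, after which a single translation decoder can consume either modality's representations.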