Multi-channel multi-modal speech enhancement and separation
Xu, Zhongweiyang
Permalink
https://hdl.handle.net/2142/122068
Description
- Title
- Multi-channel multi-modal speech enhancement and separation
- Author(s)
- Xu, Zhongweiyang
- Issue Date
- 2023-12-06
- Advisor
- Roy Choudhury, Romit
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Speech Enhancement/Separation, Multi-channel, Multi-modal
- Abstract
- This thesis explores multi-channel multi-modal speech enhancement and separation, addressing the challenges posed by diverse acoustic scenarios and input modalities. The research is organized into three interconnected projects, one per chapter, each contributing to the shared goal of improving speech separation and enhancement in real-world contexts. The first project introduces a novel approach to speech separation from binaural microphone inputs. The system separates speech into distinct, pre-defined spatial regions and adapts to each user's Head-Related Transfer Functions (HRTFs) through self-supervised fine-tuning. The second project focuses on speech separation and enhancement using microphone arrays integrated into augmented reality (AR) glasses. The system leverages the spatial information provided by the array to separate and enhance speech, conditioning on directions of arrival derived from visual input. This project aims to improve the user experience in scenarios where hands-free, unobtrusive speech processing is crucial, such as augmented reality environments. The third project explores audio-visual speech separation by incorporating video of the speaker's face and lip movements. This multi-modal model improves speech separation performance by integrating visual cues into the audio processing pipeline: in addition to the speech mixture, it leverages facial expressions and lip movements, yielding improved accuracy and robustness, especially in challenging acoustic conditions. Collectively, these projects advance multi-channel multi-modal speech processing, offering solutions for scenarios ranging from personalized spatial audio processing to hands-free augmented reality applications. The findings and methodologies presented in this thesis highlight open challenges and opportunities in the field and mark progress toward better multi-channel multi-modal speech enhancement and separation systems.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Zhongweiyang Xu
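
To make the audio-visual fusion described in the abstract's third project concrete, the sketch below shows one common way such models are built: per-frame lip embeddings are projected into the audio feature space, upsampled to the audio frame rate, and concatenated with the encoded mixture before a separation mask is estimated. This is a minimal illustrative sketch, not the thesis's actual architecture; the module sizes, the Conv-TasNet-style waveform front end, and the assumption of precomputed lip embeddings are choices made here for illustration only.

# Hypothetical sketch of audio-visual speech separation via feature fusion.
# Not the model from the thesis; all dimensions and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualSeparator(nn.Module):
    def __init__(self, n_filters=256, kernel=16, stride=8, lip_dim=512):
        super().__init__()
        # Learned 1-D conv encoder/decoder over the raw waveform
        # (a Conv-TasNet-style front end, common in this literature).
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)
        # Project per-frame lip embeddings (assumed precomputed, e.g. by a
        # pretrained lip-reading network) into the audio feature space.
        self.visual_proj = nn.Linear(lip_dim, n_filters)
        # Mask estimator over the fused audio+visual features.
        self.mask_net = nn.Sequential(
            nn.Conv1d(2 * n_filters, n_filters, 1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid(),
        )

    def forward(self, mixture, lip_feats):
        # mixture: (batch, samples); lip_feats: (batch, video_frames, lip_dim)
        a = self.encoder(mixture.unsqueeze(1))           # (B, F, T_audio)
        v = self.visual_proj(lip_feats).transpose(1, 2)  # (B, F, T_video)
        # Upsample visual features to the audio frame rate so the two
        # streams are time-aligned, then fuse by concatenation.
        v = F.interpolate(v, size=a.shape[-1], mode="nearest")
        mask = self.mask_net(torch.cat([a, v], dim=1))   # (B, F, T_audio)
        return self.decoder(a * mask).squeeze(1)         # estimated target speech

# Toy usage: 1 s of 16 kHz audio with 25 fps lip embeddings. With this
# kernel/stride the decoder reconstructs exactly the input length.
model = AudioVisualSeparator()
est = model(torch.randn(2, 16000), torch.randn(2, 25, 512))
print(est.shape)  # torch.Size([2, 16000])

The design choice illustrated here, concatenating time-aligned streams before mask estimation, is only one fusion strategy; attention-based cross-modal fusion is another common option.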
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)