Open-vocabulary brain-to-text decoding via cross-modal transfer with large language models
Wang, Zhenhailong
Permalink
https://hdl.handle.net/2142/120213
Description
- Title
- Open-vocabulary brain-to-text decoding via cross-modal transfer with large language models
- Author(s)
- Wang, Zhenhailong
- Issue Date
- 2023-03-13
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Large Language Models
- Brain-to-Text Decoding
- Multimodal Transfer Learning
- Abstract
- Recent research on transformer-based large language models and large-scale pretraining has shown unprecedented success in natural language processing and beyond. A substantial body of prior work has demonstrated that the rich knowledge embedded in large language models can be effectively transferred across modalities to solve a wide range of discriminative and generative tasks, e.g., vision-language understanding and robotic planning. In this thesis, we investigate an uncharted area of cross-modal transfer with large language models: decoding brain signals. State-of-the-art brain-to-text systems have achieved initial success in decoding language directly from brain signals using modern machine learning models such as recurrent neural networks. However, current approaches are limited to small closed vocabularies, which are far from sufficient for natural communication, and most high-performing approaches require data from invasive devices (e.g., ECoG). To address these limitations, we extend the brain-to-text decoding problem to a more challenging open-vocabulary setting, where decoding is performed over the entire English vocabulary, roughly 100x larger than in previous work. Furthermore, we decode from electroencephalography (EEG) data, which is non-invasive but noisier. To handle the huge vocabulary with only a small amount of parallel EEG-text data, we propose a novel model that leverages pretrained large language models. The key idea is to learn a good brain-language alignment between the "brain embedding space" and the "pretrained language embedding space" so that we can directly leverage the power of a large language model, e.g., BART. Results show promising EEG-to-Text decoding performance, with a BLEU-1 score of 40.1%. Moreover, in-depth analysis shows that the proposed model has strong scalability and robustness with respect to the data source, which is essential for building a large-scale brain-language foundation model. To further demonstrate the effectiveness of the EEG-to-Text decoder, we propose a novel zero-shot plug-and-play framework and evaluate it on a challenging downstream task, EEG-based sentiment classification, where the goal is to predict a sentence's sentiment based solely on EEG signals. We show that our zero-shot framework outperforms fully supervised baselines by a large margin. The code is publicly available for research purposes at https://github.com/MikeWangWZHL/EEG-To-Text.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Zhenhailong Wang
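
To make the abstract's key idea more concrete, the following is a minimal sketch, not the thesis's actual implementation, of projecting word-level EEG feature sequences into the embedding space of a pretrained BART model (via Hugging Face Transformers) so that the pretrained encoder-decoder can generate open-vocabulary text. The projection architecture, the 840-dimensional feature size, and all module names are assumptions for illustration only; see the linked repository for the real code.

```python
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizer


class EEGToTextSketch(nn.Module):
    """Illustrative sketch: align EEG features with BART's embedding space,
    then reuse the pretrained BART encoder-decoder for text generation."""

    def __init__(self, eeg_feature_dim=840, bart_name="facebook/bart-large"):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(bart_name)
        d_model = self.bart.config.d_model  # 1024 for bart-large
        # Assumed "brain-to-language" projection; the thesis model may differ.
        self.projector = nn.Sequential(
            nn.Linear(eeg_feature_dim, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, eeg_features, attention_mask, labels=None):
        # eeg_features: (batch, num_words, eeg_feature_dim) word-level EEG features.
        # Passing inputs_embeds bypasses BART's token-embedding lookup, so the
        # projected EEG features play the role of (sub)word embeddings.
        inputs_embeds = self.projector(eeg_features)
        return self.bart(inputs_embeds=inputs_embeds,
                         attention_mask=attention_mask,
                         labels=labels)

    @torch.no_grad()
    def generate(self, eeg_features, attention_mask, **gen_kwargs):
        # generate() with inputs_embeds needs a reasonably recent transformers version.
        inputs_embeds = self.projector(eeg_features)
        return self.bart.generate(inputs_embeds=inputs_embeds,
                                  attention_mask=attention_mask,
                                  **gen_kwargs)


# Tiny usage example with dummy data: 2 sentences, 20 word-level EEG windows each.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = EEGToTextSketch()
eeg = torch.randn(2, 20, 840)
mask = torch.ones(2, 20, dtype=torch.long)
labels = tokenizer(["a target sentence", "another target sentence"],
                   return_tensors="pt", padding=True).input_ids
loss = model(eeg, mask, labels=labels).loss  # trains the projector (and optionally BART)
decoded = tokenizer.batch_decode(
    model.generate(eeg, mask, num_beams=4, max_length=32),
    skip_special_tokens=True)
```

The zero-shot sentiment classification described in the abstract can then, for example, chain such a decoder with an off-the-shelf text sentiment classifier: EEG is first decoded to text and the decoded sentence is classified, requiring no sentiment labels paired with EEG during training.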
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)