Training a massively multimodal transformer on YouTube data: pre-training and parameter efficient fine-tuning on HPC infrastructure
Day, Kastan Vrabel
Permalink
https://hdl.handle.net/2142/120172
Description
- Title
- Training a massively multimodal transformer on YouTube data: pre-training and parameter efficient fine-tuning on HPC infrastructure
- Author(s)
- Day, Kastan Vrabel
- Issue Date
- 2023-05-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Kindratenko, Volodymyr
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- AI
- ML
- LLMs
- Multimodal Transformers
- PEFT
- RLHF
- RLAIF
- Abstract
- In machine learning, the widespread adoption of pre-trained large language models (LLMs) across many domains has disrupted conventional wisdom on the right model to use for the job. This work investigates the advantages of pre-training large language models from scratch over fine-tuning existing ones in specific scenarios, such as exploring the importance of custom tokenizers for domain-specific applications, addressing information leakage concerns, and examining the use of LLMs in non-traditional applications such as time-series forecasting. When pre-training is unnecessary, this work argues that select parameter efficient fine-tuning (PEFT) methods are strictly superior to traditional fine-tuning in data and computational efficiency and should be preferred in nearly all cases. Furthermore, after PEFT, it is ideal to further sculpt the outputs of one's LLM with Reinforcement Learning from Human Feedback (RLHF). This work argues that reinforcement learning (RL), rather than any form of supervised fine-tuning (SFT), is preferable for achieving truthfulness without hallucination. Practitioners should seek to leverage the benefits of RLHF via RL from AI feedback (RLAIF), an effective, fast, and economical alternative to human feedback that retains the benefits of reward modeling for factuality. Additionally, the paper discusses the three most successful learning objectives in multimodal transformers and the challenges they face in aligning distinct embedding spaces. I present my own model, Video Pre-trained Transformer, a multimodal mixture of pre-trained experts for video question answering tasks, benchmarked against VQAv2. The importance of modern ML-first databases and filesystems is explored in the context of multimodal, multi-model AI systems for fast and flexible data throughput on HPC and multi-cloud systems.
Together, the paper captures the state of open-source LLMs and the opportunities for researchers and practitioners to combine the attractive properties of PEFT, RLAIF, and multimodal transformers, paving the way for the next few years of growth in AI capabilities.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Kastan Day
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)