Towards trustworthy large language models
Wang, Boxin
Permalink
https://hdl.handle.net/2142/121980
Description
- Title
- Towards trustworthy large language models
- Author(s)
- Wang, Boxin
- Issue Date
- 2023-11-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Li, Bo
- Doctoral Committee Chair(s)
- Li, Bo
- Committee Member(s)
- Ji, Heng
- Zhai, Chengxiang
- Catanzaro, Bryan
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- large language model
- robustness
- privacy
- ethics
- toxicity
- trustworthiness
- Abstract
- In the recent era of artificial intelligence, Large Language Models (LLMs) have achieved unprecedented success in a wide range of Natural Language Processing (NLP) tasks, offering significant advancements in understanding and generating human-like text. However, with this remarkable progress come increasing concerns regarding their safety and reliability. Potential misbehaviors, vulnerabilities to adversarial attacks, ethical issues, and privacy leakage of sensitive data present significant challenges. This thesis embarks on an in-depth exploration of the trustworthiness of LLMs, encompassing facets of robustness, privacy, ethics, and comprehensive assessment. Initially setting the stage with foundational principles of trustworthy machine learning and NLP, we transition into the application sphere, identifying and dissecting vulnerabilities in existing LLMs through our novel targeted adversarial attack frameworks built on diverse perturbation functions. In response to these vulnerabilities, we design the InfoBERT learning framework to improve robustness from an information-theoretic standpoint. This thesis then extends to the realm of privacy in LLMs, where our proposed method, DataLens, leverages generative models and gradient sparsity to provide rigorous differential privacy guarantees. We also delve into federated learning, offering a new paradigm that ensures data privacy while training on-device models by leveraging existing public LLMs. Addressing ethical dimensions, we shine a spotlight on the detoxification of LLMs, ensuring their outputs align with acceptable societal norms. To rigorously evaluate LLM trustworthiness, we introduce the Adversarial GLUE benchmark, unearthing vulnerabilities in models under challenging adversarial conditions. Additionally, we spotlight retrieval-augmented LMs, conducting a thorough study of the scalable pretrained retrieval-augmented model Retro and comparing its performance with standard models. This investigation reveals promising directions for future foundation models. Diving deeper into the trustworthiness assessment regime, we introduce DecodingTrust, a granular trustworthiness evaluation focusing on state-of-the-art LLMs, including GPT-4 and GPT-3.5. Through this deep dive, we uncover latent misbehaviors, including susceptibility to generating biased outputs, potential data privacy leakage, and other nuanced challenges for state-of-the-art LLMs such as GPT-4. In summary, this thesis provides several key insights into the vulnerabilities inherent in existing LLMs and paves the way for next-generation LLMs that align with human values. The primary aim of this thesis is to advance the domain of trustworthy large language models, promoting the evolution and development of reliable and unbiased LLMs.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Boxin Wang
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)