Knowledge representation and behavior understanding with pre-trained language models
Jiang, Minhao
Permalink
https://hdl.handle.net/2142/124388
Description
- Title
- Knowledge representation and behavior understanding with pre-trained language models
- Author(s)
- Jiang, Minhao
- Issue Date
- 2024-05-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Pre-trained Language Models
- Knowledge Representation
- Data Contamination
- Abstract
- The advent of pre-trained language models has marked a significant milestone in computational linguistics and data mining, with remarkable performance across diverse domains. These models, endowed with vast training corpora and a formidable capacity for semantic knowledge representation, have transformed the landscape of downstream applications. Among these applications, the automatic construction and completion of taxonomies has been shown to enhance numerous downstream tasks and to minimize human labor in domain-specific taxonomy development. In this work, we first introduce a taxonomy completion framework that leverages pre-trained language models to extract structural and semantic information from the existing taxonomy and thereby significantly boost the performance of current taxonomy expansion and completion frameworks. On the other hand, even though pre-trained language models perform very well on many datasets, the implications of data contamination during the pre-training stage are still unclear in the current literature. Given the models' demonstrated prowess in enhancing task performance across diverse downstream applications, concerns arise regarding the authenticity of these capabilities, which may be inflated by the inadvertent inclusion of evaluation datasets within pre-training corpora. Through meticulous experimental investigation, this study endeavors to elucidate the effects of data contamination, emphasizing the need for more precise definitions and more stringent methodologies to fortify LLMs against such vulnerabilities.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Minhao Jiang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois