Exploiting language models for annotation-efficient knowledge discovery
Huang, Jiaxin
Permalink: https://hdl.handle.net/2142/121985
Description
- Title
- Exploiting language models for annotation-efficient knowledge discovery
- Author(s)
- Huang, Jiaxin
- Issue Date
- 2023-11-17
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Doctoral Committee Chair(s)
- Han, Jiawei
- Committee Member(s)
- Zhai, Chengxiang
- Abdelzaher, Tarek
- Gao, Jianfeng
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Text Mining
- Natural Language Processing
- Knowledge Extraction
- Language Models
- Abstract
- With the tremendous amount of text across the Internet today, it is incredibly difficult for people to manually seek valuable knowledge from massive corpora, so automatic knowledge acquisition systems are becoming highly desirable. Various text mining techniques are built for machines to perform text retrieval, concept understanding, commonsense reasoning, and question answering to solve downstream tasks posed by practitioners in different areas. Existing intelligent systems are mostly based on deep learning models that generally require enormous amounts of annotations in downstream domains, which are expensive and time-consuming to obtain. Furthermore, they tend to assume a pre-defined task and label space, and struggle to handle unseen tasks that involve newly emerging concepts in new domains. For more challenging knowledge utilization tasks such as commonsense reasoning, simple annotation of the final answer is not sufficient to reflect the complex reasoning process behind the task. My research aims to design text mining approaches for weakly supervised knowledge extraction and utilization on domain-specific corpora, by leveraging the strong representation and generative power of pre-trained language models (PLMs). Specifically, my work can be divided into the following three parts:
  1. Seed-Guided Hierarchical Concept Organization. I build a systematic framework that takes a user-given seed hierarchy and constructs a task-specific concept ontology from domain text corpora. Traditional ontologies often fall short in capturing user-specific interests and relations, leading to potential irrelevance in specialized domains. I introduce a fully automated approach to address semantic drift in entity set expansion and present a new framework for seed-guided topical taxonomy construction.
  2. Fine-Grained Entity Extraction. Extracting key information from text corpora is a foundational step in text mining applications. I first provide a comprehensive overview of few-shot learning methodologies for Named Entity Recognition and identify useful techniques, and then introduce a novel approach for Fine-Grained Entity Typing that harnesses the representation and generation capabilities of PLMs (see the illustrative sketch after this record).
  3. Knowledge-Guided Commonsense Reasoning. Entity knowledge can be used to enhance more complex user-oriented tasks such as commonsense reasoning, which relies on dynamically utilizing static knowledge. I design methods to bridge the gap between the inherent capabilities of Large Language Models (LLMs) and the human brain's metacognitive processes, offering insights into how LLMs can self-enhance their reasoning abilities without the need for supervised data.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Jiaxin Huang
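To illustrate the kind of PLM-based fine-grained entity typing summarized in part 2 of the abstract, the sketch below scores candidate type words at a masked slot in a cloze prompt. It is not the dissertation's actual method: the prompt template, the label words, and the choice of bert-base-cased are assumptions made purely for illustration.
```python
# Minimal sketch of prompt-based fine-grained entity typing with a masked LM.
# Not the dissertation's method; template, label words, and model are assumed.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-cased"  # assumed; any masked language model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Candidate fine-grained types, each verbalized as a single label word (assumed).
CANDIDATE_TYPES = ["politician", "athlete", "company", "city", "disease"]

def type_entity(sentence: str, mention: str) -> str:
    """Return the candidate type whose label word scores highest at the [MASK] slot."""
    # Cloze-style prompt appended after the original sentence.
    prompt = f"{sentence} In this sentence, {mention} is a {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

    with torch.no_grad():
        # Logits over the vocabulary at the masked position.
        logits = model(**inputs).logits[0, mask_pos, :].squeeze(0)

    # Compare MLM scores of the label words; a higher logit means a more plausible type.
    scores = {}
    for label in CANDIDATE_TYPES:
        label_ids = tokenizer(label, add_special_tokens=False)["input_ids"]
        if len(label_ids) == 1:  # keep the sketch simple: single-token label words only
            scores[label] = logits[label_ids[0]].item()
    return max(scores, key=scores.get)

if __name__ == "__main__":
    print(type_entity("Barack Obama delivered a speech in Chicago.", "Barack Obama"))
```
The point of the sketch is the zero-annotation flavor of the approach: no task-specific training is run, and the pre-trained model's own token distribution at the masked slot is used to rank candidate types.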
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)