Knowledge acquisition for natural language understanding
Lai, Tuan Manh
Permalink
https://hdl.handle.net/2142/121297
Description
Title
Knowledge acquisition for natural language understanding
Author(s)
Lai, Tuan Manh
Issue Date
2023-05-24
Director of Research (if dissertation) or Advisor (if thesis)
Ji, Heng
Doctoral Committee Chair(s)
Ji, Heng
Committee Member(s)
Zhai, ChengXiang
Han, Jiawei
Bui, Trung H
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Natural Language Processing
Information Extraction
Deep Learning
Large Language Models
Abstract
Large neural models pretrained on vast volumes of text have achieved remarkable success in various natural language processing tasks. However, these models may still face challenges in knowledge-intensive tasks due to their training methods, which typically focus on learning directly from raw texts and do not incorporate existing linguistic resources or structured domain knowledge. This thesis aims to develop methods to effectively incorporate external knowledge into existing neural models to enhance their performance.
We propose three novel approaches that incorporate external knowledge into neural models at varying levels of explicitness, accommodating a wide range of use cases.
The first approach involves incorporating various types of domain knowledge from multiple sources into language models using lightweight adapter modules. For each knowledge source of interest, we train an adapter module to capture the knowledge in a self-supervised way. The knowledge encoded in the adapters can then be combined for downstream tasks using fusion layers. This approach provides an easy-to-use, implicit way of incorporating external knowledge.
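To illustrate the adapter idea, the following is a minimal PyTorch-style sketch; the class names, dimensions, and fusion mechanism are illustrative assumptions rather than the thesis implementation. Each knowledge source gets a small bottleneck adapter, and a fusion layer attends over the adapters' outputs for a downstream task.

# Minimal sketch (assumed names and shapes, not the thesis code) of a
# bottleneck adapter plus attention-based fusion over several adapters.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, h):
        # Residual connection keeps the backbone representation intact.
        return h + self.up(self.act(self.down(h)))

class AdapterFusion(nn.Module):
    """Attend over the outputs of multiple knowledge-specific adapters."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, h, adapter_outputs):
        # adapter_outputs: (batch, seq, num_adapters, hidden)
        q = self.query(h).unsqueeze(2)                 # (B, S, 1, H)
        k = self.key(adapter_outputs)                  # (B, S, A, H)
        weights = (q * k).sum(-1).softmax(dim=-1)      # (B, S, A)
        return (weights.unsqueeze(-1) * adapter_outputs).sum(2)

# Usage: one adapter per knowledge source, fused for a downstream task.
h = torch.randn(2, 16, 768)
adapters = nn.ModuleList([Adapter() for _ in range(3)])
outs = torch.stack([a(h) for a in adapters], dim=2)
fused = AdapterFusion()(h, outs)

Because each adapter is small and trained separately, knowledge sources can be added or dropped without retraining the backbone model.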
The second approach uses a retrieval system to fetch relevant passages from a knowledge base, which are then provided as additional input to a generation model that produces the output. To train the retrieval components, we use a novel method for generating pseudo-labels, avoiding the need to collect costly gold-standard retrieval labels. This approach offers a more explicit way of accessing and using external knowledge than the adapter-based approach and provides greater interpretability. Additionally, new knowledge can typically be added to the knowledge base without updating any parameters of the neural component.
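One simple way such retrieval pseudo-labels could be derived is sketched below; this is an assumption-laden illustration (token-overlap scoring against the gold response), not necessarily the labeling method used in the thesis.

# Sketch: score candidate passages by token-level F1 overlap with the gold
# response and treat the best-scoring passage as a pseudo positive.
from collections import Counter

def f1_overlap(passage, response):
    """Token-level F1 between a candidate passage and the gold response."""
    p, r = passage.lower().split(), response.lower().split()
    common = sum((Counter(p) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)

def pseudo_label(passages, gold_response):
    """Return the index of the passage most similar to the gold response."""
    scores = [f1_overlap(p, gold_response) for p in passages]
    return max(range(len(passages)), key=scores.__getitem__)

# Hypothetical example: the highest-overlap passage becomes the retrieval target.
passages = ["Aspirin treats pain and fever.", "Paris is the capital of France."]
print(pseudo_label(passages, "Aspirin is used to treat fever."))  # -> 0

The pseudo positives (and the remaining passages as negatives) can then be used to train a dense retriever with a standard contrastive objective.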
The third approach involves using entity linking to extract the exact part of a knowledge graph that is relevant to the task at hand. We then utilize graph neural networks to incorporate the extracted subgraph into the existing neural model. This approach provides an even more explicit way of incorporating external knowledge, allowing for fine-grained control over what knowledge to incorporate and offering even more interpretability.
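To make the third approach concrete, here is a minimal sketch of encoding an extracted subgraph with a plain graph-convolution layer and combining the result with a text representation; the layer, shapes, and fusion step are assumptions for illustration, not the thesis architecture.

# Sketch: encode a task-relevant knowledge-graph subgraph with one graph
# convolution and concatenate the pooled graph vector with a text vector.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features, then project."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):
        # Row-normalize the adjacency (with self-loops) and propagate features.
        adj = adj + torch.eye(adj.size(0))
        deg = adj.sum(-1, keepdim=True)
        return torch.relu(self.proj((adj / deg) @ node_feats))

# Hypothetical subgraph with 4 linked entities and 128-dim node embeddings.
node_feats = torch.randn(4, 128)
adj = torch.tensor([[0., 1, 0, 0],
                    [1, 0, 1, 1],
                    [0, 1, 0, 0],
                    [0, 1, 0, 0]])
graph_repr = GCNLayer(128)(node_feats, adj).mean(dim=0)  # pooled subgraph vector
text_repr = torch.randn(128)                             # from the base encoder
combined = torch.cat([text_repr, graph_repr])            # fed to the task head

Because only entities linked from the input text enter the subgraph, this design gives fine-grained control over exactly which pieces of the knowledge graph influence the prediction.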
We demonstrate the effectiveness of our proposed methods on various knowledge-intensive natural language processing tasks, including biomedical information extraction and knowledge-grounded dialog. We show that incorporating external knowledge can help overcome the difficulty of learning domain-specific knowledge and enhance the model's efficiency and interpretability. Our methods also allow for natural updates and additions of external knowledge, providing a flexible and scalable way of enhancing large neural language models. Overall, our methods achieve state-of-the-art results on many benchmarks.