Automated taxonomy discovery and exploration
Shen, Jiaming
Permalink
https://hdl.handle.net/2142/113996
Description
- Title
- Automated taxonomy discovery and exploration
- Author(s)
- Shen, Jiaming
- Issue Date
- 2021-12-02
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Doctoral Committee Chair(s)
- Han, Jiawei
- Committee Member(s)
- Ji, Heng
- Zhai, ChengXiang
- Vanni, Michelle T.
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Taxonomy
- Data Mining
- Natural Language Processing
- Abstract
- In an era of information explosion, people are inundated with vast amounts of text data. Every day, thousands of scientific papers, tens of thousands of news articles, corporate reports, and millions of social media posts are produced and shared worldwide. Turning this massive text data into actionable knowledge is an essential research issue in data science and lays the foundation for realizing machine intelligence. The goal of my research is to unleash hidden knowledge buried in unstructured text. To bring this vision to reality, I propose to first structure raw text using taxonomies and then analyze the structured text in a more fine-grained and semantic way.
Due to the diversity of application scenarios, different corpora or different use cases may call for different taxonomies. For example, one analyst aiming to find experts in different scientific areas may want a field-of-study taxonomy, while another analyst who studies technology readiness may call for a taxonomy capturing technology dependencies. Moreover, even within one taxonomy, we enable users to organize concepts as they wish, for example with different levels containing concepts of different categories. For instance, in a computer science taxonomy, the top levels could cover fields of study, the intermediate levels research tasks, and the bottom levels evaluation metrics. Asking human experts to manually curate those taxonomies, one for every possible application, is time-consuming, costly, and unscalable. Therefore, we propose to automatically discover and explore taxonomies based on the datasets and applications, with critical but minimal human guidance.
This thesis outlines a data-driven approach that automatically constructs, enriches, and applies taxonomies for unleashing knowledge from massive unstructured text. In particular, we investigate four areas of research:
(1) Identifying Concept Sets. To obtain concept nodes in the taxonomy, we first develop a collection of concept set expansion methods [1, 2] that extract concepts from text corpora by expanding a small set of seed concepts into a complete list of concepts belonging to the same semantic class.
(2) Recognizing Taxonomic Relations. To organize the identified concepts into a hierarchical structure, we propose a set of taxonomy construction methods [3, 4] that discover taxonomic relations among concepts by analyzing example relation instances (i.e., concept pairs indicating the target relation semantics) and utilizing distant supervision from existing, open-domain knowledge bases.
(3) Enriching Existing Taxonomies. As human knowledge is constantly growing, a static taxonomy may fail to capture emerging user needs. A taxonomy enrichment step is therefore essential to keep our taxonomies up-to-date in real-world applications. We facilitate this process by expanding the taxonomy to incorporate new concepts [5, 6, 7].
(4) Empowering Knowledge-centric Applications. After an up-to-date taxonomy is obtained, we develop principled methods to distill knowledge from taxonomies for downstream applications such as text categorization [8, 9] and intelligent literature search [10, 11].
Finally, we explore how to incorporate event knowledge into the taxonomy by automatically detecting event types from a given corpus. Together, these pieces constitute an integrated framework for leveraging taxonomies to convert massive text data into actionable knowledge.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113996
- Copyright and License Information
- Copyright 2021 Jiaming Shen
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois