Concept and entity grounding using indirect supervision
Tsai, Chen-Tse
Permalink
https://hdl.handle.net/2142/98336
Description
- Title
- Concept and entity grounding using indirect supervision
- Author(s)
- Tsai, Chen-Tse
- Issue Date
- 2017-07-06
- Director of Research (if dissertation) or Advisor (if thesis)
- Roth, Dan
- Doctoral Committee Chair(s)
- Roth, Dan
- Committee Member(s)
- Chang, Kevin
- Zhai, ChengXiang
- Mihalcea, Rada
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Wikification
- Entity linking
- Cross-lingual wikification
- Named entity recognition
- Indirect supervision
- Incidental supervision
- Entity disambiguation
- Concept disambiguation
- Abstract
- Extracting and disambiguating entities and concepts is a crucial step toward understanding natural language text. In this thesis, we consider the problem of grounding concepts and entities mentioned in text to one or more knowledge bases (KBs). A well-studied scenario of this problem is the one in which documents are given in English and the goal is to identify concept and entity mentions and find the Wikipedia entries to which the mentions refer. We extend this problem in two directions. First, we study identifying and grounding entities written in any language to the English Wikipedia. Second, we investigate using multiple KBs that do not contain the rich textual and structural information that Wikipedia does. These more involved settings pose a few additional challenges beyond those addressed in the standard English Wikification problem. Key among them is that no supervision is available to facilitate training machine learning models. The first extension, cross-lingual Wikification, introduces problems such as recognizing multilingual named entities mentioned in text, translating non-English names into English, and computing word similarity across languages. Since it is impossible to acquire manually annotated examples for all languages, building models for all languages in Wikipedia requires exploring indirect or incidental supervision signals that already exist in Wikipedia. For the second setting, we need to deal with the fact that most KBs do not contain the rich information Wikipedia has; consequently, the main supervision signal used to train Wikification rankers no longer exists. In this thesis, we show that supervision signals can be obtained by carefully examining the redundancy and relations between multiple KBs. By developing algorithms and models that harvest these incidental signals, we can achieve better performance on these tasks.
- Graduation Semester
- 2017-08
- Type of Resource
- text
- Copyright and License Information
- Copyright 2017 Chen-Tse Tsai
Owning Collections
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois