Concept and entity grounding using indirect supervision

Tsai, Chen-Tse

Permalink

https://hdl.handle.net/2142/98336

Description

Title

Concept and entity grounding using indirect supervision

Author(s)

Tsai, Chen-Tse

Issue Date

2017-07-06

Director of Research (if dissertation) or Advisor (if thesis)

Roth, Dan

Doctoral Committee Chair(s)

Roth, Dan

Committee Member(s)

Chang, Kevin
Zhai, ChengXiang
Mihalcea, Rada

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Wikification
Entity linking
Cross-lingual wikification
Named entity recognition
Indirect supervision
Incidental supervision
Entity disambiguation
Concept disambiguation

Abstract

Extracting and disambiguating entities and concepts is a crucial step toward understanding natural language text. In this thesis, we consider the problem of grounding concepts and entities mentioned in text to one or more knowledge bases (KBs). A well-studied scenario of this problem is the one in which documents are given in English and the goal is to identify concept and entity mentions and find the Wikipedia entries to which they refer. We extend this problem in two directions. First, we study identifying and grounding entities written in any language to the English Wikipedia. Second, we investigate using multiple KBs that do not contain the rich textual and structural information Wikipedia does. These more involved settings pose additional challenges beyond those addressed in the standard English Wikification problem. Key among them is that no supervision is available to facilitate training machine learning models. The first extension, cross-lingual Wikification, introduces problems such as recognizing multilingual named entities mentioned in text, translating non-English names into English, and computing word similarity across languages. Since it is impossible to acquire manually annotated examples for all languages, building models for all languages in Wikipedia requires exploring indirect or incidental supervision signals that already exist in Wikipedia. For the second setting, we need to deal with the fact that most KBs do not contain the rich information Wikipedia has; consequently, the main supervision signal used to train Wikification rankers is no longer available. In this thesis, we show that supervision signals can be obtained by carefully examining the redundancy and relations between multiple KBs. By developing algorithms and models that harvest these incidental signals, we can achieve better performance on these tasks.

Graduation Semester

2017-08

Type of Resource

text

Owning Collections

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
