Learning with Incidental Supervision

Klementiev, Alexandre A.

Content Files

Klementiev_Alexandre.pdf

Permalink

https://hdl.handle.net/2142/14553

Description

Title

Learning with Incidental Supervision

Author(s)

Klementiev, Alexandre A.

Issue Date

2010-01-06T16:11:58Z

Director of Research (if dissertation) or Advisor (if thesis)

Roth, Dan

Doctoral Committee Chair(s)

Roth, Dan

Committee Member(s)

Forsyth, David A.
Pereira, Fernando
Zhai, ChengXiang

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2010-01-06T16:11:58Z

Keyword(s)

machine learning
natural language processing
prediction aggregation

Abstract

Recent technological advances have facilitated the collection and distribution of a plethora of increasingly diverse and complex data. Supervised learning has become the toolbox of choice for exploiting such data to study and model numerous natural and social phenomena. These learning techniques typically require substantial amounts of training data in order to induce good solutions. However, generating annotation often places a significant burden on human experts and makes supervised learning methods costly to apply. On the other hand, the data itself often provides hints sufficient to induce high-quality supervision, and utilizing these hints can be substantially less labor-intensive than producing explicit annotation. This thesis introduces a framework we call Learning with Incidental Supervision, which formalizes these concepts. In particular, we show that various aspects of the data often contain cues capable of inducing weak supervision signals, which can in turn be aggregated to produce high-quality annotation. We examine both the derivation of these signals and the aggregation of their predictions in the context of concrete learning tasks, making independent contributions in both cases. We use the task of Named Entity Discovery to demonstrate that inherent properties of unsupervised multilingual data readily available online can be used to derive multiple weak supervision signals capable of inducing named entity annotation in a new language. We show that combining these signals can substantially improve the resulting annotation. Next, we introduce a general unsupervised learning framework for aggregating predictions from multiple weak supervision sources in order to induce high-quality annotation. We exploit agreement between the signals to estimate their relative quality and learn an effective aggregation model.
The mathematical and algorithmic aggregation framework can in principle be applied to combining arbitrary types of predictions and has a large number of applications on its own. We instantiate it and demonstrate its effectiveness for combining permutations, top-k lists, and dependency parses.
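
As a rough illustration of the idea of using agreement between sources to estimate their relative quality, the following Python sketch aggregates several full rankings of the same items. It is not the model developed in the dissertation: the weighting rule (one minus the normalized mean Kendall tau distance to the other rankings) and the weighted Borda count are stand-in assumptions chosen for brevity, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings a and b order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def agreement_weights(rankings):
    """Assumed heuristic: weight each ranking by its mean agreement
    with all other rankings (1 = identical to all, 0 = reversed)."""
    n = len(rankings[0])
    max_dist = n * (n - 1) / 2  # distance between a ranking and its reverse
    weights = []
    for i, r in enumerate(rankings):
        others = [s for j, s in enumerate(rankings) if j != i]
        mean_dist = sum(kendall_tau_distance(r, s) for s in others) / len(others)
        weights.append(1.0 - mean_dist / max_dist)
    return weights


def weighted_borda(rankings, weights):
    """Combine rankings with a Borda count scaled by each source's weight."""
    n = len(rankings[0])
    scores = {item: 0.0 for item in rankings[0]}
    for r, w in zip(rankings, weights):
        for pos, item in enumerate(r):
            scores[item] += w * (n - 1 - pos)  # top position earns n - 1 points
    return sorted(scores, key=lambda item: -scores[item])
```

Under these assumptions, a source that disagrees with most of the others receives a small weight and has little influence on the combined ranking, with no labeled data involved at any point.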

Graduation Semester

2009-12

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science