Learning with Incidental Supervision
Klementiev, Alexandre A.
Permalink
https://hdl.handle.net/2142/14553
Description
- Title
- Learning with Incidental Supervision
- Author(s)
- Klementiev, Alexandre A.
- Issue Date
- 2010-01-06T16:11:58Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Roth, Dan
- Doctoral Committee Chair(s)
- Roth, Dan
- Committee Member(s)
- Forsyth, David A.
- Pereira, Fernando
- Zhai, ChengXiang
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning
- natural language processing
- prediction aggregation
- Abstract
- Recent technological advances have facilitated the collection and distribution of a plethora of increasingly diverse and complex data. Supervised learning has been able to provide the toolbox of choice for exploiting it to study and model numerous natural and social phenomena. These learning techniques typically require substantial amounts of training data in order to induce good solutions. However, generating annotation often places a significant burden on human experts, and makes supervised learning methods costly to apply. On the other hand, data itself often provides hints sufficient to induce high quality supervision and utilizing these hints can be substantially less labor intensive than producing explicit annotation. This thesis introduces a framework we call Learning with Incidental Supervision, which formalizes these concepts. In particular, we show that various aspects of the data often contain cues capable of inducing weak supervision signals, which could in turn be aggregated to produce high quality annotation. We examine both the derivation of these signals and aggregation of their predictions in the context of concrete learning tasks, making independent contributions in both cases. We use the task of Named Entity Discovery to demonstrate that inherent properties of unsupervised multilingual data readily available online can be used to derive multiple weak supervision signals capable of inducing named entity annotation in a new language. We show that combining these signals can substantially improve the resulting annotation. Next, we introduce a general unsupervised learning framework for aggregating predictions from multiple weak supervision sources in order to induce high quality annotation. We exploit agreement between the signals to estimate their relative quality and learn an effective aggregation model. 
The mathematical and algorithmic aggregation framework can in principle be applied to combining arbitrary types of predictions, and has a large number of applications on its own. We instantiate it and demonstrate its effectiveness for combining permutations, top-k lists, and dependency parses.
- Graduation Semester
- 2009-12
- Permalink
- http://hdl.handle.net/2142/14553
- Copyright and License Information
- Copyright 2009 Alexandre Klementiev
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science