Weakly-supervised text classification
Meng, Yu
Permalink
https://hdl.handle.net/2142/104867
Description
- Title
- Weakly-supervised text classification
- Author(s)
- Meng, Yu
- Issue Date
- 2019-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Text Classification
- Weakly-supervised Learning
- Neural Classification Model
- Hierarchical Classification
- Abstract
- Deep neural networks are increasingly popular for the classic text classification task because of their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models, and they support only limited types of supervision. In this work, we propose a weakly-supervised framework that addresses the lack of training data in neural text classification. Our framework consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our framework is flexible enough to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. Based on this framework, we propose two methods, WeSTClass and WeSHClass, for flat text classification and hierarchical text classification, respectively. We have performed extensive experiments on real-world datasets from different domains. The results demonstrate that our proposed framework achieves strong performance without requiring excessive training data and significantly outperforms the baselines.
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Copyright and License Information
- Copyright 2019 Yu Meng
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science