The surprising effectiveness of explicit semantic analysis in dataless classification
Gupta, Shashank
Permalink
https://hdl.handle.net/2142/105826
Description
- Title
- The surprising effectiveness of explicit semantic analysis in dataless classification
- Author(s)
- Gupta, Shashank
- Issue Date
- 2019-07-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Roth, Dan
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- ESA
- Dataless Classification
- Embeddings
- Unsupervised Learning
- EntityESA
- Entity2Vec
- Topic2Vec
- Word2Concept
- Abstract
- Organizing textual content into broad labels is one of the most basic tasks that people carry out regularly. This simple task helps people navigate large document collections by exposing the labels of the documents, which can then be used to select the documents of interest. Currently, the most popular techniques for providing this functionality are supervised, requiring someone to annotate a collection of documents with the labels of interest. However, it is not always possible to create a sizeable labeled dataset for every scenario or domain of interest. Techniques such as “Dataless Classification” have therefore been proposed, which bootstrap a classifier by requiring only semantic descriptions of the labels. Despite the encouraging performance of Dataless Classification on text classification tasks, there is still substantial room for improvement. In this thesis, we identify the limitations of Dataless Classification driven by Explicit Semantic Analysis (ESA) and systematically design techniques to address each limitation. In the process, we develop four new embeddings: EntityESA, Entity2Vec, Topic2Vec and Word2Concept. However, despite our best efforts, we found it difficult to outperform the original Dataless Classification system. For some of the techniques we provide an explanation for this behavior, but we also attribute some of these observations to the datasets used for evaluation. We then propose a way to create a new dataset that can be used for future Dataless evaluations. The new embedding methods proposed in this work are generic enough to be of independent interest.
- Graduation Semester
- 2019-08
- Type of Resource
- text
- Copyright and License Information
- Copyright 2019 Shashank Gupta
Owning Collections
- Dissertations and Theses - Computer Science: Dissertations and Theses from the Dept. of Computer Science
- Graduate Dissertations and Theses at Illinois (primary collection): Graduate Theses and Dissertations at Illinois