Model-based feature construction and text representation for social media analysis
Morales, Alex
Permalink
https://hdl.handle.net/2142/109405
Description
- Title
- Model-based feature construction and text representation for social media analysis
- Author(s)
- Morales, Alex
- Issue Date
- 2020-12-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Doctoral Committee Chair(s)
- Zhai, ChengXiang
- Committee Member(s)
- Han, Jiawei
- Hockenmaier, Julia
- Ungar, Lyle
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning
- nlp
- AI
- features
- feature construction
- humor
- feature development
- truth discovery
- hiv
- Abstract
- Text representation is at the foundation of most text-based applications. Surface features are insufficient for many tasks, and therefore constructing powerful discriminative features in a general way is an open challenge. Current approaches use deep neural networks to bypass feature construction. While deep learning can learn sophisticated representations from text, it requires a lot of training data, which might not be readily available, and the derived features are not necessarily interpretable. In this work, we explore a novel paradigm, model-based feature construction (MBFC), which allows us to construct semantic features that can potentially improve many applications. In brief, MBFC uses human knowledge and expertise, as well as big data, to guide the design of models that enhance predictive modeling and support the data mining process by extracting useful knowledge, which in turn can be used as features for downstream prediction tasks. In this dissertation, we show how this paradigm can be applied to several social media analysis tasks. We explore how MBFC can be used to solve the problem of target misalignment for prediction, where the output variable and the data may be at different levels of resolution and the goal is to construct features that bridge this gap. The MBFC method allows us to use additional related data, e.g., associated context, to facilitate semantic analysis and feature construction. We focus on a subset of problems in which social media data, in particular text data, can be leveraged to construct useful representations for prediction. We explore several kinds of user-generated content in social media, such as review data for useful review prediction, micro-blogging data for urgent health-related prediction tasks, and discussion forum data for expert prediction.
First, we propose a background mixture model to capture incongruity features in text and use these features for humor detection in restaurant reviews. Second, we propose a source reliability feature representation method for trustworthy comment identification that incorporates user aspect expertise when modeling fine-grained reliabilities in an online discussion forum. Finally, we propose multi-view attribute features that adapt MBFC to handle the target misalignment problem for topic-based features, and we apply them to tweets in order to forecast new diagnosis rates for sexually transmitted infections.
- Graduation Semester
- 2020-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/109405
- Copyright and License Information
- Copyright 2020 Alex Morales
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science