Model-based feature construction and text representation for social media analysis

Morales, Alex

Permalink

https://hdl.handle.net/2142/109405

Description

Title

Model-based feature construction and text representation for social media analysis

Author(s)

Morales, Alex

Issue Date

2020-12-01

Director of Research (if dissertation) or Advisor (if thesis)

Zhai, ChengXiang

Doctoral Committee Chair(s)

Zhai, ChengXiang

Committee Member(s)

Han, Jiawei
Hockenmaier, Julia
Ungar, Lyle

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

machine learning
NLP
AI
features
feature construction
humor
feature development
truth discovery
HIV

Abstract

Text representation is at the foundation of most text-based applications. Surface features are insufficient for many tasks, so constructing powerful discriminative features in a general way remains an open challenge. Current approaches use deep neural networks to bypass feature construction. While deep learning can learn sophisticated representations from text, it requires large amounts of training data, which may not be readily available, and the derived features are not necessarily interpretable. In this work, we explore a novel paradigm, model-based feature construction (MBFC), that allows us to construct semantic features that can potentially improve many applications. In brief, MBFC uses human knowledge and expertise, as well as big data, to guide the design of models that enhance predictive modeling and support the data mining process by extracting useful knowledge, which in turn can be used as features for downstream prediction tasks. In this dissertation, we show how this paradigm can be applied to several tasks in social media analysis. We explore how MBFC can be used to solve the problem of target misalignment for prediction, where the output variable and the data may be at different levels of resolution and the goal is to construct features that bridge this gap. The MBFC method allows us to use additional related data, e.g., associated context, to facilitate semantic analysis and feature construction. We focus on a subset of problems in which social media data, in particular text data, can be leveraged to construct useful representations for prediction. We explore several kinds of user-generated content in social media, such as review data for useful review prediction, micro-blogging data for urgent health-based prediction tasks, and discussion forum data for expert prediction.
First, we propose a background mixture model to capture incongruity features in text and use these features for humor detection in restaurant reviews. Second, we propose a source reliability feature representation method for trustworthy comment identification that incorporates user aspect expertise when modeling fine-grained reliabilities in an online discussion forum. Finally, we propose multi-view attribute features that adapt MBFC to handle the target misalignment problem for topic-based features, and we apply them to tweets in order to forecast new diagnosis rates for sexually transmitted infections.

Graduation Semester

2020-12

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science
