Information sampling from online social networks

Kumar, Suhansanu

Permalink

https://hdl.handle.net/2142/110469

Description

Title

Information sampling from online social networks

Author(s)

Kumar, Suhansanu

Issue Date

2021-04-14

Director of Research (if dissertation) or Advisor (if thesis)

Sundaram, Hari

Doctoral Committee Chair(s)

Sundaram, Hari

Committee Member(s)

Tong, Hanghang
Koyejo, Sanmi
Jiang, Meng

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

sampling
network
graph
online social network
reinforcement learning
hidden population
content

Abstract

Data sampling from online social networks is a prerequisite step for several downstream applications. The massive size of online social networks, coupled with API limitations and restrictions on access to social information, makes sampling a challenging problem. This thesis addresses some of these challenges by proposing novel samplers for sampling attributes (content), hidden attributes (population), and networks from online social networks. Specifically, in Chapter 3 we first propose an information-based sampler for sampling content from online social networks. We leverage the surprise of content to direct our sampler towards informative content. The surprise-based sampling strategy allows us to efficiently sample the shape and boundary of content clusters, which is crucial for several data-mining tasks, including clustering, classification, regression, and attribute discovery. We demonstrate our proposed sampler's efficacy on a suite of thirty real-world networks and four data-mining tasks. We further show through empirical counterfactual analysis that network structure does not hinder the performance of surprise-based link-trace samplers in many real-world datasets. Next, in Chapter 4, we propose a novel attributed search-based sampler to sample hidden populations. We use a decision-tree-based search strategy to query the attribute-search space systematically. Our proposed decision-tree Thompson sampler balances exploration and exploitation to sample hidden populations from social networks. We demonstrate our sampler's efficacy over a suite of fourteen sampling tasks on three online social sites and five offline datasets. Furthermore, we show how several factors, such as page size, missing information, and noise, affect hidden population sampling in real-world social networks. Finally, in Chapter 5, we propose a novel framework for learning network samplers. First, we show, both theoretically and empirically, that no universal network sampler exists that can preserve all the topological properties of the underlying graph in the sample. To address this non-existence result, we propose a reinforcement learning framework that learns high-quality sampling policies according to application needs. We demonstrate the efficacy of our proposed sampling framework through extensive experiments across ten different graph families and seven diverse tasks. In summary, this thesis develops several sampling strategies for sampling information (attribute, hidden attribute, network) from online social networks while remaining cognizant of API restrictions. We propose adaptive samplers that can cater to different application needs.

Graduation Semester

2021-05

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science
