A reformulated approach to attribute-aware sampling on large networks
Shang, Charles
Permalink
https://hdl.handle.net/2142/104885
Description
Title
A reformulated approach to attribute-aware sampling on large networks
Author(s)
Shang, Charles
Issue Date
2019-04-22
Director of Research (if dissertation) or Advisor (if thesis)
Sundaram, Hari
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Sampling
Networks
Data Mining
Abstract
Sampling has long been an important tool for extracting subsets of data for data mining tasks. As the scale of information produced has increased, efficient sampling has only become more important. Uniform sampling is often the preferred technique because of its simplicity and speed. However, many network-based data sources prevent random access, necessitating a different way to sample. Algorithms such as breadth-first search, random walk, and expansion sampling currently fill this role. But these algorithms focus mainly on preserving properties based on the structure of the graph, without considering the attributes of each node.
In this study, we take an existing attribute-aware sampler and propose a natural reformulation of the algorithm. We present a new surprise function that avoids some drawbacks of previous work, and we take advantage of the submodularity property to reduce the computation needed when selecting a node. We also make some arguments about the efficiency and effectiveness of such a strategy. We test our algorithm on several real-world data sets and find that it increases sample attribute coverage by up to 4 times compared to techniques like random walk, while still taking time approximately linear in the size of the sample.
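The abstract does not define the surprise function, so the following is only an illustrative sketch of how submodularity can reduce the work of node selection, not the thesis's algorithm. It assumes a stand-in gain (the number of attribute values a node would add to the sample), which is submodular, and it assumes the crawler can only select neighbors of nodes already sampled. The names `lazy_greedy_sample`, `adj`, and `attrs` are hypothetical.

```python
import heapq


def lazy_greedy_sample(adj, attrs, seed, k):
    """Greedily grow a sample of up to k nodes from `seed`.

    adj   : dict mapping node -> iterable of neighbor nodes
    attrs : dict mapping node -> set of attribute values
    Gain of a node = number of attribute values not yet covered
    (an assumed stand-in for the thesis's surprise function).
    """
    sample = [seed]
    in_sample = {seed}
    covered = set(attrs[seed])
    queued = set()   # frontier nodes that already have a heap entry
    heap = []        # entries: (-gain, node, round in which gain was computed)
    rnd = 0          # incremented each time a node is added to the sample

    def push_neighbors(node):
        for nb in adj[node]:
            if nb not in in_sample and nb not in queued:
                queued.add(nb)
                gain = len(set(attrs[nb]) - covered)
                heapq.heappush(heap, (-gain, nb, rnd))

    push_neighbors(seed)
    while heap and len(sample) < k:
        neg_gain, node, computed = heapq.heappop(heap)
        if computed < rnd:
            # Stale entry. Because the gain is submodular, the stored value
            # is an upper bound on the true gain, so refresh it and retry.
            gain = len(set(attrs[node]) - covered)
            heapq.heappush(heap, (-gain, node, rnd))
            continue
        # The entry is fresh and beats every other upper bound: select it.
        sample.append(node)
        in_sample.add(node)
        covered |= set(attrs[node])
        rnd += 1
        push_neighbors(node)
    return sample, covered


# Toy graph with made-up attributes, for illustration only.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
attrs = {0: {"a"}, 1: {"a", "b"}, 2: {"c", "d", "e"}, 3: {"a"}, 4: {"f", "g"}}
sample, covered = lazy_greedy_sample(adj, attrs, seed=0, k=3)
```

The lazy step relies on the fact that a node's gain can only shrink as the sample grows. A stale heap entry therefore never underestimates, and most frontier nodes do not need their gain recomputed in every round.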