Online and active learning of big networks: theory and algorithms

Gu, Quanquan

Online and active learning of big networks: theory and algorithms

Gu, Quanquan

Permalink

https://hdl.handle.net/2142/50757

Description

Title

Online and active learning of big networks: theory and algorithms

Author(s)

Gu, Quanquan

Issue Date

2014-09-16

Director of Research (if dissertation) or Advisor (if thesis)

Han, Jiawei

Doctoral Committee Chair(s)

Han, Jiawei

Committee Member(s)

Roth, Dan
Zhai, ChengXiang
Aggarwal, Charu C.

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Big Networks
Online Learning
Active Learning

Abstract

We are living in the Internet Age, in which information entities and objects are interconnected, thereby forming gigantic information networks. These networks are not only massive, but also grow and evolve very quickly. It is critical to quickly process and understand these networks in order to enable data-driven applications. On the other hand, the labels of the nodes in big networks are scarce. It is urgent to optimize the process by which the labels are collected, because it is unrealistic to get labels of every node. The objective of my research is to develop algorithms for big network analytics, which are both statistically and computationally efficient, and with provable guarantee on their performance. In particular, I present active learning, online learning, selective sampling (online active learning), and online learning with bandit feedback algorithms for learning in a network. In the first part of this thesis, I propose a \textit{nonadaptive} active learning approach on a graph, based on generalization error bound minimization. In particular, I present a data-dependent error bound for a graph-based learning method, namely learning with local and global consistency (LLGC). I show that the empirical transductive Rademacher complexity of the function class for LLGC provides a natural criterion for active learning. The resulting active learning approach is to select a subset of nodes on a graph such that the empirical transductive Rademacher complexity of LLGC is minimized. I propose a simple yet effective sequential optimization algorithm to solve it. In the second part of this thesis, I first present an online learning algorithm on a graph for binary node classification. It is an online version of the well-known Learning with Local and Global Consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as an online ridge regression in the Hilbert space of functions defined on a graph. I prove its regret bound in terms of the structural property (cut size) of a graph. Based on OLLGC, I present a selective sampling algorithm, namely Selective Sampling with Local and Global Consistency (SSLGC), which adaptively queries the label of each node based on the confidence of the linear function on a graph. Its regret bound as well as its label complexity are also derived. I also analyze the low-rank approximation of graph kernels, which enables the online algorithms scale to large graphs. In the third part of this thesis, I first extend the online binary classification algorithm to multi-class regime, based on spectral learning technique. The resulting algorithm is called online spectral learning on a graph (OSLG). Then I study online learning on a graph with bandit feedback, where after the learner makes a prediction of the node label, the oracle will return a single bit indicating whether the prediction is correct or not. I propose an algorithm namely online spectral learning on a graph with bandit feedback (OSLG Bandit), which uses upper-confidence bound of the nonparametric classifier on a graph to trade off the exploration and exploitation of label information. I derive regret bounds for both algorithms, which clearly characterize the difficulty of online learning on a graph for multi-class node classification, both in the full information setting and in the partial information setting. All in all, my efforts in this thesis have provided important findings in these areas, and have yielded new algorithms and theory.

Graduation Semester

2014-08

Permalink

http://hdl.handle.net/2142/50757

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science

Online and active learning of big networks: theory and algorithms

Gu, Quanquan

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Computer Science

Log In