Withdraw
Loading…
Learning on graphs: from theory to practice
Chien, I
Loading…
Permalink
https://hdl.handle.net/2142/117770
Description
- Title
- Learning on graphs: from theory to practice
- Author(s)
- Chien, I
- Issue Date
- 2022-11-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Milenkovic, Olgica
- Doctoral Committee Chair(s)
- Milenkovic, Olgica
- Committee Member(s)
- Hajek, Bruce
- Raginsky, Maxim
- Schwing, Alexander
- Li, Pan
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Graphs
- Machine Learning
- Graph Neural Networks
- Abstract
- Learning on graphs has raised massive attention in the machine-learning community due to various real-world applications. Random walks on graphs, or say graph propagation, have been one of the most popular core ideas in many graph-learning methods. Examples range from classical methods such as PageRank and label propagation to graph convolution and message-passing mechanisms in Graph Neural Networks (GNNs). While there are plenty of graph learning problems such as link predictions and graph classifications, we mainly focus on node-level tasks such as seed-expansion community detection and node classification problems. These tasks aim to recover the underlying node labels (communities) based on an input graph and potentially with node features and attributes. We propose a series of innovative graph-learning methods and relevant analysis for the node-level tasks. We first study the Generalized PageRank (GPR) method, which unifies many PageRank variants as its special cases via different choices of GPR weights. These include Personalized PageRank (PPR), Heat-kernel PageRank (HPR) and many others. Despite the expressiveness of GPR, previous works in the area have mostly focused on evaluating suitable GPR weights. Only a few studies have attempted to determine the optimal weights of GPR for a given application. We take a step forward in this direction by analyzing the behavior of GPR on random graph models such as stochastic block models. We provide non-asymptotic analysis for GPR which characterizes its first and second moments. Based on our analysis, we propose a new GPR termed Inverse PR (IPR). We demonstrate the superiority of IPR compared to other GPRs via extensive experiments on seed-expansion community detection tasks. We next address two fundamental issues in GNNs, namely lacking universality and the over-smoothing problem. Here, universality refers to independence on homophily and heterophily graph assumptions. We address these issues by introducing a novel GPR-GNN architecture that adaptively learns the GPR weights so as to jointly optimize node feature and topological information extraction. We show that learned GPR weights can automatically adjust to the node label pattern, no matter if it is homophilic or heterophilic. Thus, GPR-GNN is a universal graph learning model. We further show that learning GPR weights allows one to prevent feature over-smoothing, a process that renders feature information nondiscriminative, without containing the depth of the model. Our theoretical analysis rigorously validates on not only real-world datasets but also the novel synthetic benchmark datasets generated by the contextual stochastic block model. Finally, we observe that the standard GNN pipeline requires input data consisting of an input graph and \emph{numerical} node features for node classification tasks. However, we have only raw node attributes such as text, images and audio in many real-world applications. The common way of obtaining \emph{numerical} node features from raw data is applying \emph{graph agnostic} methods for node feature extraction. Notably, recent works exploring the correlation between numerical node features and graph structure via self-supervised learning have paved the way for further performance improvement of GNNs. Under this viewpoint, leveraging \emph{graph agnostic} node feature extraction methods is suboptimal, as they prevent one from fully utilizing potential correlations between node attributes and graph topology. We propose Graph Information Aided Node feature exTraction (GIANT) to mitigate this issue. Our first contribution is the proposal of a new graph self-supervised learning task named \emph{neighborhood prediction}. We show that our neighborhood prediction task is universal and related to the eXtreme Multilabel Classification (XMC) problem. Based on this connection to XMC, we leverage the state-of-the-art XMC solver to fine-tune the language model based on graph information and scales to large datasets. GIANT achieves new state-of-the-art results for the node classification task on large-scale datasets that contains more than $100$ million of nodes with significant gain.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 I Chien
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…