Statistical inference for complex networks

Liu, Yan

Statistical inference for complex networks

Liu, Yan

Permalink

https://hdl.handle.net/2142/110845

Description

Title

Statistical inference for complex networks

Author(s)

Liu, Yan

Issue Date

2021-04-21

Director of Research (if dissertation) or Advisor (if thesis)

Chen, Yuguo

Doctoral Committee Chair(s)

Chen, Yuguo

Committee Member(s)

Douglas, Jeffrey
Simpson, Douglas
Yang, Yun

Department of Study

Statistics

Discipline

Statistics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2021-09-17T04:04:41Z

Keyword(s)

Network Inference
Random Graphs
Variational Inference
Latent Space Model
Stochastic Block Model
Community Detection
Dynamic Networks
Heterogeneous Networks

Abstract

A network consists of a set of vertices and a set of edges between these vertices. The vertices represent a set of entities, and the edges represent relationships among these entities. Networks and relational data arise in various areas of research, such as neuroscience, biology, computer science, and social science. In the past two decades, a large number of statistical methods have been proposed for modeling such relational data, identifying community structures, hypothesis testing, and model selection. The majority of these methods dealt with the case where only one network observation is available. However, as the data collection ability grows, the surge in the number of complex network datasets poses new challenges for statisticians. Complex networks encompass a variety of frameworks. For example, in the dynamic network setting, one may observe a time series of networks over the same set of nodes. In heterogeneous networks, different types of entities interact with each other and form different types of links. In multi-layer networks, one observes multiple networks on the same set of nodes, which represent different types of interactions among these entities. In these scenarios, new probabilistic models are needed in order to deal with the time-varying aspect or heterogeneity of the networks. Furthermore, while community detection is well-studied for a single network, it is still of interest to develop new methods and theories for community detection in complex networks. As the size of network data becomes increasingly large, scalable algorithms for large-scale networks are also of interest. In this dissertation, we focus on modeling, community detection, theoretical properties, and scalable inference for complex networks. In the first project, we propose a variational inference algorithm for dynamic latent space models. Compared with sampling-based methods such as Markov chain Monte Carlo (MCMC), the proposed algorithm yields similar empirical performance with much less computation time. We derive the theoretical properties of the variational Bayes risk of the proposed algorithm and demonstrate the performance on simulated data and two real-world datasets. Heterogeneous mixed-membership stochastic blockmodel (MMSB) is powerful for modeling overlapping communities and multiple community memberships, but many methods in the literature lack a theoretical guarantee. In the second project, we study the theoretical properties of the heterogeneous version of the MMSB. We propose a community detection consistency criterion for mixed-membership networks, and we show that both the maximum-likelihood estimator and the Bayes estimator of heterogeneous MMSB enjoy community detection consistency. In the third project, we consider the problem of inferring the community memberships of the nodes in dynamic networks, where the community memberships are allowed to change over time. We propose two matrix factorization methods, namely, weighted orthogonal linked matrix factorization (OLMF) and co-regularization spectral clustering. We derive the clustering error bound for both proposed algorithms, and show that both algorithms are consistent under various asymptotic regimes. The proposed algorithms outperform the existing early-fusion and late-fusion methods in several simulation studies and two real data examples. In the fourth project, we study the variational EM algorithm for community detection in the multi-layer stochastic block model (MLSBM). We show that the proportion of misclassified nodes given by the variational EM algorithm converges to zero as the number of nodes, the number of layers, and the number of communities go to infinity, i.e., the variational EM algorithm enjoys the property of community detection consistency.

Graduation Semester

2021-05

Type of Resource

Thesis

Permalink

http://hdl.handle.net/2142/110845

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Statistical inference for complex networks

Liu, Yan

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Statistics

Log In