Withdraw
Loading…
Statistical inference on network data
Zhang, Jingfei
Loading…
Permalink
https://hdl.handle.net/2142/49840
Description
- Title
- Statistical inference on network data
- Author(s)
- Zhang, Jingfei
- Issue Date
- 2014-05-30T17:20:27Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Yuguo
- Doctoral Committee Chair(s)
- Chen, Yuguo
- Committee Member(s)
- Douglas, Jeffrey A.
- Marden, John I.
- Simpson, Douglas G.
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Network Inference
- Random Graphs
- Sequential Importance Sampling
- Network Robustness
- Exponential Random Graph Model
- Dense Subgraph Discovery
- Simulated Annealing
- Community Detection
- Degree Corrected Stochastic Block Model
- Abstract
- Networks arise from modeling complex systems in various fields, such as computer science, social science, biology, psychology and finance. Understanding and analyzing networks help us better understand these complex systems and extract useful information. In this dissertation, we study problems on network sampling, network modeling and data mining on networks. Random graphs with given vertex degrees have been widely used as a model for many real-world complex networks. However, both statistical inference and analytic study of such networks present great challenges. In Chapter 2, we propose new sequential importance sampling methods for sampling networks with a given degree sequence. These samples can be used to approximate closely the null distributions of a number of test statistics involved in such networks, and provide an accurate estimate of the total number of networks with given vertex degrees. We study the asymptotic behavior of the proposed algorithm and prove that the importance weight remains bounded as the size of the graph grows. This property guarantees that the proposed sampling algorithm can still work efficiently even for large sparse graphs. We apply our method to a range of examples to demonstrate its efficiency in real problems. One important question for complex networks is how the network's connectivity will be affected if the network is under targeted attacks, i.e., the nodes with the most links are attacked. In Chapter 3, we found that a dolphin network is resilient to targeted attacks. To further study the resilient property, we fit an exponential random graph model to the dolphin network. The fitted model characterizes network resiliency and identifies local structures that can reproduce the global resilience property. Such a statistical model can be used to build the Internet and other networks to increase the attack tolerance of those networks. The problem of finding densely connected subgraphs in a network has attracted a lot of recent attention. Such subgraphs are sometimes referred to as communities in social networks or molecular modules in protein networks. In Chapter 4, we propose two Monte Carlo optimization algorithms for identifying the densest subgraphs with a fixed size or with size in a given range. The new algorithms combine the idea of simulated annealing and efficient moves for the Markov chain, and both algorithms are shown to converge to the set of optimal states (densest subgraphs) with probability one. When applied to a yeast protein interaction network and a stock market graph, the algorithms identify interesting new densely connected subgraphs. One of the most relevant features of networks representing real systems is the community structure. Detecting communities is of great importance in understanding, analyzing, organizing networks. In Chapter 5, we describe a statistical framework for modularity-based network community detection. We derive the modularity function under the proposed statistical framework, and propose a fast modularity maximization algorithm based on the eigen-spectrum of the modularity matrix. A hypothesis testing procedure is developed to determine the significance of an identified community structure. The modularity formulated under the proposed statistical framework is shown to be consistent under the degree-corrected stochastic block model framework. Several synthetic networks and real world networks are used to demonstrate the effectiveness of our method.
- Graduation Semester
- 2014-05
- Permalink
- http://hdl.handle.net/2142/49840
- Copyright and License Information
- Copyright 2014 Jingfei Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…