Statistical models and inference for dynamic networks

Sewell, Daniel K

Permalink

https://hdl.handle.net/2142/78422

Description

Title

Statistical models and inference for dynamic networks

Author(s)

Sewell, Daniel K

Issue Date

2015-04-21

Director of Research (if dissertation) or Advisor (if thesis)

Chen, Yuguo

Doctoral Committee Chair(s)

Chen, Yuguo

Committee Member(s)

Qu, Annie
Liang, Feng
Marden, John I.

Department of Study

Statistics

Discipline

Statistics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

network
dynamic network
latent space
weighted network
social network
edge attraction
prediction
influence
missing data
scalability
stability
longitudinal data
clustering
community detection

Abstract

Dyadic data are ubiquitous, arising in biology, epidemiology, sociology, and many other fields, and are often best understood within the framework of networks. Network data can vary in many ways: networks may be binary or weighted, directed or undirected, and static or longitudinal. This last type, also called a dynamic network, is the focus of this work, whose goal is to develop tools and methodology for the analysis of dynamic networks.

A general framework is developed for modeling dynamic networks via a latent space approach. Modeling such networks with a latent space allows the researcher to capture both the local and global structure of the network, inherently accounts for transitivity, and yields rich visualizations that can easily be interpreted for qualitative inference on the network. A Markov chain Monte Carlo (MCMC) estimation method within a Bayesian setting is presented, and several useful tools arise from it. First, a method is given for predicting future relations, or edges. Second, missing data can easily be incorporated into the model, yielding a posterior probability for each missing edge. Third, a novel concept called nodal influence is introduced, which describes how one actor can influence the edges of another actor; such influence is detected via computationally efficient posterior estimation. The model is shown to outperform the existing method while also handling richer and more complex data. The MCMC algorithm is made scalable by a log likelihood approximation proposed in the literature, slightly adapted to allow for missing data.

Many dynamic networks inherently have weighted edges, so the latent space model is extended to handle the various types of weighted edges that arise in practice.
In particular, the model is extended to relational data that, conditional on the latent actor positions, can be viewed as arising from an exponential family of distributions. An example also demonstrates how, through data augmentation, a similar strategy can be employed when this is not the case. The log likelihood approximation is then extended to make the MCMC algorithms scalable for weighted networks.

Of particular interest is Newcomb's fraternity data, a network that captures the formation and evolution of a network from its most nascent form to a stabilized form. The previous model is modified in two non-trivial ways: the first allows for the modeling of rank-order data, which do not fall into the broad categories of weighted network data given previously, and the second allows the evolution of the network's stability to be estimated. Next, it is shown how the posterior uncertainty can be used for subgroup detection and for determining the time at which these subgroups formed. Finally, the model parameters are used to find the association between individual stability and popularity.

A longitudinal mixture model is described which can be used to make hard or soft clustering assignments for p-dimensional real-valued data. This model accounts for temporal dependence of both the clustering assignments and the objects being clustered, and it allows for covariates that may help explain the clustering assignments. Solutions for implementing the generalized EM algorithm are presented, and recursive relationships are derived that allow the computational cost to grow linearly, rather than exponentially, with the number of time points.
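The abstract does not spell out the recursions. As a minimal sketch under stated assumptions, the snippet below uses a simplified hidden-Markov-style mixture: each object's cluster label follows a first-order Markov chain over time, and given its label each observation is an independent univariate Gaussian. This is a simplification for illustration only (the model described above also lets each object depend on its own past values and lets covariates enter), and all function and parameter names are hypothetical. It compares a standard forward recursion, whose cost is O(T K^2) for T time points and K clusters, with brute-force enumeration of all K^T label paths; both return the same marginal log-likelihood.

```python
import itertools
import math

# Assumption: a simplified hidden-Markov-style longitudinal mixture with
# univariate Gaussian observations; all names here are hypothetical.


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def forward_loglik(xs, init, trans, means, sd):
    """Marginal log-likelihood of one series via the forward recursion.

    alpha[k] = log p(x_1, ..., x_t, Z_t = k). Each time step costs O(K^2),
    so the full pass is O(T * K^2). All probabilities must be strictly positive.
    """
    K = len(init)
    alpha = [math.log(init[k]) + gaussian_logpdf(xs[0], means[k], sd) for k in range(K)]
    for x in xs[1:]:
        alpha = [
            logsumexp([alpha[j] + math.log(trans[j][k]) for j in range(K)])
            + gaussian_logpdf(x, means[k], sd)
            for k in range(K)
        ]
    return logsumexp(alpha)


def brute_force_loglik(xs, init, trans, means, sd):
    """Same quantity by summing over all K**T label paths (exponential cost)."""
    K, T = len(init), len(xs)
    terms = []
    for path in itertools.product(range(K), repeat=T):
        lp = math.log(init[path[0]]) + gaussian_logpdf(xs[0], means[path[0]], sd)
        for t in range(1, T):
            lp += math.log(trans[path[t - 1]][path[t]])
            lp += gaussian_logpdf(xs[t], means[path[t]], sd)
        terms.append(lp)
    return logsumexp(terms)


if __name__ == "__main__":
    xs = [0.1, 0.3, 2.2, 1.9, 2.4]       # toy series for one object
    init = [0.6, 0.4]                     # P(Z_1 = k)
    trans = [[0.9, 0.1], [0.2, 0.8]]      # P(Z_t = k | Z_{t-1} = j)
    means = [0.0, 2.0]                    # cluster-specific means
    fast = forward_loglik(xs, init, trans, means, sd=0.5)
    slow = brute_force_loglik(xs, init, trans, means, sd=0.5)
    print(fast, slow, abs(fast - slow) < 1e-9)
```

In a generalized EM algorithm, a matching backward pass would supply the posterior label probabilities needed for the E-step; only the forward pass is shown here.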
The latent space framework and the longitudinal clustering model are combined to perform community detection within dynamic network data, where the characteristics of each community are fixed but community membership can evolve over time. This method can handle directed or undirected weighted dynamic network data. For community detection within directed or undirected binary networks, a novel model is given along with an efficient variational Bayes estimation algorithm. Both methods are shown to outperform community detection methods that do not borrow information across time.
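The abstract describes the latent space approach only in words. A minimal sketch, assuming the generic distance formulation of Hoff, Raftery and Handcock (2002), logit P(y_ij = 1) = beta - ||x_i - x_j||, together with a Gaussian random walk on the latent positions over time, shows how edge probabilities follow from latent positions and how posterior draws could be pushed one step forward to predict edges at the next time point. This is not necessarily the parameterization used in the dissertation; `draws` is a hypothetical stand-in for MCMC output, and the function names are illustrative only.

```python
import math
import random

# Assumption: generic distance-based latent space model with a Gaussian random
# walk on positions; not necessarily the dissertation's parameterization.
# Function names and the `draws` format are hypothetical.


def edge_prob(beta, x_i, x_j):
    """P(y_ij = 1) when logit P(y_ij = 1) = beta - ||x_i - x_j||."""
    eta = beta - math.dist(x_i, x_j)
    return 1.0 / (1.0 + math.exp(-eta))


def predict_next_edge(draws, i, j, tau, n_sim=200, seed=0):
    """Monte Carlo estimate of P(y_ij = 1) at time T + 1.

    `draws` is a list of (beta, positions_at_T) pairs standing in for MCMC
    posterior samples, where positions_at_T maps actor -> coordinate tuple.
    For each draw, both actors take one random-walk step
    x_{T+1} = x_T + N(0, tau^2 I), and the resulting edge probabilities
    are averaged over draws and simulations.
    """
    rng = random.Random(seed)
    total = 0.0
    for beta, positions in draws:
        for _ in range(n_sim):
            x_i = [c + rng.gauss(0.0, tau) for c in positions[i]]
            x_j = [c + rng.gauss(0.0, tau) for c in positions[j]]
            total += edge_prob(beta, x_i, x_j)
    return total / (len(draws) * n_sim)


if __name__ == "__main__":
    # Two hypothetical posterior draws for actors "a" and "b" in a 2-D latent space.
    draws = [
        (1.0, {"a": (0.0, 0.0), "b": (0.5, 0.0)}),
        (1.2, {"a": (0.1, 0.0), "b": (0.4, 0.1)}),
    ]
    print(predict_next_edge(draws, "a", "b", tau=0.1))
```

Averaging over posterior draws, rather than plugging in a single point estimate, lets the prediction reflect uncertainty in both the model parameters and the latent positions.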

Graduation Semester

2015-05

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois (primary collection)

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Statistics
