Statistical algorithms using multisets and statistical inference of heterogeneous networks
Huang, Weihong
Permalink
https://hdl.handle.net/2142/98245
Description
- Title
- Statistical algorithms using multisets and statistical inference of heterogeneous networks
- Author(s)
- Huang, Weihong
- Issue Date
- 2017-06-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Yuguo
- Doctoral Committee Chair(s)
- Chen, Yuguo
- Committee Member(s)
- Culpepper, Steven
- Douglas, Jeffrey
- Liang, Feng
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Multisets
- Expectation-maximization (EM) algorithm
- Metropolis-Hastings algorithm
- Heterogeneous network
- Clustering
- Mixed membership model
- Variational algorithm
- Abstract
- Computational statistics, including methods such as Markov chain Monte Carlo (MCMC), the bootstrap, and approximate Bayesian computation, is an important part of modern statistics and is widely used in areas such as Bayesian statistics, computational biology, and computational physics. In this thesis, we study three problems: improving the efficiency of the expectation-maximization (EM) algorithm, improving the efficiency of MCMC methods, and statistical analysis of heterogeneous networks. The EM algorithm is widely used to compute maximum likelihood estimates when the observations can be viewed as incomplete data. However, its convergence can be slow, especially when a large portion of the data is missing. In Chapter 2, we propose the multiset EM algorithm, which can improve the convergence of the EM algorithm. The key idea is to augment the system with a multiset of the missing component and to construct an appropriate joint distribution of the augmented complete data. We demonstrate that the multiset EM algorithm can outperform the EM algorithm, especially when EM has difficulty converging and the E-step involves Monte Carlo approximation. The multiset sampler proposed by Leman et al. (2009) has been shown to be an effective algorithm for sampling from complex multimodal distributions, but it requires that the parameters of the target distribution can be divided into two parts: the parameters of interest and the nuisance parameters. In Chapter 3, we propose a new self-multiset sampler (SMSS), which extends the multiset sampler to distributions without nuisance parameters. We also generalize our method to distributions with unbounded or infinite support. Numerical results show that the SMSS and its generalization have a substantial advantage in sampling multimodal distributions compared to the ordinary Markov chain Monte Carlo algorithm and some popular variants.
Heterogeneous networks are useful for modeling complex systems that consist of different types of objects. However, few statistical models are available for heterogeneous networks. In Chapter 4, we propose a statistical model for community detection in heterogeneous networks. To allow for heterogeneity in the data and the content-dependent property of pairwise relationships, we formulate a heterogeneous version of the mixed membership stochastic blockmodel. We also apply a variational algorithm for posterior inference. Through simulation studies and applications to the DBLP data, we demonstrate the advantage of the proposed method in modeling overlapping communities and multiple memberships.
- Graduation Semester
- 2017-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/98245
- Copyright and License Information
- Copyright 2017 Weihong Huang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois