Statistical inference in high dimensional data and machine learning
Zhang, Yangfan
Permalink
https://hdl.handle.net/2142/115556
Description
- Title
- Statistical inference in high dimensional data and machine learning
- Author(s)
- Zhang, Yangfan
- Issue Date
- 2022-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Shao, Xiaofeng
- Yang, Yun
- Doctoral Committee Chair(s)
- Shao, Xiaofeng
- Yang, Yun
- Committee Member(s)
- Chen, Xiaohui
- Zhu, Ruoqing
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Statistical Inference
- High-dimensional Data
- U-statistics
- Stochastic Gradient Descent
- Mean-field Variational Inference
- Abstract
- This thesis comprises four projects. In the first project, we study the non-asymptotic theory of mean-field variational inference and establish a Bernstein-von Mises (BvM) theorem for the variational distribution. We propose the evidence lower bound (ELBO) as a new criterion for model selection and show that it is asymptotically equivalent to the Bayesian information criterion (BIC) but can approximate the model evidence more accurately. Moreover, we show the geometric convergence of the coordinate ascent variational inference (CAVI) algorithm under a parametric model framework. In the second project, we propose a class of $L_q$-norm based test statistics for change point detection in the mean of high-dimensional independent data. We establish the asymptotic normality of these statistics and the asymptotic independence among statistics with different values of $q$, so that they can be combined into an adaptive test with high power against both sparse and dense alternatives. We further apply self-normalization to avoid variance estimation, which yields pivotal statistics. We also propose a consistent estimator of the change point location and combine it with a wild binary segmentation algorithm to estimate the number and locations of change points. In the third project, we again propose a class of $L_q$-norm based U-statistics for high-dimensional independent data, this time focusing on global testing of model parameters. The statistics apply to many testing problems, including testing the mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing component-wise independence of high-dimensional observations, among others. We also consider a variant of the proposed U-statistic with monotone indices, to which dynamic programming can be applied to reduce the computational burden. In the fourth project, we propose an online method based on perturbed stochastic gradient descent (SGD) to construct confidence intervals for the true parameters efficiently.
The method inherits the online nature of SGD and requires only two or four parallel runs of SGD-type algorithms to obtain a confidence interval in any fixed direction. We further combine our method with the upper confidence bound (UCB) algorithm to address bandit problems.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Yangfan Zhang
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Graduate Theses and Dissertations at Illinois