Statistical inference for high-dimensional data via U-statistics

Wang, Runmin

Permalink

https://hdl.handle.net/2142/108476

Description

Title

Statistical inference for high-dimensional data via U-statistics

Author(s)

Wang, Runmin

Issue Date

2020-07-14

Director of Research (if dissertation) or Advisor (if thesis)

Shao, Xiaofeng

Doctoral Committee Chair(s)

Shao, Xiaofeng

Committee Member(s)

Chen, Xiaohui
Fellouris, Georgios
Simpson, Douglas G

Department of Study

Statistics

Discipline

Statistics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2020-10-07T20:59:45Z

Keyword(s)

High-dimensional data
U-statistics

Abstract

Owing to advances in science and technology, there is a surge of interest in high-dimensional data. Many methods developed in the low- or fixed-dimensional setting may not be theoretically valid in this new setting, and some are not even applicable when the dimensionality exceeds the sample size. To circumvent the difficulties brought by high dimensionality, we consider U-statistic based methods. In this thesis, we investigate the theoretical properties of U-statistics in the high-dimensional setting and develop novel U-statistic based methods for three problems.
In the first chapter, we propose a new formulation of self-normalization for inference about the mean of high-dimensional stationary processes using a U-statistic based approach. Self-normalization has attracted considerable attention in the recent time series literature, but its scope of applicability has been limited to low- or fixed-dimensional parameters of low-dimensional time series. Our original test statistic is a U-statistic with a trimming parameter to remove the bias caused by weak dependence. Under the framework of nonlinear causal processes, we show the asymptotic normality of our U-statistic, with a convergence rate that depends on the order of the Frobenius norm of the long-run covariance matrix. The self-normalized test statistic is then constructed from recursive subsampled U-statistics, and its limiting null distribution is shown to be a functional of time-changed Brownian motion, which differs from the pivotal limit used in the low-dimensional setting. An interesting phenomenon associated with self-normalization is that it works in the high-dimensional context even if the convergence rate of the original test statistic is unknown. We also present applications to testing for bandedness of the covariance matrix and testing for white noise in high-dimensional stationary time series, and compare the finite sample performance with that of existing methods in simulation studies. At the root of our theoretical arguments, we extend the martingale approximation to the high-dimensional setting, which could be of independent theoretical interest.
In the second chapter, we consider change point testing and estimation for high-dimensional data. For testing for a mean shift, we propose a new test that is based on U-statistics and uses the self-normalization principle. Our test targets dense alternatives in the high-dimensional setting and involves no tuning parameters. The weak convergence of a sequential U-statistic based process is established as an important theoretical contribution. Extensions to testing for multiple unknown change points in the mean and to testing for changes in the covariance matrix are also presented, with rigorous asymptotic theory and encouraging simulation results. Additionally, we illustrate how our approach can be combined with wild binary segmentation to estimate the number and locations of multiple unknown change points.
In the third chapter, we consider estimation and inference for the location of a single change point in the mean of independent high-dimensional data. Our change point location estimator maximizes a new U-statistic based objective function, and its convergence rate and asymptotic distribution, after suitable centering and normalization, are obtained under mild assumptions. Our estimator turns out to be more efficient than its least squares based counterpart in the literature. Based on the asymptotic theory, we construct a confidence interval by plugging in consistent estimates of several quantities in the normalization. We also provide a bootstrap-based confidence interval and state its asymptotic validity under suitable conditions. Through simulation studies, we demonstrate favorable finite sample performance of the new change point location estimator compared with its least squares based counterpart, and of our bootstrap-based confidence intervals compared with several existing competitors. The asymptotic theory based on high-dimensional U-statistics is substantially different from that developed in the literature and is of independent interest.

Graduation Semester

2020-08

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Statistics