Statistical methods for learning sparse features
Hu, Jianjun
Permalink
https://hdl.handle.net/2142/97366
Description
- Title
- Statistical methods for learning sparse features
- Author(s)
- Hu, Jianjun
- Issue Date
- 2017-04-20
- Director of Research (if dissertation) or Advisor (if thesis)
- Liang, Feng
- Doctoral Committee Chair(s)
- Liang, Feng
- Committee Member(s)
- Simpson, Douglas G.
- Shao, Xiaofeng
- Chen, Xiaohui
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Dimension reduction
- Sparsity
- Principal component analysis
- Matrix decomposition
- Regularization
- Thresholding
- Variational-Bayes
- Selection consistency
- Abstract
- With the rapid development of networking, data storage, and data collection capacity, big data are now expanding rapidly in all science and engineering domains. When dealing with such data, it is appealing to extract the hidden sparse structure of the data, since sparse structures allow us to understand and interpret the information better. The aim of this thesis is to develop algorithms that can extract such hidden sparse structures in the context of both supervised and unsupervised learning. In chapter 1, this thesis first examines the limitation of the classical Fisher Discriminant Analysis (FDA), a supervised dimension reduction algorithm for multi-class classification problems. This limitation was discussed by Cui (2012), who proposed a new objective function in her thesis, named Complementary Dimension Analysis (CDA) because each sequentially added dimension boosts the discriminative power of the reduced space. Several extensions of CDA are discussed in this thesis, including sparse CDA (sCDA), in which the reduced subspace involves only a small fraction of the features, and Local CDA (LCDA), which handles multimodal data more appropriately by taking the local structure of the data into consideration. A combination of sCDA and LCDA is shown to work well on real examples and can return sparse directions from data with subtle local structures. In chapter 2, this thesis considers the problem of matrix decomposition that arises in many real applications such as gene repressive identification and context mining. The goal is to retrieve a multi-layer low-rank sparse decomposition from a high-dimensional data matrix. Existing algorithms are all sequential: the first layer is estimated, and the remaining layers are then estimated one by one, conditioning on the previous layers. As discussed in this thesis, such sequential approaches have some limitations. 
A new algorithm is proposed to address those limitations, in which all the layers are solved simultaneously instead of sequentially. The proposed algorithm in chapter 2 assumes a complete data matrix. In many real applications and cross-validation procedures, however, one needs to work with a data matrix with missing values. How to run the proposed matrix decomposition algorithm when values are missing is the main focus of chapter 3. The proposed solution appears to differ slightly from some existing work such as penalized matrix decomposition (PMD). In chapter 4, this thesis considers a Bayesian approach to sparse principal component analysis (PCA). An efficient algorithm based on a hybrid of Expectation-Maximization (EM) and Variational-Bayes (VB) is proposed, and it can be shown to achieve selection consistency when both p and n go to infinity. Empirical studies have demonstrated the competitive performance of the proposed algorithm.
- Graduation Semester
- 2017-05
- Type of Resource
- text
- Copyright and License Information
- Copyright 2017 Jianjun Hu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois