Statistical modeling of heterogeneous data
Liu, Yufei
Permalink
https://hdl.handle.net/2142/45589
Description
- Title
- Statistical modeling of heterogeneous data
- Author(s)
- Liu, Yufei
- Issue Date
- 2013-08-22T16:48:49Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Liang, Feng
- Doctoral Committee Chair(s)
- Liang, Feng
- Committee Member(s)
- Simpson, Douglas G.
- Marden, John I.
- Chen, Yuguo
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Statistical Learning
- Clustering
- Non-parametric Bayes
- Dirichlet Process
- Mixture Model
- Heterogeneous Data
- Abstract
- This dissertation is centered on the modeling of heterogeneous data, which is ubiquitous in this digital information age. From a statistical point of view, heterogeneous data is composed of dissimilar components, where the objects within each component are themselves homogeneous. One real-world example is stock return data, where stocks in the same industry segment tend to move closely together, while different segments tend to have distinct movement patterns. Clustering, a classical problem in unsupervised learning, is one of the most popular ways to characterize data heterogeneity. We review major clustering approaches in Chapter 1. In recent years non-parametric Bayesian mixture models, which are closely related to our work, have attracted increasing attention in the clustering literature, so we review the Mixture of Dirichlet Process Model in Chapter 2. The main body of the dissertation consists of three generic statistical methods for modeling heterogeneity in different scenarios. As data become more and more prevalent, traditional clustering tasks are often accompanied by additional information about the objects to be clustered, known as side information. Side information has the potential to complement clustering algorithms and yield more accurate and meaningful results. In Chapter 3 we describe the Two-view Clustering method, a novel non-parametric clustering model that is capable of robustly incorporating noisy side information. We demonstrate the effectiveness of this new model with three real-world applications in Chapter 4. Our second work is driven by market segmentation, which is key to a modern business's success because it accurately identifies customer groups with varying needs. Market segmentation involves dividing a larger market into sub-markets based on a variety of factors, such as customers’ demographic information and product preferences. In Chapter 5 we propose a multi-task learning framework to solve this problem. Our third work, in Chapter 6, addresses a problem arising from citation analysis for research evaluation. In bibliometrics, one central task is to characterize the statistical distribution of citations. This problem is regarded as challenging for two reasons: (i) the citation distributions of almost all subject areas are highly right-skewed; (ii) citation behavior can differ drastically across subject areas. We propose a mixture model to formally characterize the statistical distribution of citation data. Based on this model, we develop new criteria to evaluate the impact of journals and the performance of research institutes.
- Graduation Semester
- 2013-08
- Permalink
- http://hdl.handle.net/2142/45589
- Copyright and License Information
- Copyright 2013 Jeffrey Yufei Liu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois