Withdraw
Loading…
Machine learning methods based on diffusion processes
Zhao, Chenchao
Loading…
Permalink
https://hdl.handle.net/2142/101545
Description
- Title
- Machine learning methods based on diffusion processes
- Author(s)
- Zhao, Chenchao
- Issue Date
- 2018-07-09
- Director of Research (if dissertation) or Advisor (if thesis)
- Song, Jun S.
- Doctoral Committee Chair(s)
- Maslov, Sergei
- Committee Member(s)
- Chemla, Yann R.
- Zhao, Sihai
- Department of Study
- Physics
- Discipline
- Physics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning
- diffusion
- Abstract
- This thesis presents three distinct machine learning algorithms based on the mathematical formalism and physical idea of diffusion processes. First, the idea of using heat diffusion on a hypersphere to measure similarity has been previously proposed and tested by computer scientists, demonstrating promising results based on a heuristic heat kernel obtained from the zeroth order parametrix expansion; however, how well this heuristic kernel agrees with the exact hyperspherical heat kernel remains unknown. This thesis presents a higher order parametrix expansion of the heat kernel on a unit hypersphere and discusses several problems associated with this expansion method. We then compare the heuristic kernel with an exact form of the heat kernel expressed in terms of a uniformly and absolutely convergent series in high-dimensional angular momentum eigenmodes. Being a natural measure of similarity between sample points dwelling on a hypersphere, the exact kernel often shows superior performance in kernel SVM classifications applied to text mining, tumor somatic mutation imputation, and stock market analysis. The improvement in classification accuracy compared with kernels based on Euclidean geometry may arise from ameliorating the curse of dimensionality on compact manifolds. Second, the effective dissimilarity transformation (EDT) on empirical dissimilarity hyperspheres is proposed and studied using synthetic and gene expression data sets. Iterating the EDT turns a static data distribution into a dynamical process purely driven by the empirical data set geometry and adaptively ameliorates the curse of dimensionality, partly through changing the topology of a Euclidean feature space \mathbb{R}^{n} into a compact hypersphere S^{n}. The EDT often improves the performance of hierarchical clustering via the automatic grouping of information emerging from global interactions of data points. The EDT is not restricted to hierarchical clustering, and other learning methods based on pairwise dissimilarity should also benefit from the many desirable properties of EDT. Finally, quantum time evolution exhibits rich physics, attributable to the interplay between the density and phase of a wave function. However, unlike classical heat diffusion, the wave nature of quantum mechanics has not yet been extensively explored in modern data analysis. We propose that the Laplace transform of quantum transport (QT) can be used to construct an ensemble of maps from a given complex network to a circle S^{1}, such that closely-related nodes on the network are grouped into sharply concentrated clusters on S^{1}. The resulting QT clustering (QTC) algorithm is as powerful as the state-of-the-art spectral clustering in discerning complex geometric patterns and more robust when clusters show strong density variations or heterogeneity in size. The observed phenomenon of QTC can be interpreted as a collective behavior of the microscopic nodes that evolve as macroscopic cluster “orbitals” in an effective tight-binding model recapitulating the network. In summary, the three machine learning methods are based on three distinct diffusion processes. The dynamic diffusion processes serve as a promising foundation for future development in machine learning methods.
- Graduation Semester
- 2018-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/101545
- Copyright and License Information
- Copyright 2018 Chenchao Zhao
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Physics
Dissertations in PhysicsManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…