Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression
Ma, Yi; Derksen, Harm; Hong, Wei; Wright, John
Permalink
https://hdl.handle.net/2142/99597
Description
- Title
- Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression
- Author(s)
- Ma, Yi
- Derksen, Harm
- Hong, Wei
- Wright, John
- Issue Date
- 2006-08
- Keyword(s)
- Multivariate mixed data
- Data segmentation
- Rate distortion
- Lossy data coding
- Data compression
- Image segmentation
- Microarray data clustering
- Abstract
- In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression, rate distortion theory, and multiple-channel communications. We show that a deterministic segmentation is the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm to find the optimal segmentation, which does not require any prior knowledge of the number or dimension of the groups, nor does it involve any parameter estimation. Simulation results reveal intriguing phase-transition behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
- Publisher
- Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
- Series/Report Name or Number
- Coordinated Science Laboratory Report no. UILU-ENG-06-2216, DC-224
- Type of Resource
- text
- Language
- en
- Sponsor(s)/Grant Number(s)
- National Science Foundation / NSF CAREER DMS-0349019, NSF CAREER IIS-0347456, NSF CRS-EHS-0509151, and NSF CCF-TF-0514955
- ONR YIP N00014-05-1-0633
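
The abstract describes segmentation as minimizing the total coding length of the grouped data at a fixed distortion, but this record does not give the formulas or the algorithm. The sketch below is one possible reading, under stated assumptions: the coding-length expression `(m + n) / 2 * log2 det(I + n / (eps^2 * m) * V V^T)` for `m` zero-mean samples in `n` dimensions, the per-group membership cost `-log2(|group| / m)` bits per sample, and the greedy pairwise merging loop are all assumptions made for illustration, and the function names are hypothetical. It uses only the Python standard library.

```python
import math


def log2_det_spd(a):
    """Return log2(det(a)) for a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log2(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def coding_length(points, eps):
    """Assumed number of bits to code `points` up to distortion eps**2.

    Assumption: zero-mean Gaussian model, so no mean is subtracted.
    """
    m, n = len(points), len(points[0])
    scale = n / (eps ** 2 * m)
    # Build I + scale * sum_k p_k p_k^T, an n-by-n positive-definite matrix.
    mat = [[(1.0 if i == j else 0.0) + scale * sum(p[i] * p[j] for p in points)
            for j in range(n)] for i in range(n)]
    return 0.5 * (m + n) * log2_det_spd(mat)


def total_length(groups, eps):
    """Bits for all groups plus the assumed bits that record group membership."""
    m = sum(len(g) for g in groups)
    return sum(coding_length(g, eps) - len(g) * math.log2(len(g) / m)
               for g in groups)


def segment(points, eps):
    """Greedy sketch: start from singletons, merge the best pair while it helps."""
    groups = [[p] for p in points]
    while len(groups) > 1:
        current = total_length(groups, eps)
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                candidate = [g for k, g in enumerate(groups) if k not in (i, j)]
                candidate.append(groups[i] + groups[j])
                length = total_length(candidate, eps)
                if length < current and (best is None or length < best[0]):
                    best = (length, candidate)
        if best is None:
            break  # No merge shortens the total coding length.
        groups = best[1]
    return groups
```

A call such as `segment(points, eps=0.5)`, with `points` given as a list of equal-length lists of floats, returns a list of groups. In this sketch the number of groups is not an input: it depends on the data and on the chosen distortion `eps`, which mirrors the abstract's claim that no prior knowledge of the number of groups is needed.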