One-vector representations of stochastic signals for pattern recognition
Tang, Hao
Permalink
https://hdl.handle.net/2142/18595
Description
- Title
- One-vector representations of stochastic signals for pattern recognition
- Author(s)
- Tang, Hao
- Issue Date
- 2011-01-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Huang, Thomas S.
- Doctoral Committee Chair(s)
- Huang, Thomas S.
- Committee Member(s)
- Levinson, Stephen E.
- Hasegawa-Johnson, Mark A.
- Ouyang, Yanfeng
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Pattern Recognition
- Stochastic Signal
- One-Vector Representation
- Hidden Markov Model
- Abstract
- When building a pattern recognition system, we primarily deal with stochastic signals such as speech, images, and video. Ideally, a stochastic signal is represented in one-vector form so that it appears as a single data point in a possibly high-dimensional representational space, since the majority of pattern recognition algorithms are designed to handle stochastic signals having a one-vector representation. More importantly, a one-vector representation naturally allows for optimal distance metric learning from the data, which generally accounts for significant performance gains in many pattern recognition tasks. This is motivated and demonstrated by our work on semi-supervised speaker clustering, where a speech utterance is represented by a Gaussian mixture model (GMM) mean supervector formed from the component means of a GMM adapted from a universal background model (UBM), which encodes our prior knowledge of speakers in general (an illustrative sketch of this construction follows the abstract). Combined with a novel distance metric learning technique that we propose, linear spherical discriminant analysis, which performs discriminant analysis in the cosine space, the GMM mean supervector representation of utterances leads to state-of-the-art speaker clustering performance.
The main criticism of the GMM mean supervector representation is that it assumes independent and identically distributed feature vectors, which is far from true in practice. We therefore propose a novel one-vector representation of stochastic signals based on adapted ergodic hidden Markov models (HMMs) and another based on adapted left-to-right HMMs. In these one-vector representations, a single vector is constructed by a transformation of the parameters of an HMM that is adapted from a UBM to various controllable degrees, where the transformation is mathematically derived from an upper-bound approximation of the Kullback-Leibler divergence rate between two adapted HMMs. These one-vector representations possess a set of very attractive properties and are rather generic in nature, so they can be used with various types of stochastic signals (e.g., speech, image, or video) and applied to a broad range of pattern recognition tasks (e.g., classification or regression).
In addition, we propose a general framework for one-vector representations of stochastic signals for pattern recognition, of which the proposed representations based on adapted ergodic HMMs and on adapted left-to-right HMMs are two special cases. The general framework can serve as a unified and principled guide for constructing "the best" one-vector representations of stochastic signals of various types and for various pattern recognition tasks. Based on different types of underlying statistical models carefully chosen to best fit the nature of the stochastic signals, "the best" one-vector representations may be constructed by a possibly nonlinear transformation of the parameters of the underlying statistical models learned from the signals, where the transformation may be mathematically derived from a properly chosen distance measure between two statistical models rooted in Kullback-Leibler theory.
Since most work in this dissertation is based on HMMs, we also contribute to this tool by proposing a new maximum likelihood learning algorithm for HMMs, which we refer to as the boosting Baum-Welch algorithm. In the proposed boosting Baum-Welch algorithm, we formulate HMM learning as an incremental optimization procedure that performs a sequential gradient descent search on a loss functional for a good fit in an inner product function space. The boosting Baum-Welch algorithm can serve as an alternative to the traditional Baum-Welch or expectation-maximization (EM) algorithm for maximum likelihood learning of HMMs, and is a preferred method in situations where little training data is available. Compared to the traditional Baum-Welch or EM algorithm, the boosting Baum-Welch algorithm is less susceptible to over-fitting (a general risk of maximum likelihood estimation) in that it tends to produce a "large margin" effect.
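The GMM mean supervector construction referred to above can be illustrated with a short sketch. This is a minimal, assumption-laden illustration rather than code from the dissertation: it assumes a diagonal-covariance UBM, Reynolds-style relevance MAP adaptation of the component means only, and an arbitrary relevance factor; the function name and signature are hypothetical.

```python
import numpy as np

def gmm_mean_supervector(features, ubm_weights, ubm_means, ubm_covars, r=16.0):
    """features: (T, D) frame-level feature vectors of one utterance.
    ubm_weights: (M,), ubm_means: (M, D), ubm_covars: (M, D) diagonal covariances.
    Returns a (M*D,) supervector of MAP-adapted component means."""
    T, D = features.shape
    M = ubm_weights.shape[0]

    # E-step: log p(x_t | component m) + log weight_m for every frame and component.
    log_prob = np.empty((T, M))
    for m in range(M):
        diff = features - ubm_means[m]                      # (T, D)
        log_prob[:, m] = (
            -0.5 * np.sum(diff**2 / ubm_covars[m], axis=1)
            - 0.5 * np.sum(np.log(2.0 * np.pi * ubm_covars[m]))
            + np.log(ubm_weights[m])
        )
    log_norm = np.logaddexp.reduce(log_prob, axis=1, keepdims=True)
    gamma = np.exp(log_prob - log_norm)                     # (T, M) responsibilities

    # Sufficient statistics: soft counts and first-order statistics per component.
    n_m = gamma.sum(axis=0)                                 # (M,)
    f_m = gamma.T @ features                                # (M, D)

    # Relevance MAP adaptation of the means only.
    alpha = n_m / (n_m + r)                                 # (M,) adaptation coefficients
    adapted_means = (alpha[:, None] * (f_m / np.maximum(n_m, 1e-10)[:, None])
                     + (1.0 - alpha)[:, None] * ubm_means)

    # Stack the adapted means into a single high-dimensional vector.
    return adapted_means.reshape(-1)
```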
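For context on the learning problem that the proposed boosting Baum-Welch algorithm addresses, the following is a minimal sketch of one iteration of the traditional Baum-Welch (EM) re-estimation for a discrete-emission HMM. This is the standard textbook algorithm, not the author's boosting variant, and the variable names are illustrative.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """obs: (T,) integer observation symbols.
    pi: (N,) initial state probabilities, A: (N, N) transitions,
    B: (N, K) emission probabilities. Returns re-estimated (pi, A, B)."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)

    # Forward pass with per-step scaling to avoid numerical underflow.
    alpha = np.zeros((T, N)); scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]

    # Backward pass using the same scaling factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # E-step: state posteriors gamma and transition posteriors xi.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # M-step: maximum likelihood re-estimation of the HMM parameters.
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```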
- Graduation Semester
- 2010-12
- Permalink
- http://hdl.handle.net/2142/18595
- Copyright and License Information
- Copyright 2010 Hao Tang
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering