Hypothesis testing and learning with small samples
Huang, Dayu
Permalink
https://hdl.handle.net/2142/42475
Description
- Title
- Hypothesis testing and learning with small samples
- Author(s)
- Huang, Dayu
- Issue Date
- 2013-02-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Meyn, Sean P.
- Doctoral Committee Chair(s)
- Meyn, Sean P.
- Committee Member(s)
- Blahut, Richard E.
- Milenkovic, Olgica
- Veeravalli, Venugopal V.
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Hypothesis Testing
- Large Deviations
- Classification
- Large Alphabet
- Feature Extraction
- Abstract
- Statistical hypothesis testing is a method for deciding among two or more hypotheses using measurement data. It includes, for instance, deciding whether a system is in its normal state based on sensor measurements, or whether a person is healthy based on data from medical tests. We are interested in the situation where the amount of measurement data available is limited and the statistical models under the hypotheses have significant uncertainties: for example, a system could have many different abnormal states. The goal of this thesis is to develop appropriate analysis methods for hypothesis testing problems with a small number of observations and uncertainties regarding the hypotheses. We focus on two problems: a universal hypothesis testing problem and a binary classification problem. In the first problem, only one of the hypotheses has a clearly specified statistical model. In the second problem, the statistical model under either hypothesis is only partially known, and training data are available to help learn the model. For both problems, existing analysis based on large deviations has been shown to be a useful tool that leads to asymptotically optimal tests. However, the classical error exponent criterion that forms the foundation of this theory is not applicable to problems where the number of observations is small relative to the number of possible outcomes in each observation (the size of the observation alphabet). We introduce a new performance criterion based on large deviations analysis that generalizes the classical error exponent. The generalized error exponent characterizes how the probability of error depends on the number of observations and the observation alphabet size. It leads to optimal or near-optimal tests and new insights into some existing tests. The generalized error exponent analysis, as well as the classical central limit theorem (CLT) and error exponent analyses, reveals how the size of the alphabet, or more generally the number of features, affects a test's performance. Results from these analyses suggest that quantizing the observations or selecting a subset of features could help improve a test. We develop an optimization-based algorithm that learns the appropriate features from training data. (A minimal sketch of the classical universal test appears below, after this record's description.)
- Graduation Semester
- 2012-12
- Permalink
- http://hdl.handle.net/2142/42475
- Copyright and License Information
- Copyright 2012 Dayu Huang
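
The universal hypothesis testing problem described in the abstract, where only the null model is fully specified, is the setting of the classical Hoeffding-style test: declare the alternative whenever the KL divergence between the empirical distribution of the observations and the known null model exceeds a threshold. The sketch below only illustrates that classical construction; it is not the thesis's generalized-error-exponent test, and the function name, the uniform null distribution, and the threshold choice are illustrative assumptions rather than values taken from the thesis.

    import numpy as np

    def hoeffding_test(samples, p0, threshold):
        # Hoeffding-style universal test: reject the null model p0 when the
        # KL divergence between the empirical distribution of the samples and
        # p0 exceeds a threshold.
        #   samples   -- symbols drawn from the alphabet {0, ..., len(p0)-1}
        #   p0        -- known null distribution over the finite alphabet
        #   threshold -- decision threshold (illustrative; a classical choice
        #                scales like alphabet size / number of samples)
        p0 = np.asarray(p0, dtype=float)
        counts = np.bincount(np.asarray(samples), minlength=len(p0))
        n = counts.sum()
        emp = counts / n                      # empirical (type) distribution
        mask = emp > 0                        # zero-probability terms contribute 0
        kl = np.sum(emp[mask] * np.log(emp[mask] / p0[mask]))
        return kl > threshold                 # True = declare "not p0"

    # Example: uniform null over an alphabet of size 8, 50 observations.
    rng = np.random.default_rng(0)
    p0 = np.full(8, 1 / 8)
    x = rng.integers(0, 8, size=50)           # data actually drawn from p0
    print(hoeffding_test(x, p0, threshold=8 / 50))

The classical error exponent analysis describes this test when the number of observations is large relative to the alphabet size (here 50 versus 8); the generalized error exponent introduced in the thesis addresses the regime where the alphabet size is comparable to, or larger than, the number of observations.
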
Owning Collections
Dissertations and Theses - Electrical and Computer Engineering
Graduate Dissertations and Theses at Illinois (PRIMARY)