Binary classification with training under both classes
Li, Yun
Permalink
https://hdl.handle.net/2142/34344
Description
- Title
- Binary classification with training under both classes
- Author(s)
- Li, Yun
- Issue Date
- 2012-09-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Veeravalli, Venugopal V.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- binary hypothesis testing
- countably infinite alphabet
- binary classification
- Stein's lemma
- finite sample performance
- Abstract
- This thesis focuses on the binary classification problem with training data under both classes. We first review binary hypothesis testing problems and present a new result for the case of a countably infinite alphabet. The goal of binary hypothesis testing is to decide between the two underlying probabilistic processes. Asymptotic optimality of binary hypothesis testing can be achieved with knowledge of only one of the processes. It is also shown that the finite-sample performance can improve greatly with additional knowledge of the alternate process. Most previous work focuses on the case where the alphabet is finite. This thesis extends the existing results to the case of a countably infinite alphabet. It is proved that, without knowledge of the alternate process, the worst-case performance of any test is arbitrarily bad, even when the alternate process is restricted to be "far" in the sense of relative entropy. Binary classification problems arise in applications where a full probabilistic model of either process is absent and pre-classified samples from both processes are available. It is known that asymptotic optimality can be achieved with knowledge of only one pre-classified training sequence. We propose a classification function that depends on both training sequences, and then prove Stein's lemma for classification using this new classification function. It states that the maximal error exponent under one class is given by the relative entropy between the conditional distributions of the two classes. Our results also shed light on how the classification errors depend on the relative size of the training and test data. Simulation results show that our classification method outperforms the asymptotically optimal one when the test samples are of limited size.
- Graduation Semester
- 2012-08
- Copyright and License Information
- Copyright 2012 Yun Li
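The abstract's classification idea can be illustrated with a small sketch. The thesis's actual classification function is not reproduced here; the snippet below is only a generic empirical-distribution classifier, assigning the test sequence to the class whose training sequence's empirical distribution is closer in relative entropy (the quantity that governs the error exponent in Stein's lemma). All names and the toy sequences are illustrative assumptions, not the author's method.

```python
from collections import Counter
import math

def empirical(seq, alphabet):
    # Empirical distribution of seq over the given alphabet.
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}

def kl(p, q, eps=1e-12):
    # Relative entropy D(p || q); eps guards against division by zero
    # when q assigns zero mass to a symbol that p does not.
    return sum(p[a] * math.log(p[a] / max(q[a], eps)) for a in p if p[a] > 0)

def classify(test, train1, train2):
    # Declare the class whose training data is closer, in relative
    # entropy, to the empirical distribution of the test sequence.
    alphabet = set(test) | set(train1) | set(train2)
    q = empirical(test, alphabet)
    d1 = kl(q, empirical(train1, alphabet))
    d2 = kl(q, empirical(train2, alphabet))
    return 1 if d1 <= d2 else 2

# Toy usage: a test sequence dominated by 'a' matches the 'a'-heavy class.
label = classify(list("aaba"), list("aaaab"), list("bbbba"))
print(label)  # → 1
```

Note that with finite training and test sequences the empirical distributions are noisy estimates, which is why finite-sample behavior (how errors depend on the relative sizes of training and test data) is a central concern of the thesis.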
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering