Learning and Smooth Simultaneous Estimation of Errors Based on Empirical Data
Buescher, Kevin Lee
Permalink
https://hdl.handle.net/2142/71990
Description
Issue Date
1993
Doctoral Committee Chair(s)
Kumar, P.R.
Department of Study
Electrical Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Statistics
Engineering, System Science
Artificial Intelligence
Computer Science
Abstract
This thesis examines issues related to Valiant's Probably Approximately Correct (PAC) model for learning from examples. In this model, a student observes examples that consist of sample points drawn according to a fixed, unknown probability distribution and labeled by a fixed, unknown binary-valued function. Based on this empirical data, the student must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called its "generalization error."
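As an illustrative sketch (not taken from the thesis), the notion of generalization error in the binary-labeled setting can be approximated by Monte Carlo sampling; the target function, hypothesis, and uniform distribution below are hypothetical choices for illustration only.

```python
import random

# Fixed, unknown binary-valued function labeling the sample points
# (a hypothetical example target).
def target(x):
    return 1 if x > 0.5 else 0

# A candidate hypothesis the student might select.
def hypothesis(x):
    return 1 if x > 0.4 else 0

# Monte Carlo estimate of the generalization error: the expected
# mismatch between the hypothesis' prediction and the true label,
# over sample points drawn from the (here: uniform) distribution.
random.seed(0)
draws = [random.random() for _ in range(100_000)]
gen_error = sum(hypothesis(x) != target(x) for x in draws) / len(draws)
# gen_error approximates the probability mass of the region (0.4, 0.5],
# i.e., roughly 0.1 under the uniform distribution.
```

Here the two functions disagree exactly on the interval (0.4, 0.5], so the generalization error equals the probability that a sample point falls in that region.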
We treat a more realistic problem than Valiant's model in this thesis. We allow the labels and the hypotheses to take general values (rather than just the values zero and one). We do not assume that a hypothesis with zero generalization error exists. Further, we do not even assume that a deterministic functional relationship exists between the sample points and the labels. In particular, the observed values of the labels and sample points may be subject to noise. Also, we allow prior knowledge about the set of probability distributions to be incorporated into the problem. In addition, we address the issue of determining the appropriate complexity for the class of candidate hypotheses. This is related to overfitting: as the complexity of the candidate hypotheses increases, so does the tendency to fit the noise in the data, with an attendant increase in generalization error.
Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the generalization error of all of the candidate hypotheses, simultaneously, from the same labeled sample? We demonstrate how one can learn from such a simultaneous error estimate and propose a new class of estimators, called "smooth estimators," that, in many cases of interest, contains the empirical estimator. We characterize the class of simultaneous estimation problems solvable by a smooth estimator. We give a canonical form for the smooth simultaneous estimator. Further, we show that, when the empirical estimator is smooth, a learning procedure based on the canonical estimator will work in every case in which empirical error minimization does. We derive bounds to show that the numbers of samples required by the canonical estimator and by the empirical estimator are comparable.
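The empirical error minimization approach described above can be sketched in a few lines; the candidate class, noise model, and squared-error loss below are hypothetical illustrations of the generalized (non-binary, noisy-label) setting, not the thesis' own constructions.

```python
import random

random.seed(1)

# Labeled sample: points x_i with noisy real-valued labels y_i.
# Labels need not be binary, and no zero-error hypothesis may exist.
sample = [(x, x + random.gauss(0, 0.1))
          for x in [random.random() for _ in range(200)]]

# A hypothetical finite candidate class: linear predictors h(x) = a * x,
# indexed by the slope a.
slopes = [0.5, 0.8, 1.0, 1.2]
candidates = [lambda x, a=a: a * x for a in slopes]

# Empirical error estimate for each candidate: the average mismatch
# (here, squared error) over the same labeled sample. Evaluating it
# for every candidate from one sample is a simultaneous estimate.
def empirical_error(h):
    return sum((h(x) - y) ** 2 for x, y in sample) / len(sample)

# Empirical error minimization: pick the candidate whose estimated
# error on the sample is smallest.
best = min(candidates, key=empirical_error)
```

With the noise standard deviation well below the gaps between candidate slopes, the minimizer here is the candidate with slope 1.0, matching the noiseless relationship underlying the data.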