Change of representation in machine learning, and an application to protein structure prediction

Ioerger, Thomas Richard

This item is available for download only to members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/21110

Description

Title

Change of representation in machine learning, and an application to protein structure prediction

Author(s)

Ioerger, Thomas Richard

Issue Date

1996

Doctoral Committee Chair(s)

Rendell, Larry A.

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Chemistry, Physical
Artificial Intelligence
Computer Science

Language

eng

Abstract

While many excellent induction algorithms are known for making predictions from databases in well-studied domains, learning systems still perform poorly in many difficult real-world domains, such as weather prediction or financial risk analysis. Two characteristics of real-world domains are inadequately addressed by current machine learning research. First, the difficulty in these domains is often caused by a low-level representation, which necessitates shifting to a higher-level representation. But the space of possible representations is very large, so we need intelligent methods for finding higher-level representations. Second, background knowledge is almost always available in real-world domains, which we would like to take advantage of to increase predictive accuracy. However, known roles for domain knowledge in machine learning are often inflexible, requiring the use of a specific induction algorithm or being sensitive to incorrectness or incompleteness in the knowledge.
We propose a general framework for change-of-representation based on searching for alternative representations to improve the accuracy of an underlying induction algorithm. Representations are selected as candidates by querying a strategy component, which relies on domain knowledge to suggest which alternatives to search. An evaluation component then compares these representations by applying each representation to a set of examples and running the induction algorithm on the transformed examples to empirically determine the effect of the change on accuracy. This approach provides solutions to the two characteristic problems of learning in real-world domains. First, domain knowledge is used as a heuristic to guide the search for alternative representations, enabling more intelligent decisions during change-of-representation. Second, the framework provides a flexible role for knowledge that can be used with any learning algorithm and is tolerant of uncertainty. An implementation of this framework could be used as an interface between a human expert and a learning program in which: (1) the human uses background knowledge to generate and prioritize alternative representations, and (2) the system empirically evaluates these to discover the best change for improving accuracy.
We apply our framework for change-of-representation to the difficult, real-world domain of protein tertiary (3D) structure prediction. The best computational method to date for determining the structure of a protein from its amino acid sequence is homology modeling, which is based on sequence alignments with a protein database. Homology modeling can fail in cases where the sequence similarity is low between proteins with similar structures. However, the physical and chemical properties of amino acids are believed to be relevant to protein structure. Using an instantiation of our framework, we incorporate this domain knowledge to suggest ways to change the representation of amino acid sequences. Efficient search procedures derived from this knowledge lead to the discovery of representations that improve the ability to predict protein structures by homology modeling.
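
The search-and-evaluate loop described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the 1-nearest-neighbor learner standing in for the "underlying induction algorithm", the rough hydrophobic/polar grouping of amino acids, and the toy sequences and labels are all assumptions made up for illustration. A knowledge-driven strategy proposes candidate representations, and an evaluation step re-encodes the examples under each candidate and measures accuracy empirically.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]              # (amino acid sequence, class label)
Representation = Callable[[str], str]  # maps a sequence to a re-encoded sequence

# Assumption: a coarse, illustrative hydrophobic set; real property scales are finer.
HYDROPHOBIC = set("AVLIMFWC")


def identity(seq: str) -> str:
    """Low-level representation: the raw residue letters."""
    return seq


def hydrophobicity_classes(seq: str) -> str:
    """Higher-level representation: 'h' for hydrophobic residues, 'p' otherwise."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions (sequences assumed to have equal length)."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train: List[Example], query: str) -> str:
    """Stand-in induction algorithm: label of the nearest training example."""
    nearest = min(train, key=lambda ex: hamming(ex[0], query))
    return nearest[1]


def leave_one_out_accuracy(examples: List[Example], represent: Representation) -> float:
    """Evaluation component: transform the examples, then estimate accuracy."""
    transformed = [(represent(seq), label) for seq, label in examples]
    correct = 0
    for i, (x, y) in enumerate(transformed):
        rest = transformed[:i] + transformed[i + 1:]
        if predict_1nn(rest, x) == y:
            correct += 1
    return correct / len(transformed)


def best_representation(
    examples: List[Example], candidates: Dict[str, Representation]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate proposed by the strategy and return the best one."""
    scores = {name: leave_one_out_accuracy(examples, fn) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores


# Synthetic toy data (not real proteins): the two classes differ in their
# hydrophobic/polar pattern but share no positional residue identities.
TOY_EXAMPLES: List[Example] = [
    ("ASLT", "alt"), ("VNIQ", "alt"), ("MDFE", "alt"), ("WKCR", "alt"),
    ("STAL", "blk"), ("NQVI", "blk"), ("DEMF", "blk"), ("KRWC", "blk"),
]

# Strategy component, reduced here to a fixed list of knowledge-suggested candidates.
CANDIDATES: Dict[str, Representation] = {
    "identity": identity,
    "hydrophobicity": hydrophobicity_classes,
}

if __name__ == "__main__":
    best, scores = best_representation(TOY_EXAMPLES, CANDIDATES)
    print(best, scores)
```

Because the evaluation step only needs a function from training examples to predictions, any other learner could replace `predict_1nn` without changing the search loop, which mirrors the abstract's point that the role of knowledge is independent of the induction algorithm.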

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science
