Change of representation in machine learning, and an application to protein structure prediction
Ioerger, Thomas Richard
Permalink
https://hdl.handle.net/2142/21110
Description
Issue Date
1996
Doctoral Committee Chair(s)
Rendell, Larry A.
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Chemistry, Physical
Artificial Intelligence
Computer Science
Language
eng
Abstract
While many excellent induction algorithms are known for making predictions from databases in well-studied domains, learning systems still perform poorly in many difficult real-world domains, such as weather prediction or financial risk analysis. Current machine learning research inadequately addresses two characteristics of real-world domains. First, the difficulty in these domains is often caused by a low-level representation, which necessitates shifting to a higher-level representation. Because the space of possible representations is very large, we need intelligent methods for finding such higher-level representations. Second, background knowledge is almost always available in real-world domains, and we would like to exploit it to increase predictive accuracy. However, known roles for domain knowledge in machine learning are often inflexible: they either require a specific induction algorithm or are sensitive to incorrect or incomplete knowledge.
We propose a general framework for change-of-representation based on searching for alternative representations that improve the accuracy of an underlying induction algorithm. Candidate representations are selected by querying a strategy component, which relies on domain knowledge to suggest which alternatives to search. An evaluation component then compares these representations by applying each one to a set of examples and running the induction algorithm on the transformed examples, empirically measuring the effect of the change on accuracy. This approach addresses both characteristic problems of learning in real-world domains. First, domain knowledge is used as a heuristic to guide the search for alternative representations, enabling more intelligent decisions during change-of-representation. Second, the framework gives knowledge a flexible role: it can be used with any learning algorithm and is tolerant of uncertainty in the knowledge. An implementation of this framework could serve as an interface between a human expert and a learning program in which: (1) the human uses background knowledge to generate and prioritize alternative representations, and (2) the system empirically evaluates these to discover the best change for improving accuracy.
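The generate-and-test loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the strategy component is modelled as a hypothetical list of named transformations already ranked by domain knowledge, the induction algorithm is a stand-in 1-nearest-neighbour classifier, and accuracy is estimated by leave-one-out cross-validation. The names `one_nn`, `loo_accuracy`, `search_representations`, and the `budget` parameter are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, str]                # (feature vector, class label)
Transform = Callable[[Vector], Vector]      # one candidate representation
Model = Callable[[Vector], str]
Learner = Callable[[List[Example]], Model]  # induction algorithm: train -> model


def one_nn(train: List[Example]) -> Model:
    """Stand-in induction algorithm (assumed): 1-nearest-neighbour, squared Euclidean distance."""
    def predict(x: Vector) -> str:
        nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return nearest[1]
    return predict


def loo_accuracy(examples: List[Example], learner: Learner) -> float:
    """Evaluation component (assumed): leave-one-out accuracy of `learner` on `examples`."""
    correct = 0
    for i, (x, y) in enumerate(examples):
        model = learner(examples[:i] + examples[i + 1:])
        correct += int(model(x) == y)
    return correct / len(examples)


def search_representations(
    examples: List[Example],
    candidates: Sequence[Tuple[str, Transform]],
    learner: Learner,
    budget: Optional[int] = None,
) -> Tuple[str, float]:
    """Try candidates in the order the strategy ranked them and keep the most accurate.

    `candidates` stands in for the strategy component's suggestions; the
    hypothetical `budget` caps how many of them are evaluated (None = all).
    """
    best_name, best_acc = "original", loo_accuracy(examples, learner)
    for name, transform in list(candidates)[:budget]:
        transformed = [(transform(x), y) for x, y in examples]
        acc = loo_accuracy(transformed, learner)
        if acc > best_acc:  # ties keep the incumbent (original or earlier-ranked) representation
            best_name, best_acc = name, acc
    return best_name, best_acc


if __name__ == "__main__":
    # Toy data: the label depends only on x0; x1 is large-scale noise.
    data = [((0.0, 5.0), "a"), ((0.1, -5.0), "a"), ((1.0, 4.0), "b"), ((1.1, -4.0), "b")]
    ranked = [("drop_x0", lambda x: (x[1],)), ("drop_x1", lambda x: (x[0],))]
    print(search_representations(data, ranked, one_nn))  # → ('drop_x1', 1.0)
```

Because the evaluation step only calls the learner through a train-then-predict interface, any induction algorithm with that shape could be substituted for `one_nn`, which is one way to read the abstract's claim that the framework works with any learning algorithm.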
We apply our framework for change-of-representation to the difficult, real-world domain of protein tertiary (3D) structure prediction. The best computational method to date for determining the structure of a protein from its amino acid sequence is homology modeling, which is based on sequence alignments with a protein database. Homology modeling can fail when proteins with similar structures have low sequence similarity. However, the physical and chemical properties of amino acids are believed to be relevant to protein structure. Using an instantiation of our framework, we incorporate this domain knowledge to suggest ways to change the representation of amino acid sequences. Efficient search procedures derived from this knowledge lead to the discovery of representations that improve the ability of homology modeling to predict protein structures.
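One way to picture changing the representation of amino acid sequences using physicochemical knowledge is to recode each residue as a coarser property class before comparing sequences. The sketch below is an illustration under stated assumptions, not the dissertation's procedure: the five-class grouping in `PROPERTY_CLASSES` is a hypothetical choice, the two sequences are assumed to be already aligned without gaps, and `recode` and `percent_identity` are hypothetical helpers. It shows how two sequences with low identity over the 20-letter alphabet can look much more similar under a reduced alphabet.

```python
# Assumed, illustrative grouping of the 20 standard amino acids into
# physicochemical classes; the dissertation's own groupings may differ.
PROPERTY_CLASSES = {
    "h": "AVLIMFW",  # hydrophobic
    "p": "STNQCY",   # polar, uncharged
    "b": "KRH",      # basic (positively charged)
    "a": "DE",       # acidic (negatively charged)
    "s": "GP",       # conformationally special
}

# Invert the table: residue letter -> class symbol.
AA_TO_CLASS = {aa: cls for cls, members in PROPERTY_CLASSES.items() for aa in members}


def recode(seq: str) -> str:
    """Rewrite a sequence in the reduced property alphabet (KeyError on non-standard residues)."""
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper())


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two already-aligned, gap-free, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty, aligned, and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    s1, s2 = "KLVDE", "RIVEK"
    print(percent_identity(s1, s2))                  # → 20.0
    print(percent_identity(recode(s1), recode(s2)))  # → 80.0
```

Within the framework, a grouping like this would be only one candidate representation, kept if it empirically improves prediction accuracy. A higher identity score alone is not sufficient evidence, since a coarser alphabet also makes unrelated sequences look more alike.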