Computational Prediction of Functional Elements through Comparative Genomics

Ling, Xu

Content Files

Ling_Xu.pdf

Permalink

https://hdl.handle.net/2142/14586

Description

Title

Computational Prediction of Functional Elements through Comparative Genomics

Author(s)

Ling, Xu

Issue Date

2010-01-06T16:13:18Z

Director of Research (if dissertation) or Advisor (if thesis)

Zhai, ChengXiang
Sinha, Saurabh

Doctoral Committee Chair(s)

Zhai, ChengXiang

Committee Member(s)

Sinha, Saurabh
Schatz, Bruce R.
Blanchette, Mathieu

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2010-01-06T16:13:18Z

Keyword(s)

Bioinformatics
Comparative Genomics
Cis-regulatory Elements
Conserved Gene Clusters
Probabilistic Model
Algorithm

Abstract

Understanding the evolution and organization of functional elements in the genome is one of the most important goals of genomic studies. The complexity of the functional information encoded in genome sequences, and the variety of ways in which that information is encoded, make this a very challenging task. Nucleotide mutations and genome-wide rearrangements pose additional challenges to identifying and understanding the functional elements in the genome. On the other hand, because of natural selection, functional sequences tend to evolve more slowly than non-functional sequences. Therefore, the pattern of conservation across species often indicates where functional sequences are located. With the increasing number of sequenced species, comparative genomics, which compares sequences from multiple species at varying evolutionary distances, has emerged as a very powerful approach for identifying various types of functional elements, such as protein-coding genes, transcriptional regulatory sequences, and non-coding RNA genes. This dissertation research has focused on two grand challenges of genomics: (i) to decode cis-regulatory modules (CRMs), the non-coding DNA sequences that control gene expression; and (ii) to discover groups of functionally related genes. For both lines of work, the key idea is to leverage the power of comparative genomics in decoding genomic information.

The first part of this thesis develops a probabilistic framework for CRM prediction. This framework is based on a probabilistic model of CRM evolution, which captures the content features of regulatory sequences as well as their dynamic process of evolution. This model advances previous models by handling the inherent uncertainty of transcription factor binding site (TFBS) annotations within a probabilistic framework, since partially conserved binding sites have been recognized as an important aspect of regulatory sequence evolution. We explicitly model two stochastic processes, the loss of existing TFBSs and the gain of TFBSs from background nucleotides, in order to leverage the power of comparative genomics for CRM prediction while also exploiting the information carried by these lineage-specific patterns.

The second part of this thesis focuses on discovering functionally related gene groups. Understanding how genes are organized in genomes, and what information is encoded in genomic contexts, is one of the fundamental problems in genomics. During evolution, gene order is generally not well conserved, because rearrangement events rapidly reshuffle genomes. On the other hand, functionally related genes may be constrained by natural selection to remain close to each other, forming so-called conserved gene clusters. Conservation of the spatial organization of genes provides an important source of information that is orthogonal to the primary sequences of genes, and it can therefore be exploited to supplement existing genomic analysis tools. In this thesis, we developed a highly efficient algorithm to discover conserved gene clusters across multiple genomes. These gene clusters are likely under some evolutionary constraint and indicate a functional relationship among the genes within a cluster. Our algorithm advances existing work by allowing the genes in a cluster to appear in different orders while making the computation orders of magnitude faster. This allows us to detect conserved gene clusters under flexible evolutionary constraints in a large number of genomes. In addition, we developed a statistical evaluation method that incorporates the evolutionary relationships among genomes, a key aspect missing from most previous studies. Together, the algorithmic and statistical methods provide a rigorous framework for systematically studying evolutionary constraints on genomic contexts.
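The TFBS gain and loss processes described in the abstract can be illustrated with a standard two-state continuous-time Markov chain (binding site absent or present) along each branch of a phylogeny. The Python sketch below is not the dissertation's model. The constant rates, the star-shaped phylogeny, the stationary root prior, and the names `tfbs_transition_probs` and `site_likelihood` are all assumptions made for illustration. It shows how a likelihood can accommodate a site that is present in some species and absent in others, instead of demanding strict conservation.

```python
import math


def tfbs_transition_probs(gain_rate, loss_rate, t):
    """Transition matrix P(t) of a two-state presence/absence chain for one aligned site.

    State 0 = no binding site, state 1 = binding site present. gain_rate is the
    rate of 0 -> 1 (site created from background nucleotides) and loss_rate the
    rate of 1 -> 0 (existing site destroyed). Rates are assumed constant over t.
    """
    total = gain_rate + loss_rate
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no events possible: identity matrix
    decay = math.exp(-total * t)
    p_gain = (gain_rate / total) * (1.0 - decay)  # absent at start, present after t
    p_loss = (loss_rate / total) * (1.0 - decay)  # present at start, absent after t
    return [[1.0 - p_gain, p_gain],
            [p_loss, 1.0 - p_loss]]


def site_likelihood(observed, branch_lengths, gain_rate, loss_rate):
    """Likelihood of presence/absence calls (0/1) observed in several species.

    Simplifying assumption: all species descend independently from one ancestor
    (a star phylogeny) whose state follows the chain's stationary distribution.
    The unknown ancestral state is summed out.
    """
    total = gain_rate + loss_rate
    if total <= 0:
        raise ValueError("gain_rate + loss_rate must be positive")
    root_prior = (loss_rate / total, gain_rate / total)  # P(absent), P(present)
    likelihood = 0.0
    for ancestral_state in (0, 1):
        term = root_prior[ancestral_state]
        for obs, t in zip(observed, branch_lengths):
            term *= tfbs_transition_probs(gain_rate, loss_rate, t)[ancestral_state][obs]
        likelihood += term
    return likelihood


# A site present in two species but lost in a third still receives non-zero likelihood.
print(site_likelihood([1, 1, 0], [0.1, 0.1, 0.3], gain_rate=0.05, loss_rate=0.2))
```

Summing out the ancestral state is what lets a partially conserved site contribute evidence rather than being discarded. In a prediction setting, a term like this would still have to be combined with a motif (content) model and a real tree; that combination is outside the scope of this sketch.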
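For two genomes, a conserved gene cluster whose genes may appear in different orders corresponds to what is often called a "common interval": a set of genes that is contiguous in both gene orders. The Python sketch below is a naive quadratic-time check for two genomes only; it is not the dissertation's multi-genome algorithm. It assumes each genome is given as a list of the same distinct gene identifiers (orthologs already mapped, no missing or duplicated genes), and `common_intervals` is a hypothetical name.

```python
def common_intervals(genome_a, genome_b):
    """Gene sets (size >= 2) that form a contiguous block in both genomes.

    Assumes both genomes list the same distinct gene identifiers, i.e. one is a
    permutation of the other. Gene order inside a block may differ between the
    two genomes. Runs in O(n^2) time.
    """
    if (len(genome_a) != len(genome_b) or set(genome_a) != set(genome_b)
            or len(set(genome_a)) != len(genome_a)):
        raise ValueError("genomes must be permutations of the same distinct genes")
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    clusters = []
    for start in range(len(genome_a)):
        lo = hi = pos_b[genome_a[start]]
        for end in range(start + 1, len(genome_a)):
            p = pos_b[genome_a[end]]
            lo, hi = min(lo, p), max(hi, p)
            # genome_a[start..end] holds end - start + 1 distinct genes; they are
            # contiguous in genome_b exactly when their positions there span that many slots.
            if hi - lo == end - start:
                clusters.append(frozenset(genome_a[start:end + 1]))
    return clusters


a = ["g1", "g2", "g3", "g4", "g5"]
b = ["g3", "g1", "g2", "g5", "g4"]
for cluster in common_intervals(a, b):
    print(sorted(cluster))
```

For the example above, the reported blocks are {g1, g2}, {g1, g2, g3}, {g4, g5}, and the trivial block of all five genes. Handling many genomes, gene gaps, or missing orthologs, and achieving the speedups the abstract reports, would require a different algorithm than this brute-force scan.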

Graduation Semester

2009-12

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
