Probabilistic Model-Based Approach to Evolutionary Analysis of Non-Coding Sequences
Kim, Jaebum
Permalink
https://hdl.handle.net/2142/16981
Description
- Title
- Probabilistic Model-Based Approach to Evolutionary Analysis of Non-Coding Sequences
- Author(s)
- Kim, Jaebum
- Issue Date
- 2010-08-31T20:02:57Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Sinha, Saurabh
- Doctoral Committee Chair(s)
- Sinha, Saurabh
- Committee Member(s)
- Han, Jiawei
- Zhai, ChengXiang
- Ma, Jian
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Multiple sequence alignment
- Probabilistic model
- Insertions and deletions
- Simulation-based benchmark
- Regulatory sequences
- Sequence evolution
- Abstract
- Non-coding sequences, which constitute a large fraction of genomic DNA, are of great importance because (i) they harbor functional elements that are involved in the regulation of gene expression and (ii) they are essential for the study of genome structure and evolution. The availability of genome sequences of closely related species has provided opportunities to analyze non-coding sequences by comparing multiple genomes from different species. The success of comparative genomic studies relies on bioinformatics tools that aid the comparison and analysis of genome sequences. Here, we propose and develop computational tools, based on probabilistic models of sequence evolution, for the evolutionary analysis of non-coding sequences. We present a probabilistic framework for finding the locations of insertions and deletions (indels) in a multiple alignment, and find that it performs better than a parsimony-based method. We study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo, reporting statistical evidence in favor of key evolutionary hypotheses related to regulatory elements and constraints on indels. We also propose a new simulation scheme for generating biologically realistic benchmarks for the alignment of non-coding sequences. This scheme is used to construct benchmarks for Drosophila non-coding sequences, and evaluation results are shown for several multiple alignment and indel annotation tools on those benchmarks. Finally, we develop a probabilistic framework for multiple sequence alignment that finds an optimal alignment by incrementally building up alignment columns. The framework is based on a model for the evolution of three sequences and uses the joint probability of an alignment column in place of the traditionally used sum-of-pairs score.
We find that the new framework produces alignments of much greater specificity than state-of-the-art methods, without compromising too much in terms of sensitivity. The computational tools developed here will play a significant role in solving many biological problems and will further contribute to broadening our understanding of organismal diversity and evolution.
- Graduation Semester
- 2010-08
- Permalink
- http://hdl.handle.net/2142/16981
- Copyright and License Information
- Copyright 2010 Jaebum Kim
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science