Tiger: tiled iterative genome assembler and approximate multi-genome aligner
Wu, Xiao-Long
Permalink
https://hdl.handle.net/2142/45618
Description
- Title
- Tiger: tiled iterative genome assembler and approximate multi-genome aligner
- Author(s)
- Wu, Xiao-Long
- Issue Date
- 2013-08-22T16:55:37Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-Mei W.
- Doctoral Committee Chair(s)
- Hwu, Wen-Mei W.
- Committee Member(s)
- Ma, Jian
- Chen, Deming
- Liang, Zhi-Pei
- Robinson, Gene E.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- De novo genome assembly
- next-generation sequencing
- third-generation sequencing
- iterative genome assembler
- read partitioning
- Multiple sequence alignment
- multiple genome alignment
- Abstract
- Sequence assembly and alignment are two important stepping stones for comparative genomics. With the rapid adoption of next-generation sequencing (NGS) technologies and the arrival of third-generation sequencing (TGS) technologies, genomics has given us an unprecedented opportunity to answer fundamental questions in biology and elucidate human diseases. However, most de novo assemblers require an enormous amount of computational resources, which are not readily available to most research groups and medical personnel. Moreover, there has been little progress on sequence assembly quality, especially for highly repetitive genomes. As more affordable raw data and assembled genomes become accessible to the community, there is an emerging demand for genome searches among the large number of divergent genomes in gene banks. These genomes can be in the form of raw reads, unfinished or low-quality assemblies, or completed genomes, for which traditional multi-sequence alignment tools may not be suitable for similarity searches. Yet few research studies aim to meet this demand. We have developed a novel de novo assembly framework, called Tiger assembler, which adapts to available computing resources by iteratively decomposing the assembly problem into sub-problems. Our method can flexibly embed different assemblers for various types of target genomes. Using the sequence data from a human chromosome, our results show that Tiger can achieve much better NG50s and better genome coverage, with slightly more errors, compared to Velvet and SOAPdenovo, using a modest amount of memory that is available in commodity computers today. We also experimented with a real de novo assembly, i.e., the E. mexicana genome, and demonstrated the strength of our work. The N50s of the contigs and scaffolds produced by Tiger were 7 and 57 times longer, respectively, than those produced by SOAPdenovo. On the other hand, the assembly produced by ALLPATHS-LG had only one-third of the genome size.
We also developed a multi-genome sequence aligner, called Tiger aligner, that can perform fast similarity checks among multiple genomes with distant biological relationships and low-quality raw data. Practical applications of our tool are demonstrated through experiments. The performance of Tiger aligner on traditional multi-sequence alignments is also compared against existing tools, MUMmer and SOAPaligner. The results show the practicality and strengths of our tool. Most state-of-the-art assemblers that achieve relatively high assembly quality need an excessive amount of computing resources (in particular, memory) that is not readily available to most researchers. Tiger assembler provides the only known viable path to using NGS de novo assemblers that require more memory than is present in available computers. Evaluation results demonstrate the feasibility of obtaining better-quality results with a low memory footprint and the scalability of using distributed commodity computers. The explosion in the number of genomes makes existing multi-sequence aligners impractical for checking similarities among genomes that differ in evolutionary relationship and sequence completeness. Current pairwise sequence aligners cannot cope with them without major revisions because of their inherent algorithmic limitations. Tiger aligner is the first known work designed to deal with multi-genome problems, leveraging ideas from feature-based image recognition.
- Graduation Semester
- 2013-08
- Permalink
- http://hdl.handle.net/2142/45618
- Copyright and License Information
- Copyright 2013 Xiao-Long Wu
Owning Collections
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois