New compression scheme for integer annotation in VCF files
Guan, Haozhong
Permalink
https://hdl.handle.net/2142/104011
Description
Title
New compression scheme for integer annotation in VCF files
Author(s)
Guan, Haozhong
Contributor(s)
Ochoa, Idoia
Issue Date
2019-05
Keyword(s)
Compression scheme for VCF file
Genomic data storage
Abstract
This thesis introduces specialized compression schemes for integer-type annotations in
genomic VCF files. The Variant Call Format (VCF) is a text file format. Genomic VCF files contain
the genotype information of a collection of samples, i.e., the variants (differences) of a given
genome with respect to a reference sequence, together with several important variant
annotations. These annotations, such as read depth (DP) and allele frequency (AF), are stored in
different data types and are used as input to several analysis pipelines, especially in
the clinical setting. Easy access to the data is therefore crucial for clinics, both to facilitate
their analysis and to meet possible time and memory constraints. With these
requirements in mind, the goal of the project is to design compression schemes for VCF files
that support fast queries. The main focus of this thesis is a new compression scheme for the RO, QA,
and QR annotations in VCF files.