Improving the genome assembly and annotation of the white-tailed deer (Odocoileus virginianus borealis)
London, Evan W.
Permalink
https://hdl.handle.net/2142/110854
Description
Title
Improving the genome assembly and annotation of the white-tailed deer (Odocoileus virginianus borealis)
Author(s)
London, Evan W.
Issue Date
2021-04-27
Director of Research (if dissertation) or Advisor (if thesis)
Mateus-Pinilla, Nohra E
Committee Member(s)
Novakofski, Jan E
Roca, Alfred L
Catchen, Julian M
Department of Study
Animal Sciences
Discipline
Bioinformatics
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Genomic resource
PacBio sequencing
Wildlife disease
Abstract
Widely distributed in North America, the white-tailed deer (Odocoileus virginianus) has recreational and commercial value and is a food source for many communities. The impacts of deer on agriculture, conservation, and public health are rising: deer cause vehicle collisions and damage crops and natural areas. The species is affected by infectious diseases such as chronic wasting disease (CWD), epizootic hemorrhagic disease (EHD), and bovine tuberculosis (bTB). Genomic resources facilitate the study of pathogens, host-pathogen interactions, host genetic variation, and behavior. Repetitive elements are ubiquitous in mammalian genomes and are difficult to resolve with short reads, whereas long single-molecule reads produced by third-generation sequencing can span these regions. I present a genome produced with DNA from a single white-tailed deer, sequenced on the PacBio Sequel II platform and assembled using the Redbean (WTDBG2) long-read assembly software. After assembly, long and short reads from the same animal were used to error-correct and polish the assembly. Gene models were predicted with the BRAKER annotation pipeline using RNA and protein sequences as extrinsic evidence. The final assembly was highly contiguous, with 90% of the total length represented by 134 contigs. The largest contig was 108 million base pairs. Functional annotation was performed using reciprocal best hits with cattle protein sequences, and protein function was assigned to 16,125 coding sequences. The locations of genes related to CWD, EHD, and bTB were also identified. The sequentially Markovian coalescent was used to infer the population diversity of white-tailed deer over the past 2 million years. This accurate and more complete assembly will support future genomic studies on white-tailed deer and permit the use of chromatin-contact information to construct a chromosome-level assembly of the genome.
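The contiguity figure in the abstract (90% of the total length represented by 134 contigs) is the kind of statistic usually reported as L90, with N90 being the length of the last contig counted. The sketch below shows one common way such a value can be computed from a list of contig lengths. It is illustrative only and is not taken from the thesis: the function name `lx_nx` and the toy contig lengths are hypothetical.

```python
def lx_nx(contig_lengths, percent=90):
    """Return (Lx, Nx) for a collection of contig lengths.

    Lx is the smallest number of contigs, taken longest first, whose
    combined length reaches `percent` percent of the total assembly length.
    Nx is the length of the last contig included in that set.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return count, length


# Hypothetical toy assembly (not data from the thesis).
toy_lengths = [50, 30, 10, 5, 5]
print(lx_nx(toy_lengths, 90))  # (3, 10): three contigs cover 90 of 100 bases
```

In practice the lengths would be read from the assembly FASTA file; a smaller L90 (fewer contigs needed) indicates a more contiguous assembly.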