Small-sample estimation of the mutational support and the distribution of mutations in the SARS-CoV-2 genome
Rana, Vishal
Permalink
https://hdl.handle.net/2142/110417
Description
Issue Date
2021-02-23
Director of Research (if dissertation) or Advisor (if thesis)
Milenkovic, Olgica
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Comparative ORF study
Good-Turing estimation
Mutation rates
SARS-CoV-2 data analysis
Small-sample support estimation
Abstract
Accurately estimating and characterizing the mutations in viral genomes present within a population is of great importance for tracking and mitigating the spread of a virus. The task is made difficult by the lack of a sufficient number of sequenced genomes, especially during the early stages of an outbreak. We consider the problem of determining the mutational support and the distribution of mutations in the SARS-CoV-2 genome and its open reading frames (ORFs). The mutational support refers to the unknown number of sites that are mutated among all the viral strains present in a population. The support and distribution of mutations can be used to guide primer selection for RT-PCR test kits, study the virulence of the virus, discover adaptation mechanisms deployed by the virus to evade the host immune system, and identify early on new strains that might be circulating in the population. We propose new state-of-the-art polynomial estimation techniques using weighted and regularized Chebyshev approximations for small-sample mutational support estimation, and we use a modified Good-Turing estimator for distribution estimation. Our differential analysis of mutations in various population subgroups (based on data retrieved from the GISAID repository) revealed several important differences, including those in the ORF6 and ORF7a regions for older versus younger patients, in the ORF1b and ORF10 regions for females versus males, and in several ORFs for Asia versus Europe and North America. We also found no significant mutations, in any of the subpopulations, in the primer regions from ORF N chosen by the CDC for RT-PCR test kits, which is important for the reliability of the test results.
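To illustrate the small-sample setting the abstract describes, the sketch below computes the classical Good-Turing estimate of the unseen probability mass, which is the number of items seen exactly once divided by the sample size. This is the textbook estimator, not the modified Good-Turing estimator or the Chebyshev-based support estimator developed in the thesis. The function names and the toy list of mutated site positions are hypothetical, and treating each observed mutation as one independent draw is an assumption made only for this example.

```python
from collections import Counter


def good_turing_missing_mass(observations):
    """Classical Good-Turing estimate of the probability mass of unseen items.

    Returns N1 / n, where N1 is the number of distinct items observed
    exactly once and n is the total number of observations.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def observed_support(observations):
    """Number of distinct items seen in the sample (a lower bound on the true support)."""
    return len(set(observations))


# Hypothetical toy sample: genome positions at which a mutation was observed,
# one entry per observed mutation (assumed data, for illustration only).
sample = [241, 3037, 14408, 23403, 23403, 241, 11083, 28144]

print(observed_support(sample))          # distinct mutated sites seen so far
print(good_turing_missing_mass(sample))  # estimated mass of sites not yet seen
```

A large missing-mass estimate suggests that the observed number of mutated sites substantially undercounts the true support, which is the regime in which dedicated small-sample support estimators become relevant.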