Exploration of GPU acceleration for pair-HMM algorithm and its application in the DNA alignment problem

Li, Enliang

Exploration of GPU acceleration for pair-HMM algorithm and its application in the DNA alignment problem

Li, Enliang

Permalink

https://hdl.handle.net/2142/104023

Description

Title

Exploration of GPU acceleration for pair-HMM algorithm and its application in the DNA alignment problem

Author(s)

Li, Enliang

Contributor(s)

Chen, Deming

Issue Date

2019-05

Keyword(s)

Hardware Acceleration
DNA Alignment
Pair-HMM
Forward Algorithm
CUDA Implementation

Abstract

The hidden Markov model, known as HMM, is an important type of statistical model with extensive application in estimating hidden parameters and decoding observed Markov chains. On top of the HMM, the Pair-HMM Algorithm with Halotype-Caller is developed as a popular solution for the DNA alignment problem. For two aligned sequences of DNA observations, one named as reference, and the other one named as read, there are only three possible hidden states, i.e. match (A , A), insertion (- , A), and deletion (A , -). However, what we could observe by DNA sequencing in real-life is the summation of the possibilities for match, insertion, and deletion as macrostates. In order to determine the alignment with maximum probability, we need to score each possible pairwise alignment and which leads to a computationally intensive problem that usually contributes to the most latency in a variant calling with the GATK HaplotypeCaller. In the CPU implementation of a proper Pair-HMM forward algorithm, there are 7 multiply-accumulate operations for each ( i , j ) location on the read-reference matrix. Moreover, since transitions and emission matrices are fixed throughout a single alignment process, a CUDA implementation with single-precision floating-point is proposed to accelerate the Pair-HMM forward algorithm. CUDA implementation with minibatch and states-parallelization, along with the use of float32, gives us an around 22.6x speedup compared to the CPU implementation. While it comes with a price, using single-precision instead of double-precision floating-point introduces a more serious under flow problem at the beginning of the alignment scoring process. A normalization technique is used to help fix this problem.

Type of Resource

text

Language

Permalink

http://hdl.handle.net/2142/104023

Owning Collections

Senior Theses - Electrical and Computer Engineering PRIMARY

The best of ECE undergraduate research

Exploration of GPU acceleration for pair-HMM algorithm and its application in the DNA alignment problem

Li, Enliang

Permalink

Description

Owning Collections

Senior Theses - Electrical and Computer Engineering PRIMARY

Log In