Sampling architectures for probabilistic inference

Ko, Glenn Gihyun

Permalink

https://hdl.handle.net/2142/99482

Description

Title

Sampling architectures for probabilistic inference

Author(s)

Ko, Glenn Gihyun

Issue Date

2017-11-17

Director of Research (if dissertation) or Advisor (if thesis)

Rutenbar, Rob A.

Doctoral Committee Chair(s)

Rutenbar, Rob A.

Committee Member(s)

Chen, Deming
Shanbhag, Naresh R.
Smaragdis, Paris
Nurvitadhi, Eriko

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Machine learning
Probabilistic graphical model
Probabilistic inference
Markov chain Monte Carlo
Gibbs sampling

Abstract

In recent years, machine learning (ML) algorithms for applications such as computer vision, machine listening, and topic modeling (i.e., topic extraction from large text data sets) have proven to be effective in terms of perceived quality. However, these ML applications tend to be compute-intensive and create performance challenges. We focus on hardware accelerator architectures for inference on probabilistic graphical models, in particular Markov random fields (MRFs) and latent Dirichlet allocation (LDA). Our work focuses on inference via sampling methods, in particular Markov chain Monte Carlo (MCMC) methods. Roughly speaking, we generate samples from the distribution of labels implied by the structure of the graphical model, and use results computed from the samples to approximate the results we seek. MCMC methods are extraordinarily popular in inference tasks and are widely used, especially with very large models, but they are not commonly seen as either "fast" or "low power", which are the challenges we seek to address. However, performance is not our only concern.
We focus on two applications to drive this research. First, we explore sound source separation, which can be used to separate a human voice from background noise on mobile phones, e.g., when talking on a cell phone in an airport. The challenges involved are real-time execution and power constraints. As a solution, we present a novel hardware-based sound source separation implementation capable of real-time streaming performance. The implementation uses a Markov random field inference formulation of foreground/background separation, and targets voice separation on mobile phones with two microphones. We demonstrate a real-time streaming FPGA implementation running at 150 MHz with a total of 207 KB of RAM. Our implementation achieves a speedup of 22X over a conventional software implementation, achieves an SDR of 7.021 dB with 1.6 ms latency, and exhibits excellent perceived audio quality. A virtual ASIC design shows that this architecture is quite small (fewer than 10 million gates), consumes only 70 mW, and appears amenable to additional optimization for power.
The second application is an enterprise-scale one, topic modeling, which is used to extract the hidden thematic structure of large sets of documents. Enterprise-scale clusters are usually required to run such massive tasks. We explore the potential benefits of accelerating topic models and characterize the speed/power trade-offs of building hardware accelerators for them. We profiled a Gibbs sampling inference implementation of latent Dirichlet allocation, a probabilistic topic model, and found that 96% of the run time is spent sampling, which is therefore the main focus of the acceleration. We describe a parallel architecture on an FPGA, running at 220 MHz, that is theoretically bounded only by memory bandwidth and in which even a single core is faster than workstation-grade CPU cores.
Lastly, we share our findings on accelerating parallel versions of the Gibbs sampling algorithm, and we examine precision requirements and the potential for a large reduction in the number of bits used to perform Gibbs sampling inference in applications such as source separation. We implement multi-threaded C++ and CUDA GPU versions of chromatic Gibbs sampling, a parallel version of Gibbs sampling that uses a graph-coloring scheme to construct Markov chains that can be executed in parallel. They show 1.9X and 22.9X speedups, respectively, compared to a conventional single-core version running on an Intel Xeon. Furthermore, our analysis of the precision and dynamic range of the source separation application showed that an 8-bit reduced floating-point format is enough to maintain a very low decision error rate in the Gibbs sampler. These early results point toward reduced-precision asynchronous Gibbs sampling architectures.
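
The chromatic Gibbs sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes a binary-label MRF on a 4-connected grid, where a checkerboard two-coloring is enough, and the function name `chromatic_gibbs`, the `unary` log-odds input, and the `coupling` smoothness weight are all hypothetical names chosen for illustration. Sites of one color share no edges, so they are conditionally independent given the other color; the sketch visits them in a serial loop, but that inner loop is the part a multi-threaded or GPU version could run concurrently.

```python
import math
import random


def chromatic_gibbs(unary, coupling=1.0, sweeps=50, seed=0):
    """Estimate P(label == 1) per site on a 4-connected grid MRF.

    unary[r][c] is an assumed per-site log-odds in favor of label 1;
    coupling is an assumed weight rewarding agreement with each neighbor.
    """
    rng = random.Random(seed)
    rows, cols = len(unary), len(unary[0])
    labels = [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    burn_in = sweeps // 2  # discard the first half of the sweeps

    for sweep in range(sweeps):
        for color in (0, 1):
            # Sites with (r + c) % 2 == color have no edges between them,
            # so each update below depends only on the other color class.
            for r in range(rows):
                for c in range(cols):
                    if (r + c) % 2 != color:
                        continue
                    # Conditional log-odds of label 1 given the neighbors.
                    field = unary[r][c]
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols:
                            field += coupling if labels[nr][nc] == 1 else -coupling
                    p_one = 1.0 / (1.0 + math.exp(-field))
                    labels[r][c] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for r in range(rows):
                for c in range(cols):
                    counts[r][c] += labels[r][c]

    kept = sweeps - burn_in
    return [[counts[r][c] / kept for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    # Toy evidence: left half favors label 1, right half favors label 0.
    evidence = [[4.0 if c < 3 else -4.0 for c in range(6)] for _ in range(6)]
    for row in chromatic_gibbs(evidence, coupling=0.5, sweeps=200, seed=1):
        print(" ".join(f"{p:.2f}" for p in row))
```

For graphs that are not bipartite, more than two colors would be needed, and the number of color classes bounds how much of each sweep can proceed in parallel.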

Graduation Semester

2017-12

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering
