Academic Paper

Accelerating Bayesian Inference on Structured Graphs Using Parallel Gibbs Sampling
Document Type
Conference
Source
2019 29th International Conference on Field Programmable Logic and Applications (FPL), pp. 159-165, Sep. 2019
Subject
Computing and Processing
Source separation
Computational modeling
Bayes methods
Image color analysis
Markov processes
Computer architecture
Field programmable gate arrays
Bayesian inference
Markov chain Monte Carlo
Gibbs sampling
hardware accelerator
Markov random field
Language
English
ISSN
1946-1488
Abstract
Bayesian models and inference are a class of machine learning methods useful for problems where data is scarce and prior knowledge about the application allows one to draw better conclusions. However, Bayesian models often require computing high-dimensional integrals, and finding the posterior distribution can be intractable. One of the most commonly used approximate methods for Bayesian inference is Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique for estimating a target stationary distribution. The idea in Gibbs sampling is to generate posterior samples by iterating through the variables, sampling each from its conditional distribution with all the other variables held fixed. While Gibbs sampling is a popular method for probabilistic graphical models such as Markov Random Fields (MRFs), the plain algorithm is slow because it visits each variable sequentially. In this work, we describe a binary-label MRF Gibbs sampling inference architecture and extend it to a 64-label version capable of running multiple perceptual applications, such as sound source separation and stereo matching. The described accelerator employs chromatic scheduling of variables to parallelize all conditionally independent variables across 257 samplers, implemented on the FPGA portion of a CPU-FPGA SoC. For a real-time streaming sound source separation task, we show that the hybrid CPU-FPGA implementation is 230x faster than a commercial mobile processor while maintaining the recommended latency of under 50 ms. The 64-label version achieved 137x and 679x speedups for binary-label and 64-label MRF Gibbs sampling inference, respectively.
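The chromatic scheduling the abstract describes can be illustrated in software. The sketch below is not the paper's FPGA design; it is a minimal, hypothetical Python rendering of chromatic Gibbs sampling on a binary (Ising-style) MRF over a 4-connected grid. A two-coloring (checkerboard) makes all sites of one color conditionally independent given the other color, so each color class could in principle be updated in parallel — the paper maps this idea to 257 hardware samplers. The grid size, coupling strength `beta`, and sweep count here are illustrative choices, not values from the paper.

```python
import math
import random

def chromatic_gibbs_sweep(grid, beta, rng):
    """One full Gibbs sweep over a binary Ising-style MRF with spins in
    {-1, +1}, using two-color (checkerboard) scheduling: sites where
    (i + j) is even are updated first, then sites where it is odd.
    Within a color class every site is conditionally independent."""
    h, w = len(grid), len(grid[0])
    for color in (0, 1):
        for i in range(h):
            for j in range(w):
                if (i + j) % 2 != color:
                    continue
                # Sum the spins of the 4-connected neighbors.
                s = 0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        s += grid[ni][nj]
                # Ising conditional: P(x_ij = +1 | neighbors)
                #   = 1 / (1 + exp(-2 * beta * s)).
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
                grid[i][j] = 1 if rng.random() < p_plus else -1
    return grid

# Illustrative run: random 8x8 grid, 50 sweeps at beta = 0.8.
rng = random.Random(0)
grid = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
for _ in range(50):
    chromatic_gibbs_sweep(grid, beta=0.8, rng=rng)
```

Because the neighbors of an even-colored site are all odd-colored (and vice versa), each inner color pass touches only conditionally independent variables, which is exactly what permits the parallel sampler array described in the abstract.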