Speed up your Gibbs sampler 40 times with Rcpp
When Gibbs sampling in R is slow, we can use Rcpp to speed it up by integrating C++ code into R. However, Rcpp code is harder to debug because we cannot run Rcpp code line-by-line like R. It’s paramount that we ensure the Rcpp code is doing what we wants it to.
The solution is to start with small Rcpp programs and make sure that they produce the same results as their pure R counterpart. Following this advice, I’m writing a series of increasingly complex Markov chain Monte Carlo (MCMC) code in Rcpp, eventually building towards my final goal of a custom Metropolis-Hastings algorithm for my work on modeling FDI flow.
The first installment will be a Gibbs sampler of a semi-conjugate normal model, arguably the most common introductory model to Bayesian statistics and MCMC. (Both Hoff’s First Course in Bayesian and Gelman’s Bayesian Data Analysis cover it.)
I’ll follow Hoff’s example in Chapter 6 as a ground truth to check whether my code is working correctly. For Rcpp basics, I read Hadley Wickham’s introduction to Rcpp. I’ll note additional pitfalls along the way of converting R to Rcpp.
Normal model with semi-conjugate priors
A normal model with semi-conjugate priors has the following specifications. For more details and intuition behind the model, see Chapter 5 & 6 in Hoff’s book.
Likelihood
\[Data = Y_1, \dots, Y_n \sim i.i.d. N(\theta, \sigma^2)\]Priors
\[\begin{align} p(\theta) &\sim N(\mu_0, \tau^2_0) \\ p(\sigma^2) &\sim Inverse-Gamma(\nu_0 / 2, \nu_0 \sigma^2_0 / 2) \end{align}\]Full conditionals
\[\begin{align} p(\theta | \tilde \sigma^2, Data) &= N(\mu_n, \tau^2_n) \\ p(\tilde \sigma^2 | \theta, Data) &= Inverse-Gamma(\frac{\nu_n}{2}, \frac{\nu_n \sigma_n^2(\theta)}{2}) \end{align}\]with
\[\begin{align} \tau_n^2 = \frac{1}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} \qquad &\text{and} \qquad \mu_n = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar y}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} \\ \nu_n = \nu_0 + n \qquad &\text{and} \qquad \sigma^2_n(\theta) = \frac{1}{\nu_n} \left[ \nu_0\sigma_0^2 + n \frac{\sum (y_i - \theta)^2}{n}\right] \end{align}\]There is a quick intuitive interpretation of the full conditional of \(\theta\), \(p(\theta | \tilde\sigma^2, Data)\). Its posterior mean \(\mu_n\) is the weighted sum of the prior mean \(\mu_0\) and the sample mean \(\bar y\). This makes sense if we think of the posterior as combining the information from the prior and the sample.
But how should we weigh the information from the prior and the sample? Intuitively, if there’s more information in the prior, we should weigh it more. The amount of information in the prior is represented by the prior precision, \(\frac{1}{\tau_0^2}\). As the formula shows, the precision is the inverse of variance, because the smaller the variance there is in our prior belief about \(\theta\), the more information we have in the prior. In sum, because precision represents the amount of information available, we should weigh the prior mean \(\mu_0\) by the prior precision \(\frac{1}{\tau_0^2}\). That is indeed the weight of the prior mean in the weighted sum.
Similarly, we weigh the sample mean \(\bar y\) by the sample precision \(\frac{n}{\sigma^2}\). Note that we multiply the precision by \(n\) to capture the fact that the bigger the sample size, the more information there is in the sample.
Pure R Gibbs sampler
Below is a direct translation of the full conditionals below into a Gibbs sampler. Inside the big loop, for the iteration \(s\) we sample:
- \[p(\theta^{(s)} | \tilde \sigma^{2(s - 1)}, Data)\]
- \[p(\tilde \sigma^{2(s)} | \theta^{(s)}, Data)\]
and repeats.
We confirm that our algorithm successfully reproduces the result in Hoff (p. 95) with exactly the same posterior distribution of theta.
Rcpp Gibbs sampler
We now rewrite the Gibbs sampler in Rcpp. Putting the R and Rcpp codes side by side, we see only a few differences thanks to Rcpp’s syntactic sugar that replicates R-like syntax in C++. Most of these differences are covered in Hadley’s Intro to Rcpp. Beyond that, there are 2 additional pitfalls when transitioning from R to Rcpp.
-
Remember to use
1.0
instead of1
while doing division so that we get double division like in R, not integer division. For example, in Rcpp,1 / 2 = 0
while1.0 / 2 = 0.5
. -
rgamma
in R uses two parameters, location and rate. In contrast,rgamma
in Rcpp uses location and scale, which is the inverse of rate.
We now check and confirm that our Gibbs sampler in Rcpp produces exactly the same result as its pure R counterpart.
Comparing the density plots on the left with those on the right, we see that the posterior distributions of \(\theta\) and \(\sigma^2\) are exactly the same for the R and Rcpp implementations.
The Rcpp code is also a lot faster than R as expected. The median running time of f_gibbsC
is about 38 times faster than f_gibbsR
.
And that’s the one of the simplest Gibbs sampler in Rcpp. In the next installment we will implement the Metropolis-Hastings sampler in Rcpp. You can click on the categories Bayesian-sampler-in-Rcpp to find more entries in this series.
Leave a Comment