Using cPseudoMaRg

What Does This Package Provide?

This package provides the boilerplate code for the Correlated Pseudo-Marginal method of (Deligiannidis, Doucet, and Pitt 2015). This vignette demonstrates its use by providing two examples that are closely related to ones given in their paper.

A Background in Markov-chain Monte Carlo

Assume that you are a Bayesian and have some data y, and that you would like to perform inference for a collection of parameters θ. After specifying a likelihood p(y ∣ θ) and a prior p(θ), you are interested in sampling from the posterior p(θ ∣ y) using a Markov chain Monte Carlo (MCMC) algorithm. Such an algorithm will draw correlated samples θ1, θ2, …, θNS from a Markov chain, and under suitable regularity conditions, the average of your large collection of samples will approximate posterior expectations such as the posterior mean: E[θ ∣ y].

This package implements Metropolis-Hastings (MH) type algorithms, which is a specific sub-class of MCMC methods. MH algorithms moves from iteration to iteration by following a two-step procedure. Step one starts with proposing a new value in the chain. Step two makes a decision on whether or not to accept this new proposal. In the event that the sample is rejected, the chain stays in the same place. other MCMC methods, such as Gibbs sampling, produce samples that cannot stay in the same place.

This software is more specifically tailored to an implementation of the Correlated Pseudo-Marginal sampler (CPM). CPM is an extension of the Pseudo-Marginal Metropolis-Hastings algorithm (PMMH) (Andrieu and Roberts 2009), which is itself an extension of the Metropolis-Hastings (MH) algorithm. CPM and PMMH are useful for sampling from the posterior of a model with an intractable likelihood. While the original MH algorithm requires the ability to evaluate the model’s likelihood, CPM and PMMH only requires unbiased approximations of it.

The makeCPMSampler() Function

This package provides a function makeCPMSampler() that implements all three of these algorithms (although it is more suited for PMMH and CPM algorithms). It abstracts away many implementation details, so, mathematically speaking, only a small degree of familiarity with the algorithms is required to use this software.

Programmatically speaking, makeCPMSampler() is a function factory. This means that it is a function that creates and returns another function. The returned function will be the function that generates parameter samples.

Furthermore, many of makeCPMSampler()’s required inputs are functions, as well. Here is a description of all the inputs it takes.

  • Any MH algorithm (e.g. MH, PMMH, CPM) requires that you are able to sample from a proposal distribution, and to evaluate its density. You provide it these to makeCPMSampler() in the first two inputs: paramKernSamp= and logParamKernEval=.

  • You will need to provide a function that evaluates the logarithm of the prior distribution. This is the third argument: logPriorEval.

  • Unlike the standard Metropolis-Hastings algorithm, you will not be required to provide a function that evaluates the logarithm of the likelihood. Recall from earlier that PMMH and CPM only require an unbiased estimate of the likelihood. The logarithm of an unbiased approximation is provided as the fourth argument: logLikeApproxEval=. If you’d like, you may provide for this argument a function giving exact evaluations. In this case, makeCPMSampler() will return a standard MH sampler. This can be useful, for instance, if you are comparing the relative efficiency of pseudo-marginal and non-pseudo-marginal methods.

  • Finally, you must provide the non-functional inputs. yData is the entire data set of observations. numU is the number of standard univariate normal samples that are used for each approximate log-likelihood evaluation. numIters is the total number of samples drawn. This does not count burn-in or thinning. rho is the correlation between standard normal variates at each MCMC iteration. Finally, change storeEvery to some integer greater than 1 if you are concerned about memory on your computer, and you’d like to use thinning. If you set it to 2, say, then every other sample will be retained.

A First Example

The first example is the same as the provided in this package’s documentation. This section breaks up the code into chunks in order to explain it more easily. The model has two parameters: θ = (θ1, θ2). The conditional likelihood is

$$ \begin{eqnarray} p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{i=1}^N p(y_{i} \mid x_{i}, \theta) \tag{1} \end{eqnarray} $$

where $$ \begin{eqnarray} y_{i} \mid x_{i}, \theta \sim \text{Normal}(x_i, \theta_1 - \theta_2) \end{eqnarray} $$ for i = 1, …, N. Here x1, …, xN := x1 : N is a collection of latent/hidden/unobserved random variables. They have the following distribution:

$$ \begin{eqnarray} p(x_{1:N} \mid \theta) = \prod_{i=1}^N p( x_{i} \mid \theta) \tag{2} \end{eqnarray} $$

where

$$ \begin{eqnarray} x_{i} \mid \theta \sim \text{Normal}(x_i, \theta_2) \end{eqnarray} $$

for i = 1, …, N.

We can simulate N = 10 observations from the model with the following code.

realTheta1 <- .2 + .3
realTheta2 <- .2
realParams <- c(realTheta1, realTheta2)
numObs <- 10
realX <- rnorm(numObs, mean = 0, sd = sqrt(realTheta2))
realY <- rnorm(numObs, mean = realX, sd = sqrt(realTheta1 - realTheta2))

Next, construct the sampler function sampler using makeCPMSampler(). We assume a uniform prior over the region 50 > θ1 > θ2 > 0. A simple multivariate normal is used for the proposal distribution.

library(cPseudoMaRg)
numImportanceSamps <- 1000
numMCMCIters <- 1000
randomWalkScale <- 1.5
recordEveryTh <- 1

# create the function that performs sampling
sampler <- makeCPMSampler(
  paramKernSamp = function(params){
    params + rnorm(2)*randomWalkScale
  },
  logParamKernEval = function(oldTheta, newTheta){
    dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE)
    + dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
  },
  logPriorEval = function(theta){
    if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
      -7.130899 # - log of 50^2/2
    }else{
      -Inf
    }
  },
  logLikeApproxEval = function(y, thetaProposal, uProposal){
    if(  (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0)  ){
      xSamps <- uProposal*sqrt(thetaProposal[2])
      logCondLikes <- sapply(xSamps,
                             function(xsamp) {
                               sum(dnorm(y,
                                         xsamp, 
                                         sqrt(thetaProposal[1] - thetaProposal[2]),
                                         log = TRUE)) })
      m <- max(logCondLikes)
      log(sum(exp(logCondLikes - m))) + m - log(length(y))
    }else{
      -Inf
    }
  },
  realY, 
  numImportanceSamps, 
  numMCMCIters, 
  .99, # change to 0 for original pseudo-marginal method
  recordEveryTh)

The approximate log-likelihood, provided to the logLikeApproxEval= parameter, uses importance sampling: $$ \begin{align*} \log \hat{p}(y_{1:N} \mid \theta) &= \log \left\{ N_X^{-1} \sum_{i=1}^{N_X} \frac{p(y_{1:N} \mid X^i_{1:N}, \theta) p(X^i_{1:N} \mid \theta)}{q(X^i_{1:N} \mid y_{1:N}, \theta)} \right\} \end{align*} $$ where $X^1_{1:N}, \ldots, X^{N_X}_{1:N} \overset{\text{iid}}{\sim} q( \cdot \mid y_{1:N}, \theta)$.

Unlike many importance sampling implementations that call pseudo-random number generating functions (e.g. rnorm) directly, this calculates the approximation based on standard normal samples generated from within makeCPMSampler(). We choose q(x1 : N ∣ y1 : N, θ) = p(x1 : N ∣ θ), despite it not being the optimal proposal/instrumental distribution. Note the use of the “log-sum-exp” trick to avoid numerical underflow.

Finally, sample the Markov chain. Call the samples res, and examine them.

res <- sampler(realParams)
print(res)
#> CPM Results: at a glance...
#>  1000  iterations performed
#>  1000  samples retained
#>  posterior means (in the supplied parameterization):  3.954376 2.877783 
#>  acceptance rate:  0.248
plot(res)

You might have noticed that this particular model has a tractable likelihood, so it is more efficient to use regular MH. The likelihood is equal to the following

$$ \begin{align*} p(y_{1:N} \mid \theta) &= \int p(y_{1:N} \mid x_{1:N}, \theta) p(x_{1:N} \mid \theta) \text{d}x_{1:N} \\ &= \prod_{i=1}^N \text{Normal}(y_i ; 0, \theta_1) \end{align*} $$

samplerExact <- makeCPMSampler(
  paramKernSamp = function(params){
    return(params + rnorm(2)*randomWalkScale)
  },
  logParamKernEval = function(oldTheta, newTheta){
    dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE)
    + dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
  },
  logPriorEval = function(theta){
    if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
      -7.130899 # - log of 50^2/2
    }else{
      -Inf
    }
  },
  logLikeApproxEval = function(y, thetaProposal, uProposal){
    # this is exact now!
    if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
      sum(dnorm(y, mean = 0, sd = sqrt(thetaProposal[1]), log = TRUE))
    }else{
      -Inf
    }
  },
  realY, 
  numImportanceSamps, # doesn't this matter because Us are not used
  numMCMCIters, 
  .99, # doesn't this matter because Us are not used
  recordEveryTh)
res2 <- samplerExact(realParams)
print(res2)
#> CPM Results: at a glance...
#>  1000  iterations performed
#>  1000  samples retained
#>  posterior means (in the supplied parameterization):  1.214049 0.5982352 
#>  acceptance rate:  0.109
plot(res2)

A Second Example

Consider the following stochastic volatility (Taylor 1982), which is a model that is simpler than the one explored in (Deligiannidis, Doucet, and Pitt 2015). It assumes a latent AR(1) process. It also assumes that, after conditioning on contemporaneous values of the latent process, that the observed returns are normal.

In other words, let Z1, …, ZN and W1, …, WN be iid standard normal random variables, and for t > 1, define the latent process as

Xt = ϕXt − 1 + σZt, and X1 ∼ Normal(0, σ2/(1 − ϕ2)). The distribution of the observed returns is defined by Yt = βexp (Xt/2)Wt for t = 1, …, N. The collection of all unknown parameters is θ = ϕ, β, σ2.

Y1 : N ∣ X1 : N, θ has the same factorization as equation (1), but unlike equation (2), we have the following Markovian factorization for the distribution of all latent variables:

$$ \begin{eqnarray} p(x_{1:N} \mid \theta) = p(x_1) \prod_{t=2}^N p( x_{t} \mid x_{t-1}, \theta) \tag{3}. \end{eqnarray} $$ Assume that all three parameters are independent a priori: π(ϕ)π(β)π(σ2), and select a uniform, Gaussian, and Inverse-Gamma distribution for these three distributions, respectively. For the proposal distribution, we use a multivariate normal in a transformed parameter space. ϕ is transformed with logit((ϕ + 1)/2), β is transformed with the identity map, and σ2 is transformed with the natural logarithm.

The primary bottleneck in the previous example was the function passed in to the logLikeApproxEval= input of makeCPMSampler(). It was written entirely in R, and so, as a result, was not as fast as it could have been. In this example, we provide a compiled function for this input that makes use of C++ code taken from the PF C++ library (Brown 2020). A growing collection of examples of calling PF code from R is provided as an R package called pfexamplesinr that is hosted on Github. Interfacing to c++ from R was facilitated by the RcppEigen package (Bates and Eddelbuettel 2013).

The particle filter used to compute the approximate log-likelihood is described in (Deligiannidis, Doucet, and Pitt 2015). Like the example above, it calculates the approximation using common random normal variables generated from within makeCPMSampler(). As detailed in (Deligiannidis, Doucet, and Pitt 2015), the particle filter that calculates this number uses a resampling technique that makes of the pseudo-inverse of a Hilbert space-filling function. This function was implemented with C++ code lightly edited from the C code provided in (Skilling 2004).

Since version 1.0.1, this package provides an extra input to makeCPMSampler() called nansInLLFatal=. If set to FALSE, then whenever NaN is returned from the approximate log-likelihood function, the parameter proposal is disregarded. Alternatively, if it is set to TRUE, then the entire chain comes to a halt, and all the samples obtained up until that iteration are disregarded. When using particle filters to produce approximations to the likelihood, I recommend setting this to FALSE, especially when the proposal distribution has a high amount of dispersion. This is because, despite one’s best efforts to guard against it, particle filters can frequently generate NaNs.

Please also note that the number of particles used in this particle filter is selected in two places. One place is the code below, and another in a c++ #define directive in the file src/likelihoods.cpp in the pfexamples library. If you are interested in increasing or decreasing the number of particles, fork pfexamplesinr, edit your copy of the files, and run the following example after replacing the appropriate line with devtools::install_github("<your Github username>/pfexamplesinr").

library(cPseudoMaRg)
devtools::install_github("tbrown122387/pfexamplesinr@e4e2a80")
library(pfexamplesinr)

returnsData <- read.csv("data/return_data.csv", header=F)[,1]
numParticles <- 500 # THIS MUST MATCH "#define NP 500" in src/likelihoods.cpp
numMCMCIters <- 500
randomWalkScale <- .1
recordEveryTh <- 1
numUs <- length(returnsData)*(numParticles+1)

# some helper functions
transformParams <- function(untrans){
  p <- vector(mode = "numeric", length = 3)
  p[1] <- boot::logit(.5*(untrans[1] + 1))
  p[2] <- untrans[2]
  p[3] <- log(untrans[3])
  return(p)
}
revTransformParams <- function(trans){
  p <- vector(mode = "numeric", length = 3)
  p[1] <- 2*boot::inv.logit( trans[1] )-1
  p[2] <- trans[2]
  p[3] <- exp(trans[3])
  return(p)
}

sampler <- makeCPMSampler(
  paramKernSamp = function(params){
    revTransformParams(transformParams(params) + rnorm(3)*randomWalkScale)
  },
  logParamKernEval = function(oldTheta, newTheta){
    unconstrainedNew <- transformParams(newTheta)
    unconstrainedOld <- transformParams(oldTheta)
    dnorm(unconstrainedNew[1], unconstrainedOld[1], sd = randomWalkScale, log = TRUE) #phi
    + 0.6931472 + unconstrainedNew[1] - 2*log(1 + exp(unconstrainedNew[1])) # phi jacobian
    + dnorm(unconstrainedNew[2], unconstrainedOld[2], sd = randomWalkScale, log = TRUE) #beta
    + dnorm(unconstrainedNew[3], unconstrainedOld[3], sd = randomWalkScale, log = TRUE) #sigmaSquared
    - unconstrainedNew[3] # jacobian
  },
  logPriorEval = function(theta){
    if( (abs(theta[1]) >= 1.0) || theta[3] <= 0.0 ){
      -Inf
    }else{
       log(.5) + 
        dnorm(theta[2], mean = 0, sd = 10, log = T) + 
        dgamma(x = 1/theta[3], shape = 1.3, rate = .3, log = T)
    }
  },
  logLikeApproxEval = svolApproxLL, # c++ function from pfexamplesinr
  returnsData, numUs, numMCMCIters, .99, recordEveryTh, FALSE
)
svolSampleResults <- sampler( c(.9, 1, .1))
mean(svolSampleResults)
print(svolSampleResults)

References

Andrieu, Christophe, and Gareth O. Roberts. 2009. “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations.” The Annals of Statistics 37 (2): 697–725. http://www.jstor.org/stable/30243645.
Bates, Douglas, and Dirk Eddelbuettel. 2013. “Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package.” Journal of Statistical Software 52 (5): 1–24. https://www.jstatsoft.org/v52/i05/.
Brown, Taylor R. 2020. “A Short Introduction to PF: A c++ Library for Particle Filtering.” Journal of Open Source Software 5 (54): 2599. https://doi.org/10.21105/joss.02599.
Deligiannidis, George, Arnaud Doucet, and Michael K. Pitt. 2015. “The Correlated Pseudo-Marginal Method.”
Skilling, John. 2004. Programming the Hilbert curve.” In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by Gary J. Erickson and Yuxiang Zhai, 707:381–87. American Institute of Physics Conference Series. https://doi.org/10.1063/1.1751381.
Taylor, S. J. 1982. “Financial Returns Modelled by the Product of Two Stochastic Processes-a Study of the Daily Sugar Prices 1961-75.” Time Series Analysis : Theory and Practice 1: 203–26. https://ci.nii.ac.jp/naid/10018822959/en/.