This package provides the boilerplate code for the Correlated Pseudo-Marginal method of (Deligiannidis, Doucet, and Pitt 2015). This vignette demonstrates its use by providing two examples that are closely related to ones given in their paper.
Assume that you are a Bayesian and have some data y, and that you would like to perform inference for a collection of parameters θ. After specifying a likelihood p(y ∣ θ) and a prior p(θ), you are interested in sampling from the posterior p(θ ∣ y) using a Markov chain Monte Carlo (MCMC) algorithm. Such an algorithm will draw correlated samples θ^1, θ^2, …, θ^{N_S} from a Markov chain, and under suitable regularity conditions, the average of your large collection of samples will approximate posterior expectations such as the posterior mean: E[θ ∣ y].
This package implements Metropolis-Hastings (MH) type algorithms, a specific sub-class of MCMC methods. An MH algorithm moves from iteration to iteration by following a two-step procedure. Step one proposes a new value for the chain. Step two decides whether or not to accept this proposal. In the event that the proposal is rejected, the chain stays in the same place. Other MCMC methods, such as Gibbs sampling, never reject, so their samples cannot stay in the same place.
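The two-step procedure above can be sketched in a few lines of R. The function and argument names here (`mhStep`, `logTarget`, `propose`, `logPropDens`) are illustrative only and are not part of this package's API.

```r
# One Metropolis-Hastings iteration (illustrative sketch, not package code).
# logTarget:   evaluates the log posterior (up to an additive constant)
# propose:     draws a proposal given the current state
# logPropDens: log density of moving from its first argument to its second
mhStep <- function(current, logTarget, propose, logPropDens){
  proposal <- propose(current)
  logAcceptRatio <- logTarget(proposal) - logTarget(current) +
    logPropDens(proposal, current) - logPropDens(current, proposal)
  if(log(runif(1)) < logAcceptRatio){
    proposal # accepted: the chain moves
  }else{
    current  # rejected: the chain stays in the same place
  }
}
```

With a symmetric random-walk proposal the two `logPropDens` terms cancel, which is why many textbook presentations omit them.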
This software is more specifically tailored to an implementation of the Correlated Pseudo-Marginal sampler (CPM). CPM is an extension of the Pseudo-Marginal Metropolis-Hastings algorithm (PMMH) (Andrieu and Roberts 2009), which is itself an extension of the Metropolis-Hastings (MH) algorithm. CPM and PMMH are useful for sampling from the posterior of a model with an intractable likelihood. While the original MH algorithm requires the ability to evaluate the model’s likelihood, CPM and PMMH only require unbiased approximations of it.
The makeCPMSampler() Function

This package provides a function makeCPMSampler() that implements all three of these algorithms (although it is more suited for the PMMH and CPM algorithms). It abstracts away many implementation details, so, mathematically speaking, only a small degree of familiarity with the algorithms is required to use this software.

Programmatically speaking, makeCPMSampler() is a function factory: a function that creates and returns another function. The returned function is the one that generates parameter samples.

Furthermore, many of makeCPMSampler()’s required inputs are functions as well. Here is a description of all the inputs it takes.
Any MH algorithm (e.g. MH, PMMH, CPM) requires that you are able to sample from a proposal distribution, and to evaluate its density. You provide these to makeCPMSampler() in the first two inputs: paramKernSamp= and logParamKernEval=.

You will also need to provide a function that evaluates the logarithm of the prior distribution. This is the third argument: logPriorEval=.
Unlike the standard Metropolis-Hastings algorithm, you will not be required to provide a function that evaluates the logarithm of the likelihood. Recall from earlier that PMMH and CPM only require an unbiased estimate of the likelihood. The logarithm of an unbiased approximation is provided as the fourth argument: logLikeApproxEval=. If you’d like, you may provide for this argument a function giving exact evaluations. In this case, makeCPMSampler() will return a standard MH sampler. This can be useful, for instance, if you are comparing the relative efficiency of pseudo-marginal and non-pseudo-marginal methods.
Finally, you must provide the non-functional inputs. yData is the entire data set of observations. numU is the number of standard univariate normal samples that are used for each approximate log-likelihood evaluation. numIters is the total number of samples drawn; this count does not account for burn-in or thinning. rho is the correlation between the standard normal variates used at successive MCMC iterations. Finally, change storeEvery to some integer greater than 1 if you are concerned about memory on your computer and you’d like to use thinning. If you set it to 2, say, then every other sample will be retained.
The first example is the same as the one provided in this package’s documentation. This section breaks up the code into chunks in order to explain it more easily. The model has two parameters: θ = (θ1, θ2). The conditional likelihood is
$$ \begin{eqnarray} p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{i=1}^N p(y_{i} \mid x_{i}, \theta) \tag{1} \end{eqnarray} $$
where $$ \begin{eqnarray} y_{i} \mid x_{i}, \theta \sim \text{Normal}(x_i, \theta_1 - \theta_2) \end{eqnarray} $$ for i = 1, …, N. Here x1, …, xN := x1 : N is a collection of latent/hidden/unobserved random variables. They have the following distribution:
$$ \begin{eqnarray} p(x_{1:N} \mid \theta) = \prod_{i=1}^N p( x_{i} \mid \theta) \tag{2} \end{eqnarray} $$
where
$$ \begin{eqnarray} x_{i} \mid \theta \sim \text{Normal}(0, \theta_2) \end{eqnarray} $$
for i = 1, …, N.
We can simulate N = 10 observations from the model with the following code.
realTheta1 <- .2 + .3
realTheta2 <- .2
realParams <- c(realTheta1, realTheta2)
numObs <- 10
realX <- rnorm(numObs, mean = 0, sd = sqrt(realTheta2))
realY <- rnorm(numObs, mean = realX, sd = sqrt(realTheta1 - realTheta2))
Next, construct the sampler function sampler using makeCPMSampler(). We assume a uniform prior over the region 50 > θ1 > θ2 > 0. A simple multivariate normal is used for the proposal distribution.
library(cPseudoMaRg)
numImportanceSamps <- 1000
numMCMCIters <- 1000
randomWalkScale <- 1.5
recordEveryTh <- 1
# create the function that performs sampling
sampler <- makeCPMSampler(
paramKernSamp = function(params){
params + rnorm(2)*randomWalkScale
},
logParamKernEval = function(oldTheta, newTheta){
  # the trailing "+" is required: a leading "+" on the next line would make
  # R parse the second dnorm() as a separate expression
  dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) +
    dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
},
logPriorEval = function(theta){
if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
-7.130899 # - log of 50^2/2
}else{
-Inf
}
},
logLikeApproxEval = function(y, thetaProposal, uProposal){
if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
xSamps <- uProposal*sqrt(thetaProposal[2])
logCondLikes <- sapply(xSamps,
function(xsamp) {
sum(dnorm(y,
xsamp,
sqrt(thetaProposal[1] - thetaProposal[2]),
log = TRUE)) })
m <- max(logCondLikes)
log(sum(exp(logCondLikes - m))) + m - log(length(xSamps)) # average over the N_X importance samples
}else{
-Inf
}
},
realY,
numImportanceSamps,
numMCMCIters,
.99, # change to 0 for original pseudo-marginal method
recordEveryTh)
The approximate log-likelihood, provided to the logLikeApproxEval= parameter, uses importance sampling:
$$
\begin{align*}
\log \hat{p}(y_{1:N} \mid \theta)
&= \log \left\{
N_X^{-1} \sum_{i=1}^{N_X} \frac{p(y_{1:N} \mid X^i_{1:N}, \theta)
p(X^i_{1:N} \mid \theta)}{q(X^i_{1:N} \mid y_{1:N}, \theta)}
\right\}
\end{align*}
$$
where $X^1_{1:N}, \ldots, X^{N_X}_{1:N} \overset{\text{iid}}{\sim} q( \cdot \mid y_{1:N}, \theta)$.
Unlike many importance sampling implementations that call pseudo-random number generating functions (e.g. rnorm) directly, this one calculates the approximation from standard normal samples generated within makeCPMSampler(). We choose q(x1 : N ∣ y1 : N, θ) = p(x1 : N ∣ θ), despite it not being the optimal proposal/instrumental distribution. Note the use of the “log-sum-exp” trick to avoid numerical underflow.
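To see why the trick matters, here is the computation in isolation, with made-up log-weights of the magnitude that can arise when multiplying many small likelihood terms:

```r
# Naive evaluation underflows: exp() of a large negative number is 0.
logWeights <- c(-1000, -1001, -1002)
log(sum(exp(logWeights)))          # -Inf

# Subtracting the maximum first keeps everything finite.
m <- max(logWeights)
log(sum(exp(logWeights - m))) + m  # approximately -999.5924
```

The second expression is algebraically identical to the first, but every exponent is now at most 0, so nothing underflows.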
Finally, sample the Markov chain. Call the samples res, and examine them.
res <- sampler(realParams)
print(res)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 3.954376 2.877783
#> acceptance rate: 0.248
plot(res)
You might have noticed that this particular model has a tractable likelihood, so it is more efficient to use regular MH. The likelihood is equal to the following
$$ \begin{align*} p(y_{1:N} \mid \theta) &= \int p(y_{1:N} \mid x_{1:N}, \theta) p(x_{1:N} \mid \theta) \text{d}x_{1:N} \\ &= \prod_{i=1}^N \text{Normal}(y_i ; 0, \theta_1) \end{align*} $$
samplerExact <- makeCPMSampler(
paramKernSamp = function(params){
return(params + rnorm(2)*randomWalkScale)
},
logParamKernEval = function(oldTheta, newTheta){
  dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) +
    dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
},
logPriorEval = function(theta){
if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
-7.130899 # - log of 50^2/2
}else{
-Inf
}
},
logLikeApproxEval = function(y, thetaProposal, uProposal){
# this is exact now!
if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
sum(dnorm(y, mean = 0, sd = sqrt(thetaProposal[1]), log = TRUE))
}else{
-Inf
}
},
realY,
numImportanceSamps, # doesn't matter here because the Us are not used
numMCMCIters,
.99, # doesn't matter here because the Us are not used
recordEveryTh)
res2 <- samplerExact(realParams)
print(res2)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 1.214049 0.5982352
#> acceptance rate: 0.109
plot(res2)
Consider the following stochastic volatility model (Taylor 1982), which is simpler than the one explored in (Deligiannidis, Doucet, and Pitt 2015). It assumes a latent AR(1) process, and that, after conditioning on contemporaneous values of the latent process, the observed returns are normal.
In other words, let Z1, …, ZN and W1, …, WN be iid standard normal random variables, and for t > 1, define the latent process as
Xt = ϕXt − 1 + σZt, and X1 ∼ Normal(0, σ2/(1 − ϕ2)). The distribution of the observed returns is defined by Yt = βexp (Xt/2)Wt for t = 1, …, N. The collection of all unknown parameters is θ = (ϕ, β, σ2).
Y1 : N ∣ X1 : N, θ has the same factorization as equation (1), but unlike equation (2), we have the following Markovian factorization for the distribution of all latent variables:
$$ \begin{eqnarray} p(x_{1:N} \mid \theta) = p(x_1) \prod_{t=2}^N p( x_{t} \mid x_{t-1}, \theta) \tag{3}. \end{eqnarray} $$ Assume that all three parameters are independent a priori: π(ϕ)π(β)π(σ2), and select a uniform, Gaussian, and Inverse-Gamma distribution for these three distributions, respectively. For the proposal distribution, we use a multivariate normal in a transformed parameter space. ϕ is transformed with logit((ϕ + 1)/2), β is transformed with the identity map, and σ2 is transformed with the natural logarithm.
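Before estimating anything, it may help to see the generative model in code. The following sketch simulates a data set from the model just described; the parameter values and variable names are arbitrary choices for illustration only.

```r
# simulate the stochastic volatility model (illustrative parameter values)
set.seed(2)
N <- 200
phi <- .9; beta <- 1; sigmaSq <- .1
x <- numeric(N)
x[1] <- rnorm(1, mean = 0, sd = sqrt(sigmaSq/(1 - phi^2))) # stationary start
for(t in 2:N){
  x[t] <- phi*x[t-1] + sqrt(sigmaSq)*rnorm(1)              # latent AR(1)
}
y <- beta*exp(x/2)*rnorm(N)                                # observed returns
```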
The primary bottleneck in the previous example was the function passed in to the logLikeApproxEval= input of makeCPMSampler(). It was written entirely in R, and so, as a result, was not as fast as it could have been. In this example, we provide a compiled function for this input that makes use of C++ code taken from the PF C++ library (Brown 2020). A growing collection of examples of calling PF code from R is provided as an R package called pfexamplesinr that is hosted on GitHub. Interfacing to C++ from R was facilitated by the RcppEigen package (Bates and Eddelbuettel 2013).
The particle filter used to compute the approximate log-likelihood is described in (Deligiannidis, Doucet, and Pitt 2015). Like the example above, it calculates the approximation using common random normal variables generated from within makeCPMSampler(). As detailed in (Deligiannidis, Doucet, and Pitt 2015), the particle filter that calculates this number uses a resampling technique that makes use of the pseudo-inverse of a Hilbert space-filling curve. This curve was implemented with C++ code lightly edited from the C code provided in (Skilling 2004).
Since version 1.0.1, this package provides an extra input to makeCPMSampler() called nansInLLFatal=. If set to FALSE, then whenever NaN is returned from the approximate log-likelihood function, the parameter proposal is disregarded. Alternatively, if it is set to TRUE, then the entire chain comes to a halt, and all the samples obtained up until that iteration are disregarded. When using particle filters to produce approximations to the likelihood, I recommend setting this to FALSE, especially when the proposal distribution has a high amount of dispersion. This is because, despite one’s best efforts to guard against it, particle filters can frequently generate NaNs.
Please also note that the number of particles used in this particle filter is selected in two places: one is the code below, and the other is a C++ #define directive in the file src/likelihoods.cpp in the pfexamplesinr package. If you are interested in increasing or decreasing the number of particles, fork pfexamplesinr, edit your copy of the files, and run the following example after replacing the appropriate line with devtools::install_github("<your Github username>/pfexamplesinr").
library(cPseudoMaRg)
devtools::install_github("tbrown122387/pfexamplesinr@e4e2a80")
library(pfexamplesinr)
returnsData <- read.csv("data/return_data.csv", header=F)[,1]
numParticles <- 500 # THIS MUST MATCH "#define NP 500" in src/likelihoods.cpp
numMCMCIters <- 500
randomWalkScale <- .1
recordEveryTh <- 1
numUs <- length(returnsData)*(numParticles+1)
# some helper functions
transformParams <- function(untrans){
p <- vector(mode = "numeric", length = 3)
p[1] <- boot::logit(.5*(untrans[1] + 1))
p[2] <- untrans[2]
p[3] <- log(untrans[3])
return(p)
}
revTransformParams <- function(trans){
p <- vector(mode = "numeric", length = 3)
p[1] <- 2*boot::inv.logit( trans[1] )-1
p[2] <- trans[2]
p[3] <- exp(trans[3])
return(p)
}
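As a quick sanity check (not part of the original vignette), the two transformations should be inverses of each other on the constrained space |ϕ| < 1, σ² > 0. The values below are arbitrary illustrative parameters:

```r
# round-trip: transform to the unconstrained space and back should
# recover the original (phi, beta, sigmaSquared) vector
p <- c(.9, 1.2, .25)
trans <- c(boot::logit(.5*(p[1] + 1)), p[2], log(p[3]))
back  <- c(2*boot::inv.logit(trans[1]) - 1, trans[2], exp(trans[3]))
all.equal(back, p) # TRUE
```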
sampler <- makeCPMSampler(
paramKernSamp = function(params){
revTransformParams(transformParams(params) + rnorm(3)*randomWalkScale)
},
logParamKernEval = function(oldTheta, newTheta){
  unconstrainedNew <- transformParams(newTheta)
  unconstrainedOld <- transformParams(oldTheta)
  # operators go at line ends so R parses this as one expression; the Jacobian
  # terms are log|dt/dtheta| evaluated at the proposal, matching the sign of
  # the sigmaSquared term
  dnorm(unconstrainedNew[1], unconstrainedOld[1], sd = randomWalkScale, log = TRUE) - #phi
    0.6931472 - unconstrainedNew[1] + 2*log(1 + exp(unconstrainedNew[1])) + # phi jacobian
    dnorm(unconstrainedNew[2], unconstrainedOld[2], sd = randomWalkScale, log = TRUE) + #beta
    dnorm(unconstrainedNew[3], unconstrainedOld[3], sd = randomWalkScale, log = TRUE) - #sigmaSquared
    unconstrainedNew[3] # sigmaSquared jacobian
},
logPriorEval = function(theta){
  if( (abs(theta[1]) >= 1.0) || theta[3] <= 0.0 ){
    -Inf
  }else{
    log(.5) +
      dnorm(theta[2], mean = 0, sd = 10, log = T) +
      dgamma(x = 1/theta[3], shape = 1.3, rate = .3, log = T) -
      2*log(theta[3]) # Jacobian: the inverse-gamma prior density of theta[3] itself
  }
},
logLikeApproxEval = svolApproxLL, # c++ function from pfexamplesinr
returnsData, numUs, numMCMCIters, .99, recordEveryTh, FALSE
)