--- title: "Using cPseudoMaRg" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{cPseudoMaRg-vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: bibliography.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## What Does This Package Provide? This package provides the boilerplate code for the Correlated Pseudo-Marginal method of [@cpm]. This vignette demonstrates its use by providing two examples that are closely related to ones given in their paper. ## A Background in Markov-chain Monte Carlo Assume that you are a Bayesian and have some data $y$, and that you would like to perform inference for a collection of parameters $\theta$. After specifying a likelihood $p(y \mid \theta)$ and a prior $p(\theta)$, you are interested in sampling from the posterior $p(\theta \mid y)$ using a Markov chain Monte Carlo (MCMC) algorithm. Such an algorithm will draw correlated samples $\theta^1, \theta^2, \ldots, \theta^{N_{S}}$ from a Markov chain, and under suitable regularity conditions, the average of your large collection of samples will approximate posterior expectations such as the posterior mean: $E[\theta \mid y]$. This package implements Metropolis-Hastings (MH) type algorithms, which is a specific sub-class of MCMC methods. MH algorithms moves from iteration to iteration by following a two-step procedure. Step one starts with *proposing* a new value in the chain. Step two makes a decision on whether or not to *accept* this new proposal. In the event that the sample is rejected, the chain stays in the same place. other MCMC methods, such as Gibbs sampling, produce samples that cannot stay in the same place. This software is more specifically tailored to an implementation of the Correlated Pseudo-Marginal sampler (CPM). CPM is an extension of the Pseudo-Marginal Metropolis-Hastings algorithm (PMMH) [@pseudomarginal], which is itself an extension of the Metropolis-Hastings (MH) algorithm. CPM and PMMH are useful for sampling from the posterior of a model with an *intractable likelihood.* While the original MH algorithm requires the ability to evaluate the model's likelihood, CPM and PMMH only requires unbiased approximations of it. ## The `makeCPMSampler()` Function This package provides a function `makeCPMSampler()` that implements all three of these algorithms (although it is more suited for PMMH and CPM algorithms). It abstracts away many implementation details, so, mathematically speaking, only a small degree of familiarity with the algorithms is required to use this software. Programmatically speaking, `makeCPMSampler()` is a *function factory*. This means that it is a function that creates and returns another function. The returned function will be the function that generates parameter samples. Furthermore, many of `makeCPMSampler()`'s required inputs are functions, as well. Here is a description of all the inputs it takes. - Any MH algorithm (e.g. MH, PMMH, CPM) requires that you are able to sample from a proposal distribution, and to evaluate its density. You provide it these to `makeCPMSampler()` in the first two inputs: `paramKernSamp=` and `logParamKernEval=`. - You will need to provide a function that evaluates the *logarithm* of the prior distribution. This is the third argument: `logPriorEval`. - Unlike the standard Metropolis-Hastings algorithm, you will not be required to provide a function that evaluates the logarithm of the likelihood. Recall from earlier that PMMH and CPM only require an unbiased estimate of the likelihood. The logarithm of an unbiased approximation is provided as the fourth argument: `logLikeApproxEval=`. If you'd like, you may provide for this argument a function giving *exact* evaluations. In this case, `makeCPMSampler()` will return a standard MH sampler. This can be useful, for instance, if you are comparing the relative efficiency of pseudo-marginal and non-pseudo-marginal methods. - Finally, you must provide the non-functional inputs. `yData` is the entire data set of observations. `numU` is the number of standard univariate normal samples that are used for each approximate log-likelihood evaluation. `numIters` is the total number of samples drawn. This does not count burn-in or thinning. `rho` is the correlation between standard normal variates at each MCMC iteration. Finally, change `storeEvery` to some integer greater than $1$ if you are concerned about memory on your computer, and you'd like to use thinning. If you set it to $2$, say, then every other sample will be retained. ## A First Example The first example is the same as the provided in this package's documentation. This section breaks up the code into chunks in order to explain it more easily. The model has two parameters: $\theta = (\theta_1, \theta_2)$. The conditional likelihood is $$ \begin{eqnarray} p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{i=1}^N p(y_{i} \mid x_{i}, \theta) \tag{1} \end{eqnarray} $$ where $$ \begin{eqnarray} y_{i} \mid x_{i}, \theta \sim \text{Normal}(x_i, \theta_1 - \theta_2) \end{eqnarray} $$ for $i=1,\ldots,N$. Here $x_1, \ldots, x_N := x_{1:N}$ is a collection of latent/hidden/unobserved random variables. They have the following distribution: $$ \begin{eqnarray} p(x_{1:N} \mid \theta) = \prod_{i=1}^N p( x_{i} \mid \theta) \tag{2} \end{eqnarray} $$ where $$ \begin{eqnarray} x_{i} \mid \theta \sim \text{Normal}(x_i, \theta_2) \end{eqnarray} $$ for $i=1,\ldots,N$. We can simulate $N=10$ observations from the model with the following code. ```{r, collapse = TRUE} realTheta1 <- .2 + .3 realTheta2 <- .2 realParams <- c(realTheta1, realTheta2) numObs <- 10 realX <- rnorm(numObs, mean = 0, sd = sqrt(realTheta2)) realY <- rnorm(numObs, mean = realX, sd = sqrt(realTheta1 - realTheta2)) ``` Next, construct the sampler function `sampler` using `makeCPMSampler()`. We assume a uniform prior over the region $50 > \theta_1 > \theta_2 > 0$. A simple multivariate normal is used for the proposal distribution. ```{r, collapse=TRUE} library(cPseudoMaRg) numImportanceSamps <- 1000 numMCMCIters <- 1000 randomWalkScale <- 1.5 recordEveryTh <- 1 # create the function that performs sampling sampler <- makeCPMSampler( paramKernSamp = function(params){ params + rnorm(2)*randomWalkScale }, logParamKernEval = function(oldTheta, newTheta){ dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) + dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE) }, logPriorEval = function(theta){ if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){ -7.130899 # - log of 50^2/2 }else{ -Inf } }, logLikeApproxEval = function(y, thetaProposal, uProposal){ if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){ xSamps <- uProposal*sqrt(thetaProposal[2]) logCondLikes <- sapply(xSamps, function(xsamp) { sum(dnorm(y, xsamp, sqrt(thetaProposal[1] - thetaProposal[2]), log = TRUE)) }) m <- max(logCondLikes) log(sum(exp(logCondLikes - m))) + m - log(length(y)) }else{ -Inf } }, realY, numImportanceSamps, numMCMCIters, .99, # change to 0 for original pseudo-marginal method recordEveryTh) ``` The approximate log-likelihood, provided to the `logLikeApproxEval=` parameter, uses importance sampling: $$ \begin{align*} \log \hat{p}(y_{1:N} \mid \theta) &= \log \left\{ N_X^{-1} \sum_{i=1}^{N_X} \frac{p(y_{1:N} \mid X^i_{1:N}, \theta) p(X^i_{1:N} \mid \theta)}{q(X^i_{1:N} \mid y_{1:N}, \theta)} \right\} \end{align*} $$ where $X^1_{1:N}, \ldots, X^{N_X}_{1:N} \overset{\text{iid}}{\sim} q( \cdot \mid y_{1:N}, \theta)$. Unlike many importance sampling implementations that call pseudo-random number generating functions (e.g. `rnorm`) directly, this calculates the approximation based on standard normal samples generated from within `makeCPMSampler()`. We choose $q(x_{1:N} \mid y_{1:N}, \theta) = p(x_{1:N} \mid \theta)$, despite it not being the optimal proposal/instrumental distribution. Note the use of the "log-sum-exp" trick to avoid numerical underflow. Finally, sample the Markov chain. Call the samples `res`, and examine them. ```{r, collapse=TRUE, out.width='80%',} res <- sampler(realParams) print(res) plot(res) ``` You might have noticed that this particular model has a tractable likelihood, so it is more efficient to use regular MH. The likelihood is equal to the following $$ \begin{align*} p(y_{1:N} \mid \theta) &= \int p(y_{1:N} \mid x_{1:N}, \theta) p(x_{1:N} \mid \theta) \text{d}x_{1:N} \\ &= \prod_{i=1}^N \text{Normal}(y_i ; 0, \theta_1) \end{align*} $$ ```{r, collapse=TRUE, out.width='80%'} samplerExact <- makeCPMSampler( paramKernSamp = function(params){ return(params + rnorm(2)*randomWalkScale) }, logParamKernEval = function(oldTheta, newTheta){ dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) + dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE) }, logPriorEval = function(theta){ if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){ -7.130899 # - log of 50^2/2 }else{ -Inf } }, logLikeApproxEval = function(y, thetaProposal, uProposal){ # this is exact now! if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){ sum(dnorm(y, mean = 0, sd = sqrt(thetaProposal[1]), log = TRUE)) }else{ -Inf } }, realY, numImportanceSamps, # doesn't this matter because Us are not used numMCMCIters, .99, # doesn't this matter because Us are not used recordEveryTh) res2 <- samplerExact(realParams) print(res2) plot(res2) ``` ## A Second Example Consider the following stochastic volatility [@taylor_svol], which is a model that is simpler than the one explored in [@cpm]. It assumes a latent AR(1) process. It also assumes that, after conditioning on contemporaneous values of the latent process, that the observed returns are normal. In other words, let $Z_1, \ldots, Z_N$ and $W_1, \ldots, W_N$ be iid standard normal random variables, and for $t > 1$, define the latent process as $$ X_t = \phi X_{t-1} + \sigma Z_t, $$ and $X_1 \sim \text{Normal}(0, \sigma^2/(1-\phi^2))$. The distribution of the observed returns is defined by $$ Y_t = \beta \exp(X_t/2) W_t $$ for $t = 1, \ldots, N$. The collection of all unknown parameters is $\theta = \phi, \beta, \sigma^2$. $Y_{1:N} \mid X_{1:N}, \theta$ has the same factorization as equation (1), but unlike equation (2), we have the following Markovian factorization for the distribution of all latent variables: $$ \begin{eqnarray} p(x_{1:N} \mid \theta) = p(x_1) \prod_{t=2}^N p( x_{t} \mid x_{t-1}, \theta) \tag{3}. \end{eqnarray} $$ Assume that all three parameters are independent a priori: $\pi(\phi)\pi(\beta)\pi(\sigma^2)$, and select a uniform, Gaussian, and Inverse-Gamma distribution for these three distributions, respectively. For the proposal distribution, we use a multivariate normal in a transformed parameter space. $\phi$ is transformed with $\text{logit}((\phi+1)/2)$, $\beta$ is transformed with the identity map, and $\sigma^2$ is transformed with the natural logarithm. The primary bottleneck in the previous example was the function passed in to the `logLikeApproxEval=` input of `makeCPMSampler()`. It was written entirely in R, and so, as a result, was not as fast as it could have been. In this example, we provide a compiled function for this input that makes use of C++ code taken from the PF C++ library [@Brown2020]. A growing collection of examples of calling PF code from R is provided as an R package called `pfexamplesinr` that is hosted on Github. Interfacing to c++ from R was facilitated by the RcppEigen package [@rcppeigen]. The particle filter used to compute the approximate log-likelihood is described in [@cpm]. Like the example above, it calculates the approximation using common random normal variables generated from within `makeCPMSampler()`. As detailed in [@cpm], the particle filter that calculates this number uses a resampling technique that makes of the pseudo-inverse of a Hilbert space-filling function. This function was implemented with C++ code lightly edited from the C code provided in [@hilbert]. Since version 1.0.1, this package provides an extra input to `makeCPMSampler()` called `nansInLLFatal=`. If set to `FALSE`, then whenever `NaN` is returned from the approximate log-likelihood function, the parameter proposal is disregarded. Alternatively, if it is set to `TRUE`, then the entire chain comes to a halt, and all the samples obtained up until that iteration are disregarded. When using particle filters to produce approximations to the likelihood, I recommend setting this to `FALSE`, especially when the proposal distribution has a high amount of dispersion. This is because, despite one's best efforts to guard against it, particle filters can frequently generate `NaN`s. Please also note that the number of particles used in this particle filter is selected in two places. One place is the code below, and another in a c++ `#define` directive in the file `src/likelihoods.cpp` in the `pfexamples` library. If you are interested in increasing or decreasing the number of particles, fork `pfexamplesinr`, edit your copy of the files, and run the following example after replacing the appropriate line with `devtools::install_github("/pfexamplesinr")`. ```{r, collapse=TRUE, out.width='80%', eval = FALSE} library(cPseudoMaRg) devtools::install_github("tbrown122387/pfexamplesinr@e4e2a80") library(pfexamplesinr) returnsData <- read.csv("data/return_data.csv", header=F)[,1] numParticles <- 500 # THIS MUST MATCH "#define NP 500" in src/likelihoods.cpp numMCMCIters <- 500 randomWalkScale <- .1 recordEveryTh <- 1 numUs <- length(returnsData)*(numParticles+1) # some helper functions transformParams <- function(untrans){ p <- vector(mode = "numeric", length = 3) p[1] <- boot::logit(.5*(untrans[1] + 1)) p[2] <- untrans[2] p[3] <- log(untrans[3]) return(p) } revTransformParams <- function(trans){ p <- vector(mode = "numeric", length = 3) p[1] <- 2*boot::inv.logit( trans[1] )-1 p[2] <- trans[2] p[3] <- exp(trans[3]) return(p) } sampler <- makeCPMSampler( paramKernSamp = function(params){ revTransformParams(transformParams(params) + rnorm(3)*randomWalkScale) }, logParamKernEval = function(oldTheta, newTheta){ unconstrainedNew <- transformParams(newTheta) unconstrainedOld <- transformParams(oldTheta) dnorm(unconstrainedNew[1], unconstrainedOld[1], sd = randomWalkScale, log = TRUE) #phi + 0.6931472 + unconstrainedNew[1] - 2*log(1 + exp(unconstrainedNew[1])) # phi jacobian + dnorm(unconstrainedNew[2], unconstrainedOld[2], sd = randomWalkScale, log = TRUE) #beta + dnorm(unconstrainedNew[3], unconstrainedOld[3], sd = randomWalkScale, log = TRUE) #sigmaSquared - unconstrainedNew[3] # jacobian }, logPriorEval = function(theta){ if( (abs(theta[1]) >= 1.0) || theta[3] <= 0.0 ){ -Inf }else{ log(.5) + dnorm(theta[2], mean = 0, sd = 10, log = T) + dgamma(x = 1/theta[3], shape = 1.3, rate = .3, log = T) } }, logLikeApproxEval = svolApproxLL, # c++ function from pfexamplesinr returnsData, numUs, numMCMCIters, .99, recordEveryTh, FALSE ) ``` ```{r, collapse=TRUE, out.width='80%', eval = FALSE} svolSampleResults <- sampler( c(.9, 1, .1)) mean(svolSampleResults) print(svolSampleResults) ``` # References