In pf
, the c++ library, each model/algorithm pair gets
its own class. Writing this class’s methods defines a time series model,
and the base class you choose to inherit from selects the particle
filter that will be used. This typically requires writing two files: a
c++ header file that declares the class, and a c++ source file that
implements all the model specifics.
In pfR
, the R package, there is one extra step. In
addition to the header and source file, there is some additional c++
code that helps you to use everything from within an R session.
The beauty of this is that pf
files are usable in
pfR
, and the majority of pfR
code is usable in
a c++ program. The downside to this is that R programmers will need a
passing familiarity with c++–fortunately most of the c++ code is
boilerplate, and pfR
has a function that will write most of
it for you.
Before we do any specific example, we’ll have to define a state space model, and choose some notation.
For t = 1, …, T, let {yt} be your observed time series, {xt} be some set of hidden states, and (optional) {zt} be a sequence of inputs you have at your disposal to help better predict each yt.
Defining a state-space model requires defining three kinds of distributions:
In the diagram below, circle nodes correspond with observed variables, square nodes represent hidden/latent variables, and arrows demonstrate how conditional dependence relationships (the dotted lines convey that they are optional).
Here’s a state space model from Harvey and Shephard (1996) Yu (2005) that can help us understand a segment of univariate financial returns data y1, …yT.
$$ \begin{gather*} y_t = \exp(x_t/2)\epsilon_t, \hspace{10mm} t = 1, \ldots, T \\ x_{t+1} = \mu + \phi(x_t - \mu) + \eta_t, \hspace{10mm} t = 1, \ldots, T-1 \\ x_1 \sim \mathcal{N}(\mu, \sigma^2 / (1-\phi^2)) \\ \begin{bmatrix} \epsilon_t \\ \eta_t \end{bmatrix} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Sigma}) \end{gather*} $$
𝒩(⋅, ⋅) denotes a (possibly multivariate) Gaussian distribution, −1 < ρ < 1 is the leverage parameter (typically negative for equities), and
$$ \boldsymbol{\Sigma} = \begin{bmatrix}1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{bmatrix}. $$
The interpretation of xt is the (logarithm of) the volatility of the returns and our covariate will be zt = yt − 1. Writing this model in our notation yields
f(xt ∣ xt − 1, zt, θ) is univariate normal with mean μ + ϕ(xt − μ) + ρσyt − 1exp [−xt/2] and variance σ2(1 − ρ2).
Suppose that the algorithm we want is a bootstrap filter with covariates. Our next step is writing all the c++ code for this model and this algorithm. Here is the bit of code that will do most of the work for you. The first argument is the prefix for all of your filenames and will be used throughout the rest of your work.
This will save three files to your
desktop–svol_leverage_bswc.h
,
svol_leverage_bswc.cpp
, and
svol_leverage_bswc_export.cpp
–and then open them in an
RStudio session. You will notice some TODO
s in the files.
Consider these as requests for you to fill in the details of your
model.
After you are finished editing them, they should like something like this, this, and this.
To compile the code you just wrote, simply run
svol_lev
is an Rcpp Module Eddelbuettel (2013) object that holds all the
functions that allow you to use your model. You can call these functions
like this:
Here’s another state space model from Chib, Omori, and Asai (2009) that can handle multivariate returns data y1, y2, ….
$$ \begin{gather*} \mathbf{y}_t = \mathbf{B} \mathbf{f}_t + \mathbf{V}_t^{1/2} \boldsymbol{\epsilon}_t, \hspace{10mm} \boldsymbol{\epsilon}_t \sim \mathcal{N}_p(\boldsymbol{0}, \mathbf{I}) \\ \mathbf{f}_t = \mathbf{D}_t^{1/2} \boldsymbol{\gamma}_t, \hspace{10mm} \boldsymbol{\gamma}_t \sim \mathcal{N}_q(\boldsymbol{0}, \mathbf{I}) \\ \mathbf{x}_{t+1} = \boldsymbol{\mu} + \boldsymbol{\Phi} (\mathbf{x}_{t} - \boldsymbol{\mu}) + \boldsymbol{\eta}_t, \hspace{10mm} \boldsymbol{\eta}_t \sim \mathcal{N}_{p+q}(\boldsymbol{0}, \boldsymbol{\Sigma} ) \\\\ \mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}, \text{diag}(\sigma^2_1/(1-\phi_1^2), \ldots, \sigma^2_{p+q}/(1-\phi_{p+q}^2) ) ) \\ \end{gather*} $$
where $$ \begin{gather*} \mathbf{V}_t = \text{diag}[\exp(x_{1,t}), \ldots, \exp(x_{p,t}) ], \\ \mathbf{D}_t = \text{diag}[\exp(x_{p+1,t}), \ldots, \exp(x_{p+q,t}) ], \\ \boldsymbol{\Phi} = \text{diag}(\phi_1, \ldots, \phi_{p+q}), \\ \boldsymbol{\Sigma} = \text{diag}(\sigma^2_1, \ldots, \sigma^2_{p+q}) , \end{gather*} $$
p is the number of observations, and q is the number of factors.
Suppose you want to use an auxiliary particle filter Pitt and Shephard (1999) for this model. The c++ file templates are generated with
The finished files look like the ones location here, here and here. That code is compiled with
and the (approximate) log-likelihood and filtering functions are run with