The goal of mvgam
is to use a Bayesian framework to estimate parameters of Dynamic Generalized Additive Models (DGAMs) for time series with dynamic trend components. The package provides an interface to fit Bayesian DGAMs using
Stan
as the backend, and is particularly suited to the estimation of complex State-Space models. The formula syntax is based on that of the package
mgcv
to provide a familiar GAM modelling interface. There is also built-in support for the increasingly powerful
marginaleffects
package to make interpretation easy. The motivation for the package and some of its primary objectives are described in detail by Clark & Wells 2022 (published in
Methods in Ecology and Evolution). An introduction to the package and some worked examples are also shown in the below seminar:
Ecological Forecasting with Dynamic Generalized Additive Models DGAMs)
The first release of mvgam
has now been published on CRAN. While the package has featured on this blog before, for example when I explained
hierarchical distributed lag models and
how to model temporally autocorrelated time series with nonlinear trends, I have not thorougly introduced the it yet. So the purpose of this brief post is to cover some of the features of this stable release, and to hint at a few of the development goals for upcoming releases.
Generalized Additive Models (GAMs) are flexible tools that replace one or more predictors in a Generalized Linear Model (GLM) with smooth functions of predictors. These are helpful for learning arbitrarily complex, nonlinear relationships between predictors and conditional responses without needing a priori expectations about the shapes of these relationships. Rather, they are learned using penalized smoothing splines.
How do these work? The secret is a basis expansion, which in lay terms means that the covariate (time, in this example) is evaluated at a smaller set of basis functions designed to cover the range of the observed covariate values. Below is one particular type of basis, called a cubic regression basis.
Each basis function acts on its own local neighbourhood of the covariate. Once we have constructed the basis, we can estimate weights for each function. The weights allow the basis functions to have different impacts on the shape of the spline. They also give us a target for learning splines from data, as the weights act as the regression coefficients.
There are many more types of basis expansions that can be used to form penalized smooths, including multidimensional smooths, spatial smooths or even monotonic smooths. For more information on GAMs and how they can smooth through data, see my blogpost on how to interpret nonlinear effects from Generalized Additive Models.
mvgam
is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal trend evolving independently of the observation process. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\)
is the conditional expectation of response variable \(\boldsymbol{i}\)
at time \(\boldsymbol{t}\)
. Assuming \(\boldsymbol{y_i}\)
is drawn from an exponential distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:
$$for~i~in~1:N_{series}~...$$
$$for~t~in~1:N_{timepoints}~...$$
$$g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{z}_{i,t}\,,$$
Here \(\alpha\)
are the unknown intercepts, the \(\boldsymbol{s}\)
’s are unknown smooth functions of covariates \((\boldsymbol{x})\)
’s, which can potentially vary among the response series, and \(\boldsymbol{z}\)
are dynamic latent processes. Each smooth function \(\boldsymbol{s_j}\)
is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \(\boldsymbol{x}_{j}\)
and \(g^{-1}(\tilde{\boldsymbol{y}})\)
. But note that we can also include smooth functions of covariates in the process models \(\boldsymbol{z}\)
.
Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. mvgam
supports a number of different observation families, which are summarized below.
mvgam
was originally designed to analyse and forecast time series of non-negative integer-valued data (counts). These data are traditionally challenging to analyse with existing time series analysis packages. But further development of mvgam
has resulted in support for a growing number of observation families that extend to other types of data. Currently, the package can handle data for the following families:
Distribution | Function | Support | Extra parameter(s) |
---|---|---|---|
Gaussian (identity link) | gaussian() |
Real values in \((-\infty, \infty)\) |
\(\sigma\) |
Student’s T (identity link) | student-t() |
Heavy-tailed real values in \((-\infty, \infty)\) |
\(\sigma\) , \(\nu\) |
LogNormal (identity link) | lognormal() |
Positive real values in \([0, \infty)\) |
\(\sigma\) |
Gamma (log link) | Gamma() |
Positive real values in \([0, \infty)\) |
\(\alpha\) |
Beta (logit link) | betar() |
Real values (proportional) in \([0,1]\) |
\(\phi\) |
Bernoulli (logit link) | bernoulli() |
Binary data in \({0,1}\) |
– |
Poisson (log link) | poisson() |
Non-negative integers in \((0,1,2,...)\) |
– |
Negative Binomial2 (log link) | nb() |
Non-negative integers in \((0,1,2,...)\) |
\(\phi\) |
Binomial (logit link) | binomial() |
Non-negative integers in \((0,1,2,...)\) |
– |
Beta-Binomial (logit link) | beta_binomial() |
Non-negative integers in \((0,1,2,...)\) |
\(\phi\) |
Poisson Binomial N-mixture (log link) | nmix() |
Non-negative integers in \((0,1,2,...)\) |
– |
For all supported observation families, any extra parameters that need to be estimated (i.e. the \(\sigma\)
in a Gaussian model or the \(\phi\)
in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using share_obs_params = TRUE
in mvgam()
. Note that default link functions cannot currently be changed.
The dynamic processes that mvgam
can fit to time series can take a wide variety of forms, some of which can be multivariate to allow the different time series to interact or be correlated. When using the mvgam()
function, the user chooses between different process models with the trend_model
argument. Available process models are described in
?mvgam_trends
and detailed below.
Use trend_model = 'RW'
or trend_model = RW()
to set up a model where each series in data
has independent latent temporal dynamics of the form:
$$\begin{align*}
z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i) \end{align*}$$
Process error parameters \(\sigma\)
are modeled independently for each series. If a moving average process is required, use trend_model = RW(ma = TRUE)
to set up the following:
$$\begin{align*}
z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\
error_{i,t} & \sim \text{Normal}(0, \sigma_i) \end{align*}$$
Moving average coefficients \(\theta\)
are independently estimated for each series and will be forced to be stationary by default \((abs(\theta)<1)\)
. Only moving averages of order \(q=1\)
are currently allowed.
If more than one series is included in data
\((N_{series} > 1)\)
, a multivariate Random Walk can be set up using trend_model = RW(cor = TRUE)
, resulting in the following:
$$\begin{align*}
z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma) \end{align*}$$
Where the latent process estimate \(z_t\)
now takes the form of a vector. The covariance matrix \(\Sigma\)
will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\)
and on the strength of correlations using Stan
’s lkj_corr_cholesky
distribution.
Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \(\theta\)
will be parameterised as an \(N_{series} * N_{series}\)
matrix
Autoregressive models up to \(p=3\)
, in which the autoregressive coefficients are estimated independently for each series, can be used by specifying trend_model = 'AR1'
, trend_model = 'AR2'
, trend_model = 'AR3'
, or trend_model = AR(p = 1, 2, or 3)
. For example, a univariate AR(1) model takes the form:
$$\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i) \end{align*}$$
All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See ?get_mvgam_priors
for more details.
A Vector Autoregression of order \(p=1\)
can be specified if \(N_{series} > 1\)
using trend_model = 'VAR1'
or trend_model = VAR()
. A VAR(1) model takes the form:
$$\begin{align*}
z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma) \end{align*}$$
Where \(A\)
is an \(N_{series} * N_{series}\)
matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time \(t\)
on its own estimate at time \(t+1\)
), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time \(t\)
on the process for another series at time \(t+1\)
). By default, the covariance matrix \(\Sigma\)
will assume no process error covariance by fixing the off-diagonals to \(0\)
. To allow for correlated errors, use trend_model = 'VAR1cor'
or trend_model = VAR(cor = TRUE)
. A moving average of order \(q=1\)
can also be included using trend_model = VAR(ma = TRUE, cor = TRUE)
.
Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps 2022
Heaps, Sarah E. “ Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.
The final option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using trend_model = 'GP'
. The dynamics for each latent process are modelled as:
$$\begin{align*}
z & \sim \text{MVNormal}(0, \Sigma_{error}) \\
\Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*}$$
The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution which depends on \(\rho\)
(often called the length scale parameter) to control how quickly the correlations between the model’s errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter \(\alpha\)
, which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. mvgam
capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which
considerably speed up computation at little cost to accuracy or prediction performance.
Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where there series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {mvgam
} allows for a time-varying capacity \(C(t)\)
as well as a non-constant growth rate. Changes in the base growth rate \(k\)
are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector \(a\)
is represented as a vector of 1
s and 0
s, and the rate of growth at time \(t\)
is represented as \(k+a(t)^T\delta\)
. Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using trend_model = PW()
. The full piecewise logistic growth model is then:
$$\begin{align*}
z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))} \end{align*}$$
For time series that do not appear to exhibit saturating growth, a piece-wise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:
$$\begin{align*}
z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma) \end{align*}$$
In both trend models, \(m\)
is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula because this will not be identifiable. You can read about the full description of piecewise linear and logistic trends
in this paper by Taylor and Letham.
Sean J. Taylor and Benjamin Letham. “ Forecasting at scale.” The American Statistician 72.1 (2018): 37-45.
Most trend models in the mvgam()
function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year for example). But some time series are taken at irregular intervals and we’d like to model autoregressive properties of these. The trend_model = CAR()
can be useful to set up these models, which currently only support autoregressive processes of order 1
. The evolution of the latent dynamic process follows the form:
$$\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}$$
Where \(distance\)
is a vector of non-negative measurements of the time differences between successive observations. See
the Examples section in ?CAR
for an illustration of how to set these models up.
mvgam
supports an observation model regression formula, built off the mgcv
package, as well as an optional process model regression formula. The formulae supplied to mvgam()
are exactly like those supplied to glm()
except that smooth terms, s()
,
te()
, ti()
and t2()
, time-varying effects using dynamic()
, monotonically increasing (using s(x, bs = 'moi')
) or decreasing penalized splines (using s(x, bs = 'mod')
; see ?smooth.construct.moi.smooth.spec
for details), as well as Gaussian Process functions using gp()
, can be added to the right hand side (and .
is not supported in mvgam
formulae). See
?mvgam_formulae
for more guidance.
For setting up State-Space models, the optional process model formula can be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).
I have several priorities to help improve the workflow of mvgam
, as well as some features I would like to implement. These include
ggplot2
so that users can modify figures as they wish\(\sigma\)
parameters in AR or RW processes)\(\rho\)
and dynamic factor spatiotemporal effects