Penalized parametric components and more with paraPen

Author

David L Miller

Published

July 31, 2024

This article covers a little-known mgcv feature: the paraPen argument. You might have seen it referenced in ?gam or ?gam.models without knowing what you can do with it. If so, this is for you.

Broadly, paraPen lets you set up your own smooths, random effects and so on, emulating what the s() function does in a GAM formula. I’ll show some examples of how to do that and talk about how it works.

Background

Let’s start by talking about the things that happen when you execute a piece of code like:

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.9-3. For overview type 'help("mgcv-package")'.
# example from ?gam
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)
#> Gu & Wahba 4 term additive model
b <- gam(y~s(x0), data=dat)

Lots of things happen, of course, but a key conceptual step here is that gam() knows how to construct the model using only the data and the formula specification y~s(x0). The two main parts that the model needs are a design matrix and a penalty matrix. When we write s(), mgcv assumes we’re using the thin plate regression spline basis (Wood, 2003) and passes the data (plus, in this case, the default values for k etc.) to the appropriate smooth.construct.tp.smooth.spec method¹. That “smooth constructor” method then generates a standardized output consisting of² the design matrix and penalty matrix/matrices.

As we know from spline basics (Wood, 2017), the design matrix is constructed by taking each datum and applying each basis function to it. So if we select 10 basis functions (setting k=10, the default for univariate smooths) and we have \(n\) data, we end up with an \(n\times 10\) design matrix.
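To make those two objects concrete, here is a minimal sketch that builds the smooth by hand with smoothCon() (the function mentioned in the footnotes) and checks the dimensions of what comes back. It regenerates the same simulated data as above; the object name sm is just for illustration. Setting absorb.cons = FALSE is my choice here, so that all k columns of the basis are kept. When gam() fits a model it handles identifiability constraints itself, so the matrices inside a fitted model will not necessarily look exactly like these.

```r
library(mgcv)

# same simulated data as above (verbose = FALSE just silences the message)
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# smoothCon() returns a list of smooth objects; we take the first (and only) one
# absorb.cons = FALSE keeps the full, unconstrained basis
sm <- smoothCon(s(x0, k = 10), data = dat, absorb.cons = FALSE)[[1]]

class(sm)        # includes "tprs.smooth", the thin plate regression spline class

dim(sm$X)        # design matrix: one row per datum, one column per basis function
dim(sm$S[[1]])   # penalty matrix: square, one row/column per basis function
```

With 400 data and k=10, sm$X should be 400 by 10 and sm$S[[1]] should be 10 by 10. These two ingredients are what a smooth constructor hands back for a single smooth term.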

Acknowledgements

I was alerted to the existence of paraPen by Mark Bravington and Sharon Hedley when we wrote our paper on variance propagation (Bravington et al., 2021).

References

Bravington, M. V., Miller, D. L. and Hedley, S. L. (2021) Variance Propagation for Density Surface Models. Journal of Agricultural, Biological and Environmental Statistics. DOI: 10.1007/s13253-021-00438-2.
Wood, S. N. (2003) Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 95–114.
Wood, S. N. (2017) Generalized Additive Models. An Introduction with R. 2nd ed. Texts in Statistical Science. CRC Press.

Footnotes

  1. Each smoother in mgcv has one of these in the form smooth.construct.*.smooth.spec. Looking at the source code for these can be a really useful way to learn new numerical tricks and implementation details.

  2. Amongst other things, see ?mgcv::smoothCon.