Tensor products
Tensor products are the standard way to construct smooths of two or more dimensions when each dimension is measured on a different scale (i.e., anisotropic smoothing)1.
We can build a tensor product basis as follows. If we have two univariate smooths \(s_x(x)\) and \(s_y(y)\), we can look at their basis expansions: \[\begin{equation*} s_x(x) = \sum_k \beta_k b_k(x) \quad \text{and} \quad s_y(y) = \sum_l \zeta_l d_l(y), \end{equation*}\] where in each summation the Greek letters, \(\beta_k\) and \(\zeta_l\), indicate parameters to estimate and Latin letters, \(b_k\) and \(d_l\), give fixed basis functions.
We can generate a bivariate function from a combination of these terms by taking \(s_x(x)\) and asking how to make this a function of \(x\) and \(y\). We can do this by making each \(\beta_k\) a function of \(y\), using the basis expansion we already have for \(s_y(y)\), hence: \[\begin{equation*} \beta_k(y) = \sum_l \zeta_{kl} d_l(y). \end{equation*}\] Then we can substitute that back into the basis expansion for \(s_x(x)\) to give: \[\begin{equation*} s_{xy}(x,y) = \sum_k \beta_{k}(y) b_k(x) = \sum_k \sum_l \zeta_{kl} d_l(y) b_k(x) . \end{equation*}\] We can iterate this process to build tensor products of any number of dimensions.
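To make this concrete, here is a minimal numerical sketch of the construction above. It assumes simple (hypothetical) polynomial marginal bases, \(b_k(x) = x^k\) and \(d_l(y) = y^l\), rather than the spline bases used in practice; the tensor product design matrix is then the row-wise Kronecker product of the two marginal design matrices, with one column per \(\zeta_{kl}\).

```python
import numpy as np

def marginal_basis(v, K):
    """Evaluate K polynomial basis functions v**0, ..., v**(K-1) at each
    point of v, returning an n x K matrix (a hypothetical stand-in for a
    spline basis)."""
    return np.column_stack([v**k for k in range(K)])

n = 50
rng = np.random.default_rng(0)
x, y = rng.uniform(size=n), rng.uniform(size=n)

Bx = marginal_basis(x, K=4)   # n x K, the b_k(x)
Dy = marginal_basis(y, K=3)   # n x L, the d_l(y)

# Row-wise Kronecker product: column (k, l) holds b_k(x_i) * d_l(y_i),
# matching s_xy(x, y) = sum_k sum_l zeta_kl b_k(x) d_l(y).
T = np.einsum("ik,il->ikl", Bx, Dy).reshape(n, -1)  # n x (K*L)
```

With coefficients \(\zeta_{kl}\) flattened in the same order (here \(l\) varying fastest), `T @ zeta` evaluates \(s_{xy}\) at the data points.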
The corresponding penalties, which measure the wiggliness of the smooth, are constructed as follows. Readers should consult the primary reference text for further details, including more general versions of the penalties.
Going back to the univariate terms we started with, each has a penalty: \[\begin{equation} J_x(s_x) = \lambda_x \boldsymbol{\beta}^\intercal \mathbf{S}_x \boldsymbol{\beta} \quad \text{and} \quad J_y(s_y) = \lambda_y \boldsymbol{\zeta}^\intercal \mathbf{S}_y \boldsymbol{\zeta}, \label{eq:mat-pen} \end{equation}\] where \(\boldsymbol{\beta}\) and \(\boldsymbol{\zeta}\) are vectors containing the relevant parameters, and \(\mathbf{S}_x\) and \(\mathbf{S}_y\) are fixed matrices measuring the wiggliness of the basis functions. To regulate the importance of the penalties, each has an associated smoothing parameter, \(\lambda_x\) and \(\lambda_y\), which scales the influence of the penalty. \(J_x(s_x)\) and \(J_y(s_y)\) are added to the negative \(\log\)-likelihood to create a penalized likelihood when taking a “frequentist” (actually empirical Bayes) view of the model; alternatively, \(\lambda_x \mathbf{S}_x\) and \(\lambda_y \mathbf{S}_y\) may be viewed as precision matrices of the multivariate normal (zero mean) priors on the parameters.
For the sake of clarity and simplicity, we’ll say \(J_x(s_x)\) and \(J_y(s_y)\) are based on a cubic regression spline basis, so in each case we have: \[\begin{equation} J_x(s_x) = \lambda_x \int \left( \frac{\partial^2 s_x(x)}{\partial x^2}\right)^2 dx \quad \text{and} \quad J_y(s_y) = \lambda_y \int \left( \frac{\partial^2 s_y(y)}{\partial y^2}\right)^2 dy. \label{eq:int-pen} \end{equation}\] Intuitively, we can think of the second derivatives as calculating how the smooth changes over the range of \(x\); squaring this removes issues about the sign of those changes, and integrating gives an overall wiggliness measure for the term. A bit of algebra shows that we can re-write (\(\ref{eq:int-pen}\)) in the form of (\(\ref{eq:mat-pen}\)), where we have fixed matrices (\(\mathbf{S}_x\) and \(\mathbf{S}_y\)) that we need only calculate once and parameters (\(\boldsymbol{\beta}\) and \(\boldsymbol{\zeta}\)) which change during model fitting. In practice this means that model fitting has the computational complexity of matrix multiplication, rather than of differentiation/integration.
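As a sketch of this equivalence, the following uses a hypothetical polynomial basis \(b_k(x) = x^k\) on \([0,1]\) (not a cubic regression spline basis, but the same principle applies): the entries of \(\mathbf{S}\) are \(\int b_j''(x)\, b_k''(x)\, dx\), computed once analytically, and the quadratic form \(\boldsymbol{\beta}^\intercal \mathbf{S} \boldsymbol{\beta}\) matches the integrated squared second derivative.

```python
import numpy as np

K = 5  # number of (hypothetical) polynomial basis functions b_k(x) = x**k

def S_entry(j, k):
    """S[j, k] = integral_0^1 b_j''(x) b_k''(x) dx, done analytically:
    b_k''(x) = k (k-1) x**(k-2), which vanishes for k < 2."""
    if j < 2 or k < 2:
        return 0.0
    return j * (j - 1) * k * (k - 1) / (j + k - 3)

S = np.array([[S_entry(j, k) for k in range(K)] for j in range(K)])

rng = np.random.default_rng(1)
beta = rng.normal(size=K)

# Penalty as a quadratic form: beta' S beta ...
pen_mat = beta @ S @ beta

# ... equals the integrated squared second derivative, here by
# trapezoidal quadrature on a fine grid.
xs = np.linspace(0.0, 1.0, 20001)
d2 = sum(beta[k] * k * (k - 1) * xs ** max(k - 2, 0) for k in range(K))
h = xs[1] - xs[0]
pen_int = h * (np.sum(d2**2) - 0.5 * (d2[0]**2 + d2[-1]**2))
```

Only \(\boldsymbol{\beta}\) changes during fitting; \(\mathbf{S}\) is fixed, which is what makes the matrix form cheap.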
Now we want to find \(J_{xy}(s_{xy})\), the combined penalty of the interaction between smooths of \(x\) and \(y\). We previously found that the basis for the tensor product of \(s_x\) and \(s_y\) is: \[\begin{equation*} s_{xy}(x,y) = \sum_k \sum_l \zeta_{kl} b_k(x) d_l(y). \end{equation*}\] A straightforward way of thinking about a penalty for this term (again in the cubic spline case) is: \[\begin{equation} J_{xy}(s_{xy}) = \int \lambda_x \left( \frac{\partial^2 s_{xy}(x,y)}{\partial x^2}\right)^2 + \lambda_y \left( \frac{\partial^2 s_{xy}(x,y)}{\partial y^2}\right)^2 dx dy. \label{eq:tensor-pen} \end{equation}\] This formulation lets us look at the changes in each direction, holding the other constant, so we get the overall wiggliness of the function from the sum over those directions.
We can re-write (\(\ref{eq:tensor-pen}\)) in the matrix form of (\(\ref{eq:mat-pen}\)), and we can again iterate this process to build tensor products of any number of dimensions. Section 5.6.2 of the primary reference text provides further details.
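In matrix form, the combined penalty becomes a sum of Kronecker products of the marginal penalty matrices with identity matrices, one term per direction. The sketch below uses random symmetric positive semi-definite stand-ins for \(\mathbf{S}_x\) and \(\mathbf{S}_y\) rather than real spline penalties, and assumes coefficients \(\zeta_{kl}\) stacked with \(l\) varying fastest.

```python
import numpy as np

K, L = 4, 3
rng = np.random.default_rng(2)

def random_penalty(m):
    """A random symmetric positive semi-definite stand-in for a marginal
    penalty matrix (a real application would use, e.g., a cubic spline
    penalty here)."""
    A = rng.normal(size=(m, m))
    return A @ A.T

Sx, Sy = random_penalty(K), random_penalty(L)
lam_x, lam_y = 0.5, 2.0

# Combined penalty matrix: the x-direction term penalises wiggliness in x
# for each y basis function (Sx kron I_L), and vice versa for y.
S_xy = lam_x * np.kron(Sx, np.eye(L)) + lam_y * np.kron(np.eye(K), Sy)

zeta = rng.normal(size=K * L)  # stacked zeta_kl, position k*L + l
J = zeta @ S_xy @ zeta         # J_xy(s_xy) as a quadratic form
```

Note the two smoothing parameters remain separate, which is what gives the tensor product its anisotropy: wiggliness in \(x\) and in \(y\) can be penalized by different amounts.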
Thanks
Time to think about this was partly funded by the BioSS “topic group”: “what we talk about when we talk about random effects”. Thanks also to John Addy for extremely useful discussions.
Footnotes
For isotropic smoothing we can just use thin plate splines in any number of dimensions.↩︎