User Guide
Setup
ApproximateGPs builds on top of AbstractGPs.jl, so all of its features are reexported automatically by ApproximateGPs.
using ApproximateGPs, Random
rng = MersenneTwister(1453) # set a random seed
First, we construct a prior Gaussian process with a Matérn-3/2 kernel and a zero mean function, and sample some data. More exotic kernels can be constructed using KernelFunctions.jl.
f = GP(Matern32Kernel())
x = rand(rng, 100)
fx = f(x, 0.1) # Observe the GP with Gaussian observation noise (σ² = 0.1)
y = rand(rng, f(x)) # Sample from the GP prior at x
The exact GP posterior
The exact posterior of f conditioned on y at inputs x is given by
exact_posterior = posterior(fx, y)
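The result is an AbstractGPs posterior GP that can be queried like any other GP. As a small sketch (the test inputs x_test below are made up purely for illustration):
using Statistics
x_test = rand(rng, 10)          # some hypothetical test inputs
mean(exact_posterior(x_test))   # posterior mean at the test inputs
var(exact_posterior(x_test))    # posterior marginal variances at the test inputs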
Constructing a sparse approximation
To construct a sparse approximation to the exact posterior, we first need to select some inducing inputs. In this case, we simply pick a subset of the training data, but more sophisticated schemes for inducing point selection are provided in InducingPoints.jl.
M = 15 # The number of inducing points
z = x[1:M]
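Taking the first M points is fine here because x was itself drawn at random; if the training inputs were ordered, a random subset would be a safer simple choice. A minimal sketch using randperm from the Random standard library (z_rand is just a hypothetical alternative to z above):
z_rand = x[randperm(rng, length(x))[1:M]] # pick M inducing inputs uniformly at random from the training inputs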
The inducing inputs z imply some latent function values u = f(z), sometimes called pseudo-points. The SparseVariationalApproximation specifies a distribution q(u) over the pseudo-points. In the case of GP regression, the optimal form for q(u) is a multivariate Gaussian, which is the only form of q currently supported by this package.
using Distributions, LinearAlgebra
q = MvNormal(zeros(length(z)), I)
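A standard normal q is just a convenient starting point. One common alternative (a sketch only, not something the package requires) is to initialise q at the prior over u, adding a small jitter so the covariance stays positive definite:
q_init = MvNormal(mean(f(z)), Symmetric(cov(f(z)) + 1e-6 * I)) # hypothetical alternative: initialise q at the prior p(u), with jitter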
Finally, we pass our q along with the inputs f(z) to obtain an approximate posterior GP:
fz = f(z, 1e-6) # 'observe' the process at z with some jitter for numerical stability
approx = SparseVariationalApproximation(fz, q) # Instantiate everything needed for the approximation
sva_posterior = posterior(approx) # Create the approximate posterior
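The result behaves like any other posterior GP from AbstractGPs, so it can be queried in the same way as the exact posterior; for example, reusing the hypothetical test inputs from above:
mean(sva_posterior(x_test))   # approximate posterior mean at the test inputs
mean(exact_posterior(x_test)) # exact posterior mean, for comparison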
The Evidence Lower Bound (ELBO)
The approximate posterior constructed above will be a very poor approximation, since q was simply chosen to have zero mean and covariance I. A measure of the quality of the approximation is given by the ELBO. Optimising this term with respect to the parameters of q and the inducing input locations z will improve the approximation.
elbo(SparseVariationalApproximation(fz, q), fx, y)
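In this conjugate regression setting the exact log marginal likelihood is also available, and the ELBO is a lower bound on it, which gives a quick sanity check:
logpdf(fx, y) # exact log marginal likelihood; the ELBO above cannot exceed this value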
A detailed example of how to carry out such optimisation is given in Regression: Sparse Variational Gaussian Process for Stochastic Optimisation with Flux.jl. For an example of non-conjugate inference, see Classification: Sparse Variational Approximation for Non-Conjugate Likelihoods with Optim's L-BFGS.
Available Parametrizations
Two parametrizations of q(u) are presently available: Centered and NonCentered. The Centered parametrization expresses q(u) directly in terms of its mean and covariance. The NonCentered parametrization instead parametrizes the mean and covariance of ε := cholesky(cov(u)).U' \ (u - mean(u)). These parametrizations are also known respectively as "Unwhitened" and "Whitened".
The choice of parametrization can have a substantial impact on the time it takes for ELBO optimisation to converge, and which parametrization is better in a particular situation is not generally obvious. That being said, the NonCentered parametrization often converges in fewer iterations, so it is the default – it is what is used in all of the examples above.
If you require a particular parametrization, simply use the 3-argument version of the approximation constructor:
SparseVariationalApproximation(Centered(), fz, q)
SparseVariationalApproximation(NonCentered(), fz, q)
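Note that q means different things under the two parametrizations: under Centered it is the distribution of u itself, whereas under NonCentered it is the distribution of the whitened variable ε. Passing the same q to both constructors therefore generally yields different approximate posteriors and ELBOs; as a small check, reusing fz, q, fx, and y from above:
elbo(SparseVariationalApproximation(Centered(), fz, q), fx, y)
elbo(SparseVariationalApproximation(NonCentered(), fz, q), fx, y)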
For a general discussion around these two parametrizations, see e.g. [Gorinova]. For a GP-specific discussion, see e.g. section 3.4 of [Paciorek].
- [Gorinova] Gorinova, Maria, Moore, Dave, and Hoffman, Matthew. Automatic Reparameterisation of Probabilistic Programs.
- [Paciorek] Paciorek, Christopher Joseph. Nonstationary Gaussian Processes for Regression and Spatial Modelling. Diss. Carnegie Mellon University, 2003.