`FiniteGP` and `AbstractGP`
Intended Audience
This page is best read once you have a sense of what this package is trying to achieve. Therefore, we recommend reading (or at least skimming) some of the examples before reading these docs.
Introduction
AbstractGPs provides the abstract type `AbstractGP`, and the concrete type `FiniteGP`. An `AbstractGP`, `f`, should be thought of as a distribution over functions. This means that the output of `rand(f)` would be a real-valued function. It's not usually possible to implement this, so we don't.
A `FiniteGP` `fx = f(x)` represents the distribution of the function `f` at the finite collection of points specified in `x`. `fx` is a multivariate Normal distribution, so `rand(fx)` produces a `Vector` of `Real`s.
A `FiniteGP` is the interesting object computationally, so if you create a new subtype `MyNewGP` of `AbstractGP`, and wish to make it interact well with the rest of the GP ecosystem, the methods that you must implement are not those directly involving `MyNewGP`, but rather those involving `FiniteGP{<:MyNewGP}`.
We provide two ways to do this. The first is to implement methods directly on `FiniteGP{<:MyNewGP}` – this is detailed in the FiniteGP APIs. The second is to implement some methods directly involving `MyNewGP`, and utilise default `FiniteGP` methods implemented in terms of these – this is detailed in the Internal AbstractGPs API. For example, the first approach involves implementing methods like `AbstractGPs.mean(fx::FiniteGP{<:MyNewGP})`, while the second involves `AbstractGPs.mean(f::MyNewGP, x::AbstractVector)`.
The second interface is generally easier to implement, but isn't always the best choice. See Which API should I implement? for further discussion.
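To make the distinction concrete, here is a minimal sketch of what each route looks like. The type `MyNewGP` is a hypothetical placeholder; a real implementation would also need the remaining methods of whichever API it targets.

```julia
using AbstractGPs, Statistics

# A hypothetical GP type, used purely for illustration.
struct MyNewGP <: AbstractGPs.AbstractGP end

# Route 1 (FiniteGP APIs): implement methods on FiniteGP{<:MyNewGP} directly.
Statistics.mean(fx::AbstractGPs.FiniteGP{<:MyNewGP}) = zeros(length(fx.x))

# Route 2 (Internal AbstractGPs API): implement methods on MyNewGP itself;
# the default FiniteGP methods are then derived from these.
Statistics.mean(f::MyNewGP, x::AbstractVector) = zeros(length(x))
```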
FiniteGP APIs
Let `f` be an `AbstractGP`, `x` an `AbstractVector` representing a collection of inputs, and `Σ` a positive-definite matrix of size `(length(x), length(x))`. A `FiniteGP` represents the multivariate Gaussian induced by "indexing" into `f` at each point in `x`, and adding independent zero-mean noise with covariance matrix `Σ`:
```julia
fx = f(x, Σ)

# The code below is equivalent to the above, and is just for reference.
# When writing code, prefer the above syntax.
fx = AbstractGPs.FiniteGP(f, x, Σ)
```
The `FiniteGP` has two API levels. The Primary Public API should be supported by all `FiniteGP`s, while the Secondary Public API will only be supported by a subset. Use only the primary API when possible.
Primary Public API
These are user-facing methods. You can expect them to be implemented whenever you encounter a `FiniteGP`. If you are building something on top of AbstractGPs, try to implement it in terms of these functions.
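For orientation, here is a minimal sketch of the primary API in action; the kernel, inputs, and noise variance below are arbitrary choices for illustration.

```julia
using AbstractGPs, Random

f = GP(SqExponentialKernel())     # a prior AbstractGP
x = randn(10)
fx = f(x, 0.1)                    # FiniteGP with observation-noise variance 0.1

y = rand(MersenneTwister(1), fx)  # sample from the marginals
logpdf(fx, y)                     # log marginal likelihood of y
ms = marginals(fx)                # Vector of Normal marginal distributions
f_post = posterior(fx, y)         # AbstractGP conditioned on the observations
```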
Required Methods
Base.rand — Function

```julia
rand(rng::AbstractRNG, f::FiniteGP, N::Int=1)
```

Obtain `N` independent samples from the marginals `f` using `rng`. Single-sample methods produce a `length(f)` vector. Multi-sample methods produce a `length(f)` × `N` `Matrix`.
```julia
julia> f = GP(Matern32Kernel());

julia> x = randn(11);

julia> rand(f(x)) isa Vector{Float64}
true

julia> rand(MersenneTwister(123456), f(x)) isa Vector{Float64}
true

julia> rand(f(x), 3) isa Matrix{Float64}
true

julia> rand(MersenneTwister(123456), f(x), 3) isa Matrix{Float64}
true
```
Random.rand! — Function

```julia
rand!(rng::AbstractRNG, f::FiniteGP, y::AbstractVecOrMat{<:Real})
```

Obtain sample(s) from the marginals `f` using `rng` and write them to `y`.

If `y` is a matrix, then each column corresponds to an independent sample.

```julia
julia> f = GP(Matern32Kernel());

julia> x = randn(11);

julia> y = similar(x);

julia> rand!(f(x), y);

julia> rand!(MersenneTwister(123456), f(x), y);

julia> ys = similar(x, length(x), 3);

julia> rand!(f(x), ys);

julia> rand!(MersenneTwister(123456), f(x), ys);
```
AbstractGPs.marginals — Function

```julia
marginals(f::FiniteGP)
```

Compute a vector of Normal distributions representing the marginals of `f` efficiently. In particular, the off-diagonal elements of `cov(f(x))` are never computed.

```julia
julia> f = GP(Matern32Kernel());

julia> x = randn(11);

julia> fs = marginals(f(x));

julia> mean.(fs) == mean(f(x))
true

julia> std.(fs) == sqrt.(diag(cov(f(x))))
true
```
Distributions.logpdf — Method

```julia
logpdf(f::FiniteGP, y::AbstractVecOrMat{<:Real})
```

The logpdf of `y` under `f` if `y isa AbstractVector`. The logpdf of each column of `y` if `y isa Matrix`.

```julia
julia> f = GP(Matern32Kernel());

julia> x = randn(11);

julia> y = rand(f(x));

julia> logpdf(f(x), y) isa Real
true

julia> Y = rand(f(x), 3);

julia> logpdf(f(x), Y) isa AbstractVector{<:Real}
true
```
AbstractGPs.posterior — Method

```julia
posterior(fx::FiniteGP, y::AbstractVector{<:Real})
```

Construct the posterior distribution over `fx.f` given observations `y` at `fx.x` made under noise `fx.Σy`. This is another `AbstractGP` object. See chapter 2 of Rasmussen and Williams, "Gaussian Processes for Machine Learning" (2006), for a recap on exact inference in GPs. This posterior process has mean function

```julia
m_posterior(x) = m(x) + k(x, fx.x) * inv(cov(fx)) * (y - mean(fx))
```

and kernel

```julia
k_posterior(x, z) = k(x, z) - k(x, fx.x) * inv(cov(fx)) * k(fx.x, z)
```

where `m` and `k` are the mean function and kernel of `fx.f`, respectively.
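As a quick illustration (a minimal sketch; the kernel, inputs, and noise level are arbitrary):

```julia
using AbstractGPs

f = GP(Matern52Kernel())
x = randn(5)
fx = f(x, 0.1)             # prior FiniteGP under a noisy observation model
y = rand(fx)

f_post = posterior(fx, y)  # f conditioned on y; itself an AbstractGP
mean(f_post(x))            # posterior mean at the training inputs
var(f_post(x))             # posterior marginal variances, shrunk relative to the prior
```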
Statistics.mean — Method

```julia
mean(fx::FiniteGP)
```

Compute the mean vector of `fx`.

```julia
julia> f = GP(Matern52Kernel());

julia> x = randn(11);

julia> mean(f(x)) == zeros(11)
true
```
Statistics.var — Method

```julia
var(f::FiniteGP)
```

Compute only the diagonal elements of `cov(f)`.

Examples

```julia
julia> fx = GP(Matern52Kernel())(randn(10), 0.1);

julia> var(fx) == diag(cov(fx))
true
```
Optional Methods
Default implementations are provided for these, but you may wish to specialise for performance.
StatsBase.mean_and_var — Method

```julia
mean_and_var(f::FiniteGP)
```

Compute both `mean(f)` and the diagonal elements of `cov(f)`.

Sometimes more efficient than computing them separately, particularly for posteriors.

Examples

```julia
julia> fx = GP(SqExponentialKernel())(range(-3.0, 3.0; length=10), 0.1);

julia> mean_and_var(fx) == (mean(fx), var(fx))
true
```
Secondary Public API
Observe that the Primary Public API does not include a function to compute the covariance matrix of a `FiniteGP`. While the covariance matrix of any multivariate Gaussian is defined, it is not always a good idea to actually compute it. Fortunately, it's often the case that you're not actually interested in the covariance matrix per se, but rather in the other quantities that you might use it to compute (`logpdf`, `rand`, `posterior`). This is similar to the well-known observation that you rarely need the inverse of a matrix itself, just the inverse multiplied by something, so it's considered good practice to avoid ever explicitly computing the inverse and the numerical issues associated with it. This matters in practice: TemporalGPs.jl, for example, is able to implement all of the Primary Public API in time linear in the dimension of the `FiniteGP`, precisely because it never needs to evaluate the covariance matrix.
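The same principle, sketched in plain linear algebra (not specific to this package): solve, don't invert.

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]   # a positive-definite matrix
b = [1.0, 2.0]

x_solve  = A \ b          # factorise-and-solve: stable, never forms inv(A)
x_invert = inv(A) * b     # forms the inverse explicitly: slower, less accurate
x_solve ≈ x_invert        # true, but prefer the first form
```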
However, for many (probably the majority of) GPs, this acceleration isn't possible, and there is really nothing lost by explicitly evaluating the covariance matrix. The functionality that requires it forms the Secondary Public API: it's available a large proportion of the time, but should be avoided if at all possible.
Required Methods
Statistics.cov — Method

```julia
cov(fx::FiniteGP)
```

Compute the covariance matrix of `fx`.

Noise-free observations

```julia
julia> f = GP(Matern52Kernel());

julia> x = randn(11);

julia> cov(f(x)) == kernelmatrix(Matern52Kernel(), x)
true
```

Isotropic observation noise

```julia
julia> cov(f(x, 0.1)) == kernelmatrix(Matern52Kernel(), x) + 0.1 * I
true
```

Independent anisotropic observation noise

```julia
julia> s = rand(11);

julia> cov(f(x, s)) == kernelmatrix(Matern52Kernel(), x) + Diagonal(s)
true
```

Correlated observation noise

```julia
julia> A = randn(11, 11); S = A'A;

julia> cov(f(x, S)) == kernelmatrix(Matern52Kernel(), x) + S
true
```
Optional Methods
Default implementations are provided for these, but you may wish to specialise for performance.
StatsBase.mean_and_cov — Method

```julia
mean_and_cov(f::FiniteGP)
```

Equivalent to `(mean(f), cov(f))`, but sometimes more efficient to compute them jointly than separately.

```julia
julia> fx = GP(SqExponentialKernel())(range(-3.0, 3.0; length=10), 0.1);

julia> mean_and_cov(fx) == (mean(fx), cov(fx))
true
```
Internal AbstractGPs API
This functionality is not intended to be used directly by users, or by those building functionality on top of this package – they should interact with the Primary Public API. If you believe you really do need to interact with this level of the API, please open an issue to discuss, as you may have a use-case that was missed during the design of this API.
As discussed at the top of this page, instances of subtypes of `AbstractGP` represent Gaussian processes – collections of jointly-Gaussian random variables, which may be infinite-dimensional.

Implementing the following API for your own `AbstractGP` subtype automatically provides both the Primary and Secondary Public APIs above, implemented in terms of these methods.
Existing implementations of this interface include the package's own `GP` type and the posterior GPs returned by `posterior`.
Required Methods
Statistics.mean — Method

```julia
mean(f::AbstractGP, x::AbstractVector)
```

Compute the mean vector of the multivariate Normal `f(x)`.
Statistics.cov — Method

```julia
cov(f::AbstractGP, x::AbstractVector, y::AbstractVector)
```

Compute the `length(x)` by `length(y)` cross-covariance matrix between `f(x)` and `f(y)`.
Statistics.var — Method

```julia
var(f::AbstractGP, x::AbstractVector)
```

Compute only the diagonal elements of `cov(f(x))`.
Optional Methods
Default implementations are provided for these, but you may wish to specialise for performance.
Statistics.cov — Method

```julia
cov(f::AbstractGP, x::AbstractVector)
```

Compute the `length(x)` by `length(x)` covariance matrix of the multivariate Normal `f(x)`.
StatsBase.mean_and_cov — Method

```julia
mean_and_cov(f::AbstractGP, x::AbstractVector)
```

Compute both `mean(f(x))` and `cov(f(x))`. Sometimes more efficient than computing them separately, particularly for posteriors.
StatsBase.mean_and_var — Method

```julia
mean_and_var(f::AbstractGP, x::AbstractVector)
```

Compute both `mean(f(x))` and the diagonal elements of `cov(f(x))`. Sometimes more efficient than computing them separately, particularly for posteriors.
Note that, while we could provide a default implementation of `var(f, x)` as `diag(cov(f, x))`, this is generally such an inefficient fallback that we find it preferable to error when it's not implemented than to ever hit the fallback.
Which API should I implement?
To answer this question, you need to know whether or not the default implementations of the FiniteGP APIs work for your use case. There are a couple of reasons we are aware of why this might not be the case (see below) – possibly there are others.
If you are unsure, please open an issue to discuss.
You want to avoid computing the covariance matrix
We've already discussed this a bit on this page. The default implementations of the FiniteGP APIs rely on computing the covariance matrix. If your `AbstractGP` subtype needs to avoid computing the covariance matrix for performance reasons, then do not implement the Internal AbstractGPs API. Do implement the Primary Public API. Do not implement the Secondary Public API.
TemporalGPs.jl is an example of a package that does this – see the `LTISDE` implementation for an example. The same is true of the `BayesianLinearRegressor` type.
You don't want to use the default implementations
Perhaps you just don't like the default implementations because you don't want to make use of Cholesky factorisations. We don't yet have an example of this in Julia; GPyTorch, however, avoids the Cholesky factorisation in favour of iterative solvers.
In this situation, implement both the Internal AbstractGPs API and the FiniteGP APIs.
In this situation you will benefit less from code reuse inside AbstractGPs, but will continue to benefit from the ability of others to use your code, and to take advantage of any existing functionality that requires types adhering to the AbstractGPs API.
Mean functions
We define an API for prior mean functions with the abstract type `MeanFunction` and the `mean_vector` function.
AbstractGPs.MeanFunction — Type

```julia
abstract type MeanFunction end
```

`MeanFunction` introduces an API for treating the prior mean function appropriately. At the abstract level, all `MeanFunction`s are functions. However, we generally want to evaluate them on a collection of inputs. To this end, we provide the `mean_vector(::MeanFunction, ::AbstractVector)` function, which is equivalent to `map` but allows optimisations (for `ZeroMean` and `ConstMean`, for example).
AbstractGPs.mean_vector — Function

```julia
mean_vector(m::MeanFunction, x::AbstractVector)::AbstractVector{<:Real}
```

`mean_vector` is the function to call to apply a `MeanFunction` to a collection of inputs.
We provide standard mean functions like `ZeroMean` and `ConstMean`, as well as `CustomMean` to simply wrap a function.
AbstractGPs.ZeroMean — Type

```julia
ZeroMean{T<:Real} <: MeanFunction
```

Returns `zero(T)` everywhere; `T` is `Float64` by default.
AbstractGPs.ConstMean — Type

```julia
ConstMean{T<:Real} <: MeanFunction
```

Returns `c` everywhere.
AbstractGPs.CustomMean — Type

```julia
CustomMean{Tf} <: MeanFunction
```

A wrapper around whatever unary function you fancy. Must be able to be mapped over an `AbstractVector` of inputs.
Warning

`CustomMean` is generally sufficient for testing purposes, but care should be taken if attempting to differentiate through `mean_vector` with a `CustomMean` when using Zygote.jl. In particular, `mean_vector(m::CustomMean, x)` is implemented as `map(m.f, x)`, which will not differentiate correctly when `x` is a `ColVecs` or `RowVecs`.
In such cases, you should implement `mean_vector` directly for your custom mean. For example, if `f(x) = sum(x)`, you might implement `mean_vector` as

```julia
mean_vector(::CustomMean{typeof(f)}, x::ColVecs) = vec(sum(x.X; dims=1))
mean_vector(::CustomMean{typeof(f)}, x::RowVecs) = vec(sum(x.X; dims=2))
```
which avoids ever applying `map` to a `ColVecs` or `RowVecs`.
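For example (a minimal sketch; the mean-function types live in the AbstractGPs module and may need to be qualified, as below):

```julia
using AbstractGPs

x = randn(5)

AbstractGPs.mean_vector(AbstractGPs.ZeroMean(), x) == zeros(5)         # true
AbstractGPs.mean_vector(AbstractGPs.ConstMean(2.0), x) == fill(2.0, 5) # true
AbstractGPs.mean_vector(AbstractGPs.CustomMean(sin), x) == sin.(x)     # true

# A GP with a non-zero prior mean function:
f = GP(AbstractGPs.CustomMean(sin), Matern32Kernel())
mean(f(x)) == sin.(x)  # true
```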
Testing Utilities
AbstractGPs provides several consistency tests in the `AbstractGPs.TestUtils` module. These tests ensure that, for example, the size and type of everything produced by an implementation of the API is correct and consistent with the other methods. They do not ensure correctness in any absolute sense (e.g. that `logpdf` indeed computes what you wanted it to compute). Consequently, these tests should be seen as a set of necessary conditions for your code to be correct; they are not, however, sufficient.
You should only need to run one of the following test suites.
- If you implement the Primary Public API, run `test_finitegp_primary_public_interface`.
- If you implement both the Primary Public API and the Secondary Public API, run `test_finitegp_primary_and_secondary_public_interface`.
- If you implement the Internal AbstractGPs API, run `test_internal_abstractgps_interface`.
Also see Which API should I implement? for more information about the most appropriate API to implement.
AbstractGPs.TestUtils.test_finitegp_primary_public_interface — Function

```julia
test_finitegp_primary_public_interface(
    rng::AbstractRNG, fx::FiniteGP; atol::Real=1e-12
)
```

Basic consistency tests for the Primary Public FiniteGP API. You should run these tests if you only implement the Primary Public API – see the API section of the docs for details about this API.

These are consistency checks, not correctness tests in the absolute sense. For example, these tests ensure that samples generated by `rand` are of the correct size, but do not check that they come from the intended distribution.
AbstractGPs.TestUtils.test_finitegp_primary_and_secondary_public_interface — Function

```julia
test_finitegp_primary_and_secondary_public_interface(
    rng::AbstractRNG, fx::FiniteGP; atol::Real=1e-12
)
```

Basic consistency tests for both the Primary and Secondary Public FiniteGP APIs. Runs `test_finitegp_primary_public_interface` as part of these tests. You should run these tests if you implement both the primary and secondary public APIs – see the API section of the docs for more information about these APIs.

These are consistency checks, not correctness tests in the absolute sense. For example, these tests ensure that samples generated by `rand` are of the correct size, but do not check that they come from the intended distribution.
AbstractGPs.TestUtils.test_internal_abstractgps_interface — Function

```julia
test_internal_abstractgps_interface(
    rng::AbstractRNG,
    f::AbstractGP,
    x::AbstractVector,
    z::AbstractVector;
    atol=1e-12,
    σ²::Real=1e-1,
    jitter::Real=AbstractGPs.default_σ²,
)
```

Basic consistency tests for the Internal AbstractGPs API. Runs `test_finitegp_primary_and_secondary_public_interface` as part of these tests. Run these tests if you implement the Internal AbstractGPs API – see the API section of the docs for more information about this API.

These are consistency checks, not correctness tests in the absolute sense. For example, these tests ensure that samples generated by `rand` are of the correct size, but do not check that they come from the intended distribution.
kwargs

- atol: the minimum eigenvalue of `cov(f, x)` must be greater than this.
- σ²: the observation noise under which things are observed.
- jitter: constant added to the diagonal of `cov(f, x)` and `cov(f, z)` before inversion.
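For instance, to run the internal-API test suite against a type implementing it (a minimal sketch; `GP` is used here, but your own `AbstractGP` subtype would be the usual target):

```julia
using AbstractGPs, Random

rng = MersenneTwister(123456)
f = GP(Matern32Kernel())  # any AbstractGP implementing the internal API
x = randn(rng, 5)
z = randn(rng, 3)

AbstractGPs.TestUtils.test_internal_abstractgps_interface(rng, f, x, z)
```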