Available Algorithms
The algorithms available through InducingPoints.jl can be divided into offline and online algorithms. While all of them can be used to create a one-off set of inducing points, the online algorithms are additionally designed to allow cheap updates as new data arrives.
We start with a set of N data points of dimension D, which we would like to reduce to only M < N points.
using InducingPoints
using KernelFunctions

D = 2
N = 50
M = 10
x = [rand(D) .* [0.8, 1.0] for _ in 1:N] # N points in [0, 0.8] × [0, 1]
Offline Algorithms
KmeansAlg
Uses the k-means algorithm to select centroids that minimize the squared distance to the dataset. The seeding is done via k-means++. Note that the resulting inducing points are generally not a subset of the data.
alg = KmeansAlg(M)
Z = inducingpoints(alg, x)
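A quick sanity check (not part of the InducingPoints.jl API) makes this non-subset property visible:

length(Z) == M      # M centroids were computed
any(z -> z in x, Z) # typically false: centroids need not coincide with data points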
kDPP
Samples from a k-Determinantal Point Process to select k points. Z will be a subset of X. Requires a kernel from KernelFunctions.jl.
kernel = SqExponentialKernel()
alg = kDPP(M)
Z = inducingpoints(alg, x; kernel)
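Since kDPP selects among the data points themselves, the subset property can be verified directly (again just a sanity check, not part of the API):

all(z -> z in x, Z) # should hold, as Z is drawn from the data points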
StdDPP
Samples from a standard Determinantal Point Process. The number of inducing points is not fixed here. Z will be a subset of X. Requires a kernel from KernelFunctions.jl.
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = StdDPP()
Z = inducingpoints(alg, x; kernel)
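Because the sample size is itself random, the number of inducing points will vary from run to run:

length(Z) # varies between runs; not fixed to M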
RandomSubset
Samples k points uniformly at random from the data set.
alg = RandomSubset(M)
Z = inducingpoints(alg, x)
Greedy
This algorithm selects a subset of X which maximizes the ELBO (Evidence Lower BOund); the maximization is done stochastically via minibatches of size s. It also requires passing the output data, the kernel, and the noise level as additional arguments to inducingpoints.
y = rand(N) # output data
s = 5       # minibatch size
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
noise = 0.1 # observation noise level
alg = Greedy(M, s)
Z = inducingpoints(alg, x; y, kernel, noise)
GreedyVarSelection
Chooses the subset of x which minimises sum(d), where d := diag(C_f - C_{fu} C_{uu}^{-1} C_{uf}). It stops when either M pseudo-inputs have been chosen or maximum(d) < maximum(diag(C_f)) * tol. See Stable and Efficient Gaussian Process Calculations and Convergence of Sparse Variational Inference in Gaussian Processes Regression for a discussion of this criterion's use in the sparse GP context. In this example, a maximum of 50 inducing points can be used, but the algorithm chooses fewer than this because the tolerance is satisfied before all 50 are selected.
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = GreedyVarSelection(50, 1e-6)
Z = inducingpoints(alg, x; kernel=kernel)
@show length(Z)
length(Z) = 49
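The stopping criterion can also be recomputed by hand. The following sketch uses plain linear algebra on the Z, x, and kernel from above; it is not part of the InducingPoints.jl API, and the small jitter (a common stabilization trick, not used verbatim by the package) may make the result differ marginally from the algorithm's internal values.

using LinearAlgebra
C_f = kernelmatrix(kernel, x)                # prior covariance of the data
C_uf = kernelmatrix(kernel, Z, x)            # cross-covariance between inducing points and data
C_uu = kernelmatrix(kernel, Z) + 1e-12 * I   # covariance of the inducing points (jittered)
d = diag(C_f) .- diag(C_uf' * (C_uu \ C_uf)) # residual variances
maximum(d) < maximum(diag(C_f)) * 1e-6       # should be true: this is what stopped the algorithm at 49 points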
CoverTree
The CoverTree algorithm is a recent algorithm presented in Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees. It relies on building a cover tree whose nodes represent the inducing points.
alg = CoverTree(0.2) # 0.2 is the spatial resolution of the tree
Z = inducingpoints(alg, x)
Online Algorithms
These algorithms are useful when additional data points arrive that we would like to incorporate into an existing set of inducing points.
N₂ = 25
x₂ = [rand(D) .* [0.2, 1.0] + [0.8, 0.0] for _ in 1:N₂]
We can then update the initial set of inducing points Z via updateZ (or in place via updateZ!).
OnlineIPSelection
A method based on the distance between inducing points and data. This algorithm has several parameters to tune the result. It also requires the kernel to be passed as a keyword argument to inducingpoints and updateZ.
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = OIPS()
Z = inducingpoints(alg, x; kernel)
Z₂ = updateZ(Z, alg, x₂; kernel)
UniGrid
A regularly-spaced grid whose edges are adapted given the data. The inducing points Z are returned as the custom type UniformGrid (see below).
alg = UniGrid(5) # 5 grid points per dimension
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)
UniformGrid
When using the UniGrid algorithm, InducingPoints.jl provides the memory-efficient custom type UniformGrid, which is essentially a wrapper around an Iterators.product. It functions in many ways like an AbstractVector, but does not explicitly store all elements of the grid. Therefore, as the two-dimensional example below shows, the object size depends only on the dimension, not on the number of grid points.
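For example (a rough demonstration; the exact byte counts are implementation details and may vary):

Z_coarse = inducingpoints(UniGrid(5), x)    # 5 × 5 grid
Z_fine = inducingpoints(UniGrid(1000), x)   # 1000 × 1000 grid
Base.summarysize(Z_coarse) # a few hundred bytes
Base.summarysize(Z_fine)   # roughly the same, despite representing a 1000 × 1000 grid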
It is optimized to be very efficient with the kernelmatrix function provided by KernelFunctions.jl. However, compared to an explicitly stored Vector of grid points, it incurs additional overhead when used with other vector operations (illustrated below for the example of a broadcast sum).
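A minimal sketch of both points (timings omitted; the comments describe the expected behavior, assuming collect materializes the grid into an ordinary Vector):

Zgrid = inducingpoints(UniGrid(20), x) # UniformGrid with 20 points per dimension
kernel = SqExponentialKernel()
K = kernelmatrix(kernel, Zgrid) # efficient: UniformGrid is optimized for this call
sum.(Zgrid)          # behaves like a vector of points, but with some overhead
sum.(collect(Zgrid)) # the same broadcast on an explicit Vector is faster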
SeqDPP
Sequential Determinantal Point Processes: a subset is sampled from each new data batch, conditioned on the existing inducing points.
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = SeqDPP()
Z = inducingpoints(alg, x; kernel)
Z₂ = updateZ(Z, alg, x₂; kernel)
StreamKmeans
An online version of k-means.
alg = StreamKmeans(M)
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)
Webscale
Another online version of k-means.
alg = Webscale(M)
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)