Available Algorithms

The algorithms available through InducingPoints.jl can be split into offline and online algorithms. While all of them can be used to create a one-off set of inducing points, the online algorithms are designed to allow for cheap updates when new data points arrive.

We start with a set of N data points of dimension D, which we would like to reduce to only M < N points.

using InducingPoints
using KernelFunctions

D = 2  # dimension of the data points
N = 50 # number of data points
M = 10 # number of inducing points
x = [rand(D) .* [0.8, 1.0] for _ in 1:N]

Offline Algorithms

KmeansAlg

Uses the k-means algorithm to find M centroids that minimize the squared distance to the data points, with the seeding done via k-means++. Note that the resulting inducing points are generally not a subset of the data.

alg = KmeansAlg(M)
Z = inducingpoints(alg, x)
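
Because the centroids are optimized rather than selected from x, they generally do not coincide with any data point; a quick illustrative check:

any(in(x), Z) # typically false: centroids are new points, not data points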

k-means plot

kDPP

Samples a set of M points from a k-Determinantal Point Process (k-DPP). Z will be a subset of x. Requires a kernel from KernelFunctions.jl.

kernel = SqExponentialKernel()
alg = kDPP(M)
Z = inducingpoints(alg, x; kernel)
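
Since the k-DPP samples from the data itself, every element of Z can be found among the original points:

all(in(x), Z) # true: the inducing points are a subset of the data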

k-DPP plot

StdDPP

Samples from a standard Determinantal Point Process. The number of inducing points is not fixed here. Z will be a subset of x. Requires a kernel from KernelFunctions.jl.

kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = StdDPP()
Z = inducingpoints(alg, x; kernel)
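
Because the number of sampled points is itself random, two independent draws will generally differ in size:

length(Z), length(inducingpoints(alg, x; kernel)) # typically two different sizes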

Standard DPP plot

RandomSubset

Samples M points uniformly at random from the data set.

alg = RandomSubset(M)
Z = inducingpoints(alg, x)

Random subset plot

Greedy

This algorithm selects a subset of x that maximizes the ELBO (Evidence Lower BOund), in a stochastic fashion via minibatches of size s. It also requires passing the output data y, the kernel, and the noise level as additional keyword arguments to inducingpoints.

y = rand(N)
s = 5
kernel = with_lengthscale(SqExponentialKernel(), 0.2)
noise = 0.1
alg = Greedy(M, s)
Z = inducingpoints(alg, x; y, kernel, noise)

Greedy algorithm plot

GreedyVarSelection

Chooses the subset of x which minimizes sum(d), where d := diag(C_f - C_{fu} C_{uu}^{-1} C_{uf}). The algorithm stops when either M pseudo-inputs have been chosen or maximum(d) < maximum(diag(C_f)) * tol. See Stable and Efficient Gaussian Process Calculations and Convergence of Sparse Variational Inference in Gaussian Processes Regression for a discussion of this criterion's use in the sparse GP context. In this example, up to 50 inducing points may be used, but the algorithm settles on fewer because the tolerance is satisfied before all 50 are chosen.

kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = GreedyVarSelection(50, 1e-6)
Z = inducingpoints(alg, x; kernel=kernel)
@show length(Z)
length(Z) = 49
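
For intuition, the quantities entering this criterion can be spelled out with KernelFunctions.jl. This is a minimal sketch of the math, not the package's internal implementation:

using LinearAlgebra: diag

Cf = kernelmatrix(kernel, x)      # prior covariance of the data
Cuf = kernelmatrix(kernel, Z, x)  # cross-covariance between inducing points and data
Cuu = kernelmatrix(kernel, Z)     # covariance of the inducing points
d = diag(Cf - Cuf' * (Cuu \ Cuf)) # residual variances that the algorithm greedily minimizes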

Greedy algorithm plot

CoverTree

The CoverTree algorithm is a recent algorithm presented in Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees. It relies on building a cover tree whose nodes represent the inducing points.

alg = CoverTree(0.2)
Z = inducingpoints(alg, x)

CoverTree algorithm plot

Online Algorithms

These algorithms are useful if we assume that we will have another set of data points that we would like to incorporate into an existing inducing point set.

N₂ = 25
# New points in the region x₁ ∈ [0.8, 1.0], which the initial data set did not cover:
x₂ = [rand(D) .* [0.2, 1.0] + [0.8, 0.0] for _ in 1:N₂]

We can then update the initial set of inducing points Z via updateZ (or in place via updateZ!).
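
As a minimal concrete sketch of the in-place variant (using StreamKmeans from below, and assuming its returned set is a mutable vector):

alg = StreamKmeans(M)
Z = inducingpoints(alg, x)
updateZ!(Z, alg, x₂) # Z is modified in place; no new set is returned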

OnlineIPSelection

A method based on the distance between inducing points and data points. This algorithm has several parameters to tune the result. It also requires the kernel to be passed as a keyword argument to inducingpoints and updateZ.

kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = OIPS()
Z = inducingpoints(alg, x; kernel)
Z₂ = updateZ(Z, alg, x₂; kernel)

Online inducing point selection plot

UniGrid

A regularly-spaced grid whose edges are adapted to the data. The inducing points Z are returned as the custom type UniformGrid (see below).

alg = UniGrid(5)
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)
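
We can confirm that the returned object is the lazy grid type rather than a plain vector:

typeof(Z) # a UniformGrid, not a Vector (see the next section)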

Unigrid plot

UniformGrid

When using the UniGrid algorithm, InducingPoints.jl provides the memory-efficient custom type UniformGrid, which is essentially a wrapper around an Iterators.product. It behaves in many ways like an AbstractVector, but does not explicitly store all elements of the grid. As a result, the object size depends only on the dimension, not on the number of grid points.
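
A minimal sketch of this property, reusing x from above and assuming that the argument of UniGrid sets the number of grid points per dimension:

Zc = inducingpoints(UniGrid(5), x)   # coarse grid
Zf = inducingpoints(UniGrid(100), x) # much finer grid
Base.summarysize(Zc), Base.summarysize(Zf) # comparable: independent of the resolution
Base.summarysize(collect(Zf)) # much larger: every grid point is stored explicitly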

It is optimized to be very efficient with the kernelmatrix function provided by KernelFunctions.jl. However, compared to an explicitly stored Vector of grid points, it incurs additional overhead for other vector operations (illustrated below for broadcasting sum).
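
A sketch of the trade-off (illustrative only; for real timings use a benchmarking tool such as BenchmarkTools.jl):

kernel = SqExponentialKernel()
Zg = inducingpoints(UniGrid(20), x) # lazy UniformGrid
Zv = collect(Zg)                    # the same points as an explicitly stored Vector
K = kernelmatrix(kernel, Zg) # the optimized path
sum.(Zg) # works like any vector, but may be slower than sum.(Zv)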

Uniform grid bench plot

SeqDPP

Sequential Determinantal Point Processes: subsets are repeatedly sampled from the new data batches, conditioned on the existing inducing points.

kernel = with_lengthscale(SqExponentialKernel(), 0.2)
alg = SeqDPP()
Z = inducingpoints(alg, x; kernel)
Z₂ = updateZ(Z, alg, x₂; kernel)
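
If the sequential conditioning retains the previously selected points (our reading of the description above), the old set should be contained in the updated one:

all(in(Z₂), Z) # expected to be true: previous inducing points are kept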

Sequential DPP plot

StreamKmeans

An online version of k-means.

alg = StreamKmeans(M)
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)

Stream k-means plot

Webscale

Another online version of k-means.

alg = Webscale(M)
Z = inducingpoints(alg, x)
Z₂ = updateZ(Z, alg, x₂)

Webscale plot