Intro

InducingPoints.jl aims to provide an easy way to select inducing point locations for sparse Gaussian processes (GPs), in both online and offline settings. Inducing points are used most prominently in sparse GP regression (see e.g. ApproximateGPs.jl).

Quickstart

InducingPoints.jl provides the following list of algorithms. For details on the specific usage see the algorithms section.

All algorithms are subtypes of AbstractInducingPointsSelection (or AIPSA), and an instance of any of them can be passed to the different APIs.

Offline Inducing Points Selection

These algorithms are designed to compute inducing points for a data set that is likely to remain unchanged. If the data set changes, the algorithms have to be rerun from scratch.

alg = KmeansAlg(10)
Z = inducingpoints(alg, X; kwargs...)

The Offline options are:

  • KmeansAlg: Use the k-means algorithm to select centroids minimizing the squared distance to the data set. Seeding is done via k-means++. Note that the inducing points will generally not be a subset of the data.
  • kDPP: Sample from a k-Determinantal Point Process to select k points. Z will be a subset of X.
  • StdDPP: Sample from a standard Determinantal Point Process. The number of inducing points is not fixed here. Z will be a subset of X.
  • RandomSubset: Sample k points uniformly at random from the data set.
  • Greedy: Select a subset of X that maximizes the ELBO (in a stochastic way).
  • CoverTree: Build a tree to select the optimal nodes covering the data.
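To make the k-means option more concrete, here is a minimal sketch of k-means++ seeding followed by a few Lloyd iterations, written with the Julia standard library only. It is not the package's implementation: the names sqdist, kmeanspp_seed and toy_kmeans are hypothetical, and the data is assumed to be a vector of equally sized vectors. It illustrates why the seeds are data points while the final centroids usually are not.

```julia
using Random

# Squared Euclidean distance between two vectors.
sqdist(a, b) = sum(abs2, a .- b)

# k-means++ seeding (hypothetical helper): the first centre is drawn uniformly,
# each later centre with probability proportional to its squared distance to
# the nearest centre chosen so far.
function kmeanspp_seed(rng::AbstractRNG, X::Vector{<:AbstractVector}, k::Int)
    centres = [copy(X[rand(rng, eachindex(X))])]
    d2 = [sqdist(x, centres[1]) for x in X]
    while length(centres) < k
        # Inverse-CDF sampling from the weights d2.
        r = rand(rng) * sum(d2)
        i = something(findfirst(>=(r), cumsum(d2)), lastindex(X))
        push!(centres, copy(X[i]))
        d2 .= min.(d2, [sqdist(x, centres[end]) for x in X])
    end
    return centres
end

# A few Lloyd iterations (hypothetical helper): centres move to cluster means,
# so they are in general no longer elements of X.
function toy_kmeans(rng::AbstractRNG, X::Vector{<:AbstractVector}, k::Int; iters::Int=10)
    Z = kmeanspp_seed(rng, X, k)
    for _ in 1:iters
        # Assign every point to its nearest centre.
        assign = [argmin([sqdist(x, z) for z in Z]) for x in X]
        for j in 1:k
            members = X[assign .== j]
            # Keep the old centre if a cluster ends up empty.
            isempty(members) || (Z[j] = sum(members) / length(members))
        end
    end
    return Z
end

rng = MersenneTwister(1)
X = [randn(rng, 2) for _ in 1:200]   # 200 synthetic 2-D points
Z = toy_kmeans(rng, X, 10)           # 10 centroids
```

The seeds returned by kmeanspp_seed are copies of data points. The averaging step in toy_kmeans is what moves them off the data set, in contrast to the subset-based options in the list above.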

Online Inducing Points Selection

Online selection algorithms compute an initial set via inducingpoints, as the offline methods do. As the data set subsequently changes, InducingPoints.jl allows for efficient updating via updateZ!.

alg = OIPS()
Z = inducingpoints(alg, x_1; kwargs...)
for x in eachbatch(X)
    updateZ!(Z, alg, x; kwargs...)
end

The Online options are:

  • OnlineIPSelection: A method based on the distance between inducing points and data.
  • UniGrid: A regularly spaced grid whose edges are adapted to the data. Uses the memory-efficient custom type UniformGrid.
  • SeqDPP: Sequential Determinantal Point Process; subsets are regularly sampled from the new data batches, conditioned on the existing inducing points.
  • StreamKmeans: An online version of k-means.
  • Webscale: Another online version of k-means.
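The initialise-then-update pattern can be tried without the package using a toy, distance-based rule. The sketch below is only an illustration under stated assumptions: toy_update! is a hypothetical stand-in and not the library's updateZ!, the radius keyword is invented for the example, and Iterators.partition from Base Julia is just one possible way to produce the batches that eachbatch stands for in the loop above.

```julia
# Squared Euclidean distance between two vectors.
sqdist(a, b) = sum(abs2, a .- b)

# Hypothetical stand-in for an online update: append a point to Z whenever it
# is farther than `radius` from every current inducing point.
function toy_update!(Z::Vector{<:AbstractVector}, batch; radius=1.0)
    for x in batch
        if all(sqdist(x, z) > radius^2 for z in Z)
            push!(Z, copy(x))
        end
    end
    return Z
end

X = [[Float64(i), 0.0] for i in 0:9]   # ten points on a line, spacing 1
batches = Iterators.partition(X, 5)    # two batches of five points each
Z = [copy(X[1])]                       # initial set: the first data point
for batch in batches
    toy_update!(Z, batch; radius=2.5)  # Z grows in place as data arrives
end
```

With this toy rule, Z is mutated in place and only grows when a batch contains points far from the current set, which mirrors the intent of updating rather than recomputing from scratch.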

Index