AugmentedGPLikelihoods

Documentation for AugmentedGPLikelihoods.

Gaussian Processes (GPs) are great tools for approximating functions while quantifying uncertainty. Beyond Gaussian regression, GPs can be used as latent functions for many different tasks such as non-Gaussian regression, classification, multi-class classification, or event counting. However, these tasks involve non-conjugate likelihoods, and the GP posterior is then intractable. The typical solution is to approximate the posterior, either by sampling from it or by approximating it with another distribution. However, both of these methods are computationally involved, require gradients, and are not always guaranteed to converge.

The augmentation

An alternative proposed in [1] is to represent these non-conjugate likelihoods as scale-mixtures (sometimes requiring multiple steps) to obtain a conditionally conjugate likelihood. More concretely, some likelihoods can be written as:

\[ p(x) = \int q(x|\omega)d\pi(\omega).\]

One can then get rid of the integral by augmenting the model with the auxiliary variable $\omega$. With the right choice of mixture, one can obtain a likelihood conjugate with a GP $f$ but only when conditioned on $\omega$. This means we go from

\[ p(f|y) = \frac{p(y|f)p(f)}{p(y)},\]

which is intractable, to

\[ p(f,\omega|y) = \frac{p(y,\omega|f)p(f)}{p(y)},\]

where the conditionals $p(f|\omega,y)$ and $p(\omega|f,y)$ are known in closed-form.
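A classic example of such a scale mixture (used here purely for illustration, not as an API detail of this package) is the Bernoulli likelihood with a logistic link and labels $y_i \in \{-1, 1\}$, which admits a Pólya-Gamma representation:

\[ \sigma(y_i f_i) = \frac{e^{y_i f_i / 2}}{2\cosh(f_i/2)} = \frac{1}{2}\int_0^\infty \exp\left(\frac{y_i f_i}{2} - \frac{\omega_i f_i^2}{2}\right) p_{\mathrm{PG}}(\omega_i|1,0)\,d\omega_i.\]

Conditioned on $\omega_i$, the integrand is quadratic in $f_i$ inside the exponential, i.e. Gaussian-like, and therefore conjugate with the GP prior.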

Bayesian Inference

This new formulation leads to easier, more stable, and faster inference. Two algorithms follow naturally from it: blocked Gibbs sampling, and Coordinate Ascent VI (CAVI), where the conditionals are used to compute the optimal variational distributions.

What this package does

Although an automatic way of deriving the augmentation is proposed in [1], most augmentations require some derivations by hand. This package provides the results of these derivations (as well as the derivations themselves) and proposes a unified framework to obtain the distributions of interest. Generally, values of the auxiliary variables (either samples or distributions) need to be carried around. Since each likelihood can have a different structure of auxiliary variables, they are uniformly passed around as $\Omega$, a NamedTuple containing Vectors of either samples or distributions.
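For illustration only (the field names are hypothetical and depend on the likelihood), $\Omega$ might look like this for a single-latent likelihood with one auxiliary variable per data point:

```julia
using Distributions

# Hypothetical shapes for 10 observations: one auxiliary sample per data point
# when sampling, or one auxiliary distribution per data point for variational inference.
Ω  = (; ω = rand(10))                          # samples
qΩ = (; ω = [Gamma(1.0, 1.0) for _ in 1:10])   # distributions
```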

Gibbs Sampling

We give an example in the Gibbs Sampling tutorial using AbstractMCMC, but the API can be reduced to 5 main functions:

AugmentedGPLikelihoods.init_aux_variables (Function)
init_aux_variables([rng::AbstractRNG], ::Likelihood, n::Int) -> TupleVector

Initialize collections of n auxiliary variables in a TupleVector to be used in the context of sampling. n should be the number of data inputs.

See also init_aux_posterior for variational inference.

AugmentedGPLikelihoods.aux_sample (Function)
aux_sample([rng::AbstractRNG], lik::Likelihood, y, f) -> TupleVector

Sample and allocate the auxiliary variables Ω in a TupleVector based on the full-conditional associated with the likelihood.

See aux_sample! for an in-place version.

AugmentedGPLikelihoods.auglik_potential (Function)
auglik_potential(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the potential, $\beta(\Omega,y)$. Note that this is equivalent to the shift of the first natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.

See also expected_auglik_potential for variational inference.

AugmentedGPLikelihoods.auglik_precision (Function)
auglik_precision(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the precision, $\gamma(\Omega,y)$. Note that this is equivalent to the shift of the precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.

See also expected_auglik_precision for variational inference.


First, init_aux_variables initializes a NamedTuple of Vector(s) of auxiliary variables to be modified in-place during sampling. aux_sample! samples the auxiliary variables from the full conditional $p(\Omega|y,f)$, given by aux_full_conditional, and returns the modified NamedTuple. To generate a new NamedTuple every time, see aux_sample. The full conditional of $f$ is given by

\[\begin{align*} p(f|y,\Omega) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right), \end{align*}\]

where $\lambda$ is obtained via auglik_precision and $h$ is obtained via auglik_potential. For likelihoods requiring multiple latent GPs (e.g. multi-class classification or heteroscedastic likelihoods), auglik_potential and auglik_precision return a Tuple containing the $h$ and $\lambda$ Vectors for each latent GP.

As a general rule, the augmented likelihood will have the form

\[ p(y|f,\Omega) \propto \exp\left(h(\Omega,y)f + \frac{\lambda(\Omega,y)}{2}f^2\right),\]

with auglik_potential $\equiv h(\Omega,y)$ while auglik_precision $\equiv \lambda(\Omega,y)$.
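As a rough illustration, here is a minimal sketch of one blocked Gibbs sweep for a single-latent likelihood with a dense kernel matrix `K`, prior mean `μ0`, and data `y`. It assumes the in-place signature `aux_sample!(rng, Ω, lik, y, f)` (the out-of-place `aux_sample` is documented above) and a likelihood supported by the package; adapt the names to your setup.

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra, Random

# One blocked Gibbs sweep: sample Ω | y, f, then f | y, Ω.
# Assumes a single latent GP, so the potential/precision Tuples hold one Vector each.
function gibbs_sweep!(rng::AbstractRNG, lik, Ω, f, y, K, μ0)
    aux_sample!(rng, Ω, lik, y, f)             # Ω ~ p(Ω | y, f) (assumed in-place signature)
    λ = only(auglik_precision(lik, Ω, y))      # λ(Ω, y)
    h = only(auglik_potential(lik, Ω, y))      # h(Ω, y)
    S = inv(Symmetric(inv(K) + Diagonal(λ)))   # S = (K⁻¹ + Diag(λ))⁻¹
    m = S * (h + K \ μ0)                       # m = S (h + K⁻¹ μ0)
    rand!(rng, MvNormal(m, Symmetric(S)), f)   # f ~ p(f | y, Ω)
    return f, Ω
end
```

In practice one would factorize `K` once and avoid forming `inv(K)` explicitly; the Gibbs Sampling tutorial shows a more careful implementation.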

Coordinate Ascent Variational Inference

CAVI updates work exactly the same way as the Gibbs Sampling, except we are now working with posterior distributions instead of samples. We work with the variational family $q(f)\prod_{i=1}^N q(\Omega_i)$, i.e. we make the mean-field assumption that our new auxiliary variables are independent of each other and independent of $f$.

As for Gibbs sampling, there are also 5 main functions:

AugmentedGPLikelihoods.init_aux_posterior (Function)
init_aux_posterior([T::DataType], lik::Likelihood, n::Int) -> AbstractProductMeasure

Initialize collections of n (independent) posteriors for the auxiliary variables in the context of variational inference. n should be the number of data points used at each iteration. The real-valued variational parameters will be given type T (Float64 by default).

See also init_aux_variables for sampling.

AugmentedGPLikelihoods.aux_posterior! (Function)
aux_posterior!(qΩ, lik::Likelihood, y, qf) -> AbstractProductMeasure

Compute the optimal posterior of the auxiliary variables $q^*(\Omega)$ given the marginal distributions qf by updating the variational parameters in-place using the formula

\[ q^*(\Omega) \propto \exp\left(E_{q(f)}\left[\log p(\Omega|f,y)\right]\right)\]

See also aux_posterior and aux_sample!

AugmentedGPLikelihoods.expected_auglik_potential (Function)
expected_auglik_potential(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected potential, $E_{q(\Omega)}[\beta(\Omega,y)]$. Note that this is equivalent to the shift of the first variational natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.

See also auglik_potential for sampling.

AugmentedGPLikelihoods.expected_auglik_precision (Function)
expected_auglik_precision(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected precision, $E_{q(\Omega)}[\gamma(\Omega,y)]$. Note that this is equivalent to the shift of the variational precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.

See also auglik_precision for sampling.


init_aux_posterior initializes the posterior $\prod_{i=1}^N q(\Omega_i)$ as a NamedTuple. aux_posterior! updates the variational posterior distributions in-place given the marginals $q(f_i)$ and returns the modified NamedTuple. To get a new NamedTuple every time, use aux_posterior. Finally, expected_auglik_potential and expected_auglik_precision give us the elements needed to update the variational distribution $q(f)$. As for Gibbs sampling, we have the following optimal variational distribution:

\[\begin{align*} q^*(f) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right) \end{align*}\]

where $\lambda$ is given by expected_auglik_precision and $h$ is given by expected_auglik_potential. Note that if you work with Sparse GPs, the updates should be:

\[\begin{align*} S =& \left(K_{Z}^{-1} + \kappa\operatorname{Diagonal}(\lambda)\kappa^\top\right)^{-1},\\ m =& S \left(\kappa h + K_Z^{-1}\mu_0(Z)\right), \end{align*}\]

where $\kappa=K_{Z}^{-1}K_{Z,X}$.
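As a sketch of the dense (non-sparse) case, one CAVI iteration could look as follows. It assumes the in-place signature `aux_posterior!(qΩ, lik, y, qf)` documented above, that the marginals of $q(f)$ can be passed as a vector of Normal distributions, and a single latent GP:

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra

# One CAVI iteration: update q*(Ω) given the current marginals of q(f),
# then update q(f) = N(m, S) from the expected potential and precision.
function cavi_step(lik, qΩ, y, m, S, K, μ0)
    qf = Normal.(m, sqrt.(diag(S)))                    # marginals q(f_i)
    aux_posterior!(qΩ, lik, y, qf)                     # optimal q*(Ω)
    λ = only(expected_auglik_precision(lik, qΩ, y))    # E_q(Ω)[λ(Ω, y)]
    h = only(expected_auglik_potential(lik, qΩ, y))    # E_q(Ω)[h(Ω, y)]
    S = inv(Symmetric(inv(K) + Diagonal(λ)))           # new covariance of q(f)
    m = S * (h + K \ μ0)                               # new mean of q(f)
    return m, S, qΩ
end
```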

Likelihood and ELBO computations

You might be interested in computing the ELBO, i.e. the lower bound on the evidence of the augmented model. The ELBO can be written as:

\[\begin{align*} \mathcal{L}(q(f,\Omega)) =& E_{q(f)q(\Omega)}\left[\log p(y,\Omega|f)\right] + \operatorname{H}\left[q(\Omega)\right] - \operatorname{KL}\left(q(f)||p(f)\right)\\ =& E_{q(f)q(\Omega)}\left[\log p(y|\Omega,f)\right] - \operatorname{KL}\left(q(\Omega)||p(\Omega)\right) -\operatorname{KL}\left(q(f)||p(f)\right) \end{align*}\]

To work with all these terms, AugmentedGPLikelihoods.jl provides a series of helper functions:

AugmentedGPLikelihoods.aug_loglik (Function)
aug_loglik(lik::Likelihood, Ω, y, f) -> Real

Return the augmented log-likelihood with the given parameters. The augmented log-likelihood is of the form

\[ \log p(y,\Omega|f) = \log l(y,\Omega,f) + \log p(\Omega|y, f).\]

To only obtain the $p(\Omega|y, f)$ part see aux_prior and see logtilt for $\log l(y, \Omega, f)$.

A generic fallback exists based on logtilt and aux_prior but specialized implementations are encouraged.

AugmentedGPLikelihoods.expected_aug_loglik (Function)
expected_aug_loglik(lik::Likelihood, qΩ, y, qf) -> Real

Return the expected augmented log-likelihood with the given parameters. The expected augmented log-likelihood is of the form

\[ E_{q(\Omega,f)}\left[\log p(y,\Omega|f)\right]\]

To only obtain the $p(\Omega|y)$ part see aux_prior and see logtilt for $\log l(y, \Omega, f)$.

A generic fallback exists based on expected_logtilt and aux_kldivergence but specialized implementations are encouraged.

AugmentedGPLikelihoods.aux_kldivergence (Function)
aux_kldivergence(lik::Likelihood, qΩ::For, pΩ::For) -> Real
aux_kldivergence(lik::Likelihood, qΩ::For, y) -> Real

Compute the analytical KL divergence between the posterior of the auxiliary variables $q(\Omega)$, obtained with aux_posterior, and the prior $p(\Omega)$, obtained with aux_prior.

AugmentedGPLikelihoods.logtilt (Function)
logtilt(lik::Likelihood, Ω, y, f) -> Real

Compute the quadratic part on $f$ of the augmented likelihood:

\[ \log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2.\]

See also expected_logtilt for variational inference.

AugmentedGPLikelihoods.expected_logtilt (Function)
expected_logtilt(lik::Likelihood, qΩ, y, qf) -> Real

Compute the expectation of the quadratic part on $f$ of the augmented likelihood.

\[ E_{q(\Omega,f)}\left[\log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2\right].\]

See also logtilt.


aug_loglik returns the augmented log-likelihood $\log p(y,\Omega|f)$, but one should avoid using it, as computing $p(\Omega)$ can be expensive. aux_prior returns the prior on the auxiliary variables; note that it can depend on the observations $y$. logtilt returns the log of the exponential part of the augmented likelihood, which is conjugate with $f$. By default, aug_loglik falls back to summing logtilt and the logpdf of the aux_prior. Finally, for variational inference purposes, aux_kldivergence computes the KL divergence $\operatorname{KL}(q(\Omega)||p(\Omega))$ and expected_logtilt computes the expectation of logtilt analytically.
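Under the same assumptions as the sketches above (a single latent GP, $q(f) = \mathcal{N}(m, S)$, GP prior $\mathcal{N}(\mu_0, K)$, and qΩ obtained via aux_posterior!), the augmented ELBO could be assembled as follows; the closed-form kldivergence between multivariate normals is assumed to come from Distributions.jl.

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra

# ELBO = E[logtilt] - KL(q(Ω)||p(Ω)) - KL(q(f)||p(f))
function augmented_elbo(lik, qΩ, y, m, S, K, μ0)
    qf = Normal.(m, sqrt.(diag(S)))              # marginals used for the expectation
    return expected_logtilt(lik, qΩ, y, qf) -
           aux_kldivergence(lik, qΩ, y) -
           kldivergence(MvNormal(m, Symmetric(S)), MvNormal(μ0, Symmetric(K)))
end
```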