AugmentedGPLikelihoods
Documentation for AugmentedGPLikelihoods.
Gaussian Processes (GPs) are great tools for working with function approximations that come with uncertainty estimates. Beyond Gaussian regression, GPs can be used as latent functions for many different tasks, such as non-Gaussian regression, classification, multi-class classification, or event counting. However, these tasks involve non-conjugate likelihoods, and the GP posterior then becomes intractable. The typical solution is to approximate the posterior, either by sampling from it or by approximating it with another distribution. However, both of these methods are computationally involved, require gradients, and are not always guaranteed to converge.
The augmentation
An alternative proposed in [1] is to represent these non-conjugate likelihoods as scale-mixtures (sometimes requiring multiple steps) to obtain a conditionally conjugate likelihood. More concretely, some likelihoods can be written as:
\[ p(x) = \int q(x|\omega)d\pi(\omega).\]
One can then get rid of the integral by augmenting the model with the auxiliary variable $\omega$. With the right choice of mixture, one can obtain a likelihood conjugate with the GP $f$, but only when conditioned on $\omega$. This means we go from
\[ p(f|y) = \frac{p(y|f)p(f)}{p(y)},\]
which is intractable, to
\[ p(f,\omega|y) = \frac{p(y,\omega|f)p(f)}{p(y)},\]
where the conditionals $p(f|\omega,y)$ and $p(\omega|f,y)$ are known in closed form.
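A classical example of such a scale mixture (shown here purely for illustration, independently of the package's internal parametrization) is the Pólya-Gamma representation of the logistic (Bernoulli) likelihood (Polson et al., 2013). With labels $y \in \{-1, 1\}$,
\[ \sigma(y f) = \frac{1}{1 + e^{-y f}} = \int_0^\infty \frac{1}{2}\exp\left(\frac{y f}{2} - \frac{\omega f^2}{2}\right)\operatorname{PG}(\omega|1,0)\,\mathrm{d}\omega,\]
so that, conditioned on $\omega$, the likelihood is the exponential of a linear-plus-quadratic function of $f$ and is therefore conjugate with the GP prior.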
Bayesian Inference
This new formulation leads to easier, more stable, and faster inference. Two natural algorithms follow from it: blocked Gibbs sampling, where the conditionals are sampled from in turn, and Coordinate Ascent Variational Inference (CAVI), where the conditionals are used to compute the optimal variational distributions.
What this package does
Although an automatic approach is proposed in [1], most of the augmentations require some hand derivations. This package provides the results of these derivations (as well as the derivations themselves) and proposes a unified framework to obtain the distributions of interest. Generally, values (either samples or distributions) of the auxiliary variables need to be carried around. Since each likelihood can have a different structure of auxiliary variables, they are uniformly passed around as $\Omega$, a NamedTuple containing Vectors of either samples or distributions.
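For illustration only (the exact field names depend on the likelihood and are hypothetical here), such a container might look like:

```julia
using Distributions

# Hypothetical auxiliary-variable containers for a likelihood with one
# auxiliary variable per observation; the field name `ω` is illustrative only.
Ω = (; ω = zeros(3))                         # sampling: one auxiliary sample per data point
qΩ = (; ω = [Gamma(1.0, 1.0) for _ in 1:3])  # variational inference: one distribution per data point
```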
Gibbs Sampling
We give an example using AbstractMCMC in the Gibbs Sampling tutorial, but the API can be reduced to 5 main functions:
AugmentedGPLikelihoods.init_aux_variables — Function

init_aux_variables([rng::AbstractRNG], ::Likelihood, n::Int) -> TupleVector

Initialize collections of n auxiliary variables in a TupleVector to be used in the context of sampling. n should be the number of data inputs.
See also init_aux_posterior for variational inference.
AugmentedGPLikelihoods.aux_sample! — Function

aux_sample!([rng::AbstractRNG], Ω, lik::Likelihood, y, f) -> TupleVector

Sample the auxiliary variables Ω in-place based on the full-conditional aux_full_conditional associated with the augmented likelihood:
\[ p(\Omega|y,f) \propto p(\Omega,y|f).\]
See also aux_sample for an allocating version and aux_posterior! for variational inference.
AugmentedGPLikelihoods.aux_sample — Function

aux_sample([rng::AbstractRNG], lik::Likelihood, y, f) -> TupleVector

Sample and allocate the auxiliary variables Ω in a TupleVector based on the full-conditional associated with the likelihood.
See aux_sample! for an in-place version.
AugmentedGPLikelihoods.aux_full_conditional — Function

aux_full_conditional(lik::Likelihood, y, f::Real) -> Distribution

Given the observation y and latent f, returns the full conditional on the auxiliary variables Ω.
AugmentedGPLikelihoods.auglik_potential — Function

auglik_potential(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the potential, $\beta(\Omega,y)$. Note that this is equivalent to the shift of the first natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.
See also expected_auglik_potential for variational inference.
AugmentedGPLikelihoods.auglik_precision — Function

auglik_precision(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the precision, $\gamma(\Omega,y)$. Note that this is equivalent to the shift of the precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.
See also expected_auglik_precision for variational inference.
AugmentedGPLikelihoods.auglik_potential_and_precision — Function

auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}

Returns both auglik_potential and auglik_precision when some computation can be saved by doing both at the same time.
First, init_aux_variables initializes a NamedTuple of Vector(s) of auxiliary variables to be modified in-place during sampling. aux_sample! then samples the auxiliary variables from the full conditional $p(\Omega|y,f)$, given by aux_full_conditional, and returns the modified NamedTuple. To generate a new NamedTuple every time, see aux_sample. The full conditional of $f$ is given by
\[\begin{align*} p(f|y,\Omega) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right), \end{align*}\]
where $\lambda$ is obtained via auglik_precision and $h$ is obtained via auglik_potential. For likelihoods requiring multiple latent GPs (e.g. multi-class classification or heteroscedastic likelihoods), auglik_potential and auglik_precision return a Tuple containing the Vectors $h$ and $\lambda$ for each latent GP.
As a general rule, the augmented likelihood will have the form
\[ p(y|f,\Omega) \propto \exp\left(h(\Omega,y)f + \frac{\lambda(\Omega,y)}{2}f^2\right),\]
with auglik_potential $\equiv h(\Omega,y)$ while auglik_precision $\equiv \lambda(\Omega,y)$.
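Putting these pieces together, a minimal Gibbs sampler might look like the following sketch. It assumes a Bernoulli (logistic) likelihood and a squared-exponential kernel taken from GPLikelihoods.jl and KernelFunctions.jl; the toy data, kernel, and number of iterations are arbitrary illustration choices, not part of this package's API.

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra, Random
using GPLikelihoods, KernelFunctions  # assumed providers of BernoulliLikelihood and SqExponentialKernel

rng = MersenneTwister(42)
x = range(-3, 3; length=50)
y = rand(rng, Bool, length(x))                      # toy binary observations
lik = BernoulliLikelihood()                         # logistic link by default
K = kernelmatrix(SqExponentialKernel(), x) + 1e-8I  # prior covariance (with jitter)
μ₀ = zeros(length(x))                               # prior mean

# Blocked Gibbs sampling: alternate between p(Ω | y, f) and p(f | y, Ω).
Ω = init_aux_variables(rng, lik, length(x))         # container of auxiliary variables
f = rand(rng, MvNormal(μ₀, K))                      # initial latent sample
for _ in 1:100
    aux_sample!(rng, Ω, lik, y, f)                  # Ω ~ p(Ω | y, f), in-place
    h = only(auglik_potential(lik, Ω, y))           # potential h(Ω, y); one Vector per latent
    λ = only(auglik_precision(lik, Ω, y))           # precision λ(Ω, y)
    S = inv(Symmetric(inv(K) + Diagonal(λ)))        # conditional covariance of f
    m = S * (h + K \ μ₀)                            # conditional mean of f
    f .= rand(rng, MvNormal(m, Symmetric(S)))       # f ~ p(f | y, Ω)
end
```

For clarity the sketch inverts $K$ inside the loop; a real implementation would precompute a Cholesky factorization of $K$ and reuse it.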
Coordinate Ascent Variational Inference
CAVI updates work exactly the same way as Gibbs sampling, except that we now work with posterior distributions instead of samples. We use the variational family $q(f)\prod_{i=1}^N q(\Omega_i)$, i.e. we make the mean-field assumption that the auxiliary variables are independent of each other and independent of $f$.
As for Gibbs sampling, there are 5 main functions:
AugmentedGPLikelihoods.init_aux_posterior — Function

init_aux_posterior([T::DataType], lik::Likelihood, n::Int) -> AbstractProductMeasure

Initialize collections of n (independent) posteriors for the auxiliary variables in the context of variational inference. n should be the size of the data used at every iteration. The real variational parameters will be given type T (Float64 by default).
See also init_aux_variables for sampling.
AugmentedGPLikelihoods.aux_posterior! — Function

aux_posterior!(qΩ, lik::Likelihood, y, qf) -> AbstractProductMeasure

Compute the optimal posterior of the auxiliary variables $q^*(\Omega)$ given the marginal distributions qf by updating the variational parameters in-place using the formula
\[ q^*(\Omega) \propto \exp\left(E_{q(f)}\left[\log p(\Omega|f,y)\right]\right).\]
See also aux_posterior and aux_sample!
AugmentedGPLikelihoods.aux_posterior — Function

aux_posterior(lik::Likelihood, y, qf) -> AbstractProductMeasure

Compute the optimal posterior of the auxiliary variables in a new AbstractProductMeasure (For by default).
See aux_posterior! for more details
AugmentedGPLikelihoods.expected_auglik_potential — Function

expected_auglik_potential(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected potential, $E_{q(\Omega)}[\beta(\Omega,y)]$. Note that this is equivalent to the shift of the first variational natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.
See also auglik_potential for sampling.
AugmentedGPLikelihoods.expected_auglik_precision — Function

expected_auglik_precision(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected precision, $E_{q(\Omega)}[\gamma(\Omega,y)]$. Note that this is equivalent to the shift of the variational precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.
See also auglik_precision for sampling.
AugmentedGPLikelihoods.expected_auglik_potential_and_precision — Function

expected_auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}

Returns both expected_auglik_potential and expected_auglik_precision when some computation can be saved by doing both at the same time.
init_aux_posterior initializes the posterior $\prod_{i=1}^N q(\Omega_i)$ as a NamedTuple. aux_posterior! updates the variational posterior distributions in-place given the marginals $q(f_i)$ and returns the modified NamedTuple. To get a new NamedTuple every time, use aux_posterior. Finally, expected_auglik_potential and expected_auglik_precision give us the elements needed to update the variational distribution $q(f)$. As for Gibbs sampling, we have the following optimal variational distribution:
\[\begin{align*} q^*(f) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right) \end{align*}\]
where $\lambda$ is given by expected_auglik_precision and $h$ is given by expected_auglik_potential. Note that if you work with Sparse GPs, the updates should be:
\[\begin{align*} S =& \left(K_{Z}^{-1} + \kappa\operatorname{Diagonal}(\lambda)\kappa^\top\right)^{-1},\\ m =& S \left(\kappa h + K_Z^{-1}\mu_0(Z)\right), \end{align*}\]
where $\kappa=K_{Z}^{-1}K_{Z,X}$.
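As a rough sketch of how these pieces fit together in the non-sparse case (under the same toy setup and assumptions as the Gibbs example above, and additionally assuming the marginals $q(f_i)$ are passed as a vector of Normal distributions), a CAVI loop could look like:

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra, Random
using GPLikelihoods, KernelFunctions  # assumed providers of BernoulliLikelihood and SqExponentialKernel

rng = MersenneTwister(0)
x = range(-3, 3; length=50)
y = rand(rng, Bool, length(x))                       # toy binary observations
lik = BernoulliLikelihood()
K = kernelmatrix(SqExponentialKernel(), x) + 1e-8I   # prior covariance (with jitter)
μ₀ = zeros(length(x))                                # prior mean

# CAVI: alternate between the optimal q*(Ω) and the optimal q*(f).
qΩ = init_aux_posterior(lik, length(x))
m, S = copy(μ₀), Matrix(K)                           # initialize q(f) = N(m, S)
for _ in 1:20
    qf = Normal.(m, sqrt.(diag(S)))                  # marginals q(f_i) (assumed input format)
    aux_posterior!(qΩ, lik, y, qf)                   # update q*(Ω) in-place
    h = only(expected_auglik_potential(lik, qΩ, y))  # E[h(Ω, y)]
    λ = only(expected_auglik_precision(lik, qΩ, y))  # E[λ(Ω, y)]
    S .= inv(Symmetric(inv(K) + Diagonal(λ)))        # q*(f) covariance
    m .= S * (h + K \ μ₀)                            # q*(f) mean
end
```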
Likelihood and ELBO computations
You might be interested in computing the ELBO, i.e. the lower bound on the evidence of the augmented model. The ELBO can be written as:
\[\begin{align*} \mathcal{L}(q(f,\Omega)) =& E_{q(f)q(\Omega)}\left[\log p(y,\Omega|f)\right] - \operatorname{KL}\left(q(f)||p(f)\right)\\ =& E_{q(f)q(\Omega)}\left[\log p(y|\Omega,f)\right] - \operatorname{KL}\left(q(\Omega)||p(\Omega)\right) - \operatorname{KL}\left(q(f)||p(f)\right). \end{align*}\]
To work with all these terms, AugmentedGPLikelihoods.jl provides a series of helper functions:
AugmentedGPLikelihoods.aug_loglik — Function

aug_loglik(lik::Likelihood, Ω, y, f) -> Real

Return the augmented log-likelihood with the given parameters. The augmented log-likelihood is of the form
\[ \log p(y,\Omega|f) = \log l(y,\Omega,f) + \log p(\Omega|y, f).\]
To obtain only the $p(\Omega|y, f)$ part, see aux_prior; see logtilt for $\log l(y, \Omega, f)$.
A generic fallback exists based on logtilt and aux_prior but specialized implementations are encouraged.
AugmentedGPLikelihoods.expected_aug_loglik — Function

expected_aug_loglik(lik::Likelihood, qΩ, y, qf) -> Real

Return the expected augmented log-likelihood with the given parameters. The expected augmented log-likelihood is of the form
\[ E_{q(\Omega,f)}\left[\log p(y,\Omega|f)\right]\]
To obtain only the $p(\Omega|y)$ part, see aux_prior; see logtilt for $\log l(y, \Omega, f)$.
A generic fallback exists based on expected_logtilt and aux_kldivergence but specialized implementations are encouraged.
AugmentedGPLikelihoods.aux_prior — Function

aux_prior(lik::Likelihood, y) -> AbstractProductMeasure

Returns a NamedTuple of distributions with the same structure as aux_posterior, init_aux_posterior and init_aux_variables. Note that an auxiliary prior is not always available. Even if the likelihood is a valid density, there is no guarantee that the prior of the augmented variable is!
AugmentedGPLikelihoods.aux_kldivergence — Function

aux_kldivergence(lik::Likelihood, qΩ::For, pΩ::For) -> Real
aux_kldivergence(lik::Likelihood, qΩ::For, y) -> Real

Compute the analytical KL divergence between the auxiliary variable posterior $q(\Omega)$, obtained with aux_posterior, and the prior $p(\Omega)$, obtained with aux_prior.
AugmentedGPLikelihoods.logtilt — Function

logtilt(lik::Likelihood, Ω, y, f) -> Real

Compute the quadratic part on $f$ of the augmented likelihood:
\[ \log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2.\]
See also expected_logtilt for variational inference.
AugmentedGPLikelihoods.expected_logtilt — Function

expected_logtilt(lik::Likelihood, qΩ, y, qf) -> Real

Compute the expectation of the quadratic part on $f$ of the augmented likelihood:
\[ E_{q(\Omega,f)}\left[\log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2\right].\]
See also logtilt.
aug_loglik returns the augmented log-likelihood $\log p(y,\Omega|f)$, but one should avoid using it, as computing $p(\Omega)$ can be expensive. aux_prior returns the prior on the auxiliary variables; note that it can depend on the observations $y$. logtilt returns the log of the exponential part of the augmented likelihood, which is conjugate with $f$. By default, aug_loglik is computed as logtilt + logpdf(aux_prior, Ω). Finally, for variational inference purposes, aux_kldivergence computes the KL divergence $\operatorname{KL}(q(\Omega)||p(\Omega))$ and expected_logtilt computes the expectation of logtilt analytically.
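Continuing the CAVI sketch above (reusing its lik, y, qΩ, m, S, K and μ₀), the augmented ELBO could be assembled as follows. This is a sketch: the Gaussian KL term $\operatorname{KL}(q(f)||p(f))$ is written out explicitly rather than taken from any particular library function.

```julia
using Distributions, LinearAlgebra  # Normal, logdet, tr, dot, diag

# Marginals of q(f) = N(m, S), in the same (assumed) format as in the CAVI sketch.
qf = Normal.(m, sqrt.(diag(S)))
n = length(y)

# KL(q(f) || p(f)) between N(m, S) and N(μ₀, K), written out explicitly.
kl_f = (logdet(K) - logdet(S) - n + tr(K \ S) + dot(m - μ₀, K \ (m - μ₀))) / 2

# ELBO = E[log p(y|Ω,f)] - KL(q(Ω)||p(Ω)) - KL(q(f)||p(f)).
elbo = expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) - kl_f
```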