# AugmentedGPLikelihoods

Documentation for AugmentedGPLikelihoods.

Gaussian Processes (GPs) are great tools for working with function approximations that include uncertainty. Beyond Gaussian regression, GPs can be used as latent functions for many different tasks, such as non-Gaussian regression, classification, multi-class classification, or event counting. However, these tasks involve non-conjugate likelihoods, and the GP posterior is then intractable. The typical solution is to approximate the posterior, either by sampling from it or by approximating it with another distribution. However, both of these methods are computationally involved, require gradients, and are not always guaranteed to converge.

## The augmentation

An alternative proposed in [1] is to represent these non-conjugate likelihoods as scale-mixtures (sometimes requiring multiple steps) to obtain a **conditionally conjugate likelihood**. More concretely, some likelihoods can be written as:

\[ p(x) = \int q(x|\omega)d\pi(\omega).\]

One can then get rid of the integral by **augmenting** the model with the auxiliary variable $\omega$. With the right choice of mixture, one can obtain a likelihood conjugate with a GP $f$ but only when conditioned on $\omega$. This means we go from

\[ p(f|y) = \frac{p(y|f)p(f)}{p(y)},\]

which is intractable, to

\[ p(f,\omega|y) = \frac{p(y,\omega|f)p(f)}{p(y)},\]

where the conditionals $p(f|\omega,y)$ and $p(\omega|f,y)$ are known in closed-form.
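A classic instance of such a scale-mixture (used here purely as an illustration; it is the standard Pólya-Gamma construction, not specific to this package) is the Bernoulli likelihood with a logistic link. For $y \in \{-1, 1\}$,

\[ \sigma(yf) = \frac{e^{yf/2}}{2\cosh(f/2)} = \frac{e^{yf/2}}{2} \int_0^\infty e^{-\omega f^2/2}\, \mathrm{PG}(\omega|1,0)\,\mathrm{d}\omega,\]

so that, conditioned on the Pólya-Gamma variable $\omega$, the likelihood is an (unnormalized) Gaussian in $f$ and therefore conjugate with the GP prior.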

## Bayesian Inference

This new formulation leads to easier, more stable, and faster inference. Two natural algorithms follow from it: **blocked Gibbs sampling**, and the Coordinate Ascent VI (CAVI) algorithm, in which the conditionals are used to compute the optimal variational distributions.

## What this package does

Although an automatic way is proposed in [1], most of the augmentations require some hand derivations. This package provides the results of these derivations (as well as the derivations themselves) and proposes a unified framework to obtain the distributions of interest. Generally, values (either samples or distributions) of the auxiliary variables need to be carried around. Since each likelihood can have a different structure of auxiliary variables, they are passed around uniformly as $\Omega$, which is a `NamedTuple` containing `Vector`s of either samples or distributions.
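For example, for a hypothetical likelihood with a single auxiliary variable per data point, `Ω` would look like the following sketch; the field name `ω` is illustrative and depends on the likelihood:

```julia
n = 10                # number of data inputs
Ω = (; ω = rand(n))   # a NamedTuple holding one Vector of samples
Ω.ω[1]                # auxiliary variable attached to the first data point
```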

## Gibbs Sampling

We give an example in the Gibbs Sampling tutorial using `AbstractMCMC`, but the API can be reduced to 5 main functions:

`AugmentedGPLikelihoods.init_aux_variables` — Function

`init_aux_variables([rng::AbstractRNG], ::Likelihood, n::Int) -> TupleVector`

Initialize collections of `n` auxiliary variables in a `TupleVector` to be used in the context of sampling. `n` should be the number of data inputs.

See also `init_aux_posterior` for variational inference.

`AugmentedGPLikelihoods.aux_sample!` — Function

`aux_sample!([rng::AbstractRNG], Ω, lik::Likelihood, y, f) -> TupleVector`

Sample the auxiliary variables `Ω` in-place based on the full conditional `aux_full_conditional` associated with the augmented likelihood:

\[ p(\Omega|y,f) \propto p(\Omega,y|f).\]

See also `aux_sample` for an allocating version and `aux_posterior!` for variational inference.

`AugmentedGPLikelihoods.aux_sample` — Function

`aux_sample([rng::AbstractRNG], lik::Likelihood, y, f) -> TupleVector`

Sample and allocate the auxiliary variables `Ω` in a `TupleVector` based on the full conditional associated with the likelihood.

See `aux_sample!` for an in-place version.

`AugmentedGPLikelihoods.aux_full_conditional` — Function

`aux_full_conditional(lik::Likelihood, y, f::Real) -> Distribution`

Given the observation `y` and latent `f`, return the full conditional on the auxiliary variables `Ω`.

`AugmentedGPLikelihoods.auglik_potential` — Function

`auglik_potential(lik::Likelihood, Ω, y) -> Tuple`

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the potential $\beta(\Omega,y)$. Note that this is equivalent to the shift of the first natural parameter $\eta_1 = \Sigma^{-1}\mu$. The `Tuple` contains a `Vector` for each latent.

See also `expected_auglik_potential` for variational inference.

`AugmentedGPLikelihoods.auglik_precision` — Function

`auglik_precision(lik::Likelihood, Ω, y) -> Tuple`

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the precision $\gamma(\Omega,y)$. Note that this is equivalent to the shift of the precision $\Lambda = \Sigma^{-1}$. The `Tuple` contains a `Vector` for each latent.

See also `expected_auglik_precision` for variational inference.

`AugmentedGPLikelihoods.auglik_potential_and_precision` — Function

`auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}`

Returns both `auglik_potential` and `auglik_precision` when some computation can be saved by doing both at the same time.

First, `init_aux_variables` initializes a `NamedTuple` of `Vector`(s) of auxiliary variables to be modified in-place during sampling. `aux_sample!` will sample the auxiliary variables from the full conditional $p(\Omega|y,f)$, given by `aux_full_conditional`, and return the modified `NamedTuple`. For generating a new `NamedTuple` every time, see `aux_sample`. The full conditional of $f$ is given by

\[\begin{align*} p(f|y,\Omega) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right), \end{align*}\]

where $\lambda$ is obtained via `auglik_precision` and $h$ is obtained via `auglik_potential`. For likelihoods requiring multiple latent GPs (e.g. multi-class classification or heteroscedastic likelihoods), `auglik_potential` and `auglik_precision` return a `Tuple` with the respective `Vector`s $h$ and $\lambda$.

As a general rule, the augmented likelihood will have the form

\[ p(y|f,\Omega) \propto \exp\left(h(\Omega,y)f + \frac{\lambda(\Omega,y)}{2}f^2\right),\]

with `auglik_potential` $\equiv h(\Omega,y)$ and `auglik_precision` $\equiv \lambda(\Omega,y)$.
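Putting it all together, here is a minimal sketch of a blocked Gibbs sampler for a single latent GP with a dense kernel matrix `K` and zero prior mean. The API functions are the ones documented above; the surrounding loop, the explicit matrix inversions, and the function name `gibbs_sample` are illustrative assumptions, not part of the package:

```julia
using AugmentedGPLikelihoods, LinearAlgebra, Random

# One blocked-Gibbs pass: sample Ω | y, f, then f | y, Ω.
function gibbs_sample(rng::AbstractRNG, lik, K::AbstractMatrix, y; n_samples=100)
    n = length(y)
    f = zeros(n)                         # initial latent sample
    Ω = init_aux_variables(rng, lik, n)  # auxiliary variables
    K_inv = inv(Symmetric(K))            # for clarity; prefer factorizations
    samples = Vector{Vector{Float64}}()
    for _ in 1:n_samples
        Ω = aux_sample!(rng, Ω, lik, y, f)
        λ = only(auglik_precision(lik, Ω, y))  # single latent GP assumed
        h = only(auglik_potential(lik, Ω, y))
        S = inv(Symmetric(K_inv + Diagonal(λ)))           # S = (K⁻¹ + Diag(λ))⁻¹
        m = S * h                                         # m = S(h + K⁻¹μ₀), μ₀ = 0
        f = m + cholesky(Symmetric(S)).L * randn(rng, n)  # f ~ N(m, S)
        push!(samples, copy(f))
    end
    return samples
end
```

In practice one would reuse factorizations across iterations rather than invert `K` and `S` explicitly; the sketch keeps the linear algebra literal to mirror the equations above.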

## Coordinate Ascent Variational Inference

CAVI updates work exactly the same way as Gibbs sampling, except that we now work with posterior distributions instead of samples. We use the variational family $q(f)\prod_{i=1}^N q(\Omega_i)$, i.e. we make the **mean-field** assumption that the auxiliary variables are independent of each other and of $f$.

As with Gibbs sampling, there are 5 main functions:

`AugmentedGPLikelihoods.init_aux_posterior` — Function

`init_aux_posterior([T::DataType], lik::Likelihood, n::Int) -> AbstractProductMeasure`

Initialize collections of `n` (independent) posteriors for the auxiliary variables in the context of variational inference. `n` should be the size of the data used at every iteration. The real variational parameters will be given type `T` (`Float64` by default).

See also `init_aux_variables` for sampling.

`AugmentedGPLikelihoods.aux_posterior!` — Function

`aux_posterior!(qΩ, lik::Likelihood, y, qf) -> AbstractProductMeasure`

Compute the optimal posterior of the auxiliary variables $q^*(\Omega)$ given the marginal distributions `qf` by updating the variational parameters in-place, using the formula

\[ q^*(\Omega) \propto \exp\left(E_{q(f)}\left[\log p(\Omega|f,y)\right]\right).\]

See also `aux_posterior` and `aux_sample!`.

`AugmentedGPLikelihoods.aux_posterior` — Function

`aux_posterior(lik::Likelihood, y, qf) -> AbstractProductMeasure`

Compute the optimal posterior of the auxiliary variables in a new `AbstractProductMeasure` (`For` by default).

See `aux_posterior!` for more details.

`AugmentedGPLikelihoods.expected_auglik_potential` — Function

`expected_auglik_potential(lik::Likelihood, qΩ, y) -> Tuple`

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected potential $E_{q(\Omega)}[\beta(\Omega,y)]$. Note that this is equivalent to the shift of the first variational natural parameter $\eta_1 = \Sigma^{-1}\mu$. The `Tuple` contains a `Vector` for each latent.

See also `auglik_potential` for sampling.

`AugmentedGPLikelihoods.expected_auglik_precision` — Function

`expected_auglik_precision(lik::Likelihood, qΩ, y) -> Tuple`

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected precision $E_{q(\Omega)}[\gamma(\Omega,y)]$. Note that this is equivalent to the shift of the variational precision $\Lambda = \Sigma^{-1}$. The `Tuple` contains a `Vector` for each latent.

See also `auglik_precision` for sampling.

`AugmentedGPLikelihoods.expected_auglik_potential_and_precision` — Function

`expected_auglik_potential_and_precision(lik::Likelihood, qΩ, y) -> Tuple{Tuple, Tuple}`

Returns both `expected_auglik_potential` and `expected_auglik_precision` when some computation can be saved by doing both at the same time.

`init_aux_posterior` initializes the posterior $\prod_{i=1}^N q(\Omega_i)$ as a `NamedTuple`. `aux_posterior!` updates the variational posterior distributions in-place given the marginals $q(f_i)$ and returns the modified `NamedTuple`. To get a new `NamedTuple` every time, use `aux_posterior`. Finally, `expected_auglik_potential` and `expected_auglik_precision` give us the elements needed to update the variational distribution $q(f)$. As for Gibbs sampling, we have the following optimal variational distribution:

\[\begin{align*} q^*(f) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right) \end{align*}\]

where $\lambda$ is given by `expected_auglik_precision` and $h$ is given by `expected_auglik_potential`. Note that if you work with **sparse** GPs, the updates should be:

\[\begin{align*} S =& \left(K_{Z}^{-1} + \kappa\operatorname{Diagonal}(\lambda)\kappa^\top\right)^{-1},\\ m =& S \left(\kappa h + K_Z^{-1}\mu_0(Z)\right), \end{align*}\]

where $\kappa=K_{Z}^{-1}K_{Z,X}$.
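Under the same simplifying assumptions as the Gibbs sketch above (single latent, dense GP, zero prior mean), and assuming the marginals `qf` can be passed as a vector of `Normal` distributions, a CAVI loop could be sketched as follows; the function name `cavi` and the loop structure are illustrative:

```julia
using AugmentedGPLikelihoods, Distributions, LinearAlgebra

# Alternate between the optimal q(Ω) and the optimal Gaussian q(f) = N(m, S).
function cavi(lik, K::AbstractMatrix, y; n_iter=20)
    n = length(y)
    K_inv = inv(Symmetric(K))              # for clarity; prefer factorizations
    m, S = zeros(n), Matrix(Symmetric(K))  # initialize q(f) at the prior
    qΩ = init_aux_posterior(lik, n)
    for _ in 1:n_iter
        qf = Normal.(m, sqrt.(diag(S)))    # marginals q(fᵢ), assumed Normal
        qΩ = aux_posterior!(qΩ, lik, y, qf)
        λ = only(expected_auglik_precision(lik, qΩ, y))
        h = only(expected_auglik_potential(lik, qΩ, y))
        S = inv(Symmetric(K_inv + Diagonal(λ)))  # S = (K⁻¹ + Diag(λ))⁻¹
        m = S * h                                # m = S(h + K⁻¹μ₀), μ₀ = 0
    end
    return m, S, qΩ
end
```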

## Likelihood and ELBO computations

You might be interested in computing the ELBO, i.e. the lower bound on the evidence of the augmented model. The ELBO can be written as:

\[\begin{align*} \mathcal{L}(q(f,\Omega)) =& E_{q(f)q(\Omega)}\left[\log p(y,\Omega|f)\right] + \operatorname{H}\left[q(\Omega)\right] - \operatorname{KL}\left(q(f)||p(f)\right)\\ =& E_{q(f)q(\Omega)}\left[\log p(y|\Omega,f)\right] - \operatorname{KL}\left(q(\Omega)||p(\Omega)\right) -\operatorname{KL}\left(q(f)||p(f)\right) \end{align*}\]

To work with all these terms, `AugmentedGPLikelihoods.jl` provides a series of helper functions:

`AugmentedGPLikelihoods.aug_loglik` — Function

`aug_loglik(lik::Likelihood, Ω, y, f) -> Real`

Return the augmented log-likelihood with the given parameters. The augmented log-likelihood is of the form

\[ \log p(y,\Omega|f) = \log l(y,\Omega,f) + \log p(\Omega).\]

To only obtain the $p(\Omega)$ part see `aux_prior`, and see `logtilt` for $\log l(y, \Omega, f)$.

A generic fallback exists, based on `logtilt` and `aux_prior`, but specialized implementations are encouraged.

`AugmentedGPLikelihoods.expected_aug_loglik` — Function

`expected_aug_loglik(lik::Likelihood, qΩ, y, qf) -> Real`

Return the expected augmented log-likelihood with the given parameters. The expected augmented log-likelihood is of the form

\[ E_{q(\Omega,f)}\left[\log p(y,\Omega|f)\right].\]

To only obtain the $p(\Omega)$ part see `aux_prior`, and see `logtilt` for $\log l(y, \Omega, f)$.

A generic fallback exists, based on `expected_logtilt` and `aux_kldivergence`, but specialized implementations are encouraged.

`AugmentedGPLikelihoods.aux_prior` — Function

`aux_prior(lik::Likelihood, y) -> AbstractProductMeasure`

Returns a `NamedTuple` of distributions with the same structure as `aux_posterior`, `init_aux_posterior` and `init_aux_variables`. Note that an auxiliary prior is not always available: even if the likelihood is a valid density, there is no guarantee that the prior of the augmented variable is!

`AugmentedGPLikelihoods.aux_kldivergence` — Function

```
aux_kldivergence(lik::Likelihood, qΩ::For, pΩ::For) -> Real
aux_kldivergence(lik::Likelihood, qΩ::For, y) -> Real
```

Compute the analytical KL divergence between the auxiliary variables' posterior $q(\Omega)$, obtained with `aux_posterior`, and the prior $p(\Omega)$, obtained with `aux_prior`.

`AugmentedGPLikelihoods.logtilt` — Function

`logtilt(lik::Likelihood, Ω, y, f) -> Real`

Compute the part of the augmented log-likelihood which is quadratic in $f$:

\[ \log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2.\]

See also `expected_logtilt` for variational inference.

`AugmentedGPLikelihoods.expected_logtilt` — Function

`expected_logtilt(lik::Likelihood, qΩ, y, qf) -> Real`

Compute the expectation of the part of the augmented log-likelihood which is quadratic in $f$:

\[ E_{q(\Omega,f)}\left[\log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2\right].\]

See also `logtilt`.

`aug_loglik` returns the augmented log-likelihood $\log p(y,\Omega|f)$, but one should avoid using it, as computing $p(\Omega)$ can be expensive. `aux_prior` returns the prior on the auxiliary variables; note that it can depend on the observations $y$. `logtilt` returns the log of the exponential part of the augmented likelihood, which is conjugate with $f$. By default, `aug_loglik` is computed as `logtilt` plus the `logpdf` of the `aux_prior` distribution. Finally, for variational inference purposes, `aux_kldivergence` computes the KL divergence $\operatorname{KL}(q(\Omega)||p(\Omega))$ and `expected_logtilt` computes the expectation of `logtilt` analytically.
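As a closing sketch (same simplifying assumptions as before: single latent, dense `K`, zero prior mean, `Normal` marginals), the augmented ELBO could be assembled from these helpers, with the Gaussian $\operatorname{KL}(q(f)||p(f))$ computed inline; the function name `augmented_elbo` is illustrative:

```julia
using AugmentedGPLikelihoods, Distributions, LinearAlgebra

# ELBO = E[log l(y, Ω, f)] - KL(q(Ω)||p(Ω)) - KL(q(f)||p(f)),
# with q(f) = N(m, S) and p(f) = N(0, K).
function augmented_elbo(lik, y, m, S, K, qΩ)
    qf = Normal.(m, sqrt.(diag(S)))           # marginals q(fᵢ)
    kl_f = 0.5 * (logdet(K) - logdet(S) - length(m) +
                  tr(K \ S) + dot(m, K \ m))  # closed-form Gaussian KL(q(f)||p(f))
    return expected_logtilt(lik, qΩ, y, qf) -
           aux_kldivergence(lik, qΩ, y) -
           kl_f
end
```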