AugmentedGPLikelihoods
Documentation for AugmentedGPLikelihoods.
Gaussian Processes (GPs) are great tools for working with function approximations that come with uncertainty estimates. Beyond Gaussian regression, GPs can be used as latent functions for many different tasks, such as non-Gaussian regression, classification, multi-class classification, or event counting. However, these tasks involve non-conjugate likelihoods, and the GP posterior then becomes intractable. The typical solution is to approximate the posterior, either by sampling from it or by approximating it with another distribution. However, both of these methods are computationally involved, require gradients, and are not always guaranteed to converge.
The augmentation
An alternative proposed in [1] is to represent these non-conjugate likelihoods as scale-mixtures (sometimes requiring multiple steps) to obtain a conditionally conjugate likelihood. More concretely, some likelihoods can be written as:
\[ p(x) = \int q(x|\omega)d\pi(\omega).\]
One can then get rid of the integral by augmenting the model with the auxiliary variable $\omega$. With the right choice of mixture, one can obtain a likelihood conjugate with the GP $f$, but only when conditioned on $\omega$. This means we go from
\[ p(f|y) = \frac{p(y|f)p(f)}{p(y)},\]
which is intractable, to
\[ p(f,\omega|y) = \frac{p(y,\omega|f)p(f)}{p(y)},\]
where the conditionals $p(f|\omega,y)$ and $p(\omega|f,y)$ are known in closed form.
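A classical example of such a scale mixture (shown here purely for illustration, independently of the package's internal parametrization) is the Pólya-Gamma representation of the logistic (Bernoulli) likelihood (Polson et al., 2013). With labels $y \in \{-1, 1\}$,
\[ \sigma(y f) = \frac{1}{1 + e^{-y f}} = \int_0^\infty \frac{1}{2}\exp\left(\frac{y f}{2} - \frac{\omega f^2}{2}\right)\operatorname{PG}(\omega|1,0)\,\mathrm{d}\omega,\]
so that, conditioned on $\omega$, the likelihood is the exponential of a linear-plus-quadratic function of $f$ and is therefore conjugate with the GP prior.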
Bayesian Inference
This new formulation leads to easier, more stable, and faster inference. Two natural algorithms follow from it: blocked Gibbs sampling, where the conditionals are sampled from in turn, and Coordinate Ascent Variational Inference (CAVI), where the conditionals are used to compute the optimal variational distributions.
What this package does
Although an automatic approach is proposed in [1], most of the augmentations require some hand derivations. This package provides the results of these derivations (as well as the derivations themselves) and proposes a unified framework to obtain the distributions of interest. Generally, values (either samples or distributions) of the auxiliary variables need to be carried around. Since each likelihood can have a different structure of auxiliary variables, they are uniformly passed around as $\Omega$, a NamedTuple containing Vectors of either samples or distributions.
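For illustration only (the exact field names depend on the likelihood and are hypothetical here), such a container might look like:

```julia
using Distributions

# Hypothetical auxiliary-variable containers for a likelihood with one
# auxiliary variable per observation; the field name `ω` is illustrative only.
Ω = (; ω = zeros(3))                         # sampling: one auxiliary sample per data point
qΩ = (; ω = [Gamma(1.0, 1.0) for _ in 1:3])  # variational inference: one distribution per data point
```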
Gibbs Sampling
We give an example using AbstractMCMC in the Gibbs Sampling tutorial, but the API can be reduced to 5 main functions:
AugmentedGPLikelihoods.init_aux_variables — Function

init_aux_variables([rng::AbstractRNG], ::Likelihood, n::Int) -> TupleVector

Initialize collections of n auxiliary variables in a TupleVector to be used in the context of sampling. n should be the number of data inputs.
See also init_aux_posterior for variational inference.
AugmentedGPLikelihoods.aux_sample! — Function

aux_sample!([rng::AbstractRNG], Ω, lik::Likelihood, y, f) -> TupleVector

Sample the auxiliary variables Ω in-place based on the full-conditional aux_full_conditional associated with the augmented likelihood:
\[ p(\Omega|y,f) \propto p(\Omega,y|f).\]
See also aux_sample for an allocating version and aux_posterior! for variational inference.
AugmentedGPLikelihoods.aux_sample — Function

aux_sample([rng::AbstractRNG], lik::Likelihood, y, f) -> TupleVector

Sample and allocate the auxiliary variables Ω in a TupleVector based on the full-conditional associated with the likelihood.
See aux_sample! for an in-place version.
AugmentedGPLikelihoods.aux_full_conditional — Function

aux_full_conditional(lik::Likelihood, y, f::Real) -> Distribution

Given the observation y and latent f, returns the full conditional on the auxiliary variables Ω.
AugmentedGPLikelihoods.auglik_potential — Function

auglik_potential(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the potential, $\beta(\Omega,y)$. Note that this is equivalent to the shift of the first natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.
See also expected_auglik_potential for variational inference.
AugmentedGPLikelihoods.auglik_precision — Function

auglik_precision(lik::Likelihood, Ω, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the precision, $\gamma(\Omega,y)$. Note that this is equivalent to the shift of the precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.
See also expected_auglik_precision for variational inference.
AugmentedGPLikelihoods.auglik_potential_and_precision — Function

auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}

Returns both auglik_potential and auglik_precision when some computation can be saved by doing both at the same time.
First, init_aux_variables initializes a NamedTuple of Vector(s) of auxiliary variables to be modified in-place during sampling. aux_sample! then samples the auxiliary variables from the full conditional $p(\Omega|y,f)$, given by aux_full_conditional, and returns the modified NamedTuple. To generate a new NamedTuple every time, see aux_sample. The full conditional of $f$ is given by
\[\begin{align*} p(f|y,\Omega) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right), \end{align*}\]
where $\lambda$ is obtained via auglik_precision and $h$ is obtained via auglik_potential. For likelihoods requiring multiple latent GPs (e.g. multi-class classification or heteroscedastic likelihoods), auglik_potential and auglik_precision return a Tuple containing the Vectors $h$ and $\lambda$ for each latent GP.
As a general rule, the augmented likelihood will have the form
\[ p(y|f,\Omega) \propto \exp\left(h(\Omega,y)f + \frac{\lambda(\Omega,y)}{2}f^2\right),\]
with auglik_potential $\equiv h(\Omega,y)$ while auglik_precision $\equiv \lambda(\Omega,y)$.
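Putting these pieces together, a minimal Gibbs sampler might look like the following sketch. It assumes a Bernoulli (logistic) likelihood and a squared-exponential kernel taken from GPLikelihoods.jl and KernelFunctions.jl; the toy data, kernel, and number of iterations are arbitrary illustration choices, not part of this package's API.

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra, Random
using GPLikelihoods, KernelFunctions  # assumed providers of BernoulliLikelihood and SqExponentialKernel

rng = MersenneTwister(42)
x = range(-3, 3; length=50)
y = rand(rng, Bool, length(x))                      # toy binary observations
lik = BernoulliLikelihood()                         # logistic link by default
K = kernelmatrix(SqExponentialKernel(), x) + 1e-8I  # prior covariance (with jitter)
μ₀ = zeros(length(x))                               # prior mean

# Blocked Gibbs sampling: alternate between p(Ω | y, f) and p(f | y, Ω).
Ω = init_aux_variables(rng, lik, length(x))         # container of auxiliary variables
f = rand(rng, MvNormal(μ₀, K))                      # initial latent sample
for _ in 1:100
    aux_sample!(rng, Ω, lik, y, f)                  # Ω ~ p(Ω | y, f), in-place
    h = only(auglik_potential(lik, Ω, y))           # potential h(Ω, y); one Vector per latent
    λ = only(auglik_precision(lik, Ω, y))           # precision λ(Ω, y)
    S = inv(Symmetric(inv(K) + Diagonal(λ)))        # conditional covariance of f
    m = S * (h + K \ μ₀)                            # conditional mean of f
    f .= rand(rng, MvNormal(m, Symmetric(S)))       # f ~ p(f | y, Ω)
end
```

For clarity the sketch inverts $K$ inside the loop; a real implementation would precompute a Cholesky factorization of $K$ and reuse it.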
Coordinate Ascent Variational Inference
CAVI updates work exactly the same way as Gibbs sampling, except that we now work with posterior distributions instead of samples. We use the variational family $q(f)\prod_{i=1}^N q(\Omega_i)$, i.e. we make the mean-field assumption that the auxiliary variables are independent of each other and independent of $f$.
As for Gibbs sampling, there are 5 main functions:
AugmentedGPLikelihoods.init_aux_posterior — Function

init_aux_posterior([T::DataType], lik::Likelihood, n::Int) -> AbstractProductMeasure

Initialize collections of n (independent) posteriors for the auxiliary variables in the context of variational inference. n should be the size of the data used at every iteration. The real variational parameters will be given type T (Float64 by default).
See also init_aux_variables for sampling.
AugmentedGPLikelihoods.aux_posterior! — Function

aux_posterior!(qΩ, lik::Likelihood, y, qf) -> AbstractProductMeasure

Compute the optimal posterior of the auxiliary variables $q^*(\Omega)$ given the marginal distributions qf by updating the variational parameters in-place using the formula
\[ q^*(\Omega) \propto \exp\left(E_{q(f)}\left[\log p(\Omega|f,y)\right]\right).\]
See also aux_posterior and aux_sample!
AugmentedGPLikelihoods.aux_posterior — Function

aux_posterior(lik::Likelihood, y, qf) -> AbstractProductMeasure

Compute the optimal posterior of the auxiliary variables in a new AbstractProductMeasure (For by default).
See aux_posterior! for more details
AugmentedGPLikelihoods.expected_auglik_potential — Function

expected_auglik_potential(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected potential, $E_{q(\Omega)}[\beta(\Omega,y)]$. Note that this is equivalent to the shift of the first variational natural parameter $\eta_1 = \Sigma^{-1}\mu$. The Tuple contains a Vector for each latent.
See also auglik_potential for sampling.
AugmentedGPLikelihoods.expected_auglik_precision — Function

expected_auglik_precision(lik::Likelihood, qΩ, y) -> Tuple

Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected precision, $E_{q(\Omega)}[\gamma(\Omega,y)]$. Note that this is equivalent to the shift of the variational precision $\Lambda = \Sigma^{-1}$. The Tuple contains a Vector for each latent.
See also auglik_precision for sampling.
AugmentedGPLikelihoods.expected_auglik_potential_and_precision — Function

expected_auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}

Returns both expected_auglik_potential and expected_auglik_precision when some computation can be saved by doing both at the same time.
init_aux_posterior initializes the posterior $\prod_{i=1}^N q(\Omega_i)$ as a NamedTuple. aux_posterior! updates the variational posterior distributions in-place given the marginals $q(f_i)$ and returns the modified NamedTuple. To get a new NamedTuple every time, use aux_posterior. Finally, expected_auglik_potential and expected_auglik_precision give us the elements needed to update the variational distribution $q(f)$. As for Gibbs sampling, we have the following optimal variational distribution:
\[\begin{align*} q^*(f) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right) \end{align*}\]
where $\lambda$ is given by expected_auglik_precision and $h$ is given by expected_auglik_potential. Note that if you work with Sparse GPs, the updates should be:
\[\begin{align*} S =& \left(K_{Z}^{-1} + \kappa\operatorname{Diagonal}(\lambda)\kappa^\top\right)^{-1},\\ m =& S \left(\kappa h + K_Z^{-1}\mu_0(Z)\right), \end{align*}\]
where $\kappa=K_{Z}^{-1}K_{Z,X}$.
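As a rough sketch of how these pieces fit together in the non-sparse case (under the same toy setup and assumptions as the Gibbs example above, and additionally assuming the marginals $q(f_i)$ are passed as a vector of Normal distributions), a CAVI loop could look like:

```julia
using AugmentedGPLikelihoods
using Distributions, LinearAlgebra, Random
using GPLikelihoods, KernelFunctions  # assumed providers of BernoulliLikelihood and SqExponentialKernel

rng = MersenneTwister(0)
x = range(-3, 3; length=50)
y = rand(rng, Bool, length(x))                       # toy binary observations
lik = BernoulliLikelihood()
K = kernelmatrix(SqExponentialKernel(), x) + 1e-8I   # prior covariance (with jitter)
μ₀ = zeros(length(x))                                # prior mean

# CAVI: alternate between the optimal q*(Ω) and the optimal q*(f).
qΩ = init_aux_posterior(lik, length(x))
m, S = copy(μ₀), Matrix(K)                           # initialize q(f) = N(m, S)
for _ in 1:20
    qf = Normal.(m, sqrt.(diag(S)))                  # marginals q(f_i) (assumed input format)
    aux_posterior!(qΩ, lik, y, qf)                   # update q*(Ω) in-place
    h = only(expected_auglik_potential(lik, qΩ, y))  # E[h(Ω, y)]
    λ = only(expected_auglik_precision(lik, qΩ, y))  # E[λ(Ω, y)]
    S .= inv(Symmetric(inv(K) + Diagonal(λ)))        # q*(f) covariance
    m .= S * (h + K \ μ₀)                            # q*(f) mean
end
```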
Likelihood and ELBO computations
You might be interested in computing the ELBO, i.e. the lower bound on the evidence of the augmented model. The ELBO can be written as:
\[\begin{align*} \mathcal{L}(q(f,\Omega)) =& E_{q(f)q(\Omega)}\left[\log p(y,\Omega|f)\right] - \operatorname{KL}\left(q(f)||p(f)\right)\\ =& E_{q(f)q(\Omega)}\left[\log p(y|\Omega,f)\right] - \operatorname{KL}\left(q(\Omega)||p(\Omega)\right) - \operatorname{KL}\left(q(f)||p(f)\right). \end{align*}\]
To work with all these terms, AugmentedGPLikelihoods.jl provides a series of helper functions:
AugmentedGPLikelihoods.aug_loglik — Function

aug_loglik(lik::Likelihood, Ω, y, f) -> Real

Return the augmented log-likelihood with the given parameters. The augmented log-likelihood is of the form
\[ \log p(y,\Omega|f) = \log l(y,\Omega,f) + \log p(\Omega|y, f).\]
To obtain only the $p(\Omega|y, f)$ part, see aux_prior; see logtilt for $\log l(y, \Omega, f)$.
A generic fallback exists based on logtilt and aux_prior but specialized implementations are encouraged.
AugmentedGPLikelihoods.expected_aug_loglik — Function

expected_aug_loglik(lik::Likelihood, qΩ, y, qf) -> Real

Return the expected augmented log-likelihood with the given parameters. The expected augmented log-likelihood is of the form
\[ E_{q(\Omega,f)}\left[\log p(y,\Omega|f)\right]\]
To obtain only the $p(\Omega|y)$ part, see aux_prior; see logtilt for $\log l(y, \Omega, f)$.
A generic fallback exists based on expected_logtilt and aux_kldivergence but specialized implementations are encouraged.
AugmentedGPLikelihoods.aux_prior — Function

aux_prior(lik::Likelihood, y) -> AbstractProductMeasure

Returns a NamedTuple of distributions with the same structure as aux_posterior, init_aux_posterior and init_aux_variables. Note that an auxiliary prior is not always available. Even if the likelihood is a valid density, there is no guarantee that the prior of the augmented variable is!
AugmentedGPLikelihoods.aux_kldivergence — Function

aux_kldivergence(lik::Likelihood, qΩ::For, pΩ::For) -> Real
aux_kldivergence(lik::Likelihood, qΩ::For, y) -> Real

Compute the analytical KL divergence between the auxiliary variable posterior $q(\Omega)$, obtained with aux_posterior, and the prior $p(\Omega)$, obtained with aux_prior.
AugmentedGPLikelihoods.logtilt — Function

logtilt(lik::Likelihood, Ω, y, f) -> Real

Compute the quadratic part on $f$ of the augmented likelihood:
\[ \log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2.\]
See also expected_logtilt for variational inference.
AugmentedGPLikelihoods.expected_logtilt — Function

expected_logtilt(lik::Likelihood, qΩ, y, qf) -> Real

Compute the expectation of the quadratic part on $f$ of the augmented likelihood:
\[ E_{q(\Omega,f)}\left[\log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2\right].\]
See also logtilt.
aug_loglik returns the augmented log-likelihood $\log p(y,\Omega|f)$, but one should avoid using it, as computing $p(\Omega)$ can be expensive. aux_prior returns the prior on the auxiliary variables; note that it can depend on the observations $y$. logtilt returns the log of the exponential part of the augmented likelihood, which is conjugate with $f$. By default, aug_loglik is computed as logtilt + logpdf(aux_prior, Ω). Finally, for variational inference purposes, aux_kldivergence computes the KL divergence $\operatorname{KL}(q(\Omega)||p(\Omega))$ and expected_logtilt computes the expectation of logtilt analytically.
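Continuing the CAVI sketch above (reusing its lik, y, qΩ, m, S, K and μ₀), the augmented ELBO could be assembled as follows. This is a sketch: the Gaussian KL term $\operatorname{KL}(q(f)||p(f))$ is written out explicitly rather than taken from any particular library function.

```julia
using Distributions, LinearAlgebra  # Normal, logdet, tr, dot, diag

# Marginals of q(f) = N(m, S), in the same (assumed) format as in the CAVI sketch.
qf = Normal.(m, sqrt.(diag(S)))
n = length(y)

# KL(q(f) || p(f)) between N(m, S) and N(μ₀, K), written out explicitly.
kl_f = (logdet(K) - logdet(S) - n + tr(K \ S) + dot(m - μ₀, K \ (m - μ₀))) / 2

# ELBO = E[log p(y|Ω,f)] - KL(q(Ω)||p(Ω)) - KL(q(f)||p(f)).
elbo = expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) - kl_f
```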