AugmentedGPLikelihoods
Documentation for AugmentedGPLikelihoods.
Gaussian processes (GPs) are great tools for approximating functions while quantifying uncertainty. Beyond Gaussian regression, GPs can serve as latent functions for many other tasks such as non-Gaussian regression, classification, multi-class classification, or event counting. However, these tasks involve non-conjugate likelihoods, and the GP posterior is then intractable. The typical solution is to approximate the posterior by either sampling from it or approximating it with another distribution. However, both approaches are computationally involved, require gradients, and are not always guaranteed to converge.
The augmentation
An alternative proposed in [1] is to represent these non-conjugate likelihoods as scale-mixtures (sometimes requiring multiple steps) to obtain a conditionally conjugate likelihood. More concretely, some likelihoods can be written as:
\[ p(x) = \int q(x|\omega)d\pi(\omega).\]
One can then get rid of the integral by augmenting the model with the auxiliary variable $\omega$. With the right choice of mixture, one can obtain a likelihood conjugate with a GP $f$ but only when conditioned on $\omega$. This means we go from
\[ p(f|y) = \frac{p(y|f)p(f)}{p(y)},\]
which is intractable, to
\[ p(f,\omega|y) = \frac{p(y,\omega|f)p(f)}{p(y)},\]
where the conditionals $p(f|\omega,y)$ and $p(\omega|f,y)$ are known in closed-form.
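A classical instance of such a scale mixture, used here purely as an illustration, is the Pólya-Gamma augmentation of the logistic (Bernoulli) likelihood due to Polson et al., where for $y\in\{-1,1\}$
\[ \sigma(yf) = \frac{e^{yf/2}}{2\cosh(f/2)} = \frac{1}{2}\int_0^\infty \exp\left(\frac{yf}{2} - \frac{\omega f^2}{2}\right)\operatorname{PG}(\omega|1,0)\,\mathrm{d}\omega.\]
Conditioned on $\omega$, the integrand is proportional to a Gaussian in $f$, so the augmented likelihood becomes conjugate with the GP prior.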
Bayesian Inference
This new formulation leads to easier, more stable, and faster inference. A natural algorithm that follows from it is blocked Gibbs sampling, but the closed-form conditionals also enable the Coordinate Ascent VI (CAVI) algorithm, where they are used to compute the optimal variational distributions.
What this package does
Although an automatic way is proposed in [1], most of the augmentations require some hand derivations. This package provides the results of these derivations (as well as the derivations themselves) and proposes a unified framework to obtain the distributions of interest. In general, values of the auxiliary variables (either samples or distributions) need to be carried around. Since each likelihood can have a different structure of auxiliary variables, they are passed around uniformly as $\Omega$, a `NamedTuple` containing `Vector`s of either samples or distributions.
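As a purely illustrative sketch (the field name `ω` and the element types below are hypothetical and depend on the concrete likelihood), $\Omega$ could look like this:

```julia
using Distributions

# Hypothetical auxiliary-variable container for 3 data points.
# The field name (:ω) and element types depend on the concrete likelihood.
Ω  = (ω = [0.7, 1.2, 0.4],)                 # one auxiliary sample per data point (Gibbs)
qΩ = (ω = [Gamma(2.0, 1.0) for _ in 1:3],)  # one distribution per data point (CAVI)
```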
Gibbs Sampling
We give an example in the Gibbs Sampling tutorial using `AbstractMCMC`, but the API can be reduced to 5 main functions:
AugmentedGPLikelihoods.init_aux_variables — Function
init_aux_variables([rng::AbstractRNG], ::Likelihood, n::Int) -> TupleVector
Initialize collections of `n` auxiliary variables in a `TupleVector` to be used in the context of sampling. `n` should be the number of data inputs.
See also `init_aux_posterior` for variational inference.
AugmentedGPLikelihoods.aux_sample! — Function
aux_sample!([rng::AbstractRNG], Ω, lik::Likelihood, y, f) -> TupleVector
Sample the auxiliary variables `Ω` in-place based on the full conditional `aux_full_conditional` associated with the augmented likelihood:
\[ p(\Omega|y,f) \propto p(\Omega,y|f).\]
See also `aux_sample` for an allocating version and `aux_posterior!` for variational inference.
AugmentedGPLikelihoods.aux_sample — Function
aux_sample([rng::AbstractRNG], lik::Likelihood, y, f) -> TupleVector
Sample and allocate the auxiliary variables `Ω` in a `TupleVector` based on the full conditional associated with the likelihood.
See `aux_sample!` for an in-place version.
AugmentedGPLikelihoods.aux_full_conditional — Function
aux_full_conditional(lik::Likelihood, y, f::Real) -> Distribution
Given the observation `y` and latent `f`, return the full conditional on the auxiliary variables `Ω`.
AugmentedGPLikelihoods.auglik_potential — Function
auglik_potential(lik::Likelihood, Ω, y) -> Tuple
Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the potential, $\beta(\Omega,y)$. Note that this is equivalent to the shift of the first natural parameter $\eta_1 = \Sigma^{-1}\mu$. The `Tuple` contains a `Vector` for each latent.
See also `expected_auglik_potential` for variational inference.
AugmentedGPLikelihoods.auglik_precision — Function
auglik_precision(lik::Likelihood, Ω, y) -> Tuple
Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the precision, $\gamma(\Omega,y)$. Note that this is equivalent to the shift of the precision $\Lambda = \Sigma^{-1}$. The `Tuple` contains a `Vector` for each latent.
See also `expected_auglik_precision` for variational inference.
AugmentedGPLikelihoods.auglik_potential_and_precision — Function
auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}
Return both `auglik_potential` and `auglik_precision` when some computation can be saved by doing both at the same time.
First, `init_aux_variables` initializes a `NamedTuple` of `Vector`(s) of auxiliary variables to be modified in-place during sampling. `aux_sample!` then samples the auxiliary variables from the full conditional $p(\Omega|y,f)$, given by `aux_full_conditional`, and returns the modified `NamedTuple`. To generate a new `NamedTuple` every time, see `aux_sample`. The full conditional of $f$ is given by
\[\begin{align*} p(f|y,\Omega) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right), \end{align*}\]
where $\lambda$ is obtained via `auglik_precision` and $h$ is obtained via `auglik_potential`. For likelihoods requiring multiple latent GPs (e.g. multi-class classification or heteroscedastic likelihoods), `auglik_potential` and `auglik_precision` return a `Tuple` containing the respective `Vector`s $h$ and $\lambda$, one per latent.
As a general rule, the augmented likelihood will have the form
\[ p(y|f,\Omega) \propto \exp\left(h(\Omega,y)f + \frac{\lambda(\Omega,y)}{2}f^2\right),\]
with `auglik_potential` $\equiv h(\Omega,y)$ and `auglik_precision` $\equiv \lambda(\Omega,y)$.
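To make the workflow concrete, here is a minimal sketch of one blocked Gibbs step for a single latent GP. It is only an illustration (see the AbstractMCMC tutorial for a complete sampler) and assumes a likelihood `lik`, observations `y`, a kernel matrix `K`, and a prior mean vector `μ₀` are already defined.

```julia
using AugmentedGPLikelihoods, Distributions, LinearAlgebra, Random

# One blocked Gibbs step: sample Ω | y, f, then f | y, Ω (single latent GP).
function gibbs_step!(rng::AbstractRNG, Ω, lik, y, f, K, μ₀)
    aux_sample!(rng, Ω, lik, y, f)            # update the auxiliary variables in-place
    λ = only(auglik_precision(lik, Ω, y))     # one Vector per latent; here a single latent
    h = only(auglik_potential(lik, Ω, y))
    S = inv(Symmetric(inv(K) + Diagonal(λ)))  # covariance of p(f | y, Ω)
    m = S * (h + K \ μ₀)                      # mean of p(f | y, Ω)
    return rand(rng, MvNormal(m, S))          # draw a new latent sample f
end

# Usage sketch (lik, y, K, μ₀ and n_samples assumed defined):
# rng = Random.default_rng()
# Ω = init_aux_variables(rng, lik, length(y))
# f = rand(rng, MvNormal(μ₀, K))
# for _ in 1:n_samples
#     f = gibbs_step!(rng, Ω, lik, y, f, K, μ₀)   # store f to build the chain
# end
```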
Coordinate Ascent Variational Inference
CAVI updates work exactly the same way as the Gibbs Sampling, except we are now working with posterior distributions instead of samples. We work with the variational family $q(f)\prod_{i=1}^N q(\Omega_i)$, i.e. we make the mean-field assumption that our new auxiliary variables are independent of each other and independent of $f$.
As with Gibbs Sampling, there are also 5 main functions:
AugmentedGPLikelihoods.init_aux_posterior — Function
init_aux_posterior([T::DataType], lik::Likelihood, n::Int) -> AbstractProductMeasure
Initialize collections of `n` (independent) posteriors for the auxiliary variables in the context of variational inference. `n` should be the size of the data used at every iteration. The real-valued variational parameters will be given type `T` (`Float64` by default).
See also `init_aux_variables` for sampling.
AugmentedGPLikelihoods.aux_posterior! — Function
aux_posterior!(qΩ, lik::Likelihood, y, qf) -> AbstractProductMeasure
Compute the optimal posterior of the auxiliary variables $q^*(\Omega)$ given the marginal distributions `qf` by updating the variational parameters in-place, using the formula
\[ q^*(\Omega) \propto \exp\left(E_{q(f)}\left[\log p(\Omega|f,y)\right]\right).\]
See also `aux_posterior` and `aux_sample!`.
AugmentedGPLikelihoods.aux_posterior — Function
aux_posterior(lik::Likelihood, y, qf) -> AbstractProductMeasure
Compute the optimal posterior of the auxiliary variables in a new `AbstractProductMeasure` (`For` by default).
See `aux_posterior!` for more details.
AugmentedGPLikelihoods.expected_auglik_potential — Function
expected_auglik_potential(lik::Likelihood, qΩ, y) -> Tuple
Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected potential, $E_{q(\Omega)}[\beta(\Omega,y)]$. Note that this is equivalent to the shift of the first variational natural parameter $\eta_1 = \Sigma^{-1}\mu$. The `Tuple` contains a `Vector` for each latent.
See also `auglik_potential` for sampling.
AugmentedGPLikelihoods.expected_auglik_precision — Function
expected_auglik_precision(lik::Likelihood, qΩ, y) -> Tuple
Given the augmented likelihood $l(\Omega,y,f) \propto \exp(\beta(\Omega,y) f + \frac{\gamma(\Omega,y)}{2}f^2)$, return the expected precision, $E_{q(\Omega)}[\gamma(\Omega,y)]$. Note that this is equivalent to the shift of the variational precision $\Lambda = \Sigma^{-1}$. The `Tuple` contains a `Vector` for each latent.
See also `auglik_precision` for sampling.
AugmentedGPLikelihoods.expected_auglik_potential_and_precision — Function
expected_auglik_potential_and_precision(lik::Likelihood, Ω, y) -> Tuple{Tuple, Tuple}
Return both `expected_auglik_potential` and `expected_auglik_precision` when some computation can be saved by doing both at the same time.
`init_aux_posterior` initializes the posterior $\prod_{i=1}^N q(\Omega_i)$ as a `NamedTuple`. `aux_posterior!` updates the variational posterior distributions in-place given the marginals $q(f_i)$ and returns the modified `NamedTuple`. To get a new `NamedTuple` every time, use `aux_posterior`. Finally, `expected_auglik_potential` and `expected_auglik_precision` give us the elements needed to update the variational distribution $q(f)$. As with Gibbs Sampling, we obtain the following optimal variational distribution:
\[\begin{align*} q^*(f) =& \mathcal{N}(f|m,S)\\ S =& \left(K^{-1} + \operatorname{Diagonal}(\lambda)\right)^{-1}\\ m =& S \left(h + K^{-1}\mu_0\right) \end{align*}\]
where $\lambda$ is given by `expected_auglik_precision` and $h$ is given by `expected_auglik_potential`. Note that if you work with sparse GPs, the updates become
\[\begin{align*} S =& \left(K_{Z}^{-1} + \kappa\operatorname{Diagonal}(\lambda)\kappa^\top\right)^{-1},\\ m =& S \left(\kappa h + K_Z^{-1}\mu_0(Z)\right), \end{align*}\]
where $\kappa=K_{Z}^{-1}K_{Z,X}$.
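Analogously, here is a minimal CAVI sketch for a single latent GP in the non-sparse case. It assumes `lik`, `y`, `K`, and `μ₀` are already defined, and that the marginals of $q(f)$ can be passed to `aux_posterior!` as a vector of `Normal` distributions; see the package tutorials for the canonical setup.

```julia
using AugmentedGPLikelihoods, Distributions, LinearAlgebra

# One CAVI sweep: update q*(Ω) given q(f), then q(f) given q(Ω) (single latent GP).
function cavi_step!(qΩ, lik, y, m, S, K, μ₀)
    qf = Normal.(m, sqrt.(diag(S)))                 # marginals q(fᵢ) of the current q(f)
    qΩ = aux_posterior!(qΩ, lik, y, qf)             # optimal q*(Ω), updated in-place
    λ = only(expected_auglik_precision(lik, qΩ, y))
    h = only(expected_auglik_potential(lik, qΩ, y))
    S = inv(Symmetric(inv(K) + Diagonal(λ)))        # new covariance of q(f)
    m = S * (h + K \ μ₀)                            # new mean of q(f)
    return qΩ, m, S
end

# Usage sketch (lik, y, K, μ₀ and n_iter assumed defined):
# qΩ = init_aux_posterior(lik, length(y))
# m, S = copy(μ₀), Matrix(K)
# for _ in 1:n_iter
#     qΩ, m, S = cavi_step!(qΩ, lik, y, m, S, K, μ₀)
# end
```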
Likelihood and ELBO computations
You might be interested in computing the ELBO, i.e. the lower bound on the evidence of the augmented model. The ELBO can be written as:
\[\begin{align*} \mathcal{L}(q(f,\Omega)) =& E_{q(f)q(\Omega)}\left[\log p(y,\Omega|f)\right] + \operatorname{H}\left(q(\Omega)\right) - \operatorname{KL}\left(q(f)||p(f)\right)\\ =& E_{q(f)q(\Omega)}\left[\log p(y|\Omega,f)\right] - \operatorname{KL}\left(q(\Omega)||p(\Omega)\right) -\operatorname{KL}\left(q(f)||p(f)\right) \end{align*}\]
To work with all these terms, AugmentedGPLikelihoods.jl provides a series of helper functions:
AugmentedGPLikelihoods.aug_loglik — Function
aug_loglik(lik::Likelihood, Ω, y, f) -> Real
Return the augmented log-likelihood with the given parameters. The augmented log-likelihood is of the form
\[ \log p(y,\Omega|f) = \log l(y,\Omega,f) + \log p(\Omega|y, f).\]
To only obtain the $p(\Omega|y, f)$ part, see `aux_prior`, and see `logtilt` for $\log l(y, \Omega, f)$.
A generic fallback exists based on `logtilt` and `aux_prior`, but specialized implementations are encouraged.
AugmentedGPLikelihoods.expected_aug_loglik — Function
expected_aug_loglik(lik::Likelihood, qΩ, y, qf) -> Real
Return the expected augmented log-likelihood with the given parameters. The expected augmented log-likelihood is of the form
\[ E_{q(\Omega,f)}\left[\log p(y,\Omega|f)\right].\]
To only obtain the $p(\Omega|y)$ part, see `aux_prior`, and see `logtilt` for $\log l(y, \Omega, f)$.
A generic fallback exists based on `expected_logtilt` and `aux_kldivergence`, but specialized implementations are encouraged.
AugmentedGPLikelihoods.aux_prior — Function
aux_prior(lik::Likelihood, y) -> AbstractProductMeasure
Return a `NamedTuple` of distributions with the same structure as `aux_posterior`, `init_aux_posterior` and `init_aux_variables`. Note that an auxiliary prior is not always available. Even if the likelihood is a valid density, there is no guarantee that the prior of the augmented variable is!
AugmentedGPLikelihoods.aux_kldivergence — Function
aux_kldivergence(lik::Likelihood, qΩ::For, pΩ::For) -> Real
aux_kldivergence(lik::Likelihood, qΩ::For, y) -> Real
Compute the analytical KL divergence between the auxiliary variables posterior $q(\Omega)$, obtained with `aux_posterior`, and prior $p(\Omega)$, obtained with `aux_prior`.
AugmentedGPLikelihoods.logtilt — Function
logtilt(lik::Likelihood, Ω, y, f) -> Real
Compute the quadratic part on $f$ of the augmented likelihood:
\[ \log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2.\]
See also `expected_logtilt` for variational inference.
AugmentedGPLikelihoods.expected_logtilt — Function
expected_logtilt(lik::Likelihood, qΩ, y, qf) -> Real
Compute the expectation of the quadratic part on $f$ of the augmented likelihood:
\[ E_{q(\Omega,f)}\left[\log C(\Omega, y) + \alpha(\Omega,y) + \beta(\Omega,y)f + \frac{\gamma(\Omega,y)}{2}f^2\right].\]
See also `logtilt`.
`aug_loglik` returns the augmented log-likelihood $\log p(y,\Omega|f)$, but one should avoid using it, as computing $p(\Omega)$ can be expensive. `aux_prior` returns the prior on the auxiliary variables; note that it can depend on the observations $y$. `logtilt` returns the log of the exponential part of the augmented likelihood, which is conjugate with $f$. By default, `aug_loglik` is computed as `logtilt` plus the `logpdf` of `aux_prior`. Finally, for variational inference purposes, `aux_kldivergence` computes the KL divergence $\operatorname{KL}(q(\Omega)||p(\Omega))$ and `expected_logtilt` computes the expectation of `logtilt` analytically.
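As an illustration of how these helpers fit together, here is a minimal ELBO sketch for a single latent GP in the non-sparse case. It assumes `lik`, `y`, `qΩ` (from `aux_posterior`), the variational parameters `m` and `S`, the kernel matrix `K`, and the prior mean `μ₀` are already defined, and again that the marginals of $q(f)$ can be passed as a vector of `Normal`s.

```julia
using AugmentedGPLikelihoods, Distributions, LinearAlgebra

# ELBO = E_q[log l(y, Ω, f)] - KL(q(Ω) || p(Ω)) - KL(q(f) || p(f))  (single latent GP)
function elbo_sketch(lik, y, qΩ, m, S, K, μ₀)
    qf = Normal.(m, sqrt.(diag(S)))                # marginals of q(f)
    exp_logtilt = expected_logtilt(lik, qΩ, y, qf) # E_q[log l(y, Ω, f)]
    kl_Ω = aux_kldivergence(lik, qΩ, y)            # KL(q(Ω) || p(Ω))
    # Closed-form Gaussian KL(q(f) || p(f)) with q(f) = N(m, S) and p(f) = N(μ₀, K):
    kl_f = (tr(K \ S) + dot(m - μ₀, K \ (m - μ₀)) - length(m) + logdet(K) - logdet(S)) / 2
    return exp_logtilt - kl_Ω - kl_f
end
```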