Heteroscedastic Gaussian (Inverse Scaled Logistic Link)

The HeteroscedasticGaussian with an inverse scaled logistic link is defined as

\[p(y|f, g, \lambda) = \frac{\sqrt{\lambda \sigma(g)}}{2\pi}\exp\left(-\frac{\lambda\sigma(g)}{2}\left(y - f\right)^2\right).\]

The augmentation

The reference for the augmentation is in [2].

\[\begin{align*} p(y|f, g, \lambda) =& \frac{\sqrt{\lambda \sigma(g)}}{2\pi}\exp\left(-\frac{\lambda\sigma(g)}{2}\left(y - f\right)^2\right),\\ =& \frac{\sqrt{\lambda \sigma(g)}}{2\pi}\exp\left(\left(\sigma(-g) - 1\right)\frac{\lambda}{2}\left(y - f\right)^2\right). \end{align*}\]

We use the identity with the Moment Generating Function of the Poisson distribution.

\[\begin{align*} p(y|f, g, \lambda) =& \frac{\sqrt{\lambda \sigma(g)}}{2\pi}\sum_{n=0}^\infty \sigma^{-n}(-g)\operatorname{Po}\left(n|\frac{\lambda}{2}\left(y - f\right)^2\right), =& \left(\frac{\lambda}{2\pi}\right)^{-\frac{1}{2}} \end{align*}\]

We augment with the variable $n$.

\[\begin{align*} p(y, n|f, g, \lambda) =& \frac{\sqrt{\lambda \sigma(g)}}{2\pi}\sigma^{-n}(-g)\operatorname{Po}\left(n|\frac{\lambda}{2}\left(y - f\right)^2\right)\\ =& \left(\frac{\lambda}{2\pi}\right)^{-\frac{1}{2}}\sigma^{\frac{1}{2}}(g)\sigma^n(-g)\operatorname{Po}\left(n|\frac{\lambda}{2}\left(y - f\right)^2\right). \end{align*}\]

We finally augment the part $\sigma^{\frac{1}{2}}(g)\sigma^n(-g)$ with the Pólya-Gamma augmentation.

\[ \sigma^{\frac{1}{2}}(g)\sigma^n(-g) = 2^{\frac{1}{2} + n}\int_0^\infty\exp\left(\frac{1}{2}\left(\left(\frac{1}{2} - n\right)g - \omega g^2\right)\right)\operatorname{PG}\left(\omega|\frac{1}{2} + n, 0\right)d\omega.\]

This gives use the final augmented likelihood:

\[ p(y,n,\omega|f,g,\lambda) = \left(\frac{\lambda}{2\pi}\right)^{-\frac{1}{2}}2^{\frac{1}{2} + n}\exp\left(\frac{1}{2}\left(\left(\frac{1}{2} - n\right)g - \omega g^2\right)\right)p(\omega|n)p(n|f,y,\lambda).\]

where $p(\omega|n)=\operatorname{PG}(\omega|\frac{1}{2} + n, 0)$ and $p(n|f,y,\lambda) = \operatorname{Po}\left(n|\frac{\lambda}{2}\left(y - f\right)^2\right)$.

Conditional distributions (Sampling)

Let's start with the augmented variables $n$ and $\omega$. The full conditional is given by:

\[ p(\omega,n|f,g,\lambda,y) = \operatorname{PG}\left(\omega|\frac{1}{2}+n,|g|\right)\operatorname{Po}\left(n|\frac{\lambda\sigma(g)}{2}(y-f)^2\right).\]

The full conditional of $\boldsymbol{g}$ is $p(\boldsymbol{g}|\boldsymbol{\omega}, \boldsymbol{n}) = \mathcal{N}\left(\boldsymbol{g}|\boldsymbol{\mu},\Sigma_g\right)$ where

\[\begin{align*} \Sigma_g =& \left(K_g^{-1} + \mathrm{Diagonal}(\boldsymbol{\omega})\right)^{-1},\\ \boldsymbol{\mu}_g =& \Sigma_g \left(K_g^{-1}\boldsymbol{\mu}_0 + \frac{1}{2}\left(\frac{1}{2} - \boldsymbol{n}\right)\right). \end{align*}\]

There is no closed-form solution for the full-conditional of $\boldsymbol{f}$ but we can compute the collapsed full-conditional, i.e. we take the full-conditional of $\boldsymbol{f}$ and marginalize out $\omega$ and $n$. The conditional is given by $p(\boldsymbol{f}|\boldsymbol{g},\lambda,\boldsymbol{y}) = \mathcal{N}(\boldsymbol{f}|\boldsymbol{\mu}_f,\Sigma_f)$ where

\[\begin{align*} \Sigma_f =& \left(K_f^{-1} + \mathrm{Diagonal}(\lambda \sigma(\boldsymbol{g}))\right)^{-1},\\ \boldsymbol{\mu}_f =& \Sigma_g \left(K_g^{-1}\boldsymbol{\mu}_0 + \frac{\lambda\sigma(\boldsymbol{g})}{2}\boldsymbol{y}\right). \end{align*}\]

We can also treat $\lambda$ probabilistically by adding a Gamma prior $p(\lambda) = \mathrm{\Gamma}(\alpha, \beta)$. The (collapsed) full conditional is then given by

\[ p(\lambda|\boldsymbol{f}, \boldsymbol{g}, \boldsymbol{y}) = \mathrm{Gamma}\left(\lambda|\alpha + \frac{N}{2}, \beta + \sum_{i=1}^N \frac{\sigma(g_i)}{2}(y_i - f_i)^2\right).\]

Variational distributions (Variational Inference)

We define the variational distribution with a block mean-field approximation:

\[ q(\boldsymbol{f}, \boldsymbol{g}, \boldsymbol{n}, \boldsymbol{\omega},\lambda) = \mathcal{N}(\boldsymbol{f}|\boldsymbol{m}_f,\boldsymbol{S}_f)\mathcal{N}(\boldsymbol{g}|\boldsymbol{m}_g,\boldsymbol{S}_g)\prod_{i=1}^N \operatorname{PG}\left(\omega_i|\frac{1}{2} + n_i|c_i\right)\operatorname{Po}(n_i|\gamma_i).\]

The CAVI updates are more complicated to get here. It's not possible to obtain the optimal parameters for $q(\boldsymbol{f})$ since the full-conditional are not available in closed form. We resort instead to a double bound approach described in a document soon to be published (see #97).

The optimal variational parameters are given by:

\[\begin{align*} c^i =& \sqrt{(m^i_g)^2 + S^{ii}_g},\\ \gamma^i =& \frac{\lambda}{2}\psi^i \sqrt{(y^i - m_f^i)^2 + S_f^{ii}} \\ \boldsymbol{S}_g =& \left(K^{-1} + \operatorname{Diagonal}(\boldsymbol{\theta})\right)^{-1},\\ \boldsymbol{m}_g =& \boldsymbol{S}_g\left(\frac{\frac{1}{2} - \boldsymbol{\gamma}}{2} + K_g^{-1}\mu_0\right), \end{align*}\]

where

\[\begin{align*} \psi^i = \frac{e^{-m_g^i/2}}{\sqrt{(m_g^i)^2 + S_g^{ii}}/2} \end{align*}\]

For $\boldsymbol{f}$ the optimal parameters are:

\[\begin{align*} \boldsymbol{S}_f =& \left(K^{-1} + \lambda\operatorname{Diagonal}(1 - \boldsymbol{\psi})\right)^{-1},\\ \boldsymbol{m}_f =& \boldsymbol{S}_f\left(\lambda \mathrm{Diagonal}(1 - \boldsymbol{\psi})\boldsymbol{y} + K_g^{-1}\mu_0\right), \end{align*}\]

As stated earlier the ELBO is made in two steps and is currently not implemented.

We get the ELBO as

\[\begin{align*} \mathcal{L} =& \sum_{i=1}^N - (\frac{1}{2} + \gamma^i_j) \log 2 + \frac{(\frac{1}{2} - \gamma^i) m^i_g}{2} - \frac{(m^i_g)^2 + S_g^{ii}}{2}\theta^i\\ &- \operatorname{KL}(q(\boldsymbol{\omega},\boldsymbol{n})||p(\boldsymbol{\omega},\boldsymbol{n}|\boldsymbol{y})) - \operatorname{KL}(q(\boldsymbol{f,g})||p(\boldsymbol{f,g})), \end{align*}\]

where

\[\begin{align*} \operatorname{KL}(\prod_{j} q(\omega^i_j|n^i_j)q(\bm n^i)||\prod_j p(\omega^i_j|\bm n^i, \boldsymbol{y}^i)p(\bm n^i)) =& \operatorname{KL}(q(\bm n^i)||p(\bm n^i)) + \sum_j E_{q(\bm n^i)}\left[\operatorname{KL}(q(\omega^i_j|\bm n^i)||p(\omega^i_j|\bm n^i, \boldsymbol{y}^i)\right],\\ E_{q(\bm \omega^i,\bm n^i)}\left[\operatorname{KL}(q(\omega^i_j|\bm n^i)||p(\omega^i_j|n^i_j, y^i)\right] =& (y^i + \gamma^i_j)\log\cosh \left(\frac{c^i_j}{2}\right) - (c^i_j)^2 \frac{\theta^i_j}{2}. \end{align*}\]