Sampling a GP Prior



Load required packages

using KernelSpectralDensities
using AbstractGPs
using StatsBase
using LinearAlgebra
using CairoMakie

Intro

We use the AbstractGPs package to define a GP prior with a stationary kernel; a prior here simply means a GP that has not yet been conditioned on data.

ker = SqExponentialKernel()    # squared exponential (RBF) kernel
S = SpectralDensity(ker, 1)    # its spectral density, for 1 input dimension

gp = GP(ker)
AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.SqExponentialKernel{Distances.Euclidean}}(AbstractGPs.ZeroMean{Float64}(), Squared Exponential Kernel (metric = Distances.Euclidean(0.0)))

Naive Sampling

If we want to draw a sample from the GP prior, the standard way is to use the Cholesky decomposition of the kernel matrix.

In this example, we want to sample the GP at the following points

x_sample = range(0, 2; length=5)
0.0:0.5:2.0

To sample, we calculate the mean and covariance of the GP at these points. We use the AbstractGPs interface for this, but in this case the mean is simply a zero vector and the covariance is the kernel matrix evaluated at the sample points.

m = mean(gp, x_sample)
Zeros(5)
K = cov(gp, x_sample)
5×5 Matrix{Float64}:
 1.0       0.882497  0.606531  0.324652  0.135335
 0.882497  1.0       0.882497  0.606531  0.324652
 0.606531  0.882497  1.0       0.882497  0.606531
 0.324652  0.606531  0.882497  1.0       0.882497
 0.135335  0.324652  0.606531  0.882497  1.0

We then compute the Cholesky decomposition of the covariance matrix, sample a vector of standard normal random variables, and combine the two to obtain a sample from the GP prior.
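
In equation form, with the Cholesky factorization $K = LL^\top$, a prior sample is obtained as

\[ f = m + L\zeta, \qquad \zeta \sim \mathcal{N}(0, I)\]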

function naive_sample(gp, x_sample)
    m = mean(gp, x_sample)             # prior mean at the sample points
    K = cov(gp, x_sample)              # kernel matrix at the sample points
    Kc = cholesky(K).L                 # lower-triangular Cholesky factor
    ζ = randn(length(x_sample))        # standard normal random vector
    return m .+ Kc * ζ                 # sample from the prior
end

ys = naive_sample(gp, x_sample)
5-element Vector{Float64}:
 -0.497880456538927
 -0.5662818332870188
 -0.8454795005912243
 -1.1149744375484971
 -0.8091910946072824

To illustrate, we plot a few samples

x_plot = range(0, 1; length=10)
n_samples = 7
ys_plot = [naive_sample(gp, x_plot) for _ in 1:n_samples]

f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="x", ylabel="y", title="Naive Sampling")
series!(ax, x_plot, reduce(hcat, ys_plot)'; labels=["sample $i" for i in 1:n_samples])
f

To evaluate the samples, we define the following function

function evaluate_samples(y_sample, m, K)
    ms = mean(y_sample)                   # empirical mean over the samples
    merr = norm(m .- ms)                  # deviation from the analytical mean
    cs = cov(y_sample)                    # empirical covariance over the samples
    cerr = norm(K .- cs)                  # deviation from the analytical covariance
    print("Mean error: $merr, Covariance error: $cerr\n")
    return ms, cs
end

For the small number of samples we have, the empirical mean and covariance are still far from the analytical values.

y_sample = [naive_sample(gp, x_sample) for _ in 1:n_samples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 0.9863109755153893, Covariance error: 2.8096375521957353
ms
5-element Vector{Float64}:
 0.10981562476406978
 0.44031083678823524
 0.6171010984970332
 0.5265149839818042
 0.32991581026533917
cs
5×5 Matrix{Float64}:
  1.31714    0.941102   0.307375  -0.305176   -0.751424
  0.941102   0.777541   0.388235  -0.0304563  -0.39374
  0.307375   0.388235   0.501834   0.608555    0.535152
 -0.305176  -0.0304563  0.608555   1.31275     1.59242
 -0.751424  -0.39374    0.535152   1.59242     2.15999

If we sample many more functions, however, we get closer to the analytical result

n_manysamples = 1000
y_sample = [naive_sample(gp, x_sample) for _ in 1:n_manysamples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 0.09112083662239238, Covariance error: 0.08574644006897204
ms
5-element Vector{Float64}:
 -0.052207044811031666
 -0.05962241637332068
 -0.0415980364180494
 -0.01214127943625551
  0.01203293415935746
cs
5×5 Matrix{Float64}:
 1.00385   0.873163  0.610198  0.340433  0.144414
 0.873163  0.97661   0.875248  0.621779  0.347945
 0.610198  0.875248  1.00342   0.904101  0.636395
 0.340433  0.621779  0.904101  1.02244   0.892318
 0.144414  0.347945  0.636395  0.892318  0.970916

However, there are two issues with this approach:

  1. It is quite computationally expensive, since we need to calculate the Cholesky decomposition, which scales cubically with the number of sample points.
  2. Sampling at a larger number of points can cause conditioning issues, as we show below.
x_sample_many = range(0, 2; length=20)
try
    naive_sample(gp, x_sample_many)
catch err
    showerror(stderr, err)
end
PosDefException: matrix is not positive definite; Factorization failed.

RFF Sampling

Random Fourier features (RFF) offer an alternative way to sample from the GP prior. Instead of computing the Cholesky decomposition of the kernel matrix, we draw a number of Fourier features and generate samples from the GP as a weighted sum of these features.

\[ f(x) = \sum_{i=1}^l w_i \varphi_i(x)\]

The weights $w_i$ are sampled from a standard normal distribution.
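
To make this concrete, here is a minimal, self-contained sketch of the classic random Fourier feature construction for the squared exponential kernel, drawing the frequencies directly from its spectral density (a standard normal for unit lengthscale). The names below are purely illustrative, and the DoubleRFF features used by the package may be constructed differently internally.

l = 10                                    # number of features
ω = randn(l)                              # frequencies drawn from the SE spectral density
b = 2π .* rand(l)                         # random phases, uniform on [0, 2π)
w = randn(l)                              # weights wᵢ ~ N(0, 1)
φ(x) = sqrt(2 / l) .* cos.(ω .* x .+ b)   # feature vector φ(x)
f_rff(x) = dot(w, φ(x))                   # one approximate GP prior sample
f_rff.(x_sample)

In expectation over the random frequencies and phases, $\varphi(x)^\top \varphi(x')$ equals $k(x, x')$, which is why the weighted sum has approximately the right prior covariance. With KernelSpectralDensities we can do the same through DoubleRFF and ApproximateGPSample: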

rff = DoubleRFF(S, 10)              # 10 random Fourier features from the spectral density
agps = ApproximateGPSample(rff)     # one approximate sample from the GP prior
agps.(x_sample)
5-element Vector{Float64}:
  1.2810295438770924
  1.1162140294256733
  0.7770627207927714
  0.34160330399027394
 -0.07134896426884496

We can plot the samples as before

n_samples = 7
ys_plot = [ApproximateGPSample(rff).(x_plot) for _ in 1:n_samples]

f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="x", ylabel="y", title="RFF Sampling")
series!(ax, x_plot, reduce(hcat, ys_plot)'; labels=["sample $i" for i in 1:n_samples])
f

Unfortunately, the finite number of features introduces an additional approximation error on top of the Monte Carlo error, so for the same number of samples the estimated mean and covariance are generally worse than with naive sampling.

y_sample = [ApproximateGPSample(rff).(x_sample) for _ in 1:n_samples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 0.4703725051165074, Covariance error: 1.7487573694832248
ms
5-element Vector{Float64}:
 -0.2916599272198823
 -0.2788164310088815
 -0.20907095337003995
 -0.11527850400740348
 -0.038031318592695
cs
5×5 Matrix{Float64}:
 0.728927  0.606082  0.467075  0.381741  0.395026
 0.606082  0.562672  0.477294  0.397215  0.365684
 0.467075  0.477294  0.445296  0.395448  0.358293
 0.381741  0.397215  0.395448  0.389309  0.39238
 0.395026  0.365684  0.358293  0.39238   0.468779

However, we now have another parameter to tune: the number of features. By increasing the number of features, we get closer to the result we saw with naive sampling.

rff500 = DoubleRFF(S, 500)
y_sample = [ApproximateGPSample(rff500).(x_sample) for _ in 1:n_samples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 1.535041408116047, Covariance error: 2.118621506946047
ms
5-element Vector{Float64}:
 0.9863249796239554
 0.9531885294759388
 0.629955041273104
 0.2548571509962441
 0.11467896510205029
cs
5×5 Matrix{Float64}:
 0.986024   1.06929   0.789453  0.389075  0.0986448
 1.06929    1.59471   1.51785   1.12985   0.812884
 0.789453   1.51785   1.65726   1.42097   1.15974
 0.389075   1.12985   1.42097   1.38114   1.25007
 0.0986448  0.812884  1.15974   1.25007   1.34014

By increasing the number of GP samples, we can again improve the results in both cases.

With 10 feature functions

y_sample = [ApproximateGPSample(rff).(x_sample) for _ in 1:n_manysamples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 0.13897612660738592, Covariance error: 1.215132941499786
ms
5-element Vector{Float64}:
 0.058474311019376266
 0.06276453528159073
 0.06537066457917035
 0.06444674734720866
 0.05940559567355604
cs
5×5 Matrix{Float64}:
 1.0424    1.00334   0.876581  0.712909  0.575138
 1.00334   1.0641    1.01373   0.874083  0.701494
 0.876581  1.01373   1.05605   0.989184  0.841873
 0.712909  0.874083  0.989184  1.01591   0.946257
 0.575138  0.701494  0.841873  0.946257  0.976566

With 500 feature functions

y_sample = [ApproximateGPSample(rff500).(x_sample) for _ in 1:n_manysamples]
ms, cs = evaluate_samples(y_sample, m, K);
Mean error: 0.05521427949286809, Covariance error: 0.1275305259448162
ms
5-element Vector{Float64}:
  0.0365486212262885
  0.03869196883232553
  0.012671806682106273
 -0.006790936547433723
  0.003009151121538044
cs
5×5 Matrix{Float64}:
 0.999612  0.874209  0.573214  0.28624   0.118205
 0.874209  1.02554   0.892758  0.579912  0.292827
 0.573214  0.892758  1.02792   0.880059  0.566414
 0.28624   0.579912  0.880059  1.00106   0.853468
 0.118205  0.292827  0.566414  0.853468  0.980865
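
To explore this tradeoff more systematically, one could sweep the number of features while keeping the number of GP samples large, so that Monte Carlo noise does not dominate the comparison. A rough sketch using the functions defined above (n_feat is just an illustrative name):

for n_feat in (10, 50, 200, 500)
    rff_nf = DoubleRFF(S, n_feat)
    ys_nf = [ApproximateGPSample(rff_nf).(x_sample) for _ in 1:n_manysamples]
    println("Features: $n_feat")
    evaluate_samples(ys_nf, m, K)
end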

Lastly, we note that we no longer have to worry about conditioning issues and can evaluate a given GP sample at as many points as we like

ApproximateGPSample(rff).(x_sample_many)
20-element Vector{Float64}:
 -1.6138981711044387
 -1.6110727561309952
 -1.5941675481835103
 -1.5626196248640434
 -1.5161398429978141
 -1.4547193827504028
 -1.3786307593985894
 -1.2884232818760128
 -1.184913047146441
 -1.0691676676267423
 -0.9424860331470348
 -0.8063735072324709
 -0.6625130478735763
 -0.5127328236304161
 -0.3589709652927843
 -0.20323815002041867
 -0.04757875779138393
  0.10596863175635762
  0.2554106209548927
  0.39883747548733645


This page was generated using Literate.jl.