Mauna Loa time series example
In this notebook, we apply Gaussian process regression to the Mauna Loa CO₂ dataset. It is adapted from the corresponding AbstractGPs.jl tutorial; reading that first makes it easier to see how EasyGPs.jl simplifies each step.
Setup
We make use of the following packages:
using CSV, DataFrames # data loading
using EasyGPs # handles all things related to GPs
using Plots # visualisation
Let's load and visualize the dataset.
(xtrain, ytrain), (xtest, ytest) = let
    data = CSV.read(joinpath(@__DIR__, "CO2_data.csv"), Tables.matrix; header=0)
    year = data[:, 1]
    co2 = data[:, 2]
    # We split the data into a training and a testing set:
    idx_train = year .< 2004
    idx_test = .!idx_train
    (year[idx_train], co2[idx_train]), (year[idx_test], co2[idx_test])  # block's return value
end
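As a quick sanity check (not part of the original tutorial), we can confirm how many observations ended up in each split; the exact counts depend on the contents of the CSV file:
# Number of training and test observations, respectively.
length(xtrain), length(xtest)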
function plotdata()
    plot(; xlabel="year", ylabel="CO₂ [ppm]", legend=:bottomright)
    scatter!(xtrain, ytrain; label="training data", ms=2, markerstrokewidth=0)
    return scatter!(xtest, ytest; label="test data", ms=2, markerstrokewidth=0)
end
plotdata()
Prior
We construct the GP prior using the same kernels and initial parameters as in the original tutorial.
k_smooth_trend = exp(8.0) * with_lengthscale(SEKernel(), exp(4.0))
k_seasonality =
    exp(2.0) * PeriodicKernel(; r=[0.5]) * with_lengthscale(SEKernel(), exp(4.0))
k_medium_term_irregularities =
    1.0 * with_lengthscale(RationalQuadraticKernel(; α=exp(-1.0)), 1.0)
k_noise_terms =
    exp(-4.0) * with_lengthscale(SEKernel(), exp(-2.0)) + exp(-4.0) * WhiteKernel()
kernel = k_smooth_trend + k_seasonality + k_medium_term_irregularities + k_noise_terms
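The sums and products above are themselves ordinary KernelFunctions.jl kernels, so we can sanity-check the composite by evaluating it on a pair of inputs; the two years below are arbitrary values chosen for illustration:
# Evaluate the composite kernel on two (arbitrary) input years.
kernel(2000.0, 2000.5)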
We construct the GP object with the kernel above:
gp = GP(kernel)
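Before conditioning on data, it can be instructive to draw a sample path from the prior at the training inputs. This step is not part of the original tutorial, and the small jitter below is only there to keep the covariance matrix numerically positive definite:
# One sample path from the prior, evaluated at the training inputs.
prior_sample = rand(gp(xtrain, 1e-6))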
Posterior
To construct a posterior, we can call the GP object with the usual AbstractGPs.jl API:
fpost_init = posterior(gp(xtrain), ytrain)
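How good this initial fit is can be quantified by the log marginal likelihood of the training data under the prior, which, as in the original tutorial, is the objective maximized when fitting the hyperparameters below:
# Log marginal likelihood of the data for the initial hyperparameters.
logpdf(gp(xtrain), ytrain)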
Let's visualize what the GP fitted to the data looks like, for the initial choice of kernel hyperparameters.
We use the following function to plot a GP f on a specific range, using the AbstractGPs plotting recipes. By setting ribbon_scale=2 we visualize the uncertainty band with $\pm 2$ (instead of the default $\pm 1$) standard deviations.
plot_gp!(f; label) = plot!(f(1920:0.2:2030); ribbon_scale=2, linewidth=1, label)
plotdata()
plot_gp!(fpost_init; label="posterior f(⋅)")
A reasonable fit to the data, but poor extrapolation away from the observations!
Hyperparameter Optimization
We can now call EasyGPs.fit to optimize the hyperparameters. It takes care of the parameterization and automatic differentiation, and runs the optimizer for us. We pass an option to choose the exact same optimizer as in the original tutorial:
@time fitted_gp = EasyGPs.fit(
    gp,
    xtrain,
    ytrain;
    optimizer=Optim.LBFGS(;
        alphaguess=Optim.LineSearches.InitialStatic(; scaled=true),
        linesearch=Optim.LineSearches.BackTracking(),
    ),
)
104.000879 seconds (233.96 M allocations: 32.125 GiB, 13.77% gc time, 74.75% compilation time: <1% of which was recompilation)
Let's now construct the posterior GP with the optimized hyperparameters:
fpost_opt = posterior(fitted_gp(xtrain), ytrain)
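Comparing the log marginal likelihood under the fitted hyperparameters to the initial value computed above is a quick way to confirm that the optimization improved the fit; the exact number depends on the optimizer run:
# Log marginal likelihood after optimization; this should exceed the initial value.
logpdf(fitted_gp(xtrain), ytrain)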
This is the kernel with the point-estimated hyperparameters:
fpost_opt.prior.kernel
Sum of 5 kernels:
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 0.006409359405350664)
        - σ² = 196189.1377809622
    Product of 2 kernels:
        Periodic Kernel, length(r) = 1
            - σ² = 6.031052736397145
        Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
            - Scale Transform (s = 0.01211762616986784)
    Rational Quadratic Kernel (α = 0.0008271632559866195, metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 0.04882049165706758)
        - σ² = 160.20288567771527
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 8.753051584372757)
        - σ² = 0.028319259298516024
    White Kernel
        - σ² = 0.035763467727454694
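A note on reading this output: with_lengthscale(k, l) is implemented via a ScaleTransform with s = 1/l, so the printed s values are inverse lengthscales. For instance, the fitted lengthscale of the smooth trend component works out to roughly 156 years:
# The Scale Transform parameter is the inverse lengthscale.
1 / 0.006409359405350664  # ≈ 156, up from the initial exp(4.0) ≈ 55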
And, finally, we can visualize our optimized posterior GP:
plotdata()
plot_gp!(fpost_opt; label="optimized posterior f(⋅)")
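For a more quantitative check of the extrapolation than the plot, we can evaluate the pointwise predictive mean and variance at the held-out inputs. The RMSE computation below is an illustrative addition, not part of the original tutorial:
# Pointwise predictive mean and variance at the test inputs.
m, v = mean_and_var(fpost_opt(xtest))
# Root-mean-square error of the predictive mean on the test set.
rmse = sqrt(sum(abs2, m .- ytest) / length(ytest))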
Package and system information
Package information
Status `~/work/EasyGPs.jl/EasyGPs.jl/examples/0-mauna-loa/Project.toml`
  [336ed68f] CSV v0.10.13
  [a93c6f00] DataFrames v1.6.1
  [dcfb08e9] EasyGPs v0.1.0 `/home/runner/work/EasyGPs.jl/EasyGPs.jl#master`
  [98b081ad] Literate v2.16.1
  [91a5bcdd] Plots v1.40.2
To reproduce this notebook's package environment, you can download the full Manifest.toml.
System information
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_DEBUG = Documenter
  JULIA_LOAD_PATH = :/home/runner/.julia/packages/JuliaGPsDocs/7M86H/src
This page was generated using Literate.jl.