Mauna Loa time series example
In this notebook, we apply Gaussian process regression to the Mauna Loa CO₂ dataset. It is adapted from the corresponding AbstractGPs.jl tutorial; reading that first makes it easier to see how EasyGPs.jl simplifies each step.
Setup
We make use of the following packages:
using CSV, DataFrames # data loading
using EasyGPs # handles all things related to GPs
using Plots # visualisation
Let's load and visualize the dataset.
(xtrain, ytrain), (xtest, ytest) = let
    data = CSV.read(joinpath(@__DIR__, "CO2_data.csv"), Tables.matrix; header=0)
    year = data[:, 1]
    co2 = data[:, 2]
    # We split the data into a training and a testing set:
    idx_train = year .< 2004
    idx_test = .!idx_train
    (year[idx_train], co2[idx_train]), (year[idx_test], co2[idx_test])  # block's return value
end
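As a quick sanity check (not part of the original tutorial), we can confirm how many observations ended up in each split; the exact counts depend on the contents of the CSV file:
# Number of training and test observations, respectively.
length(xtrain), length(xtest)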
function plotdata()
    plot(; xlabel="year", ylabel="CO₂ [ppm]", legend=:bottomright)
    scatter!(xtrain, ytrain; label="training data", ms=2, markerstrokewidth=0)
    return scatter!(xtest, ytest; label="test data", ms=2, markerstrokewidth=0)
end
plotdata()
Prior
We construct the GP prior using the same kernels and initial parameters as in the original tutorial.
k_smooth_trend = exp(8.0) * with_lengthscale(SEKernel(), exp(4.0))
k_seasonality =
    exp(2.0) * PeriodicKernel(; r=[0.5]) * with_lengthscale(SEKernel(), exp(4.0))
k_medium_term_irregularities =
    1.0 * with_lengthscale(RationalQuadraticKernel(; α=exp(-1.0)), 1.0)
k_noise_terms =
    exp(-4.0) * with_lengthscale(SEKernel(), exp(-2.0)) + exp(-4.0) * WhiteKernel()
kernel = k_smooth_trend + k_seasonality + k_medium_term_irregularities + k_noise_terms
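The sums and products above are themselves ordinary KernelFunctions.jl kernels, so we can sanity-check the composite by evaluating it on a pair of inputs; the two years below are arbitrary values chosen for illustration:
# Evaluate the composite kernel on two (arbitrary) input years.
kernel(2000.0, 2000.5)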
We construct the GP object with the kernel above:
gp = GP(kernel)
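Before conditioning on data, it can be instructive to draw a sample path from the prior at the training inputs. This step is not part of the original tutorial, and the small jitter below is only there to keep the covariance matrix numerically positive definite:
# One sample path from the prior, evaluated at the training inputs.
prior_sample = rand(gp(xtrain, 1e-6))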
Posterior
To construct a posterior, we can call the GP object with the usual AbstractGPs.jl API:
fpost_init = posterior(gp(xtrain), ytrain)
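How good this initial fit is can be quantified by the log marginal likelihood of the training data under the prior, which, as in the original tutorial, is the objective maximized when fitting the hyperparameters below:
# Log marginal likelihood of the data for the initial hyperparameters.
logpdf(gp(xtrain), ytrain)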
Let's visualize what the GP fitted to the data looks like, for the initial choice of kernel hyperparameters.
We use the following function to plot a GP f on a specific range, using the AbstractGPs plotting recipes. By setting ribbon_scale=2 we visualize the uncertainty band with $\pm 2$ (instead of the default $\pm 1$) standard deviations.
plot_gp!(f; label) = plot!(f(1920:0.2:2030); ribbon_scale=2, linewidth=1, label)
plotdata()
plot_gp!(fpost_init; label="posterior f(⋅)")
A reasonable fit to the data, but poor extrapolation away from the observations!
Hyperparameter Optimization
We can now call EasyGPs.fit to optimize the hyperparameters. It takes care of the parameterization and automatic differentiation, and runs the optimizer for us. We pass an option to choose the exact same optimizer as in the original tutorial:
@time fitted_gp = EasyGPs.fit(
    gp,
    xtrain,
    ytrain;
    optimizer=Optim.LBFGS(;
        alphaguess=Optim.LineSearches.InitialStatic(; scaled=true),
        linesearch=Optim.LineSearches.BackTracking(),
    ),
)
104.000879 seconds (233.96 M allocations: 32.125 GiB, 13.77% gc time, 74.75% compilation time: <1% of which was recompilation)
Let's now construct the posterior GP with the optimized hyperparameters:
fpost_opt = posterior(fitted_gp(xtrain), ytrain)
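Comparing the log marginal likelihood under the fitted hyperparameters to the initial value computed above is a quick way to confirm that the optimization improved the fit; the exact number depends on the optimizer run:
# Log marginal likelihood after optimization; this should exceed the initial value.
logpdf(fitted_gp(xtrain), ytrain)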
This is the kernel with the point-estimated hyperparameters:
fpost_opt.prior.kernel
Sum of 5 kernels:
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 0.006409359405350664)
        - σ² = 196189.1377809622
    Product of 2 kernels:
        Periodic Kernel, length(r) = 1
            - σ² = 6.031052736397145
        Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
            - Scale Transform (s = 0.01211762616986784)
    Rational Quadratic Kernel (α = 0.0008271632559866195, metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 0.04882049165706758)
        - σ² = 160.20288567771527
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
        - Scale Transform (s = 8.753051584372757)
        - σ² = 0.028319259298516024
    White Kernel
        - σ² = 0.035763467727454694
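A note on reading this output: with_lengthscale(k, l) is implemented via a ScaleTransform with s = 1/l, so the printed s values are inverse lengthscales. For instance, the fitted lengthscale of the smooth trend component works out to roughly 156 years:
# The Scale Transform parameter is the inverse lengthscale.
1 / 0.006409359405350664  # ≈ 156, up from the initial exp(4.0) ≈ 55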
And, finally, we can visualize our optimized posterior GP:
plotdata()
plot_gp!(fpost_opt; label="optimized posterior f(⋅)")
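For a more quantitative check of the extrapolation than the plot, we can evaluate the pointwise predictive mean and variance at the held-out inputs. The RMSE computation below is an illustrative addition, not part of the original tutorial:
# Pointwise predictive mean and variance at the test inputs.
m, v = mean_and_var(fpost_opt(xtest))
# Root-mean-square error of the predictive mean on the test set.
rmse = sqrt(sum(abs2, m .- ytest) / length(ytest))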
Package and system information
Package information
Status `~/work/EasyGPs.jl/EasyGPs.jl/examples/0-mauna-loa/Project.toml`
  [336ed68f] CSV v0.10.13
  [a93c6f00] DataFrames v1.6.1
  [dcfb08e9] EasyGPs v0.1.0 `/home/runner/work/EasyGPs.jl/EasyGPs.jl#master`
  [98b081ad] Literate v2.16.1
  [91a5bcdd] Plots v1.40.2
To reproduce this notebook's package environment, you can download the full Manifest.toml.
System information
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_DEBUG = Documenter
  JULIA_LOAD_PATH = :/home/runner/.julia/packages/JuliaGPsDocs/7M86H/src
This page was generated using Literate.jl.