Design

Why AbstractVectors Everywhere?

To understand the advantages of using AbstractVectors everywhere to represent collections of inputs, first consider the following properties that a collection of inputs should ideally satisfy.

Unique Ordering

There must be a clearly-defined first, second, etc. element of an input collection. If this were not the case, it would not be possible to determine a unique mapping between a collection of inputs and the output of kernelmatrix, as it would not be clear what order the rows and columns of the output should appear in.

Moreover, a well-defined ordering guarantees that if you permute the collection of inputs, the rows and columns of the kernelmatrix are permuted correspondingly.
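
For example, the following small sketch (using the SqExponentialKernel exported by KernelFunctions.jl) checks this property for a collection of Real-valued inputs:

using KernelFunctions

k = SqExponentialKernel()
x = randn(5)                 # a collection of 5 Real-valued inputs
perm = [2, 5, 1, 4, 3]       # a permutation of 1:5
K = kernelmatrix(k, x)

# Permuting the inputs permutes the rows and columns of the kernel matrix.
kernelmatrix(k, x[perm]) ≈ K[perm, perm]   # true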

Generality

There must be no restriction on the domain of the input. Collections of Reals, vectors, graphs, finite-dimensional domains, or really anything else that you fancy should be straightforwardly representable. Moreover, whichever input class is chosen should not prevent optimal performance from being obtained.

Unambiguously-Defined Length

Knowing the length of a collection of inputs is important. For example, a well-defined length guarantees that the sizes of the outputs of kernelmatrix and related functions are predictable. It also makes it possible to perform internal error-checking that ensures, for example, that two collections of inputs contain the same number of inputs.
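
A small illustration of both points, again using SqExponentialKernel (the binary method of kernelmatrix_diag is assumed here to require collections of equal length):

using KernelFunctions

k = SqExponentialKernel()
x, y = randn(4), randn(6)

size(kernelmatrix(k, x, y))   # (4, 6): predictable from length(x) and length(y)

# The binary method of kernelmatrix_diag only makes sense when both collections
# contain the same number of inputs, which their lengths make easy to check.
kernelmatrix_diag(k, x, x)    # a vector of length 4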

AbstractMatrices Do Not Cut It

Notably, while AbstractMatrix objects are often used to represent collections of vector-valued inputs, they do not immediately satisfy these properties as it is unclear whether a matrix of size P x Q represents a collection of P Q-dimensional inputs (each row is an input), or Q P-dimensional inputs (each column is an input).
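
For example, using the obsdim-style matrix interface described later on, the same matrix yields two different kernel matrices depending on how it is interpreted:

using KernelFunctions

k = SqExponentialKernel()
X = randn(3, 4)

# Nothing about X itself says which dimension indexes the inputs.
size(kernelmatrix(k, X; obsdim=2))   # (4, 4): the 4 columns are the inputs
size(kernelmatrix(k, X; obsdim=1))   # (3, 3): the 3 rows are the inputs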

Moreover, they occasionally add some aesthetic inconvenience. For example, a collection of Real-valued inputs, which might be straightforwardly represented as an AbstractVector{<:Real}, must be reshaped into a matrix.

There are two commonly used ways to partly resolve these shortcomings:

Resolution 1: Specify a Convention

One way that these shortcomings can be partly resolved is by specifying a convention that everyone adheres to regarding the interpretation of rows vs columns. However, opinions about the choice of convention are often surprisingly strongly held, and users regularly have to remind themselves which convention has been chosen. While this resolves the ordering problem, and in principle defines the "length" of a collection of inputs, AbstractMatrix objects already have a length defined in Julia, which would generally disagree with our internal notion of length. This isn't a show-stopper, but it isn't an especially clean situation.
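
For instance, under a column convention:

X = randn(2, 10)   # 10 two-dimensional inputs, stored as columns
length(X)          # 20: the number of elements of the matrix, not the number of inputs
size(X, 2)         # 10: what "length" ought to mean under this convention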

There is also the opportunity for some kinds of silent bugs. For example, if an input matrix happens to be square because the number of input dimensions equals the number of inputs, it would be hard to know whether the correct kernelmatrix has been computed. This kind of bug seems unlikely, but the possibility exists regardless.

Finally, suppose that your inputs are of some type T that is not simply a vector of real numbers, say a graph. In this situation, how should a collection of inputs be represented? An N x 1 or 1 x N matrix is the only obvious candidate, but the additional singleton dimension seems somewhat redundant.

Resolution 2: Always Specify An obsdim Argument

Another way to partly resolve these problems is to not commit to a convention, and instead to propagate some additional information through the codebase that specifies how the input data is to be interpreted. For example, a kernel k that represents the sum of two other kernels might implement kernelmatrix as follows:

function kernelmatrix(k::KernelSum, x::AbstractMatrix; obsdim=1)
    return kernelmatrix(k.kernels[1], x; obsdim=obsdim) +
        kernelmatrix(k.kernels[2], x; obsdim=obsdim)
end

While this prevents this package from having to pre-specify a convention, it doesn't resolve the length issue, or the issue of representing collections whose individual inputs aren't themselves vectors (and hence cannot be stacked into a matrix). Moreover, it complicates the internals; in contrast, consider what this function looks like with an AbstractVector:

function kernelmatrix(k::KernelSum, x::AbstractVector)
    return kernelmatrix(k.kernels[1], x) + kernelmatrix(k.kernels[2], x)
end

This code is clearer (there is less visual noise), and removes a class of possible bugs: if the implementer of kernelmatrix forgets to pass the obsdim kwarg into each nested kernelmatrix call, it's possible to silently get the wrong answer.

This being said, we do support matrix-valued inputs – see Why We Have Support for Both.

AbstractVectors

Requiring all collections of inputs to be AbstractVectors resolves all of these problems, and ensures that the data is self-describing to the extent that KernelFunctions.jl requires.

Firstly, the question of how to interpret the columns and rows of a matrix of inputs is resolved. Users must wrap matrices which represent collections of inputs in either a ColVecs (each column is an input) or a RowVecs (each row is an input), both of which have clearly defined semantics that are hard to confuse.

By design, there is also no discrepancy between the number of inputs in the collection, and the length function – the length of a ColVecs, RowVecs, or Vector{<:Real} is equal to the number of inputs.
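
For example:

using KernelFunctions

X = randn(2, 5)
x_col = ColVecs(X)   # the 5 columns of X are the inputs
x_row = RowVecs(X)   # the 2 rows of X are the inputs

length(x_col), length(x_row)                       # (5, 2)
size(kernelmatrix(SqExponentialKernel(), x_col))   # (5, 5)
size(kernelmatrix(SqExponentialKernel(), x_row))   # (2, 2)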

There is no loss of performance: ColVecs and RowVecs are thin wrappers around the underlying matrix, so specialised kernelmatrix methods can still operate on that matrix directly.

A collection of N Real-valued inputs can be represented by an AbstractVector{<:Real} of length N, rather than needing to use an AbstractMatrix{<:Real} of size either N x 1 or 1 x N. The same can be said for any other input type T, and new subtypes of AbstractVector can be added if particularly efficient ways exist to store collections of inputs of type T. A good example of this in practice is using Tuple{S, Int}, for some input type S, as the Inputs for Multiple Outputs.

This approach can also lead to clearer user code. A user need only wrap their inputs in a ColVecs or RowVecs once in their code, and this specification is automatically re-used everywhere in their code. In this sense, it is straightforward to write code in such a way that there is one unique source of "truth" about the way in which a particular data set should be interpreted. Conversely, the obsdim resolution requires that the obsdim keyword argument is passed around with the data every single time that you use it.

The benefits of the AbstractVector approach are likely most strongly felt when writing a substantial amount of code on top of KernelFunctions.jl – in the same way that using AbstractVectors inside KernelFunctions.jl removes the need for large amounts of keyword argument propagation, the same will be true of other code.

Why We Have Support for Both

In short: many people like matrices, and are familiar with obsdim-style keyword arguments.

All internals are implemented using AbstractVectors though, and the obsdim interface is just a thin layer of utility functionality which sits on top of this. To avoid confusion and silent errors, we do not favour a specific convention (rows or columns); instead, the obsdim keyword argument must be specified explicitly.
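
For example, the following two calls are intended to be equivalent (a small sketch; obsdim=1 means that observations lie along the first dimension, i.e. the rows):

using KernelFunctions

k = SqExponentialKernel()
X = randn(5, 2)                           # 5 inputs stored as rows

K_obsdim = kernelmatrix(k, X; obsdim=1)   # matrix interface
K_vec = kernelmatrix(k, RowVecs(X))       # AbstractVector interface

K_obsdim ≈ K_vec   # true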

Kernels for Multiple-Outputs

There are two equally-valid perspectives on multi-output kernels: they can either be treated as matrix-valued kernels, or as standard kernels on an extended input domain. Each of these perspectives is convenient in different circumstances, but the latter greatly simplifies the incorporation of multi-output kernels in KernelFunctions.

More concretely, let k_mat be a matrix-valued kernel, mapping pairs of inputs of type T to matrices of size P x P to describe the covariance between P outputs. Given inputs x and y of type T, and integers p and q, we can always find an equivalent standard kernel k mapping from pairs of inputs of type Tuple{T, Int} to the Reals as follows:

k((x, p), (y, q)) = k_mat(x, y)[p, q]
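
As a hypothetical sketch (k_mat and WrappedMultiOutputKernel are illustrative names, not part of the KernelFunctions.jl API), this equivalence might look as follows:

using KernelFunctions

# A toy matrix-valued kernel on Real inputs with P = 2 outputs (purely illustrative).
k_mat(x, y) = exp(-abs2(x - y)) * [1.0 0.5; 0.5 1.0]

# The equivalent standard kernel acts on inputs of type Tuple{T, Int}.
struct WrappedMultiOutputKernel{Tk} <: KernelFunctions.Kernel
    k_mat::Tk
end

function (k::WrappedMultiOutputKernel)(xp::Tuple, yq::Tuple)
    x, p = xp
    y, q = yq
    return k.k_mat(x, y)[p, q]
end

k = WrappedMultiOutputKernel(k_mat)

# A collection of multi-output inputs is just an AbstractVector of (input, output_index) pairs.
xs_base = randn(3)
xs = [(x, p) for x in xs_base for p in 1:2]
kernelmatrix(k, xs)   # a 6 x 6 covariance matrix over all (input, output) pairs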

This ability to treat multi-output kernels as single-output kernels is very helpful, as it means that there is no need to introduce additional concepts into the API of KernelFunctions.jl, just additional kernels! This in turn simplifies downstream code, as it doesn't need to "know" about the existence of multi-output kernels in addition to standard kernels. For example, GP libraries built on top of KernelFunctions.jl just need to know about Kernels, and they get multi-output kernels, and hence multi-output GPs, for free.

Where there is a need to specialise implementations for multi-output kernels, this is done in an encapsulated manner: parts of KernelFunctions.jl that have nothing to do with multi-output kernels know nothing about their existence.