In functional analysis (a branch of mathematics), a reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions in which point evaluation is a continuous linear functional. Roughly speaking, this means that if two functions f and g in the RKHS are close in norm, i.e., ‖f − g‖ is small, then f and g are also pointwise close, i.e., |f(x) − g(x)| is small for all x. The reverse need not be true. It is not entirely straightforward to construct a Hilbert space of functions which is not an RKHS. Note that L2 spaces are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions: for example, the functions f and g defined by f(x) = 0 and g(x) = 1_ℚ(x) (the indicator function of the rationals) are equal almost everywhere, and therefore define the same element of L2, even though their pointwise values differ.
Contents
- Definition
- Example
- Moore–Aronszajn theorem
- Integral operators and Mercer's theorem
- Feature maps
- Properties
- Examples
- Extension to vector-valued functions
- References
The reproducing kernel was first introduced in the 1907 work of Stanisław Zaremba concerning boundary value problems for harmonic and biharmonic functions. James Mercer simultaneously examined functions which satisfy the reproducing property in the theory of integral equations. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of Gábor Szegő, Stefan Bergman, and Salomon Bochner. The subject was eventually systematically developed in the early 1950s by Nachman Aronszajn and Stefan Bergman.
These spaces have wide applications, including complex analysis, harmonic analysis, and quantum mechanics. Reproducing kernel Hilbert spaces are particularly important in the field of statistical learning theory because of the celebrated representer theorem, which states that every function in an RKHS that minimizes an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the empirical risk minimization problem from an infinite-dimensional to a finite-dimensional optimization problem.
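The representer theorem can be made concrete with a small kernel ridge regression sketch. This is a minimal illustration, not material from the article itself: the data, the Gaussian kernel choice, and the regularization parameter `lam` are all assumptions made for the example (NumPy assumed available).

```python
import numpy as np

# Illustrative data (assumed for this example): 20 noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), computed pairwise.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# By the representer theorem, the regularized empirical risk minimizer in the
# RKHS has the form f(.) = sum_i alpha_i K(., x_i); for squared loss the
# coefficients solve the finite-dimensional system (K + lam I) alpha = y.
lam = 1e-2
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def f(x_new):
    # Evaluate the minimizer at new points via kernel evaluations only.
    return gaussian_kernel(np.atleast_2d(x_new), X) @ alpha
```

Note how the minimization over an infinite-dimensional function space has collapsed to solving a 20 × 20 linear system for the coefficients alpha.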
For ease of understanding, we provide the framework for real-valued Hilbert spaces. The theory can be easily extended to spaces of complex-valued functions and hence include the many important examples of reproducing kernel Hilbert spaces that are spaces of analytic functions.
Definition
Let X be an arbitrary set and H a Hilbert space of real-valued functions on X, equipped with pointwise addition and pointwise scalar multiplication. The evaluation functional over H is the linear functional that evaluates each function at a point x:

L_x : f ↦ f(x) for all f ∈ H.

We say that H is a reproducing kernel Hilbert space if, for all x in X, L_x is continuous at every f in H or, equivalently, if L_x is a bounded operator on H, i.e. there exists some M_x > 0 such that

|L_x(f)| = |f(x)| ≤ M_x ‖f‖_H for all f ∈ H.  (1)

Although M_x < ∞ is assumed for every x ∈ X, it may still be the case that sup_x M_x = ∞.

While property (1) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in H at every point in the domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of f with a function K_x in H. This function is the so-called reproducing kernel for the Hilbert space H, from which the RKHS takes its name. More formally, the Riesz representation theorem implies that for all x in X there exists a unique element K_x of H with the reproducing property,

f(x) = L_x(f) = ⟨f, K_x⟩_H for all f ∈ H.  (2)

Since K_y is itself a function defined on X with values in ℝ, applying (2) with f = K_y gives K_y(x) = ⟨K_y, K_x⟩_H. This allows us to define the reproducing kernel of H as the function K : X × X → ℝ by

K(x, y) = ⟨K_x, K_y⟩_H.

From this definition it is easy to see that K : X × X → ℝ is both symmetric and positive definite, i.e.

Σ_{i=1}^n Σ_{j=1}^n c_i c_j K(x_i, x_j) = ‖ Σ_{i=1}^n c_i K_{x_i} ‖²_H ≥ 0

for any n ∈ ℕ, x_1, …, x_n ∈ X, and c_1, …, c_n ∈ ℝ.
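On a finite set, these definitions can be verified directly. The following sketch (NumPy assumed; the matrix is randomly generated purely for illustration) treats a strictly positive definite matrix K as a kernel on X = {0, …, n−1}, equips H = ℝⁿ with the inner product ⟨f, g⟩_H = fᵀK⁻¹g, and checks the reproducing property (2), with K_x being the x-th column of K.

```python
import numpy as np

# A random symmetric strictly positive definite matrix K serves as the kernel
# on the finite set X = {0, ..., 4} (illustrative choice).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
K = A @ A.T + 5.0 * np.eye(5)
K_inv = np.linalg.inv(K)

def inner(f, g):
    # <f, g>_H = f^T K^{-1} g, the inner product making K reproducing.
    return f @ K_inv @ g

f = rng.standard_normal(5)  # an arbitrary function on X, as a vector
for x in range(5):
    K_x = K[:, x]           # the function K_x = K(x, .)
    # Reproducing property: <f, K_x>_H = f^T K^{-1} K e_x = f^T e_x = f(x).
    assert np.isclose(inner(f, K_x), f[x])
```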
Example
The space of bandlimited continuous functions is an RKHS, as we now show. Formally, fix some cutoff frequency 0 < a < ∞ and define the Hilbert space

H = { f ∈ C(ℝ) ∩ L2(ℝ) : supp(F) ⊂ [−a, a] },

where C(ℝ) is the set of continuous functions and F(ω) = ∫ f(t) e^{−iωt} dt is the Fourier transform of f, with the L2 inner product. By the Fourier inversion theorem, for every f ∈ H,

f(x) = (1/2π) ∫_{−a}^{a} F(ω) e^{ixω} dω,

and the Cauchy–Schwarz inequality together with Plancherel's theorem then yields, for all x,

|f(x)| ≤ (1/2π) √(2a) ‖F‖_{L2} = √(a/π) ‖f‖_{L2}.

As this inequality shows that the evaluation functional is bounded, and H is also a Hilbert space, H is indeed an RKHS.

The kernel function K_x in this case is given by

K_x(y) = sin(a(y − x)) / (π(y − x)).

Note that the Fourier transform of K_x is ω ↦ e^{−iωx} 1_{[−a,a]}(ω), which is supported on [−a, a], so that K_x ∈ H; a computation with Plancherel's theorem then confirms ⟨f, K_x⟩_{L2} = f(x) for all f ∈ H.
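The reproducing property of the bandlimited case can also be checked numerically: since the H inner product here is the L2 inner product, ⟨f, K_x⟩ should recover f(x). The sketch below (NumPy assumed; the truncation range, grid, and tolerance are arbitrary choices) takes f = K_0, which is itself bandlimited.

```python
import numpy as np

a = 1.0  # cutoff frequency (illustrative choice)

def K_(x, y):
    # K_x(y) = sin(a(y - x)) / (pi (y - x)); np.sinc handles y = x smoothly,
    # since np.sinc(t) = sin(pi t) / (pi t).
    return (a / np.pi) * np.sinc(a * (y - x) / np.pi)

# A truncated fine grid standing in for the real line.
y = np.linspace(-500.0, 500.0, 2_000_001)
dy = y[1] - y[0]
f = K_(0.0, y)  # a bandlimited test function: f = K_0

for x in [0.0, 0.7, 2.5]:
    inner = np.sum(f * K_(x, y)) * dy     # <f, K_x>_{L2}, truncated Riemann sum
    assert abs(inner - K_(0.0, x)) < 5e-3  # equals f(x) up to truncation error
```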
Moore–Aronszajn theorem
We have seen how a reproducing kernel Hilbert space defines a reproducing kernel function that is both symmetric and positive definite. The Moore–Aronszajn theorem goes in the other direction; it states that every symmetric, positive definite kernel defines a unique reproducing kernel Hilbert space. The theorem first appeared in Aronszajn's Theory of Reproducing Kernels, although he attributes it to E. H. Moore.
Theorem. Suppose K is a symmetric, positive definite kernel on a set X. Then there is a unique Hilbert space of functions on X for which K is a reproducing kernel.
Proof. For all x in X, define K_x = K(x, ⋅). Let H0 be the linear span of {K_x : x ∈ X}. Define an inner product on H0 by

⟨ Σ_{j=1}^n b_j K_{y_j}, Σ_{i=1}^m a_i K_{x_i} ⟩_{H0} = Σ_{i=1}^m Σ_{j=1}^n a_i b_j K(y_j, x_i).

The symmetry of this inner product follows from the symmetry of K, and the non-degeneracy follows from the fact that K is positive definite.

Let H be the completion of H0 with respect to this inner product. Then H consists of functions of the form

f(x) = Σ_{i=1}^∞ a_i K_{x_i}(x),

where the partial sums Σ_{i=1}^n a_i K_{x_i} form a Cauchy sequence in H0, so that the series converges in H and, by boundedness of evaluation, also pointwise. Now we can check the reproducing property (2):

⟨f, K_x⟩_H = Σ_{i=1}^∞ a_i ⟨K_{x_i}, K_x⟩_{H0} = Σ_{i=1}^∞ a_i K(x_i, x) = f(x).

To prove uniqueness, let G be another Hilbert space of functions for which K is a reproducing kernel. For any x and y in X, (2) implies that

⟨K_x, K_y⟩_H = K(x, y) = ⟨K_x, K_y⟩_G.

By linearity, ⟨·, ·⟩_H = ⟨·, ·⟩_G on the span of {K_x : x ∈ X}. Then H ⊂ G, because G is complete and contains H0, and hence contains its completion.

Now we need to prove that every element of G is in H. Let f be an element of G. Since H is a closed subspace of G, we can write f = f_H + f_{H⊥}, where f_H ∈ H and f_{H⊥} ∈ H⊥. Now, if x ∈ X, then, since K is a reproducing kernel of both G and H,

f(x) = ⟨f, K_x⟩_G = ⟨f_H, K_x⟩_G + ⟨f_{H⊥}, K_x⟩_G = ⟨f_H, K_x⟩_H = f_H(x),

which shows that f_{H⊥} = 0 in G and f = f_H ∈ H. This proves G = H and concludes the proof.
Integral operators and Mercer's theorem
We may characterize a symmetric positive definite kernel K via an integral operator using Mercer's theorem, and thereby obtain an additional view of the RKHS. Let X be a compact space equipped with a strictly positive finite Borel measure μ, and let K : X × X → ℝ be a continuous, symmetric, positive definite function. Define the integral operator T_K : L2(X) → L2(X) by

(T_K f)(x) = ∫_X K(x, t) f(t) dμ(t),

where L2(X) is the space of square integrable functions with respect to μ.

Mercer's theorem states that the spectral decomposition of the integral operator T_K yields a series representation of K in terms of the eigenvalues and eigenfunctions of T_K. This in turn implies that K is a reproducing kernel and that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.

Under these assumptions, T_K is a compact, continuous, self-adjoint, and positive operator. The spectral theorem for compact self-adjoint operators yields an at most countable decreasing sequence of eigenvalues σ_1 ≥ σ_2 ≥ ⋯ ≥ 0 with lim_{i→∞} σ_i = 0 and corresponding eigenfunctions φ_i satisfying T_K φ_i = σ_i φ_i, where the {φ_i} form an orthonormal basis of L2(X). By the positivity of T_K, σ_i > 0 for all i, and one can show that T_K maps continuously into the space of continuous functions C(X), so the eigenfunctions may be taken continuous. Mercer's theorem then states that

K(x, y) = Σ_{j=1}^∞ σ_j φ_j(x) φ_j(y)

for all x, y in X, where the convergence is absolute and uniform.

Furthermore, it can be shown that the RKHS of K is given by

H = { f ∈ L2(X) : Σ_{i=1}^∞ ⟨f, φ_i⟩²_{L2} / σ_i < ∞ },

where the inner product of H is given by

⟨f, g⟩_H = Σ_{i=1}^∞ ⟨f, φ_i⟩_{L2} ⟨g, φ_i⟩_{L2} / σ_i.
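A discrete analogue of Mercer's decomposition is easy to check numerically: eigendecomposing the Gram matrix of a continuous kernel on sample points gives nonnegative eigenvalues and reconstructs the kernel as a sum of rank-one terms. This is a sketch only; NumPy is assumed, and the kernel, grid, and bandwidth are arbitrary choices for the example.

```python
import numpy as np

# Gaussian kernel Gram matrix on sample points of [0, 1] (discrete stand-in
# for the continuous kernel on a compact space).
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

# Spectral decomposition: eigenvalues play the role of the sigma_j, and the
# eigenvector columns play the role of the eigenfunctions phi_j on the grid.
sigma, phi = np.linalg.eigh(K)
assert sigma.min() > -1e-10     # positive semi-definite up to round-off

# Reconstruct K as sum_j sigma_j phi_j phi_j^T (Mercer-type expansion).
K_rec = (phi * sigma) @ phi.T
assert np.allclose(K, K_rec)
```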
Feature maps
A feature map is a map φ : X → F, where F is a Hilbert space which we will call the feature space. The previous sections presented the connection between bounded evaluation functionals, positive definite functions, and integral operators; in this section we provide another representation of the RKHS in terms of feature maps.

We first note that every feature map defines a kernel via

K(x, y) = ⟨φ(x), φ(y)⟩_F.  (3)

Clearly K is symmetric, and positive definiteness follows from

Σ_{i,j=1}^n c_i c_j K(x_i, x_j) = Σ_{i,j=1}^n c_i c_j ⟨φ(x_i), φ(x_j)⟩_F = ‖ Σ_{i=1}^n c_i φ(x_i) ‖²_F ≥ 0.

For example, we can trivially take F = H and φ(x) = K_x for all x ∈ X; then (3) is satisfied by the reproducing property. Another classical example relates to the previous section on integral operators: take F = ℓ² and φ(x) = (√σ_i φ_i(x))_i.

This connection between kernels and feature maps provides us with a new way to understand positive definite functions, and hence reproducing kernels, as inner products in F, and every feature map naturally defines an RKHS through the kernel (3).

Lastly, feature maps allow us to construct function spaces that reveal another perspective on the RKHS. Consider the linear space

H_φ = { f : X → ℝ | ∃ w ∈ F such that f(x) = ⟨w, φ(x)⟩_F for all x ∈ X }.

We can define a norm on H_φ by

‖f‖_φ = inf { ‖w‖_F : w ∈ F, f(x) = ⟨w, φ(x)⟩_F for all x ∈ X }.

It can be shown that H_φ is a Hilbert space with reproducing kernel given by K(x, y) = ⟨φ(x), φ(y)⟩_F. This representation implies that the elements of the RKHS are inner products with elements of the feature space, and can accordingly be viewed as hyperplanes; this view of the RKHS is related to the kernel trick in machine learning.
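As a concrete instance of identity (3), the homogeneous polynomial kernel K(x, y) = ⟨x, y⟩² on ℝ² corresponds to an explicit three-dimensional feature map. The sketch below (NumPy assumed; the test points are arbitrary) verifies the identity at a pair of points.

```python
import numpy as np

def phi(x):
    # Feature map for K(x, y) = <x, y>^2 on R^2:
    # phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), since
    # <phi(x), phi(y)> = (x1 y1)^2 + 2 x1 x2 y1 y2 + (x2 y2)^2 = <x, y>^2.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(y), (x @ y) ** 2)  # identity (3) holds
```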
Properties
The following properties of RKHSs may be useful to readers.

- Let (X_i)_{i=1}^p be a sequence of sets and (K_i)_{i=1}^p a sequence of corresponding positive definite functions on (X_i)_{i=1}^p. Then

  K((x_1, …, x_p), (y_1, …, y_p)) = K_1(x_1, y_1) ⋯ K_p(x_p, y_p)

  is a kernel on X = X_1 × ⋯ × X_p.
- Let X_0 ⊂ X. Then the restriction of K to X_0 × X_0 is also a reproducing kernel.
- Consider a normalized kernel K, i.e. one with K(x, x) = 1 for all x ∈ X. Define a pseudo-metric on X by

  d_K(x, y)² = ‖K_x − K_y‖²_H = 2(1 − K(x, y)).

  By the Cauchy–Schwarz inequality,

  K(x, y)² ≤ K(x, x) K(y, y) = 1 for all x, y ∈ X.

  This inequality allows us to view K as a measure of similarity between inputs: if x and y are similar then K(x, y) is close to 1, while if they are dissimilar then K(x, y) is closer to 0.
- The closure of the span of {K_x : x ∈ X} coincides with H.
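The similarity interpretation of a normalized kernel can be sanity-checked numerically with the Gaussian kernel; a minimal sketch assuming NumPy, with arbitrary test points:

```python
import numpy as np

def K(x, y, sigma=1.0):
    # Gaussian kernel; normalized, since K(x, x) = 1 for every x.
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def d_K(x, y):
    # Pseudo-metric of a normalized kernel: d_K(x, y)^2 = 2 (1 - K(x, y)).
    return np.sqrt(2.0 * (1.0 - K(x, y)))

assert K(1.0, 1.0) == 1.0              # normalization
assert K(0.0, 3.0) ** 2 <= 1.0         # Cauchy-Schwarz bound
assert d_K(0.0, 0.1) < d_K(0.0, 2.0)   # nearby inputs are more similar
```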
Examples
Common examples of kernels include:

- Linear kernel: K(x, y) = ⟨x, y⟩.
- Gaussian kernel: K(x, y) = e^{−‖x−y‖²/(2σ²)}, σ > 0.
- Polynomial kernel: K(x, y) = (⟨x, y⟩ + 1)^d, d ∈ ℕ.

Other common examples are kernels which satisfy K(x, y) = K(x − y); such translation-invariant (stationary) kernels are characterized by Bochner's theorem as the Fourier transforms of finite nonnegative measures.
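As a quick numerical check (NumPy assumed; points generated randomly purely for illustration), the Gram matrices of the linear, Gaussian, and polynomial kernels are positive semi-definite:

```python
import numpy as np

# Random sample points; G holds all pairwise inner products <x, y>.
rng = np.random.default_rng(3)
X = rng.standard_normal((30, 4))
G = X @ X.T
# Pairwise squared distances: ||x||^2 - 2 <x, y> + ||y||^2.
d2 = np.diag(G)[:, None] - 2 * G + np.diag(G)[None, :]

grams = {
    "linear":     G,                  # K(x, y) = <x, y>
    "gaussian":   np.exp(-d2 / 2.0),  # sigma = 1
    "polynomial": (G + 1.0) ** 3,     # d = 3
}
for name, Km in grams.items():
    eigs = np.linalg.eigvalsh(Km)
    assert eigs.min() > -1e-8, name   # PSD up to round-off
```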
We also provide examples of Bergman kernels. Let X be finite and let H consist of all complex-valued functions on X. Then an element of H can be represented as an array of complex numbers. If the usual inner product is used, then Kx is the function whose value is 1 at x and 0 everywhere else, and K(x,y) can be thought of as an identity matrix since K(x,y) = 1 when x = y and K(x,y) = 0 otherwise. In this case, H is isomorphic to ℂ^n, where n is the number of points in X.
The case of X = D (where D denotes the unit disc) is more sophisticated. Here the Bergman space H2(D) is the space of square-integrable holomorphic functions on D. It can be shown that the reproducing kernel for H2(D) is

K(x, y) = 1 / (π (1 − x ȳ)²).
Lastly, the space of bandlimited functions in L2(ℝ) with bandwidth 2a is an RKHS with reproducing kernel K(x, y) = sin(a(x − y)) / (π(x − y)).
Extension to vector-valued functions
In this section we extend the definition of the RKHS to spaces of vector-valued functions, as this extension is particularly important in multi-task learning and manifold regularization. The main difference is that the reproducing kernel Γ is a symmetric function that is now a positive semi-definite matrix for every x, y in X. More formally, we define a vector-valued RKHS (vvRKHS) as a Hilbert space H of functions f : X → ℝ^T such that for all c ∈ ℝ^T and x ∈ X,

Γ_x c := Γ(x, ⋅) c ∈ H

and

⟨f, Γ_x c⟩_H = f(x)ᵀ c.
This second property parallels the reproducing property for the scalar-valued case. We note that this definition can also be connected to integral operators, bounded evaluation functionals, and feature maps as we saw for the scalar-valued RKHS. We can equivalently define the vvRKHS as a vector-valued Hilbert space with a bounded evaluation functional and show that this implies the existence of a unique reproducing kernel by the Riesz representation theorem. Mercer's theorem can also be extended to address the vector-valued setting and we can therefore obtain a feature map view of the vvRKHS. Lastly, it can also be shown that the closure of the span of {Γ_x c : x ∈ X, c ∈ ℝ^T} coincides with H, another parallel to the scalar-valued case.
We can gain intuition for the vvRKHS by taking a component-wise perspective on these spaces. In particular, we find that every vvRKHS is isometrically isomorphic to a scalar-valued RKHS on a particular input space. Let Λ = {1, …, T}. Consider the space X × Λ and the corresponding reproducing kernel

γ((x, t), (y, s)) = Γ(x, y)_{(t, s)}.  (4)

As noted above, the RKHS associated to this reproducing kernel is given by the closure of the span of {γ_{(x,t)} : x ∈ X, t ∈ Λ}, where γ_{(x,t)}(y, s) = γ((x, t), (y, s)).

The connection to the scalar-valued RKHS can then be made by the fact that every matrix-valued kernel can be identified with a kernel of the form of (4) via

Γ(x, y)_{(t, s)} = γ((x, t), (y, s)).

Moreover, every kernel of the form (4) defines a matrix-valued kernel by the same identity. Now letting the map D : H_Γ → H_γ be defined as

(Df)(x, t) = ⟨f(x), e_t⟩_{ℝ^T},

where e_t is the t-th vector of the canonical basis of ℝ^T, one can show that D is bijective and an isometry between H_Γ and H_γ.
While this view of the vvRKHS can be quite useful in multi-task learning, it should be noted that this isometry does not reduce the study of the vector-valued case to that of the scalar-valued case. In fact, this isometry procedure can make both the scalar-valued kernel and the input space too difficult to work with in practice as properties of the original kernels are often lost.
An important class of matrix-valued reproducing kernels are separable kernels, which can be factorized as the product of a scalar-valued kernel and a T-dimensional symmetric positive semi-definite matrix. In light of our previous discussion, these kernels are of the form

γ((x, t), (y, s)) = K(x, y) K_T(t, s)

for all x, y in X and t, s in Λ, where K_T denotes the T × T matrix. As the scalar-valued kernel encodes dependencies between the inputs, the matrix-valued kernel encodes dependencies among both the inputs and the outputs.
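For separable kernels, the joint Gram matrix over n inputs and T outputs factors as a Kronecker product, which makes its positive semi-definiteness transparent. A minimal sketch assuming NumPy, with an arbitrary Gaussian input kernel and a random PSD output-coupling matrix B standing in for the T × T factor:

```python
import numpy as np

# Scalar input kernel: Gaussian Gram matrix on 8 random points in R^2.
rng = np.random.default_rng(5)
X = rng.standard_normal((8, 2))
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, -1))

# Output-coupling matrix B (plays the role of the T x T PSD factor, T = 3).
C = rng.standard_normal((3, 3))
B = C @ C.T

# Joint Gram of the separable matrix-valued kernel over all (input, task)
# pairs: the Kronecker product K (x) B, of size (n*T) x (n*T).
Gamma = np.kron(K, B)
assert np.linalg.eigvalsh(Gamma).min() > -1e-8  # still PSD
```

Since the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, positive semi-definiteness of K and B immediately gives positive semi-definiteness of the joint Gram matrix.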
We lastly remark that the above theory can be further extended to spaces of functions with values in function spaces but obtaining kernels for these spaces is a more difficult task.