In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures the richness of a class of real-valued functions with respect to a probability distribution.
Contents
- Rademacher complexity of a set
- Rademacher complexity of a function class
- Examples
- Using the Rademacher complexity
  - Bounding the representativeness
  - Bounding the generalization error
- Bounding the Rademacher complexity
  - Bounds related to the VC dimension
  - Bounds related to linear classes
  - Bounds related to covering numbers
- Gaussian complexity
- References
Rademacher complexity of a set
Given a set $A \subseteq \mathbb{R}^m$, the Rademacher complexity of $A$ is defined as follows:

$$\operatorname{Rad}(A) := \frac{1}{m} \mathbb{E}_\sigma \left[ \sup_{a \in A} \sum_{i=1}^m \sigma_i a_i \right]$$

where $\sigma_1, \sigma_2, \dots, \sigma_m$ are independent random variables drawn from the Rademacher distribution, i.e. $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2$ for each $i$, and $a = (a_1, \dots, a_m)$.
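For a finite set $A$, the expectation in this definition can be approximated by Monte Carlo sampling of random sign vectors. A minimal sketch (the function name and inputs are illustrative):

```python
import random

def rademacher_complexity(A, n_trials=20000, rng=None):
    """Monte Carlo estimate of Rad(A) = (1/m) E_sigma[ sup_{a in A} <sigma, a> ]
    for a finite set A of m-dimensional vectors, given as a list of tuples."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    m = len(A[0])
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * a_i for s, a_i in zip(sigma, a)) for a in A)
    return total / (n_trials * m)

# For A = {(1,1), (1,2)} the exact value is 1/4:
print(rademacher_complexity([(1, 1), (1, 2)]))  # ≈ 0.25
```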
Rademacher complexity of a function class
Given a sample $S = (z_1, z_2, \dots, z_m) \in Z^m$ and a class $F$ of real-valued functions defined on a domain space $Z$, the empirical Rademacher complexity of $F$ given $S$ is defined as:

$$\operatorname{Rad}_S(F) := \frac{1}{m} \mathbb{E}_\sigma \left[ \sup_{f \in F} \sum_{i=1}^m \sigma_i f(z_i) \right]$$

This can also be written using the previous definition:

$$\operatorname{Rad}_S(F) = \operatorname{Rad}(F \circ S)$$

where $F \circ S$ denotes function composition, i.e.:

$$F \circ S := \{ (f(z_1), \dots, f(z_m)) : f \in F \}$$

Let $P$ be a probability distribution over $Z$. The Rademacher complexity of the function class $F$ with respect to $P$ for sample size $m$ is:

$$\operatorname{Rad}_{P,m}(F) := \mathbb{E}_{S \sim P^m} \left[ \operatorname{Rad}_S(F) \right]$$

where the above expectation is taken over an independent and identically distributed (i.i.d.) sample $S = (z_1, z_2, \dots, z_m)$ generated according to $P$.
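For a small sample and a finite function class, the empirical Rademacher complexity can be computed exactly through the reduction to a set (enumerating all $2^m$ sign vectors). A minimal sketch, with two hypothetical threshold classifiers as the function class:

```python
import itertools

def empirical_rademacher(functions, sample):
    """Exact empirical Rademacher complexity Rad_S(F) for a small sample,
    via the set F∘S = {(f(z_1), ..., f(z_m)) : f in F}: enumerate all
    2^m sign vectors and average the suprema."""
    m = len(sample)
    table = [[f(z) for z in sample] for f in functions]  # F∘S as vectors
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * v for s, v in zip(sigma, row)) for row in table)
    return total / (2 ** m * m)

# Two threshold classifiers on a sample of 4 points (hypothetical example):
fs = [lambda z: 1.0 if z > 0.5 else 0.0,
      lambda z: 1.0 if z > 1.5 else 0.0]
print(empirical_rademacher(fs, [0.0, 1.0, 2.0, 3.0]))  # → 0.125
```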
Examples
1. $A$ contains a single vector, e.g., $A = \{(a, b)\} \subset \mathbb{R}^2$. Then:

$$\operatorname{Rad}(A) = \frac{1}{2} \left( \frac{1}{4}(a+b) + \frac{1}{4}(a-b) + \frac{1}{4}(-a+b) + \frac{1}{4}(-a-b) \right) = 0$$

The same is true for every singleton hypothesis class.

2. $A$ contains two vectors, e.g., $A = \{(1,1), (1,2)\} \subset \mathbb{R}^2$. Then:

$$\operatorname{Rad}(A) = \frac{1}{2} \left( \frac{1}{4}\max(1+1,\, 1+2) + \frac{1}{4}\max(1-1,\, 1-2) + \frac{1}{4}\max(-1+1,\, -1+2) + \frac{1}{4}\max(-1-1,\, -1-2) \right)$$
$$= \frac{1}{8}(3 + 0 + 1 - 2) = \frac{1}{4}$$
Using the Rademacher complexity
The Rademacher complexity can be used to derive data-dependent upper bounds on the learnability of function classes. Intuitively, a function class with smaller Rademacher complexity is easier to learn.
Bounding the representativeness
In machine learning, it is desired to have a training set that represents the true distribution of samples. This can be quantified using the notion of representativeness. Denote by $P$ the probability distribution from which the samples are drawn. Denote by $H$ the set of hypotheses (potential classifiers) and denote by $F$ the corresponding set of error functions, i.e., for every hypothesis $h \in H$, there is a function $f_h \in F$ that maps each training sample (features and label) to the error of the classifier $h$ on that sample.

The representativeness of the sample $S$, with respect to $P$ and $F$, is defined as:

$$\operatorname{Rep}_P(F, S) := \sup_{f \in F} \left( L_P(f) - L_S(f) \right)$$

where $L_P(f) := \mathbb{E}_{z \sim P}[f(z)]$ is the expected error of $f$ on the real distribution $P$, and $L_S(f) := \frac{1}{m} \sum_{i=1}^m f(z_i)$ is the empirical error of $f$ on the sample $S$.

Smaller representativeness is better, since it means that the empirical error of a classifier on the training set is not much lower than its true error.

The expected representativeness of a sample can be bounded by twice the expected Rademacher complexity of the function class:

$$\mathbb{E}_{S \sim P^m} \left[ \operatorname{Rep}_P(F, S) \right] \leq 2 \cdot \operatorname{Rad}_{P,m}(F)$$
Bounding the generalization error
When the Rademacher complexity is small, it is possible to learn the hypothesis class H using empirical risk minimization.
For example (with a binary error function), for every $\delta > 0$, with probability at least $1 - \delta$, for every hypothesis $h \in H$:

$$L_P(h) - L_S(h) \leq 2 \operatorname{Rad}_S(F) + 4 \sqrt{\frac{2 \log(4/\delta)}{m}}$$
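A generalization bound of the form $L_P(h) - L_S(h) \leq 2\operatorname{Rad}_S(F) + 4\sqrt{2\log(4/\delta)/m}$ is easy to evaluate numerically. A small sketch; the sample values of $\operatorname{Rad}_S(F)$, $m$, and $\delta$ are illustrative:

```python
import math

def generalization_gap_bound(rad_S, m, delta):
    """Evaluate the bound 2*Rad_S(F) + 4*sqrt(2*log(4/delta)/m)
    on the gap between true and empirical error."""
    return 2 * rad_S + 4 * math.sqrt(2 * math.log(4 / delta) / m)

# With an empirical Rademacher complexity of 0.05, m = 10000 samples,
# and confidence parameter delta = 0.05:
print(generalization_gap_bound(0.05, m=10000, delta=0.05))  # ≈ 0.218
```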
Bounding the Rademacher complexity
Since smaller Rademacher complexity is better, it is useful to have upper bounds on the Rademacher complexity of various function sets. The following rules can be used to upper-bound the Rademacher complexity of a set $A \subseteq \mathbb{R}^m$.
1. If all vectors in $A$ are translated by a constant vector $a_0 \in \mathbb{R}^m$, then $\operatorname{Rad}(A)$ does not change.

2. If all vectors in $A$ are multiplied by a scalar $c \in \mathbb{R}$, then $\operatorname{Rad}(A)$ is multiplied by $|c|$.

3. $\operatorname{Rad}(A + B) = \operatorname{Rad}(A) + \operatorname{Rad}(B)$, where $A + B := \{a + b : a \in A,\, b \in B\}$ is the Minkowski sum.

4. (Kakade & Tewari Lemma) If all vectors in $A$ are operated on by a Lipschitz function, then $\operatorname{Rad}(A)$ is (at most) multiplied by the Lipschitz constant of the function. In particular, if all vectors in $A$ are operated on by a contraction mapping, then $\operatorname{Rad}(A)$ strictly decreases.

5. The Rademacher complexity of the convex hull of $A$ equals $\operatorname{Rad}(A)$.

6. (Massart Lemma) The Rademacher complexity of a finite set grows logarithmically with the set size. Formally, let $A$ be a set of $N$ vectors in $\mathbb{R}^m$, and let $\bar{a}$ be the mean of the vectors in $A$. Then:

$$\operatorname{Rad}(A) \leq \max_{a \in A} \|a - \bar{a}\|_2 \cdot \frac{\sqrt{2 \log N}}{m}$$

In particular, if $A$ is a set of binary vectors, the norm is at most $\sqrt{m}$, so:

$$\operatorname{Rad}(A) \leq \sqrt{\frac{2 \log N}{m}}$$
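Massart's lemma for binary vectors, $\operatorname{Rad}(A) \leq \sqrt{2 \log N / m}$, can be checked against a Monte Carlo estimate. A small sketch; the set size, dimension, and seeds are illustrative:

```python
import math
import random

def rad_estimate(A, n_trials=5000, rng=None):
    """Monte Carlo estimate of Rad(A) = (1/m) E_sigma[ max_{a in A} <sigma, a> ]."""
    rng = rng or random.Random(1)
    m = len(A[0])
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * x for s, x in zip(sigma, a)) for a in A)
    return total / (n_trials * m)

# N random binary vectors in {0,1}^m; Massart gives Rad(A) <= sqrt(2*log(N)/m).
data_rng = random.Random(2)
m, N = 50, 8
A = [tuple(data_rng.randint(0, 1) for _ in range(m)) for _ in range(N)]
est = rad_estimate(A)
bound = math.sqrt(2 * math.log(N) / m)
print(est, "<=", bound)
```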
Bounds related to the VC dimension
Let $H$ be a set family whose VC dimension is $d$. It is known that the growth function of $H$ is bounded as:

for all $m > d + 1$: $\quad \operatorname{Growth}(H, m) \leq (em/d)^d$

This means that, for every set $h$ with at most $m$ elements, $|H \cap h| \leq (em/d)^d$, where $H \cap h := \{h' \cap h : h' \in H\}$. The set family $H \cap h$ can be considered as a set of binary vectors over $\mathbb{R}^m$. Substituting this in Massart's lemma gives:

$$\operatorname{Rad}(H \cap h) \leq \sqrt{\frac{2 d \log(em/d)}{m}}$$

With more advanced techniques (Dudley's entropy bound and Haussler's upper bound) one can show, for example, that there exists a constant $C$, such that any class of $\{0,1\}$-indicator functions with Vapnik–Chervonenkis dimension $d$ has Rademacher complexity upper-bounded by $C\sqrt{d/m}$.
Bounds related to linear classes
The following bounds are related to linear operations on $S$ – a constant set of $m$ vectors in $\mathbb{R}^n$.

1. Define $A_2 := \{ (\langle w, x_1 \rangle, \dots, \langle w, x_m \rangle) : \|w\|_2 \leq 1 \}$, the set of dot-products of the vectors in $S = (x_1, \dots, x_m)$ with vectors in the unit ball of the $\ell_2$ norm. Then:

$$\operatorname{Rad}(A_2) \leq \frac{\max_i \|x_i\|_2}{\sqrt{m}}$$

2. Define $A_1 := \{ (\langle w, x_1 \rangle, \dots, \langle w, x_m \rangle) : \|w\|_1 \leq 1 \}$. Then:

$$\operatorname{Rad}(A_1) \leq \max_i \|x_i\|_\infty \cdot \sqrt{\frac{2 \log(2n)}{m}}$$
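The first bound can be checked by Monte Carlo, using the fact that for a fixed sign vector the supremum over the unit $\ell_2$ ball has the closed form $\sup_{\|w\|_2 \leq 1} \sum_i \sigma_i \langle w, x_i \rangle = \|\sum_i \sigma_i x_i\|_2$ (Cauchy–Schwarz, attained at the maximizing $w$). A small sketch with illustrative random data:

```python
import math
import random

def rad_l2_linear(xs, n_trials=5000, rng=None):
    """Monte Carlo estimate of Rad(A_2), where
    A_2 = {(<w,x_1>, ..., <w,x_m>) : ||w||_2 <= 1}.
    Per sign vector, the supremum equals || sum_i sigma_i x_i ||_2."""
    rng = rng or random.Random(3)
    m, n = len(xs), len(xs[0])
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        v = [sum(s * x[j] for s, x in zip(sigma, xs)) for j in range(n)]
        total += math.sqrt(sum(c * c for c in v))
    return total / (n_trials * m)

# m = 40 random vectors in R^3; the bound is max_i ||x_i||_2 / sqrt(m).
data_rng = random.Random(4)
xs = [tuple(data_rng.uniform(-1, 1) for _ in range(3)) for _ in range(40)]
est = rad_l2_linear(xs)
bound = max(math.sqrt(sum(c * c for c in x)) for x in xs) / math.sqrt(len(xs))
print(est, "<=", bound)
```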
Bounds related to covering numbers
The following bound relates the Rademacher complexity of a set $A$ to its external covering number $N^{\text{ext}}_r(A)$ – the number of balls of a given radius $r$ whose union contains $A$. The bound is attributed to Dudley.

Suppose $A \subseteq \mathbb{R}^m$ is a set of vectors whose length (norm) is at most $c$. Then, for every integer $M > 0$:

$$\operatorname{Rad}(A) \leq \frac{c \cdot 2^{-M}}{\sqrt{m}} + \frac{6c}{m} \sum_{i=1}^{M} 2^{-i} \sqrt{\log N^{\text{ext}}_{c \cdot 2^{-i}}(A)}$$

In particular, if $A$ lies in a $d$-dimensional subspace of $\mathbb{R}^m$, then:

$$\forall r > 0: \quad N^{\text{ext}}_r(A) \leq \left( \frac{2 c \sqrt{d}}{r} \right)^d$$

Substituting this in the previous bound gives the following bound on the Rademacher complexity:

$$\operatorname{Rad}(A) \leq \frac{c \cdot 2^{-M}}{\sqrt{m}} + \frac{6 c \sqrt{d}}{m} \sum_{i=1}^{M} 2^{-i} \sqrt{\log\left( 2^{i+1} \sqrt{d} \right)}$$
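As a numeric illustration, a chained covering bound of the form $c \cdot 2^{-M}/\sqrt{m} + (6c/m) \sum_{i=1}^{M} 2^{-i} \sqrt{\log N^{\text{ext}}_{c \cdot 2^{-i}}(A)}$, with the $d$-dimensional covering estimate $N^{\text{ext}}_r(A) \leq (2c\sqrt{d}/r)^d$ substituted, can be evaluated to see how the residual term and the chained sum trade off as $M$ grows. A small sketch; the function name and sample parameters are illustrative:

```python
import math

def dudley_bound(c, m, d, M):
    """Evaluate c*2^-M/sqrt(m) + (6c/m) * sum_{i=1}^{M} 2^-i * sqrt(d * log(2^(i+1)*sqrt(d))),
    i.e. the chained covering bound with N_r^ext(A) <= (2c*sqrt(d)/r)^d plugged in
    at the scales r = c*2^-i."""
    residual = c * 2 ** -M / math.sqrt(m)
    chain = (6 * c / m) * sum(
        2 ** -i * math.sqrt(d * math.log(2 ** (i + 1) * math.sqrt(d)))
        for i in range(1, M + 1)
    )
    return residual + chain

# Increasing M shrinks the residual term while the chained sum converges:
for M in (1, 5, 10):
    print(M, dudley_bound(c=1.0, m=1000, d=5, M=M))
```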
Gaussian complexity
Gaussian complexity is a similar complexity with similar physical meanings, and can be obtained from the Rademacher complexity using the random variables $g_i$ instead of $\sigma_i$, where $g_i$ are Gaussian i.i.d. random variables with zero mean and variance 1, i.e. $g_i \sim \mathcal{N}(0, 1)$. Gaussian and Rademacher complexities are known to be equivalent up to logarithmic factors.
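The two complexities can be compared empirically by swapping the distribution of the random multipliers; a minimal sketch on the two-vector example from above (function name and seed are illustrative):

```python
import random

def complexity(A, sampler, n_trials=20000):
    """Monte Carlo estimate of (1/m) E[ sup_{a in A} sum_i g_i * a_i ],
    where the multipliers g_i are drawn by `sampler` (Rademacher signs
    for Rademacher complexity, standard normals for Gaussian complexity)."""
    m = len(A[0])
    total = 0.0
    for _ in range(n_trials):
        g = [sampler() for _ in range(m)]
        total += max(sum(gi * ai for gi, ai in zip(g, a)) for a in A)
    return total / (n_trials * m)

A = [(1, 1), (1, 2)]
rng = random.Random(5)
r_est = complexity(A, lambda: rng.choice((-1, 1)))  # Rademacher: exact value 1/4
g_est = complexity(A, lambda: rng.gauss(0, 1))      # Gaussian multipliers
print(r_est, g_est)
```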