Uniform convergence in probability is a concept in probability theory with applications to statistical learning theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family converge to their theoretical probabilities.
Contents
- Definitions
- Uniform convergence theorem statement
- Sauer–Shelah lemma
- Proof of uniform convergence theorem
  - Symmetrization
  - Permutations
  - Reduction to a finite class
- References
The law of large numbers says that, for each single event, its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. But in some applications, we are interested not in a single event but in a whole family of events. We would like to know whether the empirical frequency of every event in the family converges to its theoretical probability simultaneously. The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds.
Definitions
For a class of predicates $H$ defined on a set $X$ and a set of samples $x = (x_1, x_2, \dots, x_m)$, where $x_i \in X$, the empirical frequency of $h \in H$ on $x$ is

$$\widehat{Q}_x(h) = \frac{1}{m}\left|\{\, i : 1 \le i \le m,\ h(x_i) = 1 \,\}\right|.$$

The theoretical probability of $h \in H$ is defined as $Q_P(h) = P\{\, y \in X : h(y) = 1 \,\}$.

The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with repetition) from $X$ according to any distribution $P$, then with high probability the empirical frequency of every $h \in H$ is close to its expected value, which is the theoretical probability.

Here "simple" means that the Vapnik–Chervonenkis dimension of the class $H$ is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis using the concept of growth function.
Uniform convergence theorem statement
If $H$ is a set of $\{0,1\}$-valued functions defined on a set $X$ and $P$ is a probability distribution on $X$, then for $\varepsilon > 0$ and a positive integer $m$ we have

$$P^m\left\{\, |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H \,\right\} \le 4\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8},$$

where, for any $x \in X^m$,

$$Q_P(h) = P\{\, y \in X : h(y) = 1 \,\}, \qquad \widehat{Q}_x(h) = \frac{1}{m}\left|\{\, i : 1 \le i \le m,\ h(x_i) = 1 \,\}\right|,$$

and $P^m$ indicates that the probability is taken over $x$ consisting of $m$ i.i.d. draws from the distribution $P$.

$\Pi_H$ is defined as follows: for any set of $\{0,1\}$-valued functions $H$ over $X$ and any $D \subseteq X$ (identifying each $h$ with the subset of $X$ on which it equals 1),

$$\Pi_H(D) = \{\, h \cap D : h \in H \,\}.$$

And for any natural number $m$, the shattering number $\Pi_H(m)$ is defined as

$$\Pi_H(m) = \max\{\, |\Pi_H(D)| : D \subseteq X,\ |D| = m \,\}.$$

From the point of view of learning theory one can consider $H$ to be the concept/hypothesis class defined over the instance set $X$.
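As a numeric illustration of the bound, the sketch below evaluates the right-hand side $4\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8}$ after replacing the shattering number by a polynomial bound $(2m)^d$ (justified for small VC dimension by the Sauer–Shelah lemma in the next section). The values of $\varepsilon$, $d$, and $m$ are illustrative assumptions.

```python
import math

def vc_bound(m, eps, d):
    """Right-hand side of the theorem, 4 * Pi_H(2m) * exp(-eps^2 * m / 8),
    with the shattering number replaced by the polynomial bound (2m)^d."""
    return 4 * (2 * m) ** d * math.exp(-eps**2 * m / 8)

eps, d = 0.1, 3
for m in (10**3, 10**4, 10**5, 10**6):
    print(f"m={m:>8}: bound = {vc_bound(m, eps, d):.3e}")
```

The bound only becomes nontrivial (smaller than 1) once the exponential term overtakes the polynomial growth of the shattering number, which is exactly the "simple class, large sample" regime the theorem describes.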
Sauer–Shelah lemma
The Sauer–Shelah lemma relates the shattering number $\Pi_H(m)$ to the VC dimension.

Lemma: $\Pi_H(m) \le \left(\frac{em}{d}\right)^d$, where $d$ is the VC dimension of the concept class $H$.

Corollary: $\Pi_H(m) = O(m^d)$, so the shattering number grows only polynomially in $m$ whenever the VC dimension $d$ is finite.
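As a sanity check of the lemma, the sketch below uses an illustrative class not mentioned above: indicator functions of closed intervals on the real line, which have VC dimension $d = 2$. It counts the distinct labelings induced on $m$ points by brute force and compares the count with $(em/d)^d$.

```python
import math
from itertools import combinations

def shatter_count_intervals(points):
    """Number of distinct 0/1 labelings of `points` induced by indicator
    functions of closed intervals [a, b] (a class of VC dimension d = 2)."""
    pts = sorted(points)
    n = len(pts)
    labelings = {tuple(0 for _ in pts)}          # the empty interval
    for i, j in combinations(range(n), 2):       # intervals covering pts[i..j]
        labelings.add(tuple(1 if i <= k <= j else 0 for k in range(n)))
    for i in range(n):                           # intervals covering a single point
        labelings.add(tuple(1 if k == i else 0 for k in range(n)))
    return len(labelings)

d = 2
for m in (3, 5, 10, 20):
    pi = shatter_count_intervals(list(range(m)))
    sauer = (math.e * m / d) ** d
    print(f"m={m:>2}: Pi_H(m) = {pi:>4}   (em/d)^d = {sauer:8.1f}")
```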
Proof of uniform convergence theorem
Before we get into the details of the proof of the Uniform Convergence Theorem, we present a high-level overview of the proof.
- Symmetrization: We transform the problem of analyzing $|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon$ into the problem of analyzing $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as the testing sample which is used to estimate $Q_P(h)$.
- Permutation: Since $r$ and $s$ are picked identically and independently, swapping elements between them does not change the probability distribution on $r$ and $s$. So we try to bound the probability of $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$ for some $h \in H$ by considering the effect of a specific collection of permutations of the joint sample $x = r||s$. Specifically, we consider permutations $\sigma(x)$ which swap $x_i$ and $x_{m+i}$ for $i$ in some subset of $\{1, 2, \dots, m\}$. The symbol $r||s$ denotes the concatenation of $r$ and $s$.
- Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample, so if $H$ has finite VC dimension the problem reduces to one involving a finite function class. (A small simulation of the symmetrization step appears after this list.)
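The following sketch simulates the quantities compared in the symmetrization step, again for the illustrative class of thresholds on the uniform distribution over $[0,1]$ (all parameters are illustrative assumptions). It estimates how often the "true" deviation exceeds $\varepsilon$ versus how often the sample-vs-ghost-sample deviation exceeds $\varepsilon/2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: threshold predicates h_t(y) = 1[y <= t] on the uniform
# distribution over [0, 1], evaluated on a grid of thresholds.
thresholds = np.linspace(0.0, 1.0, 201)
m, trials, eps = 50, 2000, 0.2

count_true, count_ghost = 0, 0
for _ in range(trials):
    r = rng.uniform(0.0, 1.0, size=m)       # original sample
    s = rng.uniform(0.0, 1.0, size=m)       # ghost ("testing") sample
    q_r = (r[None, :] <= thresholds[:, None]).mean(axis=1)   # Q^_r(h_t) for each t
    q_s = (s[None, :] <= thresholds[:, None]).mean(axis=1)   # Q^_s(h_t) for each t
    q_p = thresholds                                          # Q_P(h_t) = t exactly
    count_true  += np.max(np.abs(q_p - q_r)) >= eps
    count_ghost += np.max(np.abs(q_r - q_s)) >= eps / 2

print("P[ sup_h |Q_P(h) - Q^_r(h)|   >= eps     ] ~", count_true / trials)
print("P[ sup_h |Q^_r(h) - Q^_s(h)| >= eps / 2 ] ~", count_ghost / trials)
print("The symmetrization lemma says the first is at most twice the second")
print("(the inequality is typically far from tight).")
```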
We present the technical details of the proof.
Symmetrization
Lemma: Let $V = \{\, x \in X^m : |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H \,\}$ and

$$R = \{\, (r,s) \in X^m \times X^m : |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2 \text{ for some } h \in H \,\}.$$

Then for $m \ge \frac{2}{\varepsilon^2}$, we have $P^m(V) \le 2\,P^{2m}(R)$.

Proof: By the triangle inequality,
if $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$ and $|Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2$, then $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$.

Therefore,

$$P^{2m}(R) \ge P^{2m}\left\{\, \exists h \in H :\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2 \,\right\} = \int_V P^m\left\{\, s : \exists h \in H,\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2 \,\right\}\, dP^m(r) = A,$$

since $r$ and $s$ are independent.

Now for $r \in V$ fix an $h \in H$ such that $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$. For this $h$ we shall show that

$$P^m\left\{\, |Q_P(h) - \widehat{Q}_s(h)| \le \frac{\varepsilon}{2} \,\right\} \ge \frac{1}{2}.$$

Thus for any $r \in V$ the inner probability is at least $\frac{1}{2}$, so $A \ge \frac{1}{2}\,P^m(V)$ and hence $P^{2m}(R) \ge \frac{1}{2}\,P^m(V)$, which is the first step of the high-level plan.

Notice that $m \cdot \widehat{Q}_s(h)$ is a binomial random variable with expectation $m \cdot Q_P(h)$ and variance $m \cdot Q_P(h)(1 - Q_P(h))$. By Chebyshev's inequality we get

$$P^m\left\{\, |Q_P(h) - \widehat{Q}_s(h)| > \frac{\varepsilon}{2} \,\right\} \le \frac{m \cdot Q_P(h)(1 - Q_P(h))}{(m\varepsilon/2)^2} \le \frac{1}{\varepsilon^2 m} \le \frac{1}{2}$$

for the mentioned bound on $m$. Here we use the fact that $x(1-x) \le \frac{1}{4}$ for any $x$.
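The Chebyshev step can be checked numerically. The sketch below computes the exact binomial probability $P^m\{|Q_P(h) - \widehat{Q}_s(h)| > \varepsilon/2\}$ for a few illustrative values of $Q_P(h)$, at the smallest sample size $m \ge 2/\varepsilon^2$ allowed by the lemma, and compares it with the bound $1/(\varepsilon^2 m) \le 1/2$.

```python
import math

def prob_deviation_exceeds(m, p, delta):
    """Exact P[|S/m - p| > delta] for S ~ Binomial(m, p)."""
    total = 0.0
    for k in range(m + 1):
        if abs(k / m - p) > delta:
            total += math.comb(m, k) * p**k * (1 - p) ** (m - k)
    return total

eps = 0.2
m = math.ceil(2 / eps**2)        # smallest m allowed by the lemma, m >= 2/eps^2
for p in (0.1, 0.3, 0.5):
    exact = prob_deviation_exceeds(m, p, eps / 2)
    chebyshev = 1 / (eps**2 * m)
    print(f"p={p}: exact={exact:.3f}   Chebyshev bound 1/(eps^2 m)={chebyshev:.3f} <= 1/2")
```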
Permutations
Let $\Gamma_m$ be the set of all permutations of $\{1, 2, \dots, 2m\}$ that swap $i$ and $m+i$ for $i$ in some subset of $\{1, 2, \dots, m\}$.

Lemma: Let $R$ be any subset of $X^{2m}$ and $P$ any probability distribution on $X$. Then

$$P^{2m}(R) = E\left[\Pr[\sigma(x) \in R]\right] \le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R],$$

where the expectation is over $x$ chosen according to $P^{2m}$, and the probability is over $\sigma$ chosen uniformly at random from $\Gamma_m$.

Proof: For any $\sigma \in \Gamma_m$,

$$P^{2m}(R) = P^{2m}\{\, x : \sigma(x) \in R \,\}$$

(since coordinate permutations preserve the product distribution $P^{2m}$). Therefore,

$$P^{2m}(R) = \frac{1}{|\Gamma_m|} \sum_{\sigma \in \Gamma_m} P^{2m}\{\, x : \sigma(x) \in R \,\} = \frac{1}{|\Gamma_m|} \sum_{\sigma \in \Gamma_m} \int_{X^{2m}} \mathbf{1}_{\sigma(x) \in R}\, dP^{2m}(x) = \int_{X^{2m}} \frac{1}{|\Gamma_m|} \sum_{\sigma \in \Gamma_m} \mathbf{1}_{\sigma(x) \in R}\, dP^{2m}(x) = \int_{X^{2m}} \Pr[\sigma(x) \in R]\, dP^{2m}(x) \le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R].$$

The maximum is guaranteed to exist since there is only a finite set of values that the probability under a random permutation can take.
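The key fact used above, that a fixed swapping permutation leaves the product distribution unchanged, can be illustrated by simulation. The event $R$, the sample size, and the particular permutation in the sketch below are illustrative assumptions; any measurable set and any $\sigma \in \Gamma_m$ would behave the same way.

```python
import numpy as np

rng = np.random.default_rng(2)
m, trials = 20, 100_000

# Illustrative event R on X^(2m): the first-half mean exceeds the second-half
# mean by at least 0.1.
def in_R(x):
    return x[:m].mean() - x[m:].mean() >= 0.1

# A fixed swapping permutation sigma: swap x_i and x_{m+i} for i in a fixed subset.
swap = np.array([0, 3, 4, 7, 11, 15])

def sigma(x):
    y = x.copy()
    y[swap], y[m + swap] = x[m + swap], x[swap]
    return y

hits_plain, hits_sigma = 0, 0
for _ in range(trials):
    x = rng.uniform(0.0, 1.0, size=2 * m)    # 2m i.i.d. draws from P
    hits_plain += in_R(x)
    hits_sigma += in_R(sigma(x))

print("P^2m(R)                  ~", hits_plain / trials)
print("P^2m{x : sigma(x) in R}  ~", hits_sigma / trials)
print("The two estimates agree up to Monte Carlo error.")
```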
Reduction to a finite class
Lemma: For the set $R$ defined in the symmetrization step,

$$\max_{x \in X^{2m}} \Pr[\sigma(x) \in R] \le 2\,\Pi_H(2m)\, e^{-\varepsilon^2 m/8},$$

and hence, by the previous lemma, $P^{2m}(R) \le 2\,\Pi_H(2m)\, e^{-\varepsilon^2 m/8}$.

Proof: Fix $x = (x_1, x_2, \dots, x_{2m})$ and let $t = |H|_x|$, the number of distinct restrictions of functions in $H$ to the points of $x$, which is at most $\Pi_H(2m)$. This means there are functions $h_1, h_2, \dots, h_t \in H$ such that for any $h \in H$ there is some $j$ between $1$ and $t$ with $h_j(x_k) = h(x_k)$ for all $1 \le k \le 2m$.

We see that $\sigma(x) \in R$ iff some $h \in H$ satisfies

$$\left| \frac{1}{m}\Big( \big|\{\, 1 \le i \le m : h(x_{\sigma(i)}) = 1 \,\}\big| - \big|\{\, m+1 \le i \le 2m : h(x_{\sigma(i)}) = 1 \,\}\big| \Big) \right| \ge \frac{\varepsilon}{2}.$$

Hence, if we define $w_i^j = 1$ if $h_j(x_i) = 1$ and $w_i^j = 0$ otherwise,

for $1 \le i \le m$ and $1 \le j \le t$, we have that $\sigma(x) \in R$ iff some $j \in \{1, \dots, t\}$ satisfies $\left| \frac{1}{m}\left( \sum_i w_{\sigma(i)}^j - \sum_i w_{\sigma(m+i)}^j \right) \right| \ge \frac{\varepsilon}{2}$. By the union bound we get

$$\Pr[\sigma(x) \in R] \le t \cdot \max_j \Pr\left[ \left| \frac{1}{m}\left( \sum_i w_{\sigma(i)}^j - \sum_i w_{\sigma(m+i)}^j \right) \right| \ge \frac{\varepsilon}{2} \right] \le \Pi_H(2m) \cdot \max_j \Pr\left[ \left| \frac{1}{m}\left( \sum_i w_{\sigma(i)}^j - \sum_i w_{\sigma(m+i)}^j \right) \right| \ge \frac{\varepsilon}{2} \right].$$

Since the distribution over the permutations $\sigma$ is uniform and independent across the indices $i$, the difference $w_{\sigma(i)}^j - w_{\sigma(m+i)}^j$ equals $+|w_i^j - w_{m+i}^j|$ or $-|w_i^j - w_{m+i}^j|$ with equal probability.

Thus,

$$\Pr\left[ \left| \frac{1}{m} \sum_i \left( w_{\sigma(i)}^j - w_{\sigma(m+i)}^j \right) \right| \ge \frac{\varepsilon}{2} \right] = \Pr\left[ \left| \frac{1}{m} \sum_i |w_i^j - w_{m+i}^j|\, \beta_i \right| \ge \frac{\varepsilon}{2} \right],$$

where the probability on the right is over the $\beta_i$, which are independent uniform $\pm 1$ signs, and both situations are equiprobable. Since $|w_i^j - w_{m+i}^j| \in \{0,1\}$, Hoeffding's inequality bounds the right-hand side by $2\, e^{-m\varepsilon^2/8}$, which proves the lemma.
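The final sign-flipping step can be illustrated numerically. The sketch below takes the worst case $|w_i^j - w_{m+i}^j| = 1$ for all $i$ (an illustrative assumption, together with the chosen $m$ and $\varepsilon$), estimates the probability over random $\pm 1$ signs by Monte Carlo, and compares it with the Hoeffding bound $2e^{-m\varepsilon^2/8}$.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
m, trials, eps = 100, 20_000, 0.3

# a_i = |w_i^j - w_{m+i}^j| is 0 or 1; a_i = 1 for all i is the worst case.
a = np.ones(m)

# Random +/-1 signs beta_i, one row per trial.
beta = rng.choice([-1.0, 1.0], size=(trials, m))
deviations = np.abs(beta @ a) / m           # |(1/m) * sum_i beta_i * a_i| per trial

empirical = np.mean(deviations >= eps / 2)
hoeffding = 2 * math.exp(-m * eps**2 / 8)
print(f"empirical P[|(1/m) sum beta_i a_i| >= eps/2] ~ {empirical:.4f}")
print(f"Hoeffding bound 2 exp(-m eps^2 / 8)          = {hoeffding:.4f}")
```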
Finally, combining the three parts of the proof, we get the Uniform Convergence Theorem:

$$P^m(V) \le 2\,P^{2m}(R) \le 2 \max_{x \in X^{2m}} \Pr[\sigma(x) \in R] \le 4\,\Pi_H(2m)\, e^{-\varepsilon^2 m/8}.$$
