Sanov's theorem - Alchetron, The Free Social Encyclopedia

In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector x n = x 1 , x 2 , … , x n . Further, let us ask that the empirical distribution, p ^ x n , of the samples falls within the set A—formally, we write { x n : p ^ x n ∈ A } . Then,

q n ( x n ) ≤ ( n + 1 ) | X | 2 − n D K L ( p ∗ | | q ) ,

where

q n ( x n ) is shorthand for q ( x 1 ) q ( x 2 ) ⋯ q ( x n ) , and

p ∗ is the information projection of q onto A.

In words, the probability of drawing an atypical distribution is proportional to the KL distance from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.

Furthermore, if A is the closure of its interior,

lim n → ∞ 1 n log ⁡ q n ( x n ) = − D K L ( p ∗ | | q ) .

References

Sanov's theorem Wikipedia

(Text) CC BY-SA