Asymptotic equipartition property

In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of the typical set used in theories of data compression.

Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through Cramér's large deviation theorem, which states that the probability of a large deviation from the mean decays exponentially with the number of samples. Such results are studied in large deviations theory; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.

In the field of pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning sufficient typicality.

Definition

Given a discrete-time stationary ergodic stochastic process X on the probability space (Ω, B, p), the AEP is the assertion that

$$-\frac{1}{n}\log p(X_1,X_2,\ldots,X_n) \to H(X) \quad \text{as} \quad n \to \infty$$

where H(X), or simply H, denotes the entropy rate of X, which must exist for all discrete-time stationary processes, including the ergodic ones. The AEP is proved for finite-valued (i.e. |Ω| < ∞) stationary ergodic stochastic processes in the Shannon–McMillan–Breiman theorem using ergodic theory, and for any i.i.d. source directly from the law of large numbers, both in the discrete-valued case (where H is simply the entropy of a symbol) and in the continuous-valued case (where H is the differential entropy instead). The definition of the AEP can also be extended to certain classes of continuous-time stochastic processes for which a typical set exists for a long enough observation time. The convergence is almost sure in all cases.
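As a concrete illustration (not part of the original statement), the following is a minimal numerical sketch of the i.i.d. case: it simulates a Bernoulli source and shows the normalized negative log-probability of a single realization approaching the entropy. The parameter q = 0.3, the sample sizes, and the use of NumPy are assumptions made purely for the demonstration.

```python
# Minimal numerical sketch of the AEP for an i.i.d. Bernoulli(q) source.
# q, the sample sizes, and the use of NumPy are assumptions made for the demo.
import numpy as np

rng = np.random.default_rng(0)
q = 0.3                                              # assumed P(X = 1)
H = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))     # entropy of one symbol, in bits

for n in (100, 1_000, 10_000, 100_000):
    x = rng.random(n) < q                            # one realization X_1, ..., X_n
    # independence: log2 p(X_1, ..., X_n) is the sum of per-symbol log-probabilities
    log_p = np.where(x, np.log2(q), np.log2(1 - q)).sum()
    print(f"n = {n:>6}:  -1/n log2 p = {-log_p / n:.4f}   H(X) = {H:.4f}")
```

Because the joint log-probability of an i.i.d. sequence is a sum of i.i.d. terms, the printed values approach H(X) ≈ 0.8813 bits by the law of large numbers.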

AEP for non-stationary discrete-time source producing independent symbols

The assumptions of stationarity, ergodicity, and identical distribution of the random variables are not essential for the AEP to hold. Indeed, as is intuitively clear, the AEP requires only some form of the law of large numbers, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.

We assume that the source produces independent symbols, possibly with different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process at each time instant is known. The joint distribution is then just the product of the marginals. Then, under the condition (which can be relaxed) that Var[log p(X_i)] < M for all i, for some M > 0, the following holds (the AEP):

$$\lim_{n\to\infty}\Pr\left[\,\left|-\frac{1}{n}\log p(X_1,X_2,\ldots,X_n)-\bar{H}_n(X)\right|<\epsilon\,\right]=1 \qquad \forall\,\epsilon>0$$

where

$$\bar{H}_n(X)=\frac{1}{n}H(X_1,X_2,\ldots,X_n)$$
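The statement can be checked numerically in the same spirit as before. The sketch below is an illustration only: the time-varying Bernoulli parameters q_i follow an arbitrary assumed schedule, chosen to stay bounded away from 0 and 1 so that the variance condition above holds, and H̄_n(X) reduces to the average of the per-symbol entropies because the symbols are independent.

```python
# Minimal sketch of the non-stationary AEP: independent Bernoulli symbols whose
# parameters q_i change over time.  The particular schedule for q_i is an arbitrary
# assumption, chosen to stay bounded away from 0 and 1 so that Var[log p(X_i)] is
# uniformly bounded.
import numpy as np

rng = np.random.default_rng(1)

def h2(q):
    """Binary entropy in bits."""
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

for n in (1_000, 10_000, 100_000):
    i = np.arange(1, n + 1)
    q = 0.2 + 0.6 * (i % 7) / 7.0       # time-varying marginal P(X_i = 1)
    x = rng.random(n) < q               # one realization of the non-stationary source
    log_p = np.where(x, np.log2(q), np.log2(1 - q)).sum()
    # for independent symbols, (1/n) H(X_1, ..., X_n) is the average per-symbol entropy
    H_bar = h2(q).mean()
    print(f"n = {n:>6}:  -1/n log2 p = {-log_p / n:.4f}   H_bar_n = {H_bar:.4f}")
```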

Proof

The proof follows from a simple application of Markov's inequality, applied to the second moment of log p(X_i). Since the symbols are independent, the variance of the sum of the log p(X_i) is the sum of the individual variances, each bounded by M:

$$\Pr\left[\,\left|-\frac{1}{n}\log p(X_1,X_2,\ldots,X_n)-\bar{H}_n(X)\right|>\epsilon\,\right]\le\frac{1}{n^2\epsilon^2}\operatorname{Var}\left[\sum_{i=1}^{n}\log p(X_i)\right]\le\frac{M}{n\epsilon^2}\to 0 \quad \text{as } n\to\infty$$

More generally, the proof holds if any moment E[|log p(X_i)|^r] is uniformly bounded for some r > 1 (again by Markov's inequality applied to the r-th moment).

Even this condition is not necessary, but given a non-stationary random process, it should not be difficult to test whether the AEP holds using the above method.

Applications of the AEP for a non-stationary source producing independent symbols

The AEP for a non-stationary discrete-time process producing independent symbols leads (among other results) to the source coding theorem for a non-stationary source (with independent output symbols) and the channel coding theorem for non-stationary memoryless channels.

Source coding theorem

The source coding theorem for discrete-time non-stationary independent sources can be found here: source coding theorem

Channel coding theorem

The channel coding theorem for discrete-time non-stationary memoryless channels can be found here: noisy channel coding theorem

Category theory

A category-theoretic definition of the equipartition property is given by Gromov. Given a sequence of Cartesian powers P^N = P × ⋯ × P of a measure space P, this sequence admits an asymptotically equivalent sequence H_N of homogeneous measure spaces (i.e. all sets have the same measure; all morphisms are invariant under the group of automorphisms, and thus factor as a morphism to the terminal object).

The above requires a definition of asymptotic equivalence. This is given in terms of a distance function, which measures how much an injective correspondence differs from an isomorphism. An injective correspondence π : P → Q is a partially defined map that is a bijection; that is, it is a bijection between a subset P′ ⊂ P and a subset Q′ ⊂ Q. Then define

$$|P-Q|_\pi=|P\setminus P'|+|Q\setminus Q'|$$

where |S| denotes the measure of a set S. In what follows, the measures of P and Q are taken to be 1, so that the measure spaces are probability spaces. This distance |P − Q|_π is commonly known as the earth mover's distance or Wasserstein metric.

Similarly, define

$$|\log P:Q|_\pi=\frac{\sup_{p\in P'}\,|\log p-\log \pi(p)|}{\log \min\left(|\operatorname{set}(P)|,\;|\operatorname{set}(Q)|\right)}$$

with |set(P)| taken to be the counting measure on P. Thus, this definition requires that P be a finite measure space. Finally, let

$$\operatorname{dist}_\pi(P,Q)=|P-Q|_\pi+|\log P:Q|_\pi$$

The sequences P_N and Q_N are then said to be asymptotically equivalent when there exists a sequence of injective correspondences π_N : P_N → Q_N such that

$$\operatorname{dist}_{\pi_N}(P_N,Q_N)\to 0 \quad \text{as} \quad N\to\infty$$

Given a homogeneous space sequence H_N that is asymptotically equivalent to P_N, the entropy H(P) of P may be taken as

$$H(P)=\lim_{N\to\infty}\frac{1}{N}\log|\operatorname{set}(H_N)|$$
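To make the distance above concrete, here is a minimal sketch assuming finite probability spaces represented as Python dictionaries from atoms to their probabilities, with the injective correspondence π given as a dictionary that maps a subset of P's atoms bijectively onto a subset of Q's atoms; the data structures and the example values are illustrative assumptions, not part of Gromov's construction.

```python
# Minimal sketch of dist_pi(P, Q) for finite probability spaces.  Spaces are
# represented as dicts mapping atoms to their probabilities, and the injective
# correspondence pi as a dict that is a bijection between a subset of P's atoms
# and a subset of Q's atoms.  The representation and example data are
# illustrative assumptions.
import math

def dist(P, Q, pi):
    P_prime = set(pi.keys())
    Q_prime = set(pi.values())
    assert len(P_prime) == len(Q_prime), "pi must be injective"

    # |P - Q|_pi: mass of P outside P' plus mass of Q outside Q'
    mass_term = (sum(m for a, m in P.items() if a not in P_prime)
                 + sum(m for b, m in Q.items() if b not in Q_prime))

    # |log P : Q|_pi: worst log-probability discrepancy over P', normalized by
    # log min(|set(P)|, |set(Q)|); both spaces need at least two atoms.
    denom = math.log(min(len(P), len(Q)))
    log_term = max(abs(math.log(P[a]) - math.log(Q[pi[a]])) for a in P_prime) / denom

    return mass_term + log_term

# Example: two three-point spaces and a correspondence defined on two atoms.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"x": 0.45, "y": 0.35, "z": 0.2}
pi = {"a": "x", "b": "y"}
print(dist(P, Q, pi))
```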
