Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.
Shannon-type inequalities
Consider a finite collection $X_1, X_2, \ldots, X_n$ of finitely (or at most countably) supported random variables on the same probability space. For a collection of $n$ random variables, there are $2^n - 1$ non-empty subsets, and a joint entropy can be defined for each of them. For example, when $n = 2$, we may consider the entropies $H(X_1)$, $H(X_2)$, and $H(X_1, X_2)$, which satisfy the following inequalities (together these characterize the possible values of the marginal and joint entropies of two random variables):

$$H(X_1) \ge 0$$
$$H(X_2) \ge 0$$
$$H(X_1) \le H(X_1, X_2)$$
$$H(X_2) \le H(X_1, X_2)$$
$$H(X_1, X_2) \le H(X_1) + H(X_2)$$
In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

$$I(A; B \mid C) \;\ge\; 0,$$

where $A$, $B$, and $C$ each denote the joint distribution of some arbitrary (possibly empty) subset of our collection of random variables. Inequalities that can be derived as linear combinations of this are known as Shannon-type inequalities.
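As a concrete check, each of the five inequalities above is such a special case; one standard choice of the subsets $A$, $B$, $C$ for three of them is:

```latex
% Basic inequalities as instances of I(A;B|C) >= 0
\begin{align*}
H(X_1) &= I(X_1; X_1) \ge 0                          && (A = B = X_1,\; C = \varnothing)\\
H(X_1, X_2) - H(X_1) &= I(X_2; X_2 \mid X_1) \ge 0   && (A = B = X_2,\; C = X_1)\\
H(X_1) + H(X_2) - H(X_1, X_2) &= I(X_1; X_2) \ge 0   && (A = X_1,\; B = X_2,\; C = \varnothing)
\end{align*}
```

The remaining two inequalities follow by exchanging the roles of $X_1$ and $X_2$.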
More formally (following Yeung's notation), define $\Gamma_n^*$ to be the set of all constructible points in $\mathbb{R}^{2^n - 1}$, where a point is called constructible if there is a joint distribution of $n$ discrete random variables whose $2^n - 1$ joint entropies, indexed by the non-empty subsets of $\{1, 2, \ldots, n\}$, are the coordinates of that point. The closure of $\Gamma_n^*$ is denoted $\overline{\Gamma}_n^*$. The cone in $\mathbb{R}^{2^n - 1}$ characterized by all Shannon-type inequalities among the $2^n - 1$ joint entropies is denoted $\Gamma_n$. Software has been developed to automate the task of proving Shannon-type inequalities.
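For instance, in the smallest non-trivial case $n = 2$ (a worked example; the coordinates $h_1$, $h_2$, $h_{12}$ stand for $H(X_1)$, $H(X_2)$, $H(X_1, X_2)$), the Shannon cone is

```latex
\Gamma_2 = \left\{ (h_1, h_2, h_{12}) \in \mathbb{R}^3 :
  h_1 \le h_{12},\;\; h_2 \le h_{12},\;\; h_{12} \le h_1 + h_2 \right\}
```

and it is known that $\overline{\Gamma}_n^* = \Gamma_n$ for $n \le 3$, so Shannon-type inequalities are complete in those cases.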
Non-Shannon-type inequalities
Other, less trivial inequalities have been discovered among the entropies and joint entropies of four or more random variables, which cannot be derived from Shannon's basic inequalities. These are known as non-Shannon-type inequalities. In 1997 and 1998, Zhang and Yeung reported two non-Shannon-type inequalities. The latter implies that
$$\Gamma_n^* \;\subsetneq\; \overline{\Gamma}_n^* \;\subsetneq\; \Gamma_n,$$

where the inclusions are proper for $n \ge 4$.
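For reference, one standard form of the 1998 Zhang–Yeung inequality, valid for any four discrete random variables $A$, $B$, $C$, $D$, is:

```latex
2\, I(C; D) \;\le\; I(A; B) + I(A; C, D) + 3\, I(C; D \mid A) + I(C; D \mid B)
```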
Further non-Shannon-type inequalities have since been reported. Dougherty et al. found a number of non-Shannon-type inequalities by computer search. Matúš proved the existence of infinitely many linear non-Shannon-type inequalities.
Lower bounds for the Kullback–Leibler divergence
A great many important inequalities in information theory are actually lower bounds for the Kullback–Leibler divergence. Even the Shannon-type inequalities can be considered part of this category, since the bivariate mutual information can be expressed as the Kullback–Leibler divergence of the joint distribution with respect to the product of the marginals, and thus these inequalities can be seen as a special case of Gibbs' inequality.
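Concretely, the identity behind this observation is (writing $P_{(X,Y)}$ for the joint distribution and $P_X P_Y$ for the product of the marginals):

```latex
I(X; Y) = D_{\mathrm{KL}}\left( P_{(X,Y)} \,\middle\|\, P_X P_Y \right)
        = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \;\ge\; 0
```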
On the other hand, it seems to be much more difficult to derive useful upper bounds for the Kullback–Leibler divergence. This is because the Kullback–Leibler divergence DKL(P||Q) depends very sensitively on events that are very rare in the reference distribution Q. DKL(P||Q) increases without bound as an event of finite non-zero probability in the distribution P becomes exceedingly rare in the reference distribution Q, and in fact DKL(P||Q) is not even defined if an event of non-zero probability in P has zero probability in Q. (Hence the requirement that P be absolutely continuous with respect to Q.)
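A two-point example makes this sensitivity explicit: let $P$ and $Q$ be Bernoulli distributions with parameters $\tfrac{1}{2}$ and $q$, respectively; then

```latex
D_{\mathrm{KL}}(P \| Q)
  = \tfrac{1}{2} \log \frac{1/2}{q} + \tfrac{1}{2} \log \frac{1/2}{1 - q}
  \;\longrightarrow\; \infty \quad \text{as } q \to 0
```

even though $P$ itself never changes.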
Gibbs' inequality
This fundamental inequality states that the Kullback–Leibler divergence is non-negative:

$$D_{\mathrm{KL}}(P \,\|\, Q) \;\ge\; 0,$$

with equality if and only if $P = Q$ almost everywhere.
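A one-line proof sketch in the discrete case (using Jensen's inequality for the concave logarithm, with the sum running over the support of $P$):

```latex
-D_{\mathrm{KL}}(P \| Q)
  = \sum_x p(x) \log \frac{q(x)}{p(x)}
  \;\le\; \log \sum_x p(x) \cdot \frac{q(x)}{p(x)}
  = \log \sum_x q(x)
  \;\le\; \log 1 = 0
```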
Kullback's inequality
Another inequality concerning the Kullback–Leibler divergence is known as Kullback's inequality. If $P$ and $Q$ are probability distributions on the real line with $P$ absolutely continuous with respect to $Q$, and whose first moments exist, then

$$D_{\mathrm{KL}}(P \,\|\, Q) \;\ge\; \Psi_Q^*\!\left( \mu_1'(P) \right),$$

where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu_1'(P)$ is the first moment of $P$.
The Cramér–Rao bound is a corollary of this result.
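As a worked special case (a standard computation, not part of the original statement): if $Q = \mathcal{N}(\theta, \sigma^2)$, then $\Psi_Q(t) = \theta t + \sigma^2 t^2 / 2$, whose convex conjugate is $\Psi_Q^*(\mu) = (\mu - \theta)^2 / (2\sigma^2)$, so Kullback's inequality specializes to

```latex
D_{\mathrm{KL}}(P \| Q) \;\ge\; \frac{\left( \mu_1'(P) - \theta \right)^2}{2 \sigma^2}
```

a quadratic penalty for displacing the first moment, which hints at how the Cramér–Rao bound arises.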
Pinsker's inequality
Pinsker's inequality relates the Kullback–Leibler divergence and the total variation distance. It states that if $P$ and $Q$ are two probability distributions on a measurable space $(X, \Sigma)$, then

$$\delta(P, Q) \;\le\; \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, Q)},$$

where $D_{\mathrm{KL}}(P \,\|\, Q)$ is the Kullback–Leibler divergence in nats and

$$\delta(P, Q) = \sup\left\{\, |P(A) - Q(A)| : A \in \Sigma \,\right\}$$

is the total variation distance.
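A minimal numerical sanity check (illustrative only; Bernoulli distributions, with the divergence in nats as in the statement above):

```python
import math

def kl_nats(p: float, q: float) -> float:
    """D_KL(Bernoulli(p) || Bernoulli(q)) in nats; 0 * log(0/q) is taken as 0."""
    total = 0.0
    for pi, qi in ((p, q), (1.0 - p, 1.0 - q)):
        if pi > 0.0:
            total += pi * math.log(pi / qi)
    return total

def tv_distance(p: float, q: float) -> float:
    """Total variation distance between Bernoulli(p) and Bernoulli(q)."""
    return abs(p - q)

# Pinsker: delta(P, Q) <= sqrt(D_KL(P || Q) / 2), checked on a small grid.
for p in (0.05, 0.3, 0.5, 0.9):
    for q in (0.1, 0.5, 0.95):
        assert tv_distance(p, q) <= math.sqrt(kl_nats(p, q) / 2.0) + 1e-12
print("Pinsker's inequality holds on all tested pairs.")
```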
Hirschman uncertainty
In 1957, Hirschman showed that for a (reasonably well-behaved) function $f : \mathbb{R} \to \mathbb{C}$ such that $\int_{-\infty}^{\infty} |f(x)|^2 \, dx = 1$, and for its Fourier transform $g(y) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x y} \, dx$, the differential entropies of the probability densities $|f|^2$ and $|g|^2$ satisfy

$$H\!\left(|f|^2\right) + H\!\left(|g|^2\right) \;\ge\; 0.$$

Hirschman conjectured, and it was later proved, that the sharper bound

$$H\!\left(|f|^2\right) + H\!\left(|g|^2\right) \;\ge\; \log \frac{e}{2},$$

which is attained in the case of a Gaussian distribution, could replace the right-hand side. This bound is stronger than, and implies, the Weyl formulation of the Heisenberg uncertainty principle.
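A quick check of the equality case (a standard computation, with entropies in nats): for the Gaussian $f(x) = 2^{1/4} e^{-\pi x^2}$ one has $g = f$, and $|f|^2$ is the density of $\mathcal{N}\!\left(0, \tfrac{1}{4\pi}\right)$, so

```latex
H\!\left(|f|^2\right) = \tfrac{1}{2} \ln\!\left( 2 \pi e \cdot \tfrac{1}{4\pi} \right)
                      = \tfrac{1}{2} \ln \frac{e}{2},
\qquad
H\!\left(|f|^2\right) + H\!\left(|g|^2\right) = \ln \frac{e}{2}
```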
Tao's inequality
Given discrete random variables $X$, $Y$, and $Y'$, such that $X$ takes values only in the interval $[-1, 1]$ and $Y'$ is determined by both $X$ and $Y$ (so that $Y' = f(X, Y)$), we have

$$\mathbb{E}\left( \left| \mathbb{E}(X \mid Y') - \mathbb{E}(X \mid Y) \right| \right) \;\le\; \sqrt{2\, I(X; Y' \mid Y) \log 2},$$
relating the conditional expectation to the conditional mutual information. This is a simple consequence of Pinsker's inequality. (Note: the correction factor log 2 inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
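To see where that factor comes from (a short unit conversion under the conventions above): a divergence measured in nats equals $\log 2$ times the same divergence in bits, so Pinsker's inequality restated for a divergence $D_{\text{bits}}$ measured in bits reads

```latex
D_{\text{nats}} = (\log 2)\, D_{\text{bits}}
\quad \Longrightarrow \quad
\delta(P, Q) \;\le\; \sqrt{\tfrac{1}{2} D_{\text{nats}}}
           \;=\; \sqrt{\tfrac{\log 2}{2}\, D_{\text{bits}}}
```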