In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proven by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.
Contents
- Statements
  - Finite form
  - Measure-theoretic and probabilistic form
  - General inequality in a probabilistic setting
- Proofs
  - Proof 1 (finite form)
  - Proof 2 (measure-theoretic form)
  - Proof 3 (general inequality in a probabilistic setting)
- Form involving a probability density function
- Alternative finite form
- Statistical physics
- Information theory
- Rao–Blackwell theorem
Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (where t ∈ [0,1]),

$$t f(x_1) + (1 - t) f(x_2),$$

while the graph of the function is the convex function of the weighted means,

$$f(t x_1 + (1 - t) x_2).$$

Thus, Jensen's inequality is

$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2).$$

In the context of probability theory, it is generally stated in the following form: if X is a random variable and φ is a convex function, then

$$\varphi(\mathrm{E}[X]) \le \mathrm{E}[\varphi(X)].$$
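For concreteness, the two-point form can be checked numerically; the convex function φ(x) = x², the two points, and the weights below are arbitrary choices for this sketch.

```python
# Two-point check of Jensen's inequality for the (arbitrarily chosen) convex
# function phi(x) = x**2 and two arbitrary points x1, x2.
def phi(x):
    return x ** 2

x1, x2 = -1.0, 3.0
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = phi(t * x1 + (1 - t) * x2)       # the function of the weighted mean
    rhs = t * phi(x1) + (1 - t) * phi(x2)  # the weighted mean of the function (secant line)
    assert lhs <= rhs + 1e-12
    print(f"t={t:.2f}: phi(mean)={lhs:.4f} <= mean(phi)={rhs:.4f}")
```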
Statements
The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.
Finite form
For a real convex function φ, numbers x1, x2, ..., xn in its domain, and positive weights a1, ..., an, Jensen's inequality can be stated as

$$\varphi\!\left(\frac{\sum_i a_i x_i}{\sum_i a_i}\right) \le \frac{\sum_i a_i \varphi(x_i)}{\sum_i a_i},$$

and the inequality is reversed if φ is concave:

$$\varphi\!\left(\frac{\sum_i a_i x_i}{\sum_i a_i}\right) \ge \frac{\sum_i a_i \varphi(x_i)}{\sum_i a_i}.$$

Equality holds if and only if x1 = x2 = ⋯ = xn or φ is linear on a domain containing x1, ..., xn.

As a particular case, if the weights ai are all equal, these become

$$\varphi\!\left(\frac{\sum_i x_i}{n}\right) \le \frac{\sum_i \varphi(x_i)}{n} \qquad\text{and}\qquad \varphi\!\left(\frac{\sum_i x_i}{n}\right) \ge \frac{\sum_i \varphi(x_i)}{n},$$

respectively.

For instance, the function log(x) is concave, so substituting φ(x) = log(x) in the latter formula establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

$$\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \ge \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n}, \qquad\text{i.e.}\qquad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$

A common application has x as a function of another variable (or set of variables) t, that is, xi = g(ti). All of this carries directly over to the general continuous case: the weights ai are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
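For concreteness, the finite form and the AM–GM consequence can be verified numerically; the sample values, weights, and the choice φ(x) = exp(x) below are arbitrary.

```python
import numpy as np

# Arbitrary positive numbers and positive weights (normalized to sum to 1).
x = np.array([0.5, 2.0, 3.0, 7.0])
a = np.array([1.0, 2.0, 3.0, 4.0])
lam = a / a.sum()

# Finite form with the convex function phi(x) = exp(x).
phi = np.exp
assert phi(np.dot(lam, x)) <= np.dot(lam, phi(x))

# The concave logarithm reverses the inequality, giving the weighted AM-GM inequality.
weighted_am = np.dot(lam, x)
weighted_gm = np.prod(x ** lam)
assert np.log(weighted_am) >= np.dot(lam, np.log(x))
assert weighted_am >= weighted_gm
print(weighted_am, weighted_gm)
```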
Measure-theoretic and probabilistic form
Let (Ω, A, μ) be a measure space, such that μ(Ω) = 1. If g is a real-valued function that is μ-integrable, and if φ is a convex function on the real line, then:

$$\varphi\!\left(\int_\Omega g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu.$$

In real analysis, we may require an estimate on

$$\varphi\!\left(\int_a^b f(x)\, dx\right)$$

where a, b ∈ ℝ and f: [a, b] → ℝ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a, b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get

$$\varphi\!\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x))\, dx.$$
The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let (Ω, F, P) be a probability space, X an integrable real-valued random variable and φ a convex function. Then:

$$\varphi(\mathrm{E}[X]) \le \mathrm{E}[\varphi(X)].$$

In this probability setting, the measure μ is intended as a probability P, the integral with respect to μ as an expected value E, and the function g as a random variable X.
Notice that equality holds if X is constant (a degenerate random variable) or if φ is linear. More generally, equality holds whenever there is a convex set A with P(X ∈ A) = 1 and φ is a linear (affine) function over A (that is, there are real numbers a and b such that φ(x) = ax + b for all x ∈ A).
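A Monte Carlo sketch of this form; the distribution of X (exponential with mean 2) and the convex function φ(x) = x² are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of phi(E[X]) <= E[phi(X)] for an arbitrary distribution
# (exponential with mean 2) and the convex function phi(x) = x**2.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1_000_000)

phi = np.square
lhs = phi(X.mean())   # phi applied to the (estimated) expectation
rhs = phi(X).mean()   # expectation of phi(X)
print(lhs, rhs)       # roughly 4 versus 8 for this distribution
assert lhs <= rhs
```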
General inequality in a probabilistic setting
More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element E[X] in T such that, for any element z in the dual space of T, E|⟨z, X⟩| < ∞ and ⟨z, E[X]⟩ = E[⟨z, X⟩]. Then, for any measurable convex function φ and any sub-σ-algebra G of F:

$$\varphi(\mathrm{E}[X \mid G]) \le \mathrm{E}[\varphi(X) \mid G].$$

Here E[· | G] stands for the expectation conditioned to the σ-algebra G. This general statement reduces to the previous ones when the topological vector space T is the real axis and G is the trivial σ-algebra {∅, Ω}.
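A Monte Carlo sketch of the conditional form, where the conditioning σ-algebra is generated by a hypothetical discrete variable Z; the distribution of X given Z and the function φ(x) = x² are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of the conditional form phi(E[X | G]) <= E[phi(X) | G],
# where G is generated by a discrete variable Z (a hypothetical three-group model).
rng = np.random.default_rng(1)
n = 900_000
Z = rng.integers(0, 3, size=n)                        # the conditioning variable
X = rng.normal(loc=Z.astype(float), scale=1.0 + Z)    # X depends on Z

phi = np.square
for z in range(3):
    sel = Z == z
    lhs = phi(X[sel].mean())        # phi(E[X | Z = z])
    rhs = phi(X[sel]).mean()        # E[phi(X) | Z = z]
    print(z, lhs, rhs)
    assert lhs <= rhs
```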
Proofs
Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real number (see figure). Assuming a hypothetical distribution of X values, one can immediately identify the position of E[X] and of its image φ(E[X]) on the graph. Since, for a convex mapping Y = φ(X), the distribution of Y values is increasingly "stretched out" for increasing values of X, the expectation of Y is shifted upwards with respect to φ(E[X]). This suggests the inequality

$$\varphi(\mathrm{E}[X]) \le \mathrm{E}[\varphi(X)] = \mathrm{E}[Y],$$

with equality when φ is not strictly convex, e.g. when it is a straight line, or when X follows a degenerate distribution (i.e. is a constant).
The proofs below formalize this intuitive notion.
Proof 1 (finite form)
If λ1 and λ2 are two arbitrary nonnegative real numbers such that λ1 + λ2 = 1, then convexity of φ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \qquad \text{for any } x_1, x_2.$$

This can be easily generalized: if λ1, ..., λn are nonnegative real numbers such that λ1 + ... + λn = 1, then

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) + \cdots + \lambda_n \varphi(x_n)$$

for any x1, ..., xn. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2. Suppose it is true for some n; we prove it for n + 1. At least one of the λi is strictly positive, say λ1 (if λ1 = 1 the statement is trivial, so assume λ1 < 1); therefore by the convexity inequality:

$$\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\!\left(\lambda_1 x_1 + (1-\lambda_1) \sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right) \le \lambda_1 \varphi(x_1) + (1-\lambda_1)\, \varphi\!\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right).$$

Since

$$\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} = 1,$$

one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.
In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\!\left(\int x\, d\mu_n(x)\right) \le \int \varphi(x)\, d\mu_n(x),$$

where μn is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$
Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.
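The limiting procedure can be illustrated numerically: the sketch below approximates a standard normal measure (an arbitrary choice) by convex combinations of Dirac deltas on increasingly fine grids and evaluates both sides of the inequality for φ(x) = exp(x).

```python
import numpy as np

# Sketch of the density argument: approximate a continuous probability measure
# (here a standard normal, an arbitrary choice) by convex combinations of Dirac
# deltas on a grid, and watch both sides of Jensen's inequality converge for
# phi(x) = exp(x); the limiting values are exp(0) = 1 and exp(1/2).
phi = np.exp

for n in (10, 100, 1_000, 10_000):
    x = np.linspace(-6.0, 6.0, n)
    w = np.exp(-x ** 2 / 2)
    lam = w / w.sum()                 # weights of the Dirac deltas, summing to 1
    lhs = phi(np.dot(lam, x))         # phi of the integral of x against mu_n
    rhs = np.dot(lam, phi(x))         # integral of phi against mu_n
    assert lhs <= rhs
    print(n, lhs, rhs)
```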
Proof 2 (measure-theoretic form)
Let g be a real-valued μ-integrable function on a probability space Ω, and let φ be a convex function on the real numbers. Since φ is convex, at each real number x we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of φ at x, but which are at or below the graph of φ at all points.
Now, if we define

$$x_0 := \int_\Omega g\, d\mu,$$

then, because of the existence of subderivatives for convex functions, we may choose a and b such that

$$ax + b \le \varphi(x)$$

for all real x and

$$a x_0 + b = \varphi(x_0).$$

But then we have that

$$\varphi(g(\omega)) \ge a\, g(\omega) + b$$

for all ω ∈ Ω. Since we have a probability measure, the integral is monotone with μ(Ω) = 1, so that

$$\int_\Omega \varphi(g)\, d\mu \ge \int_\Omega (a g + b)\, d\mu = a \int_\Omega g\, d\mu + b = a x_0 + b = \varphi(x_0) = \varphi\!\left(\int_\Omega g\, d\mu\right),$$
as desired.
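The supporting-line argument can also be illustrated numerically; the discrete measure, the choice of g as the identity, and φ(x) = exp(x) below are arbitrary.

```python
import numpy as np

# Numerical illustration of Proof 2: pick a supporting line a*x + b of
# phi(x) = exp(x) at x0 = integral of g d(mu), for an arbitrary discrete
# probability measure mu and g the identity on its support.
phi = np.exp
dphi = np.exp                            # a subderivative of exp is exp itself

x = np.array([-1.0, 0.5, 2.0, 3.0])      # values taken by g
mu = np.array([0.1, 0.4, 0.3, 0.2])      # probability weights, summing to 1

x0 = np.dot(mu, x)                       # x0 = integral of g d(mu)
a = dphi(x0)                             # slope of the supporting line at x0
b = phi(x0) - a * x0                     # intercept, so that a*x0 + b = phi(x0)

assert np.all(phi(x) >= a * x + b - 1e-12)    # the line never exceeds the graph
assert np.dot(mu, phi(x)) >= phi(x0) - 1e-12  # integrating recovers Jensen's inequality
print(np.dot(mu, phi(x)), phi(x0))
```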
Proof 3 (general inequality in a probabilistic setting)
Let X be an integrable random variable that takes values in a real topological vector space T. Since φ : T → ℝ is convex, for any x, y ∈ T the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as θ approaches 0⁺. In particular, the subdifferential of φ evaluated at x in the direction y is well-defined by

$$(D\varphi)(x)\cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

The subdifferential is easily seen to be positively homogeneous and subadditive in y; making it a genuinely linear functional requires a Hahn–Banach extension argument. Since the infimum taken on the right-hand side of the previous formula is smaller than the value of the same term for θ = 1, one gets

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x)\cdot y.$$

In particular, for an arbitrary sub-σ-algebra G we can evaluate the last inequality at x = E[X | G] and y = X − E[X | G] to obtain

$$\varphi(\mathrm{E}[X \mid G]) \le \varphi(X) - (D\varphi)(\mathrm{E}[X \mid G]) \cdot (X - \mathrm{E}[X \mid G]).$$

Now, if we take the expectation conditioned to G on both sides of the previous expression, we get the result since

$$\mathrm{E}\big[(D\varphi)(\mathrm{E}[X \mid G]) \cdot (X - \mathrm{E}[X \mid G]) \,\big|\, G\big] = (D\varphi)(\mathrm{E}[X \mid G]) \cdot \mathrm{E}\big[X - \mathrm{E}[X \mid G] \,\big|\, G\big] = 0,$$

by the linearity of the subdifferential in the y variable (the point where the Hahn–Banach extension is needed), and the following well-known property of the conditional expectation:

$$\mathrm{E}\big[X - \mathrm{E}[X \mid G] \,\big|\, G\big] = 0.$$
Form involving a probability density function
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$
In probabilistic language, f is a probability density function.
Then Jensen's inequality becomes the following statement about convex integrals:
If g is any real-valued measurable function and φ is convex over the range of g, then

$$\varphi\!\left(\int_{-\infty}^{\infty} g(x) f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x))\, f(x)\, dx.$$

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

$$\varphi\!\left(\int_{-\infty}^{\infty} x f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x)\, f(x)\, dx.$$
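A numerical sketch of the density form; the density f(x) = exp(−x) on [0, 20] (renormalized), g(x) = √x and φ(y) = y² are arbitrary choices.

```python
import numpy as np

# Grid-based check of the density form of Jensen's inequality.
x = np.linspace(0.0, 20.0, 200_001)
dx = x[1] - x[0]

f = np.exp(-x)
f /= f.sum() * dx                        # normalize so that f integrates to 1

g = np.sqrt(x)
phi = np.square

lhs = phi(np.sum(g * f) * dx)            # phi( integral of g f dx )
rhs = np.sum(phi(g) * f) * dx            # integral of phi(g) f dx
print(lhs, rhs)                          # roughly 0.785 versus 1.0
assert lhs <= rhs
```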
Alternative finite form
Let Ω = {x1, ..., xn}, and take μ to be the probability measure on Ω assigning weight λi to xi; then the general form reduces to a statement about sums:

$$\varphi\!\left(\sum_{i=1}^{n} g(x_i)\, \lambda_i\right) \le \sum_{i=1}^{n} \varphi(g(x_i))\, \lambda_i,$$

provided that λi ≥ 0 and

$$\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1.$$
There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$$e^{\mathrm{E}[X]} \le \mathrm{E}\!\left[e^{X}\right],$$
where the expected values are with respect to some probability distribution in the random variable X.
The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing

$$\mathrm{E}\!\left[e^{X}\right] = e^{\mathrm{E}[X]}\, \mathrm{E}\!\left[e^{X - \mathrm{E}[X]}\right]$$

and then applying the inequality e^X ≥ 1 + X to the final exponential, so that the last expectation is at least E[1 + X − E[X]] = 1.
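A Monte Carlo sketch of this bound and of the identity used in the proof; the distribution of X (standard normal) is an arbitrary choice.

```python
import numpy as np

# Monte Carlo check of exp(E[X]) <= E[exp(X)] for an arbitrary distribution.
rng = np.random.default_rng(2)
X = rng.normal(size=1_000_000)

mean_exp = np.exp(X).mean()              # E[exp(X)]
exp_mean = np.exp(X.mean())              # exp(E[X])
print(mean_exp, exp_mean)                # roughly exp(1/2) versus 1 here
assert mean_exp >= exp_mean

# The identity used in the proof: E[exp(X)] = exp(E[X]) * E[exp(X - E[X])],
# and the last factor is >= 1 because exp(t) >= 1 + t and E[X - E[X]] = 0.
assert np.exp(X - X.mean()).mean() >= 1.0
```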
Information theory
If p(x) is the true probability distribution for x, and q(x) is another distribution, then applying Jensen's inequality for the random variable Y(x) = q(x)/p(x) and the convex function φ(y) = −log(y) gives

$$\mathrm{E}[\varphi(Y)] \ge \varphi(\mathrm{E}[Y]),$$

that is,

$$\int p(x) \log\frac{p(x)}{q(x)}\, dx \;\ge\; -\log\!\left(\int p(x)\,\frac{q(x)}{p(x)}\, dx\right) = -\log\!\left(\int q(x)\, dx\right) = 0.$$

Therefore:

$$D(p\,\|\,q) = \int p(x) \log\frac{p(x)}{q(x)}\, dx \ge 0,$$
a result called Gibbs' inequality.
It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The quantity that is non-negative is called the Kullback–Leibler divergence of q from p.
Since −log(x) is a strictly convex function for x > 0, it follows that equality holds if and only if p(x) equals q(x) almost everywhere.
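A small numerical check of Gibbs' inequality; the two discrete distributions below are arbitrary.

```python
import numpy as np

# Check of Gibbs' inequality D(p || q) >= 0 for two arbitrary distributions
# on a four-letter alphabet, with equality when q = p.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

kl = np.sum(p * np.log(p / q))           # Kullback-Leibler divergence D(p || q)
print(kl)                                # strictly positive since q differs from p
assert kl >= 0.0
assert np.isclose(np.sum(p * np.log(p / p)), 0.0)   # D(p || p) = 0
```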
Rao–Blackwell theorem
If L is a convex function and G a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

$$L(\mathrm{E}[\delta(X) \mid G]) \le \mathrm{E}[L(\delta(X)) \mid G] \quad\Longrightarrow\quad \mathrm{E}\big[L(\mathrm{E}[\delta(X) \mid G])\big] \le \mathrm{E}[L(\delta(X))].$$

So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating

$$\delta_1(X) = \mathrm{E}_{\theta}\big[\delta(X') \,\big|\, T(X') = T(X)\big],$$
the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed.
This result is known as the Rao–Blackwell theorem.
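A Monte Carlo sketch of the improvement for a Bernoulli sample, where the crude estimator δ(X) = X₁ is conditioned on the sufficient statistic T(X) = ΣXᵢ, so that E[δ(X) | T(X)] is the sample mean; the parameter value, sample size, and loss (squared error) are arbitrary choices.

```python
import numpy as np

# Monte Carlo comparison of a crude estimator and its Rao-Blackwellization
# for a Bernoulli(theta) sample under squared-error loss.
rng = np.random.default_rng(3)
theta, n, reps = 0.3, 10, 200_000

X = (rng.random((reps, n)) < theta).astype(float)
delta = X[:, 0]                          # crude unbiased estimator delta(X) = X_1
delta_rb = X.mean(axis=1)                # E[delta(X) | T(X)] = T(X)/n for this model

loss_delta = np.mean((delta - theta) ** 2)
loss_rb = np.mean((delta_rb - theta) ** 2)
print(loss_delta, loss_rb)               # roughly 0.21 versus 0.021
assert loss_rb <= loss_delta
```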