In estimation theory and statistics, the Cramér–Rao bound (CRB) or Cramér–Rao lower bound (CRLB), named in honor of Harald Cramér and Calyampudi Radhakrishna Rao who were among the first to derive it, expresses a lower bound on the variance of estimators of a deterministic parameter. The bound is also known as the Cramér–Rao inequality or the information inequality.
In its simplest form, the bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information. An unbiased estimator which achieves this lower bound is said to be (fully) efficient. Such an estimator achieves the lowest possible mean squared error among all unbiased estimators, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased estimator exists which achieves the bound. This may occur even when an MVU estimator exists.
The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.
Statement
The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.
Scalar unbiased case
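Suppose $\theta$ is an unknown deterministic parameter which is to be estimated from measurements $x$, distributed according to some probability density function $f(x;\theta)$. The variance of any unbiased estimator $\hat{\theta}$ of $\theta$ is then bounded by the reciprocal of the Fisher information $I(\theta)$:
$$\operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$
where the Fisher information $I(\theta)$ is defined by
$$I(\theta) = \operatorname{E}\left[\left(\frac{\partial \ell(x;\theta)}{\partial\theta}\right)^2\right]$$
and $\ell(x;\theta) = \log f(x;\theta)$ is the natural logarithm of the likelihood function and $\operatorname{E}$ denotes the expected value over $x$.
The efficiency of an unbiased estimator $\hat{\theta}$ measures how close this estimator's variance comes to this lower bound. It is defined as
$$e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat{\theta})}$$
or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives
$$e(\hat{\theta}) \leq 1.$$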
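As an illustration of the scalar unbiased bound and of efficiency, the following sketch evaluates the Fisher information of a Bernoulli observation exactly, as the expected squared score, and computes the efficiency of the sample mean. The Bernoulli model, the function names, and the values of `p` and `n` are assumptions chosen for the example, not part of the statement above.

```python
from fractions import Fraction

def bernoulli_fisher_information(p):
    """Expected squared score for one Bernoulli(p) observation."""
    # log f(x; p) = x*log(p) + (1 - x)*log(1 - p), so the score is
    # d/dp log f = x/p - (1 - x)/(1 - p).
    score = {1: 1 / p, 0: -1 / (1 - p)}
    prob = {1: p, 0: 1 - p}
    return sum(prob[x] * score[x] ** 2 for x in (0, 1))

def sample_mean_efficiency(p, n):
    """Efficiency of the sample mean of n i.i.d. Bernoulli(p) draws."""
    info_n = n * bernoulli_fisher_information(p)  # information adds over i.i.d. draws
    crlb = 1 / info_n                             # lower bound on the variance
    var_mean = p * (1 - p) / n                    # exact variance of the sample mean
    return crlb / var_mean

p = Fraction(3, 10)
print(bernoulli_fisher_information(p))  # 100/21, i.e. 1 / (p * (1 - p))
print(sample_mean_efficiency(p, 20))    # 1, so the sample mean is efficient here
```

Exact rational arithmetic is used so that the equality between the variance and the bound is not blurred by rounding.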
General scalar case
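A more general form of the bound can be obtained by considering an unbiased estimator $T(X)$ of a function $\psi(\theta)$ of the parameter $\theta$. Here, unbiasedness is understood as stating that $\operatorname{E}\{T(X)\} = \psi(\theta)$. In this case, the bound is given by
$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}$$
where $\psi'(\theta)$ is the derivative of $\psi(\theta)$ with respect to $\theta$, and $I(\theta)$ is the Fisher information defined above.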
Bound on the variance of biased estimators
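Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator $\hat{\theta}$ with bias $b(\theta) = \operatorname{E}\{\hat{\theta}\} - \theta$, and let $\psi(\theta) = b(\theta) + \theta$. By the general scalar bound, any unbiased estimator whose expectation is $\psi(\theta)$ has variance greater than or equal to $[\psi'(\theta)]^2 / I(\theta)$. Thus, any estimator $\hat{\theta}$ whose bias is given by a function $b(\theta)$ satisfies
$$\operatorname{var}(\hat{\theta}) \geq \frac{[1 + b'(\theta)]^2}{I(\theta)}.$$
The unbiased version of the bound is a special case of this result, with $b(\theta) = 0$.
A small variance is trivial to obtain: an "estimator" that is constant has a variance of zero. But from the above equation we find that the mean squared error of a biased estimator is bounded by
$$\operatorname{E}\left((\hat{\theta} - \theta)^2\right) \geq \frac{[1 + b'(\theta)]^2}{I(\theta)} + b(\theta)^2,$$
using the standard decomposition of the MSE. Note, however, that if $|1 + b'(\theta)| < 1$, this bound might be less than the unbiased Cramér–Rao bound $1/I(\theta)$. For instance, in the example of estimating variance below, $1 + b'(\theta) = \frac{n}{n+2}$.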
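The biased-estimator bound can be evaluated numerically. The sketch below assumes a hypothetical setup that is not taken from the text: $n$ i.i.d. normal observations with unknown mean $\theta$ and known variance $\sigma^2$, for which $I(\theta) = n/\sigma^2$, and a shrunken estimator equal to $c$ times the sample mean, whose bias is $b(\theta) = (c-1)\theta$. All names and numeric values are illustrative.

```python
from fractions import Fraction

def biased_variance_bound(b_prime, fisher_info):
    """Lower bound [1 + b'(theta)]^2 / I(theta) on the variance."""
    return (1 + b_prime) ** 2 / fisher_info

def biased_mse_bound(b, b_prime, fisher_info):
    """Lower bound on the MSE: variance bound plus squared bias."""
    return biased_variance_bound(b_prime, fisher_info) + b ** 2

# Hypothetical setup: n i.i.d. N(theta, sigma2) draws, estimator c * sample mean.
n, sigma2, theta, c = 10, Fraction(4), Fraction(1, 2), Fraction(4, 5)
fisher_info = n / sigma2   # I(theta) = n / sigma^2 for a normal mean
b = (c - 1) * theta        # bias b(theta) = (c - 1) * theta
b_prime = c - 1            # so 1 + b'(theta) = c

unbiased_bound = 1 / fisher_info
print(unbiased_bound)                               # 2/5
print(biased_variance_bound(b_prime, fisher_info))  # 32/125
print(biased_mse_bound(b, b_prime, fisher_info))    # 133/500
```

In this setup the estimator's actual variance is $c^2\sigma^2/n = 32/125$, so it meets the biased bound with equality, and at this particular $\theta$ its mean squared error of $133/500$ lies below the unbiased bound of $2/5$.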
Multivariate case
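Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector
$$\boldsymbol{\theta} = [\theta_1, \theta_2, \dots, \theta_d]^T \in \mathbb{R}^d$$
with probability density function $f(x;\boldsymbol{\theta})$ which satisfies the two regularity conditions below.
The Fisher information matrix is a $d \times d$ matrix with element $I_{m,k}$ defined as
$$I_{m,k} = \operatorname{E}\left[\frac{\partial}{\partial\theta_m} \log f(x;\boldsymbol{\theta}) \, \frac{\partial}{\partial\theta_k} \log f(x;\boldsymbol{\theta})\right].$$
Let $\boldsymbol{T}(X)$ be an estimator of any vector function of the parameters, $\boldsymbol{T}(X) = (T_1(X), \ldots, T_d(X))^T$, and denote its expectation vector $\operatorname{E}[\boldsymbol{T}(X)]$ by $\boldsymbol{\psi}(\boldsymbol{\theta})$. The Cramér–Rao bound then states that the covariance matrix of $\boldsymbol{T}(X)$ satisfies
$$\operatorname{cov}_{\boldsymbol{\theta}}(\boldsymbol{T}(X)) \geq \frac{\partial\boldsymbol{\psi}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \, [I(\boldsymbol{\theta})]^{-1} \left(\frac{\partial\boldsymbol{\psi}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^T$$
where
- the matrix inequality $A \geq B$ is understood to mean that the matrix $A - B$ is positive semidefinite, and
- $\partial\boldsymbol{\psi}(\boldsymbol{\theta})/\partial\boldsymbol{\theta}$ is the Jacobian matrix whose $ij$ element is given by $\partial\psi_i(\boldsymbol{\theta})/\partial\theta_j$.
If $\boldsymbol{T}(X)$ is an unbiased estimator of $\boldsymbol{\theta}$ (i.e., $\boldsymbol{\psi}(\boldsymbol{\theta}) = \boldsymbol{\theta}$), then the Cramér–Rao bound reduces to
$$\operatorname{cov}_{\boldsymbol{\theta}}(\boldsymbol{T}(X)) \geq I(\boldsymbol{\theta})^{-1}.$$
If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound:
$$\operatorname{var}_{\boldsymbol{\theta}}(T_m(X)) = \left[\operatorname{cov}_{\boldsymbol{\theta}}(\boldsymbol{T}(X))\right]_{mm} \geq \left[I(\boldsymbol{\theta})^{-1}\right]_{mm} \geq \left(\left[I(\boldsymbol{\theta})\right]_{mm}\right)^{-1}.$$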
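To see how loose the reciprocal-of-the-diagonal shortcut can be, the sketch below compares the diagonal of the inverse Fisher information matrix with the reciprocals of the diagonal entries. The 2×2 matrix is a hypothetical example (symmetric and positive definite), not one derived from a model in the text, and `inverse_2x2` is an illustrative helper.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical Fisher information matrix (symmetric, positive definite).
fisher = [[Fraction(4), Fraction(2)],
          [Fraction(2), Fraction(3)]]

crb = inverse_2x2(fisher)
for m in range(2):
    tight = crb[m][m]         # diagonal entry of the inverse matrix
    loose = 1 / fisher[m][m]  # reciprocal of the diagonal entry
    print(m, tight, loose, tight >= loose)
# 0 3/8 1/4 True
# 1 1/2 1/3 True
```

The gap between the two columns comes from the off-diagonal coupling between the parameters. With a diagonal Fisher information matrix the two bounds coincide.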
Regularity conditions
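The bound relies on two weak regularity conditions on the probability density function, $f(x;\theta)$, and the estimator $T(X)$:
- The Fisher information is always defined; equivalently, for all $x$ such that $f(x;\theta) > 0$, the derivative $\frac{\partial}{\partial\theta} \log f(x;\theta)$ exists and is finite.
- The operations of integration with respect to $x$ and differentiation with respect to $\theta$ can be interchanged in the expectation of $T$; that is,
$$\frac{\partial}{\partial\theta}\left[\int T(x) f(x;\theta)\,dx\right] = \int T(x)\left[\frac{\partial}{\partial\theta} f(x;\theta)\right]dx$$
whenever the right-hand side is finite.
The second condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases holds:
- The function $f(x;\theta)$ has bounded support in $x$, and the bounds do not depend on $\theta$;
- The function $f(x;\theta)$ has infinite support, is continuously differentiable, and the integral converges uniformly for all $\theta$.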
Simplified form of the Fisher information
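Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of $f(x;\theta)$ as well, i.e.,
$$\frac{\partial^2}{\partial\theta^2}\left[\int T(x) f(x;\theta)\,dx\right] = \int T(x)\left[\frac{\partial^2}{\partial\theta^2} f(x;\theta)\right]dx.$$
In this case, it can be shown that the Fisher information equals
$$I(\theta) = -\operatorname{E}\left[\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\right].$$
The Cramér–Rao bound can then be written as
$$\operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)} = \frac{1}{-\operatorname{E}\left[\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\right]}.$$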
In some cases, this formula gives a more convenient technique for evaluating the bound.
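The agreement between the two expressions for the Fisher information can be checked exactly on a discrete model. The sketch below assumes a binomial observation, a choice made for the example because its expectations are finite sums. It evaluates both the expected squared score and minus the expected second derivative of the log-likelihood. Function names and parameter values are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def fisher_from_squared_score(n, p):
    """E[(d/dp log f)^2], with score x/p - (n - x)/(1 - p)."""
    return sum(binomial_pmf(x, n, p) * (x / p - (n - x) / (1 - p)) ** 2
               for x in range(n + 1))

def fisher_from_second_derivative(n, p):
    """-E[d^2/dp^2 log f], with second derivative -x/p^2 - (n - x)/(1 - p)^2."""
    return -sum(binomial_pmf(x, n, p) * (-x / p ** 2 - (n - x) / (1 - p) ** 2)
                for x in range(n + 1))

n, p = 5, Fraction(1, 4)
print(fisher_from_squared_score(n, p))      # 80/3
print(fisher_from_second_derivative(n, p))  # 80/3, i.e. n / (p * (1 - p))
```

The second form only needs the mean of $X$, whereas the first needs its second moment, which is one reason the simplified form is often easier to evaluate by hand.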
Single-parameter proof
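The following is a proof of the general scalar case of the Cramér–Rao bound described above. Assume that $T = t(X)$ is an unbiased estimator for the value $\psi(\theta)$ (based on the observations $X$), so that $\operatorname{E}(T) = \psi(\theta)$. The goal is to prove that, for all $\theta$,
$$\operatorname{var}(t(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}.$$
Let $X$ be a random variable with probability density function $f(x;\theta)$. Here $T = t(X)$ is a statistic, which is used as an estimator for $\psi(\theta)$. Define $V$ as the score:
$$V = \frac{\partial}{\partial\theta} \ln f(X;\theta) = \frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)$$
where the chain rule is used in the final equality above. Then the expectation of $V$, written $\operatorname{E}(V)$, is zero. This is because
$$\operatorname{E}(V) = \int f(x;\theta)\left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta)\right]dx = \frac{\partial}{\partial\theta} \int f(x;\theta)\,dx = 0$$
where the integral and partial derivative have been interchanged (justified by the second regularity condition).
If we consider the covariance $\operatorname{cov}(V, T)$ of $V$ and $T$, we have $\operatorname{cov}(V, T) = \operatorname{E}(VT)$, because $\operatorname{E}(V) = 0$. Expanding this expression, we have
$$\operatorname{cov}(V, T) = \operatorname{E}\left(T \cdot \left[\frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)\right]\right) = \int t(x)\left[\frac{\partial}{\partial\theta} f(x;\theta)\right]dx = \frac{\partial}{\partial\theta}\left[\int t(x) f(x;\theta)\,dx\right] = \psi'(\theta)$$
again because the integration and differentiation operations commute (second condition).
The Cauchy–Schwarz inequality shows that
$$\sqrt{\operatorname{var}(T)\operatorname{var}(V)} \geq |\operatorname{cov}(V, T)| = |\psi'(\theta)|,$$
therefore
$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^2}{I(\theta)},$$
where $\operatorname{var}(V) = \operatorname{E}(V^2) = I(\theta)$ because $\operatorname{E}(V) = 0$,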
which proves the proposition.
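The individual steps of the proof can be verified exactly on a small discrete model. The sketch below assumes $X \sim \mathrm{Binomial}(n, p)$ with parameter $p$ and the statistic $t(X) = (X/n)^2$, whose expectation is $\psi(p) = p(1-p)/n + p^2$. Both the model and the statistic are choices made for the example. It checks that the score has mean zero, that the covariance of the score and the statistic equals $\psi'(p)$, and that the resulting bound holds.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 3)
xs = range(n + 1)
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in xs}
score = {x: x / p - (n - x) / (1 - p) for x in xs}  # V = d/dp log f(x; p)
stat = {x: Fraction(x, n) ** 2 for x in xs}         # T = t(X) = (X/n)^2

def expect(g):
    """Exact expectation of g(X) under the binomial pmf."""
    return sum(pmf[x] * g[x] for x in xs)

e_v = expect(score)
e_t = expect(stat)
cov_vt = expect({x: score[x] * stat[x] for x in xs}) - e_v * e_t
var_v = expect({x: score[x] ** 2 for x in xs}) - e_v ** 2  # equals I(p)
var_t = expect({x: stat[x] ** 2 for x in xs}) - e_t ** 2

psi_prime = (1 - 2 * p) / n + 2 * p  # derivative of psi(p) = p(1-p)/n + p^2

print(e_v)                                  # 0
print(cov_vt, psi_prime)                    # 3/4 3/4
print(var_v)                                # 18, i.e. n / (p * (1 - p))
print(var_t >= psi_prime ** 2 / var_v)      # True
```

Because the sample space is finite, the interchange of summation and differentiation required by the second regularity condition holds automatically in this example.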
Multivariate normal distribution
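For the case of a $d$-variate normal distribution
$$\boldsymbol{x} \sim N_d\left(\boldsymbol{\mu}(\boldsymbol{\theta}), C(\boldsymbol{\theta})\right)$$
the Fisher information matrix has elements
$$I_{m,k} = \frac{\partial\boldsymbol{\mu}^T}{\partial\theta_m} C^{-1} \frac{\partial\boldsymbol{\mu}}{\partial\theta_k} + \frac{1}{2}\operatorname{tr}\left(C^{-1} \frac{\partial C}{\partial\theta_m} C^{-1} \frac{\partial C}{\partial\theta_k}\right)$$
where "tr" is the trace.
For example, let $w[n]$ be a sample of $N$ independent observations with unknown mean $\theta$ and known variance $\sigma^2$:
$$w[n] \sim N_N\left(\theta\boldsymbol{1}, \sigma^2 I\right),$$
where $\boldsymbol{1}$ is the all-ones vector and $I$ here denotes the $N \times N$ identity matrix.
Then the Fisher information is a scalar given by
$$I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T C^{-1} \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum_{i=1}^{N} \frac{1}{\sigma^2} = \frac{N}{\sigma^2},$$
and so the Cramér–Rao bound is
$$\operatorname{var}(\hat{\theta}) \geq \frac{\sigma^2}{N}.$$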
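For a normal sample with unknown common mean and known variance, the Fisher information can be computed directly from the mean term of the general formula, since the covariance does not depend on the parameter and the trace term vanishes. The sketch below is a minimal illustration: the function name and the values of `N` and `sigma2` are assumptions chosen for the example.

```python
from fractions import Fraction

def fisher_info_known_covariance(dmu, c_inv):
    """Mean term dmu^T C^{-1} dmu; valid when C does not depend on theta."""
    size = len(dmu)
    return sum(dmu[i] * c_inv[i][j] * dmu[j]
               for i in range(size) for j in range(size))

N, sigma2 = 8, Fraction(2)
dmu = [1] * N  # d mu / d theta for mu = theta * (1, ..., 1)
c_inv = [[1 / sigma2 if i == j else 0 for j in range(N)] for i in range(N)]

info = fisher_info_known_covariance(dmu, c_inv)
print(info)      # 4, i.e. N / sigma^2
print(1 / info)  # 1/4, i.e. the bound sigma^2 / N
```

The bound $\sigma^2/N$ equals the variance of the sample mean, so the sample mean is an efficient estimator in this model.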
Normal variance with known mean
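Suppose $X$ is a normally distributed random variable with known mean $\mu$ and unknown variance $\sigma^2$. Consider the following statistic, based on $n$ independent observations $X_1, \ldots, X_n$:
$$T = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}.$$
Then $T$ is unbiased for $\sigma^2$, as $\operatorname{E}(T) = \sigma^2$. What is the variance of $T$?
$$\operatorname{var}(T) = \frac{\operatorname{var}\left((X - \mu)^2\right)}{n} = \frac{1}{n}\left[\operatorname{E}\left\{(X - \mu)^4\right\} - \left(\operatorname{E}\left\{(X - \mu)^2\right\}\right)^2\right]$$
(the second equality follows directly from the definition of variance). The first term is the fourth moment about the mean and has value $3(\sigma^2)^2$; the second is the square of the variance, or $(\sigma^2)^2$. Thus
$$\operatorname{var}(T) = \frac{2(\sigma^2)^2}{n}.$$
Now, what is the Fisher information in the sample? Recall that the score $V$ is defined as
$$V = \frac{\partial}{\partial\sigma^2} \log L(\sigma^2, X)$$
where $L$ is the likelihood function. Thus in this case,
$$V = \frac{\partial}{\partial\sigma^2} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X - \mu)^2 / 2\sigma^2}\right] = \frac{(X - \mu)^2}{2(\sigma^2)^2} - \frac{1}{2\sigma^2}$$
where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of $V$, or
$$I = -\operatorname{E}\left(\frac{\partial V}{\partial\sigma^2}\right) = -\operatorname{E}\left(-\frac{(X - \mu)^2}{(\sigma^2)^3} + \frac{1}{2(\sigma^2)^2}\right) = \frac{\sigma^2}{(\sigma^2)^3} - \frac{1}{2(\sigma^2)^2} = \frac{1}{2(\sigma^2)^2}.$$
Thus the information in a sample of $n$ independent observations is just $n$ times this, or $I_n = \frac{n}{2(\sigma^2)^2}$.
The Cramér–Rao bound states that
$$\operatorname{var}(T) \geq \frac{1}{I_n} = \frac{2(\sigma^2)^2}{n}.$$
In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.
However, we can achieve a lower mean squared error using a biased estimator. The estimator
$$\tilde{T} = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n + 2}$$
obviously has a smaller variance, which is in fact
$$\operatorname{var}(\tilde{T}) = \frac{2n(\sigma^2)^2}{(n + 2)^2}.$$
Its bias is
$$\operatorname{E}(\tilde{T}) - \sigma^2 = \left(\frac{n}{n + 2} - 1\right)\sigma^2 = -\frac{2\sigma^2}{n + 2},$$
so its mean squared error is
$$\operatorname{MSE}(\tilde{T}) = \left(\frac{2n}{(n + 2)^2} + \frac{4}{(n + 2)^2}\right)(\sigma^2)^2 = \frac{2(\sigma^2)^2}{n + 2},$$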
which is clearly less than the Cramér–Rao bound found above.
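The comparison between dividing the sum of squared deviations by $n$ and by $n + 2$ can be reproduced with exact arithmetic. The sketch below relies on the fact that, for i.i.d. normal observations with known mean, the sum of squared deviations has mean $n\sigma^2$ and variance $2n(\sigma^2)^2$. The function name and the choices $n = 10$, $\sigma^2 = 1$ are illustrative assumptions.

```python
from fractions import Fraction

def variance_estimator_summary(n, divisor, sigma2=Fraction(1)):
    """Exact variance, bias, and MSE of sum((X_i - mu)^2) / divisor."""
    # sum((X_i - mu)^2) has mean n*sigma2 and variance 2*n*sigma2^2
    # for i.i.d. normal X_i with known mean mu.
    var = 2 * n * sigma2 ** 2 / Fraction(divisor) ** 2
    bias = n * sigma2 / Fraction(divisor) - sigma2
    return var, bias, var + bias ** 2

n = 10
crb = 2 * Fraction(1) ** 2 / n  # 2 (sigma^2)^2 / n with sigma^2 = 1
var_u, bias_u, mse_u = variance_estimator_summary(n, n)      # unbiased estimator
var_b, bias_b, mse_b = variance_estimator_summary(n, n + 2)  # biased estimator

print(crb)                   # 1/5
print(var_u, bias_u, mse_u)  # 1/5 0 1/5
print(var_b, bias_b, mse_b)  # 5/36 -1/6 1/6
```

With these values the biased estimator's mean squared error of $1/6$ is below the unbiased Cramér–Rao bound of $1/5$, while the unbiased estimator meets that bound exactly.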
When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by n + 1, rather than n − 1 or n + 2.