The James–Stein estimator is a biased estimator of the mean of Gaussian random vectors. When three or more components are estimated simultaneously, it dominates the "ordinary" least squares approach, i.e., it achieves lower total mean squared error for every value of the true mean. It is the best-known example of Stein's phenomenon.
An earlier version of the estimator was developed by Charles Stein in 1956, and is sometimes referred to as Stein's estimator. The result was improved by Willard James and Charles Stein in 1961.
Setting
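Suppose $\theta$ is an unknown parameter vector of length $m$, and let $y$ be a vector of observations of $\theta$ (also of length $m$), such that the observations are normally distributed:
$$y \sim N(\theta, \sigma^2 I).$$
We are interested in obtaining an estimate $\hat{\theta} = \hat{\theta}(y)$ of $\theta$, based on a single observation vector $y$.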
This is an everyday situation in which a set of parameters is measured, and the measurements are corrupted by independent Gaussian noise. Since the noise has zero mean, it seems very reasonable to use the measurements themselves as an estimate of the parameters. This is the approach of the least squares estimator, which is $\hat{\theta}_{LS} = y$.
As a result, there was considerable shock and disbelief when Stein demonstrated that, in terms of mean squared error $E[\|\theta - \hat{\theta}\|^2]$, this approach is suboptimal. The result became known as Stein's phenomenon.
The James–Stein estimator
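If the noise variance $\sigma^2$ is known, the James–Stein estimator is given by
$$\hat{\theta}_{JS} = \left(1 - \frac{(m-2)\sigma^2}{\|y\|^2}\right) y,$$
where $y$ is the observation vector and $m$ is its length.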
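James and Stein showed that the above estimator dominates the least squares estimator $\hat{\theta}_{LS} = y$ for any dimension $m \ge 3$, meaning that the James–Stein estimator always achieves lower MSE than the least squares estimator. By definition, this makes the least squares estimator inadmissible when $m \ge 3$.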
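The dominance claim can be checked numerically. The following is a minimal Monte Carlo sketch, not a proof: it assumes the standard form of the estimator, $(1 - (m-2)\sigma^2/\|y\|^2)\,y$, and uses a hypothetical true mean vector `theta` of length 5 chosen purely for illustration. The function names are illustrative as well.

```python
import random

def james_stein(y, sigma2=1.0):
    """Shrink y toward the origin with the basic James-Stein multiplier."""
    m = len(y)
    norm2 = sum(v * v for v in y)
    factor = 1.0 - (m - 2) * sigma2 / norm2
    return [factor * v for v in y]

def simulate_mse(theta, sigma2=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the total MSE of the LS and JS estimators."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    se_ls = se_js = 0.0
    for _ in range(trials):
        # One noisy observation vector y ~ N(theta, sigma2 * I).
        y = [t + rng.gauss(0.0, sigma) for t in theta]
        est = james_stein(y, sigma2)
        se_ls += sum((a - t) ** 2 for a, t in zip(y, theta))
        se_js += sum((a - t) ** 2 for a, t in zip(est, theta))
    return se_ls / trials, se_js / trials

if __name__ == "__main__":
    theta = [1.0, -0.5, 2.0, 0.0, 0.5]  # hypothetical true means, m = 5
    mse_ls, mse_js = simulate_mse(theta)
    print(f"LS total MSE ~ {mse_ls:.3f}, JS total MSE ~ {mse_js:.3f}")
```

The least squares figure should land near $m\sigma^2 = 5$, with the James–Stein figure below it. Changing `theta` changes the size of the gap but, for $m \ge 3$, should not reverse its sign beyond simulation noise.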
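Notice that if $(m-2)\sigma^2 < \|y\|^2$, then the James–Stein estimator simply takes the natural estimator $y$ and shrinks it toward the origin $0$. In fact, this is not the only direction of shrinkage that works. Let $\nu$ be an arbitrary fixed vector of length $m$. Then there exists an estimator of the James–Stein type that shrinks toward $\nu$, namely
$$\hat{\theta}_{JS} = \left(1 - \frac{(m-2)\sigma^2}{\|y - \nu\|^2}\right)(y - \nu) + \nu.$$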
The James–Stein estimator dominates the usual estimator for any $\nu$. A natural question is whether the improvement over the usual estimator is independent of the choice of $\nu$. The answer is no: the improvement is small if $\|\theta - \nu\|$ is large. A substantial improvement therefore requires some prior knowledge of the location of $\theta$.
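The dependence on the shrinkage target can be illustrated with a small simulation. This is a sketch under stated assumptions: the estimator is taken to be $(1 - (m-2)\sigma^2/\|y-\nu\|^2)(y-\nu) + \nu$, and the true mean `theta`, the candidate targets, and the function names are hypothetical choices made only for the example.

```python
import random

def james_stein_toward(y, nu, sigma2=1.0):
    """James-Stein-type estimate that shrinks y toward the fixed vector nu."""
    m = len(y)
    diff = [a - b for a, b in zip(y, nu)]
    norm2 = sum(d * d for d in diff)
    factor = 1.0 - (m - 2) * sigma2 / norm2
    return [factor * d + b for d, b in zip(diff, nu)]

def js_total_mse(theta, nu, sigma2=1.0, trials=20000, seed=1):
    """Monte Carlo estimate of total MSE when shrinking toward nu."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(trials):
        y = [t + rng.gauss(0.0, sigma) for t in theta]
        est = james_stein_toward(y, nu, sigma2)
        total += sum((e - t) ** 2 for e, t in zip(est, theta))
    return total / trials

if __name__ == "__main__":
    m = 6
    theta = [3.0] * m  # hypothetical true mean
    for nu_value in (3.0, 2.0, 0.0, -5.0):
        nu = [nu_value] * m
        dist = sum((t - n) ** 2 for t, n in zip(theta, nu)) ** 0.5
        mse = js_total_mse(theta, nu)
        print(f"||theta - nu|| = {dist:6.2f}  JS total MSE ~ {mse:.3f}  (LS: {m})")
```

With unit noise variance the least squares risk is $m = 6$. The simulated James–Stein risk should be far below that when the target coincides with the true mean and should approach it as the target moves away.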
Interpretation
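Seeing the James–Stein estimator as an empirical Bayes method gives some intuition for this result: one assumes that $\theta$ itself is a random variable with prior distribution $N(0, A I)$, where the variance $A$ is estimated from the data itself. Estimating $A$ only gives an advantage over the maximum-likelihood estimator when the dimension $m$ is large enough; hence it does not work for $m \le 2$.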
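The empirical Bayes reading can be made concrete with a short derivation. It is a sketch that assumes a prior of the form $\theta \sim N(0, A I)$ with a scalar variance $A$, and a likelihood $y \mid \theta \sim N(\theta, \sigma^2 I)$ with known $\sigma^2$ and $m \ge 3$.

```latex
% Posterior mean under the assumed prior theta ~ N(0, A I):
\mathbb{E}[\theta \mid y]
  = \frac{A}{A + \sigma^2}\, y
  = \left(1 - \frac{\sigma^2}{A + \sigma^2}\right) y .

% Marginally y ~ N(0, (A + \sigma^2) I), so
\frac{\|y\|^2}{A + \sigma^2} \sim \chi^2_m ,
\qquad
\mathbb{E}\!\left[\frac{1}{\chi^2_m}\right] = \frac{1}{m-2}
\quad (m \ge 3).

% Hence (m-2)\sigma^2 / \|y\|^2 is an unbiased estimate of the
% unknown shrinkage weight:
\mathbb{E}\!\left[\frac{(m-2)\,\sigma^2}{\|y\|^2}\right]
  = \frac{\sigma^2}{A + \sigma^2} .

% Substituting the estimate into the posterior mean gives
\hat{\theta}_{JS}
  = \left(1 - \frac{(m-2)\,\sigma^2}{\|y\|^2}\right) y .
```

The expectation of $1/\chi^2_m$ is finite only for $m \ge 3$, which is one way to see why the construction does not work in one or two dimensions.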
A consequence of the above discussion is the following counterintuitive result: when three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator, whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible. A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the total MSE, i.e., the sum of the expected squared errors of the components. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, no single component of it dominates the respective component of the LS estimator.
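The gap between total and per-component error can be seen in a simulation. The sketch below assumes the basic estimator $(1 - (m-2)\sigma^2/\|y\|^2)\,y$ and a hypothetical mean vector with one large component and nine zeros. Both the configuration and the function name are illustrative only.

```python
import random

def componentwise_mse(theta, sigma2=1.0, trials=20000, seed=2):
    """Monte Carlo per-component MSE of the basic James-Stein estimator."""
    rng = random.Random(seed)
    m = len(theta)
    sigma = sigma2 ** 0.5
    sq_err = [0.0] * m
    for _ in range(trials):
        y = [t + rng.gauss(0.0, sigma) for t in theta]
        # The same multiplier is applied to every component.
        factor = 1.0 - (m - 2) * sigma2 / sum(v * v for v in y)
        for i in range(m):
            sq_err[i] += (factor * y[i] - theta[i]) ** 2
    return [s / trials for s in sq_err]

if __name__ == "__main__":
    theta = [5.0] + [0.0] * 9  # hypothetical: one outlying parameter
    mse = componentwise_mse(theta)
    print("per-component JS MSE:", [round(v, 2) for v in mse])
    print("total JS MSE:", round(sum(mse), 2), "vs LS total:", len(theta))
```

With unit noise variance, each component of the LS estimator has MSE 1. In this configuration the outlying first component should come out worse than 1 under James–Stein, while the remaining components improve enough that the total still falls below 10.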
The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error. Conversely, there could be objections to combining channel estimates of different users, since no user would want their channel estimate to deteriorate in order to improve the average network performance.
Improvements
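The basic James–Stein estimator has the peculiar property that for small values of $\|y\|$, the multiplier on $y$ is actually negative. This can be easily remedied by replacing this multiplier by zero when it is negative. The resulting estimator is called the positive-part James–Stein estimator and is given by
$$\hat{\theta}_{JS+} = \left(1 - \frac{(m-2)\sigma^2}{\|y\|^2}\right)^{+} y,$$
where $(x)^{+} = \max(x, 0)$.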
This estimator has a smaller risk than the basic James–Stein estimator. It follows that the basic James–Stein estimator is itself inadmissible.
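A minimal sketch of the positive-part rule follows, assuming the multiplier $1 - (m-2)\sigma^2/\|y\|^2$ is simply clipped at zero. The function name and the sample vectors are hypothetical.

```python
def james_stein_positive_part(y, sigma2=1.0):
    """Positive-part James-Stein estimate: clip the multiplier at zero."""
    m = len(y)
    norm2 = sum(v * v for v in y)
    if norm2 == 0.0:
        # The unclipped multiplier would diverge to minus infinity here,
        # so the clipped estimate is the origin.
        return [0.0] * m
    factor = max(0.0, 1.0 - (m - 2) * sigma2 / norm2)
    return [factor * v for v in y]

if __name__ == "__main__":
    small = [0.3, -0.2, 0.1, 0.4]  # ||y||^2 = 0.30 < (m - 2) * sigma2 = 2
    large = [3.0, 4.0, 0.0]        # ||y||^2 = 25, multiplier = 0.96
    print(james_stein_positive_part(small))
    print(james_stein_positive_part(large))
```

For the first vector the unclipped multiplier is negative, so the estimate collapses to the origin rather than flipping the sign of the observation. For the second the clipping is inactive and the result matches the basic estimator.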
It turns out, however, that the positive-part estimator is also inadmissible. This follows from a general result which requires admissible estimators to be smooth.
Extensions
The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect, namely, the fact that the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters. This effect has been called Stein's phenomenon, and has been demonstrated for several different problem settings, some of which are briefly outlined below.