In statistics, M-estimators are a broad class of estimators, which are obtained as the minima of sums of functions of the data. Least-squares estimators are a special case of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.
Contents
- Historical motivation
- Definition
- Types of M estimators
- type
- Computation
- Distribution
- Influence function
- Applications
- Mean
- Median
- References
More generally, an M-estimator may be defined to be a zero of an estimating function. This estimating function is often the derivative of another statistical function. For example, a maximum-likelihood estimate is often defined to be a zero of the derivative of the likelihood function with respect to the parameter; thus, a maximum-likelihood estimator is often a critical point of the score function. In many applications, such M-estimators can be thought of as estimating characteristics of the population.
Historical motivation
The method of least squares is a prototypical M-estimator, since the estimator is defined as a minimum of the sum of squares of the residuals.
Another popular M-estimator is maximum-likelihood estimation. For a family of probability density functions f parameterized by θ, a maximum likelihood estimator of θ is computed for each set of data by maximizing the likelihood function over the parameter space { θ } . When the observations are independent and identically distributed, a ML-estimate
or, equivalently,
Maximum-likelihood estimators are often inefficient and biased for finite samples. For many regular problems, maximum-likelihood estimation performs well for "large samples", being an approximation of a posterior mode. If the problem is "regular", then any bias of the MLE (or posterior mode) decreases to zero when the sample-size increases to infinity. The performance of maximum-likelihood (and posterior-mode) estimators drops when the parametric family is mis-specified.
Definition
In 1964, Peter J. Huber proposed generalizing maximum likelihood estimation to the minimization of
where ρ is a function with certain properties (see below). The solutions
are called M-estimators ("M" for "maximum likelihood-type" (Huber, 1981, page 43)); other types of robust estimator include L-estimators, R-estimators and S-estimators. Maximum likelihood estimators (MLE) are thus a special case of M-estimators. With suitable rescaling, M-estimators are special cases of extremum estimators (in which more general functions of the observations can be used).
The function ρ, or its derivative, ψ, can be chosen in such a way to provide the estimator desirable properties (in terms of bias and efficiency) when the data are truly from the assumed distribution, and 'not bad' behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.
Types of M-estimators
M-estimators are solutions, θ, which minimize
This minimization can always be done directly. Often it is simpler to differentiate with respect to θ and solve for the root of the derivative. When this differentiation is possible, the M-estimator is said to be of ψ-type. Otherwise, the M-estimator is said to be of ρ-type.
In most practical cases, the M-estimators are of ψ-type.
ρ-type
For positive integer r, let
For example, for the maximum likelihood estimator,
ψ-type
If
For example, for the maximum likelihood estimator,
Such an estimator is not necessarily an M-estimator of ρ-type, but if ρ has a continuous first derivative with respect to
If the function ψ decreases to zero as
Computation
For many choices of ρ or ψ, no closed form solution exists and an iterative approach to computation is required. It is possible to use standard function optimization algorithms, such as Newton-Raphson. However, in most cases an iteratively re-weighted least squares fitting algorithm can be performed; this is typically the preferred method.
For some choices of ψ, specifically, redescending functions, the solution may not be unique. The issue is particularly relevant in multivariate and regression problems. Thus, some care is needed to ensure that good starting points are chosen. Robust starting points, such as the median as an estimate of location and the median absolute deviation as a univariate estimate of scale, are common.
Distribution
It can be shown that M-estimators are asymptotically normally distributed. As such, Wald-type approaches to constructing confidence intervals and hypothesis tests can be used. However, since the theory is asymptotic, it will frequently be sensible to check the distribution, perhaps by examining the permutation or bootstrap distribution.
Influence function
The influence function of an M-estimator of
Let T be an M-estimator of ψ-type, and G be a probability distribution for which
assuming the density function
Applications
M-estimators can be constructed for location parameters and scale parameters in univariate and multivariate settings, as well as being used in robust regression.
Mean
Let (X1, ..., Xn) be a set of independent, identically distributed random variables, with distribution F.
If we define
we note that this is minimized when θ is the mean of the Xs. Thus the mean is an M-estimator of ρ-type, with this ρ function.
As this ρ function is continuously differentiable in θ, the mean is thus also an M-estimator of ψ-type for ψ(x, θ) = θ − x.
Median
For the median estimation of (X1, ..., Xn), instead we can define the ρ function as
and similarly, the ρ function is minimized when θ is the median of the Xs.
While this ρ function is not differentiable in θ, the ψ-type M-estimator, which is the subgradient of ρ function, can be expressed as
and