In statistics, the reduced chi-squared statistic is used extensively in goodness-of-fit testing. It is also known as the mean square weighted deviation (MSWD) in isotopic dating and as the variance of unit weight in the context of weighted least squares.
Definition
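It is defined as chi-squared per degree of freedom:
$$\chi^2_\nu = \frac{\chi^2}{\nu},$$
where the chi-squared is a weighted sum of squared deviations:
$$\chi^2 = \sum_i \frac{(O_i - C_i)^2}{\sigma_i^2},$$
with input variance $\sigma_i^2$, observations $O_i$, and calculated data $C_i$. The number of degrees of freedom, $\nu = n - m$, equals the number of observations $n$ minus the number of fitted parameters $m$.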
In weighted least squares, the definition is often written in matrix notation as:
$$\chi^2_\nu = \frac{r^\mathrm{T} W r}{\nu},$$
where r is the vector of residuals and W is the weight matrix, the inverse of the input (diagonal) covariance matrix of observations.
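The definition can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the sample data, and the choice of two fitted parameters are assumptions made for the example, and the observations are taken to be independent so that the weight matrix is diagonal.

```python
def reduced_chi_squared(observed, calculated, variances, n_params):
    """Chi-squared per degree of freedom for independent observations."""
    n = len(observed)
    dof = n - n_params  # nu = n - m
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi2 = sum((o - c) ** 2 / v
               for o, c, v in zip(observed, calculated, variances))
    return chi2 / dof


def reduced_chi_squared_matrix(residuals, weight_matrix, n_params):
    """Same statistic in matrix form, r^T W r / nu, with W as nested lists."""
    n = len(residuals)
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    # r^T W r written out as a double sum over all matrix entries
    quad = sum(residuals[i] * weight_matrix[i][j] * residuals[j]
               for i in range(n) for j in range(n))
    return quad / dof


# Illustrative data (assumed): four observations, two fitted parameters.
observed = [1.1, 1.9, 3.2, 3.9]
calculated = [1.0, 2.0, 3.0, 4.0]
variances = [0.01, 0.01, 0.04, 0.01]

chi2_nu = reduced_chi_squared(observed, calculated, variances, n_params=2)

# Diagonal weight matrix W: the inverse of the diagonal covariance matrix.
residuals = [o - c for o, c in zip(observed, calculated)]
W = [[1.0 / variances[i] if i == j else 0.0 for j in range(4)]
     for i in range(4)]
chi2_nu_matrix = reduced_chi_squared_matrix(residuals, W, n_params=2)

print(chi2_nu, chi2_nu_matrix)  # both close to 2.0
```

Each residual here is exactly one standard deviation in size, so the chi-squared is about 4 and, with two degrees of freedom, the reduced value is about 2 in both forms.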
Discussion
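As a rule of thumb, when the variance of the measurement error is known a priori, a $\chi^2_\nu \gg 1$ indicates a poor model fit. A $\chi^2_\nu > 1$ indicates that the fit has not fully captured the data (or that the error variance has been underestimated). In principle, a value of $\chi^2_\nu$ around 1 indicates that the extent of the match between observations and estimates is in accord with the error variance. A $\chi^2_\nu < 1$ indicates that the model is overfitting the data: either the model is improperly fitting noise, or the error variance has been overestimated.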
When the variance of the measurement error is only partially known, the reduced chi-squared may serve as a correction estimated a posteriori (see the weighted arithmetic mean, "Correcting for over- or under-dispersion").
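One common form of such an a posteriori correction can be sketched as follows. The example assumes inverse-variance weights, a single fitted parameter (the weighted mean, so that the number of degrees of freedom is n - 1), and that the standard error of the mean is rescaled by the square root of the reduced chi-squared. The function name and the data are hypothetical.

```python
import math


def weighted_mean_with_correction(values, sigmas):
    """Inverse-variance weighted mean with an a posteriori error rescaling.

    Assumption: weights are 1 / sigma^2 and one parameter (the mean) is
    fitted, so the number of degrees of freedom is n - 1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / wsum
    std_err = math.sqrt(1.0 / wsum)  # error from the stated sigmas alone
    chi2_nu = sum(w * (x - mean) ** 2
                  for w, x in zip(weights, values)) / (n - 1)
    corrected_err = std_err * math.sqrt(chi2_nu)  # rescaled error
    return mean, std_err, chi2_nu, corrected_err


# Hypothetical measurements whose scatter exceeds the stated uncertainties.
mean, std_err, chi2_nu, corrected_err = weighted_mean_with_correction(
    [10.0, 12.0, 11.0], [0.5, 0.5, 0.5])
print(mean, std_err, chi2_nu, corrected_err)
```

With these inputs the reduced chi-squared is about 4, so the corrected error is about twice the uncorrected one. Whether to apply the rescaling when the reduced chi-squared is below 1 is a judgment call that this sketch leaves to the caller.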
Applications
In geochronology, the MSWD is a measure of goodness of fit that takes into account the relative importance of both the internal and external reproducibility, with most common usage in isotopic dating.
In general:
MSWD = 1 if the age data fit a univariate normal distribution in t (for the arithmetic mean age) or log(t) (for the geometric mean age) space, or if the compositional data fit a bivariate normal distribution in [log(U/He),log(Th/He)]-space (for the central age).
MSWD < 1 if the observed scatter is less than that predicted by the analytical uncertainties. In this case, the data are said to be "underdispersed", indicating that the analytical uncertainties were overestimated.
MSWD > 1 if the observed scatter exceeds that predicted by the analytical uncertainties. In this case, the data are said to be "overdispersed". This situation is the rule rather than the exception in (U-Th)/He geochronology, indicating an incomplete understanding of the isotope system. Several reasons have been proposed to explain the overdispersion of (U-Th)/He data, including unevenly distributed U and Th and radiation damage.
Often the geochronologist will determine a series of age measurements on a single sample, with the measured value $x_i$ having a weighting $w_i$ and an associated error $\sigma_{x_i}$ for each age determination.
The arithmetic mean of the $N$ age determinations $x_i$ is:
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,$$
but this value can be misleading unless each determination of the age is of equal significance.
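When each measured value can be assumed to have the same weighting, or significance, the biased and unbiased (or "population" and "sample", respectively) estimators of the variance are computed as follows:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \quad \text{and} \quad s^2 = \frac{N}{N-1}\,\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2.$$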
The standard deviation is the square root of the variance.
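The equally weighted case can be sketched as follows. The function name and the age values are assumptions chosen for illustration only.

```python
import math


def mean_and_variances(values):
    """Arithmetic mean with the biased (1/N) and unbiased (1/(N-1)) variances."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for the unbiased variance")
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return mean, sum_sq_dev / n, sum_sq_dev / (n - 1)


# Hypothetical age determinations, all treated as equally significant.
ages = [100.0, 102.0, 98.0, 104.0]
mean_age, var_biased, var_unbiased = mean_and_variances(ages)
std_unbiased = math.sqrt(var_unbiased)
print(mean_age, var_biased, var_unbiased, std_unbiased)
```

Python's standard library provides the same two estimators as `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N - 1), which may be preferable in practice.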
When individual determinations of an age are not of equal significance, it is better to use a weighted mean, with weights $w_i$, to obtain an "average" age, as follows:
$$\bar{x}^* = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}.$$
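The biased weighted estimator of variance can be shown to be:
$$\sigma^2 = \frac{\sum_{i=1}^{N} w_i (x_i - \bar{x}^*)^2}{\sum_{i=1}^{N} w_i},$$
which can be computed on the fly as
$$\sigma^2 = \frac{\sum_{i=1}^{N} w_i x_i^2 \cdot \sum_{i=1}^{N} w_i - \left(\sum_{i=1}^{N} w_i x_i\right)^2}{\left(\sum_{i=1}^{N} w_i\right)^2}.$$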
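The unbiased weighted estimator of the sample variance can be computed as follows:
$$s^2 = \frac{\sum_{i=1}^{N} w_i}{\left(\sum_{i=1}^{N} w_i\right)^2 - \sum_{i=1}^{N} w_i^2} \cdot \sum_{i=1}^{N} w_i (x_i - \bar{x}^*)^2.$$
Again, the corresponding standard deviation is the square root of the variance.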
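The unbiased weighted estimator of the sample variance can also be computed on the fly as follows:
$$s^2 = \frac{\sum_{i=1}^{N} w_i x_i^2 \cdot \sum_{i=1}^{N} w_i - \left(\sum_{i=1}^{N} w_i x_i\right)^2}{\left(\sum_{i=1}^{N} w_i\right)^2 - \sum_{i=1}^{N} w_i^2}.$$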
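The weighted mean and both weighted variance estimators can be accumulated in a single pass over the data, which is what "on the fly" amounts to in practice. The sketch below keeps four running sums; the function name, the weights, and the ages are illustrative assumptions.

```python
def weighted_stats(values, weights):
    """One-pass weighted mean plus biased and unbiased weighted variances."""
    sw = sw2 = swx = swx2 = 0.0
    for x, w in zip(values, weights):
        sw += w            # sum of weights
        sw2 += w * w       # sum of squared weights
        swx += w * x       # weighted sum of values
        swx2 += w * x * x  # weighted sum of squared values
    denom = sw ** 2 - sw2
    if sw == 0.0 or denom == 0.0:
        raise ValueError("need at least two observations with non-zero weight")
    mean = swx / sw
    numerator = swx2 * sw - swx ** 2
    return mean, numerator / sw ** 2, numerator / denom


# Hypothetical ages; the first measurement is given twice the weight.
wmean, wvar_biased, wvar_unbiased = weighted_stats(
    [100.0, 102.0, 98.0], [2.0, 1.0, 1.0])
print(wmean, wvar_biased, wvar_unbiased)
```

The running-sum form subtracts two large, nearly equal numbers when the spread is small relative to the mean, so it can lose precision in floating point. Subtracting a rough estimate of the mean from every value before accumulating is one simple mitigation.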
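The unweighted mean square of the weighted deviations (unweighted MSWD) can then be computed, as follows, where $\sigma_{x_i}$ is the error associated with $x_i$:
$$\text{MSWD}_u = \frac{1}{N-1} \cdot \sum_{i=1}^{N} \frac{(x_i - \bar{x})^2}{\sigma_{x_i}^2}.$$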
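By analogy, the weighted mean square of the weighted deviations (weighted MSWD) can be computed, as follows:
$$\text{MSWD}_w = \frac{\sum_{i=1}^{N} w_i}{\left(\sum_{i=1}^{N} w_i\right)^2 - \sum_{i=1}^{N} w_i^2} \cdot \sum_{i=1}^{N} \frac{w_i (x_i - \bar{x}^*)^2}{\sigma_{x_i}^2}.$$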
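Both MSWD variants can be sketched directly from these definitions. The function names and the sample ages, errors, and weights below are assumptions made for illustration.

```python
def mswd_unweighted(values, sigmas):
    """Sum of squared deviations from the arithmetic mean, each scaled by
    its variance, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / n
    total = sum(((x - mean) / s) ** 2 for x, s in zip(values, sigmas))
    return total / (n - 1)


def mswd_weighted(values, sigmas, weights):
    """Weighted analogue using the weighted mean and the unbiased
    weighted-variance normalization."""
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    denom = sw ** 2 - sw2
    if denom == 0.0:
        raise ValueError("need at least two observations with non-zero weight")
    wmean = sum(w * x for w, x in zip(weights, values)) / sw
    total = sum(w * ((x - wmean) / s) ** 2
                for x, s, w in zip(values, sigmas, weights))
    return sw / denom * total


# Hypothetical ages with equal analytical errors.
ages = [100.0, 102.0, 98.0]
errors = [2.0, 2.0, 2.0]

mswd_u = mswd_unweighted(ages, errors)
mswd_w = mswd_weighted(ages, errors, [2.0, 1.0, 1.0])
mswd_w_equal = mswd_weighted(ages, errors, [1.0, 1.0, 1.0])
print(mswd_u, mswd_w, mswd_w_equal)
```

With equal weights the normalization factor reduces to 1/(N - 1), so the weighted form reproduces the unweighted one, which is a convenient sanity check.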