In statistics, originally in geostatistics, kriging or Gaussian process regression is a method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize smoothness of the fitted values. Under suitable assumptions on the priors, kriging gives the best linear unbiased prediction of the intermediate values. Interpolating methods based on other criteria such as smoothness need not yield the most likely intermediate values. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
Contents
- Related terms and techniques
- Geostatistical estimator
- Linear estimation
- Methods
- Ordinary kriging
- Simple kriging
- Properties
- Applications
- Design and analysis of computer experiments
- References
The theoretical basis for the method was developed by the French mathematician Georges Matheron in 1960, based on the Master's thesis of Danie G. Krige, the pioneering plotter of distance-weighted average gold grades at the Witwatersrand reef complex in South Africa. Krige sought to estimate the most likely distribution of gold based on samples from a few boreholes. The English verb is to krige and the most common noun is kriging; both are often pronounced with a hard "g", following the pronunciation of the name "Krige". The word is sometimes capitalized as Kriging in the literature.
Related terms and techniques
The basic idea of kriging is to predict the value of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point. The method is mathematically closely related to regression analysis. Both theories derive a best linear unbiased estimator based on assumptions about covariances, use the Gauss-Markov theorem to prove independence of the estimate and the error, and use very similar formulae. Even so, they are useful in different frameworks: kriging is made for estimation of a single realization of a random field, while regression models are based on multiple observations of a multivariate data set.
The kriging estimation may also be seen as a spline in a reproducing kernel Hilbert space, with the reproducing kernel given by the covariance function. The difference with the classical kriging approach is provided by the interpretation: while the spline is motivated by a minimum norm interpolation based on a Hilbert space structure, kriging is motivated by an expected squared prediction error based on a stochastic model.
Kriging with polynomial trend surfaces is mathematically identical to generalized least squares polynomial curve fitting.
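Kriging can also be understood as a form of Bayesian inference. Kriging starts with a prior distribution over functions. This prior takes the form of a Gaussian process: any finite set of function values is jointly normally distributed, and the covariance between any two values is given by the covariance function (kernel) evaluated at their two locations.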
Geostatistical estimator
In geostatistical models, sampled data is interpreted as the result of a random process. The fact that these models incorporate uncertainty in their conceptualization doesn't mean that the phenomenon - the forest, the aquifer, the mineral deposit - has resulted from a random process, but rather it allows one to build a methodological basis for the spatial inference of quantities in unobserved locations, and to quantify the uncertainty associated with the estimator.
A stochastic process is, in the context of this model, simply a way to approach the set of data collected from the samples. The first step in geostatistical modeling is to create a random process that best describes the set of observed data.
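A value from location $x_1$ is interpreted as a realization $z(x_1)$ of the random variable $Z(x_1)$.
The set of random variables $Z(x_1), \ldots, Z(x_N)$ constitutes a random function of which only one realization is known: the set of observed data $z(x_i)$.
For instance, if one assumes, based on the homogeneity of samples in area $A$ where the variable is distributed, that the first moment is stationary (i.e. all random variables have the same mean), then the mean can be estimated by the arithmetic mean of the sampled values.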
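The hypothesis of stationarity related to the second moment is defined in the following way: the correlation between two random variables solely depends on the spatial distance between them, and is independent of their location:
$$C(Z(x_1), Z(x_2)) = C(Z(x_i), Z(x_i + h)) = C(h)$$
$$\gamma(Z(x_1), Z(x_2)) = \gamma(Z(x_i), Z(x_i + h)) = \gamma(h)$$
where $h$ is the separation between the two locations.
This hypothesis allows one to infer those two measures - the variogram and the covariogram - based on the $N$ samples:
$$\gamma(h) = \frac{1}{2 N(h)} \sum_{i=1}^{N(h)} \left( z(x_i) - z(x_i + h) \right)^2$$
$$C(h) = \frac{1}{N(h)} \sum_{i=1}^{N(h)} \left( z(x_i) - m(h) \right) \left( z(x_i + h) - m(h) \right)$$
where $N(h)$ is the number of sample pairs separated by $h$ and $m(h) = \frac{1}{2 N(h)} \sum_{i=1}^{N(h)} \left( z(x_i) + z(x_i + h) \right)$ is the mean of the sample values in those pairs.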
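As an illustration of variogram inference from samples, the following minimal sketch averages half the squared differences of all sample pairs whose separation falls near a given lag. The function name `empirical_variogram`, the lag tolerance `tol`, and the 1-D sample data are assumptions made for this example only.

```python
import numpy as np

def empirical_variogram(x, z, lags, tol):
    """Estimate gamma(h) at each lag from 1-D locations x and sample values z."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(x), k=1)   # every pair of samples, counted once
    dist = np.abs(x[i] - x[j])            # separation of each pair
    sqdiff = (z[i] - z[j]) ** 2           # squared increment of each pair
    gamma = np.full(len(lags), np.nan)    # NaN where no pair matches the lag
    for k, h in enumerate(lags):
        mask = np.abs(dist - h) <= tol    # pairs whose separation is close to h
        if mask.any():
            gamma[k] = 0.5 * sqdiff[mask].mean()  # 1 / (2 N(h)) * sum of squares
    return gamma

# Hypothetical samples on a regular 1-D transect.
x = np.arange(6.0)
z = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 6.0])
gamma = empirical_variogram(x, z, lags=[1.0, 2.0], tol=0.1)
```

In practice a parametric variogram model would then be fitted to these empirical values; that step is not shown here.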
Linear estimation
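Spatial inference, or estimation, of a quantity $Z$ at an unobserved location $x_0$ is calculated from a linear combination of the observed values $z_i = Z(x_i)$ and weights $w_i(x_0)$, $i = 1, \ldots, N$:
$$\hat{Z}(x_0) = \sum_{i=1}^{N} w_i(x_0) \, Z(x_i)$$
The weights $w_i$ should reflect the structural proximity of the samples to the estimation location $x_0$ and, at the same time, compensate for clusters of samples so that clustering does not bias the estimate.
When calculating the weights $w_i$, there are two objectives in the geostatistical formalism: lack of bias and minimum variance of estimation.
If the cloud of real values $Z(x_0)$ is plotted against the estimated values $\hat{Z}(x_0)$, the criterion of unbiasedness implies that the mean of the estimates must equal the mean of the real values.
The second criterion says that the mean of the squared deviations $(\hat{Z}(x) - Z(x))^2$ must be minimal: the more dispersed the cloud of estimated versus real values, the less precise the estimator.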
Methods
Depending on the stochastic properties of the random field and the various degrees of stationarity assumed, different methods for calculating the weights can be deduced, i.e. different types of kriging apply. Classical methods are:
Ordinary kriging
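The unknown value $Z(x_0)$ is interpreted as a random variable located at $x_0$, as are the values of the neighboring samples $Z(x_i)$, $i = 1, \ldots, N$. The estimator $\hat{Z}(x_0)$ is also a random variable located at $x_0$, obtained as a linear combination of the sample variables.
In order to deduce the kriging system for the assumptions of the model, the following error committed while estimating $Z(x)$ at $x_0$ is declared:
$$\epsilon(x_0) = \hat{Z}(x_0) - Z(x_0) = \sum_{i=1}^{N} w_i(x_0) \, Z(x_i) - Z(x_0)$$
The two quality criteria referred to previously can now be expressed in terms of the mean and variance of the new random variable $\epsilon(x_0)$:
Lack of bias:
Since the random function is stationary, $E[Z(x_i)] = E[Z(x_0)] = m$, and the following constraint is obtained:
$$E[\epsilon(x_0)] = 0 \iff \sum_{i=1}^{N} w_i(x_0) \, E[Z(x_i)] - E[Z(x_0)] = 0 \iff m \sum_{i=1}^{N} w_i(x_0) - m = 0 \iff \sum_{i=1}^{N} w_i(x_0) = 1$$
In order to ensure that the model is unbiased, the weights must sum to one.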
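Minimum variance:
Two estimators can both have $E[\epsilon(x_0)] = 0$, where $\epsilon(x_0) = \hat{Z}(x_0) - Z(x_0)$ is the estimation error, but the dispersion of the error around its mean distinguishes the quality of the estimators. To find the estimator with minimum variance, $E[\epsilon(x_0)^2]$ is minimized. Writing $W = [w_1, \ldots, w_N]^T$ for the vector of weights:
$$\operatorname{Var}(\epsilon(x_0)) = \begin{bmatrix} W^T & -1 \end{bmatrix} \begin{bmatrix} \operatorname{Var}_{x_i} & \operatorname{Cov}_{x_i x_0} \\ \operatorname{Cov}_{x_i x_0}^T & \operatorname{Var}_{x_0} \end{bmatrix} \begin{bmatrix} W \\ -1 \end{bmatrix}$$
* see covariance matrix for a detailed explanation
* where the literals $\operatorname{Var}_{x_i}$, $\operatorname{Var}_{x_0}$ and $\operatorname{Cov}_{x_i x_0}$ stand for the covariance matrix of the samples $[Z(x_1), \ldots, Z(x_N)]^T$, the variance of $Z(x_0)$, and the vector of covariances between the samples and $Z(x_0)$, respectively.
Once the covariance model or variogram, $C(h)$ or $\gamma(h)$, has been defined, the estimation variance of any linear estimator can be written in terms of the covariances between the samples and the covariances between the samples and the point to estimate:
$$\operatorname{Var}(\epsilon(x_0)) = W^T \operatorname{Var}_{x_i} W - 2 \, W^T \operatorname{Cov}_{x_i x_0} + \operatorname{Var}_{x_0} = C(0) + \sum_{i} \sum_{j} w_i w_j \, C(x_i, x_j) - 2 \sum_{i} w_i \, C(x_i, x_0)$$
Some conclusions can be asserted from this expression. The variance of estimation:
- grows, for positive weights, as the covariance between the samples and the point to estimate decreases;
- grows with the a priori variance $C(0)$ of the variable;
- does not depend on the sample values themselves, only on the sample configuration and the covariance model.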
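Solving this optimization problem, i.e. minimizing the estimation variance subject to the weights summing to one (see Lagrange multipliers), results in the kriging system:
$$\begin{bmatrix} C(x_1, x_1) & \cdots & C(x_1, x_N) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(x_N, x_1) & \cdots & C(x_N, x_N) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_N \\ \mu \end{bmatrix} = \begin{bmatrix} C(x_1, x_0) \\ \vdots \\ C(x_N, x_0) \\ 1 \end{bmatrix}$$
The additional parameter $\mu$ is a Lagrange multiplier used in the minimization of the kriging error to honor the unbiasedness condition.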
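To make the ordinary kriging system concrete, the sketch below builds the sample covariance matrix bordered by a row and column of ones, solves for the weights and the Lagrange multiplier, and returns the estimate and its variance. The exponential covariance `exp_cov`, its parameters, and the 1-D sample data are illustrative assumptions, not part of the method itself.

```python
import numpy as np

def exp_cov(h, sill=1.0, length=2.0):
    """Assumed exponential covariance model C(h); any valid model could be used."""
    return sill * np.exp(-np.abs(h) / length)

def ordinary_kriging(x, z, x0, cov=exp_cov):
    """Ordinary kriging estimate at x0 from 1-D sample locations x and values z."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(x[:, None] - x[None, :])  # covariances between samples
    A[:n, n] = 1.0                            # column for the Lagrange multiplier
    A[n, :n] = 1.0                            # row enforcing sum(w) = 1
    b = np.append(cov(x - x0), 1.0)           # covariances sample-to-target, then 1
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ z
    variance = cov(0.0) - w @ b[:n] - mu      # kriging variance in covariance form
    return estimate, variance, w

# Hypothetical samples along a line.
x = np.array([0.0, 1.0, 3.0, 4.0])
z = np.array([1.2, 0.7, 1.9, 2.4])
estimate, variance, weights = ordinary_kriging(x, z, x0=2.0)
```

The weights sum to one by construction, and evaluating at a sample location returns the sample value with zero variance, which is a convenient sanity check.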
Simple kriging
Simple kriging is mathematically the simplest, but the least general. It assumes the expectation of the random field to be known, and relies on a covariance function. However, in most applications neither the expectation nor the covariance are known beforehand.
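The practical assumptions for the application of simple kriging are:
- wide-sense stationarity of the field;
- the expectation is zero everywhere: $\mu(x) = 0$;
- a known covariance function $c(x, y) = \operatorname{Cov}(Z(x), Z(y))$.
The kriging weights of simple kriging have no unbiasedness condition and are given by the simple kriging equation system:
$$\begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{bmatrix}^{-1} \begin{bmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{bmatrix}$$
This is analogous to a linear regression of $Z(x_0)$ on the other values $z_1, \ldots, z_n$.
The interpolation by simple kriging is given by:
$$\hat{Z}(x_0) = \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix} \begin{bmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{bmatrix}^{-1} \begin{bmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{bmatrix}$$
The kriging error is given by:
$$\operatorname{Var}\left( \hat{Z}(x_0) - Z(x_0) \right) = c(x_0, x_0) - \begin{bmatrix} c(x_1, x_0) & \cdots & c(x_n, x_0) \end{bmatrix} \begin{bmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{bmatrix}^{-1} \begin{bmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{bmatrix}$$
which leads to the generalised least squares version of the Gauss-Markov theorem (Chiles & Delfiner 1999, p. 159):
$$\operatorname{Var}(Z(x_0)) = \operatorname{Var}(\hat{Z}(x_0)) + \operatorname{Var}\left( \hat{Z}(x_0) - Z(x_0) \right)$$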
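A small numerical sketch of simple kriging, assuming a zero-mean field and a known covariance function: the weights come from a single linear solve, and the variance of the field at the target splits into the variance of the estimate plus the kriging error variance. The exponential covariance `cov`, its parameters, and the 2-D sample data are assumptions chosen for illustration.

```python
import numpy as np

def cov(a, b, sill=1.0, length=1.5):
    """Assumed exponential covariance between two sets of 2-D points."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / length)

# Hypothetical sample locations, zero-mean sample values, and target location.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
z = np.array([0.3, -0.1, 0.5, -0.4])
x0 = np.array([[0.5, 0.5]])

C = cov(pts, pts)               # covariance matrix between samples
c0 = cov(pts, x0)[:, 0]         # covariances between samples and target
w = np.linalg.solve(C, c0)      # simple kriging weights (no sum-to-one constraint)

z_hat = w @ z                   # simple kriging estimate at x0
prior_var = cov(x0, x0)[0, 0]   # Var(Z(x0)) under the covariance model
err_var = prior_var - c0 @ w    # kriging error variance
var_hat = w @ C @ w             # variance of the estimator itself
```

Here `var_hat + err_var` equals `prior_var` up to rounding, which is the variance decomposition stated above.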
Properties
(Cressie 1993, Chiles & Delfiner 1999, Wackernagel 1995)
Applications
Although kriging was developed originally for applications in geostatistics, it is a general method of statistical interpolation that can be applied within any discipline to sampled data from random fields that satisfy the appropriate mathematical assumptions. It can be used where spatially-related data has been collected (in 2-D or 3-D) and estimates of "fill-in" data are desired in the locations (spatial gaps) between the actual measurements.
To date kriging has been used in a variety of disciplines, including the following:
Design and analysis of computer experiments
Another very important and rapidly growing field of application, in engineering, is the interpolation of data coming out as response variables of deterministic computer simulations, e.g. finite element method (FEM) simulations. In this case, kriging is used as a metamodeling tool, i.e. a black box model built over a designed set of computer experiments. In many practical engineering problems, such as the design of a metal forming process, a single FEM simulation might be several hours or even a few days long. It is therefore more efficient to design and run a limited number of computer simulations, and then use a kriging interpolator to rapidly predict the response in any other design point. Kriging is therefore used very often as a so-called surrogate model, implemented inside optimization routines.
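The surrogate-model workflow can be sketched as follows: run the simulation at a few design points, fit a kriging interpolator, then query the cheap interpolator on a dense grid inside an optimization loop. Everything specific here is an assumption for illustration: `simulate` is a stand-in for an expensive deterministic code, the squared-exponential `kernel` and its length scale are arbitrary choices, and the constant mean is a simple plug-in estimate rather than a fitted parameter.

```python
import numpy as np

def simulate(x):
    """Hypothetical stand-in for an expensive deterministic simulation."""
    return (x - 0.6) ** 2 + 0.1 * np.sin(8.0 * x)

def kernel(a, b, length=0.15):
    """Assumed squared-exponential covariance between 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

# Designed set of computer experiments: 8 runs of the simulation.
design = np.linspace(0.0, 1.0, 8)
response = simulate(design)

mean = response.mean()                       # plug-in constant mean (assumption)
K = kernel(design, design)                   # covariances between design points
alpha = np.linalg.solve(K, response - mean)  # precomputed once for all predictions

def surrogate(x):
    """Kriging prediction of the simulation response at new design points."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return mean + kernel(x, design) @ alpha

# Cheap search on the surrogate instead of the simulation itself.
grid = np.linspace(0.0, 1.0, 501)
x_best = grid[np.argmin(surrogate(grid))]
```

Because the simulation is deterministic, no noise term is added and the surrogate reproduces the simulated responses at the design points; the candidate `x_best` would normally be confirmed with one further run of the real simulation.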