In statistics, importance sampling is a general technique for estimating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. It is related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.
Basic theory
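Let $X\colon \Omega \to \mathbb{R}$ be a random variable in some probability space $(\Omega, \mathcal{F}, P)$. We wish to estimate the expected value of $X$ under $P$, denoted $E[X;P]$. If we have statistically independent random samples $x_1, \ldots, x_n$, generated according to $P$, then an empirical estimate of $E[X;P]$ is
$$\widehat{E}_n[X;P] = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
and the precision of this estimate depends on the variance of $X$:
$$\operatorname{var}\!\left[\widehat{E}_n;P\right] = \frac{\operatorname{var}[X;P]}{n}.$$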
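As an illustration of the estimate and its precision, the following Python sketch averages draws under $P$ and reports the standard error $\sqrt{\operatorname{var}[X;P]/n}$, with the unknown variance replaced by the sample variance. It assumes, purely for illustration, that $P$ is the exponential distribution with rate 1 and $X(\omega) = \omega$, so that $E[X;P] = \operatorname{var}[X;P] = 1$. The function name `plain_mc` is hypothetical.

```python
import random
import statistics


def plain_mc(n: int, seed: int = 0) -> tuple[float, float]:
    """Plain Monte Carlo estimate of E[X;P] and its standard error.

    Assumed toy example: P is Exp(1) and X(w) = w, so E[X;P] = var[X;P] = 1.
    """
    rng = random.Random(seed)
    # Draw x_1, ..., x_n independently under P.
    xs = [rng.expovariate(1.0) for _ in range(n)]
    estimate = statistics.fmean(xs)
    # var[E_n; P] = var[X; P] / n, with var[X; P] replaced by the sample variance.
    std_error = (statistics.variance(xs) / n) ** 0.5
    return estimate, std_error


est, se = plain_mc(100_000)
print(f"estimate = {est:.4f} +/- {se:.4f}")
```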
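The basic idea of importance sampling is to sample the states from a different distribution, either to lower the variance of the estimate of $E[X;P]$ or because sampling from $P$ is difficult. This is accomplished by first choosing a random variable $L \ge 0$ such that $E[L;P] = 1$ and $L(\omega) \ne 0$ $P$-almost everywhere. With the variate $L$ we define a probability $P^{(L)}$ by $dP^{(L)} = L\,dP$, which satisfies
$$E[X;P] = E\!\left[\frac{X}{L};P^{(L)}\right].$$
The variable $X/L$ will thus be sampled under $P^{(L)}$ to estimate $E[X;P]$ as above, and this estimate is improved when
$$\operatorname{var}\!\left[\frac{X}{L};P^{(L)}\right] < \operatorname{var}[X;P].$$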
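The change of measure can be checked numerically: draw samples under $P^{(L)}$ and average $X/L$ instead of $X$. This minimal sketch assumes $P$ is the uniform distribution on $(0, 1)$, $X(\omega) = \omega^2$ (so $E[X;P] = 1/3$) and $L(\omega) = 2\omega$, which satisfies $E[L;P] = 1$ and is nonzero almost everywhere; $P^{(L)}$ then has density $2\omega$ on $(0, 1)$. These choices and the function name `estimate_x_squared` are illustrative assumptions.

```python
import math
import random
import statistics


def estimate_x_squared(n: int, seed: int = 0) -> dict[str, tuple[float, float]]:
    """Estimate E[X;P] for X(w) = w**2 under P = Uniform(0, 1); exact value 1/3.

    Returns (estimate, per-sample variance) for plain sampling under P and for
    importance sampling under P^(L), where L(w) = 2w, i.e. dP^(L) = 2w dw on (0, 1).
    """
    rng = random.Random(seed)

    # Plain sampling under P: average X directly.
    plain = [rng.random() ** 2 for _ in range(n)]

    # Importance sampling under P^(L): its CDF is w**2, so inverse-CDF sampling
    # gives w = sqrt(U). Using 1 - random() keeps U in (0, 1], so w > 0 and
    # L(w) = 2w is never zero.
    weighted = []
    for _ in range(n):
        w = math.sqrt(1.0 - rng.random())
        x = w * w
        L = 2.0 * w
        weighted.append(x / L)  # X/L sampled under P^(L)

    return {
        "plain": (statistics.fmean(plain), statistics.variance(plain)),
        "importance": (statistics.fmean(weighted), statistics.variance(weighted)),
    }


results = estimate_x_squared(100_000)
for name, (mean, var) in results.items():
    print(f"{name:>10}: estimate = {mean:.4f}, per-sample variance = {var:.4f}")
```

For these choices the exact per-sample variances are $\operatorname{var}[X;P] = 4/45 \approx 0.089$ and $\operatorname{var}[X/L;P^{(L)}] = 1/72 \approx 0.014$, so sampling $X/L$ under $P^{(L)}$ needs roughly six times fewer samples for the same precision.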
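When $X$ is of constant sign over $\Omega$, the best variable $L$ would clearly be
$$L^* = \frac{X}{E[X;P]} \ge 0,$$
so that $X/L^*$ is the sought constant $E[X;P]$ and a single sample under $P^{(L^*)}$ suffices to give its value. Unfortunately, this choice is not available, because $E[X;P]$ is precisely the value we are looking for. However, this theoretical best case $L^*$ gives insight into what importance sampling does:
$$\forall a \in \mathbb{R},\quad P^{(L^*)}\big(X \in [a; a+da]\big) = \int_{\omega \in \{X \in [a; a+da]\}} \frac{X(\omega)}{E[X;P]}\,dP(\omega) = \frac{1}{E[X;P]}\,a\,P\big(X \in [a; a+da]\big).$$
On the right, $a\,P(X \in [a; a+da])$ is one of the infinitesimal elements that sum up to $E[X;P]$:
$$E[X;P] = \int_{a=-\infty}^{+\infty} a\,P\big(X \in [a; a+da]\big),$$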
therefore, a good probability change $P^{(L)}$ in importance sampling will redistribute the law of $X$ so that its samples' frequencies are sorted directly according to their weights in $E[X;P]$. Hence the name "importance sampling."
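The zero-variance property of the ideal choice $L^* = X/E[X;P]$ can be seen in a toy case where $E[X;P]$ happens to be known. The sketch below assumes $P$ is the exponential distribution with rate 1 and $X(\omega) = \omega \ge 0$, so $E[X;P] = 1$ and $P^{(L^*)}$ has density $\omega e^{-\omega}$, which is the Gamma(2, 1) law; every sample of $X/L^*$ then equals $E[X;P]$. The function name is hypothetical, and the known mean is used only to demonstrate the idea, since in practice it is the unknown.

```python
import random


def zero_variance_demo(n: int, seed: int = 0) -> list[float]:
    """Sample X/L* under P^(L*) for P = Exp(1) and X(w) = w >= 0.

    E[X;P] = 1 is known here only because this is a toy example; in practice
    L* = X / E[X;P] cannot be formed, because E[X;P] is what we want to estimate.
    dP^(L*) = (w / E[X;P]) e^{-w} dw is the Gamma(2, 1) density.
    """
    rng = random.Random(seed)
    mean_x = 1.0  # E[X;P] for Exp(1), assumed known for illustration only
    values = []
    for _ in range(n):
        w = rng.gammavariate(2.0, 1.0)  # one draw under P^(L*)
        L_star = w / mean_x
        values.append(w / L_star)  # X / L*
    return values


print(sorted(set(zero_variance_demo(10))))  # prints [1.0]
```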
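Importance sampling is often used as a Monte Carlo integrator. When $P$ is the uniform (Lebesgue) measure and $\Omega = \mathbb{R}$, $E[X;P]$ corresponds to the integral of the real function $X\colon \mathbb{R} \to \mathbb{R}$, which can then be estimated by drawing samples $x_i$ from a probability density $g$ and averaging $X(x_i)/g(x_i)$.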
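A minimal sketch of importance sampling as an integrator, assuming the integrand $h(x) = e^{-x^2}$ (whose integral over $\mathbb{R}$ is $\sqrt{\pi}$) and, as an illustrative choice, the standard normal density as the sampling density $g$. Each sample contributes $h(x)/g(x)$, whose expectation under $g$ is the integral of $h$. The function name is hypothetical.

```python
import math
import random


def gaussian_integral(n: int, seed: int = 0) -> float:
    """Estimate the integral of exp(-x**2) over the real line (exact: sqrt(pi)).

    Sampling density (assumed for illustration): the standard normal density g.
    Each sample contributes h(x) / g(x), whose expectation under g is the integral.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        h = math.exp(-x * x)
        g = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += h / g
    return total / n


print(gaussian_integral(100_000), math.sqrt(math.pi))
```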
Application to probabilistic inference
Such methods are frequently used to estimate posterior densities or expectations in state and/or parameter estimation problems in probabilistic models that are too hard to treat analytically, for example in Bayesian networks.
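In Bayesian settings the posterior is usually known only up to its normalizing constant, so a common variant, self-normalized importance sampling, divides by the sum of the weights instead of by the number of samples; it is consistent but slightly biased for finite sample sizes. The sketch below assumes a toy conjugate model (prior $\theta \sim N(0, 1)$, observations $y_i \mid \theta \sim N(\theta, 1)$) and uses the prior as the proposal, so the weights are proportional to the likelihood. The model, data and function name are hypothetical; the conjugate closed form is printed only as a check.

```python
import math
import random


def posterior_mean_snis(ys: list[float], n: int, seed: int = 0) -> float:
    """Self-normalized importance sampling estimate of a posterior mean.

    Assumed toy model: theta ~ N(0, 1) prior, y_i | theta ~ N(theta, 1).
    The prior is the proposal, so each weight is proportional to the likelihood;
    normalizing the weights cancels the unknown evidence p(y).
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Log-likelihood of the data for each draw, up to an additive constant.
    log_w = [-0.5 * sum((y - t) ** 2 for y in ys) for t in thetas]
    m = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * t for wi, t in zip(w, thetas)) / sum(w)


ys = [0.8, 1.2, 1.0, 0.6, 1.4]
# Conjugate closed form for comparison: posterior mean = sum(ys) / (len(ys) + 1).
print(posterior_mean_snis(ys, 100_000), sum(ys) / (len(ys) + 1))
```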
Application to simulation
Importance sampling is a variance reduction technique that can be used in the Monte Carlo method. The idea behind importance sampling is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others. If these "important" values are emphasized by sampling more frequently, then the estimator variance can be reduced. Hence, the basic methodology in importance sampling is to choose a distribution which "encourages" the important values. This use of "biased" distributions will result in a biased estimator if it is applied directly in the simulation. However, the simulation outputs are weighted to correct for the use of the biased distribution, and this ensures that the new importance sampling estimator is unbiased. The weight is given by the likelihood ratio, that is, the Radon–Nikodym derivative of the true underlying distribution with respect to the biased simulation distribution.
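The effect of the weights can be sketched with a toy example in which all choices are illustrative assumptions: to estimate $E[X] = 1$ for an exponential variable with rate 1, samples are drawn from a biased exponential distribution with rate 0.5, which over-represents large values. Averaging the raw outputs gives a biased answer near 2, while multiplying each output by the likelihood ratio $f(x)/g(x)$ of the true density $f$ to the biased density $g$ recovers an unbiased estimate near 1. The function name is hypothetical.

```python
import math
import random
import statistics


def weighted_vs_unweighted(n: int, seed: int = 0) -> tuple[float, float]:
    """Estimate E[X] for X ~ Exp(rate 1) (exact value 1) from a biased distribution.

    Assumed biased distribution: Exp(rate 0.5), which over-samples large values.
    Returns (unweighted average, likelihood-ratio-weighted average).
    """
    rng = random.Random(seed)
    raw, weighted = [], []
    for _ in range(n):
        x = rng.expovariate(0.5)                           # draw from the biased density g
        ratio = math.exp(-x) / (0.5 * math.exp(-0.5 * x))  # f(x) / g(x)
        raw.append(x)               # ignores the change of distribution: biased
        weighted.append(x * ratio)  # corrected by the likelihood ratio: unbiased
    return statistics.fmean(raw), statistics.fmean(weighted)


print(weighted_vs_unweighted(100_000))
```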
The fundamental issue in implementing importance sampling simulation is the choice of the biased distribution which encourages the important regions of the input variables. Choosing or designing a good biased distribution is the "art" of importance sampling. The rewards for a good distribution can be huge run-time savings; the penalty for a bad distribution can be longer run times than for a general Monte Carlo simulation without importance sampling.
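Consider $X$ to be the sample and $f(X)/g(X)$ to be the likelihood ratio, where $f$ is the probability density (mass) function of the desired distribution and $g$ is the probability density (mass) function of the biased/proposal/sample distribution. Then the problem can be characterized by choosing the sample distribution $g$ that minimizes the variance of the scaled sample:
$$g^* = \arg\min_g \operatorname{var}_g\!\left(X\,\frac{f(X)}{g(X)}\right).$$
It can be shown that the following distribution minimizes the above variance:
$$g^*(X) = \frac{|X|\,f(X)}{\int |x|\,f(x)\,dx}.$$
It is easy to see that when $X \ge 0$, this variance becomes 0, because every scaled sample $X f(X)/g^*(X)$ then equals the constant $\int x\,f(x)\,dx$.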
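To see why the sign condition matters, this sketch (an illustrative assumption, not part of the source) takes $f$ to be the standard normal density and samples from $g^*(x) \propto |x|\,f(x)$. Because $\int |x|\,f(x)\,dx = \sqrt{2/\pi}$, every scaled sample $X f(X)/g^*(X)$ equals $\pm\sqrt{2/\pi}$: the estimate of $E[X] = 0$ is still correct, but the variance is $2/\pi$ rather than 0, since $X$ takes both signs. Under $g^*$, $|X|$ follows the Rayleigh distribution with scale 1 and the sign is a fair coin flip. The function name is hypothetical.

```python
import math
import random
import statistics


def optimal_proposal_samples(n: int, seed: int = 0) -> list[float]:
    """Scaled samples X f(X) / g*(X) for f = N(0, 1) and g*(x) = |x| f(x) / c.

    c = integral of |x| f(x) dx = sqrt(2/pi). Each scaled sample equals
    c * sign(X), so the mean is right (E[X] = 0) but the variance is c**2.
    """
    rng = random.Random(seed)
    c = math.sqrt(2.0 / math.pi)
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # log(0) is undefined; redraw (probability about 2**-53)
            u = rng.random()
        r = math.sqrt(-2.0 * math.log(u))  # |X| ~ Rayleigh(1) by inverse CDF
        x = r if rng.random() < 0.5 else -r  # fair random sign
        f = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        g_star = abs(x) * f / c
        out.append(x * f / g_star)
    return out


samples = optimal_proposal_samples(100_000)
print(statistics.fmean(samples), statistics.variance(samples), 2.0 / math.pi)
```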
Mathematical approach
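Consider estimating by simulation the probability $p_t$ of an event $X \ge t$, where $X$ is a random variable with distribution $F$ and probability density function $f(x) = F'(x)$, where the prime denotes the derivative. A $K$-length independent and identically distributed (i.i.d.) sequence $X_i$ is generated from the distribution $F$, and the number $k_t$ of random variables that lie above the threshold $t$ is counted. The random variable $k_t$ follows the binomial distribution
$$P(k_t = k) = \binom{K}{k} p_t^k (1 - p_t)^{K-k}, \quad k = 0, 1, \ldots, K.$$
One can show that
$$E[k_t/K] = p_t, \qquad \operatorname{var}[k_t/K] = \frac{p_t(1 - p_t)}{K},$$
so in the limit $K \to \infty$ we are able to obtain $p_t$. For a rare event ($p_t \ll 1$), however, the relative standard deviation $\sqrt{(1 - p_t)/(K p_t)}$ remains large unless $K \gg 1/p_t$. Importance sampling instead draws the samples from an alternative density $f_*$, usually referred to as a biasing density, under which the event $X \ge t$ occurs more frequently. Introducing $f_*$ into the definition of $p_t$ gives
$$p_t = E[\mathbf{1}(X \ge t)] = \int \mathbf{1}(x \ge t)\,\frac{f(x)}{f_*(x)}\,f_*(x)\,dx = E_*[\mathbf{1}(X \ge t)\,W(X)],$$
where
$$W(\cdot) \equiv \frac{f(\cdot)}{f_*(\cdot)}$$
is a likelihood ratio and is referred to as the weighting function. The last equality in the above equation motivates the estimator
$$\hat{p}_t = \frac{1}{K} \sum_{i=1}^{K} \mathbf{1}(X_i \ge t)\,W(X_i), \qquad X_i \sim f_*.$$
This is the importance sampling estimator of $p_t$, and it is unbiased. That is, the estimation procedure is to generate i.i.d. samples from $f_*$ and, for each sample that exceeds $t$, increment the estimate by the weight $W$ evaluated at the sample value; the results are averaged over $K$ trials. The variance of the importance sampling estimator is
$$\operatorname{var}_*[\hat{p}_t] = \frac{1}{K} E_*[\mathbf{1}(X \ge t)\,W^2(X)] - \frac{p_t^2}{K} = \frac{1}{K} E[\mathbf{1}(X \ge t)\,W(X)] - \frac{p_t^2}{K}.$$
The importance sampling problem then focuses on finding a biasing density $f_*$ such that the variance of the importance sampling estimator is less than the variance of the general Monte Carlo estimate. A biasing density that minimizes the variance (and, under certain conditions, reduces it to zero) is called an optimal biasing density function.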
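A minimal sketch comparing the conventional estimate $k_t/K$ with the weighted estimate $\frac{1}{K}\sum_i \mathbf{1}(X_i \ge t)\,W(X_i)$, $W = f/f_*$, $X_i \sim f_*$. It assumes $X$ is exponential with rate 1 (so $p_t = e^{-t}$ exactly) and, as an illustrative biasing density, $f_*$ exponential with rate $1/t$, which places a fraction $e^{-1}$ of its mass above the threshold. The function name and parameter values are hypothetical.

```python
import math
import random


def tail_probability(t: float, K: int, seed: int = 0) -> tuple[float, float]:
    """Estimate p_t = P(X >= t) for X ~ Exp(1), where exactly p_t = exp(-t).

    Returns (conventional estimate k_t / K, importance sampling estimate).
    The biasing density f_* = Exp(rate 1/t) is an assumption for this toy case.
    """
    rng = random.Random(seed)
    lam = 1.0 / t  # rate of the biasing density f_*

    # Conventional Monte Carlo: count the K samples from f that exceed t.
    k_t = sum(1 for _ in range(K) if rng.expovariate(1.0) >= t)

    # Importance sampling: draw from f_* and add W(x) = f(x) / f_*(x) on each hit.
    total = 0.0
    for _ in range(K):
        x = rng.expovariate(lam)
        if x >= t:
            total += math.exp(-x) / (lam * math.exp(-lam * x))
    return k_t / K, total / K


naive, p_hat = tail_probability(t=10.0, K=100_000)
print(f"exact {math.exp(-10.0):.3e}  naive {naive:.3e}  importance {p_hat:.3e}")
```

With $t = 10$ and $K = 100\,000$, $p_t \approx 4.5 \times 10^{-5}$, so the conventional estimator sees only four or five hits on average and is very noisy, while the weighted estimator is typically within a few percent of the exact value.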
Conventional biasing methods
Although there are many kinds of biasing methods, the following two are the most widely used in applications of importance sampling.
Scaling
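Shifting probability mass into the event region $X \ge t$ by positive scaling of the random variable $X$ with a number greater than unity has the effect of increasing the variance of the density function. This results in a heavier tail of the density, leading to an increase in the event probability.
In importance sampling by scaling, the simulation density is chosen as the density function of the scaled random variable $aX$, where usually $a > 1$ for tail probability estimation. By transformation,
$$f_*(x) = \frac{1}{a}\,f\!\left(\frac{x}{a}\right),$$
and the weighting function is
$$W(x) = a\,\frac{f(x)}{f(x/a)}.$$
While scaling shifts probability mass into the desired event region, it also pushes mass into the complementary region $X < t$, which is undesirable. If $X$ is a sum of $n$ random variables, the spreading of mass takes place in an $n$-dimensional space, and the resulting decrease in importance sampling gain with increasing $n$ is called the dimensionality effect.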
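A minimal sketch of biasing by scaling, assuming $X$ is standard normal, threshold $t = 3$ and scale factor $a = 3$ (illustrative choices). Samples of $aX$ are drawn directly from $f_*(x) = f(x/a)/a$, each hit is weighted by $W(x) = a\,f(x)/f(x/a)$, and the sketch also reports the fraction of samples that fall in the complementary region $X < t$, where they contribute nothing. The function name is hypothetical.

```python
import math
import random


def scaled_is(t: float, a: float, K: int, seed: int = 0) -> tuple[float, float]:
    """Importance sampling by scaling for p_t = P(X >= t), X ~ N(0, 1).

    Samples come from the density of aX, f_*(x) = f(x / a) / a, and each hit is
    weighted by W(x) = a f(x) / f(x / a). Returns (estimate, fraction of samples
    that landed in the complementary region X < t).
    """
    rng = random.Random(seed)
    total, misses = 0.0, 0
    for _ in range(K):
        x = a * rng.gauss(0.0, 1.0)  # a draw of the scaled variable aX
        if x >= t:
            # f(x) / f(x/a) for the standard normal density; the constants cancel.
            total += a * math.exp(-0.5 * x * x + 0.5 * (x / a) ** 2)
        else:
            misses += 1
    return total / K, misses / K


p_exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(scaled_is(t=3.0, a=3.0, K=100_000), p_exact)
```

For these values about 84% of the scaled samples land below $t$ (the probability $\Phi(1) \approx 0.84$), which illustrates the mass pushed into the complementary region.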
Translation
Another simple and effective biasing technique employs translation of the density function (and hence the random variable) to place much of its probability mass in the rare-event region. Translation does not suffer from a dimensionality effect and has been used successfully in several applications relating to the simulation of digital communication systems. It often provides better simulation gains than scaling. In biasing by translation, the simulation density is given by
$$f_*(x) = f(x - c), \quad c > 0,$$
where $c$ is the amount of shift and is to be chosen to minimize the variance of the importance sampling estimator. The corresponding weighting function is $W(x) = f(x)/f(x - c)$.
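A minimal sketch of biasing by translation, assuming $X$ is standard normal and $t = 3$ (illustrative choices). For the standard normal density the weight $W(x) = f(x)/f(x - c)$ simplifies to $\exp(-cx + c^2/2)$. Setting $c = 0$ recovers conventional Monte Carlo, which makes the variance reduction of a shift such as $c = t$ easy to compare. The function name is hypothetical.

```python
import math
import random
import statistics


def translated_is(t: float, c: float, K: int, seed: int = 0) -> tuple[float, float]:
    """Importance sampling by translation for p_t = P(X >= t), X ~ N(0, 1).

    Samples come from f_*(x) = f(x - c); for the standard normal density the
    weight W(x) = f(x) / f(x - c) simplifies to exp(-c x + c**2 / 2).
    Returns (estimate, sample variance of the per-sample terms 1(x >= t) W(x)).
    c = 0 reduces to conventional Monte Carlo.
    """
    rng = random.Random(seed)
    terms = []
    for _ in range(K):
        x = rng.gauss(c, 1.0)  # a draw from the shifted density f(x - c)
        terms.append(math.exp(-c * x + 0.5 * c * c) if x >= t else 0.0)
    return statistics.fmean(terms), statistics.variance(terms)


for shift in (0.0, 3.0):
    print(shift, translated_is(t=3.0, c=shift, K=100_000))
```

For $t = 3$ the per-sample variance drops from about $1.3 \times 10^{-3}$ at $c = 0$ to roughly $6 \times 10^{-6}$ at $c = 3$.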
Effects of system complexity
The fundamental problem with importance sampling is that designing good biased distributions becomes more complicated as the system complexity increases. Here, complex systems are those with long memory, since complex processing of only a few inputs is much easier to handle. This dimensionality or memory can cause problems in three ways:
In principle, the importance sampling ideas remain the same in these situations, but the design becomes much harder. One successful approach is to break a simulation down into several smaller, more sharply defined subproblems and then to apply importance sampling strategies to each of the simpler subproblems. Examples of techniques for breaking the simulation down are conditioning, error-event simulation (EES), and regenerative simulation.
Evaluation of importance sampling
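In order to identify successful importance sampling techniques, it is useful to be able to quantify the run-time savings due to the use of the importance sampling approach. The performance measure commonly used is $\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}$, the ratio of the variance of the conventional Monte Carlo estimator to that of the importance sampling estimator. It can be interpreted as the speed-up factor by which the importance sampling estimator achieves the same precision as the Monte Carlo estimator.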
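For Gaussian translation biasing the ratio $\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}$ can be computed in closed form, which makes the speed-up easy to explore. Assuming $X$ is standard normal and $f_*(x) = f(x - c)$ (an illustrative case), $\sigma^2_{\mathrm{MC}} = p_t(1 - p_t)$ per sample and, with $W(x) = \exp(-cx + c^2/2)$, $\sigma^2_{\mathrm{IS}} = e^{c^2} Q(t + c) - p_t^2$ per sample, where $Q$ is the standard normal upper-tail probability. The sketch evaluates the ratio and grid-searches the shift $c$; the function names are hypothetical.

```python
import math


def q(z: float) -> float:
    """Standard normal upper-tail probability P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def speedup(t: float, c: float) -> float:
    """sigma_MC^2 / sigma_IS^2 for p_t = P(X >= t), X ~ N(0, 1), translation by c.

    sigma_MC^2 = p_t (1 - p_t). With W(x) = exp(-c x + c**2 / 2),
    E[1(X >= t) W(X)] = exp(c**2) Q(t + c), so sigma_IS^2 = exp(c**2) Q(t + c) - p_t**2.
    """
    p = q(t)
    return p * (1.0 - p) / (math.exp(c * c) * q(t + c) - p * p)


t = 3.0
# Grid search over c in [0, 6] in steps of 0.01 for the largest speed-up.
best_c = max((i / 100.0 for i in range(0, 601)), key=lambda c: speedup(t, c))
print(f"c = 0: {speedup(t, 0.0):.2f}, c = t: {speedup(t, t):.1f}, best c = {best_c:.2f}")
```

For $t = 3$ the ratio is 1 at $c = 0$ (no biasing), about 218 at $c = t$, and the grid search places the best shift slightly above $t$.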
Variance cost function
Variance is not the only possible cost function for a simulation, and other cost functions, such as the mean absolute deviation, are used in various statistical applications. Nevertheless, the variance is the primary cost function addressed in the literature, probably due to the use of variances in confidence intervals and in the performance measure $\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}$.
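An associated issue is the fact that the ratio $\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}$ overestimates the run-time savings due to importance sampling, since it does not include the extra computing time required to compute the weighting function. Hence, the net run-time improvement is sometimes evaluated by other means.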
Multiple and adaptive importance sampling
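When different proposal distributions $g_n(x)$, $n = 1, \ldots, N$, are jointly used for drawing the samples $x_1, \ldots, x_N$, different proper weighting functions can be employed.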
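Two weighting functions that are common in the multiple importance sampling literature are assumed here for illustration: the standard weight $f(x_n)/g_n(x_n)$, which uses only the proposal that generated $x_n$, and the deterministic-mixture weight $f(x_n)\big/\tfrac{1}{N}\sum_{j=1}^{N} g_j(x_n)$, which evaluates every proposal at every sample. The sketch estimates $E[X^2] = 1$ for a standard normal target $f$ with two hypothetical proposals $N(-1, 1)$ and $N(1, 1)$, drawing one sample from each proposal per round; both weightings are unbiased in this setting. The function names are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma**2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mis_estimates(rounds: int, seed: int = 0) -> tuple[float, float]:
    """Estimate E[X**2] = 1 for X ~ N(0, 1) with proposals g_1 = N(-1, 1), g_2 = N(1, 1).

    Each round draws one sample from each proposal. Returns the estimates using
    standard weights f(x_n) / g_n(x_n) and deterministic-mixture weights
    f(x_n) / ((1/N) * sum_j g_j(x_n)).
    """
    rng = random.Random(seed)
    means = (-1.0, 1.0)  # proposal means (assumed for illustration)
    standard, mixture = [], []
    for _ in range(rounds):
        for mu in means:
            x = rng.gauss(mu, 1.0)  # x_n drawn from its own proposal g_n
            f = normal_pdf(x, 0.0)
            h = x * x
            standard.append(h * f / normal_pdf(x, mu))
            mix = sum(normal_pdf(x, m) for m in means) / len(means)
            mixture.append(h * f / mix)
    return statistics.fmean(standard), statistics.fmean(mixture)


print(mis_estimates(50_000))
```

In this example the deterministic-mixture weight $f(x)$ divided by the mixture density is at most $e^{1/2}$, so its weights stay bounded, which typically gives a lower variance than the standard weights at the cost of evaluating all $N$ proposals at every sample.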