Truncated normal distribution - Alchetron, the free social encyclopedia

Support: $x \in [a, b]$

Notation: $\xi = \frac{x-\mu}{\sigma}$, $\alpha = \frac{a-\mu}{\sigma}$, $\beta = \frac{b-\mu}{\sigma}$, $Z = \Phi(\beta) - \Phi(\alpha)$

Parameters: $\mu \in \mathbb{R}$ (location); $\sigma^2 > 0$ (squared scale); $a \in \mathbb{R}$ (minimum value); $b \in \mathbb{R}$ (maximum value)

PDF: $f(x; \mu, \sigma, a, b) = \frac{\phi(\xi)}{\sigma Z}$

CDF: $F(x; \mu, \sigma, a, b) = \frac{\Phi(\xi) - \Phi(\alpha)}{Z}$

Mean: $\mu + \frac{\phi(\alpha) - \phi(\beta)}{Z}\,\sigma$

In probability and statistics, the truncated normal distribution is the probability distribution of a normally distributed random variable whose value is either bounded below or above (or both). The truncated normal distribution has wide applications in statistics and econometrics. For example, it is used to model the probabilities of the binary outcomes in the probit model and to model censored data in the Tobit model.

Definition

Suppose $X \sim N(\mu, \sigma^2)$ is normally distributed, and let $-\infty \le a < b \le \infty$. Then $X$ conditional on $a < X < b$ has a truncated normal distribution.

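Its probability density function, $f$, for $a \le x \le b$, is given by

$$f(x; \mu, \sigma, a, b) = \frac{\phi\!\left(\frac{x-\mu}{\sigma}\right)}{\sigma\left(\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)\right)}$$

and by $f = 0$ otherwise.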

Here,

$$\phi(\xi) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}\xi^2\right)$$

is the probability density function of the standard normal distribution and $\Phi(\cdot)$ is its cumulative distribution function

$$\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}\!\left(x/\sqrt{2}\right)\right).$$

By convention, if $b = \infty$, then $\Phi\!\left(\frac{b-\mu}{\sigma}\right) = 1$, and similarly, if $a = -\infty$, then $\Phi\!\left(\frac{a-\mu}{\sigma}\right) = 0$.
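The density and distribution function defined above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `truncnorm_pdf` and `truncnorm_cdf` are made up for this example and do not refer to any existing package.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal CDF via erf; math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def truncnorm_pdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """Density of N(mu, sigma^2) conditioned on a <= X <= b (hypothetical helper)."""
    if not a <= x <= b:
        return 0.0
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = Phi(beta) - Phi(alpha)  # normalising constant Z
    return phi((x - mu) / sigma) / (sigma * z)


def truncnorm_cdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """CDF of N(mu, sigma^2) conditioned on a <= X <= b (hypothetical helper)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = Phi(beta) - Phi(alpha)
    return (Phi((x - mu) / sigma) - Phi(alpha)) / z
```

Leaving `a` or `b` at its default gives a one-sided truncation, which matches the convention for infinite bounds stated above. When both bounds lie far in the same tail, `z` can underflow to zero in floating point, so this sketch is not suitable for extreme truncation.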

Moments

Let $\alpha = (a - \mu)/\sigma$ and $\beta = (b - \mu)/\sigma$.

Two sided truncation:

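$$\begin{aligned}
\operatorname{E}(X \mid a < X < b) &= \mu + \sigma\,\frac{\phi\!\left(\frac{a-\mu}{\sigma}\right) - \phi\!\left(\frac{b-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)} \\
&= \mu + \sigma\,\frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)}
\end{aligned}$$

$$\begin{aligned}
\operatorname{Var}(X \mid a < X < b) &= \sigma^2\left[1 + \frac{\frac{a-\mu}{\sigma}\,\phi\!\left(\frac{a-\mu}{\sigma}\right) - \frac{b-\mu}{\sigma}\,\phi\!\left(\frac{b-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)} - \left(\frac{\phi\!\left(\frac{a-\mu}{\sigma}\right) - \phi\!\left(\frac{b-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)}\right)^2\right] \\
&= \sigma^2\left[1 + \frac{\alpha\,\phi(\alpha) - \beta\,\phi(\beta)}{\Phi(\beta) - \Phi(\alpha)} - \left(\frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)}\right)^2\right]
\end{aligned}$$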
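As an illustration, the two-sided mean and variance formulas translate directly into code. The sketch below uses only the Python standard library; `truncnorm_mean_var` is a hypothetical name chosen for this example. It treats the product of an infinite standardised bound with its density as 0, which is the limiting value.

```python
import math


def phi(t):
    """Standard normal density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def truncnorm_mean_var(mu, sigma, a=-math.inf, b=math.inf):
    """Mean and variance of N(mu, sigma^2) conditioned on a < X < b."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = Phi(beta) - Phi(alpha)

    def t_phi(t):
        # t * phi(t) tends to 0 as t -> +/- infinity; avoid inf * 0.0 = nan.
        return 0.0 if math.isinf(t) else t * phi(t)

    ratio = (phi(alpha) - phi(beta)) / z
    mean = mu + sigma * ratio
    var = sigma ** 2 * (1.0 + (t_phi(alpha) - t_phi(beta)) / z - ratio ** 2)
    return mean, var
```

For example, `truncnorm_mean_var(0.0, 1.0, -1.0, 1.0)` describes a standard normal truncated symmetrically to $[-1, 1]$, for which the mean stays at zero and the variance falls below 1.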

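One sided truncation (upper tail retained, $X > a$)

In this case $b = \infty$, so $Z = 1 - \Phi(\alpha)$, $\phi(\beta) = 0$ and $\Phi(\beta) = 1$:

$$\operatorname{E}(X \mid X > a) = \mu + \sigma\,\frac{\phi(\alpha)}{Z}$$

$$\operatorname{Var}(X \mid X > a) = \sigma^2\left[1 + \frac{\alpha\,\phi(\alpha)}{Z} - \left(\frac{\phi(\alpha)}{Z}\right)^2\right]$$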
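As a quick sanity check of the formulas for $X > a$, consider a standard normal ($\mu = 0$, $\sigma = 1$) truncated at $a = 0$, which is the half-normal distribution. This worked example is an addition for illustration:

$$\alpha = 0, \qquad Z = 1 - \Phi(0) = \tfrac{1}{2}, \qquad \phi(0) = \frac{1}{\sqrt{2\pi}}$$

$$\operatorname{E}(X \mid X > 0) = \frac{\phi(0)}{Z} = \frac{2}{\sqrt{2\pi}} = \sqrt{\frac{2}{\pi}} \approx 0.798$$

$$\operatorname{Var}(X \mid X > 0) = 1 + \frac{0 \cdot \phi(0)}{Z} - \left(\frac{\phi(0)}{Z}\right)^2 = 1 - \frac{2}{\pi} \approx 0.363$$

Both values agree with the known mean and variance of the half-normal distribution.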

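One sided truncation (lower tail retained, $X < b$)

In this case $a = -\infty$, so $\phi(\alpha) = 0$, $\Phi(\alpha) = 0$ and $Z = \Phi(\beta)$:

$$\operatorname{E}(X \mid X < b) = \mu - \sigma\,\frac{\phi(\beta)}{\Phi(\beta)}$$

$$\operatorname{Var}(X \mid X < b) = \sigma^2\left[1 - \frac{\beta\,\phi(\beta)}{\Phi(\beta)} - \left(\frac{\phi(\beta)}{\Phi(\beta)}\right)^2\right]$$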

Barr and Sherrill (1999) give a simpler expression for the variance of one sided truncations. Their formula is in terms of the chi-square CDF, which is implemented in standard software libraries. Bebu and Mathew (2009) provide formulas for (generalized) confidence intervals around the truncated moments.

Differential equation

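$$\sigma^2 f'(x) + f(x)\,(x - \mu) = 0, \qquad f(0) = \frac{\sqrt{\frac{2}{\pi}}\; e^{-\frac{\mu^2}{2\sigma^2}}}{\sigma\left(\operatorname{erf}\!\left(\frac{\mu - a}{\sqrt{2}\,\sigma}\right) - \operatorname{erf}\!\left(\frac{\mu - b}{\sqrt{2}\,\sigma}\right)\right)}$$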
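The equation can be checked directly against the density from the Definition section. The short derivation below is an added illustration. Inside the support the normalising constant $Z = \Phi(\beta) - \Phi(\alpha)$ does not depend on $x$, so

$$f(x) = \frac{1}{\sigma Z \sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad\Longrightarrow\quad f'(x) = -\frac{x - \mu}{\sigma^2}\, f(x),$$

which rearranges to $\sigma^2 f'(x) + f(x)(x - \mu) = 0$. For the initial condition, the identity $\Phi(t) = \tfrac{1}{2}\left(1 + \operatorname{erf}(t/\sqrt{2})\right)$ and the oddness of $\operatorname{erf}$ give

$$Z = \tfrac{1}{2}\left(\operatorname{erf}\!\left(\frac{\mu - a}{\sqrt{2}\,\sigma}\right) - \operatorname{erf}\!\left(\frac{\mu - b}{\sqrt{2}\,\sigma}\right)\right),$$

and substituting this into $f(0) = \phi(-\mu/\sigma)/(\sigma Z)$ reproduces the stated value. Note that the condition at $x = 0$ is only meaningful when $0$ lies in the support, that is, when $a \le 0 \le b$.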

A recursive formula

As for the non-truncated case, there is a neat recursive formula for the truncated moments. See.

Simulating

A random variate $x$ defined as $x = \Phi^{-1}\!\left(\Phi(\alpha) + U \cdot (\Phi(\beta) - \Phi(\alpha))\right)\sigma + \mu$, with $\Phi$ the standard normal cumulative distribution function, $\Phi^{-1}$ its inverse, and $U$ a uniform random number on $(0, 1)$, follows the distribution truncated to the range $(a, b)$. This is simply the inverse transform method for simulating random variables. Although it is one of the simplest methods, it can either fail when sampling in the tail of the normal distribution or be much too slow. Thus, in practice, one has to find alternative methods of simulation.
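A minimal sketch of the inverse transform method, using `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$, might look as follows. The function name `rtruncnorm_inverse` and the `rng` parameter are assumptions made for this illustration. The explicit error for a vanishing $\Phi(\beta) - \Phi(\alpha)$ shows the tail failure mentioned above.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal: provides cdf (Phi) and inv_cdf (Phi^-1)


def rtruncnorm_inverse(mu, sigma, a, b, rng=random):
    """One draw from N(mu, sigma^2) truncated to (a, b) by inverse transform."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    lo = _STD.cdf(alpha)
    hi = _STD.cdf(beta)
    if not hi > lo:
        # Deep in a tail both CDF values round to the same float (0.0 or 1.0),
        # so the method has no probability mass left to invert.
        raise ValueError("Phi(beta) - Phi(alpha) underflowed; use a tail sampler")
    while True:
        p = lo + rng.random() * (hi - lo)
        if 0.0 < p < 1.0:  # inv_cdf is only defined on the open interval
            break
    x = mu + sigma * _STD.inv_cdf(p)
    # Guard against rounding pushing the draw marginally outside the bounds.
    return min(max(x, a), b)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, for example `rtruncnorm_inverse(0.0, 1.0, -1.0, 1.0, random.Random(0))`.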

One such truncated normal generator (implemented in Matlab and in R as trandn.R) is based on an acceptance-rejection idea due to Marsaglia. Although the acceptance rate of Marsaglia (1964) is slightly lower than that of Robert (1995), Marsaglia's method is typically faster, because it does not require the costly numerical evaluation of the exponential function.
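To illustrate the acceptance-rejection idea, the sketch below draws from the upper tail $X > a$ of a standard normal, for $a > 0$, using a shifted Rayleigh proposal, which is the construction usually attributed to Marsaglia (1964). It is an illustrative sketch under that assumption, not the trandn code, and the name `rnorm_tail` is hypothetical. Note that it needs a logarithm and a square root but no exponential.

```python
import math
import random


def rnorm_tail(a, rng=random):
    """One draw from a standard normal conditioned on X > a, for a > 0."""
    if a <= 0.0:
        raise ValueError("this tail sampler assumes a > 0")
    while True:
        u1 = 1.0 - rng.random()  # uniform on (0, 1], keeps log finite
        u2 = rng.random()
        # Proposal with density proportional to x * exp(-x^2 / 2) on x > a.
        x = math.sqrt(a * a - 2.0 * math.log(u1))
        # Target / proposal is proportional to 1 / x, so accept with prob a / x.
        if u2 * x <= a:
            return x
```

A draw from $N(\mu, \sigma^2)$ truncated to $X > a$ with $\alpha = (a - \mu)/\sigma > 0$ would then be `mu + sigma * rnorm_tail(alpha)`. The acceptance probability $a/x$ approaches 1 as $a$ grows, which is why this kind of sampler works in the far tail where the inverse transform method breaks down.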

For more on simulating draws from the truncated normal distribution, see Robert (1995), Lynch (2007, Section 8.1.3, pages 200–206), and Devroye (1986). The msm package in R has a function, rtnorm, that generates draws from a truncated normal. The truncnorm package in R also has functions to draw from a truncated normal.

Chopin (2011) proposed an algorithm (available on arXiv) inspired by the Ziggurat algorithm of Marsaglia and Tsang (1984, 2000), which is usually considered the fastest Gaussian sampler, and which is also very close to Ahrens's algorithm (1995). Implementations can be found in C, C++, Matlab and Python.

Sampling from the multivariate truncated normal distribution is considerably more difficult. Exact or perfect simulation is only feasible in the case of truncation of the normal distribution to a polytope region. In more general cases, Damien and Walker (2001) introduce a general methodology for sampling truncated densities within a Gibbs sampling framework. Their algorithm introduces one latent variable and, within a Gibbs sampling framework, it is more computationally efficient than the algorithm of Robert (1995).

References

Truncated normal distribution Wikipedia

(Text) CC BY-SA
