
Exponential tilting


In Monte Carlo estimation, exponential tilting (ET), exponential twisting, or exponential change of measure (ECM) is a distribution-shifting technique used in rare-event simulation, and in rejection sampling and importance sampling in particular. Exponential tilting is also used in Esscher tilting, an indirect Edgeworth approximation technique. The earliest formalization of ECM is often attributed to Esscher, with its use in importance sampling attributed to David Siegmund. In mathematical finance, ET is known as the Esscher transform and is used in contexts such as insurance futures pricing.

Overview

Given a random variable $X$ with probability distribution $\mathbb{P}$, density $f$, and moment generating function (m.g.f.) $M_X(\theta) = \mathbb{E}[e^{\theta X}] < \infty$, the exponentially tilted measure $\mathbb{P}_\theta$ is defined as follows:

$\mathbb{P}_\theta(X \in dx) = \frac{\mathbb{E}[e^{\theta X} I[X \in dx]]}{M_X(\theta)} = e^{\theta x - \kappa(\theta)}\,\mathbb{P}(X \in dx)$, where $\kappa(\theta)$ is the cumulant generating function (c.g.f.) $\log \mathbb{E}[e^{\theta X}]$ ($\kappa(\cdot)$ is often denoted $\psi(\cdot)$).

Thus, $\mathbb{P}_\theta$ has density $f_\theta(x) = \frac{e^{\theta x} f(x)}{M_X(\theta)} = \exp\{\theta x - \kappa(\theta)\}\, f(x)$. The set of distributions parameterized by $\theta$ is known as the natural exponential family generated by $X$. The exponential tilting of a random vector $X$ has an analogous definition: $\mathbb{P}_\theta(X \in dx) = e^{\theta^T x - \kappa(\theta)}\,\mathbb{P}(X \in dx)$, where $\kappa(\theta) = \log \mathbb{E}[\exp\{\theta^T X\}]$. This exponentially tilted measure is a probability distribution that in many cases has the same parametric form as that of $X$, so that variates can easily be generated. However, for some distributions the exponentially tilted distribution does not belong to the same parametric family as $f$; an example is the Pareto distribution with $f(x) = \alpha/(1+x)^{\alpha+1}$, $x > 0$, for which $f_\theta(x)$ is well defined for $\theta < 0$ but is not a standard distribution. In such examples, random-variate generation may not always be straightforward.

Example

One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution, and the Poisson distribution: (i) the $N(\mu, \sigma^2)$ distribution, where $f_\theta(x)$ is the $N(\mu + \theta\sigma^2, \sigma^2)$ density; (ii) the exponential distribution $f(x) = \lambda e^{-\lambda x}$, where $f_\theta(x) = \lambda_\theta e^{-\lambda_\theta x}$ with $\lambda_\theta = \lambda - \theta$, $-\infty < \theta < \lambda$; (iii) the binomial$(N, \alpha)$ distribution $p(k) = \binom{N}{k}\alpha^k(1-\alpha)^{N-k}$, $k = 0, \ldots, N$, where $p_\theta$ is binomial$(N, \alpha_\theta)$ with $\alpha_\theta = \alpha e^\theta/(1 - \alpha + \alpha e^\theta)$, $\theta \in \mathbb{R}$; (iv) the Poisson distribution $p(k) = e^{-\mu}\mu^k/k!$, where $p_\theta$ is Poisson with mean $\mu_\theta = \mu e^\theta$. A multivariate example is (v) the $N(\mu, C)$ distribution, where the exponentially tilted measure is $N(\mu + C\theta, C)$.
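Example (ii) can be checked by simulation: tilting $\mathrm{Exp}(\lambda)$ by $\theta < \lambda$ yields $\mathrm{Exp}(\lambda - \theta)$, so tilted samples should have mean $1/(\lambda - \theta)$. A minimal Python sketch (the rate and tilt values below are illustrative):

```python
import random

def tilt_exponential(lam, theta, n, seed=0):
    """Draw n variates from the theta-tilted Exp(lam) distribution.

    For f(x) = lam * exp(-lam * x), the tilted density
    f_theta(x) = exp(theta * x - kappa(theta)) * f(x)
    is again exponential, with rate lam - theta (requires theta < lam).
    """
    assert theta < lam, "tilting parameter must satisfy theta < lam"
    rng = random.Random(seed)
    return [rng.expovariate(lam - theta) for _ in range(n)]

samples = tilt_exponential(lam=1.0, theta=0.5, n=100_000)
mean = sum(samples) / len(samples)
# the empirical tilted mean should be close to 1 / (lam - theta) = 2
```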

Rare-event simulation

The exponential tilting of $X$, assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling or importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e., $X \mid X \in A$. With an appropriate choice of $\theta$, sampling from $\mathbb{P}_\theta$ can meaningfully reduce the required amount of sampling or the variance of an estimator.

Saddlepoint approximation

The saddlepoint approximation method is a density approximation methodology, often used for the distribution of sums and averages of i.i.d. random variables, that employs an Edgeworth series expansion but generally performs better at extreme values. From the definition of the natural exponential family, it follows that $f_\theta(\bar{x}) = f(\bar{x})\exp\{n(\theta\bar{x} - \kappa(\theta))\}$. Applying the Edgeworth expansion, $f_\theta(\bar{x}) = \psi(z)\,(\operatorname{Var}(\bar{X}))^{-1/2}\left\{1 + \frac{\rho_3(\theta)h_3(z)}{6} + \frac{\rho_4(\theta)h_4(z)}{24} + \cdots\right\}$, where $\psi(z)$ is the standard normal density at $z = \frac{\bar{x} - \kappa'(\theta)}{\sqrt{\kappa''(\theta)/n}}$, $\rho_n(\theta) = \frac{\kappa^{(n)}(\theta)}{\{\kappa''(\theta)\}^{n/2}}$, and $h_n$ are the Hermite polynomials defined by $h_n(z) = \frac{(-1)^n}{\psi(z)}\frac{d^n\psi(z)}{dz^n}$. When considering values of $\bar{x}$ progressively farther from the center of the distribution, $|z| \to \infty$ and the $h_n(z)$ terms become unbounded. However, for each value of $\bar{x}$ we can choose $\theta$ such that $\kappa'(\theta) = \bar{x}$, i.e., the saddlepoint, so that the above expansion is always evaluated at the expectation of the tilted distribution. This choice of $\theta$ leads to the final representation of the approximation: $f(\bar{x}) \approx \left(\frac{n}{2\pi\kappa''(\theta)}\right)^{1/2}\exp\{n(\kappa(\theta) - \theta\bar{x})\}$.
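As a concrete check of the final formula, take $n$ i.i.d. $\mathrm{Exp}(1)$ variables (an illustrative choice), for which $\kappa(\theta) = -\log(1-\theta)$ and the mean has an exact Gamma density. A sketch comparing the saddlepoint approximation with the exact density:

```python
import math

def saddlepoint_density_exp_mean(xbar, n):
    """Saddlepoint approximation to the density of the mean of n i.i.d.
    Exp(1) variables, where kappa(theta) = -log(1 - theta).

    The saddlepoint solves kappa'(theta) = 1 / (1 - theta) = xbar,
    giving theta_hat = 1 - 1/xbar and kappa''(theta_hat) = xbar**2.
    """
    theta_hat = 1.0 - 1.0 / xbar
    kappa = -math.log(1.0 - theta_hat)  # equals log(xbar)
    kappa2 = xbar ** 2
    return math.sqrt(n / (2 * math.pi * kappa2)) * math.exp(
        n * (kappa - theta_hat * xbar))

def exact_density_exp_mean(xbar, n):
    """Exact density: the sum of n Exp(1) variables is Gamma(n, 1),
    so their mean has density n * f_Gamma(n,1)(n * xbar)."""
    return n ** n * xbar ** (n - 1) * math.exp(-n * xbar) / math.factorial(n - 1)

ratio = saddlepoint_density_exp_mean(1.5, 10) / exact_density_exp_mean(1.5, 10)
# for n = 10 the (unnormalized) approximation is within about 1% of exact
```

For the Gamma family the approximation is exact up to the Stirling-series factor, which is why the ratio is nearly constant in $\bar{x}$.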

Rejection sampling

Using the tilted distribution $\mathbb{P}_\theta$ as the proposal, the rejection sampling algorithm prescribes sampling from $f_\theta(x)$ and accepting with probability $\frac{\exp\{-\theta x + \kappa(\theta)\}}{c}$, i.e., accepting if $U \le \frac{\exp\{-\theta x + \kappa(\theta)\}}{c}$, where $U \sim \operatorname{Unif}(0,1)$ and $c = \sup_{x \in X}\frac{d\mathbb{P}}{d\mathbb{P}_\theta}(x)$.
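A sketch of this scheme for the exponential distribution (an illustrative target): with target $\mathrm{Exp}(\lambda)$ and proposal the $\theta$-tilted $\mathrm{Exp}(\lambda - \theta)$ for $0 < \theta < \lambda$, the ratio $e^{-\theta x + \kappa(\theta)}$ is maximized at $x = 0$, so $c = e^{\kappa(\theta)}$ and the acceptance probability simplifies to $e^{-\theta x}$:

```python
import math
import random

def rejection_sample_via_tilt(lam, theta, n, seed=1):
    """Sample from Exp(lam) using the theta-tilted Exp(lam - theta)
    proposal, 0 < theta < lam.

    dP/dP_theta (x) = exp(-theta*x + kappa(theta)) with
    kappa(theta) = -log(1 - theta/lam); its sup over x >= 0 is
    c = exp(kappa(theta)), so we accept with probability exp(-theta*x).
    """
    assert 0.0 < theta < lam
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.expovariate(lam - theta)       # draw from the tilted proposal
        if rng.random() <= math.exp(-theta * x):
            out.append(x)
    return out

samples = rejection_sample_via_tilt(lam=1.0, theta=0.5, n=100_000)
mean = sum(samples) / len(samples)
# accepted samples follow Exp(lam), so the mean should be near 1/lam = 1
```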

Importance sampling

Applying the exponentially tilted distribution as the importance distribution yields the equation $\mathbb{E}[h(X)] = \mathbb{E}_\theta[L(X)h(X)]$, where $L(X) = \frac{d\mathbb{P}}{d\mathbb{P}_\theta}(X)$ is the likelihood ratio. So, one samples from $f_\theta$ and multiplies $h(X)$ by the likelihood ratio, obtaining an unbiased estimate of the expectation under the original measure $\mathbb{P}$. The second moment of the estimator is $\mathbb{E}_\theta[(L(X)h(X))^2]$, which determines its variance.

Example

Assume i.i.d. $X_i$ such that $\kappa(\theta) < \infty$. In order to estimate $\mathbb{P}(X_1 + \cdots + X_n > c)$, we can employ importance sampling by taking $h(X) = I(\sum_{i=1}^n X_i > c)$. The constant $c$ can be rewritten as $na$ for some other constant $a$. Then, $\mathbb{P}\left(\sum_{i=1}^n X_i > na\right) = \mathbb{E}_{\theta_a}\left[\exp\left\{-\theta_a\sum_{i=1}^n X_i + n\kappa(\theta_a)\right\} I\left(\sum_{i=1}^n X_i > na\right)\right]$, where $\theta_a$ denotes the $\theta$ defined by the saddlepoint equation $\kappa'(\theta_a) = \mathbb{E}_{\theta_a}[X] = a$.
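A sketch of this estimator for i.i.d. standard normal $X_i$ (an assumed choice for illustration): here $\kappa(\theta) = \theta^2/2$, so the saddlepoint equation $\kappa'(\theta_a) = a$ gives $\theta_a = a$, the tilted distribution is $N(a, 1)$, and the exact tail probability $\bar{\Phi}(a\sqrt{n})$ is available for comparison:

```python
import math
import random

def is_tail_prob_normal(n, a, reps=100_000, seed=2):
    """Importance-sampling estimate of P(X_1 + ... + X_n > n*a)
    for i.i.d. standard normal X_i.

    kappa(theta) = theta**2 / 2, so theta_a = a and the tilted
    distribution of each X_i is N(a, 1).
    """
    rng = random.Random(seed)
    theta = a
    kappa = theta ** 2 / 2.0
    total = 0.0
    for _ in range(reps):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))  # sample under P_theta
        if s > n * a:
            total += math.exp(-theta * s + n * kappa)     # likelihood ratio
    return total / reps

est = is_tail_prob_normal(n=10, a=1.0)
# exact: P(N(0, n) > n*a) = 0.5 * erfc(a * sqrt(n) / sqrt(2))
exact = 0.5 * math.erfc(1.0 * math.sqrt(10) / math.sqrt(2))
```

The tilt centers the sampling distribution of the sum on the rare-event boundary $na$, so roughly half the draws hit the event instead of a fraction of order $10^{-4}$.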

Stochastic processes

Given the tilting of a normal r.v., it is intuitive that the exponential tilting of $X_t$, a Brownian motion with drift $\mu$ and variance $\sigma^2$, is a Brownian motion with drift $\mu + \theta\sigma^2$ and variance $\sigma^2$. Thus, any Brownian motion with drift under $\mathbb{P}$ can be thought of as a Brownian motion without drift under $\mathbb{P}_\theta$. To observe this, consider the process $X_t = B_t + \mu t$. Then $f(X_t) = f_\theta(X_t)\frac{d\mathbb{P}}{d\mathbb{P}_\theta} = f(B_t)\exp\{\mu B_T - \tfrac{1}{2}\mu^2 T\}$. The likelihood ratio term, $\exp\{\mu B_T - \tfrac{1}{2}\mu^2 T\}$, commonly denoted $M_T$, is a martingale under the driftless measure, as are many other exponential functionals adapted to the Brownian filtration.
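The martingale property $\mathbb{E}[M_T] = 1$ can be checked by simulation, sampling $B_T \sim N(0, T)$ under the driftless measure (the drift and horizon below are illustrative):

```python
import math
import random

def mean_likelihood_ratio(mu, T, reps=200_000, seed=3):
    """Monte Carlo check that M_T = exp(mu*B_T - mu**2*T/2) has
    expectation 1 when B_T is sampled from the driftless Brownian
    motion, i.e. the likelihood ratio integrates to one.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        b_T = rng.gauss(0.0, math.sqrt(T))   # B_T ~ N(0, T) under P_theta
        total += math.exp(mu * b_T - 0.5 * mu ** 2 * T)
    return total / reps

m = mean_likelihood_ratio(mu=1.0, T=1.0)
# m should be close to 1
```

This is just the lognormal identity $\mathbb{E}[e^{\mu B_T}] = e^{\mu^2 T/2}$ restated as a change of measure.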

SDEs

The above leads to the alternate representation of the stochastic differential equation $dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t)$: $dX_\theta(t) = \mu_\theta(t)\,dt + \sigma(t)\,dB(t)$, where $\mu_\theta(t) = \mu(t) + \theta\sigma^2(t)$. Girsanov's formula gives the likelihood ratio $\frac{d\mathbb{P}}{d\mathbb{P}_\theta} = \exp\left\{-\int_0^T \frac{\mu_\theta(t) - \mu(t)}{\sigma(t)}\,dB(t) + \frac{1}{2}\int_0^T \left(\frac{\mu_\theta(t) - \mu(t)}{\sigma(t)}\right)^2 dt\right\}$. Therefore, Girsanov's formula can be used to implement importance sampling for certain SDEs.

Tilting can also be useful for simulating a process $X(t)$ via rejection sampling of the SDE $dX(t) = \mu(X(t))\,dt + dB(t)$. We may focus on the SDE since we know that $X(t)$ can be written $X(0) + \int_0^t dX(s)$. As previously stated, a Brownian motion with drift can be tilted to a Brownian motion without drift, so we choose $\mathbb{P}_{proposal} = \mathbb{P}_\theta$. The likelihood ratio is $\frac{d\mathbb{P}}{d\mathbb{P}_\theta}(dX(s) : 0 \le s \le t) = \exp\left\{\int_0^t \mu(X(s))\,dX(s) - \int_0^t \frac{\mu(X(s))^2}{2}\,ds\right\}$, denoted $M(t)$. To ensure this is a true likelihood ratio, it must be shown that $\mathbb{E}[M(t)] = 1$. Assuming this condition holds, it can be shown that $f_{X(t)}(y) = f^\theta_{X(t)}(y)\,\mathbb{E}_\theta[M(t) \mid X(t) = y]$. So, rejection sampling prescribes that one samples from a standard Brownian motion and accepts with probability $\frac{f_{X(t)}(y)}{f^\theta_{X(t)}(y)}\frac{1}{c} = \frac{1}{c}\,\mathbb{E}_\theta[M(t) \mid X(t) = y]$.

Siegmund's algorithm

Assume i.i.d. $X_i$ with a light-tailed distribution and $\mathbb{E}[X] < 0$. In order to estimate $\psi(x) = \mathbb{P}(\tau(x) < \infty)$, where $\tau(x) = \inf\{t : \sum_{i=1}^t X_i > x\}$, when $x$ is large and hence $\psi(x)$ small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm is used in many settings, such as sequential tests and G/G/1 queue waiting times, and with $\psi$ interpreted as the probability of ultimate ruin in ruin theory. In this context, it is logical to ensure that $\mathbb{P}_\theta(\tau(x) < \infty) = 1$. The criterion $\theta > \theta_0$, where $\theta_0$ satisfies $\kappa'(\theta_0) = 0$, achieves this. Siegmund's algorithm uses $\theta = \theta^*$, if it exists, where $\theta^*$ is defined by $\kappa(\theta^*) = 0$. It has been shown that $\theta^*$ is the only tilting parameter producing bounded relative error, i.e. $\limsup_{x \to \infty} \frac{\operatorname{Var}(I_{A(x)})}{\mathbb{P}(A(x))^2} < \infty$.
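A sketch of Siegmund's algorithm for increments $X_i \sim N(-1, 1)$ (an illustrative light-tailed choice with negative drift): here $\kappa(\theta) = -\theta + \theta^2/2$, so $\kappa(\theta^*) = 0$ gives $\theta^* = 2$ and the tilted increments are $N(1, 1)$; since $\kappa(\theta^*) = 0$, the likelihood ratio at $\tau$ reduces to $e^{-\theta^* S_\tau}$:

```python
import math
import random

def siegmund_ruin_prob(x, reps=20_000, seed=4):
    """Estimate psi(x) = P(tau(x) < inf) for a random walk with
    i.i.d. N(-1, 1) increments via Siegmund's algorithm.

    kappa(theta) = -theta + theta**2/2, so kappa(theta*) = 0 gives
    theta* = 2 and the tilted increments are N(-1 + theta*, 1) = N(1, 1).
    Under the tilt the walk drifts upward, so tau(x) is finite a.s.
    """
    rng = random.Random(seed)
    theta_star = 2.0
    total = 0.0
    for _ in range(reps):
        s = 0.0
        while s <= x:
            s += rng.gauss(1.0, 1.0)          # simulate under P_{theta*}
        total += math.exp(-theta_star * s)    # likelihood ratio at tau
    return total / reps

est = siegmund_ruin_prob(x=5.0)
# Cramer-Lundberg: psi(x) = C * exp(-theta* * x) for some 0 < C < 1
```

Each replication contributes at most $e^{-\theta^* x}$, which is the mechanism behind the bounded relative error of this tilt.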

Black-Box algorithms

We can only observe the input and output of a black box, without knowing its internal structure; the algorithm may use only minimal information on that structure. When we generate random numbers, the output may not lie within a common parametric class, such as the normal or exponential distributions, so an automated way of performing ECM is desirable. Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with distribution $G$; for simplicity we assume $X \ge 0$. Define $\mathcal{F}_n = \sigma(X_1, \ldots, X_n, U_1, \ldots, U_n)$, where $U_1, U_2, \ldots$ are independent $(0,1)$ uniforms. A randomized stopping time for $X_1, X_2, \ldots$ is then a stopping time with respect to the filtration $\{\mathcal{F}_n\}$. Let further $\mathcal{G}$ be a class of distributions $G$ on $[0, \infty)$ with $k_G = \int_0^\infty e^{\theta x}\,G(dx) < \infty$ and define $G_\theta$ by $\frac{dG_\theta}{dG}(x) = \frac{e^{\theta x}}{k_G}$. We define a black-box algorithm for ECM for the given $\theta$ and the given class $\mathcal{G}$ of distributions as a pair of a randomized stopping time $\tau$ and an $\mathcal{F}_\tau$-measurable r.v. $Z$ such that $Z$ is distributed according to $G_\theta$ for any $G \in \mathcal{G}$. Formally, we write this as $\mathbb{P}_G(Z < x) = G_\theta(x)$ for all $x$. In other words, the rules of the game are that the algorithm may use simulated values from $G$ and additional uniforms to produce an r.v. from $G_\theta$.

Advantages

In many cases the tilted distribution belongs to the same parametric family as the original (when the original density is part of an exponential family), which simplifies random-variate generation. Exponential tilting may still be useful when this is not the case, though normalization must be possible and additional sampling algorithms may be needed.

In addition, there exists a simple relationship between the original and tilted c.g.f.s: $\kappa_\theta(\eta) = \log(\mathbb{E}_\theta[e^{\eta X}]) = \kappa(\theta + \eta) - \kappa(\theta)$. We can see this by observing that $F_\theta(x) = \int_{-\infty}^x \exp\{\theta y - \kappa(\theta)\}\,f(y)\,dy$. Thus, $\kappa_\theta(\eta) = \log\int e^{\eta x}\,dF_\theta(x) = \log\int e^{\eta x} e^{\theta x - \kappa(\theta)}\,dF(x) = \log\mathbb{E}[e^{(\eta + \theta)X}] - \kappa(\theta) = \kappa(\eta + \theta) - \kappa(\theta)$. Clearly, this relationship allows for easy calculation of the c.g.f. of the tilted distribution, and thus of the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically, $L = \frac{d\mathbb{P}}{d\mathbb{P}_\theta} = \frac{f(x)}{f_\theta(x)} = e^{-\theta x + \kappa(\theta)}$.
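The identity $\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta)$ can be verified directly for the exponential distribution, whose tilted law is again exponential (the particular $\theta$ and $\eta$ below are illustrative):

```python
import math

def kappa_exp(theta, lam=1.0):
    """c.g.f. of Exp(lam): kappa(theta) = -log(1 - theta/lam), theta < lam."""
    return -math.log(1.0 - theta / lam)

# The theta-tilted Exp(1) is Exp(1 - theta), so its c.g.f. is
# kappa_theta(eta) = -log(1 - eta / (1 - theta)); by the identity it
# should equal kappa(theta + eta) - kappa(theta).
theta, eta = 0.3, 0.4
lhs = -math.log(1.0 - eta / (1.0 - theta))
rhs = kappa_exp(theta + eta) - kappa_exp(theta)
# lhs and rhs agree up to floating-point rounding
```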
