Monte Carlo distribution shifting technique
Exponential Tilting (ET), Exponential Twisting, or Exponential Change of Measure (ECM) is a distribution shifting technique used in many parts of mathematics. The different exponential tiltings of a random variable $X$ are known as the natural exponential family of $X$.
Exponential Tilting is used in Monte Carlo estimation for rare-event simulation, and for rejection and importance sampling in particular. In mathematical finance, Exponential Tilting is also known as Esscher tilting (or the Esscher transform); it is often combined with indirect Edgeworth approximation and is used in such contexts as insurance futures pricing.
The earliest formalization of Exponential Tilting is often attributed to Esscher, with its use in importance sampling being attributed to David Siegmund.
Overview
Given a random variable $X$ with probability distribution $\mathbb{P}$, density $f$, and moment generating function (MGF) $M_X(\theta) = \mathbb{E}[e^{\theta X}] < \infty$, the exponentially tilted measure $\mathbb{P}_\theta$ is defined as follows:
$$\mathbb{P}_\theta(X \in dx) = \frac{\mathbb{E}[e^{\theta X}\mathbb{I}[X \in dx]]}{M_X(\theta)} = e^{\theta x - \kappa(\theta)}\,\mathbb{P}(X \in dx),$$
where $\kappa(\theta)$ is the cumulant generating function (CGF) defined as
$$\kappa(\theta) = \log \mathbb{E}[e^{\theta X}] = \log M_X(\theta).$$
We call $f_\theta(x) = e^{\theta x - \kappa(\theta)} f(x)$ the $\theta$-tilted density of $X$. It satisfies $f_\theta(x) \propto e^{\theta x} f(x)$.
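The exponential tilting of a random vector $X$ has an analogous definition:
$$\mathbb{P}_\theta(X \in dx) = e^{\theta^\top x - \kappa(\theta)}\,\mathbb{P}(X \in dx),$$
where $\kappa(\theta) = \log \mathbb{E}[\exp\{\theta^\top X\}]$.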
Example
The exponentially tilted measure in many cases has the same parametric form as that of $X$. One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution and the Poisson distribution.
For example, in the case of the normal distribution $N(\mu, \sigma^2)$, the tilted density $f_\theta(x)$ is the $N(\mu + \theta\sigma^2, \sigma^2)$ density. The table below provides more examples of tilted densities.
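The normal case can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `normal_pdf` and `tilted_density` and the parameter values are illustrative choices, not part of any library. It tilts an $N(\mu, \sigma^2)$ density on a grid and compares the result with the $N(\mu + \theta\sigma^2, \sigma^2)$ density.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def tilted_density(x, pdf, theta):
    """Tilt a density given on a uniform grid x: f_theta = e^{theta x} f / M(theta)."""
    dx = x[1] - x[0]
    weights = np.exp(theta * x) * pdf
    mgf = np.sum(weights) * dx          # Riemann-sum estimate of M_X(theta)
    return weights / mgf

# Illustrative parameter values (assumptions, not from the article).
mu, sigma2, theta = 1.0, 4.0, 0.5
x = np.linspace(-40.0, 40.0, 200_001)

f_theta = tilted_density(x, normal_pdf(x, mu, sigma2), theta)
expected = normal_pdf(x, mu + theta * sigma2, sigma2)
max_abs_error = np.max(np.abs(f_theta - expected))
print(max_abs_error)
```

The grid is wide enough that the truncated tails are negligible for these parameter values; a larger $\theta$ or $\sigma^2$ would need a wider grid.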
For some distributions, however, the exponentially tilted distribution does not belong to the same parametric family as $f$. An example of this is the Pareto distribution with $f(x) = \alpha/(1+x)^{\alpha+1}$, $x > 0$, where $f_\theta(x)$ is well defined for $\theta < 0$ but is not a standard distribution. In such cases, random variable generation may not be straightforward.
In statistical mechanics, the energy of a system in equilibrium with a heat bath has the Boltzmann distribution: $\mathbb{P}(E \in dE) \propto e^{-\beta E}\,dE$, where $\beta$ is the inverse temperature. Exponential tilting then corresponds to changing the temperature: $\mathbb{P}_\theta(E \in dE) \propto e^{-(\beta - \theta)E}\,dE$.
Similarly, the energy and particle number of a system in equilibrium with a heat and particle bath have the grand canonical distribution: $\mathbb{P}((N, E) \in (dN, dE)) \propto e^{\beta\mu N - \beta E}\,dN\,dE$, where $\mu$ is the chemical potential. Exponential tilting then corresponds to changing both the temperature and the chemical potential.
Advantages
In many cases, the tilted distribution belongs to the same parametric family as the original. This is particularly true when the original density belongs to the exponential family of distributions. This simplifies random variable generation during Monte Carlo simulations. Exponential tilting may still be useful if this is not the case, though normalization must be possible and additional sampling algorithms may be needed.
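In addition, there exists a simple relationship between the original and tilted CGF,
$$\kappa_\theta(\eta) = \log(\mathbb{E}_\theta[e^{\eta X}]) = \kappa(\theta + \eta) - \kappa(\theta).$$
We can see this by observing that
$$F_\theta(x) = \int_{-\infty}^{x} \exp\{\theta y - \kappa(\theta)\} f(y)\,dy.$$
Thus,
$$\kappa_\theta(\eta) = \log \int e^{\eta x}\,dF_\theta(x) = \log \int e^{(\theta + \eta)x - \kappa(\theta)} f(x)\,dx = \log\left(\mathbb{E}[e^{(\theta + \eta)X}]\,e^{-\kappa(\theta)}\right) = \kappa(\theta + \eta) - \kappa(\theta).$$
This relationship allows for easy calculation of the CGF of the tilted distribution and thus the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically,
$$\ell = \frac{d\mathbb{P}}{d\mathbb{P}_\theta} = \frac{f(x)}{f_\theta(x)} = e^{-\theta x + \kappa(\theta)}.$$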
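As a worked illustration of the CGF relationship, consider a Poisson random variable with rate $\lambda$ (the symbol $\lambda$ is introduced here for the example). Its CGF is $\kappa(\theta) = \lambda(e^{\theta} - 1)$, so

$$\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta) = \lambda(e^{\theta + \eta} - 1) - \lambda(e^{\theta} - 1) = \lambda e^{\theta}(e^{\eta} - 1),$$

which is the CGF of a Poisson distribution with rate $\lambda e^{\theta}$. The tilted distribution therefore stays in the Poisson family, and the likelihood ratio is $\ell(x) = \exp\{-\theta x + \lambda(e^{\theta} - 1)\}$.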
Properties
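- If $\kappa(\eta) = \log \mathbb{E}[\exp(\eta X)]$ is the CGF of $X$, then the CGF of the $\theta$-tilted $X$ is $\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta)$.
- This means that the $i$-th cumulant of the tilted $X$ is $\kappa^{(i)}(\theta)$. In particular, the expectation of the tilted distribution is $\mathbb{E}_\theta[X] = \frac{d}{d\eta}\kappa_\theta(\eta)\big|_{\eta = 0} = \kappa'(\theta)$.
- The variance of the tilted distribution is $\mathrm{Var}_\theta[X] = \kappa''(\theta)$.
- Repeated tilting is additive. That is, tilting first by $\theta_1$ and then $\theta_2$ is the same as tilting once by $\theta_1 + \theta_2$.
- If $X$ is the sum of independent, but not necessarily identically distributed, random variables $X_1, X_2, \dots$, then the $\theta$-tilted distribution of $X$ is that of the sum of $X_1, X_2, \dots$, each $\theta$-tilted individually.
- If $\mu = \mathbb{E}[X]$, then $\kappa(\theta) - \theta\mu$ is the Kullback–Leibler divergence $D_{\text{KL}}(P \parallel P_\theta) = \mathrm{E}\left[\log \tfrac{P}{P_\theta}\right]$ between the tilted distribution $P_\theta$ and the original distribution $P$ of $X$.
- Similarly, since $\mathbb{E}_\theta[X] = \kappa'(\theta)$, we have the Kullback–Leibler divergence as $D_{\text{KL}}(P_\theta \parallel P) = \mathrm{E}_\theta\left[\log \tfrac{P_\theta}{P}\right] = \theta\kappa'(\theta) - \kappa(\theta)$.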
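These properties can be checked on a case with closed forms. The sketch below assumes an Exponential distribution with rate `lam` (an illustrative choice), whose $\theta$-tilt is again exponential with rate `lam - theta`. It compares finite-difference derivatives of the CGF with the known mean and variance of the tilted distribution, and the two divergence formulas with the closed-form divergence between exponential distributions. All function names are hypothetical helpers.

```python
import math

lam, theta = 2.0, 0.5            # illustrative values; the tilt needs theta < lam

def kappa(t):
    """CGF of an Exponential(rate=lam) random variable, finite for t < lam."""
    return math.log(lam / (lam - t))

def first_derivative(f, t, h=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def kl_exponential(rate_p, rate_q):
    """Closed-form KL divergence from Exponential(rate_p) to Exponential(rate_q)."""
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0

tilted_rate = lam - theta                             # tilted law is Exponential(lam - theta)
tilted_mean = first_derivative(kappa, theta)          # should match 1 / tilted_rate
tilted_var = second_derivative(kappa, theta)          # should match 1 / tilted_rate**2
kl_original_to_tilted = kappa(theta) - theta / lam    # kappa(theta) - theta * E[X]
kl_tilted_to_original = theta * tilted_mean - kappa(theta)

print(tilted_mean, 1.0 / tilted_rate)
print(tilted_var, 1.0 / tilted_rate ** 2)
print(kl_original_to_tilted, kl_exponential(lam, tilted_rate))
print(kl_tilted_to_original, kl_exponential(tilted_rate, lam))
```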
Applications
Rare-event simulation
The exponential tilting of $X$, assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling or as importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e. $X \mid X \in A$. With an appropriate choice of $\theta$, sampling from $\mathbb{P}_\theta$ can meaningfully reduce the required amount of sampling or the variance of an estimator.
Saddlepoint approximation
The saddlepoint approximation method is a density approximation methodology often used for the distribution of sums and averages of independent, identically distributed random variables. It employs Edgeworth series but generally performs better at extreme values. From the definition of the natural exponential family, it follows that the density $f$ of the mean $\bar{X}$ of $n$ such variables satisfies
$$f_\theta(\bar{x}) = f(\bar{x}) \exp\{n(\theta\bar{x} - \kappa(\theta))\}.$$
Applying the Edgeworth expansion for $f_\theta(\bar{x})$, we have
$$f_\theta(\bar{x}) = \psi(z)(\mathrm{Var}[\bar{X}])^{-1/2}\left\{1 + \frac{\rho_3(\theta)h_3(z)}{6} + \frac{\rho_4(\theta)h_4(z)}{24} \dots\right\},$$
where $\psi(z)$ is the standard normal density evaluated at the standardized value $z$ of $\bar{x}$ (centered at the mean of $\bar{X}$ and scaled by its standard deviation, both taken under the tilted distribution), $\rho_k(\theta)$ are the standardized cumulants of $\bar{X}$ under the tilted distribution, and $h_k$ are the Hermite polynomials.
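When considering values of $\bar{x}$ progressively farther from the center of the distribution, $|z| \to \infty$ and the Hermite-polynomial terms $h_3(z), h_4(z), \dots$ become unbounded. However, for each value of $\bar{x}$, we can choose $\theta$ such that
$$\kappa'(\theta) = \bar{x}.$$
This value of $\theta$ is referred to as the saddle-point, and with it the above expansion is always evaluated at the expectation of the tilted distribution. This choice of $\theta$ leads to the final representation of the approximation given by
$$f(\bar{x}) \approx \left(\frac{n}{2\pi\kappa''(\theta)}\right)^{1/2} \exp\{n(\kappa(\theta) - \theta\bar{x})\}.$$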
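As an illustration of the leading-order saddlepoint formula $f(\bar{x}) \approx (n / (2\pi\kappa''(\theta)))^{1/2}\exp\{n(\kappa(\theta) - \theta\bar{x})\}$ with $\kappa'(\theta) = \bar{x}$, the sketch below assumes $n$ i.i.d. Exponential(1) variables (an illustrative choice). In that case $\kappa(\theta) = -\log(1 - \theta)$, the saddle-point is $\theta = 1 - 1/\bar{x}$, and the exact density of the mean is a Gamma density, so the two can be compared directly. The function names are hypothetical.

```python
import math

def saddlepoint_density_exp_mean(xbar, n):
    """Leading-order saddlepoint approximation for the mean of n Exponential(1) variables."""
    theta = 1.0 - 1.0 / xbar                 # solves kappa'(theta) = 1 / (1 - theta) = xbar
    kappa = -math.log(1.0 - theta)           # CGF of Exponential(1)
    kappa2 = 1.0 / (1.0 - theta) ** 2        # second derivative of the CGF
    return math.sqrt(n / (2.0 * math.pi * kappa2)) * math.exp(n * (kappa - theta * xbar))

def exact_density_exp_mean(xbar, n):
    """Exact density of the mean: Gamma with shape n and rate n."""
    log_pdf = n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n)
    return math.exp(log_pdf)

n = 10
points = (0.3, 1.0, 2.5)                     # includes values far from the mean 1.0
ratios = [saddlepoint_density_exp_mean(x, n) / exact_density_exp_mean(x, n) for x in points]
print(ratios)
```

For this particular family the ratio of approximate to exact density does not depend on $\bar{x}$ (it reduces to the error of Stirling's formula for $\Gamma(n)$), which illustrates why the approximation remains accurate in the tails.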
Rejection sampling
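Using the tilted distribution $\mathbb{P}_\theta$ as the proposal, the rejection sampling algorithm prescribes sampling from $f_\theta(x)$ and accepting with probability
$$\frac{1}{c}\exp(-\theta x + \kappa(\theta)),$$
where
$$c = \sup_{x} \frac{d\mathbb{P}}{d\mathbb{P}_\theta}(x).$$
That is, a uniformly distributed random variable $p \sim \text{Unif}(0, 1)$ is generated, and the sample from $f_\theta(x)$ is accepted if
$$p \leq \frac{1}{c}\exp(-\theta x + \kappa(\theta)).$$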
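A minimal sketch of rejection sampling with a tilted proposal follows, assuming NumPy. It uses a variant of the scheme in which the target is a standard normal conditioned on $X > a$, so that the likelihood ratio is only needed on the region $x > a$, where its supremum is attained at $x = a$ for $\theta > 0$. The acceptance probability then simplifies to $\exp(-\theta(x - a))$ for $x > a$ and zero otherwise. The choice $\theta = a$ and the function name `sample_normal_tail` are illustrative assumptions.

```python
import math
import numpy as np

def sample_normal_tail(a, n_proposals, rng):
    """Draw from N(0, 1) conditioned on X > a using an exponentially tilted proposal."""
    theta = a                                   # assumption: tilt so the proposal mean is a
    # The theta-tilt of N(0, 1) is N(theta, 1).
    x = rng.normal(loc=theta, scale=1.0, size=n_proposals)
    p = rng.uniform(size=n_proposals)
    # (1/c) * exp(-theta*x + kappa(theta)) restricted to x > a, with the sup taken at x = a.
    accept_prob = np.where(x > a, np.exp(-theta * (x - a)), 0.0)
    return x[p <= accept_prob]

rng = np.random.default_rng(0)
a = 2.0
samples = sample_normal_tail(a, 200_000, rng)

# Known mean of a standard normal conditioned on X > a: phi(a) / (1 - Phi(a)).
phi_a = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
tail_a = 0.5 * math.erfc(a / math.sqrt(2.0))
expected_mean = phi_a / tail_a
print(len(samples), samples.mean(), expected_mean)
```

Sampling directly from $N(0, 1)$ and discarding values below $a = 2$ would keep only about 2% of the draws; the tilted proposal accepts a considerably larger fraction.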
Importance sampling
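Applying the exponentially tilted distribution as the importance distribution yields the equation
$$\mathbb{E}[h(X)] = \mathbb{E}_\theta[\ell(X)h(X)],$$
where
$$\ell(X) = \frac{d\mathbb{P}}{d\mathbb{P}_\theta}$$
is the likelihood ratio. So, one samples from the importance distribution $f_\theta$, evaluates $h$ at each draw, and multiplies it by the likelihood ratio to obtain an estimate of the expectation under $\mathbb{P}$. Moreover, the variance of each weighted draw is given by
$$\mathrm{Var}_\theta[\ell(X)h(X)] = \mathbb{E}_\theta[(\ell(X)h(X))^2] - (\mathbb{E}[h(X)])^2.$$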
Example
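Assume independent and identically distributed $\{X_i\}$ such that $\kappa(\theta) < \infty$. In order to estimate $\mathbb{P}(X_1 + \cdots + X_n > c)$, we can employ importance sampling by taking $h(X) = \mathbb{I}\left(\sum_{i=1}^{n} X_i > c\right)$.
The constant $c$ can be rewritten as $na$ for some other constant $a$. Then,
$$\mathbb{P}\left(\sum_{i=1}^{n} X_i > na\right) = \mathbb{E}_{\theta_a}\left[\exp\left\{-\theta_a \sum_{i=1}^{n} X_i + n\kappa(\theta_a)\right\}\mathbb{I}\left(\sum_{i=1}^{n} X_i > na\right)\right],$$
where $\theta_a$ denotes the $\theta$ defined by the saddle-point equation $\kappa'(\theta_a) = \mathbb{E}_{\theta_a}[X] = a$.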
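The estimator in this example can be sketched as follows, assuming NumPy and standard normal $X_i$ (an illustrative choice, for which $\kappa(\theta) = \theta^2/2$, the saddle-point equation gives $\theta_a = a$, and the tilted $X_i$ are $N(a, 1)$). The exact tail probability is available in this case, so the estimate can be compared against it. The function name is hypothetical.

```python
import math
import numpy as np

def tilted_is_tail_probability(n, a, n_samples, rng):
    """Estimate P(X_1 + ... + X_n > n*a) for i.i.d. N(0, 1) X_i by exponential tilting."""
    theta_a = a                               # solves kappa'(theta) = theta = a
    kappa = 0.5 * theta_a ** 2                # CGF of N(0, 1) at theta_a
    x = rng.normal(loc=theta_a, scale=1.0, size=(n_samples, n))   # draws under the tilt
    s = x.sum(axis=1)
    # Likelihood ratio exp(-theta*S + n*kappa) times the indicator of the rare event.
    weights = np.exp(-theta_a * s + n * kappa) * (s > n * a)
    return weights.mean(), weights.std(ddof=1) / math.sqrt(n_samples)

rng = np.random.default_rng(1)
n, a = 20, 1.0
estimate, std_error = tilted_is_tail_probability(n, a, 100_000, rng)

# Exact value for comparison: S_n ~ N(0, n), so P(S_n > n*a) = P(Z > a*sqrt(n)).
exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))
print(estimate, std_error, exact)
```

Under the tilted distribution the event $\sum X_i > na$ occurs about half the time, whereas under the original distribution it has probability of order $10^{-6}$ for these values, so naive Monte Carlo with the same sample size would typically observe it zero times.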
Stochastic processes
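Given the tilting of a normal random variable, it is intuitive that the exponential tilting of $X_t$, a Brownian motion with drift $\mu$ and variance $\sigma^2$, is a Brownian motion with drift $\mu + \theta\sigma^2$ and variance $\sigma^2$. Thus, any Brownian motion with drift under $\mathbb{P}$ can be thought of as a Brownian motion without drift under $\mathbb{P}_{\theta^*}$. To observe this, consider the process $X_t = B_t + \mu t$, where $B_t$ is a standard Brownian motion under $\mathbb{P}$. Under $\mathbb{P}_{\theta^*}$ the process $X_t$ is a standard Brownian motion, and on paths up to time $T$
$$\frac{d\mathbb{P}}{d\mathbb{P}_{\theta^*}} = \exp\left\{\mu X_T - \tfrac{1}{2}\mu^2 T\right\}.$$
The likelihood ratio term, $\exp\{\mu X_T - \tfrac{1}{2}\mu^2 T\}$, is a martingale under $\mathbb{P}_{\theta^*}$ and is commonly denoted $M_T$. Thus, a Brownian motion with drift process (as well as many other continuous processes adapted to the Brownian filtration) is a $\mathbb{P}_{\theta^*}$-martingale.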
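The change of drift can be illustrated by simulation. The sketch below assumes NumPy and unit variance; it samples the terminal value of a driftless Brownian motion, reweights it by the exponential martingale $\exp\{\mu W_T - \tfrac{1}{2}\mu^2 T\}$ (with $W_T$ the driftless terminal value), and recovers a probability for the process with drift $\mu$. The function name and parameter values are illustrative assumptions.

```python
import math
import numpy as np

def drift_probability_by_reweighting(mu, T, k, n_samples, rng):
    """Estimate P(B_T + mu*T > k) using only driftless Brownian samples."""
    w = rng.normal(loc=0.0, scale=math.sqrt(T), size=n_samples)   # driftless terminal value
    m_T = np.exp(mu * w - 0.5 * mu ** 2 * T)                      # likelihood ratio (martingale)
    return (m_T * (w > k)).mean(), m_T.mean()

rng = np.random.default_rng(2)
mu, T, k = 1.0, 1.0, 1.0
estimate, mean_weight = drift_probability_by_reweighting(mu, T, k, 200_000, rng)

# Exact value: B_T + mu*T ~ N(mu*T, T).
exact = 0.5 * math.erfc((k - mu * T) / math.sqrt(2.0 * T))
print(estimate, exact, mean_weight)
```

The average weight should be close to 1, reflecting that the likelihood ratio has unit expectation under the driftless measure.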
Stochastic Differential Equations
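The above leads to an alternate representation of the stochastic differential equation $dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t)$, in which tilting replaces the drift $\mu(t)$ by a tilted drift $\mu_\theta(t)$ and leaves the diffusion coefficient $\sigma(t)$ unchanged: $dX_\theta(t) = \mu_\theta(t)\,dt + \sigma(t)\,dB(t)$. Girsanov's formula gives the likelihood ratio $\frac{d\mathbb{P}}{d\mathbb{P}_\theta}$ between the two path measures. Therefore, Girsanov's formula can be used to implement importance sampling for certain SDEs.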
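Tilting can also be useful for simulating a process $X(t)$ via rejection sampling of the SDE $dX(t) = \mu(X(t))\,dt + dB(t)$. We may focus on the SDE since we know that $X(t)$ can be written $X(0) + \int_0^t dX(s)$. As previously stated, a Brownian motion with drift can be tilted to a Brownian motion without drift. Therefore, we choose $\mathbb{P}_{\text{proposal}} = \mathbb{P}_{\theta^*}$. The likelihood ratio of the path $\{X(s) : 0 \leq s \leq t\}$ is
$$\frac{d\mathbb{P}}{d\mathbb{P}_{\theta^*}} = \exp\left\{\int_0^t \mu(X(s))\,dX(s) - \frac{1}{2}\int_0^t \mu(X(s))^2\,ds\right\}.$$
This likelihood ratio will be denoted $M(t)$. To ensure this is a true likelihood ratio, it must be shown that $\mathbb{E}_{\theta^*}[M(t)] = 1$. Assuming this condition holds, it can be shown that $f_{X(t)}(y) = f^{\theta^*}_{X(t)}(y)\,\mathbb{E}_{\theta^*}[M(t) \mid X(t) = y]$. So, rejection sampling prescribes that one sample from a standard Brownian motion and accept with probability
$$\frac{f_{X(t)}(y)}{f^{\theta^*}_{X(t)}(y)}\,\frac{1}{c} = \frac{1}{c}\,\mathbb{E}_{\theta^*}[M(t) \mid X(t) = y],$$
where $c$ is an upper bound on the density ratio.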
Choice of tilting parameter
Siegmund's algorithm
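Assume i.i.d. $X_1, X_2, \dots$ with a light-tailed distribution and $\mathbb{E}[X] < 0$. In order to estimate $\psi(c) = \mathbb{P}(\tau(c) < \infty)$ where $\tau(c) = \inf\{t : \sum_{i=1}^{t} X_i > c\}$, when $c$ is large and hence $\psi(c)$ small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm arises in many settings, such as sequential tests and G/G/1 queue waiting times, and $\psi$ is the probability of ultimate ruin in ruin theory. In this context, it is logical to ensure that $\mathbb{P}_\theta(\tau(c) < \infty) = 1$. The criterion $\theta > \theta_0$, where $\theta_0$ is such that $\kappa'(\theta_0) = 0$, achieves this. Siegmund's algorithm uses $\theta = \theta^*$, if it exists, where $\theta^*$ is the positive solution of $\kappa(\theta^*) = 0$. It has been shown that $\theta^*$ is the only tilting parameter producing bounded relative error ($\limsup_{c \to \infty} \mathrm{Var}_\theta[\hat{\psi}(c)] / \psi(c)^2 < \infty$, where $\hat{\psi}(c)$ is the importance sampling estimator).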
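A minimal sketch of Siegmund's algorithm follows, assuming NumPy and Gaussian increments $X_i \sim N(-m, 1)$ with $m > 0$ (an illustrative choice). Then $\kappa(\theta) = -m\theta + \theta^2/2$, the positive root of $\kappa(\theta^*) = 0$ is $\theta^* = 2m$, and the tilted increments are $N(m, 1)$. Because $\kappa(\theta^*) = 0$, the likelihood ratio at the stopping time reduces to $\exp(-\theta^* S_{\tau(c)})$, where $S_{\tau(c)}$ is the value of the walk when it first exceeds $c$. The function name is hypothetical.

```python
import math
import numpy as np

def siegmund_estimate(m, c, n_paths, rng):
    """Estimate psi(c) = P(random walk with N(-m, 1) steps ever exceeds c)."""
    theta_star = 2.0 * m                      # positive root of kappa(theta) = -m*theta + theta^2/2
    weights = np.empty(n_paths)
    for i in range(n_paths):
        s = 0.0
        # Under the tilt the steps are N(m, 1), so the walk crosses c with probability 1.
        while s <= c:
            s += rng.normal(loc=m, scale=1.0)
        weights[i] = math.exp(-theta_star * s)   # likelihood ratio at the stopping time
    return weights.mean(), weights.std(ddof=1) / math.sqrt(n_paths)

rng = np.random.default_rng(3)
m, c = 0.5, 10.0
estimate, std_error = siegmund_estimate(m, c, 2_000, rng)
upper_bound = math.exp(-2.0 * m * c)          # each weight is below exp(-theta_star * c)
print(estimate, std_error, upper_bound)
```

Every simulated path contributes a nonzero weight just below $e^{-\theta^* c}$, which is why the relative error stays bounded as $c$ grows.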
Black-Box algorithms
A black-box algorithm sees only the input and output of a system and uses minimal information about its structure. When random numbers are generated in this way, the output may not belong to a common parametric class such as the normal or exponential distributions, so an automated way of performing ECM may be used. Let $X_1, X_2, \dots$ be i.i.d. r.v.’s with distribution $G$; for simplicity we assume $X \geq 0$. Define $\mathfrak{F}_n = \sigma(X_1, \dots, X_n, U_1, \dots, U_n)$, where $U_1, U_2, \dots$ are independent (0, 1) uniforms. A randomized stopping time for $X_1, X_2, \dots$ is then a stopping time w.r.t. the filtration $\{\mathfrak{F}_n\}$. Let further $\mathfrak{G}$ be a class of distributions $G$ on $[0, \infty)$ with $\kappa_G(\theta) = \log \int_0^\infty e^{\theta x}\,G(dx) < \infty$ and define $G_\theta$ by $\frac{dG_\theta}{dG}(x) = e^{\theta x - \kappa_G(\theta)}$. We define a black-box algorithm for ECM for the given $\theta$ and the given class $\mathfrak{G}$ of distributions as a pair of a randomized stopping time $\tau$ and an $\mathfrak{F}_\tau$-measurable r.v. $Z$ such that $Z$ is distributed according to $G_\theta$ for any $G \in \mathfrak{G}$. Formally, we write this as $\mathbb{P}_G(Z < x) = G_\theta(x)$ for all $x$. In other words, the rules of the game are that the algorithm may use simulated values from $G$ and additional uniforms to produce an r.v. from $G_\theta$.