Rare-event simulation

Patrick Laub

Lecture 1 (Part 2)

Agenda

Now:

  • Importance sampling
  • Exponential tilting

In the code:

  • Importance sampling
  • Exponential tilting and ruin probabilities

The reason CMC just returns 0

Consider the relative error $|\hat{\ell} - \ell | / \ell = \sqrt{ (\hat{\ell} - \ell)^2 } / \ell$; averaging over the randomness in $\hat{\ell}$ gives

$$ \mathrm{RelativeError} = \sqrt{ \mathbb{E} \bigl[ \bigl( \frac{ \hat{\ell} - \ell }{\ell} \bigr)^2 \bigr] } = \sqrt{ \frac{ \mathbb{V}\mathrm{ar} [\hat{\ell}] }{ \ell^2 } } . $$

Since $R \cdot \hat{\ell}(R) \sim \mathsf{Binomial}(R, \ell)$, we know $\mathbb{V}\mathrm{ar} [ R \cdot \hat{\ell}(R) ] = R \ell (1 - \ell)$, so

$$ \Rightarrow \mathrm{RelativeError} = \sqrt{ \frac{ (1 - \ell) }{ R \ell } } \approx \sqrt{ \frac{1}{ R \ell } } .$$

As $\ell \searrow 0$ this $\nearrow \infty$. E.g. if $\ell \approx 10^{-6}$, and we want a RE of $0.01$, need $R \approx 10^{10}$.
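A quick illustration of this (the choices $X \sim \mathsf{Exp}(1)$ and $\gamma = 14$, so $\ell = \mathrm{e}^{-14} \approx 8 \times 10^{-7}$, are assumptions for the sketch): with a million CMC samples we usually see zero or a handful of lucky hits, and the predicted relative error is already above 100%.

```python
import numpy as np

rng = np.random.default_rng(42)

# Crude Monte Carlo for a rare event: P(X > gamma) with X ~ Exp(1).
# True probability ell = exp(-gamma); gamma = 14 gives ell ≈ 8.3e-7.
gamma = 14.0
ell = np.exp(-gamma)

R = 10**6
X = rng.exponential(size=R)
ell_hat = np.mean(X > gamma)        # typically 0.0, or a few lucky hits

# Predicted relative error sqrt(1/(R*ell)): far above any useful target
rel_err = np.sqrt(1 / (R * ell))
print(ell_hat, rel_err)
```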

Importance sampling

For $X \sim f(\cdot)$, $$ \mathbb{P}(X > \gamma) = \mathbb{E}_f[ 1\{ X > \gamma\} ] \approx \frac{1}{R} \sum_{r=1}^R 1\{ X_r > \gamma\} =: \hat{\ell}_{\mathrm{CMC}} $$ where $X_r \overset{\mathrm{i.i.d.}}{\sim} f(\cdot).$ For another density $g(\cdot)$, called the IS proposal density,

$$ \mathbb{P}(X > \gamma) = \int_{-\infty}^{\infty} 1\{ x > \gamma\} f(x) \frac{g(x)}{g(x)} \,\mathrm{d}x = \int_{-\infty}^{\infty} 1\{ x > \gamma\} \frac{f(x)}{g(x)} g(x) \,\mathrm{d}x $$$$ \Rightarrow \mathbb{P}(X > \gamma) = \mathbb{E}_g \bigl[ 1\{ X > \gamma\} \frac{f(X)}{g(X)} \bigr] \approx \frac{1}{R} \sum_{r=1}^R 1\{ X_r > \gamma\} \frac{f(X_r)}{g(X_r)} =: \hat{\ell}_{\mathrm{IS}} $$

where $X_r \overset{\mathrm{i.i.d.}}{\sim} g(\cdot).$

Technical note: Make sure $g(x) > 0$ whenever $1 \{ x > \gamma \} f(x) > 0$; otherwise the likelihood ratio $f(x)/g(x)$ divides by zero, and the IS identity misses the region where $g$ vanishes, biasing $\hat{\ell}_{\mathrm{IS}}$.
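A minimal sketch of the estimator (the concrete setup is an assumption, not from the slides): estimate $\mathbb{P}(X > 4)$ for $X \sim \mathsf{Normal}(0,1)$ with proposal $g = \mathsf{Normal}(4, 1)$, so the likelihood ratio is $L(x) = \mathrm{e}^{-x^2/2 + (x-4)^2/2}$.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

# IS estimate of P(X > gamma) for X ~ Normal(0, 1),
# using the mean-shifted proposal g = Normal(gamma, 1).
gamma, R = 4.0, 100_000
X = rng.normal(loc=gamma, scale=1.0, size=R)      # X_r ~ g

# Likelihood ratio L(x) = f(x)/g(x); the normalising constants cancel
L = np.exp(-0.5 * X**2 + 0.5 * (X - gamma) ** 2)

ell_hat = np.mean((X > gamma) * L)
ell_true = 0.5 * erfc(gamma / sqrt(2))            # exact P(N(0,1) > 4)
print(ell_hat, ell_true)
```

With $R = 10^5$ the IS estimate lands within a fraction of a percent of the true $\approx 3.17 \times 10^{-5}$, whereas CMC with the same budget would expect about 3 hits in total.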

Swapping exponential distributions

Want $\mathbb{P}(X > 1)$ when $X \sim \mathsf{Exp}(10^3)$, but instead simulate proposals from $\mathsf{Exp}(1)$.

Call the p.d.f.'s $f(\cdot)$ and $g(\cdot)$, then

$$ L(x) = \frac{f(x)}{g(x)} = \frac{10^3 \mathrm{e}^{-10^3 x}}{ \mathrm{e}^{-x}} = 10^3 \mathrm{e}^{-999 x} \,. $$

Simulate $X_r \overset{\mathrm{i.i.d.}}{\sim} \mathsf{Exp}(1)$ for $r = 1, 2, 3$, and get $X_1 = 0.31$, $X_2 = 2.24$, $X_3 = 0.97$:

$$ \begin{aligned} \mathbb{P}_f(X > 1) &= \mathbb{E}_g[ 1\{X > 1\} L(X) ] \approx \frac1R \sum_{r=1}^R 1\{X_r > 1\} L(X_r) \\ &= \frac13 \bigl[ 0\cdot L(X_1) + 1\cdot 10^3 \mathrm{e}^{-999\cdot 2.24} + 0\cdot L(X_3) \bigr] \,. \end{aligned} $$
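In code, the same swap works for any sample size, but the likelihood ratios are so extreme ($L(X_2) = 10^3 \mathrm{e}^{-999 \cdot 2.24}$ underflows double precision) that it pays to keep everything in log space. A sketch, with $R = 10^5$ an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate P(X > 1) for X ~ Exp(rate 10^3) with proposals X_r ~ Exp(1).
# The true value exp(-10^3) underflows to 0.0 in double precision,
# so report the log of the estimate instead.
R = 10**5
X = rng.exponential(scale=1.0, size=R)      # X_r ~ Exp(1)

logL = np.log(1e3) - 999.0 * X              # log L(x) = log(10^3 e^{-999 x})
hits = logL[X > 1.0]                        # log-weights of samples past 1

# Stable log of the IS estimate: log( (1/R) * sum_{X_r > 1} L(X_r) )
m = hits.max()
log_ell_hat = m + np.log(np.exp(hits - m).sum()) - np.log(R)
print(log_ell_hat)                          # close to log P(X > 1) = -1000
```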

Importance sampling, mixed results

Know

$$ \mathbb{E}[ \hat{\ell}_{\mathrm{IS}} ] = \mathbb{E} [ \hat{\ell}_{\mathrm{CMC}} ] \,.$$

Don't know but definitely want

$$ \mathbb{V}\mathrm{ar}[ \hat{\ell}_{\mathrm{IS}} ] < \mathbb{V}\mathrm{ar}[ \hat{\ell}_{\mathrm{CMC}} ] .$$

Say $X_1, X_2 \sim \mathsf{Normal}(0, 1)$, $\mathbb{C}\mathrm{ov}(X_1, X_2) = 0.8$.

Want $\mathbb{P}( \max\{X_1, X_2\} > 3)$. Use IS for various $\mathsf{Normal}(\mu, 1)$ proposals.
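A sketch of this experiment, under the assumption that we shift *both* means to $\mu$ and keep the covariance matrix unchanged (one natural reading of "$\mathsf{Normal}(\mu,1)$ proposals"); since $f$ and $g$ share a covariance, the quadratic terms in the log likelihood ratio cancel.

```python
import numpy as np

rng = np.random.default_rng(7)

# Estimate P(max(X1, X2) > 3) for a bivariate normal with unit
# variances and Cov(X1, X2) = 0.8, using mean-shifted proposals
# g = Normal((mu, mu), Sigma); mu = 0 recovers crude Monte Carlo.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
gamma, R = 3.0, 10**5

ests = {}
for mu in [0.0, 1.5, 3.0]:
    mean = np.array([mu, mu])
    X = rng.multivariate_normal(mean, Sigma, size=R)   # X_r ~ g
    # log L(x) = log f(x) - log g(x) = mu' S^-1 mu / 2 - x' S^-1 mu
    logL = 0.5 * mean @ Sigma_inv @ mean - X @ Sigma_inv @ mean
    ests[mu] = np.mean((X.max(axis=1) > gamma) * np.exp(logL))
    print(mu, ests[mu])
```

Running this shows the "mixed results": some shifts $\mu$ cut the variance substantially, while a poorly chosen shift can do worse than CMC.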

Optimal IS proposal density

What is the optimal IS density for $\ell = \mathbb{P}(X > \gamma)$ for $X \sim f(\cdot)$?

$$ g^*(x) \propto 1\{ x > \gamma \} f(x) $$
$$ g^*(x) = \frac{1\{ x > \gamma \} f(x)}{\ell} $$

Unbiased and zero variance estimator $$ \hat{\ell}_{\mathrm{IS}} = 1\{ X_1 > \gamma\} \frac{f(X_1)}{g^*(X_1)} \quad \text{where } X_1 \sim g^* $$ and simplifying $$ \hat{\ell}_{\mathrm{IS}} = 1\{ X_1 > \gamma\} \frac{f(X_1)}{\frac{1\{ X_1 > \gamma \} f(X_1)}{\ell}} = \ell $$ which is a constant so $\mathbb{V}\mathrm{ar}[ \hat{\ell}_{\mathrm{IS}} ] = 0$.
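A concrete case (assuming $X \sim \mathsf{Exp}(1)$ and $\gamma = 20$): by memorylessness, the optimal proposal $g^*$, i.e. $f$ conditioned on $\{x > \gamma\}$, is just $\gamma$ plus an $\mathsf{Exp}(1)$ variable, and every IS weight collapses to the constant $\ell = \mathrm{e}^{-\gamma}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Optimal IS for P(X > gamma), X ~ Exp(1): sample from the truncated
# density g*(x) = 1{x > gamma} f(x) / ell, which here is gamma + Exp(1).
gamma, R = 20.0, 5
X = gamma + rng.exponential(size=R)           # X_r ~ g*

# Each term 1{X_r > gamma} f(X_r)/g*(X_r) equals exp(-gamma) exactly,
# so the estimator is the constant ell: zero variance.
weights = np.exp(-X) / np.exp(-(X - gamma))   # = exp(-gamma) for every r
print(weights)                                # all equal exp(-20) = P(X > 20)
```

Of course this is circular in practice: writing down $g^*$ requires knowing $\ell$, the very quantity we are estimating. The point is that good proposals should *approximate* $g^*$.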

Exponential tilting

A common technique for IS is exponential tilting. For a p.d.f. $f(x)$, the exponentially tilted p.d.f. is

$$ f(x; \theta) = \frac{\mathrm{e}^{\theta x} f(x)}{ M(\theta)} $$

where $M(\theta) = \mathbb{E}[ \mathrm{e}^{\theta X}] = \int \mathrm{e}^{\theta y} f(y) \,\mathrm{d} y$.

E.g. the $\mathsf{Gamma}(r, m)$ distribution (shape $r$, scale $m$) has p.d.f.

$$ f(x) = x^{r-1} \mathrm{e}^{-\frac{x}{m}} / (\Gamma(r) m^r )$$

and moment generating function

$$ M(\theta) = (1-m \theta)^{-r} , \quad \theta < \frac1m. $$

With $\tilde{m}(\theta) := \frac{m}{1-m\theta}$, we get

$$ \begin{aligned} f(x; \theta) &= \frac{ \mathrm{e}^{\theta x} }{(1-m\theta)^{-r}} \cdot x^{r-1} \frac{ \mathrm{e}^{-\frac{x}{m}}}{ \Gamma(r) m^r } \\ &= \frac{ x^{r-1} \mathrm{e}^{\theta x -\frac{x}{m}} }{\Gamma(r) m^r (1-m\theta)^{-r}} \\ &= \frac{ x^{r-1} \mathrm{e}^{-\frac{x}{\tilde{m}(\theta)}} }{ \Gamma(r) \tilde{m}(\theta)^r } . \end{aligned} $$

So, $\mathsf{Gamma}(r,m)$ tilted by $\theta$ is $\mathsf{Gamma}(r,\frac{m}{1-m\theta})$. Not really a new family of distributions at all!

Same for binomial, Poisson, normal. Not the same for Pareto.
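The gamma identity above can be checked numerically (the parameter values $r = 2.5$, $m = 1$, $\theta = 0.4$ are arbitrary choices satisfying $\theta < 1/m$):

```python
import numpy as np
from math import gamma as Gamma

# Check: tilting Gamma(r, m) by theta gives Gamma(r, m / (1 - m*theta)).
def gamma_pdf(x, r, m):
    """Gamma density with shape r and scale m."""
    return x ** (r - 1) * np.exp(-x / m) / (Gamma(r) * m**r)

r, m, theta = 2.5, 1.0, 0.4          # need theta < 1/m
M = (1 - m * theta) ** (-r)          # m.g.f. of Gamma(r, m) at theta
x = np.linspace(0.1, 10, 50)

tilted = np.exp(theta * x) * gamma_pdf(x, r, m) / M   # e^{theta x} f(x)/M
direct = gamma_pdf(x, r, m / (1 - m * theta))         # Gamma(r, m-tilde)

max_diff = np.max(np.abs(tilted - direct))
print(max_diff)                      # ~0, up to floating point
```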

Exponential tilting notes

$$ f(x; \theta) = \frac{\mathrm{e}^{\theta x} f(x)}{ M(\theta)} $$

What's the largest $\theta$ we can choose? Need $M(\theta) < \infty$, so

$$ \overline{\theta} := \sup \{ \theta : M(\theta) < \infty \} .$$

If $\overline{\theta} > 0$, then $f(x)$ is a light tailed distribution, and we can use exponential tilting with a positive $\theta$. E.g. exponential, geometric, gamma, Gumbel, normal, Poisson, Wald, Weibull (with shape parameter $\ge 1$).

If $\overline{\theta} = 0$, then $f(x)$ is a heavy tailed distribution, and we can't use exponential tilting with a positive $\theta$. E.g. Pareto, lognormal, Cauchy, Weibull (with shape parameter $<1$).