Approximate Bayesian Computation and Insurance

Patrick Laub

UNSW Sydney, School of Risk and Actuarial Studies

February 3, 2023


What is the bias?

We flip a coin three times and get:

Heads

Tails

Heads

What is \(\theta = \mathbb{P}(\text{Heads})\)?

Let’s flip coins

https://pat-laub.github.io/abc-party/game

\(\theta\)-biased coin gives \((\text{H, T, H})\)

Getting an exact match is hard…

Accept close enough

The ‘approximate’ part of ABC
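A minimal sketch of this accept/reject idea for the coin example, assuming a flat \(\mathsf{Uniform}(0,1)\) prior on \(\theta\) and plain numpy (illustrative code, not from any package):

import numpy as np

rng = np.random.default_rng(1)
observed = np.array([1, 0, 1])               # (H, T, H), encoded as 1 = Heads, 0 = Tails

accepted = []
for _ in range(100_000):
    theta = rng.uniform(0, 1)                # guess a bias from the prior
    flips = rng.binomial(1, theta, size=3)   # simulate three flips of a theta-biased coin
    if np.array_equal(flips, observed):      # keep the guess only if the fake data matches
        accepted.append(theta)

accepted = np.array(accepted)
print(len(accepted), accepted.mean())        # posterior mean is 3/5 under a flat prior

With only three flips an exact match happens often enough; with realistic data it essentially never does, so ABC instead accepts a guess whenever a distance between the fake and observed data falls below a tolerance \(\epsilon\).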

Statistics with likelihoods

We can only find the likelihood for simple models.

\[ X_1, X_2 \overset{\mathrm{i.i.d.}}{\sim} f_X(\,\cdot\,) \]

\[ \Rightarrow X_1 + X_2 \sim \texttt{Intractable likelihood}! \]

Given a sample of \(n\) i.i.d. observations: as \(n\) increases, \[ p_{\boldsymbol{X}}(\boldsymbol{x} \mid \boldsymbol{\theta}) = \prod \text{Small things} \overset{\dagger}{=} 0 \] (numerical underflow), or the product just takes too long to compute. Either way: \(\texttt{Intractable likelihood}\)!

Usually it’s still possible to simulate these things…
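For example, the sum of two lognormal variables has no closed-form density, yet simulating it is a one-liner (a throwaway numpy illustration):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
x2 = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
samples = x1 + x2  # easy to simulate, even though the density of X_1 + X_2 has no closed form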

Insurance Motivation

Have a random number of claims \(N \sim p_N( \,\cdot\, ; \boldsymbol{\theta}_{\mathrm{freq}} )\).

Random claim sizes \(U_1, \dots, U_N \sim f_U( \,\cdot\, ; \boldsymbol{\theta}_{\mathrm{sev}} )\).

We aggregate them somehow, like:

  • aggregate claims: \(X = \sum_{i=1}^N U_i\)
  • maximum claims: \(X = \max_{i=1}^N U_i\)
  • stop-loss: \(X = ( \sum_{i=1}^N U_i - c )_+\).

Question: Given a sample \(X_1, \dots, X_n\) of the summaries, what is the \(\boldsymbol{\theta} = (\boldsymbol{\theta}_{\mathrm{freq}}, \boldsymbol{\theta}_{\mathrm{sev}})\) which explains them?

E.g. a reinsurance contract
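A sketch of simulating these summaries with plain numpy, taking Poisson frequency and exponential severities as illustrative choices (the function name below is made up, not part of any package):

import numpy as np

rng = np.random.default_rng(42)

def simulate_summary(theta_freq, theta_sev, kind="sum", c=10.0):
    """Simulate one summary X from the frequency-severity model (illustrative)."""
    n = rng.poisson(theta_freq)                 # random number of claims N
    u = rng.exponential(1 / theta_sev, size=n)  # claim sizes U_1, ..., U_N
    if kind == "sum":                           # aggregate claims
        return u.sum()
    if kind == "max":                           # maximum claim (0 if there are no claims)
        return u.max() if n > 0 else 0.0
    if kind == "stop-loss":                     # (sum - c)_+
        return max(u.sum() - c, 0.0)

# A sample X_1, ..., X_n of aggregate claims:
xs = np.array([simulate_summary(4.0, 2.0) for _ in range(50)])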

Dependent random sums

\[ N \sim \mathsf{Poisson}(\lambda), \quad U_i \mid N \sim \mathsf{Exp}(\beta\times \mathrm{e}^{\delta N}), \quad X = \sum_{i=1}^N U_i. \]

Posteriors for \(\lambda\), \(\beta\), and \(\delta\) with 50 sums and 250 sums.

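To simulate data from this dependent model (a numpy sketch; \(\mathsf{Exp}(\beta \mathrm{e}^{\delta N})\) is read here as a rate, so numpy's scale is its reciprocal, and the parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def simulate_dependent_sum(lam, beta, delta):
    """One draw of X: a sum of N exponential claims whose rate depends on N."""
    n = rng.poisson(lam)
    rate = beta * np.exp(delta * n)        # U_i | N ~ Exp(beta * e^(delta * N))
    u = rng.exponential(1 / rate, size=n)
    return u.sum()

obs_data = np.array([simulate_dependent_sum(4.0, 2.0, 0.2) for _ in range(50)])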

Time-varying arrival rate

\[\lambda(t) = a+b[1+\sin(2\pi c t)]\]

[Figure: the intensity function \(\lambda(t)\) plotted over time.]

Time-varying example

Claim sizes are \(U_i \sim \mathsf{Lognormal}(\mu, \sigma)\), and we observe the period sums \(X_s = \sum_{i = N_{s-1}+1}^{N_{s}} U_i\).


Posteriors for \(a\), \(b\), \(c\), \(\mu\), and \(\sigma\) with 50 sums and 250 sums.
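One way to simulate this time-varying example (a sketch: arrivals drawn from the sinusoidal intensity \(\lambda(t)\) by thinning, lognormal claim sizes, and sums over unit-length periods; all parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(7)
a, b, c = 1.0, 5.0, 1.0 / 3   # intensity parameters (illustrative)
mu, sigma = 0.0, 0.5          # lognormal size parameters (illustrative)
T = 50                        # number of observation periods

def intensity(t):
    return a + b * (1 + np.sin(2 * np.pi * c * t))

# Arrival times on [0, T] by thinning a homogeneous Poisson process of rate lambda_max.
lambda_max = a + 2 * b
num_candidates = rng.poisson(lambda_max * T)
candidates = np.sort(rng.uniform(0, T, size=num_candidates))
keep = rng.uniform(0, lambda_max, size=num_candidates) < intensity(candidates)
arrivals = candidates[keep]

# Period sums X_s: total of the lognormal claim sizes arriving in period (s-1, s].
sizes = rng.lognormal(mu, sigma, size=len(arrivals))
periods = np.ceil(arrivals).astype(int)
xs = np.array([sizes[periods == s].sum() for s in range(1, T + 1)])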

Python package

R package

Stats problem \(\to\) computer problem

“… you can just throw more and more computers at the problem, and that’s much much easier than throwing more brains at a problem.” Hadley Wickham

Get involved!

😊 Give it a try, feedback would be very welcome.

😍 We’d love contributions!

Empirical Dynamic Modelling

Cf. https://pat-laub.github.io.

Deep Learning for Actuaries

Appendix

Dependent random sums code


import approxbayescomp as abc

# Load data to fit
obsData = ...

# Frequency-Loss Model
freq = "poisson"
sev = "frequency dependent exponential"
psi = abc.Psi("sum")  # Aggregation process

# Fit the model to the data using ABC
numIters = 10   # number of SMC iterations (illustrative value)
popSize = 1000  # number of particles per iteration (illustrative value)
prior = abc.IndependentUniformPrior([(0, 10), (0, 20), (-1, 1)])  # priors on (lambda, beta, delta)
model = abc.Model(freq, sev, psi, prior)
fit = abc.smc(numIters, popSize, obsData, model)

Does it work in theory?

Proposition: Say we have continuous data \(\boldsymbol{x}_{\text{obs}}\), and our prior \(\pi(\boldsymbol{\theta})\) has bounded support.

If for some \(\epsilon \ge 0\) we have

\[ \sup\limits_{ (\boldsymbol{z}, \boldsymbol{\theta}) : \mathcal{D}(\boldsymbol{z}, \boldsymbol{x}_{\text{obs}} ) < \epsilon, \boldsymbol{\theta} \in \boldsymbol{ \Theta } } \pi( \boldsymbol{z} \mid \boldsymbol{ \theta }) < \infty \]

then for each \(\boldsymbol{\theta} \in \boldsymbol{\Theta}\)

\[ \lim_{\epsilon \to 0} \pi_\epsilon(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) = \pi(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) . \]

\(\square\)

We sample the approximate/ABC posterior.

\[ \pi_\epsilon(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) \propto \pi(\boldsymbol{\theta}) \times \mathbb{P}\bigl( \mathcal{D}\bigl(\boldsymbol{x}_{\text{obs}}, \boldsymbol{x}^{\ast} \bigr) \leq \epsilon \text{ where } \boldsymbol{x}^{\ast} \sim \boldsymbol{\theta} \bigr) , \]

but we care about the true posterior \(\pi(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}})\).

Mixed data

We filled in some of the blanks for mixed data: our data was mostly continuous but had an atom at 0.

We get \[ \lim_{\epsilon \to 0} \pi_\epsilon(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) = \pi(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) \]

when

\[ \mathcal{D}(\boldsymbol{z}, \boldsymbol{x}_{\text{obs}}) = \begin{cases} \mathcal{D}^+(\boldsymbol{z}^+, \boldsymbol{x}_{\text{obs}}^+) & \text{if } \# \text{Zeros}(\boldsymbol{z}) = \# \text{Zeros}(\boldsymbol{x}_{\text{obs}}) , \\ \infty & \text{otherwise}. \end{cases} \]
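A hypothetical Python version of this distance; the name mixed_distance and the choice of \(\mathcal{D}^+\) as an L1 distance between sorted positive parts are illustrative assumptions, and the two samples are assumed to have the same length:

import numpy as np

def mixed_distance(z, x_obs):
    """Match the number of zeros exactly; otherwise compare only the positive parts."""
    z, x_obs = np.asarray(z, float), np.asarray(x_obs, float)
    if np.sum(z == 0) != np.sum(x_obs == 0):
        return np.inf                          # different numbers of zero observations
    z_pos = np.sort(z[z > 0])
    x_pos = np.sort(x_obs[x_obs > 0])
    return np.mean(np.abs(z_pos - x_pos))      # a simple D^+ on the sorted positive parts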

ABC Sequential Monte Carlo

  • Input: data \(\boldsymbol{x}_{\text{obs}}\), prior \(\pi(\boldsymbol{\theta})\), distance \(\mathcal{D}(\cdot,\cdot)\), # of generations \(G\), # of particles \(K\)
  • Start with \(\epsilon_0 = \infty\), \(\pi_0(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}}) = \pi(\boldsymbol{\theta})\)
  • For each generation \(g = 1\) to \(G\)
    • For each particle \(k = 1\) to \(K\)
      • Repeatedly:
        • Generate a guess \(\boldsymbol{\theta}^{\ast}\) from \(\pi_{g-1}(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}})\)
        • Generate fake data \(\boldsymbol{x}^{\ast}\) from \(\boldsymbol{\theta}^{\ast}\)
        • Stop when \(\mathcal{D}(\boldsymbol{x}^\ast , \boldsymbol{x}_{\text{obs}}) < \epsilon_{g-1}\)
      • Store \(\boldsymbol{\theta}_k^g = \boldsymbol{\theta}^{\ast}\)
    • Create a new threshold \(\epsilon_g \le \epsilon_{g-1}\) and a new population by discarding particles with \(\mathcal{D}(\boldsymbol{x}_k^g , \boldsymbol{x}_{\text{obs}}) \ge \epsilon_{g}\) until the effective sample size is \(K / 2\)
    • Weight each particle by \(w_k^g \propto \pi(\boldsymbol{\theta}_k^g) / \pi_{g-1}( \boldsymbol{\theta}_k^g \mid \boldsymbol{x}_{\text{obs}} )\)
    • Create a KDE \(\pi_g(\boldsymbol{\theta} \mid \boldsymbol{x}_{\text{obs}})\) based on the surviving \(( \boldsymbol{\theta}_k^g , w_k^g )\) particles
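A simplified, self-contained sketch of this loop in numpy/scipy, with a one-dimensional \(\boldsymbol{\theta}\), a fixed median rule for shrinking \(\epsilon_g\) in place of the effective-sample-size rule above, and a Gaussian KDE for \(\pi_g\); everything here is illustrative and is not the approxbayescomp implementation:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def abc_smc(x_obs, simulate, prior_sample, prior_pdf, distance, G=5, K=500):
    """Simplified ABC-SMC: resample guesses from a weighted KDE of the previous generation."""
    eps = np.inf
    thetas = np.array([prior_sample() for _ in range(K)])
    weights = np.ones(K) / K
    for g in range(G):
        kde = gaussian_kde(thetas, weights=weights)       # pi_{g-1}(theta | x_obs)
        new_thetas, dists = np.empty(K), np.empty(K)
        for k in range(K):
            while True:
                theta = kde.resample(1)[0, 0]             # guess from pi_{g-1}
                if prior_pdf(theta) == 0:
                    continue                              # stay inside the prior's support
                d = distance(simulate(theta), x_obs)
                if d < eps:                               # stop once the fake data is close enough
                    break
            new_thetas[k], dists[k] = theta, d
        eps = np.quantile(dists, 0.5)                     # shrink the threshold
        keep = dists < eps                                # discard particles beyond the new threshold
        thetas = new_thetas[keep]
        weights = prior_pdf(thetas) / kde(thetas)         # w propto pi(theta) / pi_{g-1}(theta | x_obs)
        weights /= weights.sum()
    return thetas, weights

# Example: recover the rate of an exponential sample by matching sample means.
x_obs = rng.exponential(1 / 3.0, size=100)
post, w = abc_smc(
    x_obs,
    simulate=lambda th: rng.exponential(1 / th, size=100),
    prior_sample=lambda: rng.uniform(0, 10),
    prior_pdf=lambda th: ((th > 0) & (th < 10)) / 10.0,
    distance=lambda z, x: abs(np.mean(z) - np.mean(x)),
)
print(np.average(post, weights=w))                        # roughly 3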