Frequency-Dependent Claim Sizes

In this example, we look at a compound sum where the claim frequency and the claim sizes are dependent.

Specifically, let's say that the claim frequencies are Poisson distributed,

$$ N_i \overset{\mathrm{i.i.d.}}{\sim} \textsf{Poisson}(\lambda), \quad i = 1, \dots, T $$

and the individual claim sizes are frequency-dependent exponential, which means that

$$ U_{i,1}, \ldots, U_{i, N_i} \,|\, N_i \overset{\mathrm{i.i.d.}}{\sim} \textsf{Exp}(\beta\times \mathrm{e}^{\delta N_i}), \quad i = 1, \dots, T. $$

The available data are the total claim sizes

$$ X_i = \sum_{j = 1}^{N_i} U_{i,j}, \quad i = 1, \ldots, T. $$
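
As a rough illustration of this data-generating process, here is a minimal NumPy sketch. It assumes that $\textsf{Exp}(\mu)$ is parameterised by its mean $\mu$; if a rate parameterisation is intended, the scale below would be its reciprocal. The function name simulate_compound_sums and the parameter values are chosen for this sketch only and are not part of approxbayescomp.

```python
import numpy as np


def simulate_compound_sums(rng, T, lam, beta, delta):
    """Simulate T totals X_i = U_{i,1} + ... + U_{i,N_i}.

    Assumption: Exp(mu) is parameterised by its mean, so the claim sizes
    in period i have mean beta * exp(delta * N_i).
    """
    # Claim counts N_i ~ Poisson(lam), one per period
    counts = rng.poisson(lam, size=T)

    totals = np.zeros(T)
    for i, n in enumerate(counts):
        if n > 0:
            # The mean claim size depends on how many claims arrived
            scale = beta * np.exp(delta * n)
            totals[i] = rng.exponential(scale, size=n).sum()
    return counts, totals


rng = np.random.default_rng(2024)
counts, totals = simulate_compound_sums(rng, T=100, lam=4, beta=2, delta=0.2)

# A period has a zero total exactly when it has no claims
print(np.sum(totals == 0), np.sum(counts == 0))
```

With $\delta > 0$, periods with many claims also tend to have larger individual claims, which is the dependence this example is about.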

Generating some synthetic data to fit

We start by importing some necessary packages.

In [1]:

%config InlineBackend.figure_format = 'retina'
import approxbayescomp as abc
import numpy as np
import numpy.random as rnd

We fit simulated data so that the true parameter values of the data-generating process are known. Here, we set $\lambda = 4$, $\beta = 2$, and $\delta = 0.2$, and observe $T = 100$ i.i.d. realisations of the compound sum.

In [2]:

# Create a pseudorandom number generator
rg = rnd.default_rng(1234)

# Parameters of the true model
freq = "poisson"
sev = "frequency dependent exponential"
λ = 4
β = 2
δ = 0.2
trueTheta = (λ, β, δ)

# Setting the time horizon
T = 100

# Simulating the claim data
freqs, sevs = abc.simulate_claim_data(rg, T, freq, sev, trueTheta)

# Simulating the observed data
psi = abc.Psi("sum")
xData = abc.compute_psi(freqs, sevs, psi)

We can see if any of this observed data contains pesky zeros:

In [3]:

np.sum(xData == 0)

Out[3]:
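
For a sense of how many zeros to expect: exponential claim sizes are strictly positive, so a total can only be zero when no claims occur. Under this model,

$$ \mathbb{P}(X_i = 0) = \mathbb{P}(N_i = 0) = \mathrm{e}^{-\lambda}. $$

With $\lambda = 4$ this is about $0.018$, so roughly $T \mathrm{e}^{-\lambda} \approx 1.8$ zeros are expected among $T = 100$ observations.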

Use ABC to fit the data

With this data, we create objects to represent the data-generating process (the model) and the prior distribution. The priors are set as $\lambda \sim \mathsf{Unif}(0, 10)$, $\beta \sim \mathsf{Unif}(0, 20)$, and $\delta \sim \mathsf{Unif}(-1, 1).$

In [4]:

model = abc.Model(freq, sev, psi)
params = ("$\\lambda$", "$\\beta$", "$\\delta$")
prior = abc.IndependentUniformPrior([(0, 10), (0, 20), (-1, 1)], params)

Next, we call smc, the main function provided by approxbayescomp, to fit the observed xData.
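
To give a feel for what an ABC fit does, the following is a minimal sketch of plain rejection ABC written with NumPy only. It is not the sequential Monte Carlo algorithm that approxbayescomp implements, and every name in it (simulate_totals, sorted_distance, rejection_abc) is hypothetical. It keeps the closest draws rather than applying a fixed tolerance, and it assumes that fake and observed samples are compared through the mean absolute difference of their sorted values and that $\textsf{Exp}(\mu)$ is parameterised by its mean; the library may make different choices.

```python
import numpy as np


def simulate_totals(rng, T, lam, beta, delta):
    # N_i ~ Poisson(lam); claim sizes have mean beta * exp(delta * N_i) (assumed)
    counts = rng.poisson(lam, size=T)
    return np.array(
        [rng.exponential(beta * np.exp(delta * n), size=n).sum() for n in counts]
    )


def sorted_distance(x, y):
    # Mean absolute gap between the order statistics of two equal-sized samples
    return np.mean(np.abs(np.sort(x) - np.sort(y)))


def rejection_abc(rng, x_obs, num_draws, num_keep):
    T = len(x_obs)
    # Draw candidate parameters from the uniform priors used in this example
    thetas = np.column_stack(
        [
            rng.uniform(0, 10, num_draws),  # lambda
            rng.uniform(0, 20, num_draws),  # beta
            rng.uniform(-1, 1, num_draws),  # delta
        ]
    )
    # Simulate one fake dataset per candidate and measure its distance
    dists = np.array(
        [sorted_distance(x_obs, simulate_totals(rng, T, *theta)) for theta in thetas]
    )
    # Keep the candidates whose fake data landed closest to the observations
    keep = np.argsort(dists)[:num_keep]
    return thetas[keep], dists[keep]


rng = np.random.default_rng(1)
x_obs = simulate_totals(rng, 100, 4, 2, 0.2)
particles, particle_dists = rejection_abc(rng, x_obs, num_draws=1000, num_keep=50)
print(particles.mean(axis=0), particle_dists.max())
```

Plain rejection wastes most of its draws on poor regions of the prior; sequential schemes instead refine a population of particles over several iterations with a shrinking tolerance.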

Ignoring the zeros

In [5]:

numIters = 8
popSize = 100
%time fit = abc.smc(numIters, popSize, xData, model, prior, seed=1)

CPU times: user 3.03 s, sys: 20.8 ms, total: 3.05 s
Wall time: 3.07 s

These particles all generated fake data within the following distance to the observed data:

In [6]:

np.max(fit.dists)

Out[6]:

2.9358266608426806

Plotting the fitted ABC posterior:

In [7]:

abc.plot_posteriors(
    fit, prior, refLines=trueTheta
)

This quick fit managed to capture $\delta$ quite well, $\beta$ with moderate success, and $\lambda$ with slightly less success.
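
One way to make such a comparison more concrete is to summarise the particles numerically. The sketch below computes weighted posterior means and the effective sample size from an array of particles and their weights. The helper name summarise_particles and the stand-in particles are invented for illustration; how a fit object exposes its particles and weights should be checked against the approxbayescomp documentation.

```python
import numpy as np


def summarise_particles(samples, weights):
    """Weighted means and effective sample size of a particle population."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise defensively
    means = weights @ samples          # one weighted mean per parameter
    ess = 1.0 / np.sum(weights**2)     # equals len(weights) for equal weights
    return means, ess


# Stand-in particles (NOT output from the fit above): rows are particles,
# columns are (lambda, beta, delta), drawn here from the prior ranges
rng = np.random.default_rng(0)
fake_samples = rng.uniform([0, 0, -1], [10, 20, 1], size=(100, 3))
fake_weights = np.full(100, 1 / 100)

means, ess = summarise_particles(fake_samples, fake_weights)
print(means, ess)
```

Comparing such weighted means with the true values $(\lambda, \beta, \delta)$ gives a numerical counterpart to the visual comparison against the reference lines.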

Matching the zeros

The observed data does have zeros, so enabling the matchZeros flag should increase the accuracy (and computation time) of the fits.

In [8]:

%time fitMatchZeros = abc.smc(numIters, popSize, xData, model, prior, matchZeros=True, seed=1)

CPU times: user 9.5 s, sys: 7.96 ms, total: 9.5 s
Wall time: 9.59 s

These particles all generated fake data within the following distance to the observed data:

In [9]:

np.max(fitMatchZeros.dists)

Out[9]:

2.3066212101015164

Plotting the fitted ABC posterior:

In [10]:

abc.plot_posteriors(
    fitMatchZeros,
    prior,
    refLines=trueTheta,
)