Seasonal Claim Arrivals¶
Here, we set the claim arrival process to be governed by an inhomogeneous Poisson process $N_t$ with intensity function
$$ \lambda(t) = a + b \bigl( 1 + \sin(2\pi c t) \bigr) $$
The claim frequency data $n_s,\,s = 1,\ldots, t$ correspond to the increments of this inhomogeneous Poisson process. They are independent Poisson variables $\text{Pois}\big(\mu(s)\big)$ with
$$ \mu(s) = \int_{s}^{s+1} \lambda(u) \,\mathrm{d}u = a + b + \frac{b}{2\pi c}\left[\cos(2\pi c s) - \cos(2\pi c (s+1))\right],\quad s = 1,\ldots, t \,. $$
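As a sanity check on this closed form, we can compare it against numerical quadrature. This is a standalone sketch; the parameter values are the ones used for the true model later in this notebook:

```python
import numpy as np
from scipy.integrate import quad

# Illustrative parameter values (the true-model values used later)
a, b, c = 1.0, 5.0, 1 / 50

lam = lambda u: a + b * (1 + np.sin(2 * np.pi * c * u))

def mu_closed_form(s):
    # a + b + b/(2 pi c) * [cos(2 pi c s) - cos(2 pi c (s+1))]
    return a + b + b / (2 * np.pi * c) * (
        np.cos(2 * np.pi * c * s) - np.cos(2 * np.pi * c * (s + 1))
    )

for s in [1, 10, 37]:
    numeric, _ = quad(lam, s, s + 1)
    assert abs(numeric - mu_closed_form(s)) < 1e-8
```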
The claim sizes are $U_{i,j} \overset{\mathrm{i.i.d.}}{\sim} \mathsf{Pareto}(\mu, \alpha)$ where $\mu$ is the scale parameter and $\alpha$ is the shape parameter.
The available data are the aggregate claim amounts $$ X_i = \sum_{j = 1}^{N_i} U_{i,j}, \quad i = 1, \ldots, T. $$
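Before handing simulation over to the package below, here is a from-scratch sketch of this data-generating process. This is not `approxbayescomp`'s implementation; it assumes a Pareto Type I severity (sampled by inverse transform) and indexes periods from 0, matching the plotting code later in the notebook:

```python
import numpy as np

def simulate_compound(rng, a, b, c, mu, alpha, T):
    """Simulate T periods of the seasonal compound-Pareto process (sketch)."""
    x = np.empty(T)
    for s in range(T):
        # Expected number of claims in period s: integral of lambda over [s, s+1]
        mean_claims = a + b + b / (2 * np.pi * c) * (
            np.cos(2 * np.pi * c * s) - np.cos(2 * np.pi * c * (s + 1))
        )
        n = rng.poisson(mean_claims)
        # Pareto Type I draws via inverse transform: mu * U^(-1/alpha), U in (0, 1]
        u = 1.0 - rng.random(n)
        x[s] = np.sum(mu * u ** (-1 / alpha))
    return x

rng = np.random.default_rng(0)
x = simulate_compound(rng, a=1, b=5, c=1 / 50, mu=10, alpha=2, T=250)
```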
Generate synthetic observations¶
We start by importing some necessary packages.
%config InlineBackend.figure_format = 'retina'
%run preamble.py
import approxbayescomp as abc
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
import seaborn as sns
We will fit simulated data, so that we know the true value of the parameters for the data-generating process. Here, we start with $a = 1$, $b = 5$, and $c = \frac{1}{50}$ for the cyclical Poisson process, $\mu = 10$ and $\alpha = 2$ for the Pareto claim sizes, and say that we observe $T = 250$ observations of the compound sum (independent but, because of the seasonality, not identically distributed).
# Distributional families and parameters for the true model
freq = "cyclical_poisson"
sev = "pareto"
# Poisson process parameters
a = 1
b = 5
c = 1 / 50
# Pareto claim size parameters
μ = 10
α = 2
trueTheta = (a, b, c, μ, α)
# Setting the time horizon
T = 250
# Simulating the claim data
rg = rnd.default_rng(123)
freqs, sevs = abc.simulate_claim_data(rg, T, freq, sev, trueTheta)
# Simulating the observed data
psi = abc.Psi("sum")
xData = abc.compute_psi(freqs, sevs, psi)
Plot the cyclical observations against their expected values¶
t = np.arange(T)
cosTerms = np.cos(2 * np.pi * c * t) - np.cos(2 * np.pi * (t + 1) * c)
freqMean = a + b + (b / (2 * np.pi * c)) * cosTerms
severityMean = μ * α / (α - 1)
xMean = freqMean * severityMean
plt.plot(t, xMean, label=r"$\mathbb{E}[X_s]$", c="k")
plt.plot(t, xData, label="Observed $x_s$", alpha=0.8)
plt.xlabel("$t$")
plt.legend(ncol=2)
sns.despine()
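The `severityMean` formula above, $\mu\alpha/(\alpha - 1)$, is the mean of a Pareto Type I distribution with scale $\mu$ and shape $\alpha > 1$. A quick Monte Carlo check (a standalone sketch; it uses $\alpha = 3$ rather than the example's $\alpha = 2$, since at $\alpha = 2$ the variance is infinite and the sample mean converges slowly):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, alpha = 10.0, 3.0  # shape > 2 gives finite variance, so fast convergence

# Inverse-transform sampling from Pareto Type I:
# if U ~ Unif(0, 1] then mu * U**(-1/alpha) has scale mu and shape alpha.
u = 1.0 - rng.random(2_000_000)
samples = mu * u ** (-1 / alpha)

print(samples.mean())             # approximately mu * alpha / (alpha - 1)
print(mu * alpha / (alpha - 1))   # 15.0
```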
Try a time series distance function¶
As we normally assume that we observe independent and identically distributed data, the default distance function in smc
does not take the order of the observations into account.
For this example, we swap the distance measure for one designed to compare time series.
Here, we use the dynamic time warping distance, with the implementation from the dtaidistance package. To illustrate this distance, we simulate some more data from the true data-generating process:
rg = rnd.default_rng(456)
freqs2, sevs2 = abc.simulate_claim_data(rg, T, freq, sev, trueTheta)
xData2 = abc.compute_psi(freqs2, sevs2, psi)
Plotting the original $x_s$ data alongside this new $x_s'$ data:
plt.plot(t, xData, label="Observed $x_s$")
plt.plot(t, xData2, label="Observed $x_s'$", alpha=0.8)
plt.xlabel("$t$")
plt.legend(ncol=2)
sns.despine()
Dynamic time warping stretches and compresses time in one series to better match the other. The dtaidistance
package has a built-in visualisation to show this matching between the series:
from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
path = dtw.warping_path(xData, xData2)
fig = dtwvis.plot_warping(xData, xData2, path)
sns.despine()
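To make concrete what this distance computes, here is a minimal pure-NumPy dynamic-programming sketch of the unconstrained DTW distance between two short series, one a delayed copy of the other. (`dtaidistance`'s `dtw.distance` is an optimized version of this idea, with extra options such as windowing and pruning.)

```python
import numpy as np

def dtw_distance(s1, s2):
    """Unconstrained DTW between two 1-D series (minimal sketch)."""
    n, m = len(s1), len(s2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s1[i - 1] - s2[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

s1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
s2 = np.array([0.0, 0.0, 1.0, 2.0, 1.0])  # s1 delayed by one step
print(dtw_distance(s1, s2))     # 1.0: warping absorbs most of the lag
print(np.linalg.norm(s1 - s2))  # 2.0: pointwise distance penalizes it fully
```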
Use ABC to fit the data¶
With this data, we create objects to represent the data-generating process (the model) and the prior distribution. The priors are set as
$$ a \sim \mathsf{Unif}(0, 50), \quad b \sim \mathsf{Unif}(0, 50), \quad c \sim \mathsf{Unif}(\frac{1}{1000}, \frac{1}{10}) $$ $$ \mu \sim \mathsf{Unif}(0, 50), \quad \text{ and } \quad \alpha \sim \mathsf{Unif}(1, 10) \,. $$
model = abc.Model("cyclical_poisson", "pareto", psi)
params = ("$a$", "$b$", "$c$", "$\\mu$", "$\\alpha$")
prior = abc.IndependentUniformPrior(
[(0, 50), (0, 50), (1 / 1000, 1 / 10), (0, 50), (1, 10)], params
)
numIters = 10
popSize = 200
%time fit = abc.smc(numIters, popSize, xData, model, prior, distance=dtw.distance, numProcs=8, seed=1)
CPU times: user 374 ms, sys: 128 ms, total: 502 ms Wall time: 1min 46s
These particles all generated fake data within the following distance to the observed data:
np.max(fit.dists)
886.1586423863746
Plotting the fitted ABC posterior:
abc.plot_posteriors(
fit,
prior,
refLines=trueTheta,
)
This ABC posterior is quite accurate: each marginal concentrates around the true parameter value shown by the reference line.