Categorical Variables

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Patrick Laub

Preprocessing

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Keras model methods

  • compile: specify the loss function and optimiser
  • fit: learn the parameters of the model
  • predict: apply the model
  • evaluate: apply the model and calculate a metric


random.seed(12)
model = Sequential()
model.add(Dense(1, activation="relu"))
model.compile("adam", "poisson")
model.fit(X_train, y_train, verbose=0)
y_pred = model.predict(X_val, verbose=0)
print(model.evaluate(X_val, y_val, verbose=0))
4.944334506988525

Scikit-learn model methods

  • fit: learn the parameters of the model
  • predict: apply the model
  • score: apply the model and calculate a metric


model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
print(model.score(X_val, y_val))
-0.666850597951445

Scikit-learn preprocessing methods

  • fit: learn the parameters of the transformation
  • transform: apply the transformation
  • fit_transform: learn the parameters and apply the transformation

scaler = StandardScaler()
scaler.fit(X_train)
X_train_sc = scaler.transform(X_train)
X_val_sc = scaler.transform(X_val)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.mean(axis=0))
print(X_train_sc.std(axis=0))
print(X_val_sc.mean(axis=0))
print(X_val_sc.std(axis=0))
[ 2.97e-17 -2.18e-17  1.98e-17 -5.65e-17]
[1. 1. 1. 1.]
[-0.34  0.07 -0.27 -0.82]
[1.01 0.66 1.26 0.89]
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_val_sc = scaler.transform(X_val)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.mean(axis=0))
print(X_train_sc.std(axis=0))
print(X_val_sc.mean(axis=0))
print(X_val_sc.std(axis=0))
[ 2.97e-17 -2.18e-17  1.98e-17 -5.65e-17]
[1. 1. 1. 1.]
[-0.34  0.07 -0.27 -0.82]
[1.01 0.66 1.26 0.89]

Summary of the splitting

Dataframes & arrays

X_test.head(3)
x1 x2 x3 x4
83 0.075805 -0.677162 0.975120 -0.147057
53 0.954002 0.651391 -0.315269 0.758969
70 0.113517 0.662131 1.586017 -1.237815
X_test_sc
array([[ 0.13, -0.64,  0.89, -0.4 ],
       [ 1.15,  0.67, -0.44,  0.62],
       [ 0.18,  0.68,  1.52, -1.62],
       [ 0.77, -0.82, -1.22,  0.31],
       [ 0.06,  1.46, -0.39,  2.83],
       [ 2.21,  0.49, -1.34,  0.51],
       [-0.57,  0.53, -0.02,  0.86],
       [ 0.16,  0.61, -0.96,  2.12],
       [ 0.9 ,  0.2 , -0.23, -0.57],
       [ 0.62, -0.11,  0.55,  1.48],
       [ 0.  ,  1.57, -2.81,  0.69],
       [ 0.96, -0.87,  1.33, -1.81],
       [-0.64,  0.87,  0.25, -1.01],
       [-1.19,  0.49, -1.06,  1.51],
       [ 0.65,  1.54, -0.23,  0.22],
       [-1.13,  0.34, -1.05, -1.82],
       [ 0.02,  0.14,  1.2 , -0.9 ],
       [ 0.68, -0.17, -0.34,  1.  ],
       [ 0.44, -1.72,  0.22, -0.66],
       [ 0.73,  2.19, -1.13, -0.87],
       [ 2.73, -1.82,  0.59, -2.04],
       [ 1.04, -0.13, -0.13, -1.36],
       [-0.14,  0.43,  1.82, -0.04],
       [-0.24, -0.72, -1.03, -1.15],
       [ 0.28, -0.57, -0.04, -0.66]])

Note

By default, when you pass a DataFrame to an sklearn transformer, it returns a numpy array.
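To keep the column names without changing any global settings, one option is to wrap the array back into a DataFrame yourself. A minimal sketch, reusing X_train and the scaled array X_train_sc from earlier (X_train_sc_df is just an illustrative name):

import pandas as pd

# Rebuild a DataFrame from the scaled array, reusing the original column names and row index.
X_train_sc_df = pd.DataFrame(X_train_sc, columns=X_train.columns, index=X_train.index)
X_train_sc_df.head(3)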

Keep as a DataFrame


From scikit-learn 1.2:

from sklearn import set_config
set_config(transform_output="pandas")

imp = SimpleImputer()
X_train_imp = imp.fit_transform(X_train)
X_val_imp = imp.transform(X_val)
X_test_imp = imp.transform(X_test)
X_test_imp
x1 x2 x3 x4
83 0.075805 -0.677162 0.975120 -0.147057
53 0.954002 0.651391 -0.315269 0.758969
... ... ... ... ...
42 -0.245388 -0.753736 -0.889514 -0.815810
69 0.199060 -0.600217 0.069802 -0.385314

25 rows × 4 columns

French Motor Claims & Poisson Regression

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

French motor dataset

Download the dataset if we don’t have it already.

from pathlib import Path
from sklearn.datasets import fetch_openml

if not Path("french-motor.csv").exists():
    freq = fetch_openml(data_id=41214, as_frame=True).frame
    freq.to_csv("french-motor.csv", index=False)
else:
    freq = pd.read_csv("french-motor.csv")

freq

French motor dataset

IDpol ClaimNb Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.0 1.0 0.10000 D 5.0 0.0 55.0 50.0 B12 Regular 1217.0 R82
1 3.0 1.0 0.77000 D 5.0 0.0 55.0 50.0 B12 Regular 1217.0 R82
2 5.0 1.0 0.75000 B 6.0 2.0 52.0 50.0 B12 Diesel 54.0 R22
... ... ... ... ... ... ... ... ... ... ... ... ...
678010 6114328.0 0.0 0.00274 D 6.0 2.0 45.0 50.0 B12 Diesel 1323.0 R82
678011 6114329.0 0.0 0.00274 B 4.0 0.0 60.0 50.0 B12 Regular 95.0 R26
678012 6114330.0 0.0 0.00274 B 7.0 6.0 29.0 54.0 B12 Diesel 65.0 R72

678013 rows × 12 columns

Data dictionary

  • IDpol: policy number (unique identifier)
  • ClaimNb: number of claims on the given policy
  • Exposure: total exposure in yearly units
  • Area: area code (categorical, ordinal)
  • VehPower: power of the car (categorical, ordinal)
  • VehAge: age of the car in years
  • DrivAge: age of the (most common) driver in years
  • BonusMalus: bonus-malus level between 50 and 230 (with reference level 100)
  • VehBrand: car brand (categorical, nominal)
  • VehGas: diesel or regular fuel car (binary)
  • Density: number of inhabitants per km² in the city where the driver lives
  • Region: regions in France (prior to 2016)

The model

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Assume the distribution Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i))

We have \mathbb{E} Y_i = \lambda(\mathbf{x}_i). The NN takes \mathbf{x}_i & predicts \mathbb{E} Y_i.

Note

For insurance, this is a bit weird. The exposures are different for each policy.

\lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Normally, \text{Exposure}_i \not\in \mathbf{x}_i, and \lambda(\mathbf{x}_i) is the expected rate per year, then Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).
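To see where the training loss comes from, write out the negative log-likelihood of one Poisson observation:

-\log p(y_i \mid \lambda(\mathbf{x}_i)) = \lambda(\mathbf{x}_i) - y_i \log \lambda(\mathbf{x}_i) + \log(y_i!).

The \log(y_i!) term does not depend on the network's weights, so minimising the first two terms is enough; that remaining expression is exactly Keras' "poisson" loss (shown later) with y_pred = \lambda(\mathbf{x}_i) and y_true = y_i.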

Where are things defined?

In Keras, string arguments are a convenient shorthand for specific functions or settings.

model = Sequential([
    Dense(30, activation="relu"),
    Dense(1, activation="exponential")
])

is the same as

from keras.activations import relu, exponential

model = Sequential([
    Dense(30, activation=relu),
    Dense(1, activation=exponential)
])
x = [-1.0, 0.0, 1.0]
print(relu(x))
print(exponential(x))
tf.Tensor([0. 0. 1.], shape=(3,), dtype=float32)
tf.Tensor([0.37 1.   2.72], shape=(3,), dtype=float32)

String arguments to .compile

When we run

model.compile(optimizer="adam", loss="poisson")

it is equivalent to

from keras.losses import poisson
from keras.optimizers import Adam

model.compile(optimizer=Adam(), loss=poisson)

Why do this manually? To adjust the object:

optimizer = Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss="poisson")

or to get help.

Keras’ “poisson” loss

help(keras.losses.poisson)
Help on function poisson in module keras.src.losses.losses:

poisson(y_true, y_pred)
    Computes the Poisson loss between y_true and y_pred.
    
    Formula:
    
    ```python
    loss = y_pred - y_true * log(y_pred)
    ```
    
    Args:
        y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
        y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.
    
    Returns:
        Poisson loss values with shape = `[batch_size, d0, .. dN-1]`.
    
    Example:
    
    >>> y_true = np.random.randint(0, 2, size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = keras.losses.poisson(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> y_pred = y_pred + 1e-7
    >>> assert np.allclose(
    ...     loss, np.mean(y_pred - y_true * np.log(y_pred), axis=-1),
    ...     atol=1e-5)
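For example, evaluating that formula by hand for a single observation and comparing it against the Keras loss (a small check; the numbers are made up):

import numpy as np
import keras

y_true = np.array([2.0])
y_pred = np.array([1.5])

# By hand: y_pred - y_true * log(y_pred)
print(y_pred - y_true * np.log(y_pred))      # roughly 0.689
# The Keras version averages the same quantity over the last axis.
print(keras.losses.poisson(y_true, y_pred))  # also roughly 0.689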

Ordinal Variables

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Subsample and split

freq = freq.drop("IDpol", axis=1).head(25_000)

X_train, X_test, y_train, y_test = train_test_split(
  freq.drop("ClaimNb", axis=1), freq["ClaimNb"], random_state=2023)

# Reset each index to start at 0 again.
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)

What values do we see in the data?

X_train["Area"].value_counts()
X_train["VehBrand"].value_counts()
X_train["VehGas"].value_counts()
X_train["Region"].value_counts()
Area
C    5507
D    4113
A    3527
E    2769
B    2359
F     475
Name: count, dtype: int64
VehBrand
B1     5069
B2     4838
B12    3708
       ... 
B13     336
B11     284
B14     136
Name: count, Length: 11, dtype: int64
VehGas
Regular    10773
Diesel      7977
Name: count, dtype: int64
Region
R24    6498
R82    2119
R11    1909
       ... 
R21      90
R42      55
R43      26
Name: count, Length: 22, dtype: int64

Ordinal & binary categories are easy

from sklearn.preprocessing import OrdinalEncoder
oe = OrdinalEncoder()
oe.fit(X_train[["Area", "VehGas"]])
oe.categories_
[array(['A', 'B', 'C', 'D', 'E', 'F'], dtype=object),
 array(['Diesel', 'Regular'], dtype=object)]
for i, area in enumerate(oe.categories_[0]):
    print(f"The Area value {area} gets turned into {i}.")
The Area value A gets turned into 0.
The Area value B gets turned into 1.
The Area value C gets turned into 2.
The Area value D gets turned into 3.
The Area value E gets turned into 4.
The Area value F gets turned into 5.
for i, gas in enumerate(oe.categories_[1]):
    print(f"The VehGas value {gas} gets turned into {i}.")
The VehGas value Diesel gets turned into 0.
The VehGas value Regular gets turned into 1.

Ordinal encoded values

X_train_ord = oe.transform(X_train[["Area", "VehGas"]])
X_test_ord = oe.transform(X_test[["Area", "VehGas"]])
X_train[["Area", "VehGas"]].head()
Area VehGas
0 C Diesel
1 C Regular
2 E Regular
3 D Diesel
4 A Regular
X_train_ord.head()
Area VehGas
0 2.0 0.0
1 2.0 1.0
2 4.0 1.0
3 3.0 0.0
4 0.0 1.0

Train on ordinal encoded values

random.seed(12)
model = Sequential([
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_ord, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])
hist.history["val_loss"][-1]
Epoch 22: early stopping
0.7821308970451355


What about adding the continuous variables back in? Use a scikit-learn column transformer for that.

Preprocess ordinal & continuous

from sklearn.compose import make_column_transformer

ct = make_column_transformer(
  (OrdinalEncoder(), ["Area", "VehGas"]),
  ("drop", ["VehBrand", "Region"]),
  remainder=StandardScaler()
)

X_train_ct = ct.fit_transform(X_train)
X_train.head(3)
Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.00 C 6.0 2.0 66.0 50.0 B2 Diesel 124.0 R24
1 0.36 C 4.0 10.0 22.0 100.0 B1 Regular 377.0 R93
2 0.02 E 12.0 8.0 44.0 60.0 B3 Regular 5628.0 R11
X_train_ct.head(3)
ordinalencoder__Area ordinalencoder__VehGas remainder__Exposure remainder__VehPower remainder__VehAge remainder__DrivAge remainder__BonusMalus remainder__Density
0 2.0 0.0 1.126979 -0.165005 -0.844589 1.451036 -0.637179 -0.366980
1 2.0 1.0 -0.590896 -1.228181 0.586255 -1.548692 2.303010 -0.302700
2 4.0 1.0 -1.503517 3.024524 0.228544 -0.048828 -0.049141 1.031432

Preprocess ordinal & continuous II

from sklearn.compose import make_column_transformer

ct = make_column_transformer(
  (OrdinalEncoder(), ["Area", "VehGas"]),
  ("drop", ["VehBrand", "Region"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_train.head(3)
Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.00 C 6.0 2.0 66.0 50.0 B2 Diesel 124.0 R24
1 0.36 C 4.0 10.0 22.0 100.0 B1 Regular 377.0 R93
2 0.02 E 12.0 8.0 44.0 60.0 B3 Regular 5628.0 R11
X_train_ct.head(3)
Area VehGas Exposure VehPower VehAge DrivAge BonusMalus Density
0 2.0 0.0 1.126979 -0.165005 -0.844589 1.451036 -0.637179 -0.366980
1 2.0 1.0 -0.590896 -1.228181 0.586255 -1.548692 2.303010 -0.302700
2 4.0 1.0 -1.503517 3.024524 0.228544 -0.048828 -0.049141 1.031432

Categorical Variables & Entity Embeddings

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Region column

French Administrative Regions

One-hot encoding

oe = OneHotEncoder(sparse_output=False)
X_train_oh = oe.fit_transform(X_train[["Region"]])
X_test_oh = oe.transform(X_test[["Region"]])
print(list(X_train["Region"][:5]))
X_train_oh.head()
['R24', 'R93', 'R11', 'R42', 'R24']
Region_R11 Region_R21 Region_R22 Region_R23 Region_R24 Region_R25 Region_R26 Region_R31 Region_R41 Region_R42 ... Region_R53 Region_R54 Region_R72 Region_R73 Region_R74 Region_R82 Region_R83 Region_R91 Region_R93 Region_R94
0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 22 columns

Train on one-hot inputs

num_regions = len(oe.categories_[0])

random.seed(12)
model = Sequential([
  Dense(2, input_dim=num_regions),
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_oh, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])                       
hist.history["val_loss"][-1]
Epoch 12: early stopping
0.7526934146881104

Consider the first layer

every_category = pd.DataFrame(np.eye(num_regions), columns=oe.categories_[0])
every_category.head(3)
R11 R21 R22 R23 R24 R25 R26 R31 R41 R42 ... R53 R54 R72 R73 R74 R82 R83 R91 R93 R94
0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 rows × 22 columns

# Put this through the first layer of the model
X = every_category.to_numpy()
model.layers[0](X)
<tf.Tensor: shape=(22, 2), dtype=float32, numpy=
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)>

The first layer

layer = model.layers[0]
W, b = layer.get_weights()
X.shape, W.shape, b.shape
((22, 22), (22, 2), (2,))
X @ W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]])
W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)
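Since X here is the 22×22 identity matrix, multiplying it by W just selects each row of W in turn, which is why the two results agree:

print(np.allclose(X @ W + b, W + b))  # True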

Just a look-up operation

display(list(oe.categories_[0]))
['R11',
 'R21',
 'R22',
 'R23',
 'R24',
 'R25',
 'R26',
 'R31',
 'R41',
 'R42',
 'R43',
 'R52',
 'R53',
 'R54',
 'R72',
 'R73',
 'R74',
 'R82',
 'R83',
 'R91',
 'R93',
 'R94']
W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)

Turn the region into an index

oe = OrdinalEncoder()
X_train_reg = oe.fit_transform(X_train[["Region"]])
X_test_reg = oe.transform(X_test[["Region"]])

for i, reg in enumerate(oe.categories_[0][:3]):
  print(f"The Region value {reg} gets turned into {i}.")
The Region value R11 gets turned into 0.
The Region value R21 gets turned into 1.
The Region value R22 gets turned into 2.

Embedding

from keras.layers import Embedding
num_regions = len(np.unique(X_train[["Region"]]))

random.seed(12)
model = Sequential([
  Embedding(input_dim=num_regions, output_dim=2),
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

Fitting that model

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_reg, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])
hist.history["val_loss"][-1]
Epoch 5: early stopping
0.7526668906211853
model.layers
[<Embedding name=embedding, built=True>, <Dense name=dense_8, built=True>]

Keras’ Embedding Layer

model.layers[0].get_weights()[0]
array([[-0.12, -0.11],
       [ 0.03, -0.  ],
       [-0.02,  0.01],
       [-0.25, -0.14],
       [-0.28, -0.32],
       [-0.3 , -0.22],
       [-0.31, -0.28],
       [ 0.1 ,  0.07],
       [-0.61, -0.51],
       [-0.06, -0.12],
       [-0.17, -0.14],
       [-0.6 , -0.46],
       [-0.22, -0.27],
       [-0.59, -0.5 ],
       [-0.  ,  0.02],
       [ 0.07,  0.06],
       [-0.31, -0.28],
       [-0.4 , -0.34],
       [-0.16, -0.15],
       [ 0.01,  0.05],
       [ 0.08,  0.03],
       [ 0.08,  0.13]], dtype=float32)
X_train["Region"].head(4)
0    R24
1    R93
2    R11
3    R42
Name: Region, dtype: object
X_sample = X_train_reg[:4].to_numpy()
X_sample
array([[ 4.],
       [20.],
       [ 0.],
       [ 9.]])
enc_tensor = model.layers[0](X_sample)
keras.ops.convert_to_numpy(enc_tensor).squeeze()
array([[-0.28, -0.32],
       [ 0.08,  0.03],
       [-0.12, -0.11],
       [-0.06, -0.12]], dtype=float32)

The learned embeddings

points = model.layers[0].get_weights()[0]
plt.scatter(points[:,0], points[:,1])
for i in range(num_regions):
  plt.text(points[i,0]+0.01, points[i,1], s=oe.categories_[0][i])

Entity embeddings

Embeddings will gradually improve during training.

Embeddings & other inputs

Illustration of a neural network with both continuous and categorical inputs.

We can’t do this with Sequential models…

Keras’ Functional API

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Converting Sequential models

from keras.models import Model
from keras.layers import Input
random.seed(12)

model = Sequential([
  Dense(30, "relu"),
  Dense(1, "exponential")
])

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_ord, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7844388484954834
random.seed(12)

inputs = Input(shape=(2,))
x = Dense(30, "relu")(inputs)
out = Dense(1, "exponential")(x)
model = Model(inputs, out)

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_ord, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7844388484954834

Cf. one-length tuples.
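That is, shape=(2,) is a one-length tuple containing the number 2, whereas (2) is just the number 2 in parentheses:

print(type((2,)))  # <class 'tuple'>
print(type((2)))   # <class 'int'>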

Wide & Deep network

An illustration of the wide & deep network architecture.

Add a skip connection from the input layer to the output layer.

from keras.layers import Concatenate

inp = Input(shape=X_train.shape[1:])
hidden1 = Dense(30, "relu")(inp)
hidden2 = Dense(30, "relu")(hidden1)
concat = Concatenate()(
  [inp, hidden2])
output = Dense(1)(concat)
model = Model(
    inputs=[inp],
    outputs=[output])

Naming the layers

For complex networks, it is often useful to give meaningful names to the layers.

input_ = Input(shape=X_train.shape[1:], name="input")
hidden1 = Dense(30, activation="relu", name="hidden1")(input_)
hidden2 = Dense(30, activation="relu", name="hidden2")(hidden1)
concat = Concatenate(name="combined")([input_, hidden2])
output = Dense(1, name="output")(concat)
model = Model(inputs=[input_], outputs=[output])

Inspecting a complex model

from keras.utils import plot_model
plot_model(model, show_layer_names=True)

model.summary(line_length=75)
Model: "functional_10"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)         Output Shape         Param #  Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input (InputLayer)  │ (None, 10)        │         0 │ -                 │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden1 (Dense)     │ (None, 30)        │       330 │ input[0][0]       │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden2 (Dense)     │ (None, 30)        │       930 │ hidden1[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ combined            │ (None, 40)        │         0 │ input[0][0],      │
│ (Concatenate)       │                   │           │ hidden2[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ output (Dense)      │ (None, 1)         │        41 │ combined[0][0]    │
└─────────────────────┴───────────────────┴───────────┴───────────────────┘
 Total params: 1,301 (5.08 KB)
 Trainable params: 1,301 (5.08 KB)
 Non-trainable params: 0 (0.00 B)

French Motor Dataset with Embeddings

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

The desired architecture

Illustration of a neural network with both continuous and categorical inputs.

Preprocess all French motor inputs

Transform the categorical variables to integers:

num_brands, num_regions = X_train.nunique()[["VehBrand", "Region"]]

ct = make_column_transformer(
  (OrdinalEncoder(), ["VehBrand", "Region", "Area", "VehGas"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_test_ct = ct.transform(X_test)

Split the brand and region data apart from the rest:

X_train_brand = X_train_ct["VehBrand"]; X_test_brand = X_test_ct["VehBrand"]
X_train_region = X_train_ct["Region"]; X_test_region = X_test_ct["Region"]
X_train_rest = X_train_ct.drop(["VehBrand", "Region"], axis=1)
X_test_rest = X_test_ct.drop(["VehBrand", "Region"], axis=1)

Organise the inputs

Make a Keras Input for: vehicle brand, region, & others.

veh_brand = Input(shape=(1,), name="vehBrand")
region = Input(shape=(1,), name="region")
other_inputs = Input(shape=X_train_rest.shape[1:], name="otherInputs")

Create embeddings and join them with the other inputs.

from keras.layers import Reshape

random.seed(1337)
veh_brand_ee = Embedding(input_dim=num_brands, output_dim=2,
    name="vehBrandEE")(veh_brand)                                
veh_brand_ee = Reshape(target_shape=(2,))(veh_brand_ee)

region_ee = Embedding(input_dim=num_regions, output_dim=2,
    name="regionEE")(region)
region_ee = Reshape(target_shape=(2,))(region_ee)

x = Concatenate(name="combined")([veh_brand_ee, region_ee, other_inputs])

Complete the model and fit it

Feed the combined embeddings & continuous inputs to some normal dense layers.

x = Dense(30, "relu", name="hidden")(x)
out = Dense(1, "exponential", name="out")(x)

model = Model([veh_brand, region, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

hist = model.fit((X_train_brand, X_train_region, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[EarlyStopping(patience=5)], validation_split=0.2)
np.min(hist.history["val_loss"])
0.6692155599594116

Plotting this model

plot_model(model, show_layer_names=True)

Why we need to reshape

plot_model(model, show_layer_names=True, show_shapes=True)
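Each embedding takes one integer index per policy and returns a 1×2 block, so its raw output has shape (None, 1, 2), whereas other_inputs is 2D. Concatenate needs its inputs to match in every axis except the concatenation axis, hence the Reshape down to (None, 2). A small standalone sketch (the vocabulary size of 5 here is made up for illustration):

from keras.layers import Input, Embedding, Reshape

demo_input = Input(shape=(1,))
demo_embedded = Embedding(input_dim=5, output_dim=2)(demo_input)
print(demo_embedded.shape)                               # (None, 1, 2)
print(Reshape(target_shape=(2,))(demo_embedded).shape)   # (None, 2)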

Scale By Exposure

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Two different models

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Model 1: Say Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i)).

But, the exposures are different for each policy. \lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Model 2: Say Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).

Now, \text{Exposure}_i \not\in \mathbf{x}_i, and \lambda(\mathbf{x}_i) is the rate per year.
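Under Model 2, \mathbb{E} Y_i = \text{Exposure}_i \times \lambda(\mathbf{x}_i), so the per-observation Poisson loss becomes

\text{Exposure}_i \, \lambda(\mathbf{x}_i) - y_i \log\bigl( \text{Exposure}_i \, \lambda(\mathbf{x}_i) \bigr),

which is the neural network analogue of a Poisson GLM with \log(\text{Exposure}_i) as an offset. Below, the network outputs \lambda(\mathbf{x}_i) and a Multiply layer scales it by the policy's exposure.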

Just take continuous variables

ct = make_column_transformer(
  ("passthrough", ["Exposure"]),
  ("drop", ["VehBrand", "Region", "Area", "VehGas"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_test_ct = ct.transform(X_test)

Split exposure apart from the rest:

X_train_exp = X_train_ct["Exposure"]; X_test_exp = X_test_ct["Exposure"]
X_train_rest = X_train_ct.drop("Exposure", axis=1)
X_test_rest = X_test_ct.drop("Exposure", axis=1)

Organise the inputs:

exposure = Input(shape=(1,), name="exposure")
other_inputs = Input(shape=X_train_rest.shape[1:], name="otherInputs")

Make & fit the model

Feed the continuous inputs to some normal dense layers.

random.seed(1337)
x = Dense(30, "relu", name="hidden1")(other_inputs)
x = Dense(30, "relu", name="hidden2")(x)
lambda_ = Dense(1, "exponential", name="lambda")(x)
from keras.layers import Multiply

out = Multiply(name="out")([lambda_, exposure])
model = Model([exposure, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(patience=10, restore_best_weights=True, verbose=1)
hist = model.fit((X_train_exp, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[es], validation_split=0.2)
np.min(hist.history["val_loss"])
Epoch 40: early stopping
Restoring model weights from the end of the best epoch: 30.
0.8829042911529541

Plot the model

plot_model(model, show_layer_names=True)

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.24.0

keras     : 3.3.3
matplotlib: 3.8.4
numpy     : 1.26.4
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.11.0
torch     : 2.0.1
tensorflow: 2.16.1
tf_keras  : 2.16.0

Glossary

  • entity embeddings
  • Input layer
  • Keras functional API
  • Reshape layer
  • skip connection
  • wide & deep network structure