Categorical Variables

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Patrick Laub

Preprocessing

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Keras model methods

  • compile: specify the loss function and optimiser
  • fit: learn the parameters of the model
  • predict: apply the model
  • evaluate: apply the model and calculate a metric


random.seed(12)
model = Sequential()
model.add(Dense(1, activation="relu"))
model.compile("adam", "poisson")
model.fit(X_train, y_train, verbose=0)
y_pred = model.predict(X_val, verbose=0)
print(model.evaluate(X_val, y_val, verbose=0))
4.944334506988525

Scikit-learn model methods

  • fit: learn the parameters of the model
  • predict: apply the model
  • score: apply the model and calculate a metric


model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
print(model.score(X_val, y_val))
-0.666850597951445

Scikit-learn preprocessing methods

  • fit: learn the parameters of the transformation
  • transform: apply the transformation
  • fit_transform: learn the parameters and apply the transformation

scaler = StandardScaler()
scaler.fit(X_train)
X_train_sc = scaler.transform(X_train)
X_val_sc = scaler.transform(X_val)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.mean(axis=0))
print(X_train_sc.std(axis=0))
print(X_val_sc.mean(axis=0))
print(X_val_sc.std(axis=0))
[ 2.97e-17 -2.18e-17  1.98e-17 -5.65e-17]
[1. 1. 1. 1.]
[-0.34  0.07 -0.27 -0.82]
[1.01 0.66 1.26 0.89]
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_val_sc = scaler.transform(X_val)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.mean(axis=0))
print(X_train_sc.std(axis=0))
print(X_val_sc.mean(axis=0))
print(X_val_sc.std(axis=0))
[ 2.97e-17 -2.18e-17  1.98e-17 -5.65e-17]
[1. 1. 1. 1.]
[-0.34  0.07 -0.27 -0.82]
[1.01 0.66 1.26 0.89]

Summary of the splitting

Dataframes & arrays

X_test.head(3)
x1 x2 x3 x4
83 0.075805 -0.677162 0.975120 -0.147057
53 0.954002 0.651391 -0.315269 0.758969
70 0.113517 0.662131 1.586017 -1.237815
X_test_sc
array([[ 0.13, -0.64,  0.89, -0.4 ],
       [ 1.15,  0.67, -0.44,  0.62],
       [ 0.18,  0.68,  1.52, -1.62],
       [ 0.77, -0.82, -1.22,  0.31],
       [ 0.06,  1.46, -0.39,  2.83],
       [ 2.21,  0.49, -1.34,  0.51],
       [-0.57,  0.53, -0.02,  0.86],
       [ 0.16,  0.61, -0.96,  2.12],
       [ 0.9 ,  0.2 , -0.23, -0.57],
       [ 0.62, -0.11,  0.55,  1.48],
       [ 0.  ,  1.57, -2.81,  0.69],
       [ 0.96, -0.87,  1.33, -1.81],
       [-0.64,  0.87,  0.25, -1.01],
       [-1.19,  0.49, -1.06,  1.51],
       [ 0.65,  1.54, -0.23,  0.22],
       [-1.13,  0.34, -1.05, -1.82],
       [ 0.02,  0.14,  1.2 , -0.9 ],
       [ 0.68, -0.17, -0.34,  1.  ],
       [ 0.44, -1.72,  0.22, -0.66],
       [ 0.73,  2.19, -1.13, -0.87],
       [ 2.73, -1.82,  0.59, -2.04],
       [ 1.04, -0.13, -0.13, -1.36],
       [-0.14,  0.43,  1.82, -0.04],
       [-0.24, -0.72, -1.03, -1.15],
       [ 0.28, -0.57, -0.04, -0.66]])

Note

By default, when you pass a DataFrame to an sklearn transformer, it returns a numpy array.
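To keep the column names without changing any global settings, one option is to wrap the array back into a DataFrame yourself. A minimal sketch, reusing X_train and the scaled array X_train_sc from earlier (X_train_sc_df is just an illustrative name):

import pandas as pd

# Rebuild a DataFrame from the scaled array, reusing the original column names and row index.
X_train_sc_df = pd.DataFrame(X_train_sc, columns=X_train.columns, index=X_train.index)
X_train_sc_df.head(3)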

Keep as a DataFrame


From scikit-learn 1.2:

from sklearn import set_config
set_config(transform_output="pandas")

imp = SimpleImputer()
X_train_imp = imp.fit_transform(X_train)
X_val_imp = imp.transform(X_val)
X_test_imp = imp.transform(X_test)
X_test_imp
x1 x2 x3 x4
83 0.075805 -0.677162 0.975120 -0.147057
53 0.954002 0.651391 -0.315269 0.758969
... ... ... ... ...
42 -0.245388 -0.753736 -0.889514 -0.815810
69 0.199060 -0.600217 0.069802 -0.385314

25 rows × 4 columns

French Motor Claims & Poisson Regression

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

French motor dataset

Download the dataset if we don’t have it already.

from pathlib import Path
from sklearn.datasets import fetch_openml

if not Path("french-motor.csv").exists():
    freq = fetch_openml(data_id=41214, as_frame=True).frame
    freq.to_csv("french-motor.csv", index=False)
else:
    freq = pd.read_csv("french-motor.csv")

freq

French motor dataset

IDpol ClaimNb Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.0 1.0 0.10000 D 5.0 0.0 55.0 50.0 B12 Regular 1217.0 R82
1 3.0 1.0 0.77000 D 5.0 0.0 55.0 50.0 B12 Regular 1217.0 R82
2 5.0 1.0 0.75000 B 6.0 2.0 52.0 50.0 B12 Diesel 54.0 R22
... ... ... ... ... ... ... ... ... ... ... ... ...
678010 6114328.0 0.0 0.00274 D 6.0 2.0 45.0 50.0 B12 Diesel 1323.0 R82
678011 6114329.0 0.0 0.00274 B 4.0 0.0 60.0 50.0 B12 Regular 95.0 R26
678012 6114330.0 0.0 0.00274 B 7.0 6.0 29.0 54.0 B12 Diesel 65.0 R72

678013 rows × 12 columns

Data dictionary

  • IDpol: policy number (unique identifier)
  • ClaimNb: number of claims on the given policy
  • Exposure: total exposure in yearly units
  • Area: area code (categorical, ordinal)
  • VehPower: power of the car (categorical, ordinal)
  • VehAge: age of the car in years
  • DrivAge: age of the (most common) driver in years
  • BonusMalus: bonus-malus level between 50 and 230 (with reference level 100)
  • VehBrand: car brand (categorical, nominal)
  • VehGas: diesel or regular fuel car (binary)
  • Density: number of inhabitants per km² in the city where the driver lives
  • Region: regions in France (prior to 2016)

The model

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Assume the distribution Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i))

We have \mathbb{E} Y_i = \lambda(\mathbf{x}_i). The NN takes \mathbf{x}_i & predicts \mathbb{E} Y_i.

Note

For insurance, this is a bit weird. The exposures are different for each policy.

\lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Normally, \text{Exposure}_i \not\in \mathbf{x}_i, and \lambda(\mathbf{x}_i) is the expected rate per year, then Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).
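To see where the training loss comes from, write out the negative log-likelihood of one Poisson observation:

-\log p(y_i \mid \lambda(\mathbf{x}_i)) = \lambda(\mathbf{x}_i) - y_i \log \lambda(\mathbf{x}_i) + \log(y_i!).

The \log(y_i!) term does not depend on the network's weights, so minimising the first two terms is enough; that remaining expression is exactly Keras' "poisson" loss (shown later) with y_pred = \lambda(\mathbf{x}_i) and y_true = y_i.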

Where are things defined?

In Keras, string arguments are a convenient shorthand for specific functions or settings.

model = Sequential([
    Dense(30, activation="relu"),
    Dense(1, activation="exponential")
])

is the same as

from keras.activations import relu, exponential

model = Sequential([
    Dense(30, activation=relu),
    Dense(1, activation=exponential)
])
x = [-1.0, 0.0, 1.0]
print(relu(x))
print(exponential(x))
tf.Tensor([0. 0. 1.], shape=(3,), dtype=float32)
tf.Tensor([0.37 1.   2.72], shape=(3,), dtype=float32)

String arguments to .compile

When we run

model.compile(optimizer="adam", loss="poisson")

it is equivalent to

from keras.losses import poisson
from keras.optimizers import Adam

model.compile(optimizer=Adam(), loss=poisson)

Why do this manually? To adjust the object:

optimizer = Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss="poisson")

or to get help.

Keras’ “poisson” loss

help(keras.losses.poisson)
Help on function poisson in module keras.src.losses.losses:

poisson(y_true, y_pred)
    Computes the Poisson loss between y_true and y_pred.
    
    Formula:
    
    ```python
    loss = y_pred - y_true * log(y_pred)
    ```
    
    Args:
        y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
        y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.
    
    Returns:
        Poisson loss values with shape = `[batch_size, d0, .. dN-1]`.
    
    Example:
    
    >>> y_true = np.random.randint(0, 2, size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = keras.losses.poisson(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> y_pred = y_pred + 1e-7
    >>> assert np.allclose(
    ...     loss, np.mean(y_pred - y_true * np.log(y_pred), axis=-1),
    ...     atol=1e-5)
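For example, evaluating that formula by hand for a single observation and comparing it against the Keras loss (a small check; the numbers are made up):

import numpy as np
import keras

y_true = np.array([2.0])
y_pred = np.array([1.5])

# By hand: y_pred - y_true * log(y_pred)
print(y_pred - y_true * np.log(y_pred))      # roughly 0.689
# The Keras version averages the same quantity over the last axis.
print(keras.losses.poisson(y_true, y_pred))  # also roughly 0.689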

Ordinal Variables

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Subsample and split

freq = freq.drop("IDpol", axis=1).head(25_000)

X_train, X_test, y_train, y_test = train_test_split(
  freq.drop("ClaimNb", axis=1), freq["ClaimNb"], random_state=2023)

# Reset each index to start at 0 again.
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)

What values do we see in the data?

X_train["Area"].value_counts()
X_train["VehBrand"].value_counts()
X_train["VehGas"].value_counts()
X_train["Region"].value_counts()
Area
C    5507
D    4113
A    3527
E    2769
B    2359
F     475
Name: count, dtype: int64
VehBrand
B1     5069
B2     4838
B12    3708
       ... 
B13     336
B11     284
B14     136
Name: count, Length: 11, dtype: int64
VehGas
Regular    10773
Diesel      7977
Name: count, dtype: int64
Region
R24    6498
R82    2119
R11    1909
       ... 
R21      90
R42      55
R43      26
Name: count, Length: 22, dtype: int64

Ordinal & binary categories are easy

from sklearn.preprocessing import OrdinalEncoder
oe = OrdinalEncoder()
oe.fit(X_train[["Area", "VehGas"]])
oe.categories_
[array(['A', 'B', 'C', 'D', 'E', 'F'], dtype=object),
 array(['Diesel', 'Regular'], dtype=object)]
for i, area in enumerate(oe.categories_[0]):
    print(f"The Area value {area} gets turned into {i}.")
The Area value A gets turned into 0.
The Area value B gets turned into 1.
The Area value C gets turned into 2.
The Area value D gets turned into 3.
The Area value E gets turned into 4.
The Area value F gets turned into 5.
for i, gas in enumerate(oe.categories_[1]):
    print(f"The VehGas value {gas} gets turned into {i}.")
The VehGas value Diesel gets turned into 0.
The VehGas value Regular gets turned into 1.

Ordinal encoded values

X_train_ord = oe.transform(X_train[["Area", "VehGas"]])
X_test_ord = oe.transform(X_test[["Area", "VehGas"]])
X_train[["Area", "VehGas"]].head()
Area VehGas
0 C Diesel
1 C Regular
2 E Regular
3 D Diesel
4 A Regular
X_train_ord.head()
Area VehGas
0 2.0 0.0
1 2.0 1.0
2 4.0 1.0
3 3.0 0.0
4 0.0 1.0

Train on ordinal encoded values

random.seed(12)
model = Sequential([
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_ord, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])
hist.history["val_loss"][-1]
Epoch 22: early stopping
0.7821308970451355


What about adding the continuous variables back in? Use a scikit-learn column transformer for that.

Preprocess ordinal & continuous

from sklearn.compose import make_column_transformer

ct = make_column_transformer(
  (OrdinalEncoder(), ["Area", "VehGas"]),
  ("drop", ["VehBrand", "Region"]),
  remainder=StandardScaler()
)

X_train_ct = ct.fit_transform(X_train)
X_train.head(3)
Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.00 C 6.0 2.0 66.0 50.0 B2 Diesel 124.0 R24
1 0.36 C 4.0 10.0 22.0 100.0 B1 Regular 377.0 R93
2 0.02 E 12.0 8.0 44.0 60.0 B3 Regular 5628.0 R11
X_train_ct.head(3)
ordinalencoder__Area ordinalencoder__VehGas remainder__Exposure remainder__VehPower remainder__VehAge remainder__DrivAge remainder__BonusMalus remainder__Density
0 2.0 0.0 1.126979 -0.165005 -0.844589 1.451036 -0.637179 -0.366980
1 2.0 1.0 -0.590896 -1.228181 0.586255 -1.548692 2.303010 -0.302700
2 4.0 1.0 -1.503517 3.024524 0.228544 -0.048828 -0.049141 1.031432

Preprocess ordinal & continuous II

from sklearn.compose import make_column_transformer

ct = make_column_transformer(
  (OrdinalEncoder(), ["Area", "VehGas"]),
  ("drop", ["VehBrand", "Region"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_train.head(3)
Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.00 C 6.0 2.0 66.0 50.0 B2 Diesel 124.0 R24
1 0.36 C 4.0 10.0 22.0 100.0 B1 Regular 377.0 R93
2 0.02 E 12.0 8.0 44.0 60.0 B3 Regular 5628.0 R11
X_train_ct.head(3)
Area VehGas Exposure VehPower VehAge DrivAge BonusMalus Density
0 2.0 0.0 1.126979 -0.165005 -0.844589 1.451036 -0.637179 -0.366980
1 2.0 1.0 -0.590896 -1.228181 0.586255 -1.548692 2.303010 -0.302700
2 4.0 1.0 -1.503517 3.024524 0.228544 -0.048828 -0.049141 1.031432

Categorical Variables & Entity Embeddings

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Region column

French Administrative Regions

One-hot encoding

oe = OneHotEncoder(sparse_output=False)
X_train_oh = oe.fit_transform(X_train[["Region"]])
X_test_oh = oe.transform(X_test[["Region"]])
print(list(X_train["Region"][:5]))
X_train_oh.head()
['R24', 'R93', 'R11', 'R42', 'R24']
Region_R11 Region_R21 Region_R22 Region_R23 Region_R24 Region_R25 Region_R26 Region_R31 Region_R41 Region_R42 ... Region_R53 Region_R54 Region_R72 Region_R73 Region_R74 Region_R82 Region_R83 Region_R91 Region_R93 Region_R94
0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 22 columns

Train on one-hot inputs

num_regions = len(oe.categories_[0])

random.seed(12)
model = Sequential([
  Dense(2, input_dim=num_regions),
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_oh, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])                       
hist.history["val_loss"][-1]
Epoch 12: early stopping
0.7526934146881104

Consider the first layer

every_category = pd.DataFrame(np.eye(num_regions), columns=oe.categories_[0])
every_category.head(3)
R11 R21 R22 R23 R24 R25 R26 R31 R41 R42 ... R53 R54 R72 R73 R74 R82 R83 R91 R93 R94
0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 rows × 22 columns

# Put this through the first layer of the model
X = every_category.to_numpy()
model.layers[0](X)
<tf.Tensor: shape=(22, 2), dtype=float32, numpy=
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)>

The first layer

layer = model.layers[0]
W, b = layer.get_weights()
X.shape, W.shape, b.shape
((22, 22), (22, 2), (2,))
X @ W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]])
W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)
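Since X here is the 22×22 identity matrix, multiplying it by W just selects each row of W in turn, which is why the two results agree:

print(np.allclose(X @ W + b, W + b))  # True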

Just a look-up operation

display(list(oe.categories_[0]))
['R11',
 'R21',
 'R22',
 'R23',
 'R24',
 'R25',
 'R26',
 'R31',
 'R41',
 'R42',
 'R43',
 'R52',
 'R53',
 'R54',
 'R72',
 'R73',
 'R74',
 'R82',
 'R83',
 'R91',
 'R93',
 'R94']
W + b
array([[-0.21, -0.14],
       [ 0.21, -0.17],
       [-0.22,  0.1 ],
       [-0.83,  0.1 ],
       [-0.01, -0.66],
       [-0.65, -0.13],
       [-0.36, -0.41],
       [ 0.21, -0.03],
       [-0.93, -0.57],
       [ 0.2 , -0.41],
       [-0.43, -0.21],
       [-1.13, -0.33],
       [ 0.17, -0.68],
       [-0.88, -0.55],
       [-0.13,  0.05],
       [ 0.11,  0.  ],
       [-0.46, -0.38],
       [-0.62, -0.37],
       [-0.19, -0.28],
       [-0.22,  0.15],
       [ 0.3 , -0.16],
       [-0.28,  0.36]], dtype=float32)

Turn the region into an index

oe = OrdinalEncoder()
X_train_reg = oe.fit_transform(X_train[["Region"]])
X_test_reg = oe.transform(X_test[["Region"]])

for i, reg in enumerate(oe.categories_[0][:3]):
  print(f"The Region value {reg} gets turned into {i}.")
The Region value R11 gets turned into 0.
The Region value R21 gets turned into 1.
The Region value R22 gets turned into 2.

Embedding

from keras.layers import Embedding
num_regions = len(np.unique(X_train[["Region"]]))

random.seed(12)
model = Sequential([
  Embedding(input_dim=num_regions, output_dim=2),
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")

Fitting that model

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_reg, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])
hist.history["val_loss"][-1]
Epoch 5: early stopping
0.7526668906211853
model.layers
[<Embedding name=embedding, built=True>, <Dense name=dense_8, built=True>]

Keras’ Embedding Layer

model.layers[0].get_weights()[0]
array([[-0.12, -0.11],
       [ 0.03, -0.  ],
       [-0.02,  0.01],
       [-0.25, -0.14],
       [-0.28, -0.32],
       [-0.3 , -0.22],
       [-0.31, -0.28],
       [ 0.1 ,  0.07],
       [-0.61, -0.51],
       [-0.06, -0.12],
       [-0.17, -0.14],
       [-0.6 , -0.46],
       [-0.22, -0.27],
       [-0.59, -0.5 ],
       [-0.  ,  0.02],
       [ 0.07,  0.06],
       [-0.31, -0.28],
       [-0.4 , -0.34],
       [-0.16, -0.15],
       [ 0.01,  0.05],
       [ 0.08,  0.03],
       [ 0.08,  0.13]], dtype=float32)
X_train["Region"].head(4)
0    R24
1    R93
2    R11
3    R42
Name: Region, dtype: object
X_sample = X_train_reg[:4].to_numpy()
X_sample
array([[ 4.],
       [20.],
       [ 0.],
       [ 9.]])
enc_tensor = model.layers[0](X_sample)
keras.ops.convert_to_numpy(enc_tensor).squeeze()
array([[-0.28, -0.32],
       [ 0.08,  0.03],
       [-0.12, -0.11],
       [-0.06, -0.12]], dtype=float32)

The learned embeddings

points = model.layers[0].get_weights()[0]
plt.scatter(points[:,0], points[:,1])
for i in range(num_regions):
  plt.text(points[i,0]+0.01, points[i,1], s=oe.categories_[0][i])

Entity embeddings

Embeddings will gradually improve during training.

Embeddings & other inputs

Illustration of a neural network with both continuous and categorical inputs.

We can’t do this with Sequential models…

Keras’ Functional API

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Converting Sequential models

from keras.models import Model
from keras.layers import Input
random.seed(12)

model = Sequential([
  Dense(30, "relu"),
  Dense(1, "exponential")
])

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_ord, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7844388484954834
random.seed(12)

inputs = Input(shape=(2,))
x = Dense(30, "relu")(inputs)
out = Dense(1, "exponential")(x)
model = Model(inputs, out)

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_ord, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7844388484954834

Cf. one-length tuples.
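That is, shape=(2,) is a one-length tuple containing the number 2, whereas (2) is just the number 2 in parentheses:

print(type((2,)))  # <class 'tuple'>
print(type((2)))   # <class 'int'>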

Wide & Deep network

An illustration of the wide & deep network architecture.

Add a skip connection from the input layer to the output layer.

from keras.layers import Concatenate

inp = Input(shape=X_train.shape[1:])
hidden1 = Dense(30, "relu")(inp)
hidden2 = Dense(30, "relu")(hidden1)
concat = Concatenate()(
  [inp, hidden2])
output = Dense(1)(concat)
model = Model(
    inputs=[inp],
    outputs=[output])

Naming the layers

For complex networks, it is often useful to give meaningful names to the layers.

input_ = Input(shape=X_train.shape[1:], name="input")
hidden1 = Dense(30, activation="relu", name="hidden1")(input_)
hidden2 = Dense(30, activation="relu", name="hidden2")(hidden1)
concat = Concatenate(name="combined")([input_, hidden2])
output = Dense(1, name="output")(concat)
model = Model(inputs=[input_], outputs=[output])

Inspecting a complex model

from keras.utils import plot_model
plot_model(model, show_layer_names=True)

model.summary(line_length=75)
Model: "functional_10"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)         Output Shape         Param #  Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input (InputLayer)  │ (None, 10)        │         0 │ -                 │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden1 (Dense)     │ (None, 30)        │       330 │ input[0][0]       │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden2 (Dense)     │ (None, 30)        │       930 │ hidden1[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ combined            │ (None, 40)        │         0 │ input[0][0],      │
│ (Concatenate)       │                   │           │ hidden2[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ output (Dense)      │ (None, 1)         │        41 │ combined[0][0]    │
└─────────────────────┴───────────────────┴───────────┴───────────────────┘
 Total params: 1,301 (5.08 KB)
 Trainable params: 1,301 (5.08 KB)
 Non-trainable params: 0 (0.00 B)

French Motor Dataset with Embeddings

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

The desired architecture

Illustration of a neural network with both continuous and categorical inputs.

Preprocess all French motor inputs

Transform the categorical variables to integers:

num_brands, num_regions = X_train.nunique()[["VehBrand", "Region"]]

ct = make_column_transformer(
  (OrdinalEncoder(), ["VehBrand", "Region", "Area", "VehGas"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_test_ct = ct.transform(X_test)

Split the brand and region data apart from the rest:

X_train_brand = X_train_ct["VehBrand"]; X_test_brand = X_test_ct["VehBrand"]
X_train_region = X_train_ct["Region"]; X_test_region = X_test_ct["Region"]
X_train_rest = X_train_ct.drop(["VehBrand", "Region"], axis=1)
X_test_rest = X_test_ct.drop(["VehBrand", "Region"], axis=1)

Organise the inputs

Make a Keras Input for: vehicle brand, region, & others.

veh_brand = Input(shape=(1,), name="vehBrand")
region = Input(shape=(1,), name="region")
other_inputs = Input(shape=X_train_rest.shape[1:], name="otherInputs")

Create embeddings and join them with the other inputs.

from keras.layers import Reshape

random.seed(1337)
veh_brand_ee = Embedding(input_dim=num_brands, output_dim=2,
    name="vehBrandEE")(veh_brand)                                
veh_brand_ee = Reshape(target_shape=(2,))(veh_brand_ee)

region_ee = Embedding(input_dim=num_regions, output_dim=2,
    name="regionEE")(region)
region_ee = Reshape(target_shape=(2,))(region_ee)

x = Concatenate(name="combined")([veh_brand_ee, region_ee, other_inputs])

Complete the model and fit it

Feed the combined embeddings & continuous inputs to some normal dense layers.

x = Dense(30, "relu", name="hidden")(x)
out = Dense(1, "exponential", name="out")(x)

model = Model([veh_brand, region, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

hist = model.fit((X_train_brand, X_train_region, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[EarlyStopping(patience=5)], validation_split=0.2)
np.min(hist.history["val_loss"])
0.6692155599594116

Plotting this model

plot_model(model, show_layer_names=True)

Why we need to reshape

plot_model(model, show_layer_names=True, show_shapes=True)
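Each embedding takes one integer index per policy and returns a 1×2 block, so its raw output has shape (None, 1, 2), whereas other_inputs is 2D. Concatenate needs its inputs to match in every axis except the concatenation axis, hence the Reshape down to (None, 2). A small standalone sketch (the vocabulary size of 5 here is made up for illustration):

from keras.layers import Input, Embedding, Reshape

demo_input = Input(shape=(1,))
demo_embedded = Embedding(input_dim=5, output_dim=2)(demo_input)
print(demo_embedded.shape)                               # (None, 1, 2)
print(Reshape(target_shape=(2,))(demo_embedded).shape)   # (None, 2)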

Scale By Exposure

Lecture Outline

  • Preprocessing

  • French Motor Claims & Poisson Regression

  • Ordinal Variables

  • Categorical Variables & Entity Embeddings

  • Keras’ Functional API

  • French Motor Dataset with Embeddings

  • Scale By Exposure

Two different models

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Model 1: Say Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i)).

But, the exposures are different for each policy. \lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Model 2: Say Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).

Now, \text{Exposure}_i \not\in \mathbf{x}_i, and \lambda(\mathbf{x}_i) is the rate per year.
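Under Model 2, \mathbb{E} Y_i = \text{Exposure}_i \times \lambda(\mathbf{x}_i), so the per-observation Poisson loss becomes

\text{Exposure}_i \, \lambda(\mathbf{x}_i) - y_i \log\bigl( \text{Exposure}_i \, \lambda(\mathbf{x}_i) \bigr),

which is the neural network analogue of a Poisson GLM with \log(\text{Exposure}_i) as an offset. Below, the network outputs \lambda(\mathbf{x}_i) and a Multiply layer scales it by the policy's exposure.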

Just take continuous variables

ct = make_column_transformer(
  ("passthrough", ["Exposure"]),
  ("drop", ["VehBrand", "Region", "Area", "VehGas"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train_ct = ct.fit_transform(X_train)
X_test_ct = ct.transform(X_test)

Split exposure apart from the rest:

X_train_exp = X_train_ct["Exposure"]; X_test_exp = X_test_ct["Exposure"]
X_train_rest = X_train_ct.drop("Exposure", axis=1)
X_test_rest = X_test_ct.drop("Exposure", axis=1)

Organise the inputs:

exposure = Input(shape=(1,), name="exposure")
other_inputs = Input(shape=X_train_rest.shape[1:], name="otherInputs")

Make & fit the model

Feed the continuous inputs to some normal dense layers.

random.seed(1337)
x = Dense(30, "relu", name="hidden1")(other_inputs)
x = Dense(30, "relu", name="hidden2")(x)
lambda_ = Dense(1, "exponential", name="lambda")(x)
from keras.layers import Multiply

out = Multiply(name="out")([lambda_, exposure])
model = Model([exposure, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(patience=10, restore_best_weights=True, verbose=1)
hist = model.fit((X_train_exp, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[es], validation_split=0.2)
np.min(hist.history["val_loss"])
Epoch 40: early stopping
Restoring model weights from the end of the best epoch: 30.
0.8829042911529541

Plot the model

plot_model(model, show_layer_names=True)

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.24.0

keras     : 3.3.3
matplotlib: 3.8.4
numpy     : 1.26.4
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.11.0
torch     : 2.0.1
tensorflow: 2.16.1
tf_keras  : 2.16.0

Glossary

  • entity embeddings
  • Input layer
  • Keras functional API
  • Reshape layer
  • skip connection
  • wide & deep network structure