ACTL3143 & ACTL5111 Deep Learning for Actuaries
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Source: Paras Patidar (2019), Tensors — Representation of Data In Neural Networks, Medium article.
axis argument in numpy
Starting with a (3, 4)-shaped matrix:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
The return value’s rank is one less than the input’s rank.
Important
The axis parameter tells us which dimension is removed.
axis & keepdims
With keepdims=True, the rank doesn’t change.
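For example, summing the (3, 4) matrix above along each axis:

import numpy as np

X = np.arange(12).reshape(3, 4)             # rank 2, shape (3, 4)
X.sum(axis=0).shape                         # (4,):   axis 0 is removed
X.sum(axis=1).shape                         # (3,):   axis 1 is removed
X.sum(axis=0, keepdims=True).shape          # (1, 4): rank is unchanged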
Say we had n observations of a time series x_1, x_2, \dots, x_n.
This \mathbf{x} = (x_1, \dots, x_n) would have shape (n,) & rank 1.
If instead we had a batch of b time series
\mathbf{X} = \begin{pmatrix} x_7 & x_8 & \dots & x_{7+n-1} \\ x_2 & x_3 & \dots & x_{2+n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_3 & x_4 & \dots & x_{3+n-1} \\ \end{pmatrix} \,,
the batch \mathbf{X} would have shape (b, n) & rank 2.
t | x | y |
---|---|---|
0 | x_0 | y_0 |
1 | x_1 | y_1 |
2 | x_2 | y_2 |
3 | x_3 | y_3 |
Say we had n observations of m time series; this would be a shape (n, m) matrix of rank 2.
In Keras, a batch of b of these time series has shape (b, n, m) and has rank 3.
Note
Use \mathbf{x}_t \in \mathbb{R}^{1 \times m} to denote the vector of all time series at time t. Here, \mathbf{x}_t = (x_t, y_t).
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
A recurrence relation is an equation that expresses each element of a sequence as a function of the preceding ones. More precisely, in the case where only the immediately preceding element is involved, a recurrence relation has the form
u_n = \psi(n, u_{n-1}) \quad \text{ for } \quad n > 0.
Example: the factorial, n! = n \cdot (n-1)! for n > 0, given 0! = 1.
Source: Wikipedia, Recurrence relation.
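This recurrence translates directly into Python:

def factorial(n):
    # n! = n * (n-1)! for n > 0, with 0! = 1
    return 1 if n == 0 else n * factorial(n - 1)

factorial(5)  # 120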
The RNN processes each element of the sequence one by one, while keeping a memory of what came before.
Source: James et al (2022), An Introduction to Statistical Learning, 2nd edition, Figure 10.12.
All the outputs before the final one are often discarded.
Source: Christopher Olah (2015), Understanding LSTM Networks, Colah’s Blog.
Say each prediction is a vector of size d, so \mathbf{y}_t \in \mathbb{R}^{1 \times d}.
Then the main equation of a SimpleRNN, given \mathbf{y}_0 = \mathbf{0}, is
\mathbf{y}_t = \psi\bigl( \mathbf{x}_t \mathbf{W}_x + \mathbf{y}_{t-1} \mathbf{W}_y + \mathbf{b} \bigr) .
Here, \begin{aligned} &\mathbf{x}_t \in \mathbb{R}^{1 \times m}, \mathbf{W}_x \in \mathbb{R}^{m \times d}, \\ &\mathbf{y}_{t-1} \in \mathbb{R}^{1 \times d}, \mathbf{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \mathbf{b} \in \mathbb{R}^{d}. \end{aligned}
Say we operate on batches of size b, then \mathbf{Y}_t \in \mathbb{R}^{b \times d}.
The main equation of a SimpleRNN, given \mathbf{Y}_0 = \mathbf{0}, is \mathbf{Y}_t = \psi\bigl( \mathbf{X}_t \mathbf{W}_x + \mathbf{Y}_{t-1} \mathbf{W}_y + \mathbf{b} \bigr) . Here, \begin{aligned} &\mathbf{X}_t \in \mathbb{R}^{b \times m}, \mathbf{W}_x \in \mathbb{R}^{m \times d}, \\ &\mathbf{Y}_{t-1} \in \mathbb{R}^{b \times d}, \mathbf{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \mathbf{b} \in \mathbb{R}^{d}. \end{aligned}
Remember, \mathbf{X} \in \mathbb{R}^{b \times n \times m}, \mathbf{Y} \in \mathbb{R}^{b \times d}, and \mathbf{X}_t is equivalent to X[:, t, :].
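A minimal numpy sketch of this recurrence (assuming the weights are already given; tanh is Keras’s default activation):

import numpy as np

def simple_rnn_forward(X, W_x, W_y, b, psi=np.tanh):
    # Unroll a SimpleRNN over a batch X of shape (b, n, m).
    batch_size, n, m = X.shape
    Y = np.zeros((batch_size, W_y.shape[0]))  # Y_0 = 0, shape (b, d)
    for t in range(n):
        X_t = X[:, t, :]                      # shape (b, m)
        Y = psi(X_t @ W_x + Y @ W_y + b)      # shape (b, d)
    return Y                                  # keep only the final output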
As usual, the SimpleRNN is just a layer in Keras.
from keras.models import Sequential
from keras.layers import SimpleRNN
from numpy import random  # numpy's RNG (assumed import for random.seed)

random.seed(1234)
model = Sequential([
    SimpleRNN(output_size, activation="sigmoid")  # output_size set earlier
])
model.compile(loss="binary_crossentropy", metrics=["accuracy"])
hist = model.fit(X, y, epochs=500, verbose=False)
model.evaluate(X, y, verbose=False)
[3.1845884323120117, 0.5]
The predicted probabilities on the training set are:
[array([[0.68],
[0.21]], dtype=float32),
array([[0.49]], dtype=float32),
array([-0.51], dtype=float32)]
Source: Christopher Olah (2015), Understanding LSTM Networks, Colah’s Blog.
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Source: Aurélien Géron (2019), Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, Chapter 15.
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Date | Brisbane | East_Bris | North_Bris | West_Bris | Melbourne | North_Syd | Sydney |
---|---|---|---|---|---|---|---|
1990-02-28 | 0.03 | -0.01 | 0.01 | 0.01 | 0.00 | -0.00 | -0.02 |
1990-03-31 | 0.01 | 0.03 | 0.01 | 0.01 | 0.02 | -0.00 | 0.03 |
... | ... | ... | ... | ... | ... | ... | ... |
2021-04-30 | 0.03 | 0.01 | 0.01 | -0.00 | 0.01 | 0.02 | 0.02 |
2021-05-31 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 0.04 |
376 rows × 7 columns
Brisbane 0.005496
East_Bris 0.005416
North_Bris 0.005024
West_Bris 0.004842
Melbourne 0.005677
North_Syd 0.004819
Sydney 0.005526
dtype: float64
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Keras has a built-in method for converting a time series into subsequences/chunks.
import numpy as np
from keras.utils import timeseries_dataset_from_array

integers = np.arange(10)
dummy_dataset = timeseries_dataset_from_array(
    data=integers[:-3],    # inputs stop 3 steps before the end
    targets=integers[3:],  # each target is the value 3 steps ahead
    sequence_length=3,
    batch_size=2,
)
for inputs, targets in dummy_dataset:
    for i in range(inputs.shape[0]):
        print([int(x) for x in inputs[i]], int(targets[i]))
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
Source: François Chollet (2021), Deep Learning with Python, Second Edition, Chapter 10 code snippet.
If you have a lot of time series data, then use:
import numpy as np
from keras.utils import timeseries_dataset_from_array

data = np.arange(20); seq = 3
ts, target = data[:-seq], data[seq:]
nTrain = int(0.5 * len(ts)); nVal = int(0.25 * len(ts))
nTest = len(ts) - nTrain - nVal
print(f"# Train: {nTrain}, # Val: {nVal}, # Test: {nTest}")
# Train: 8, # Val: 4, # Test: 5
Training dataset
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
[5, 6, 7] 8
Validation dataset
[8, 9, 10] 11
[9, 10, 11] 12
Test dataset
[12, 13, 14] 15
[13, 14, 15] 16
[14, 15, 16] 17
Adapted from: François Chollet (2021), Deep Learning with Python, Second Edition, Listing 10.7.
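The three datasets above can be built with the start_index and end_index arguments of timeseries_dataset_from_array; a sketch:

train_ds = timeseries_dataset_from_array(
    ts, targets=target, sequence_length=seq,
    start_index=0, end_index=nTrain)              # [0, 1, 2] -> 3, ..., [5, 6, 7] -> 8
val_ds = timeseries_dataset_from_array(
    ts, targets=target, sequence_length=seq,
    start_index=nTrain, end_index=nTrain + nVal)  # [8, 9, 10] -> 11, [9, 10, 11] -> 12
test_ds = timeseries_dataset_from_array(
    ts, targets=target, sequence_length=seq,
    start_index=nTrain + nVal)                    # [12, 13, 14] -> 15, ...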
If you don’t have a lot of time series data, consider:
Training dataset
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
[5, 6, 7] 8
[6, 7, 8] 9
[7, 8, 9] 10
Validation dataset
[8, 9, 10] 11
[9, 10, 11] 12
[10, 11, 12] 13
[11, 12, 13] 14
[12, 13, 14] 15
Test dataset
[13, 14, 15] 16
[14, 15, 16] 17
[15, 16, 17] 18
[16, 17, 18] 19
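Here the input windows are allowed to overlap the split boundaries, so no observations are wasted; only the targets stay disjoint. One way to reproduce the listings above (a sketch, not necessarily the original code):

import numpy as np
from keras.utils import timeseries_dataset_from_array

data = np.arange(20); seq = 3
# Pad the targets so targets[i] lines up with the window starting at i;
# the pad values are never selected by the index ranges below.
targets = np.concatenate([data[seq:], np.full(seq, -1)])

train_ds = timeseries_dataset_from_array(   # targets 3 to 10
    data, targets=targets, sequence_length=seq, end_index=10)
val_ds = timeseries_dataset_from_array(     # targets 11 to 15
    data, targets=targets, sequence_length=seq, start_index=8, end_index=15)
test_ds = timeseries_dataset_from_array(    # targets 16 to 19
    data, targets=targets, sequence_length=seq, start_index=13, end_index=19)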
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Dataset to numpy
The Dataset object can be handed to Keras directly, but if we really need a numpy array, we can run:
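One common pattern (a sketch, assuming train_ds yields (inputs, targets) batches):

import numpy as np

# Stack every batch of the Dataset into plain numpy arrays.
X_train = np.concatenate([inputs for inputs, targets in train_ds])
y_train = np.concatenate([targets for inputs, targets in train_ds])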
The shape of our training set is now:
Converting the rest to numpy arrays:
from keras.models import Sequential
from keras.layers import Input, Flatten, Dense
from keras.callbacks import EarlyStopping

random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(1, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3191 parameters.
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 2.92 s, sys: 267 ms, total: 3.18 s
Wall time: 2.84 s
SimpleRNN layer

random.seed(1)
model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(1, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 2951 parameters.
Epoch 62: early stopping
Restoring model weights from the end of the best epoch: 12.
CPU times: user 3.77 s, sys: 452 ms, total: 4.22 s
Wall time: 3.23 s
LSTM layer

from keras.layers import LSTM

random.seed(1)
model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(1, activation="linear")
])
model_lstm.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 59: early stopping
Restoring model weights from the end of the best epoch: 9.
CPU times: user 4.5 s, sys: 442 ms, total: 4.94 s
Wall time: 3.73 s
GRU layer

from keras.layers import GRU

random.seed(1)
model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(1, activation="linear")
])
model_gru.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 4.71 s, sys: 530 ms, total: 5.24 s
Wall time: 3.76 s
Two GRU layers

random.seed(1)
model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(1, activation="linear")
])
model_two_grus.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 56: early stopping
Restoring model weights from the end of the best epoch: 6.
CPU times: user 7.44 s, sys: 818 ms, total: 8.25 s
Wall time: 9.19 s
 | Model | MSE |
---|---|---|
1 | SimpleRNN | 1.250792 |
0 | Dense | 1.164461 |
2 | LSTM | 0.835326 |
4 | 2 GRUs | 0.798951 |
3 | GRU | 0.743510 |
The network with two GRU layers is the best.
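These MSEs come from evaluating each fitted model on the test set; a sketch of how the table might be assembled (model names follow the code above):

import pandas as pd

models = {"Dense": model_dense, "SimpleRNN": model_simple,
          "LSTM": model_lstm, "GRU": model_gru, "2 GRUs": model_two_grus}
mse = [model.evaluate(X_test, y_test, verbose=0) for model in models.values()]
pd.DataFrame({"Model": list(models.keys()), "MSE": mse}).sort_values("MSE", ascending=False)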
Lecture Outline
Tensors & Time Series
Some Recurrent Structures
Recurrent Neural Networks
CoreLogic Hedonic Home Value Index
Splitting time series data
Predicting Sydney House Prices
Predicting Multiple Time Series
Change the targets argument to include all the suburbs.
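For example, where the single-series version targeted only Sydney, the multi-series version might look like this (a sketch with hypothetical names; changes is the DataFrame of monthly changes shown earlier):

dataset = timeseries_dataset_from_array(
    data=changes[:-seq_length],     # inputs: all m series
    targets=changes[seq_length:],   # targets: all m series, not just changes["Sydney"]
    sequence_length=seq_length,
)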
Dataset to numpy
The shape of our training set is now:
Converting the rest to numpy arrays:
random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(num_ts, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3317 parameters.
Epoch 75: early stopping
Restoring model weights from the end of the best epoch: 25.
CPU times: user 3.6 s, sys: 338 ms, total: 3.93 s
Wall time: 6.72 s
SimpleRNN layer

random.seed(1)
model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(num_ts, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3257 parameters.
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 4.18 s, sys: 391 ms, total: 4.57 s
Wall time: 6.02 s
LSTM layer

random.seed(1)
model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(num_ts, activation="linear")
])
model_lstm.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 74: early stopping
Restoring model weights from the end of the best epoch: 24.
CPU times: user 4.5 s, sys: 371 ms, total: 4.87 s
Wall time: 3.57 s
GRU layer

random.seed(1)
model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(num_ts, activation="linear")
])
model_gru.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 4.67 s, sys: 569 ms, total: 5.24 s
Wall time: 3.68 s
Two GRU layers

random.seed(1)
model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(num_ts, activation="linear")
])
model_two_grus.compile(loss="mse", optimizer="adam")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 67: early stopping
Restoring model weights from the end of the best epoch: 17.
CPU times: user 7.14 s, sys: 904 ms, total: 8.04 s
Wall time: 5.04 s
 | Model | MSE |
---|---|---|
1 | SimpleRNN | 1.491682 |
0 | Dense | 1.429465 |
4 | 2 GRUs | 1.358651 |
3 | GRU | 1.344503 |
2 | LSTM | 1.331125 |
The network with an LSTM layer is the best.
from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version : 3.11.9
IPython version : 8.24.0
keras : 3.3.3
matplotlib: 3.8.4
numpy : 1.26.4
pandas : 2.2.2
seaborn : 0.13.2
scipy : 1.11.0
torch : 2.0.1
tensorflow: 2.16.1
tf_keras : 2.16.0