Show the package imports
import random
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
ACTL3143 & ACTL5111 Deep Learning for Actuaries
The axis argument in numpy

Starting with a (3, 4)-shaped matrix:

X = np.arange(12).reshape(3,4)
X
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
The above code creates an array with values from 0 to 11 and converts that array into a matrix with 3 rows and 4 columns.
axis=0: (3, 4) \leadsto (4,).

X.sum(axis=0)
array([12, 15, 18, 21])
The above code returns the column sums, changing the shape of the matrix from (3, 4) to (4,). Similarly, X.sum(axis=1) returns the row sums, changing the shape from (3, 4) to (3,).
axis=1: (3, 4) \leadsto (3,).

X.prod(axis=1)
array([ 0, 840, 7920])
The return value’s rank is one less than the input’s rank.
The axis parameter tells us which dimension is removed.
axis & keepdims

With keepdims=True, the rank doesn't change.
X = np.arange(12).reshape(3,4)
X
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
axis=0: (3, 4) \leadsto (1, 4).

X.sum(axis=0, keepdims=True)
array([[12, 15, 18, 21]])
axis=1: (3, 4) \leadsto (3, 1).

X.prod(axis=1, keepdims=True)
array([[ 0],
[ 840],
[7920]])
X / X.sum(axis=1)
ValueError: operands could not be broadcast together with shapes (3,4) (3,)
X / X.sum(axis=1, keepdims=True)
array([[0. , 0.17, 0.33, 0.5 ],
[0.18, 0.23, 0.27, 0.32],
[0.21, 0.24, 0.26, 0.29]])
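An alternative to keepdims=True is to reinsert the summed-out axis manually with np.newaxis before dividing; a small sketch:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row_sums = X.sum(axis=1)                  # shape (3,)
# Reinsert the lost axis so broadcasting works: (3,) -> (3, 1)
normalized = X / row_sums[:, np.newaxis]
print(normalized.round(2))
```

Either way, each row of the result sums to 1.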
Say we had n observations of a time series x_1, x_2, \dots, x_n.
This \mathbf{x} = (x_1, \dots, x_n) would have shape (n,) & rank 1.
If instead we had a batch of b time series,
\mathbf{X} = \begin{pmatrix} x_7 & x_8 & \dots & x_{7+n-1} \\ x_2 & x_3 & \dots & x_{2+n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_3 & x_4 & \dots & x_{3+n-1} \\ \end{pmatrix} \,,
the batch \mathbf{X} would have shape (b, n) & rank 2.
A multivariate time series has more than one variable observed at each time point. The following example has two variables, x and y.
t | x | y |
---|---|---|
0 | x_0 | y_0 |
1 | x_1 | y_1 |
2 | x_2 | y_2 |
3 | x_3 | y_3 |
Then n observations of the m time series form a matrix of shape (n, m) and rank 2.
In Keras, a batch of b of these time series has shape (b, n, m) and has rank 3.
Use \mathbf{x}_t \in \mathbb{R}^{1 \times m} to denote the vector of all time series at time t. Here, \mathbf{x}_t = (x_t, y_t).
A recurrence relation is an equation that expresses each element of a sequence as a function of the preceding ones. More precisely, in the case where only the immediately preceding element is involved, a recurrence relation has the form
u_n = \psi(n, u_{n-1}) \quad \text{ for } \quad n > 0.
Example: Factorial n! = n (n-1)! for n > 0 given 0! = 1.
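To make the recurrence concrete, the factorial example can be written as a short loop where each u_n is computed from u_{n-1}:

```python
def factorial(n):
    # u_n = psi(n, u_{n-1}) with psi(n, u) = n * u and u_0 = 0! = 1
    u = 1
    for k in range(1, n + 1):
        u = k * u
    return u

print(factorial(5))  # 5! = 120
```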
The RNN processes each element of the sequence one by one, while keeping a memory of what came before.
The following figure shows how a recurrent neural network combines an input X_l with a preprocessed state A_l to produce the output O_l. RNNs have a cyclic information-processing structure that passes information forward from previous inputs. This lets them capture dependencies and patterns in sequential data, making them useful for analysing time series.
All the outputs before the final one are often discarded.
Say each prediction is a vector of size d, so \mathbf{y}_t \in \mathbb{R}^{1 \times d}.
Then the main equation of a SimpleRNN, given \mathbf{y}_0 = \mathbf{0}, is
\mathbf{y}_t = \psi\bigl( \mathbf{x}_t \mathbf{W}_x + \mathbf{y}_{t-1} \mathbf{W}_y + \mathbf{b} \bigr) .
Here, \begin{aligned} &\mathbf{x}_t \in \mathbb{R}^{1 \times m}, \mathbf{W}_x \in \mathbb{R}^{m \times d}, \\ &\mathbf{y}_{t-1} \in \mathbb{R}^{1 \times d}, \mathbf{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \mathbf{b} \in \mathbb{R}^{d}. \end{aligned}
At each time step, a simple recurrent neural network (RNN) takes an input vector x_t, combines it with the information from the previous hidden state y_{t-1}, and produces an output vector y_t. The hidden state helps the network remember the context of earlier inputs, enabling it to make informed predictions about what comes next in the sequence. In a simple RNN, the output at time t-1 is reused as the hidden state at time t.
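This recurrence can be sketched in a few lines of NumPy for a single sequence. The dimensions and random weights below are illustrative only, and psi is taken to be tanh:

```python
import numpy as np

m, d, n = 2, 3, 5                 # input dim, hidden dim, sequence length
rng = np.random.default_rng(0)
W_x = rng.normal(size=(m, d))     # input-to-hidden weights
W_y = rng.normal(size=(d, d))     # hidden-to-hidden weights
b = np.zeros(d)

x = rng.normal(size=(n, m))       # one sequence of n observations
y = np.zeros(d)                   # y_0 = 0
for t in range(n):
    # y_t depends on the current input x_t and the previous output y_{t-1}
    y = np.tanh(x[t] @ W_x + y @ W_y + b)

print(y.shape)                    # final output has shape (d,)
```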
The difference between an RNN and an RNN with batch processing lies in how the network handles sequences of input data. With batch processing, the model processes multiple (b) input sequences simultaneously. The training data is grouped into batches, and the weights are updated based on the average error across the batch. Batch processing often yields more stable weight updates, since the model learns from a diverse set of examples at once, reducing the impact of noise in individual sequences.
Say we operate on batches of size b, then \mathbf{Y}_t \in \mathbb{R}^{b \times d}.
The main equation of a SimpleRNN, given \mathbf{Y}_0 = \mathbf{0}, is \mathbf{Y}_t = \psi\bigl( \mathbf{X}_t \mathbf{W}_x + \mathbf{Y}_{t-1} \mathbf{W}_y + \mathbf{b} \bigr) . Here, \begin{aligned} &\mathbf{X}_t \in \mathbb{R}^{b \times m}, \mathbf{W}_x \in \mathbb{R}^{m \times d}, \\ &\mathbf{Y}_{t-1} \in \mathbb{R}^{b \times d}, \mathbf{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \mathbf{b} \in \mathbb{R}^{d}. \end{aligned}
num_obs = 4
num_time_steps = 3
num_time_series = 2
X = np.arange(num_obs*num_time_steps*num_time_series).astype(np.float32) \
    .reshape([num_obs, num_time_steps, num_time_series])
output_size = 1
y = np.array([0, 0, 1, 1])
X[:2]
X[:2]
selects the first two slices (0 and 1) along the first dimension, and returns a sub-tensor of shape (2,3,2).
array([[[ 0., 1.],
[ 2., 3.],
[ 4., 5.]],
[[ 6., 7.],
[ 8., 9.],
[10., 11.]]], dtype=float32)
X[2:]
X[2:]
selects the last two slices (2 and 3) along the first dimension, and returns a sub-tensor of shape (2,3,2).
array([[[12., 13.],
[14., 15.],
[16., 17.]],
[[18., 19.],
[20., 21.],
[22., 23.]]], dtype=float32)
As usual, the SimpleRNN is just a layer in Keras.
from keras.layers import SimpleRNN

random.seed(1234)
model = Sequential([
    SimpleRNN(output_size, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", metrics=["accuracy"])
hist = model.fit(X, y, epochs=500, verbose=False)
model.evaluate(X, y, verbose=False)
[3.1845884323120117, 0.5]
The predicted probabilities on the training set are:
model.predict(X, verbose=0)
array([[0.97],
[1. ],
[1. ],
[1. ]], dtype=float32)
To verify the results of predicted probabilities, we can obtain the weights of the fitted model and calculate the outcome manually as follows.
model.get_weights()
[array([[0.68],
[0.21]], dtype=float32),
array([[0.49]], dtype=float32),
array([-0.51], dtype=float32)]
def sigmoid(x):
return 1 / (1 + np.exp(-x))
W_x, W_y, b = model.get_weights()

Y = np.zeros((num_obs, output_size), dtype=np.float32)
for t in range(num_time_steps):
    X_t = X[:, t, :]
    z = X_t @ W_x + Y @ W_y + b
    Y = sigmoid(z)

Y
array([[0.97],
[1. ],
[1. ],
[1. ]], dtype=float32)
Simple RNN structures suffer from the vanishing gradient problem and hence struggle to learn long-term dependencies. LSTMs are designed to overcome this: they have a more complex structure (a memory cell and gating mechanisms) and can better regulate the flow of information. GRUs are simpler than LSTMs and hence computationally more efficient.
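The relative complexity shows up directly in the parameter counts. For m input features and d units, a SimpleRNN layer has d(m+d+1) weights; an LSTM has four such gate blocks; and Keras' default GRU (reset_after=True) has three gate blocks with two bias vectors each. A quick sanity check with m = 7 and d = 50, matching the house-price models later in these notes:

```python
m, d = 7, 50   # input features and recurrent units

simple_rnn = d * (m + d + 1)        # W_x (m x d), W_y (d x d), bias (d)
lstm = 4 * d * (m + d + 1)          # four gates: input, forget, cell, output
gru = 3 * (d * (m + d) + 2 * d)     # three gates, two bias vectors per gate

print(simple_rnn, lstm, gru)        # 2900 11600 8850
```

So the LSTM has exactly four times the parameters of the SimpleRNN, and the GRU sits in between.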
changes = house_prices.pct_change().dropna()
changes.round(2)
Date | Brisbane | East_Bris | North_Bris | West_Bris | Melbourne | North_Syd | Sydney |
---|---|---|---|---|---|---|---|
1990-02-28 | 0.03 | -0.01 | 0.01 | 0.01 | 0.00 | -0.00 | -0.02 |
1990-03-31 | 0.01 | 0.03 | 0.01 | 0.01 | 0.02 | -0.00 | 0.03 |
... | ... | ... | ... | ... | ... | ... | ... |
2021-04-30 | 0.03 | 0.01 | 0.01 | -0.00 | 0.01 | 0.02 | 0.02 |
2021-05-31 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 0.04 |
376 rows × 7 columns
changes.plot();
changes.mean()
Brisbane 0.005496
East_Bris 0.005416
North_Bris 0.005024
West_Bris 0.004842
Melbourne 0.005677
North_Syd 0.004819
Sydney 0.005526
dtype: float64
changes *= 100
changes.mean()
Brisbane 0.549605
East_Bris 0.541562
North_Bris 0.502390
West_Bris 0.484204
Melbourne 0.567700
North_Syd 0.481863
Sydney 0.552641
dtype: float64
changes.plot(legend=False);
num_train = int(0.6 * len(changes))
num_val = int(0.2 * len(changes))
num_test = len(changes) - num_train - num_val
print(f"# Train: {num_train}, # Val: {num_val}, # Test: {num_test}")
# Train: 225, # Val: 75, # Test: 76
Keras has a built-in method for converting a time series into subsequences/chunks.
from keras.utils import timeseries_dataset_from_array
integers = range(10)
dummy_dataset = timeseries_dataset_from_array(
    data=integers[:-3],
    targets=integers[3:],
    sequence_length=3,
    batch_size=2,
)
for inputs, targets in dummy_dataset:
for i in range(inputs.shape[0]):
print([int(x) for x in inputs[i]], int(targets[i]))
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
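For comparison, the same input/target windows can be produced in plain NumPy with sliding_window_view; this is a sketch of the equivalent logic, not the Keras implementation:

```python
import numpy as np

integers = np.arange(10)
seq = 3
# Each row of X is a length-3 window of the series (excluding the last 3 values)
X = np.lib.stride_tricks.sliding_window_view(integers[:-seq], seq)
# The target is the value immediately after each window
y = integers[seq:seq + len(X)]
for inputs, target in zip(X, y):
    print(list(inputs), int(target))
```

This prints the same five window/target pairs as the dummy_dataset above.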
If you have a lot of time series data, then use:
from keras.utils import timeseries_dataset_from_array
data = range(20); seq = 3; ts = data[:-seq]; target = data[seq:]
nTrain = int(0.5 * len(ts)); nVal = int(0.25 * len(ts))
nTest = len(ts) - nTrain - nVal
print(f"# Train: {nTrain}, # Val: {nVal}, # Test: {nTest}")
# Train: 8, # Val: 4, # Test: 5
trainDS = \
  timeseries_dataset_from_array(
    ts, target, seq,
    end_index=nTrain)
valDS = \
  timeseries_dataset_from_array(
    ts, target, seq,
    start_index=nTrain,
    end_index=nTrain+nVal)
testDS = \
  timeseries_dataset_from_array(
    ts, target, seq,
    start_index=nTrain+nVal)
Training dataset
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
[5, 6, 7] 8
Validation dataset
[8, 9, 10] 11
[9, 10, 11] 12
Test dataset
[12, 13, 14] 15
[13, 14, 15] 16
[14, 15, 16] 17
If you don’t have a lot of time series data, consider:
X = []; y = []
for i in range(len(data)-seq):
    X.append(data[i:i+seq])
    y.append(data[i+seq])
X = np.array(X); y = np.array(y)

nTrain = int(0.5 * X.shape[0])
X_train = X[:nTrain]
y_train = y[:nTrain]

nVal = int(np.ceil(0.25 * X.shape[0]))
X_val = X[nTrain:nTrain+nVal]
y_val = y[nTrain:nTrain+nVal]

nTest = X.shape[0] - nTrain - nVal
X_test = X[nTrain+nVal:]
y_test = y[nTrain+nVal:]
Training dataset
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
[5, 6, 7] 8
[6, 7, 8] 9
[7, 8, 9] 10
Validation dataset
[8, 9, 10] 11
[9, 10, 11] 12
[10, 11, 12] 13
[11, 12, 13] 14
[12, 13, 14] 15
Test dataset
[13, 14, 15] 16
[14, 15, 16] 17
[15, 16, 17] 18
[16, 17, 18] 19
# Num. of input time series.
num_ts = changes.shape[1]

# How many prev. months to use.
seq_length = 6

# Predict the next month ahead.
ahead = 1

# The index of the first target.
delay = seq_length + ahead - 1

# Which suburb to predict.
target_suburb = changes["Sydney"]
train_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    end_index=num_train)
val_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train+num_val)
test_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train+num_val)
Dataset to numpy

The Dataset object can be handed to Keras directly, but if we really need a numpy array, we can run:
X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))
The shape of our training set is now:
X_train.shape
(220, 6, 7)
y_train.shape
(220,)
Converting the rest to numpy arrays:
X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))
from keras.layers import Input, Flatten
random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(1, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3191 parameters.
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 3.29 s, sys: 286 ms, total: 3.57 s
Wall time: 6.92 s
from keras.utils import plot_model
plot_model(model_dense, show_shapes=True)
model_dense.evaluate(X_val, y_val, verbose=0)
1.1644608974456787
SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(1, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 2951 parameters.
Epoch 62: early stopping
Restoring model weights from the end of the best epoch: 12.
CPU times: user 4.53 s, sys: 426 ms, total: 4.96 s
Wall time: 8.11 s
model_simple.evaluate(X_val, y_val, verbose=0)
1.2507916688919067
plot_model(model_simple, show_shapes=True)
LSTM layer

from keras.layers import LSTM

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(1, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 59: early stopping
Restoring model weights from the end of the best epoch: 9.
CPU times: user 4.97 s, sys: 435 ms, total: 5.4 s
Wall time: 8.87 s
model_lstm.evaluate(X_val, y_val, verbose=0)
0.8353261947631836
GRU layer

from keras.layers import GRU

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(1, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 5.36 s, sys: 418 ms, total: 5.78 s
Wall time: 4.37 s
model_gru.evaluate(X_val, y_val, verbose=0)
0.7435100078582764
Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(1, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 56: early stopping
Restoring model weights from the end of the best epoch: 6.
CPU times: user 7.82 s, sys: 860 ms, total: 8.68 s
Wall time: 5.52 s
model_two_grus.evaluate(X_val, y_val, verbose=0)
0.7989509105682373
 | Model | MSE |
---|---|---|
1 | SimpleRNN | 1.250792 |
0 | Dense | 1.164461 |
2 | LSTM | 0.835326 |
4 | 2 GRUs | 0.798951 |
3 | GRU | 0.743510 |
The network with two GRU layers is the best.
model_two_grus.evaluate(test_ds, verbose=0)
1.8552547693252563
Change the targets argument to include all the suburbs.
train_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    end_index=num_train)
val_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train+num_val)
test_ds = \
  timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train+num_val)
Dataset to numpy

The shape of our training set is now:
X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
X_train.shape
(220, 6, 7)
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))
y_train.shape
(220, 7)
Converting the rest to numpy arrays:
X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))
random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(num_ts, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3317 parameters.
Epoch 75: early stopping
Restoring model weights from the end of the best epoch: 25.
CPU times: user 4.68 s, sys: 352 ms, total: 5.03 s
Wall time: 5.31 s
plot_model(model_dense, show_shapes=True)
model_dense.evaluate(X_val, y_val, verbose=0)
1.4294650554656982
SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(num_ts, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")
es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3257 parameters.
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 6.74 s, sys: 670 ms, total: 7.41 s
Wall time: 7.42 s
model_simple.evaluate(X_val, y_val, verbose=0)
1.4916820526123047
plot_model(model_simple, show_shapes=True)
LSTM layer

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(num_ts, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 74: early stopping
Restoring model weights from the end of the best epoch: 24.
CPU times: user 6.59 s, sys: 660 ms, total: 7.25 s
Wall time: 6.51 s
model_lstm.evaluate(X_val, y_val, verbose=0)
1.331125020980835
GRU layer

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 6.11 s, sys: 574 ms, total: 6.68 s
Wall time: 4.85 s
model_gru.evaluate(X_val, y_val, verbose=0)
1.344503402709961
Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
    validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 67: early stopping
Restoring model weights from the end of the best epoch: 17.
CPU times: user 8.92 s, sys: 862 ms, total: 9.78 s
Wall time: 6.04 s
model_two_grus.evaluate(X_val, y_val, verbose=0)
1.358651041984558
 | Model | MSE |
---|---|---|
1 | SimpleRNN | 1.491682 |
0 | Dense | 1.429465 |
4 | 2 GRUs | 1.358651 |
3 | GRU | 1.344503 |
2 | LSTM | 1.331125 |
The network with an LSTM layer is the best.
model_lstm.evaluate(test_ds, verbose=0)
1.932026982307434
from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version : 3.11.9
IPython version : 8.24.0
keras : 3.3.3
matplotlib: 3.8.4
numpy : 1.26.4
pandas : 2.2.2
seaborn : 0.13.2
scipy : 1.11.0
torch : 2.0.1
tensorflow: 2.16.1
tf_keras : 2.16.0