Empirical Dynamic Modelling

Automatic Causal Inference and Forecasting

Patrick Laub

UNSW Sydney, School of Risk and Actuarial Studies

February 7, 2023

Goal: automatic causal inference


df <- read.csv("chicago.csv")
head(df)
#>   Time Temperature Crime
#> 1    1       24.08  1605
#> 2    2       19.04  1119
#> 3    3       28.04  1127
#> 4    4       30.02  1154
#> 5    5       35.96  1251
#> 6    6       33.08  1276

library(fastEDM)

crimeCCMCausesTemp <- easy_edm("Crime", "Temperature", data=df)
#> ✖ No evidence of CCM causation from Crime to Temperature found.

tempCCMCausesCrime <- easy_edm("Temperature", "Crime", data=df)
#> ✔ Some evidence of CCM causation from Temperature to Crime found.

Jinjing Li
University of Canberra

George Sugihara
University of California San Diego

Michael J. Zyphur
University of Queensland

Patrick J. Laub
UNSW


Acknowledgments

This work was supported by Australian Research Council Discovery Project DP200100219 and Future Fellowship FT140100629.

Mirage correlation

A different view of causality


Imagine \(x_t\), \(y_t\), \(z_t\) are interesting time series…

If the data is generated according to the nonlinear system:

\[ \begin{aligned} x_{t+1} &= \sigma (y_t - x_t) \\ y_{t+1} &= x_t (\rho - z_t) - y_t \\ z_{t+1} &= x_t y_t - \beta z_t \end{aligned} \]

then \(y \Rightarrow x\), both \(x, z \Rightarrow y\), and both \(x, y \Rightarrow z\).
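This is the Lorenz system. As a minimal sketch (assuming the classic parameters \(\sigma = 10\), \(\rho = 28\), \(\beta = 8/3\), and treating the updates as small Euler steps of the continuous-time equations), it can be simulated as:

```python
import numpy as np

def simulate_lorenz(n_steps=4000, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Euler-integrate the Lorenz equations and return the (x, y, z) path."""
    xyz = np.empty((n_steps, 3))
    xyz[0] = (1.0, 1.0, 1.0)  # arbitrary initial condition
    for t in range(n_steps - 1):
        x, y, z = xyz[t]
        xyz[t + 1] = (
            x + dt * sigma * (y - x),      # x is driven by y
            y + dt * (x * (rho - z) - y),  # y is driven by x and z
            z + dt * (x * y - beta * z),   # z is driven by x and y
        )
    return xyz

path = simulate_lorenz()
```

Each update makes the causal structure explicit: \(y\) enters the \(x\) equation, \(x\) and \(z\) enter the \(y\) equation, and \(x\) and \(y\) enter the \(z\) equation.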

Fish brains

Gerald Pao (2021), “How to use Causality without Correlation to Download Brains into Computers”, QMNET Seminar.

Linear/nonlinear dynamical systems


Say \(\mathbf{x}_t = (x_t, y_t, z_t)\). If

\[ \mathbf{x}_{t+1} = \mathbf{A} \mathbf{x}_{t} \]

for some matrix \(\mathbf{A}\), we have a linear system; if instead

\[ \mathbf{x}_{t+1} = f(\mathbf{x}_{t}) \]

for a nonlinear function \(f\), we have a nonlinear system.

Using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals. (Stanisław Ulam)

Unobserved variables

Takens’ theorem to the rescue, though…

Takens’ theorem is a deep mathematical result with far-reaching implications. Unfortunately, to really understand it, it requires a background in topology. (Munch et al. 2020)

Simplex Algorithm

Toy problem


\(t\) \(x_t\)
0 0.00
1 1.00
2 1.35
... ...
98 0.07
99 -0.33

Create the embeddings

\(t\) \(x_t\)
0 0.00
1 1.00
2 1.35
... ...
98 0.07
99 -0.33

\(i\) \(x_t\) \(x_{t-1}\) \(x_{t-2}\) \(x_{t+1}\)
0 1.35 1.00 0.00 0.75
1 0.75 1.35 1.00 -0.31
2 -0.31 0.75 1.35 -1.00
... ... ... ... ...
95 0.45 0.54 0.11 0.07
96 0.07 0.45 0.54 -0.33
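The embedding step can be sketched in a few lines (the `lagged_embedding` helper is illustrative, not the package's API; it assumes \(E = 3\) and one-step-ahead targets):

```python
import numpy as np

def lagged_embedding(x, E=3):
    """Stack (x_t, x_{t-1}, ..., x_{t-(E-1)}) rows and the x_{t+1} targets."""
    x = np.asarray(x, dtype=float)
    n = len(x) - E  # number of complete trajectories with a known target
    points = np.column_stack([x[E - 1 - j : E - 1 - j + n] for j in range(E)])
    targets = x[E:]
    return points, targets

x = [0.0, 1.0, 1.35, 0.75, -0.31]
points, targets = lagged_embedding(x, E=3)
# points[0] is (x_2, x_1, x_0) = (1.35, 1.0, 0.0) and targets[0] is x_3 = 0.75
```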

Making a prediction

\[ \boldsymbol{x}^* = \bigl(0.11, 0.42, 0.57 \bigr) \]

Look at the manifold

Calculate the distances to the point

\(i\) \(x_t\) \(x_{t-1}\) \(x_{t-2}\) \(x_{t+1}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\)
0 1.35 1.00 0.00 0.75 0.98
1 0.75 1.35 1.00 -0.31 1.30
2 -0.31 0.75 1.35 -1.00 1.56
... ... ... ... ... ...
95 0.45 0.54 0.11 0.07 0.17
96 0.07 0.45 0.54 -0.33 0.66

Find the nearest neighbours

\(i\) \(x_t\) \(x_{t-1}\) \(x_{t-2}\) \(x_{t+1}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\)
95 0.45 0.54 0.11 0.07 0.17
57 0.74 0.49 -0.13 0.50 0.30
65 0.87 0.51 -0.08 0.62 0.37
... ... ... ... ... ...
26 -1.17 -1.29 -0.62 -0.20 2.55
12 -0.78 -1.42 -1.05 0.21 2.56

Find the nearest neighbours

\(i\) \(x_t\) \(x_{t-1}\) \(x_{t-2}\) \(x_{t+1}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\)
95 0.45 0.54 0.11 0.07 0.17
57 0.74 0.49 -0.13 0.50 0.30
65 0.87 0.51 -0.08 0.62 0.37
80 0.78 0.74 0.17 0.31 0.39
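The neighbour search itself is just a brute-force distance computation followed by a sort (the `k_nearest` helper below is illustrative, not the package's API):

```python
import numpy as np

def k_nearest(points, targets, x_star, k=4):
    """Return indices, targets, and distances of the k nearest library points."""
    dists = np.linalg.norm(points - np.asarray(x_star), axis=1)  # distance to x*
    order = np.argsort(dists)[:k]  # indices of the k smallest distances
    return order, targets[order], dists[order]

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.1], [2.0, 2.0]])
targets = np.array([10.0, 20.0, 30.0, 40.0])
idx, near_targets, near_dists = k_nearest(points, targets, [0.1, 0.0], k=2)
# idx is [0, 2]: the two library points closest to (0.1, 0.0)
```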

Plot those trajectories

Make a prediction

\(i\) \(x_{t+1}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\) \(\tilde{d}(\mathbf{x}_i, \mathbf{x}^*)\) \(w_i\)
95 0.07 0.17 1.00 0.49
57 0.50 0.30 1.76 0.23
65 0.62 0.37 2.18 0.15
80 0.31 0.39 2.29 0.13


\[w_i = \frac{ \exp\bigl\{ -\theta \, \tilde{d}(\mathbf{x}_i, \mathbf{x}^*) \bigr\} }{ \sum_{j=1}^k \exp\bigl\{ -\theta \, \tilde{d}(\mathbf{x}_j, \mathbf{x}^*) \bigr\} } \]
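The table's weights can be reproduced in a few lines. Here \(\tilde{d}\) normalises each distance by the distance to the nearest neighbour, and \(\theta = 1\) is an assumption (the slide does not state its value):

```python
import numpy as np

# Distances from the k = 4 nearest neighbours (from the table above)
d = np.array([0.17, 0.30, 0.37, 0.39])
x_next = np.array([0.07, 0.50, 0.62, 0.31])  # the neighbours' observed x_{t+1}

theta = 1.0                # assumed; reproduces the slide's weights
d_tilde = d / d.min()      # normalise by the distance to the nearest neighbour
w = np.exp(-theta * d_tilde)
w /= w.sum()               # weights sum to 1

# d_tilde.round(2) -> 1.00, 1.76, 2.18, 2.29 -- matches the table
# w.round(2)       -> 0.49, 0.23, 0.15, 0.13 -- matches the table
prediction = w @ x_next    # weighted average of the neighbours' futures, about 0.28
```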

Plot those trajectories (weighted)

S-map Algorithm


Sequential Locally Weighted Global Linear Maps

Lorenz system

Given \(x_t\) from the Lorenz system:

predict \(x_{t+10}\) given \((x_t, x_{t-20})\).

Time series to embedding

\(t\) \(x_t\) \(y_t\) \(z_t\)
0 0.95 10.141630 0.048244
1 1.85 10.739436 0.192093
2 2.75 11.769597 0.442604
... ... ... ...
1998 14.52 16.536739 51.960670
1999 14.69 16.014687 52.941509

\(i\) \(x_t\) \(x_{t-20}\) \(x_{t+10}\)
0 27.53 0.95 -4.75
1 24.75 1.85 -6.82
2 21.34 2.75 -8.63
... ... ... ...
1768 -16.59 -4.76 -15.64
1769 -17.32 -4.90 -14.56

Calculate normalised distances

\(i\) \(x_t\) \(x_{t-20}\) \(x_{t+10}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\) \(\hat{d}(\mathbf{x}_i, \mathbf{x}^*)\)
0 27.53 0.95 -4.75 18.32 0.91
1 24.75 1.85 -6.82 15.45 0.77
2 21.34 2.75 -8.63 12.06 0.60
... ... ... ... ... ...
1768 -16.59 -4.76 -15.64 31.18 1.55
1769 -17.32 -4.90 -14.56 31.90 1.58

The average of the distances is \(\bar{d} = 20.13\), so each distance is normalised as \(\hat{d}(\mathbf{x}_i, \mathbf{x}^*) = d(\mathbf{x}_i, \mathbf{x}^*) / \bar{d}\).

Weights

\(i\) \(x_t\) \(x_{t-20}\) \(d(\mathbf{x}_i, \mathbf{x}^*)\) \(w_i\)
0 27.53 0.95 18.32 0.01
1 24.75 1.85 15.45 0.02
2 21.34 2.75 12.06 0.05
... ... ... ... ...
1768 -16.59 -4.76 31.18 0.00
1769 -17.32 -4.90 31.90 0.00

\[w_i = \exp\bigl\{ -\theta \, \hat{d}(\mathbf{x}_i, \mathbf{x}^*) \bigr\}\]
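Unlike the simplex weights, these are not renormalised: every library point gets a weight and they feed into a weighted regression rather than a weighted average. The table's values can be reproduced as follows (\(\theta = 5\) is an assumption, and 20.13 is the average distance reported on the slide):

```python
import numpy as np

# First and last rows of the distance column above
d = np.array([18.32, 15.45, 12.06, 31.18, 31.90])

theta = 5.0        # assumed value; it reproduces the slide's weights
d_bar = 20.13      # average distance over the whole library (from the slide)
d_hat = d / d_bar  # normalised distances
w = np.exp(-theta * d_hat)  # no renormalisation

# d_hat.round(2) -> 0.91, 0.77, 0.60, 1.55, 1.58 -- matches the table
# w.round(2)     -> 0.01, 0.02, 0.05, 0.00, 0.00 -- matches the table
```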

Linear auto-regression

Linear regression plane

Weighted linear regression
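One way to sketch the weighted regression step (an illustrative formulation, not the package's exact implementation): scale each library equation by its weight, solve the least-squares problem, then evaluate the fitted linear map at the query point.

```python
import numpy as np

def smap_predict(points, targets, w, x_star):
    """Fit a locally weighted linear map and evaluate it at x_star."""
    X = np.column_stack([np.ones(len(points)), points])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X * w[:, None], targets * w, rcond=None)
    return coef[0] + np.asarray(x_star) @ coef[1:]

# On exactly linear data (y = 1 + 2a + 3b) the map is recovered exactly
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
targets = np.array([1.0, 3.0, 4.0, 6.0])
w = np.array([1.0, 0.5, 0.5, 0.25])
pred = smap_predict(points, targets, w, [2.0, 2.0])  # 1 + 2*2 + 3*2 = 11
```

Because the weights decay with distance from \(\mathbf{x}^*\), the fitted plane tracks the local behaviour of the manifold: nearby trajectories dominate the fit.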

Empirical Dynamic Modelling (EDM)

Create lagged embeddings


Given two time series, create length-\(E\) trajectories

\[ \mathbf{x}_t = (\text{Temp}_t, \text{Temp}_{t-1}, \dots, \text{Temp}_{t-(E-1)}) \in \mathbb{R}^{E} \]

and targets

\[ y_t = \text{Crime}_{t} .\]

Note

The \(\mathbf{x}_t\)’s are called points (on the shadow manifold).

Split the data

  • \(\mathcal{L} = \{ (\mathbf{x}_1, y_1) , \dots , (\mathbf{x}_{n} , y_{n}) \}\) is the library set,
  • \(\mathcal{P} = \{ (\mathbf{x}_{n+1}, y_{n+1}) , \dots , (\mathbf{x}_{T}, y_{T}) \}\) is the prediction set.


For point \(\mathbf{x}_{s} \in \mathcal{P}\), pretend we don’t know \(y_s\) and try to predict it.

\[ \forall \, \mathbf{x} \in \mathcal{L} \quad \text{ find } \quad d(\mathbf{x}_{s}, \mathbf{x}) \]

This is computationally demanding.

Convergent cross mapping


  • If \(\text{Temp}_t\) causes \(\text{Crime}_t\), then information about \(\text{Temp}_t\) is somehow embedded in \(\text{Crime}_t\).

  • By observing \(\text{Crime}_t\), we should be able to forecast \(\text{Temp}_t\).

  • By observing more of \(\text{Crime}_t\) (more “training data”), our forecasts of \(\text{Temp}_t\) should be more accurate.


Example: Chicago crime and temperature.
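The whole convergent cross mapping loop can be sketched in a few lines (illustrative helper names, not the package's API; real implementations also handle leave-one-out exclusion, tied distances, and lag selection):

```python
import numpy as np

def simplex_forecast(lib_points, lib_targets, x_star, theta=1.0):
    """Predict the target at x_star from its E + 1 nearest library neighbours."""
    E = lib_points.shape[1]
    d = np.linalg.norm(lib_points - x_star, axis=1)
    idx = np.argsort(d)[: E + 1]
    d_tilde = d[idx] / max(d[idx][0], 1e-12)  # guard against a zero distance
    w = np.exp(-theta * d_tilde)
    return w @ lib_targets[idx] / w.sum()

def ccm_skill(effect, cause, E=3, lib_sizes=(20, 50, 100)):
    """Cross-map `cause` from the shadow manifold of `effect`.
    Skill that rises with library size is evidence that cause -> effect."""
    effect, cause = np.asarray(effect, float), np.asarray(cause, float)
    n = len(effect) - (E - 1)
    points = np.column_stack([effect[E - 1 - j : E - 1 - j + n] for j in range(E)])
    targets = cause[E - 1 :]
    skills = []
    for L in lib_sizes:
        preds = [simplex_forecast(points[:L], targets[:L], points[s])
                 for s in range(L, n)]
        skills.append(np.corrcoef(preds, targets[L:n])[0, 1])
    return skills
```

For the Chicago example, `effect` would be the crime series and `cause` the temperature series: if temperature drives crime, forecasts of temperature made from crime's shadow manifold should improve as the library grows.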

Software

Stata package

R package


Thanks to Rishi Dhushiyandan for his hard work on easy_edm.


Python package

Modern engineering

  • Open code (9,745 LOC) under the MIT License,
  • unit & integration tests (5,342 LOC),
  • documentation (5,042 LOC),
  • Git (1,198 commits),
  • Github Actions (11 tasks),
  • vectorised, microbenchmarking, ASAN, linting,
  • all C++ compilers, WASM, all OSs.

Get involved!


😊 Give it a try, feedback would be very welcome.


😍 If you’re talented in causal inference or programming (Stata/Mata, R, JavaScript, C++, Python), we’d love contributions!