Automatic Causal Inference and Forecasting

Dr Patrick Laub

Time Series and Forecasting Symposium

December 2, 2022

```
df <- read.csv("chicago.csv")
head(df)
#> Time Temperature Crime
#> 1 1 24.08 1605
#> 2 2 19.04 1119
#> 3 3 28.04 1127
#> 4 4 30.02 1154
#> 5 5 35.96 1251
#> 6 6 33.08 1276
library(fastEDM)
crimeCCMCausesTemp <- easy_edm("Crime", "Temperature", data=df)
#> ✖ No evidence of CCM causation from Crime to Temperature found.
tempCCMCausesCrime <- easy_edm("Temperature", "Crime", data=df)
#> ✔ Some evidence of CCM causation from Temperature to Crime found.
```

Jinjing Li

University of Canberra

George Sugihara

University of California San Diego

Michael J. Zyphur

University of Queensland

Patrick J. Laub

UNSW

Imagine x_t, y_t, z_t are interesting time series…

*If* the data is generated according to the nonlinear system:

\begin{aligned} x_{t+1} &= \sigma (y_t - x_t) \\ y_{t+1} &= x_t (\rho - z_t) - y_t \\ z_{t+1} &= x_t y_t - \beta z_t \end{aligned}

then y \Rightarrow x, both x, z \Rightarrow y, and both x, y \Rightarrow z, where a \Rightarrow b means that a causes b.

Write \mathbf{x}_t = (x_t, y_t, z_t). If

\mathbf{x}_{t+1} = \mathbf{A} \mathbf{x}_{t}

we have a linear system.

If instead

\mathbf{x}_{t+1} = f(\mathbf{x}_{t})

for some nonlinear function f, we have a nonlinear system.
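The difference between the two cases can be checked numerically. The minimal R sketch below treats the right-hand sides of the system above as a map f, using the classic Lorenz values σ = 10, ρ = 28, β = 8/3 (an assumption, since the slides leave them unspecified). It compares f with a linear map built from an arbitrary, hypothetical matrix A. Every linear map is additive, f(a + b) = f(a) + f(b), so a single failure of additivity shows that a map is nonlinear.

```r
# Classic Lorenz parameter values (an assumption; the slides leave them unspecified)
sigma <- 10
rho   <- 28
beta  <- 8 / 3

# Nonlinear map f: the right-hand sides of the system above, v = (x, y, z)
f <- function(v) {
  c(sigma * (v[2] - v[1]),
    v[1] * (rho - v[3]) - v[2],
    v[1] * v[2] - beta * v[3])
}

# Linear map v -> A v, with an arbitrary (hypothetical) matrix A
A <- matrix(c(0.5, 0.1, 0.0,
              0.0, 0.9, 0.2,
              0.1, 0.0, 0.7), nrow = 3, byrow = TRUE)
g <- function(v) as.vector(A %*% v)

# Additivity check: every linear map passes; a failure proves nonlinearity
is_additive <- function(map, a, b) isTRUE(all.equal(map(a + b), map(a) + map(b)))

a <- c(1, 2, 3)
b <- c(-1, 0.5, 2)
is_additive(g, a, b)  # TRUE
is_additive(f, a, b)  # FALSE: the x*z and x*y terms break additivity
```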

Using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals. (Stanisław Ulam)

We don’t fit a model for f; instead, we use the data non-parametrically. Hence the name *empirical* dynamic modelling.

Takens’ theorem to the rescue, though…

Takens’ theorem is a deep mathematical result with far-reaching implications. Unfortunately, to really understand it, it requires a background in topology. (Munch et al. 2020)

Source: Munch et al. (2020), Frequently asked questions about nonlinear dynamics and empirical dynamic modelling, ICES Journal of Marine Science.

Given two time series, create E-length trajectories

\mathbf{x}_t = (\text{Temp}_t, \text{Temp}_{t-1}, \dots, \text{Temp}_{t-(E-1)}) \in \mathbb{R}^{E}

and targets

y_t = \text{Crime}_{t} .
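In R, the base function `stats::embed()` builds exactly these lagged rows. Here is a minimal sketch using the first six rows of the Chicago data shown earlier, with E = 3 (an illustrative choice, not a recommended value):

```r
# First six rows of the Chicago data shown earlier
temp  <- c(24.08, 19.04, 28.04, 30.02, 35.96, 33.08)
crime <- c(1605, 1119, 1127, 1154, 1251, 1276)

E <- 3  # embedding dimension (illustrative choice)

# Each row of X is (Temp_t, Temp_{t-1}, ..., Temp_{t-(E-1)}), for t = E, ..., 6
X <- embed(temp, E)
# Matching targets y_t = Crime_t for the same t
y <- crime[E:length(crime)]

print(X[1, ])  # 28.04 19.04 24.08
```

The first E - 1 time points are lost because they lack a full history of lags.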

- \mathcal{L} = \{ (\mathbf{x}_1, y_1) , \dots , (\mathbf{x}_{n} , y_{n}) \} is the *library set*,
- \mathcal{P} = \{ (\mathbf{x}_{n+1}, y_{n+1}) , \dots , (\mathbf{x}_{T}, y_{T}) \} is the *prediction set*.
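Splitting the embedded data into the two sets is just row indexing. Here is a sketch, again on the six rows shown earlier, with E = 2 and a hypothetical split point n = 3:

```r
# Trajectories and targets from the rows shown earlier (E = 2)
temp  <- c(24.08, 19.04, 28.04, 30.02, 35.96, 33.08)
crime <- c(1605, 1119, 1127, 1154, 1251, 1276)
E <- 2
X <- embed(temp, E)           # rows: (Temp_t, Temp_{t-1})
y <- crime[E:length(crime)]   # targets: Crime_t

n <- 3  # hypothetical split point: the first n rows form the library

lib  <- list(X = X[1:n, , drop = FALSE], y = y[1:n])
pred <- list(X = X[(n + 1):nrow(X), , drop = FALSE], y = y[(n + 1):nrow(X)])
```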

For a point \mathbf{x}_{s} \in \mathcal{P}, pretend we don’t know y_s and try to predict it.

\forall \, \mathbf{x} \in \mathcal{L} \quad \text{ find } \quad d(\mathbf{x}_{s}, \mathbf{x})

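This is computationally demanding: each of the |\mathcal{P}| prediction points needs its distance to all |\mathcal{L}| library points.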
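For a single prediction point, the distances can be computed in one vectorised step. This sketch assumes Euclidean distance (the slides leave d unspecified) and uses library trajectories (\text{Temp}_t, \text{Temp}_{t-1}) built from the rows shown earlier:

```r
# Library trajectories (Temp_t, Temp_{t-1}) and one prediction point x_s
lib_X <- rbind(c(19.04, 24.08),
               c(28.04, 19.04),
               c(30.02, 28.04))
x_s <- c(35.96, 30.02)

# Subtract x_s from every row, square, sum across columns, take the square root
dists <- sqrt(rowSums(sweep(lib_X, 2, x_s)^2))

# Repeating this for every point in P costs |P| x |L| distance computations
```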

For a point \mathbf{x}_{s} \in \mathcal{P}, find its k nearest neighbours in \mathcal{L}.

For example, with k=2 the neighbours might be

\mathcal{NN}_k = \bigl( (\mathbf{x}_{3}, y_3), (\mathbf{x}_{5}, y_5) \bigr)
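Finding the k nearest neighbours then amounts to sorting these distances. Here is a sketch with k = 2, again assuming Euclidean distance and using trajectories built from the rows shown earlier:

```r
# Library trajectories (Temp_t, Temp_{t-1}) and one prediction point x_s
lib_X <- rbind(c(19.04, 24.08),
               c(28.04, 19.04),
               c(30.02, 28.04),
               c(35.96, 30.02))
x_s <- c(33.08, 35.96)
k <- 2

dists <- sqrt(rowSums(sweep(lib_X, 2, x_s)^2))
nn <- order(dists)[1:k]  # indices of the k closest library points, nearest first
nn                       # 4 3
```

Sorting every distance is the simplest approach. For large libraries, a spatial index such as a k-d tree avoids it.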

The *simplex method* predicts

\widehat{y}_s = w_1 y_3 + w_2 y_5 ,

where the weights w_1 and w_2 sum to one and are larger for closer neighbours.
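The slides do not spell out the weights. A common choice in simplex projection is u_i = \exp\{-d_i / d_1\}, normalised to sum to one, where d_1 is the distance to the nearest neighbour. The sketch below assumes that choice and Euclidean distance. The helper `simplex_predict` is hypothetical and not part of fastEDM.

```r
# Simplex prediction for one point x_s (hypothetical helper, not fastEDM's API)
simplex_predict <- function(lib_X, lib_y, x_s, k) {
  d  <- sqrt(rowSums(sweep(lib_X, 2, x_s)^2))  # distances to all library points
  nn <- order(d)[1:k]                          # k nearest neighbours
  d1 <- max(d[nn[1]], .Machine$double.eps)     # guard against a zero distance
  u  <- exp(-d[nn] / d1)                       # exponential weights
  w  <- u / sum(u)                             # normalise so weights sum to one
  sum(w * lib_y[nn])
}

# Trajectories (Temp_t, Temp_{t-1}) paired with Crime_t, from the rows shown earlier
lib_X <- rbind(c(19.04, 24.08), c(28.04, 19.04), c(30.02, 28.04), c(35.96, 30.02))
lib_y <- c(1119, 1127, 1154, 1251)
p <- simplex_predict(lib_X, lib_y, x_s = c(33.08, 35.96), k = 2)
```

Simplex projection conventionally uses k = E + 1 neighbours, the vertices of a simplex in \mathbb{R}^E. Here k = 2 only matches the example above.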

*Sequential Locally Weighted Global Linear Maps (S-map)*

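Weight each library point by its distance to \mathbf{x}_s: w_i = \exp\bigl\{ - \theta d(\mathbf{x}_{s}, \mathbf{x}_i) \bigr\}, where \theta \ge 0 sets how local the fit is (\theta = 0 weights every point equally).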

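Fit a locally weighted linear model \widehat{y}_s = \mathbf{x}_s^\top \boldsymbol{\beta}_s, where \boldsymbol{\beta}_s is estimated by weighted least squares over \mathcal{L} using the weights w_i.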
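Here is a minimal S-map sketch for one prediction point. It uses the slide's weight formula with Euclidean distance and solves for \boldsymbol{\beta}_s by weighted least squares with base R's `lm.wfit()`. It also prepends a 1 to each trajectory so that \boldsymbol{\beta}_s includes an intercept, which is our assumption. The helper `smap_predict` is hypothetical. On data that are exactly linear, the prediction is exact for any θ, which makes a handy sanity check.

```r
# S-map prediction for one point x_s (hypothetical helper, not fastEDM's API).
# Assumes Euclidean distance and an intercept (x_s augmented with a leading 1).
smap_predict <- function(lib_X, lib_y, x_s, theta) {
  d   <- sqrt(rowSums(sweep(lib_X, 2, x_s)^2))  # distances to all library points
  w   <- exp(-theta * d)                        # weights w_i = exp(-theta * d_i)
  fit <- lm.wfit(cbind(1, lib_X), lib_y, w)     # weighted least squares for beta_s
  sum(c(1, x_s) * fit$coefficients)             # yhat_s = (1, x_s)^T beta_s
}

# Sanity check on exactly linear data: y = 2 + 3 x1 - x2
lib_X <- rbind(c(1, 2), c(2, 1), c(3, 5), c(4, 3), c(5, 4))
lib_y <- 2 + 3 * lib_X[, 1] - lib_X[, 2]
smap_predict(lib_X, lib_y, x_s = c(2.5, 2), theta = 0)  # 7.5 (global linear fit)
smap_predict(lib_X, lib_y, x_s = c(2.5, 2), theta = 1)  # 7.5 (local fit, same answer)
```

With θ = 0 the fit collapses to one global linear model; larger θ makes it more local, and hence more nonlinear. Many implementations also divide d by the mean distance before weighting. This sketch omits that step to match the formula above.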

For all s \in \mathcal{P}, compare \widehat{y}_s to the true y_s and calculate the correlation \rho between them (not the Lorenz parameter \rho above).
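The skill \rho is Pearson's correlation, which base R computes with `cor()`. The values below are hypothetical and only show the calculation:

```r
# Hypothetical predictions and true values over a prediction set (illustration only)
y_true <- c(1251, 1276, 1300, 1190)
y_hat  <- c(1230, 1260, 1320, 1200)

# Prediction skill: Pearson correlation between predictions and truth
rho <- cor(y_hat, y_true)
round(rho, 3)  # 0.922
```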

If \text{Temp}_t causes \text{Crime}_t, then information about \text{Temp}_t is somehow embedded in \text{Crime}_t.

By observing \text{Crime}_t, we should be able to forecast \text{Temp}_t.

By observing more of \text{Crime}_t (more “training data”), our forecasts of \text{Temp}_t should be more accurate.
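This convergence argument can be checked on simulated data. The sketch below is only an illustration, not fastEDM's implementation. It uses two coupled logistic maps in which x drives y but y does not drive x; the parameter values loosely follow the classic CCM example and are assumptions here. It embeds the effect y, cross-maps to the cause x with simplex projection (k = E + 1, Euclidean distance, exponential weights), and records \rho over a fixed prediction set as the library grows.

```r
# Two coupled logistic maps: x drives y, y does not drive x (assumed parameters)
n_steps <- 1000
x <- numeric(n_steps)
y <- numeric(n_steps)
x[1] <- 0.4
y[1] <- 0.2
for (i in 1:(n_steps - 1)) {
  x[i + 1] <- x[i] * (3.8 - 3.8 * x[i])
  y[i + 1] <- y[i] * (3.5 - 3.5 * y[i] - 0.1 * x[i])
}

# Simplex projection with exponential weights relative to the nearest neighbour
simplex_predict <- function(lib_X, lib_y, x_s, k) {
  d  <- sqrt(rowSums(sweep(lib_X, 2, x_s)^2))
  nn <- order(d)[1:k]
  d1 <- max(d[nn[1]], .Machine$double.eps)
  u  <- exp(-d[nn] / d1)
  sum(u * lib_y[nn]) / sum(u)
}

E <- 2
M_y    <- embed(y, E)    # shadow manifold of the effect y: rows (y_t, y_{t-1})
target <- x[E:n_steps]   # the cause x at the same times t

pred_idx  <- 801:nrow(M_y)  # fixed prediction set, disjoint from every library
lib_sizes <- c(10, 50, 200, 700)

# Cross-map skill rho for each library size
rhos <- sapply(lib_sizes, function(L) {
  lib_idx <- seq_len(L)
  preds <- sapply(pred_idx, function(s) {
    simplex_predict(M_y[lib_idx, , drop = FALSE], target[lib_idx],
                    M_y[s, ], k = E + 1)
  })
  cor(preds, target[pred_idx])
})
print(round(rhos, 3))
```

Because x drives y, `rhos` should rise with library size. Cross-mapping in the other direction, embedding x to recover y, should not show the same convergence.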

*Example*: Chicago crime and temperature.

Thanks to Rishi Dhushiyandan for his hard work on `easy_edm`.

- Open-source code (9,745 LOC) under the MIT License,
- unit & integration tests (5,342 LOC),
- documentation (5,042 LOC),
- Git (1,198 commits),
- GitHub Actions (11 tasks),
- vectorised, microbenchmarking, ASAN, linting,
- all C++ compilers, WASM, all OSs.

😊 Give it a try; feedback would be very welcome.

😍 If you’re talented in causal inference or programming (Stata/Mata, R, JavaScript, C++, Python), we’d love contributions!

Patrick Laub, Time Series and Forecasting Symposium, University of Sydney