Automatic Causal Inference and Forecasting
UNSW Sydney, School of Risk and Actuarial Studies
February 7, 2023
df <- read.csv("chicago.csv")
head(df)
#> Time Temperature Crime
#> 1 1 24.08 1605
#> 2 2 19.04 1119
#> 3 3 28.04 1127
#> 4 4 30.02 1154
#> 5 5 35.96 1251
#> 6 6 33.08 1276
library(fastEDM)
crimeCCMCausesTemp <- easy_edm("Crime", "Temperature", data=df)
#> ✖ No evidence of CCM causation from Crime to Temperature found.
tempCCMCausesCrime <- easy_edm("Temperature", "Crime", data=df)
#> ✔ Some evidence of CCM causation from Temperature to Crime found.
Jinjing Li
University of Canberra
George Sugihara
University of California San Diego
Michael J. Zyphur
University of Queensland
Patrick J. Laub
UNSW
Source: Sugihara et al. (2012), Detecting causality in complex ecosystems.
Imagine \(x_t\), \(y_t\), \(z_t\) are interesting time series…
If the data are generated according to the nonlinear Lorenz system:
\[ \begin{aligned} \dot{x} &= \sigma (y - x) \\ \dot{y} &= x (\rho - z) - y \\ \dot{z} &= x y - \beta z \end{aligned} \]
then \(y \Rightarrow x\), both \(x, z \Rightarrow y\), and both \(x, y \Rightarrow z\).
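As an illustration, these dynamics can be simulated in a few lines of R (a minimal sketch: the Euler step size, the classic parameter values σ = 10, ρ = 28, β = 8/3, and the initial condition are our assumptions, not taken from the talk):

```r
# Sketch: simulate the Lorenz system with simple Euler steps.
# The step size dt, the parameters (sigma = 10, rho = 28, beta = 8/3)
# and the initial condition are illustrative assumptions.
simulate_lorenz <- function(n = 2000, dt = 0.01,
                            sigma = 10, rho = 28, beta = 8 / 3) {
  x <- y <- z <- numeric(n)
  x[1] <- 1; y[1] <- 1; z[1] <- 1
  for (t in 1:(n - 1)) {
    x[t + 1] <- x[t] + dt * sigma * (y[t] - x[t])
    y[t + 1] <- y[t] + dt * (x[t] * (rho - z[t]) - y[t])
    z[t + 1] <- z[t] + dt * (x[t] * y[t] - beta * z[t])
  }
  data.frame(t = seq_len(n) - 1, x = x, y = y, z = z)
}

lorenz <- simulate_lorenz()
```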
Gerald Pao (2021), “How to use Causality without Correlation to Download Brains into Computers”, QMNET Seminar.
Say \(\mathbf{x}_t = (x_t, y_t, z_t)\). If
\[ \mathbf{x}_{t+1} = \mathbf{A} \mathbf{x}_{t} \]
for some matrix \(\mathbf{A}\), we have a linear system; if instead
\[ \mathbf{x}_{t+1} = f(\mathbf{x}_{t}) \]
for a general \(f\), we have a nonlinear system.
Using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals. (Stanisław Ulam)
We don’t fit a model for \(f\); we use the data itself, non-parametrically. Hence the name empirical dynamic modelling.
Takens’ theorem to the rescue, though…
Takens’ theorem is a deep mathematical result with far-reaching implications. Unfortunately, to really understand it, it requires a background in topology. (Munch et al. 2020)
Source: Munch et al. (2020), Frequently asked questions about nonlinear dynamics and empirical dynamic modelling, ICES Journal of Marine Science.
\(t\) | \(x_t\) |
---|---|
0 | 0.00 |
1 | 1.00 |
2 | 1.35 |
... | ... |
98 | 0.07 |
99 | -0.33 |
\(i\) | \(x_t\) | \(x_{t-1}\) | \(x_{t-2}\) | \(x_{t+1}\) |
---|---|---|---|---|
0 | 1.35 | 1.00 | 0.00 | 0.75 |
1 | 0.75 | 1.35 | 1.00 | -0.31 |
2 | -0.31 | 0.75 | 1.35 | -1.00 |
... | ... | ... | ... | ... |
95 | 0.45 | 0.54 | 0.11 | 0.07 |
96 | 0.07 | 0.45 | 0.54 | -0.33 |
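The construction of these lagged trajectories and targets can be sketched in R (`make_embedding` is our own illustrative helper, not a fastEDM function):

```r
# Sketch: build E-lagged trajectories and p-step-ahead targets from a
# single series, as in the tables above.
make_embedding <- function(x, E = 3, p = 1) {
  n <- length(x)
  ts <- E:(n - p)                     # time indices with full history
  M <- sapply(0:(E - 1), function(lag) x[ts - lag])
  colnames(M) <- paste0("lag", 0:(E - 1))
  list(M = M, y = x[ts + p])          # embedding matrix and targets
}
```

With a series of length 100 and \(E = 3\), this yields 97 embedded points, matching the rows \(i = 0, \dots, 96\) above.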
Given a new point \(\mathbf{x}^*\), compute its distance to each embedded point:

\(i\) | \(x_t\) | \(x_{t-1}\) | \(x_{t-2}\) | \(x_{t+1}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) |
---|---|---|---|---|---|
0 | 1.35 | 1.00 | 0.00 | 0.75 | 0.98 |
1 | 0.75 | 1.35 | 1.00 | -0.31 | 1.30 |
2 | -0.31 | 0.75 | 1.35 | -1.00 | 1.56 |
... | ... | ... | ... | ... | ... |
95 | 0.45 | 0.54 | 0.11 | 0.07 | 0.17 |
96 | 0.07 | 0.45 | 0.54 | -0.33 | 0.66 |
Sorting the points by their distance to \(\mathbf{x}^*\):

\(i\) | \(x_t\) | \(x_{t-1}\) | \(x_{t-2}\) | \(x_{t+1}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) |
---|---|---|---|---|---|
95 | 0.45 | 0.54 | 0.11 | 0.07 | 0.17 |
57 | 0.74 | 0.49 | -0.13 | 0.50 | 0.30 |
65 | 0.87 | 0.51 | -0.08 | 0.62 | 0.37 |
... | ... | ... | ... | ... | ... |
26 | -1.17 | -1.29 | -0.62 | -0.20 | 2.55 |
12 | -0.78 | -1.42 | -1.05 | 0.21 | 2.56 |
Keeping the \(k = 4\) nearest neighbours:

\(i\) | \(x_t\) | \(x_{t-1}\) | \(x_{t-2}\) | \(x_{t+1}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) |
---|---|---|---|---|---|
95 | 0.45 | 0.54 | 0.11 | 0.07 | 0.17 |
57 | 0.74 | 0.49 | -0.13 | 0.50 | 0.30 |
65 | 0.87 | 0.51 | -0.08 | 0.62 | 0.37 |
80 | 0.78 | 0.74 | 0.17 | 0.31 | 0.39 |
Normalising each distance by the smallest one gives \(\tilde{d}\), which determines the weights:

\(i\) | \(x_{t+1}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) | \(\tilde{d}(\mathbf{x}_i, \mathbf{x}^*)\) | \(w_i\) |
---|---|---|---|---|
95 | 0.07 | 0.17 | 1.00 | 0.49 |
57 | 0.50 | 0.30 | 1.76 | 0.23 |
65 | 0.62 | 0.37 | 2.18 | 0.15 |
80 | 0.31 | 0.39 | 2.29 | 0.13 |
\[w_i = \frac{
\exp\bigl\{ -\theta \, \tilde{d}(\mathbf{x}_i, \mathbf{x}^*) \bigr\}
}{
\sum_{j=1}^k \exp\bigl\{ -\theta \, \tilde{d}(\mathbf{x}_j, \mathbf{x}^*) \bigr\}
}
\]
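Putting the steps together — distances, nearest neighbours, normalisation, exponential weights, weighted average — a simplex-style forecast might look like this in R (a sketch: `simplex_predict` is our own name, and the defaults \(k = 4\), \(\theta = 1\) mirror the worked example but are assumptions):

```r
# Sketch of a simplex-style forecast: weight the targets of the k
# nearest neighbours of x_star by their (normalised) distances.
simplex_predict <- function(M, y, x_star, k = 4, theta = 1) {
  d <- sqrt(colSums((t(M) - x_star)^2))  # distance to every point
  nn <- order(d)[1:k]                    # indices of k nearest points
  d_tilde <- d[nn] / d[nn[1]]            # normalise by smallest distance
  w <- exp(-theta * d_tilde)
  w <- w / sum(w)                        # weights sum to one
  sum(w * y[nn])                         # weighted average of targets
}
```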
S-map: Sequential Locally Weighted Global Linear Maps
Given \(x_t\) from the Lorenz system:
predict \(x_{t+10}\) given \((x_t, x_{t-20})\).
\(t\) | \(x_t\) | \(y_t\) | \(z_t\) |
---|---|---|---|
0 | 0.95 | 10.141630 | 0.048244 |
1 | 1.85 | 10.739436 | 0.192093 |
2 | 2.75 | 11.769597 | 0.442604 |
... | ... | ... | ... |
1998 | 14.52 | 16.536739 | 51.960670 |
1999 | 14.69 | 16.014687 | 52.941509 |
\(i\) | \(x_t\) | \(x_{t-20}\) | \(x_{t+10}\) |
---|---|---|---|
0 | 27.53 | 0.95 | -4.75 |
1 | 24.75 | 1.85 | -6.82 |
2 | 21.34 | 2.75 | -8.63 |
... | ... | ... | ... |
1768 | -16.59 | -4.76 | -15.64 |
1769 | -17.32 | -4.90 | -14.56 |
\(i\) | \(x_t\) | \(x_{t-20}\) | \(x_{t+10}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) | \(\hat{d}(\mathbf{x}_i, \mathbf{x}^*)\) |
---|---|---|---|---|---|
0 | 27.53 | 0.95 | -4.75 | 18.32 | 0.91 |
1 | 24.75 | 1.85 | -6.82 | 15.45 | 0.77 |
2 | 21.34 | 2.75 | -8.63 | 12.06 | 0.60 |
... | ... | ... | ... | ... | ... |
1768 | -16.59 | -4.76 | -15.64 | 31.18 | 1.55 |
1769 | -17.32 | -4.90 | -14.56 | 31.90 | 1.58 |
The average of the distances is 20.13; dividing each distance by this average gives \(\hat{d}\).
\(i\) | \(x_t\) | \(x_{t-20}\) | \(d(\mathbf{x}_i, \mathbf{x}^*)\) | \(w_i\) |
---|---|---|---|---|
0 | 27.53 | 0.95 | 18.32 | 0.01 |
1 | 24.75 | 1.85 | 15.45 | 0.02 |
2 | 21.34 | 2.75 | 12.06 | 0.05 |
... | ... | ... | ... | ... |
1768 | -16.59 | -4.76 | 31.18 | 0.00 |
1769 | -17.32 | -4.90 | 31.90 | 0.00 |
\[w_i = \exp\bigl\{ -\theta \, \hat{d}(\mathbf{x}_i, \mathbf{x}^*) \bigr\}\]
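The S-map forecast can be sketched as a weighted linear regression in R (our own illustrative code, not the fastEDM implementation; the default \(\theta = 5\) appears consistent with the weights tabulated above, but is an assumption):

```r
# Sketch of an S-map forecast: a weighted linear regression where
# points near x_star get larger weights. With theta = 0 this is
# ordinary least squares; larger theta makes the fit more local.
smap_predict <- function(M, y, x_star, theta = 5) {
  d <- sqrt(colSums((t(M) - x_star)^2))  # distance to every point
  w <- exp(-theta * d / mean(d))         # weights from scaled distances
  X <- cbind(1, M)                       # design matrix with intercept
  fit <- lm.wfit(X, y, w)                # weighted least squares
  sum(c(1, x_star) * fit$coefficients)   # evaluate the fit at x_star
}
```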
Given two time series, create \(E\)-length trajectories
\[ \mathbf{x}_t = (\text{Temp}_t, \text{Temp}_{t-1}, \dots, \text{Temp}_{t-(E-1)}) \in \mathbb{R}^{E} \]
and targets
\[ y_t = \text{Crime}_{t} .\]
For each point \(\mathbf{x}_{s}\) in the prediction set \(\mathcal{P}\), pretend we don’t know \(y_s\) and try to predict it. This requires the distance to every point in the library set \(\mathcal{L}\):
\[ \forall \, \mathbf{x} \in \mathcal{L} \quad \text{ find } \quad d(\mathbf{x}_{s}, \mathbf{x}). \]
This is computationally demanding.
If \(\text{Temp}_t\) causes \(\text{Crime}_t\), then information about \(\text{Temp}_t\) is somehow embedded in \(\text{Crime}_t\).
By observing \(\text{Crime}_t\), we should be able to forecast \(\text{Temp}_t\).
By observing more of \(\text{Crime}_t\) (more “training data”), our forecasts of \(\text{Temp}_t\) should be more accurate.
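This convergence property can be illustrated with the coupled logistic maps from Sugihara et al. (2012), where \(x\) drives \(y\) (a sketch: the embedding dimension \(E = 2\), simplex-style weights, and the leave-one-out skill measure are our simplifications):

```r
# Sketch: cross-map skill versus library size for coupled logistic
# maps where x drives y (parameters from Sugihara et al. 2012).
n <- 400
x <- y <- numeric(n)
x[1] <- 0.4; y[1] <- 0.2
for (t in 1:(n - 1)) {
  x[t + 1] <- x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
  y[t + 1] <- y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
}

# Correlation between x and its cross-mapped estimate from y's
# shadow manifold, using the first lib_size observations.
ccm_skill <- function(lib_size, E = 2) {
  ts <- E:lib_size
  M <- cbind(y[ts], y[ts - 1])           # embedding of the effect y
  target <- x[ts]                        # the putative cause x
  preds <- sapply(seq_along(ts), function(i) {
    d <- sqrt(colSums((t(M) - M[i, ])^2))
    d[i] <- Inf                          # leave the point itself out
    nn <- order(d)[1:(E + 1)]
    w <- exp(-d[nn] / d[nn[1]])
    sum(w / sum(w) * target[nn])
  })
  cor(preds, target)
}

ccm_skill(50)    # skill with a small library
ccm_skill(400)   # skill typically rises as the library grows
```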
Example: Chicago crime and temperature.
Thanks to Rishi Dhushiyandan for his hard work on easy_edm.
😊 Give it a try, feedback would be very welcome.
😍 If you’re talented in causal inference or programming (Stata/Mata, R, Javascript, C++, Python), we’d love contributions!
Patrick Laub, Probability Group Seminar, Université de Strasbourg