Joris Mooij: Joint Causal Inference: A Unifying Perspective on Causal Discovery - 2020

Details

Title: Joris Mooij: Joint Causal Inference: A Unifying Perspective on Causal Discovery
Author(s): Online Causal Inference Seminar
Link(s): https://www.youtube.com/watch?v=NgxQkFwve70

Rough Notes

Some challenges in causal discovery include:

Latent confounders, selection bias.
Feedback loops (causal cycles, non-recursivity)
Temporal dependence; unlimited experimentation is often not possible.

The Joint Causal Inference (JCI) framework covers both causal modelling and causal discovery. For causal modelling we use Structural Causal Models (SCMs), which can account for both latent confounding and cycles. Recall that an SCM \(\mathcal{M}\) is a tuple consisting of a measurable space \(\mathcal{X}\) for the endogenous variables, a measurable space \(\mathcal{E}\) for the exogenous variables, a mapping \(f: \mathcal{X} \times\mathcal{E} \to \mathcal{X}\), and a product probability measure \(\mathbb{P}_\mathcal{E}\) on \(\mathcal{E}\). An SCM induces an augmented graph \(\mathcal{G}^a(\mathcal{M})\), which includes the exogenous variables and may contain cycles and self-loops. Let \(\mathcal{G}(\mathcal{M})\) be the graph obtained by omitting the exogenous variables and adding a bidirected edge between any pair of endogenous variables that share an exogenous parent.
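As a rough illustration of how \(\mathcal{G}(\mathcal{M})\) is obtained from the augmented graph, here is a minimal Python sketch. The function name `project_graph`, the dictionary-of-parents representation, and the toy model are assumptions made for this example and do not come from the talk.

```python
from itertools import combinations

def project_graph(parents, exogenous):
    """Build G(M) from the augmented graph.

    parents: dict mapping each endogenous node to the set of its parents
             (endogenous or exogenous) in the augmented graph.
    exogenous: set of exogenous node names.
    Returns (directed edges, bidirected edges) among endogenous nodes.
    """
    # Keep only directed edges whose source is endogenous.
    directed = {
        (p, child)
        for child, pa in parents.items()
        for p in pa
        if p not in exogenous
    }
    # Add a bidirected edge for every pair sharing an exogenous parent.
    bidirected = set()
    for a, b in combinations(sorted(parents), 2):
        if parents[a] & parents[b] & exogenous:
            bidirected.add((a, b))
    return directed, bidirected

# Hypothetical toy model: E1 is a shared exogenous parent of X1 and X2.
parents = {
    "X1": {"E1"},
    "X2": {"X1", "E1", "E2"},
    "X3": {"X2", "E3"},
}
exogenous = {"E1", "E2", "E3"}

directed, bidirected = project_graph(parents, exogenous)
print(sorted(directed))
print(sorted(bidirected))
```

In this toy model the shared parent E1 shows up in \(\mathcal{G}(\mathcal{M})\) only as the bidirected edge between X1 and X2.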

Given a subset \(I\) of the endogenous variables, a perfect intervention results in an intervened SCM \(\mathcal{M}_{do(X_I = \xi_I)}\) in which the functions \(f_i, i\in I\) are replaced by the constants \(\xi_i, i\in I\), forcing these variables to take the given values.
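The effect of a perfect intervention can be sketched in code for the acyclic case. The helpers `sample_scm` and `do`, the two-variable model, and the Gaussian noise are all assumptions made for illustration; the sampler simply evaluates the mechanisms in topological order, which only works when there are no cycles.

```python
import numpy as np

def sample_scm(functions, order, noise, n, rng):
    """Draw n samples from an acyclic SCM by evaluating each mechanism
    in a topological order of the endogenous variables."""
    values = {}
    for name in order:
        e = noise[name](rng, n)
        values[name] = functions[name](values, e)
    return values

def do(functions, interventions):
    """Return the mechanisms of the intervened SCM: each f_i with i in I
    is replaced by the constant xi_i; all other mechanisms are unchanged."""
    new_functions = dict(functions)
    for name, xi in interventions.items():
        new_functions[name] = lambda values, e, xi=xi: np.full_like(e, xi, dtype=float)
    return new_functions

# Hypothetical toy SCM: X1 = E1, X2 = 2 * X1 + E2.
functions = {
    "X1": lambda v, e: e,
    "X2": lambda v, e: 2.0 * v["X1"] + e,
}
noise = {name: (lambda rng, n: rng.normal(size=n)) for name in functions}
order = ["X1", "X2"]

rng = np.random.default_rng(0)
observational = sample_scm(functions, order, noise, 10_000, rng)
intervened = sample_scm(do(functions, {"X1": 1.0}), order, noise, 10_000, rng)

print(observational["X2"].mean(), intervened["X2"].mean())
```

Under do(X1 = 1) every sample of X1 equals 1, and X2 is generated by its unchanged mechanism, so its mean shifts to roughly 2.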

We have Causal Bayesian Networks \(\subset\) Acyclic SCMs (those where \(\mathcal{G}(\mathcal{M})\) is acyclic) \(\subset\) Simple SCMs (roughly, those where any subset of the equations can be uniquely solved for its variables in terms of the other variables in these equations) \(\subset\) SCMs.

Simple SCMs extend acyclic SCMs and allow for weak cyclic causal relations, while preserving the convenient properties of acyclic SCMs. In many dynamical systems, feedback loops induce cyclic causality at equilibrium.
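To make the solvability condition concrete, consider a linear SCM \(x = Bx + e\) with a feedback loop. The sketch below assumes linearity (the talk does not restrict to this case) and uses hypothetical coefficients; in this linear setting, unique solvability for a subset \(S\) of the equations amounts to \(I - B_{SS}\) being invertible.

```python
from itertools import combinations
import numpy as np

# Hypothetical linear SCM with a feedback loop X1 -> X2 -> X1:
#   X1 = 0.5 * X2 + E1
#   X2 = 0.3 * X1 + E2
B = np.array([[0.0, 0.5],
              [0.3, 0.0]])

def solve_equilibrium(B, e):
    """Solve the structural equations x = B x + e for x."""
    return np.linalg.solve(np.eye(B.shape[0]) - B, e)

def uniquely_solvable_for_all_subsets(B):
    """Check that I - B restricted to every subset S of variables is
    invertible, so each subset of equations has a unique solution."""
    d = B.shape[0]
    for k in range(1, d + 1):
        for S in combinations(range(d), k):
            sub = np.eye(k) - B[np.ix_(S, S)]
            if np.isclose(np.linalg.det(sub), 0.0):
                return False
    return True

e = np.array([1.0, 2.0])
x = solve_equilibrium(B, e)

print(x)
print(uniquely_solvable_for_all_subsets(B))
```

A loop with coefficients 1 and 1 (X1 = X2 + E1, X2 = X1 + E2) would fail this check, since the two equations then have no unique solution.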

\(\mathcal{G}(\mathcal{M})\) of an acyclic SCM is interpreted as follows:

\(i \to j\) means \(i\) is a direct cause of \(j\).
\(i \to \cdots \to j\) means \(i\) is a cause of \(j\).
\(i \leftrightarrow j\) means \(i,j\) are possibly confounded.

In the graph, a perfect intervention on a node removes all incoming directed edges and all bidirected edges at that node.
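This graph surgery is easy to express in code. The edge-set representation and the name `intervene_graph` are assumptions for this sketch, as is the three-node example graph.

```python
def intervene_graph(directed, bidirected, targets):
    """Graph of the intervened SCM: drop every directed edge pointing into
    an intervened node and every bidirected edge touching one."""
    targets = set(targets)
    new_directed = {(a, b) for (a, b) in directed if b not in targets}
    new_bidirected = {
        (a, b) for (a, b) in bidirected
        if a not in targets and b not in targets
    }
    return new_directed, new_bidirected

# Hypothetical graph: X1 -> X2 -> X3 with X1 <-> X2.
directed = {("X1", "X2"), ("X2", "X3")}
bidirected = {("X1", "X2")}

new_directed, new_bidirected = intervene_graph(directed, bidirected, {"X2"})
print(sorted(new_directed), sorted(new_bidirected))
```

Outgoing edges of the intervened node are kept, so X2 can still affect X3 after the intervention.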

We now look at causal discovery. RCTs (or A/B testing) are the gold standard for causal discovery. We want to know whether a treatment \(T \in \{0,1\}\) is a cause of some effect \(E\), so for each unit in the population we assign the treatment at random, giving samples \((e_i,t_i)\). There are 2 ways to view RCTs:

Separate dataset view: Is \(\mathbb{P}_{T=0}(E)=\mathbb{P}_{T=1}(E)\)?
Single dataset view: Is \(E \perp\!\!\!\perp T\)?

Within the framework of SCMs we can prove that RCTs are valid for causal discovery under the RCT assumptions that the outcome does not cause the treatment and that outcome and treatment are unconfounded: a dependence between \(T\) and \(E\) then implies that \(T\) causes \(E\).
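The two views of an RCT can be compared on simulated data. Everything in this sketch is an assumption for illustration: the outcome mechanism with a treatment effect of 1.0, the sample size, and the use of a correlation-based permutation test as a stand-in for a general independence test. For a binary treatment, zero correlation is equivalent to equal group means, which is why this simple statistic links the two views here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Randomized treatment assignment and a hypothetical outcome mechanism.
t = rng.integers(0, 2, size=n)
e = 1.0 * t + rng.normal(size=n)

# Separate dataset view: compare the outcome across the two treatment arms.
diff_means = e[t == 1].mean() - e[t == 0].mean()

def permutation_pvalue(t, e, n_perm, rng):
    """Single dataset view: test whether E and T are dependent by comparing
    the observed |correlation| with its distribution under shuffled T."""
    observed = abs(np.corrcoef(t, e)[0, 1])
    count = 0
    for _ in range(n_perm):
        permuted = abs(np.corrcoef(rng.permutation(t), e)[0, 1])
        if permuted >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

p_value = permutation_pvalue(t, e, 200, rng)
print(diff_means, p_value)
```

Both views point the same way in this simulation: the arms differ in mean outcome, and the pooled test rejects independence of E and T.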

Causal discovery methods based on observational data are often divided into constraint-based and likelihood-based approaches, and often rely on strong assumptions.

The JCI framework generalizes the idea of RCTs to multiple context and system variables. Context variables model the context of the system, e.g. treatment variables, patient subgroups. System variables model the system of interest, e.g. blood sugar levels, blood oxygenation levels.

JCI reduces modelling a system given its context to modelling a system jointly with its context. The boundary between system and context is a modelling choice, and can be chosen according to the JCI assumptions:

The system does not affect its context.
Context and system are unconfounded. (This is harder to justify and is considered optional, as it is easily violated, e.g. when there is no randomization.)

Many existing causal discovery methods (FCI, ASD, RCTs, Local CD, ICP) are special cases of the JCI framework. The JCI view also inspired new algorithms, ASD-JCI and FCI-JCI, which outperform their purely observational counterparts, i.e. those that use no context variables. Adding context variables helps considerably, which can be seen as an instance of the idea that perturbing the system helps us understand it.
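A minimal sketch of the pooling idea: data from several contexts are stacked into one dataset with an extra context column \(C\), and the analysis is run on the pooled data. The check below follows the Local CD pattern as I understand it (if \(C\) is not caused by the system, \(C\) and \(X\) are dependent, \(X\) and \(Y\) are dependent, and \(C \perp\!\!\!\perp Y \mid X\), then \(X\) causes \(Y\)). The simulated mechanisms, the helper names, and the use of linear partial correlation as a proxy for conditional independence are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_context(c, n, rng):
    """Hypothetical system: the context shifts X, and X causes Y."""
    x = 1.0 * c + rng.normal(size=n)
    y = 0.8 * x + rng.normal(size=n)
    return np.column_stack([x, y])

# One dataset per context (e.g. 0 = observational, 1 = perturbed).
datasets = {0: simulate_context(0, 2_000, rng),
            1: simulate_context(1, 2_000, rng)}

# JCI-style pooling: prepend the context value as an extra column.
pooled = np.vstack([
    np.column_stack([np.full(len(data), c), data])
    for c, data in datasets.items()
])
c, x, y = pooled.T

def partial_corr(a, b, given):
    """Correlation of the residuals of a and b after linear regression
    on `given` (a linear proxy for conditional dependence)."""
    Z = np.column_stack([np.ones_like(given), given])
    res_a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    res_b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

r_cx = np.corrcoef(c, x)[0, 1]       # C and X dependent?
r_xy = np.corrcoef(x, y)[0, 1]       # X and Y dependent?
r_cy_given_x = partial_corr(c, y, x) # C and Y independent given X?

print(r_cx, r_xy, r_cy_given_x)
```

In practice the three quantities would be fed into proper (conditional) independence tests rather than inspected by eye; the point of the sketch is only that the context variable is treated like any other column of the pooled dataset.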

To allow for cycles, the theory can be extended to simple SCMs, where d-separation is replaced by \(\sigma\)-separation.