Active Invariant Causal Prediction: Experiment Selection through Stability - 2020

Details

Title: Active Invariant Causal Prediction: Experiment Selection through Stability
Author(s): Gamella, Juan L. and Heinze-Deml, Christina
Link(s): https://proceedings.neurips.cc/paper/2020/hash/b197ffdef2ddc3308584dce7afa3661b-Abstract.html

Rough Notes

Selecting the right interventions/experiments to learn the true causal model is a hard task. Existing approaches fall broadly into 2 categories:

Bayesian approaches, which rely on maximizing a Bayesian utility function (often the mutual information between the graph and the hypothetical experimental result). They require exact knowledge of the intervention locations and parameters, and the effect of misspecified interventions on the choice of experiments is hard to analyze.
Graph-theoretic methods which give bounds on the number of interventions required for identifiability. These results are general for the underlying distribution, but make strong assumptions such as correct identification of the Markov Equivalence Class (MEC) and perfectly informative interventions (i.e. infinite interventional data).

The work presented in this paper does not fall into either category. It uses Invariant Causal Prediction (ICP) to recover the direct causes \(S^*\) of a response variable of interest \(Y\) from interventional data. The idea is that \(P(Y|\text{direct causes of } Y)\) remains invariant when intervening on arbitrary variables in the system other than \(Y\) itself. ICP requires knowledge of neither the MEC nor the nature or location of the interventions performed in the different environments (each environment corresponds to different experimental conditions for the system).

A rough overview of the method: at round \(t\),

Choose an intervention \(I_t = do(X_j=x)\).
Perform the experiment.
Collect sample \((X^t,Y^t) \sim P(X,Y|do(X_j=x))\), \(\mathcal{E}_t \leftarrow \mathcal{E}_{t-1}\cup \{(X^t,Y^t)\}\)
Update accepted sets, i.e. run ICP on \(\mathcal{E}_t\).
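
The round structure above can be sketched as a loop. This is only a skeleton under stated assumptions: `choose_target`, `run_experiment` and `run_icp` are hypothetical callables supplied by the caller (they are not taken from the paper or its code), and the toy stand-ins at the bottom exist only so the sketch runs.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

AcceptedSets = List[FrozenSet[int]]
# Hypothetical layout: (intervened variable index, collected sample).
Environment = Tuple[int, object]


def active_icp_loop(
    candidates: Sequence[int],
    choose_target: Callable[[Sequence[int], AcceptedSets, random.Random], int],
    run_experiment: Callable[[int], object],
    run_icp: Callable[[List[Environment]], AcceptedSets],
    rounds: int,
    seed: int = 0,
) -> Tuple[List[Environment], AcceptedSets]:
    rng = random.Random(seed)
    environments: List[Environment] = []
    accepted: AcceptedSets = []
    for _ in range(rounds):
        # 1. Choose an intervention target using the current accepted sets.
        target = choose_target(candidates, accepted, rng)
        # 2-3. Perform the experiment and add the sample to the environments.
        sample = run_experiment(target)
        environments.append((target, sample))
        # 4. Update the accepted sets by re-running ICP on all environments.
        accepted = run_icp(environments)
    return environments, accepted


# Toy stand-ins (assumptions for illustration, not real experiments or ICP).
def uniform_choice(candidates, accepted, rng):
    return rng.choice(list(candidates))


def fake_experiment(target):
    return f"sample from do(X_{target})"


def fake_icp(environments):
    # Placeholder output, not a real ICP result.
    return [frozenset({0}), frozenset({0, 1})]


envs, accepted = active_icp_loop(
    [0, 1, 2], uniform_choice, fake_experiment, fake_icp, rounds=5
)
```

Passing the policy and the ICP routine in as callables keeps the loop independent of how targets are chosen, which is where the strategies in these notes would plug in.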

Roughly speaking, intervention stable sets (under a set of environments \(\mathcal{E}\) and a response variable \(Y\)) let us characterize sets of plausible causal predictors through d-separation relationships in the causal graph. A set of predictors is intervention stable if it d-separates the response from all interventions; let \(\mathbb{S}_\mathcal{E}\) denote the collection of all stable sets under \(\mathcal{E}\). From this the authors define the stability ratio of a variable: the proportion of intervention stable sets under \(\mathcal{E}\) that contain that variable. To use this information to construct an intervention policy, it has to be related to sets of plausible causal predictors. The authors define a set \(S\) of variables to be a set of plausible causal predictors under a set of environments \(\mathcal{E}\) if \(\forall x, \forall e,f \in \mathcal{E}\), \(Y^e|X_S^e=x \overset{d}{=} Y^f|X_S^f=x\). Let \(\mathbb{C}_\mathcal{E}\) denote the collection of all such sets.
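
As a small illustration of the stability ratio, the sketch below counts how often each variable appears in a given collection of sets. The function name `stability_ratios` and the example collection are made up for illustration; in practice the collection would be the stable sets (or, as an estimate, the accepted sets from ICP).

```python
from fractions import Fraction
from typing import Dict, Iterable, Sequence, Set


def stability_ratios(
    sets: Iterable[Set[int]], variables: Sequence[int]
) -> Dict[int, Fraction]:
    """Fraction of the given sets that contain each variable."""
    collection = [frozenset(s) for s in sets]
    if not collection:
        raise ValueError("need at least one set to compute a ratio")
    total = len(collection)
    return {
        v: Fraction(sum(v in s for s in collection), total) for v in variables
    }


# Hypothetical collection of stable sets over variables 0..3.
example_sets = [{1}, {1, 2}, {1, 3}, {1, 2, 3}]
ratios = stability_ratios(example_sets, variables=[0, 1, 2, 3])
# Variable 1 is in every set (ratio 1), variables 2 and 3 are in half of
# them (ratio 1/2), and variable 0 is in none (ratio 0).
```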

The collection of accepted sets from ICP is an estimate of \(\mathbb{C}_\mathcal{E}\), and in general \(\mathbb{S}_\mathcal{E} \subset \mathbb{C}_\mathcal{E}\).

They treat direct interventions on parents as "maximally informative" and construct active learning policies with the goal of choosing these (only single-variable) interventions. These policies make use of 3 strategies:

Markov strategy M: Selects intervention targets from within the Markov blanket, which contains the parents. Under linearity this can be computed efficiently.
Empty-set strategy E: If an observational sample is available, we can test whether the distributional equality in the definition of plausible causal predictors holds for the empty set, comparing the observational sample with the latest interventional sample \(e_t\). If it does, we know the latest intervention target is not upstream of the response, and thus not a parent, and we can discard the target from future interventions.
Ratio strategy R: A variable is not a parent if its stability ratio is less than 0.5. The ratio is estimated from the accepted sets computed on \(\mathcal{E}_{t-1}\). Because this is only an estimate, a variable with a ratio below 0.5 is not discarded from future interventions.
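
The check behind the empty-set strategy can be sketched as a two-sample comparison of \(Y\) between the observational and the latest interventional sample. The sketch below swaps in a permutation test on the Kolmogorov-Smirnov statistic purely for illustration; this is an assumption, not the test used in the paper, and the function names, `alpha` and `n_perm` are hypothetical.

```python
import random
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def permutation_p_value(y_obs, y_int, n_perm: int = 200, seed: int = 0) -> float:
    """P-value for 'both samples of Y come from the same distribution'."""
    rng = random.Random(seed)
    observed = ks_statistic(y_obs, y_int)
    pooled = list(y_obs) + list(y_int)
    n = len(y_obs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def empty_set_accepted(y_obs, y_int, alpha: float = 0.05) -> bool:
    """True when Y looks unchanged, i.e. the target can be discarded."""
    return permutation_p_value(y_obs, y_int) > alpha


# Deterministic toy data (an assumption, not real measurements).
y_obs = [i / 50 for i in range(50)]
unchanged = empty_set_accepted(y_obs, list(y_obs))           # Y unaffected
shifted = empty_set_accepted(y_obs, [y + 5 for y in y_obs])  # Y clearly moved
```

When `empty_set_accepted` returns `True`, the intervened variable would be dropped from the candidate targets; when it returns `False`, the target stays a candidate.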

Identified parents, i.e. those with stability ratio 1, are also excluded under all strategies. Each strategy narrows down the set of possible intervention targets, and the actual target is chosen uniformly at random from that set. For multi-variable interventions we can choose \(k\) targets, and if a policy combines several strategies we take the intersection of their possible intervention targets.
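
The combination rule described above can be sketched as follows. The function `select_targets` and its arguments are hypothetical names. Returning an empty list when no candidate remains is an assumption of this sketch, since the notes do not say what happens in that case.

```python
import random
from typing import Iterable, List, Optional, Set


def select_targets(
    all_vars: Iterable[int],
    strategy_candidates: Iterable[Set[int]],
    identified_parents: Iterable[int] = (),
    k: int = 1,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick up to k intervention targets uniformly at random."""
    rng = rng or random.Random()
    # Identified parents (stability ratio 1) are never intervened on again.
    pool = set(all_vars) - set(identified_parents)
    # Combining strategies means intersecting their candidate sets.
    for candidates in strategy_candidates:
        pool &= set(candidates)
    ordered = sorted(pool)
    if not ordered:
        return []  # assumption: no valid target left
    return rng.sample(ordered, min(k, len(ordered)))


# Hypothetical candidate sets from two strategies over variables 0..4.
markov_candidates = {1, 2, 3}   # e.g. an estimated Markov blanket
ratio_candidates = {2, 3, 4}    # e.g. variables with ratio >= 0.5
single = select_targets(
    range(5), [markov_candidates, ratio_candidates], identified_parents={3}
)
pair = select_targets(range(5), [markov_candidates], k=2, rng=random.Random(0))
```

Here `single` can only be `[2]`, because variable 2 is the only one in both candidate sets that is not an identified parent.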