# Tractable Uncertainty for Causal Structure Learning @ APML Seminar Series

## Details

- Title: Tractable Uncertainty for Causal Structure Learning @ APML Seminar
- Author(s): Benjie Wang
- Link(s):

## Rough Notes

Task: How to learn causal structure with uncertainty in a scalable manner. Requires:

- Compact representation of the uncertainty.
- Efficient and parallel learning of uncertainty.
- Efficient and flexible reasoning over uncertainty.

### Motivation

Some of the motivation comes from bioinformatics (Shen et al., 2020).

We have statistical uncertainty due to limited data, and causal uncertainty due to non-identifiability.

Other motivating questions we are interested in include: "what is the probability that X causes Y?", "what is the expected causal effect of X on Y?", etc.

### Background

Bayesian structure learning: Inputs:

- Prior \(P(G)\) over DAGs.
- Data \(D\).
- Likelihood \(P(D|G)\), assumed to be given in closed form.

(Q) Why a prior over DAGs rather than over Markov equivalence classes (MECs)? Technically speaking, shouldn't we want a prior over the space of equivalence classes instead of DAGs, since with observational data that's the only thing we can know for sure?

Goal: Compute \(P(G|D) \propto P(D|G)P(G)\), to answer queries of the form \(\mathbb{E}_{P(G|D)}[f(G)]\) for some function \(f\) of the graph.
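For intuition, here is a minimal brute-force sketch of what such a query means (hypothetical helper names; a scoring function `log_score` returning \(\log P(D|G) + \log P(G)\) is assumed): enumerate every DAG, weight each by its unnormalized posterior, and average \(f(G)\).

```python
import itertools
import numpy as np

def is_acyclic(adj):
    """A directed graph is acyclic iff no power of its adjacency matrix
    has a nonzero trace (i.e. there are no closed walks of any length)."""
    n = len(adj)
    power = np.eye(n)
    for _ in range(n):
        power = power @ adj
        if np.trace(power) > 0:
            return False
    return True

def posterior_expectation(log_score, f, n_vars):
    """Exact E_{P(G|D)}[f(G)] by enumerating all DAGs over n_vars variables.
    Only feasible for very small n_vars; this is the brute-force computation
    that scalable Bayesian structure learning methods try to approximate."""
    edges = [(i, j) for i in range(n_vars) for j in range(n_vars) if i != j]
    log_weights, values = [], []
    for mask in itertools.product([0, 1], repeat=len(edges)):
        adj = np.zeros((n_vars, n_vars))
        for bit, (i, j) in zip(mask, edges):
            adj[i, j] = bit
        if not is_acyclic(adj):
            continue
        log_weights.append(log_score(adj))   # log P(D|G) + log P(G)
        values.append(f(adj))
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()                 # normalise to get P(G|D)
    return float(np.dot(weights, values))
```

For example, with `f = lambda adj: adj[0, 1]` this returns the posterior probability that the edge \(X_0 \to X_1\) is present. The number of DAGs grows super-exponentially in the number of variables, which is why the approximations below are needed.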

MCMC and variational approximations are often used.

Speaker's work is more on the variational approximation side.

A mean-field approximation for the graph could use i.i.d. Bernoulli variables per edge, but this does not guarantee acyclicity. Neural autoregressive models for this are difficult to train to encode acyclicity.
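A quick illustration (hypothetical, not from the talk) of why independent Bernoulli edges are problematic: nothing stops a sample from containing a cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
edge_prob = np.full((n, n), 0.5)   # mean-field: one independent Bernoulli per edge
np.fill_diagonal(edge_prob, 0.0)

sample = rng.random((n, n)) < edge_prob
# sample[i, j] and sample[j, i] can both be True, giving a 2-cycle,
# so a sampled graph is not guaranteed to be a DAG.
print(sample.astype(int))
```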

### Representation

The introduced model is called OrderSPN, which can represent a family of distributions over causal structures.

First, introduce orderings and work on the joint space of topological orders \(\sigma\) and directed graphs \(G\):

- Every DAG is consistent with at least one order.
- Every directed graph consistent with an order is acyclic (a small sketch of this is given below).
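A small sketch (hypothetical helper, just to illustrate the two bullet points above): if edges are only allowed to point from earlier to later variables in an order, the sampled graph is acyclic by construction.

```python
import numpy as np

def sample_graph_consistent_with_order(order, edge_prob, rng):
    """Sample a directed graph whose edges all go from an earlier to a later
    variable in `order`. Any such graph is acyclic, because a cycle would
    require at least one edge going backwards in the order."""
    n = len(order)
    adj = np.zeros((n, n), dtype=int)
    for pos_i in range(n):
        for pos_j in range(pos_i + 1, n):
            i, j = order[pos_i], order[pos_j]
            adj[i, j] = int(rng.random() < edge_prob)
    return adj

rng = np.random.default_rng(0)
adj = sample_graph_consistent_with_order(order=[2, 0, 1, 3], edge_prob=0.5, rng=rng)
```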

Using sum-product networks we can parametrize a flexible model class over this space.

Sum-Product Networks (SPNs) are a type of tractable probabilistic model for expressing a distribution over a set of variables \(\mathbf{X}\). They are rooted DAGs consisting of three types of nodes:

- \(L\): Simple base (leaf) distributions \(L(\mathbf{X})\), e.g. Gaussians.
- \(\times\): Factored distributions \(P(\mathbf{X}) = C_1(\mathbf{X}_1) \times C_2(\mathbf{X}_2)\), where \(\mathbf{X}_1, \mathbf{X}_2\) partition \(\mathbf{X}\).
- \(+\): Mixtures of component distributions \(S(\mathbf{X}) = \sum_j w_j C_j(\mathbf{X})\), with non-negative weights summing to one.
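For concreteness, a tiny hand-built SPN over two variables (a hypothetical example, not from the talk), showing how leaf, product, and sum nodes compose into one density:

```python
from scipy.stats import norm

def spn_density(x1, x2):
    """Sum node over two product nodes, each product factorising
    over Gaussian leaf distributions on X1 and X2."""
    # Product node 1: L1(X1) * L2(X2)
    component_1 = norm.pdf(x1, loc=0.0, scale=1.0) * norm.pdf(x2, loc=0.0, scale=1.0)
    # Product node 2: L3(X1) * L4(X2)
    component_2 = norm.pdf(x1, loc=3.0, scale=0.5) * norm.pdf(x2, loc=-2.0, scale=2.0)
    # Sum node: convex combination (weights sum to 1)
    return 0.7 * component_1 + 0.3 * component_2

print(spn_density(0.1, -0.5))
```

Because the structure is just sums and products of simple leaves, marginals and related queries can be computed in a single bottom-up pass, which is what makes SPNs tractable.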

They introduce a special type of SPN called OrderSPNs. The sum nodes represent different partitions of the orderings, and the product nodes imply multiplication over the decomposed sub-orderings (#DOUBT). At the bottom, each leaf node covers only one variable.

(#TODO Fully understand the model).

### Learning

Learning OrderSPNs involves two tasks:

- When we have the splits of the orderings, use the data to decide which splits to keep.
- Then learn the corresponding parameters.

Ordering selection: at each sum node, use an existing Bayesian structure learning oracle, e.g. MCMC, to sample orderings.

Parameter learning: use variational inference to optimize the parameters. The ELBO can be computed exactly for OrderSPNs, so there is no need for the reparameterization trick.
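The general point can be seen for any variational family with small finite support (a simplified sketch, not the OrderSPN-specific recursion): the expectation in the ELBO becomes a weighted sum that can be evaluated and differentiated exactly.

```python
import numpy as np

def exact_elbo(q_probs, log_joint):
    """ELBO = E_q[log p(x, D)] - E_q[log q(x)]
            = sum_x q(x) * (log p(x, D) - log q(x)).
    With finite support the expectation is a plain weighted sum, so no
    Monte Carlo sampling or reparameterisation trick is required."""
    q_probs = np.asarray(q_probs, dtype=float)
    log_joint = np.asarray(log_joint, dtype=float)
    return float(np.sum(q_probs * (log_joint - np.log(q_probs))))

# q over three candidate orderings with their joint model scores log p(sigma, D)
elbo = exact_elbo(q_probs=[0.5, 0.3, 0.2], log_joint=[-10.2, -11.0, -12.5])
```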

### Reasoning

Tractable queries follow from the results for SPNs.

With a model like this, we can compute Bayesian causal effects, e.g. the average causal effect, via Bayesian model averaging.
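A minimal sketch of the Bayesian model averaging step (hypothetical helper; it assumes the per-graph effects have already been computed, e.g. by adjustment under each candidate graph):

```python
import numpy as np

def bayesian_average_causal_effect(posterior_probs, effect_per_graph):
    """E_{P(G|D)}[ACE_G(X -> Y)] = sum_G P(G|D) * ACE_G(X -> Y)."""
    posterior_probs = np.asarray(posterior_probs, dtype=float)
    effect_per_graph = np.asarray(effect_per_graph, dtype=float)
    return float(np.dot(posterior_probs, effect_per_graph))

# Posterior over three candidate DAGs and the effect of do(X) on Y under each
expected_effect = bayesian_average_causal_effect([0.6, 0.3, 0.1], [1.8, 0.0, 2.4])
```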