# Active Learning of Causal Bayes Net Structure - 2006

## Details

Title : Active Learning of Causal Bayes Net Structure Author(s): Murphy, Kevin P. Link(s) : https://www.semanticscholar.org/paper/Active-Learning-of-Causal-Bayes-Net-Structure-Murphy/13c2ad5c1dc96f651b2bc197089f62b0691fa622

## Rough Notes

Introduces a method to select which interventions/experiments to perform to learn the causal structure as soon as possible for discrete data. Makes use of online MCMC to estimate posterior over graph structures and importance sampling to find the best action to select in each iteration.

Takes a decision theoretic approach - actions are possible experiments and an expected utility is computed for each action, with respect to the current belief for the structure, \(P(G|D_{1:t})\).

The value (expected utility) of action \(a\) is \(V(a)=\sum_{g \in \mathcal{G}}\sum_{y\in \mathcal{Y}}P(y|g,a,D)p(g|D)U(g,a,y,D)\) where \(U\) is some utility function, here \(\log P(g|a,y,D)\). Maximizing this utility is equivalent to maximizing the expectation under \(Y\) of the KL between the posterior \(P(G|a,Y,D)\) and \(P(G|D)\), equivalently, minimizing the conditional entropy \(H(G|Y,a,D)\). The best myopic action is \(a^*=\text{argmax}_{a\in \mathcal{A}}V(a)\).

The optimal algorithm is defined as:

- For each \(a\in \mathcal{A}\):
- \(V(a)=0\)
- for each \(y \in \mathcal{Y}\):
- Compute \(P(y|g,a,D),P(g|a,y,D)\) \(\forall g \in \mathcal{G}\)
- \(V(a)+=\sum_g P(y|g,a,D)P(g|D)\log P(g|a,y,D)\)

The (predictive) density with \(y\) has a closed form expression if each node has a Dirichlet prior, coupled with the discrete data assumption. The posterior over graphs can then be computed via Bayes rule.

But, the sums over \(\mathcal{A},\mathcal{Y},\mathcal{G}\) are intractable. Sampling techniques are thus used to approximate the corresponding terms.

For sampling graphs, we first consider a Metropolis-Hastings Algorithm with the proposal distribution being a uniform distribution over all graphs \(N(G)\) within a structural Hamming distance of 1 (1 edge addition,deletion,reversal) away from the current graph sample, i.e. \(Q(G'|G) = 1/|N(G)|\) if \(G'\in N(G)\) and 0 otherwise. This however is an offline method, and we need an online method. To do this, we use ideas from particle filtering with . The belief state \(P(G|D_{1:t})\) is represented as a set of weighted particles where for each \(t\) we apply \(B\) Metropolis-Hasting moves to each particle to gt an approximation to \(P(G|D_{1:t+1})\).

For sampling observations, an importance sampling approach is used.

For sampling actions, no solutions are presented.