Graphoids

In the Directed Acyclic Graph (DAG) setting, we say a DAG \(G=([p],E)\) encodes the conditional independence model \(J(G) = \{I(A,B|C) : A,B \text{ are d-separated by } C \text{ in } G\}\), where \(A,B,C\) range over pairwise disjoint subsets of \([p]\).
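The model \(J(G)\) can be enumerated mechanically once d-separation can be tested. A minimal sketch follows. It assumes the DAG is given as a Python dict mapping each node to its set of parents, a representation chosen here only for illustration. It uses the standard equivalent criterion: \(A\) and \(B\) are d-separated by \(C\) exactly when \(C\) separates \(A\) from \(B\) in the moral graph of the subgraph induced by the ancestors of \(A \cup B \cup C\). The helper names `ancestors` and `d_separated` are hypothetical.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors.
    `parents` maps each node to an iterable of its parents (missing key = no parents)."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for u in parents.get(v, ()):
            if u not in result:
                result.add(u)
                stack.append(u)
    return result

def d_separated(parents, A, B, C):
    """True iff A and B are d-separated by C, checked as ordinary separation
    in the moral graph of the subgraph induced by the ancestors of A, B and C."""
    A, B, C = set(A), set(B), set(C)
    keep = ancestors(parents, A | B | C)
    # Moralize: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for u in pa:
            adj[u].add(child)
            adj[child].add(u)
        for u, w in combinations(pa, 2):
            adj[u].add(w)
            adj[w].add(u)
    # Search from A without entering C; reaching B means an unblocked path.
    frontier = [v for v in A if v not in C]
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in B:
            return False
        for w in adj[v]:
            if w not in C and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True
```

For the collider \(1 \to 3 \leftarrow 2\), encoded as `{3: {1, 2}}`, `d_separated` returns True for `C = set()` and False for `C = {3}`, so \(I(1,2|\emptyset) \in J(G)\) but \(I(1,2|3) \notin J(G)\).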

Given random variables \(X_1,\dots,X_p\) with joint distribution \(\mathbb{P}\), write \(I(A,B|C)\) for the statement that \(A\) and \(B\) are conditionally independent given \(C\), and \(YW\) for the union \(Y \cup W\). A collection of conditional independence statements \(J(\mathbb{P})\) is a graphoid if it satisfies the following graphoid axioms for all pairwise disjoint subsets \(X,Y,Z,W\):

Symmetry: \(I(X,Y|Z)\implies I(Y,X|Z)\)
Decomposition: \(I(X,YW|Z) \implies I(X,Y|Z)\)
Weak Union: \(I(X,YW|Z)\implies I(X,Y|ZW)\)
Contraction: \(I(X,Y|Z) \: \& \: I(X,W|ZY) \implies I(X,YW|Z)\)
Intersection: \(I(X,W|ZY) \: \& \: I(X,Y|ZW) \implies I(X,YW|Z)\) (for \(J(\mathbb{P})\), this holds whenever \(\mathbb{P}\) is strictly positive; it can fail otherwise)
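Checking whether a finite collection of statements is closed under the axioms above is a purely combinatorial task. The sketch below is an illustration under stated assumptions, not a standard routine. Statements are encoded as hypothetical triples `(X, Y, Z)` of frozensets meaning \(I(X,Y|Z)\), with \(X\) and \(Y\) nonempty. The function `missing_consequences` returns every statement that a single axiom application derives from `J` but that `J` lacks. An empty result means `J` is closed. Passing `intersection=False` checks only the first four axioms.

```python
from itertools import combinations, product

def splits(s):
    """Yield every ordered pair (Y, W) of nonempty disjoint frozensets with Y | W == s."""
    items = sorted(s)
    for r in range(1, len(items)):
        for ys in combinations(items, r):
            y = frozenset(ys)
            yield y, s - y

def missing_consequences(J, intersection=True):
    """Return the statements derivable from J by one application of a graphoid
    axiom that are not already in J. Each statement is a triple (X, Y, Z) of
    pairwise disjoint frozensets, read as I(X, Y | Z), with X and Y nonempty."""
    J = set(J)
    derived = set()
    for x, yw, z in J:
        derived.add((yw, x, z))                     # symmetry
        for y, w in splits(yw):
            derived.add((x, y, z))                  # decomposition
            derived.add((x, y, z | w))              # weak union
    for (x1, a, c1), (x2, b, c2) in product(J, repeat=2):
        if x1 != x2 or a & b:
            continue
        # Contraction: I(X,Y|Z) & I(X,W|ZY) => I(X,YW|Z), with Y=a, W=b, Z=c1.
        if c2 == c1 | a:
            derived.add((x1, a | b, c1))
        # Intersection: I(X,W|ZY) & I(X,Y|ZW) => I(X,YW|Z), with W=a, Y=b.
        if intersection and b <= c1 and a <= c2 and c1 - b == c2 - a:
            derived.add((x1, a | b, c1 - b))
    return derived - J
```

For example, the six statements \(I(i,j|k)\) for all orderings of three variables \(\{1,2,3\}\) are closed under the first four axioms. Intersection, however, derives the absent statement \(I(1,23|\emptyset)\), so this collection is not a graphoid.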

Intuitive explanations for the graphoid axioms are stated in (, , p. 12). Namely:

Symmetry means that in any state of knowledge \(Z\), if \(Y\) tells us nothing new about \(X\), then \(X\) tells us nothing new about \(Y\).
Decomposition means that if two pieces of information \(Y,W\) are jointly irrelevant to \(X\), then each is also irrelevant on its own.
Weak union states that learning the irrelevant information \(W\) cannot make the irrelevant information \(Y\) relevant to \(X\).
Contraction states that if we judge \(W\) to be irrelevant to \(X\) after learning the irrelevant information \(Y\), then \(W\) must already have been irrelevant to \(X\) before we learned \(Y\).
Intersection states that if \(W\) is irrelevant to \(X\) given \(Y\), and \(Y\) is irrelevant to \(X\) given \(W\), then neither \(Y\) nor \(W\), nor their combination, is relevant to \(X\).
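The intuition behind intersection relies on strict positivity. A standard counterexample takes \(X\), \(Y\) and \(W\) to be three copies of one fair coin. Given \(Y\), the value of \(X\) is fixed, so \(W\) carries no further information, and symmetrically with \(Y\) and \(W\) swapped. Yet \(X\) is clearly not independent of \((Y,W)\). The sketch below checks this numerically. It assumes a discrete joint pmf stored as a dict from outcome tuples to exact `Fraction` probabilities, and it tests \(I(A,B|C)\) through the factorization \(p(a,b,c)\,p(c) = p(a,c)\,p(b,c)\). The helpers `marginal` and `ci_holds` are hypothetical names.

```python
from fractions import Fraction

def marginal(p, idx):
    """Marginal pmf over the variable positions listed in idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + px
    return out

def ci_holds(p, A, B, C):
    """Check I(A, B | C) for a discrete joint pmf p (outcome tuple -> probability)
    via p(a,b,c) * p(c) == p(a,c) * p(b,c). Combinations with a zero marginal
    satisfy this trivially, so only positive-marginal keys are visited."""
    A, B, C = list(A), list(B), list(C)
    p_abc = marginal(p, A + B + C)
    p_ac, p_bc, p_c = marginal(p, A + C), marginal(p, B + C), marginal(p, C)
    for ac, pac in p_ac.items():
        a, c = ac[:len(A)], ac[len(A):]
        for bc, pbc in p_bc.items():
            b, c2 = bc[:len(B)], bc[len(B):]
            if c2 != c:
                continue
            if p_abc.get(a + b + c, 0) * p_c[c] != pac * pbc:
                return False
    return True

# Hypothetical example: positions 0 = X, 1 = Y, 2 = W, all copies of one fair coin.
# Six of the eight outcomes have probability 0, so the pmf is not strictly positive.
copies = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}

print(ci_holds(copies, [0], [2], [1]))    # I(X, W | Y): True
print(ci_holds(copies, [0], [1], [2]))    # I(X, Y | W): True
print(ci_holds(copies, [0], [1, 2], []))  # I(X, YW | empty set): False
```

Both premises of intersection hold, but the conclusion fails, which is exactly why the axiom is stated for strictly positive distributions.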

For a generalization of the graphoid axioms to the context-specific case, where the conditional independences hold only given a particular event (e.g. \(Z=k\) rather than every value of \(Z\)), refer to (, a, Section 3).
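To make the distinction concrete, a context-specific statement such as \(I(X,Y|Z=k)\) is an ordinary independence check applied to the conditional distribution given the single event \(Z=k\). The sketch below uses a hypothetical three-variable pmf. In it, \(X\) and \(Y\) are independent fair coins when \(Z=0\) but equal copies when \(Z=1\), so \(I(X,Y|Z=0)\) holds while \(I(X,Y|Z)\) does not. The function `ci_given_event` is an illustrative name and does not implement the generalized axioms of the cited reference.

```python
from fractions import Fraction

def ci_given_event(p, a, b, c, k):
    """Check X_a independent of X_b given the event X_c = k, for a discrete joint
    pmf p (outcome tuple -> probability) and single variable positions a, b, c.
    Assumes the event X_c = k has positive probability."""
    cond = {x: px for x, px in p.items() if x[c] == k}
    total = sum(cond.values())
    pa, pb, pab = {}, {}, {}
    for x, px in cond.items():
        q = px / total  # conditional probability given X_c = k
        pa[x[a]] = pa.get(x[a], 0) + q
        pb[x[b]] = pb.get(x[b], 0) + q
        pab[(x[a], x[b])] = pab.get((x[a], x[b]), 0) + q
    return all(pab.get((u, v), 0) == pu * pv
               for u, pu in pa.items() for v, pv in pb.items())

# Hypothetical pmf: positions 0 = X, 1 = Y, 2 = Z, with Z a fair coin.
csi = {(x, y, 0): Fraction(1, 8) for x in (0, 1) for y in (0, 1)}  # Z = 0: X, Y independent
csi.update({(0, 0, 1): Fraction(1, 4), (1, 1, 1): Fraction(1, 4)})  # Z = 1: X = Y

print(ci_given_event(csi, 0, 1, 2, 0))  # I(X, Y | Z = 0): True
print(ci_given_event(csi, 0, 1, 2, 1))  # I(X, Y | Z = 1): False
```

Since \(I(X,Y|Z)\) requires independence for every value of \(Z\), it fails here even though the context-specific statement for \(Z=0\) holds.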