Non-spuriousness of Representations

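(, , Definition 1) defines the non-spuriousness of a representation in causal language. Specifically, given a sample with representation \(Z=z\) and label \(Y=y\), the non-spuriousness of \(Z\) for label \(Y\) is the Probability of Sufficiency (PS) of \(\mathbb{I}(Z=z)\) for \(\mathbb{I}(Y=y)\), denoted \(PS_{Z=z,Y=y}=\mathbb{P}(Y(Z=z)=y \mid Z\neq z, Y\neq y)\). That is, among samples where neither \(Z=z\) nor \(Y=y\) was observed, it is the probability that the label would have been \(y\) had the representation been set to \(z\).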

For example, if the label \(Y\) denotes whether a dog is in the image, then the feature \(Z\) indicating whether a dog face is present is non-spurious: once we set \(Z=1\) (a dog face is in the image), the counterfactual label is \(Y=1\) (a dog is in the image).

In this example, a spurious feature would be whether grass is present: this feature is highly correlated with the label, but forcing an image to contain grass does not change the image label.
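
The dog and grass example can be made concrete with a toy structural causal model. The following is only an illustrative sketch, not the construction used in the cited definition: the function names (`simulate`, `probability_of_sufficiency`, `label_agreement`), the structural equations, and the noise rates (0.5 and 0.1) are all assumptions chosen for illustration. The label is assumed to depend on the dog-face feature alone, while grass merely tracks it with 10% noise. PS is estimated by keeping each sample's exogenous noise fixed, intervening on one feature, and recomputing the label.

```python
import random


def simulate(u_dog, u_flip, do=None):
    """Evaluate the toy SCM for one image, optionally under an intervention.

    u_dog, u_flip: exogenous noise (hypothetical; fixed across factual and
    counterfactual worlds). do: dict mapping a feature name to a forced value.
    """
    do = do or {}
    dog_face = do.get("dog_face", u_dog)
    # Grass tracks the dog face unless the noise flips it (assumed mechanism).
    grass = do.get("grass", dog_face != u_flip)
    # Assumed structural equation: the label depends on dog_face only.
    y = int(dog_face)
    return {"dog_face": dog_face, "grass": grass, "y": y}


def draw_noise(rng):
    """Sample the exogenous noise for one image (rates are assumptions)."""
    return rng.random() < 0.5, rng.random() < 0.1


def probability_of_sufficiency(feature, n=10_000, seed=0):
    """Monte Carlo estimate of P(Y(Z=1)=1 | Z=0, Y=0) for a binary feature."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n):
        u_dog, u_flip = draw_noise(rng)
        factual = simulate(u_dog, u_flip)
        # Condition on the factual event Z != 1 and Y != 1.
        if factual[feature] or factual["y"] == 1:
            continue
        # Same noise, but with the feature forced on.
        counterfactual = simulate(u_dog, u_flip, do={feature: True})
        hits += counterfactual["y"] == 1
        total += 1
    if total == 0:
        raise ValueError("no samples satisfy the conditioning event")
    return hits / total


def label_agreement(feature, n=10_000, seed=0):
    """Fraction of factual samples in which the feature equals the label."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        sample = simulate(*draw_noise(rng))
        agree += int(sample[feature]) == sample["y"]
    return agree / n


if __name__ == "__main__":
    for name in ("dog_face", "grass"):
        print(name, probability_of_sufficiency(name), label_agreement(name))
```

Under these assumed structural equations, `probability_of_sufficiency("dog_face")` is 1.0 and `probability_of_sufficiency("grass")` is 0.0, even though grass agrees with the label in roughly 90% of the factual samples. This mirrors the distinction drawn above: high correlation with the label does not imply a high PS.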