Improving Mutual Information Estimation with Annealed and Energy-Based Bounds - 2022
Details
Title: Improving Mutual Information Estimation with Annealed and Energy-Based Bounds
Author(s): Brekelmans, Rob and Huang, Sicong and Ghassemi, Marzyeh and Steeg, Greg Ver and Grosse, Roger Baker and Makhzani, Alireza
Link(s): https://openreview.net/forum?id=T0B9AoM_bFg
Rough Notes
This paper unifies existing Mutual Information (MI) bounds from an importance sampling perspective, and introduces 3 novel bounds, assuming that either a single marginal or the full joint density is known.
Insights borrowed from:
Denote by \(I(\mathbf{x};\mathbf{z})\) the MI between \(\mathbf{x}\) and \(\mathbf{z}\), which equals \(\mathbb{E}_{p(\mathbf{x,z})}[\log \frac{p(\mathbf{x,z})}{p(\mathbf{z})p(\mathbf{x})}] = H(\mathbf{x})-H(\mathbf{x}|\mathbf{z})\).
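Writing out the second equality (a standard identity, splitting the log ratio via \(p(\mathbf{x},\mathbf{z}) = p(\mathbf{x}|\mathbf{z})p(\mathbf{z})\)):
\[
\mathbb{E}_{p(\mathbf{x},\mathbf{z})}\left[\log \frac{p(\mathbf{x},\mathbf{z})}{p(\mathbf{z})p(\mathbf{x})}\right]
= \mathbb{E}_{p(\mathbf{x},\mathbf{z})}\left[\log p(\mathbf{x}|\mathbf{z})\right] - \mathbb{E}_{p(\mathbf{x})}\left[\log p(\mathbf{x})\right]
= H(\mathbf{x}) - H(\mathbf{x}|\mathbf{z}).
\]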
When \(p(\mathbf{x}|\mathbf{z})\) is tractable to sample and evaluate, MC sampling gives a low-variance estimate of the conditional entropy term. This leaves the estimation of \(\log p(\mathbf{x})\), which plays the role of a log partition function in the MI expression above, and for this Importance Sampling (IS) based approaches are often used.
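As a sanity check (not from the paper), a minimal sketch of the MC estimate of \(H(\mathbf{x}|\mathbf{z})\) in a hypothetical toy linear-Gaussian model where \(p(\mathbf{x}|\mathbf{z})\) is tractable; the model and all names are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Toy model (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, sigma^2)
sigma = 0.5
rng = np.random.default_rng(0)

# MC estimate of H(x|z) = -E_{p(x,z)}[log p(x|z)] from joint samples
z = rng.normal(0.0, 1.0, size=100_000)
x = rng.normal(z, sigma)
h_cond_mc = -np.mean(norm.logpdf(x, loc=z, scale=sigma))

# Closed form for a Gaussian conditional: 0.5 * log(2*pi*e*sigma^2)
h_cond_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_cond_mc, h_cond_exact)  # the two should agree closely
```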
To estimate \(\log p(\mathbf{x})\), first construct a proposal \(q_{\text{PROP}}(\mathbf{z}_{\text{ext}}|\mathbf{x})\) and a target \(p_{\text{TGT}}(\mathbf{x},\mathbf{z}_{\text{ext}})\) over an extended state space, such that the normalization constant of this new target is \(\mathcal{Z}_{\text{TGT}}=\int p_{\text{TGT}}(\mathbf{x},\mathbf{z}_{\text{ext}})\,d\mathbf{z}_{\text{ext}}=p(\mathbf{x})\) and the normalization constant of \(q_{\text{PROP}}(\mathbf{z}_{\text{ext}}|\mathbf{x})\) is \(\mathcal{Z}_{\text{PROP}}=1\). Then, taking expectations of the log importance weights \(\log \frac{p_{\text{TGT}}(\mathbf{x},\mathbf{z}_{\text{ext}})}{q_{\text{PROP}}(\mathbf{z}_{\text{ext}}|\mathbf{x})}\) under the proposal and under the target gives a lower and an upper bound on \(\log p(\mathbf{x})\), respectively (Eq. 2).
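A minimal sketch of these sandwich bounds in the same hypothetical toy model, taking \(\mathbf{z}_{\text{ext}}=\mathbf{z}\): the expectation of the log weights under the proposal lower-bounds \(\log p(\mathbf{x})\), and under the target (here the exact posterior) upper-bounds it. The crude \(N(0,4)\) proposal and all variable names are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

# Same toy model: z ~ N(0, 1), x | z ~ N(z, sigma^2); here z_ext = z.
sigma, x_obs = 0.5, 1.3
rng = np.random.default_rng(1)
K = 100_000

def log_joint(z, x):  # log p_TGT(x, z) = log p(z) + log p(x|z)
    return norm.logpdf(z) + norm.logpdf(x, loc=z, scale=sigma)

# A deliberately crude proposal q_PROP(z|x) = N(0, 4)
z_q = rng.normal(0.0, 2.0, size=K)
log_w_q = log_joint(z_q, x_obs) - norm.logpdf(z_q, scale=2.0)
lower = log_w_q.mean()  # E_q[log w] <= log p(x)

# Exact posterior p(z|x) for this model serves as the target over z
post_mean = x_obs / (1 + sigma**2)
post_var = sigma**2 / (1 + sigma**2)
z_p = rng.normal(post_mean, np.sqrt(post_var), size=K)
log_w_p = log_joint(z_p, x_obs) - norm.logpdf(z_p, scale=2.0)
upper = log_w_p.mean()  # E_p[log w] >= log p(x)

# Ground truth: the marginal is p(x) = N(0, 1 + sigma^2)
exact = norm.logpdf(x_obs, scale=np.sqrt(1 + sigma**2))
print(lower, exact, upper)  # lower <= exact <= upper
```

The gap of the lower bound is \(\text{KL}(q_{\text{PROP}}\,\|\,p(\mathbf{z}|\mathbf{x}))\) and that of the upper bound is \(\text{KL}(p(\mathbf{z}|\mathbf{x})\,\|\,q_{\text{PROP}})\), so both tighten as the proposal approaches the posterior.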