Laplace Approximation

The Laplace approximation is an approximate Bayesian inference method that approximates the posterior with a Gaussian distribution centered at the posterior mode, with covariance given by the inverse of the negative Hessian of the log posterior evaluated at the mode.

Specifically, take some posterior distribution \(p(\mathbf{\theta}|\mathbf{y})\) and consider the second-order Taylor expansion of its logarithm around the MAP estimate \(\hat{\mathbf{\theta}}\in \mathbb{R}^d\), which can be found efficiently for certain models. This gives: \[ \log p(\mathbf{\theta}|\mathbf{y}) \approx \log p(\mathbf{\hat{\theta}}|\mathbf{y}) + \frac{1}{2}(\mathbf{\theta}-\mathbf{\hat{\theta}})^T[\nabla^2_{\mathbf{\theta}}\log p(\mathbf{\theta}|\mathbf{y})]_{\mathbf{\theta}=\mathbf{\hat{\theta}}} (\mathbf{\theta}-\mathbf{\hat{\theta}}) \] where the first-order (gradient) term has been dropped, since \(\nabla_\mathbf{\theta} \log p(\mathbf{\theta}|\mathbf{y})=0\) at the MAP estimate because it is a mode.
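
For reference, the full second-order expansion, written out before the gradient term is dropped, reads as follows. The remainder symbol \(O(\|\mathbf{\theta}-\mathbf{\hat{\theta}}\|^3)\) is added here for illustration and assumes the log posterior is sufficiently smooth near \(\mathbf{\hat{\theta}}\): \[ \log p(\mathbf{\theta}|\mathbf{y}) = \log p(\mathbf{\hat{\theta}}|\mathbf{y}) + (\mathbf{\theta}-\mathbf{\hat{\theta}})^T[\nabla_{\mathbf{\theta}}\log p(\mathbf{\theta}|\mathbf{y})]_{\mathbf{\theta}=\mathbf{\hat{\theta}}} + \frac{1}{2}(\mathbf{\theta}-\mathbf{\hat{\theta}})^T[\nabla^2_{\mathbf{\theta}}\log p(\mathbf{\theta}|\mathbf{y})]_{\mathbf{\theta}=\mathbf{\hat{\theta}}} (\mathbf{\theta}-\mathbf{\hat{\theta}}) + O(\|\mathbf{\theta}-\mathbf{\hat{\theta}}\|^3) \] The second term on the right is the one that is zero at the mode, which leaves only the constant and the quadratic term.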

Taking the exponential and absorbing the constant factor \(p(\mathbf{\hat{\theta}}|\mathbf{y})\) into the proportionality then gives the approximation: \[ p(\mathbf{\theta}|\mathbf{y}) \propto \exp\left(-\frac{1}{2}(\mathbf{\theta}-\mathbf{\hat{\theta}})^T[-\nabla^2_{\mathbf{\theta}}\log p(\mathbf{\theta}|\mathbf{y})]_{\mathbf{\theta}=\mathbf{\hat{\theta}}} (\mathbf{\theta}-\mathbf{\hat{\theta}})\right) \]

Compare this to the Gaussian density with mean \(\mathbf{\mu}\) and covariance \(\mathbf{\Sigma}\): \[ q(\mathbf{\theta}) \propto \exp\left(-\frac{1}{2}(\mathbf{\theta}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1} (\mathbf{\theta}-\mathbf{\mu})\right) \]

Matching the two quadratic forms, we can approximate the posterior by \(q(\mathbf{\theta})\) with \(\mathbf{\mu}=\mathbf{\hat{\theta}}\) and \(\mathbf{\Sigma} = [-\nabla^2_{\mathbf{\theta}}\log p(\mathbf{\theta}|\mathbf{y})]_{\mathbf{\theta}=\mathbf{\hat{\theta}}}^{-1}\). This is a valid covariance only if the negative Hessian at \(\mathbf{\hat{\theta}}\) is positive definite.
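
The recipe above can be sketched numerically. The following is a minimal illustration, not a definitive implementation: it assumes a Bayesian logistic regression model with an isotropic Gaussian prior \(N(\mathbf{0}, \sigma^2 I)\), chosen here only because its gradient and Hessian are available in closed form. The function names, the synthetic data, and the value `prior_var=10.0` are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_posterior(theta, X, y, prior_var):
    """Negative unnormalised log posterior for the assumed example model:
    logistic regression with prior theta ~ N(0, prior_var * I)."""
    logits = X @ theta
    # Bernoulli log likelihood; logaddexp(0, z) = log(1 + exp(z)) computed stably
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * (theta @ theta) / prior_var
    return -(log_lik + log_prior)


def neg_log_posterior_grad(theta, X, y, prior_var):
    """Gradient of the negative log posterior."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (probs - y) + theta / prior_var


def neg_log_posterior_hessian(theta, X, prior_var):
    """Hessian of the negative log posterior, i.e. the negative Hessian
    of the log posterior that appears in the text."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    weights = probs * (1.0 - probs)
    return X.T @ (weights[:, None] * X) + np.eye(X.shape[1]) / prior_var


def laplace_approximation(X, y, prior_var=10.0):
    """Return (mu, Sigma) of the Gaussian q(theta) approximating the posterior."""
    d = X.shape[1]
    # Step 1: find the MAP estimate by minimising the negative log posterior
    result = minimize(
        neg_log_posterior,
        np.zeros(d),
        args=(X, y, prior_var),
        jac=neg_log_posterior_grad,
        method="BFGS",
    )
    theta_hat = result.x
    # Step 2: covariance is the inverse of the negative Hessian at the mode
    precision = neg_log_posterior_hessian(theta_hat, X, prior_var)
    return theta_hat, np.linalg.inv(precision)


# Synthetic data, generated purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_theta = np.array([1.5, -1.0])
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

mu, Sigma = laplace_approximation(X, y)
print("approximate posterior mean (MAP):", mu)
print("approximate posterior covariance:\n", Sigma)
```

For this particular model the negative Hessian is always positive definite because of the prior term, so the inverse exists. For other models the Hessian may need to be computed by automatic differentiation or finite differences, and positive definiteness should be checked.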