Hamiltonian Monte Carlo (HMC)

An auxiliary variable Markov Chain Monte Carlo (MCMC) algorithm that takes inspiration from Hamiltonian Dynamics, and uses gradient information to get better mixing rates.

Sidestepping from conventional notation, denote \(\Theta, \mathbf{v} \in \mathbb{R}^D\) to be the position and momentum of some particle respectively. The set of values \((\Theta,\mathbf{v})\) can take is called the phase space. Assuming that the potential energy \(\mathcal{E}\) of the particle depends only on its position, and its kinetic energy \(\mathcal{K}\) depends only on its momentum, we have that the Hamiltonian \(\mathcal{H}\) is then \[ \mathcal{H}(\Theta,\mathbf{v}) = \mathcal{E}(\Theta)+\mathcal{K}(\mathbf{v}) \]

To relate the Hamiltonian to statistical models, we often take the potential energy to be related to some Boltzmann Distribution, i.e. \(\mathcal{E}(\Theta) = -\log \tilde{\pi}(\Theta)\) where \(\tilde{\pi}(\Theta)\) is the unnormalized distribution we want to sample from. The kinetic energy can taken to be for e.g. to be a quadratic form i.e. \(\mathcal{K}(\mathbf{v}) = \frac{1}{2}\mathbf{v}^T\Sigma^{-1}\mathbf{v}\) where \(\Sigma\) is some covariance matrix, in this context also called the inverse mass matrix.

To generate samples, we need to simulate \(\Theta,\mathbf{v}\) based on Hamilton's equations, which describe the motion of the particle. \[ \frac{d\Theta}{dt} = \frac{\partial \mathcal{H}(\Theta,\mathbf{v})}{\partial \mathbf{v}} \] \[ \frac{d\mathbf{v}}{dt} = -\frac{\partial \mathcal{H}(\Theta,\mathbf{v})}{\partial \Theta} \]

This requires us to numerically compute the solution to the Ordinary Differential Equations (ODEs) above. For any \(t_k\), the mapping from \((\Theta(t),\mathbf{v}(t)) \mapsto (\Theta(t+t_k),\mathbf{v}(t+t_k))\) is both invertible and volume preserving. To make sure the volume preserving property is maintained, we need to use Symplectic Integrators when making the numerical approximation. The Leapfrog Integrator is one common example.

The target distribution is taken to be \(\pi(\Theta,\mathbf{v}) = \frac{1}{Z}e^{-\mathcal{H}(\Theta,\mathbf{v})\), which when marginalizing over the auxiliary variable \(\mathbf{v}\) recovers the distribution \(\pi(\Theta)\) since \(\pi(\Theta) = \int \pi(\Theta,\mathbf{v}) \: d\mathbf{v} = \frac{1}{Z_\Theta}e^{-\mathcal{E}(\Theta)} \int \frac{1}{Z_\mathbf{v}}e^{-\frac{1}{2}\mathbf{v}^T\Sigma\mathbf{v}}\:d\mathbf{v} \propto e^{-\mathcal{E}(\Theta)}\).

Suppose we now have the sample \((\Theta_t,\mathbf{v}_t)\). To generate the next sample, we generate a new momentum vector \(\mathbf{v}^*_t \sim \mathcal{N}(\mathbf{0},\Sigma)\). Then, run the symplectic integrator for \(L\) steps starting from \((\Theta_t, \mathbf{v}^*_t)\) to get the proposal state \((\Theta^*, \mathbf{v}^*)\). This new sample is accepted according to the Metropolis-Hastings acceptance probability, which is \(\alpha = \min\Big(1,\frac{\pi(\Theta^*,\mathbf{v}^*)}{\pi(\Theta_t,\mathbf{v}_t)}\Big) = \min(1, e^{-\mathcal{H}(\Theta^*,\mathbf{v}^*)+\mathcal{H}(\Theta_t,\mathbf{v}_t)})\) where in this case the correction term cancels out since the proposal is reversible.* Since the Hamiltonian is a conserved quantity, we would an acceptance probability of 1, not taking into account the numerical errors introduced by whichever symplectic integrator is used.