Gradient Descent

A descent-direction optimization algorithm for minimizing \(f(\mathbf{ x }) : \mathbb{ R }^p \to \mathbb{ R }\), where the descent direction is taken to be the normalized negative gradient \(\mathbf{d}^{(k)} = -\frac{\nabla_{\mathbf{x}} f(\mathbf{x}^{(k)})}{||\nabla_{\mathbf{x}} f(\mathbf{x}^{(k)})||}\).

The idea is that \(\nabla_\mathbf{ x }f(\mathbf{ x })\) (the vector whose components tell how fast \(f\) is changing along each standard basis direction) points in the direction of steepest local increase of \(f\), so stepping in the opposite direction decreases \(f\) as quickly as possible, at least to first order.
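
One standard way to make this precise (a first-order argument, not spelled out in the original notes): by Taylor expansion and Cauchy–Schwarz,

\[
f(\mathbf{x} + \alpha \mathbf{d}) \approx f(\mathbf{x}) + \alpha\, \nabla_{\mathbf{x}} f(\mathbf{x})^{\top}\mathbf{d}, \qquad \min_{||\mathbf{d}||=1} \nabla_{\mathbf{x}} f(\mathbf{x})^{\top}\mathbf{d} \ \text{ is attained at } \ \mathbf{d} = -\frac{\nabla_{\mathbf{x}} f(\mathbf{x})}{||\nabla_{\mathbf{x}} f(\mathbf{x})||},
\]

so for a small step \(\alpha > 0\) the normalized negative gradient gives the largest first-order decrease in \(f\).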

The update rule is thus \(\mathbf{x}^{(k+1)} \leftarrow \mathbf{x}^{(k)} + \alpha^{(k)}\mathbf{d}^{(k)}\) with \(\alpha^{(k)} = \text{argmin}_\alpha f(\mathbf{x}^{(k)}+\alpha \mathbf{d}^{(k)})\) (an exact line search). At the optimal \(\alpha^{(k)}\), we have \(0=\frac{\partial f(\mathbf{x}^{(k+1)})}{\partial \alpha^{(k)}} = \nabla_{\mathbf{x}} f(\mathbf{x}^{(k+1)})^{\top}\frac{\partial \mathbf{x}^{(k+1)}}{\partial \alpha^{(k)}} = \nabla_{\mathbf{x}} f(\mathbf{x}^{(k+1)})^{\top}\mathbf{d}^{(k)}\). Since \(\mathbf{d}^{(k+1)}\) is parallel to \(-\nabla_{\mathbf{x}} f(\mathbf{x}^{(k+1)})\), the directions taken in steps \(k\) and \(k+1\) are orthogonal to each other.
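
Below is a minimal sketch of this procedure, assuming the quadratic objective \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{\top}A\mathbf{x} - \mathbf{b}^{\top}\mathbf{x}\) (not from the notes above; chosen because the exact line search then has a closed form). The function name and the particular \(A\), \(\mathbf{b}\) are illustrative. It prints the dot products of consecutive directions, which come out numerically zero, matching the orthogonality argument.

#+begin_src python
import numpy as np

def gradient_descent_quadratic(A, b, x0, iters=20):
    """Gradient descent with exact line search on the (assumed) quadratic
    f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b (A symmetric)."""
    x = x0.astype(float)
    prev_d = None
    for k in range(iters):
        g = A @ x - b                      # gradient of f at x^(k)
        if np.linalg.norm(g) < 1e-12:
            break
        d = -g / np.linalg.norm(g)         # normalized descent direction d^(k)
        # Exact line search: d/dalpha f(x + alpha d) = d^T grad f(x) + alpha d^T A d = 0
        alpha = -(d @ g) / (d @ A @ d)
        x = x + alpha * d                  # x^(k+1) = x^(k) + alpha^(k) d^(k)
        if prev_d is not None:
            # Consecutive directions are orthogonal under exact line search.
            print(f"k={k}: d^(k-1) . d^(k) = {prev_d @ d:+.2e}")
        prev_d = d
    return x

if __name__ == "__main__":
    A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
    b = np.array([1.0, -1.0])
    x_star = gradient_descent_quadratic(A, b, np.zeros(2))
    print("solution:", x_star, "exact:", np.linalg.solve(A, b))
#+end_src

For a general (non-quadratic) \(f\), the closed-form step length would be replaced by a one-dimensional minimization of \(\alpha \mapsto f(\mathbf{x}^{(k)} + \alpha \mathbf{d}^{(k)})\) or an inexact line search such as backtracking.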
