Descent Direction Iteration
Descent direction methods are a class of optimization methods that rely on local information about the objective function. Suppose we are minimizing some function \(f : \mathbb{R}^d \rightarrow \mathbb{R}\). The general framework is as follows (a code sketch follows the list): at iteration \(k\),
- Check whether \(\mathbf{x}^{(k)}\) satisfies the termination conditions.
- Determine a descent direction \(\mathbf{d}^{(k)}\) (often a unit vector), typically using gradient or Hessian information.
- Determine a step size \(\alpha^{(k)}\) (also called a learning rate).
- Compute the (hopefully better) input point \(\mathbf{x}^{(k+1)} \leftarrow \mathbf{x}^{(k)} + \alpha^{(k)}\mathbf{d}^{(k)}\).
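A minimal sketch of this loop in Python, assuming the steepest-descent direction \(-\nabla f / \|\nabla f\|\) and a fixed step size; the function and parameter names here are illustrative, not part of any particular library:

```python
import numpy as np

def descent_method(f, grad, x0, alpha=0.1, k_max=1000, eps=1e-6):
    """Generic descent direction iteration with a fixed step size.

    f, grad : objective function and its gradient (callables)
    x0      : starting point
    alpha   : step size (learning rate)
    """
    x = np.asarray(x0, dtype=float)
    for k in range(k_max):
        g = grad(x)
        if np.linalg.norm(g) < eps:        # gradient-magnitude termination check
            break
        d = -g / np.linalg.norm(g)         # steepest-descent direction (unit vector)
        x = x + alpha * d                  # step to the (hopefully better) point
    return x

# Example: minimize the quadratic f(x) = ||x||^2, whose minimum is at the origin
x_star = descent_method(lambda x: x @ x, lambda x: 2 * x, x0=[3.0, -2.0])
```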
Some termination conditions include:
- Maximum number of iterations - \(k : k > k_{\max}\)
- Absolute improvement - \(k : |f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)})|<\epsilon\)
- Relative improvement - \(k : |f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)})| < \epsilon|f(\mathbf{x}^{(k)})|\)
- Gradient magnitude - \(k : ||\nabla f(\mathbf{x}^{(k)})|| <\epsilon\)
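One way these checks might be combined into a single helper; the name `should_terminate` and the tolerance defaults are illustrative assumptions:

```python
import numpy as np

def should_terminate(k, f_prev, f_curr, grad_curr,
                     k_max=1000, eps_abs=1e-8, eps_rel=1e-8, eps_grad=1e-6):
    """Return True if any of the termination conditions listed above is met."""
    if k > k_max:                                       # maximum number of iterations
        return True
    if abs(f_prev - f_curr) < eps_abs:                  # absolute improvement
        return True
    if abs(f_prev - f_curr) < eps_rel * abs(f_prev):    # relative improvement
        return True
    if np.linalg.norm(grad_curr) < eps_grad:            # gradient magnitude
        return True
    return False
```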