Descent Direction Iteration
Descent direction methods are a class of optimization methods that rely on local information about the objective function. Suppose we are minimizing some function \(f : \mathbb{R}^d \rightarrow \mathbb{R}\). The general framework is as follows (a code sketch follows the list): at iteration \(k\),
- Check whether \(\mathbf{x}^{(k)}\) satisfies the termination conditions.
- Determine a descent direction \(\mathbf{d}^{(k)}\) (often a unit vector), typically using gradient or Hessian information.
- Determine a step size \(\alpha^{(k)}\) (also called a learning rate).
- Compute the (hopefully better) input point \(\mathbf{x}^{(k+1)} \leftarrow \mathbf{x}^{(k)} + \alpha^{(k)}\mathbf{d}^{(k)}\).
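A minimal sketch of this loop in Python, assuming the steepest-descent direction \(-\nabla f / \|\nabla f\|\) and a fixed step size; the function and parameter names here are illustrative, not part of any particular library:

```python
import numpy as np

def descent_method(f, grad, x0, alpha=0.1, k_max=1000, eps=1e-6):
    """Generic descent direction iteration with a fixed step size.

    f, grad : objective function and its gradient (callables)
    x0      : starting point
    alpha   : step size (learning rate)
    """
    x = np.asarray(x0, dtype=float)
    for k in range(k_max):
        g = grad(x)
        if np.linalg.norm(g) < eps:        # gradient-magnitude termination check
            break
        d = -g / np.linalg.norm(g)         # steepest-descent direction (unit vector)
        x = x + alpha * d                  # step to the (hopefully better) point
    return x

# Example: minimize the quadratic f(x) = ||x||^2, whose minimum is at the origin
x_star = descent_method(lambda x: x @ x, lambda x: 2 * x, x0=[3.0, -2.0])
```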
Some termination conditions include:
- Maximum number of iterations - \(k : k > k_{\max}\)
- Absolute improvement - \(k : |f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)})|<\epsilon\)
- Relative improvement - \(k : |f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)})| < \epsilon|f(\mathbf{x}^{(k)})|\)
- Gradient magnitude - \(k : ||\nabla f(\mathbf{x}^{(k)})|| <\epsilon\)
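One way these checks might be combined into a single helper; the name `should_terminate` and the tolerance defaults are illustrative assumptions:

```python
import numpy as np

def should_terminate(k, f_prev, f_curr, grad_curr,
                     k_max=1000, eps_abs=1e-8, eps_rel=1e-8, eps_grad=1e-6):
    """Return True if any of the termination conditions listed above is met."""
    if k > k_max:                                       # maximum number of iterations
        return True
    if abs(f_prev - f_curr) < eps_abs:                  # absolute improvement
        return True
    if abs(f_prev - f_curr) < eps_rel * abs(f_prev):    # relative improvement
        return True
    if np.linalg.norm(grad_curr) < eps_grad:            # gradient magnitude
        return True
    return False
```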