Line Search
A subroutine for computing the step size in descent-direction-based optimization methods. Suppose we want to minimize some function \(f : \mathbb{R}^d \rightarrow \mathbb{R}\) and have a descent direction \(\mathbf{d}\); finding the optimal step size \(\alpha\) then amounts to the following 1D optimization problem: \[ \min_\alpha f(\mathbf{x}+\alpha \mathbf{d})\]
The Brent-Dekker method is a commonly used algorithm for such 1D optimization problems.
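As an illustration, the 1D problem can be handed to any scalar minimizer. The sketch below restricts \(f\) along \(\mathbf{d}\) and minimizes the resulting function of \(\alpha\) with SciPy's implementation of Brent's method; the function names and the quadratic test problem are illustrative, not from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, d):
    # Restrict f to the ray x + alpha*d and minimize the resulting 1D function.
    phi = lambda alpha: f(x + alpha * d)
    res = minimize_scalar(phi, method="brent")  # Brent's method, derivative-free
    return res.x  # approximately optimal step size alpha

# Illustrative test: f(x) = ||x||^2, steepest-descent direction from x0.
f = lambda x: np.dot(x, x)
x0 = np.array([1.0, 2.0])
d = -2.0 * x0                        # -grad f(x0)
alpha = exact_line_search(f, x0, d)
print(alpha)                         # close to 0.5, which lands on the minimizer at the origin
```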
However, performing an exact line search at each iteration is often too computationally expensive, so approximate line search methods are often employed. These methods rely on finding an approximate value of \(\alpha\) that satisfies the following conditions (a backtracking sketch follows the list):
- Sufficient decrease/Armijo condition: The step size \(\alpha\) causes a sufficient decrease in the objective function value - \(f(\mathbf{x}^{(k+1)}) \leq f(\mathbf{x}^{(k)}) + \beta \alpha \nabla_{\mathbf{d}^{(k)}}f(\mathbf{x}^{(k)})\). Here \(\beta \in [0,1]\), often set to \(\beta=1\times 10^{-4}\); it controls how much of a decrease is required, e.g. when it is 0 any amount of decrease is acceptable.
- Curvature condition: The directional derivative at the next iterate must be shallower - \(\nabla_{\mathbf{d}^{(k)}}f(\mathbf{x}^{(k+1)}) \geq \sigma\nabla_{\mathbf{d}^{(k)}}f(\mathbf{x}^{(k)})\). Here \(\sigma\) controls how much shallower it must be (smaller \(\sigma\) demands a flatter slope), and typically \(\beta < \sigma < 1\).
An alternative to the curvature condition is the strong curvature condition, \(|\nabla_{\mathbf{d}^{(k)}}f(\mathbf{x}^{(k+1)})| \leq -\sigma\nabla_{\mathbf{d}^{(k)}}f(\mathbf{x}^{(k)})\); using it in place of the curvature condition gives the strong Wolfe conditions.
Together, these are called the Wolfe conditions.
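As a minimal sketch of an approximate line search, the snippet below implements simple backtracking: it starts from an initial \(\alpha\) and shrinks it until the sufficient-decrease (Armijo) condition holds. The starting step, shrink factor, and iteration cap are illustrative assumptions rather than values prescribed above, and the curvature condition is not checked here.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d,
                             alpha0=1.0, beta=1e-4, shrink=0.5, max_iter=50):
    # alpha0, shrink and max_iter are illustrative defaults (assumptions).
    fx = f(x)
    slope = grad_f(x) @ d                 # directional derivative nabla_d f(x)
    assert slope < 0, "d must be a descent direction"
    alpha = alpha0
    for _ in range(max_iter):
        # Armijo / sufficient-decrease condition
        if f(x + alpha * d) <= fx + beta * alpha * slope:
            return alpha
        alpha *= shrink                   # step too long: shrink and retry
    return alpha                          # fall back to the smallest alpha tried
```

A line search that enforces the strong Wolfe conditions is available as `scipy.optimize.line_search`, whose parameters `c1` and `c2` play the roles of \(\beta\) and \(\sigma\) above.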