Wolfe conditions

In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.

In these methods the idea is to find

$$\min_x f(x)$$

for some smooth $f\colon \mathbb{R}^n \to \mathbb{R}$. Each step often involves approximately solving the subproblem

$$\min_{\alpha} f(x_k + \alpha p_k),$$

where $x_k$ is the current best guess, $p_k \in \mathbb{R}^n$ is a search direction, and $\alpha \in \mathbb{R}$ is the step length.

An inexact line search provides an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha \in \mathbb{R}^+$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $p_k$.
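For intuition about why the subproblem is usually not solved exactly: only in special cases, such as a quadratic objective, does the one-dimensional restriction have a closed-form minimizer. A minimal NumPy sketch (the quadratic, its data, and all variable names below are illustrative assumptions, not from the source):

```python
import numpy as np

# For a quadratic f(x) = 0.5 * x^T A x - b^T x, the restriction
# phi(alpha) = f(x_k + alpha * p_k) is a parabola in alpha, so the
# exact step length has a closed form. (Illustrative data below.)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite
b = np.array([1.0, 1.0])
x_k = np.array([0.0, 0.0])        # current iterate

r = b - A @ x_k                   # r = -grad f(x_k)
p_k = r                           # steepest-descent direction
alpha_exact = (p_k @ r) / (p_k @ (A @ p_k))   # argmin of phi(alpha)

# For a general smooth f no such formula exists; an inexact search
# instead accepts any alpha passing a test such as the Wolfe conditions.
```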

Armijo rule and curvature

A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $p_k$, if the following two inequalities hold:

i) $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k\, p_k^{\mathsf T} \nabla f(x_k)$,
ii) $p_k^{\mathsf T} \nabla f(x_k + \alpha_k p_k) \ge c_2\, p_k^{\mathsf T} \nabla f(x_k)$,

with $0 < c_1 < c_2 < 1$. (In examining condition ii), recall that to ensure that $p_k$ is a descent direction, we have $p_k^{\mathsf T} \nabla f(x_k) < 0$.)

$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal gives example values of $c_1 = 10^{-4}$ and $c_2 = 0.9$ for Newton or quasi-Newton methods and $c_2 = 0.1$ for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently.
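As a concrete illustration, here is a minimal NumPy sketch of the two tests; the function name `satisfies_wolfe` and the callables `f` and `grad` (objective and gradient) are our own naming, not from the source:

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check conditions i) and ii) for a trial step length alpha,
    given iterate x = x_k and descent direction p = p_k
    (so grad(x) @ p < 0 is assumed)."""
    g0 = grad(x) @ p                                       # p_k^T grad f(x_k) < 0
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * g0    # condition i)
    curvature = grad(x + alpha * p) @ p >= c2 * g0         # condition ii)
    return armijo and curvature
```

A line-search routine can bracket and refine trial values of $\alpha$ (starting from, say, $\alpha = 1$ for Newton-like directions) until this test passes.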

Strong Wolfe condition on curvature

Denote a univariate function $\phi$ restricted to the direction $p_k$ as $\phi(\alpha) = f(x_k + \alpha p_k)$. The Wolfe conditions can result in a value for the step length that is not close to a minimizer of $\phi$. If we modify the curvature condition to the following,

iii) $\bigl| p_k^{\mathsf T} \nabla f(x_k + \alpha_k p_k) \bigr| \le c_2 \bigl| p_k^{\mathsf T} \nabla f(x_k) \bigr|$,

then i) and iii) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\phi$.
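Under the same naming assumptions as the sketch above, the strong variant only changes the curvature test to its two-sided form:

```python
def satisfies_strong_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check conditions i) and iii); unlike ii), the absolute value in
    iii) also rejects steps where phi'(alpha) is large and positive."""
    g0 = grad(x) @ p
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * g0          # condition i)
    strong_curv = abs(grad(x + alpha * p) @ p) <= c2 * abs(g0)   # condition iii)
    return armijo and strong_curv
```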

Rationale

The principal reason for imposing the Wolfe conditions in an optimization algorithm where $x_{k+1} = x_k + \alpha_k p_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $p_k$ and the gradient,

$$\cos \theta_k = \frac{\nabla f(x_k)^{\mathsf T} p_k}{\|\nabla f(x_k)\|\, \|p_k\|},$$

is bounded away from zero and conditions i) and ii) hold, then $\nabla f(x_k) \to 0$.

An additional motivation, in the case of a quasi-Newton method, is that if $p_k = -B_k^{-1} \nabla f(x_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite, condition ii) implies that $B_{k+1}$ is also positive definite.
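To see why, write $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$; condition ii) gives $y_k^{\mathsf T} s_k \ge (c_2 - 1)\,\alpha_k\, p_k^{\mathsf T} \nabla f(x_k) > 0$ (since $c_2 < 1$ and $p_k^{\mathsf T} \nabla f(x_k) < 0$), and this curvature product is exactly the denominator the BFGS update divides by. A minimal NumPy sketch of the standard BFGS formula (variable names are ours):

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B, with
    s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k).
    If B is positive definite and s @ y > 0 (guaranteed by the
    curvature condition ii)), the result is positive definite."""
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)   # drop old curvature along s
            + np.outer(y, y) / (s @ y))     # add newly measured curvature
```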
