In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.
In these methods the idea is to find

$$\min_{\mathbf{x}} f(\mathbf{x})$$

for some smooth $f \colon \mathbb{R}^n \to \mathbb{R}$. Each step often involves approximately solving the subproblem

$$\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k),$$

where $\mathbf{x}_k$ is the current best guess, $\mathbf{p}_k \in \mathbb{R}^n$ is a search direction, and $\alpha \in \mathbb{R}$ is the step length.
Inexact line searches provide an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha \in \mathbb{R}^{+}$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $\mathbf{p}_k$.
Armijo rule and curvature
A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $\mathbf{p}_k$, if the following two inequalities hold:

i) $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \leq f(\mathbf{x}_k) + c_1 \alpha_k \, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

ii) $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \geq c_2 \, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

with $0 < c_1 < c_2 < 1$. (In examining condition (ii), recall that to ensure that $\mathbf{p}_k$ is a descent direction, we have $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) < 0$.)

$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal gives example values of $c_1 = 10^{-4}$ and $c_2 = 0.9$ for Newton or quasi-Newton methods and $c_2 = 0.1$ for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently.
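For illustration, a direct check of these two inequalities might look like the following sketch (the names `wolfe_conditions`, `f`, and `grad_f` are hypothetical, not from any library; NumPy arrays and the constants quoted above are assumed):

```python
import numpy as np

def wolfe_conditions(f, grad_f, x_k, p_k, alpha, c1=1e-4, c2=0.9):
    """Check i) (Armijo rule) and ii) (curvature condition) for a step alpha.

    f      : objective, R^n -> R
    grad_f : gradient of f, R^n -> R^n
    x_k    : current iterate
    p_k    : descent direction, so p_k @ grad_f(x_k) < 0
    """
    slope_0 = p_k @ grad_f(x_k)                 # directional derivative at alpha = 0
    slope_a = p_k @ grad_f(x_k + alpha * p_k)   # directional derivative at alpha

    armijo    = f(x_k + alpha * p_k) <= f(x_k) + c1 * alpha * slope_0  # i)
    curvature = slope_a >= c2 * slope_0                                # ii)
    return armijo and curvature
```

Note that a backtracking loop that only shrinks $\alpha$ can guarantee i) but not ii): for very small $\alpha$ the slope at the trial point is close to the (negative) initial slope, so the curvature condition fails. Practical routines therefore bracket and interpolate so that both conditions hold.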
Denote a univariate function $\varphi$ restricted to the direction $\mathbf{p}_k$ as $\varphi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. The Wolfe conditions can result in a value for the step length that is not close to a minimizer of $\varphi$. If we modify the curvature condition to the following,

iii) $\big| \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \big| \leq c_2 \big| \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) \big|$,

then i) and iii) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\varphi$.
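As a sketch of the difference, only the curvature test changes. SciPy's `scipy.optimize.line_search` (which, per its documentation, searches for a step satisfying the strong Wolfe conditions) can serve as a usage example; the helper `strong_wolfe` below is hypothetical:

```python
import numpy as np
from scipy.optimize import line_search

def strong_wolfe(f, grad_f, x_k, p_k, alpha, c1=1e-4, c2=0.9):
    """Check i) plus the modified curvature condition iii)."""
    slope_0 = p_k @ grad_f(x_k)
    slope_a = p_k @ grad_f(x_k + alpha * p_k)
    armijo      = f(x_k + alpha * p_k) <= f(x_k) + c1 * alpha * slope_0  # i)
    strong_curv = abs(slope_a) <= c2 * abs(slope_0)                      # iii)
    return armijo and strong_curv

# Usage on the Rosenbrock function with a steepest-descent direction.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad_f = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
x_k = np.array([-1.0, 1.0])
p_k = -grad_f(x_k)
alpha = line_search(f, grad_f, x_k, p_k, c1=1e-4, c2=0.9)[0]  # None if the search fails
print(alpha, strong_wolfe(f, grad_f, x_k, p_k, alpha))
```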
The principal reason for imposing the Wolfe conditions in an optimization algorithm where $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $\mathbf{p}_k$ and the gradient,

$$\cos \theta_k = \frac{\nabla f(\mathbf{x}_k)^{\mathrm{T}} \mathbf{p}_k}{\lVert \nabla f(\mathbf{x}_k) \rVert \, \lVert \mathbf{p}_k \rVert},$$

is bounded away from zero and conditions i) and ii) hold, then $\nabla f(\mathbf{x}_k) \to 0$.
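The following self-contained sketch illustrates this guarantee on a convex quadratic chosen for illustration: steepest descent makes the cosine above identically $-1$ (bounded away from zero), and each step is chosen by a Wolfe-satisfying line search, so the gradient norm is driven toward zero:

```python
import numpy as np
from scipy.optimize import line_search

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite
f = lambda x: 0.5 * x @ A @ x     # convex quadratic, minimum at the origin
grad_f = lambda x: A @ x

x = np.array([4.0, -3.0])
for k in range(20):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break
    p = -g                                   # steepest descent: cos(theta_k) = -1
    alpha = line_search(f, grad_f, x, p)[0]  # step satisfying the Wolfe conditions
    x = x + alpha * p
    print(k, np.linalg.norm(grad_f(x)))      # gradient norm shrinks toward zero
```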
An additional motivation, in the case of a quasi-Newton method, is that if $\mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite, condition ii) implies that $B_{k+1}$ is also positive definite. This is because ii) guarantees a positive curvature product $\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k > 0$, where $\mathbf{s}_k = \alpha_k \mathbf{p}_k$ and $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$, which is exactly what these rank-two updates need to preserve positive definiteness.
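A minimal numerical sketch of this fact, assuming the standard BFGS update formula (the helper `bfgs_update` and the random curvature pairs are illustrative, not from any library):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(0)
B = np.eye(3)                            # positive definite starting approximation
for _ in range(5):
    s = rng.standard_normal(3)           # step s_k = x_{k+1} - x_k
    y = rng.standard_normal(3)           # gradient change y_k
    if s @ y <= 0:                       # condition ii) rules out this case
        continue
    B = bfgs_update(B, s, y)
    print(np.linalg.eigvalsh(B).min())   # smallest eigenvalue stays positive
```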