In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function

$$f(x) = \|Ax - b\|^2,$$

the minimum of $f$ is obtained when the gradient is zero:

$$\nabla_x f = 2 A^T (Ax - b) = 0.$$
Whereas linear conjugate gradient seeks a solution to the linear equation $A^T A x = A^T b$, the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient $\nabla_x f$ alone.
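To make the connection concrete, the short check below (a sketch with an arbitrary illustrative $A$ and $b$, not taken from the article) verifies that the minimizer of $\|Ax - b\|^2$ is exactly the solution of the normal equations $A^T A x = A^T b$.

```python
import numpy as np

# Arbitrary illustrative data (assumed, not from the article).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

# Minimizer of ||Ax - b||^2 obtained by solving the normal equations A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Its gradient 2 A^T (A x - b) vanishes there, confirming it is the minimum.
gradient = 2.0 * A.T @ (A @ x_star - b)
print(x_star, np.allclose(gradient, 0.0))   # second value prints True
```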
Given a function $f(x)$ of $N$ variables to minimize, its gradient $\nabla_x f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction,

$$\Delta x_0 = -\nabla_x f(x_0),$$

with an adjustable step length $\alpha$ and performs a line search in this direction until it reaches the minimum of $f$:

$$\alpha_0 := \arg\min_{\alpha} f(x_0 + \alpha \Delta x_0), \qquad x_1 = x_0 + \alpha_0 \Delta x_0.$$
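As an illustration of this first step, the following sketch applies one steepest-descent step with a numerical line search to a small quadratic; the matrix `A`, the vector `b`, and the helper names are illustrative assumptions, and SciPy's `minimize_scalar` stands in for whatever line search one might actually use.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative quadratic f(x) = ||Ax - b||^2 with arbitrary A and b (assumed here).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    r = A @ x - b
    return r @ r

def grad(x):                        # gradient: 2 A^T (Ax - b)
    return 2.0 * A.T @ (A @ x - b)

x0 = np.zeros(2)
d0 = -grad(x0)                      # steepest-descent direction Δx_0 = -∇f(x_0)

# Line search: α_0 = argmin_α f(x_0 + α Δx_0), solved numerically here.
alpha0 = minimize_scalar(lambda a: f(x0 + a * d0)).x
x1 = x0 + alpha0 * d0               # first update x_1 = x_0 + α_0 Δx_0
```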
After this first iteration in the steepest direction $\Delta x_0$, the following steps constitute one iteration of moving along a subsequent conjugate direction $s_n$, where $s_0 = \Delta x_0$:

- Calculate the steepest direction: $\Delta x_n = -\nabla_x f(x_n)$,
- Compute $\beta_n$ according to one of the formulas below,
- Update the conjugate direction: $s_n = \Delta x_n + \beta_n s_{n-1}$,
- Perform a line search: optimize $\alpha_n = \arg\min_{\alpha} f(x_n + \alpha s_n)$,
- Update the position: $x_{n+1} = x_n + \alpha_n s_n$.
With a pure quadratic function the minimum is reached within $N$ iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every $N$ iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
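Putting the steps above together, here is a minimal sketch of the iteration in Python; it is not taken from any library, the name `nonlinear_cg` is made up, and the Polak–Ribière choice of $\beta_n$ (discussed below) clipped at zero with a periodic reset every $N$ iterations is just one reasonable configuration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    """Minimal nonlinear conjugate gradient sketch (Polak-Ribiere beta, clipped at 0)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    d = -grad(x)                         # Δx_0: steepest-descent direction
    s = d.copy()                         # s_0 = Δx_0
    for k in range(max_iter):
        if np.linalg.norm(d) < tol:      # gradient small enough: stop
            break
        # Line search: α_n = argmin_α f(x_n + α s_n), done numerically here.
        alpha = minimize_scalar(lambda a: f(x + a * s)).x
        x = x + alpha * s                # x_{n+1} = x_n + α_n s_n
        d_new = -grad(x)                 # Δx_{n+1}
        # Polak-Ribiere β, clipped at 0 so a "bad" direction resets to steepest descent.
        beta = max(0.0, d_new @ (d_new - d) / (d @ d))
        # Periodic reset: conjugacy degrades for non-quadratic f.
        if (k + 1) % n == 0:
            beta = 0.0
        s = d_new + beta * s             # s_{n+1} = Δx_{n+1} + β_{n+1} s_n
        d = d_new
    return x
```

As a quick check, `nonlinear_cg(rosen, rosen_der, np.zeros(2))` with SciPy's Rosenbrock helpers (`from scipy.optimize import rosen, rosen_der`) should approach the minimizer at $(1, 1)$.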
Within a linear approximation, the parameters $\alpha$ and $\beta$ are the same as in the linear conjugate gradient method, but have been obtained with line searches.
Four of the best known formulas for $\beta_n$ are named after their developers:

- Fletcher–Reeves: $\beta_n^{FR} = \dfrac{\Delta x_n^T \Delta x_n}{\Delta x_{n-1}^T \Delta x_{n-1}}$
- Polak–Ribière: $\beta_n^{PR} = \dfrac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{\Delta x_{n-1}^T \Delta x_{n-1}}$
- Hestenes–Stiefel: $\beta_n^{HS} = \dfrac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$
- Dai–Yuan: $\beta_n^{DY} = \dfrac{\Delta x_n^T \Delta x_n}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$
These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $\beta_n = \max\{0, \beta_n^{PR}\}$, which provides a direction reset automatically.
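For reference, the sketch below writes these updates as plain functions of the current and previous steepest directions $\Delta x_n$, $\Delta x_{n-1}$ and the previous search direction $s_{n-1}$ (argument names are assumptions of this illustration); the last function implements the $\max\{0, \beta_n^{PR}\}$ reset rule.

```python
import numpy as np

# β formulas in terms of dx = Δx_n, dx_prev = Δx_{n-1}, s_prev = s_{n-1} (1-D arrays).
def beta_fletcher_reeves(dx, dx_prev, s_prev):
    return (dx @ dx) / (dx_prev @ dx_prev)

def beta_polak_ribiere(dx, dx_prev, s_prev):
    return (dx @ (dx - dx_prev)) / (dx_prev @ dx_prev)

def beta_hestenes_stiefel(dx, dx_prev, s_prev):
    return (dx @ (dx - dx_prev)) / (-s_prev @ (dx - dx_prev))

def beta_dai_yuan(dx, dx_prev, s_prev):
    return (dx @ dx) / (-s_prev @ (dx - dx_prev))

# Popular practical choice: Polak-Ribiere clipped at zero (automatic direction reset).
def beta_pr_plus(dx, dx_prev, s_prev):
    return max(0.0, beta_polak_ribiere(dx, dx_prev, s_prev))

# Example with made-up vectors (illustrative values only):
dx, dx_prev, s_prev = np.array([1.0, -2.0]), np.array([0.5, 1.0]), np.array([-0.5, -1.0])
print(beta_fletcher_reeves(dx, dx_prev, s_prev), beta_pr_plus(dx, dx_prev, s_prev))
```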
Newton-based methods – the Newton–Raphson algorithm and quasi-Newton methods (e.g., the BFGS method) – tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration, as Newton-like methods require computing the Hessian (the matrix of second derivatives) in addition to the gradient. Quasi-Newton methods also require more memory to operate (see also the limited-memory L-BFGS method).
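For a practical comparison, SciPy exposes nonlinear conjugate gradient, BFGS, and limited-memory BFGS as ready-made solvers; the snippet below (illustrative only) runs all three on SciPy's Rosenbrock test function and reports the iteration counts, which typically favor the Newton-like methods.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.zeros(5)
for method in ("CG", "BFGS", "L-BFGS-B"):
    # Each solver gets the same objective, gradient, and starting point.
    res = minimize(rosen, x0, jac=rosen_der, method=method)
    print(f"{method:>8s}: {res.nit:4d} iterations, f(x*) = {res.fun:.2e}")
```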