Quasi-Newton Methods


Quasi-Newton or variable metric methods can be used when the Hessian matrix is difficult or time-consuming to evaluate. Instead of obtaining an estimate of the Hessian matrix at a single point, these methods gradually build up an approximate Hessian matrix by using gradient information from some or all of the previous iterates visited by the algorithm. Given the current iterate $x_k$ and the approximate Hessian matrix $B_k$ at $x_k$, the linear system

$$ B_k p_k = -\nabla f(x_k) $$

is solved to generate a search direction $p_k$. The next iterate is then found by performing a line search along $p_k$ and setting $x_{k+1} = x_k + \alpha_k p_k$. The question is then: How can we use the function and gradient information from the points $x_k$ and $x_{k+1}$ to improve the quality of the approximate Hessian matrix? In other words, how do we obtain the new approximate Hessian matrix $B_{k+1}$ from the previous approximation $B_k$?
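As a concrete illustration, the following sketch of a single step of this scheme is written in Python with NumPy; the names `f`, `grad`, and `line_search` are assumed to be supplied by the caller and are not taken from any of the codes mentioned later in this section.

```python
import numpy as np

def quasi_newton_step(x, B, f, grad, line_search):
    """One quasi-Newton step with approximate Hessian B at the iterate x."""
    g = grad(x)
    p = np.linalg.solve(B, -g)          # solve B p = -grad f(x) for the direction p
    alpha = line_search(f, grad, x, p)  # step length from a line search along p
    return x + alpha * p                # the next iterate x_{k+1}
```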

The key to this question is what is sometimes called the fundamental theorem of integral calculus. If we define

$$ s_k = x_{k+1} - x_k , \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k) , $$

then this theorem implies that

$$ y_k = \left\{ \int_0^1 \nabla^2 f(x_k + t\, s_k) \, dt \right\} s_k . $$

The matrix in braces can be interpreted as the average of the Hessian matrix on the line segment between $x_k$ and $x_{k+1}$. This result states that when this average Hessian is multiplied by the step $s_k$, the resulting vector is the gradient difference $y_k$. In view of these observations, we can make $B_{k+1}$ mimic the behavior of the average Hessian by enforcing the quasi-Newton condition

$$ B_{k+1} s_k = y_k . $$
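For a quadratic $f(x) = \tfrac{1}{2} x^T A x - b^T x$, the average Hessian is simply $A$, so the identity above reduces to $y_k = A s_k$ exactly. The short NumPy check below, with randomly generated data, illustrates this; it is only a sanity check of the formula, not part of any algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite Hessian
b = rng.standard_normal(n)
grad = lambda x: A @ x - b             # gradient of f(x) = 0.5 x'Ax - b'x

x0, x1 = rng.standard_normal(n), rng.standard_normal(n)
s, y = x1 - x0, grad(x1) - grad(x0)

# For a quadratic, the average Hessian on the segment is A itself,
# so y = A s holds to roundoff.
print(np.allclose(y, A @ s))           # True
```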
This condition can be satisfied by making a simple low-rank update to $B_k$. The most commonly used family of updates is the Broyden class of rank-two updates, which have the form

$$ B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi \, (s_k^T B_k s_k) \, v_k v_k^T , $$

where $\phi \in [0, 1]$ and

$$ v_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k} . $$
The choice $\phi = 0$ gives the Broyden-Fletcher-Goldfarb-Shanno update, which practical experience, and some theoretical analysis, has shown to be the method of choice in most circumstances. The Davidon-Fletcher-Powell update, which was proposed earlier, is obtained by setting $\phi = 1$. These two update formulae are known universally by their initials, BFGS and DFP, respectively.
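A direct NumPy transcription of the Broyden-class formula might look as follows; setting `phi = 0.0` reproduces the BFGS update and `phi = 1.0` the DFP update. This is only a sketch of the formula as written above, not the implementation used by any of the codes listed later.

```python
import numpy as np

def broyden_update(B, s, y, phi=0.0):
    """Rank-two Broyden-class update of the Hessian approximation B.

    phi = 0 gives BFGS, phi = 1 gives DFP.  Assumes y^T s > 0, so that
    positive definiteness is preserved (see the curvature discussion below).
    """
    Bs = B @ s
    sBs = s @ Bs                       # s^T B s
    ys = y @ s                         # y^T s
    v = y / ys - Bs / sBs
    return (B
            - np.outer(Bs, Bs) / sBs
            + np.outer(y, y) / ys
            + phi * sBs * np.outer(v, v))
```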

Updates in the Broyden class remain positive definite as long as $y_k^T s_k > 0$. Although this condition holds automatically if $f$ is strictly convex, for general functions it can be enforced by requiring that the step length $\alpha_k$ satisfy the curvature condition $\nabla f(x_k + \alpha_k p_k)^T p_k \ge \eta \, \nabla f(x_k)^T p_k$ for some constant $\eta < 1$. Some codes avoid enforcing the curvature condition by simply skipping the update whenever $y_k^T s_k \le 0$.
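In code, the update-skipping safeguard is just a guard around the update. The fragment below is a minimal sketch, reusing the `broyden_update` sketch shown earlier; the tolerance `eps` is an arbitrary illustrative choice rather than anything a particular code uses.

```python
def guarded_update(B, s, y, update=broyden_update, eps=1e-10):
    """Apply the quasi-Newton update only when the curvature y^T s is
    sufficiently positive; otherwise keep B unchanged (skip the update)."""
    if y @ s > eps:
        return update(B, s, y)
    return B
```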

GAUSS, IMSL, MATLAB, NAG(FORTRAN), NAG(C), OPTIMA, and PROC NLP implement quasi-Newton methods. These codes differ in the choice of update (usually BFGS), the line-search procedure, and the way in which $B_k$ is stored and updated. We can update $B_k$ either by updating the Cholesky factorization of $B_k$ or by updating the inverse of $B_k$. In either case, the cost of obtaining the search direction by solving the system

$$ B_{k+1} p_{k+1} = -\nabla f(x_{k+1}) $$

is on the order of $n^2$ operations, where $n$ is the number of variables. Updating the Cholesky factorization is widely regarded as more reliable, while updating the inverse of $B_k$ is less complicated. Indeed, if we define

$$ H_k = B_k^{-1} , $$

then a BFGS update of $B_k$ is equivalent to the following update of $H_k$:

$$ H_{k+1} = \left( I - \frac{s_k y_k^T}{y_k^T s_k} \right) H_k \left( I - \frac{y_k s_k^T}{y_k^T s_k} \right) + \frac{s_k s_k^T}{y_k^T s_k} . $$

When we store $H_k$ explicitly, the search direction is obtained from the matrix-vector product

$$ p_{k+1} = -H_{k+1} \nabla f(x_{k+1}) . $$
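Putting these pieces together, a minimal BFGS iteration that stores and updates the inverse approximation $H_k$ directly, as in the formula just given, might look like the sketch below. The backtracking line search is a deliberately crude stand-in (practical codes use line searches that also enforce the curvature condition), and all names are illustrative rather than taken from any of the packages above.

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """Minimal BFGS iteration storing the inverse Hessian approximation H."""
    n = x0.size
    I = np.eye(n)
    H = I.copy()                        # initial inverse Hessian approximation H_0
    x, g = x0.astype(float), grad(x0)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # search direction p = -H grad f(x)
        alpha = 1.0                     # crude backtracking on sufficient decrease
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        ys = y @ s
        if ys > 1e-10:                  # skip the update unless the curvature is positive
            rho = 1.0 / ys
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

On the quadratic used in the earlier check, for example, `bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(n))` converges to the solution of $A x = b$.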

The availability of quasi-Newton methods renders steepest-descent methods obsolete. Both types of algorithms require only first derivatives, and both require a line search. The quasi-Newton algorithms require slightly more operations to calculate an iterate and somewhat more storage, but in almost all cases, these additional costs are outweighed by the advantage of superior convergence.

At first glance, quasi-Newton methods may seem unsuitable for large problems because the approximate Hessian matrices and inverse Hessian matrices are generally dense. This is not the case: the explicit storage of $B_k$ or $H_k$ as an $n \times n$ matrix is not necessary. For example, the above expression for the BFGS update of $H_k$ makes it clear that we can compute the product

$$ H_k \nabla f(x_k) $$

if we know the initial matrix $H_0$; the subsequent update vectors $s_i$ and $y_i$; and their inner products $y_i^T s_i$ for $i = 0, \ldots, k-1$. If $H_0$ is chosen to be a diagonal matrix, the necessary information can be stored in about $2kn$ words of memory. Limited-memory quasi-Newton methods make use of these ideas to cut down on storage for large problems. They store only the $s_i$ and $y_i$ vectors from the previous few iterations (typically, five) and compute the vector $H_k \nabla f(x_k)$ by a recursion that requires roughly $16n$ operations. The LBFGS code is an implementation of the limited-memory BFGS algorithm. The codes M1QN2 and M1QN3 are the same as LBFGS, except that they allow the user to specify a preconditioning technique.
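The recursion referred to above is commonly implemented as the so-called two-loop recursion. The sketch below is a generic textbook formulation, assuming a list `pairs` of the most recent $(s_i, y_i)$ vector pairs and a diagonal initial matrix represented by the scalar `gamma`; it is not the actual LBFGS, M1QN2, or M1QN3 source.

```python
import numpy as np

def lbfgs_direction(g, pairs, gamma=1.0):
    """Compute the search direction -H_k g by the two-loop recursion.

    `pairs` holds the m most recent (s, y) pairs, oldest first, and
    gamma * I plays the role of the initial matrix H_0.
    """
    q = g.copy()
    stack = []
    for s, y in reversed(pairs):          # first loop: newest pair to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        stack.append((rho, a, s, y))
    r = gamma * q                         # apply the (diagonal) initial matrix H_0
    for rho, a, s, y in reversed(stack):  # second loop: oldest pair to newest
        b = rho * (y @ r)
        r += (a - b) * s
    return -r                             # search direction p_k = -H_k grad f(x_k)
```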

