Quasi-Newton or variable metric methods can be used when the Hessian matrix is difficult or time-consuming to evaluate. Instead of obtaining an estimate of the Hessian matrix at a single point, these methods gradually build up an approximate Hessian matrix by using gradient information from some or all of the previous iterates visited by the algorithm. Given the current iterate $x_k$ and the approximate Hessian matrix $B_k$ at $x_k$, the linear system

$$ B_k p_k = -\nabla f(x_k) $$

is solved to generate a direction $p_k$. The next iterate is then found by performing a line search along $p_k$ and setting $x_{k+1} = x_k + \alpha_k p_k$. The question is then: How can we use the function and gradient information from the points $x_k$ and $x_{k+1}$ to improve the quality of the approximate Hessian matrix $B_k$? In other words, how do we obtain the new approximate Hessian matrix $B_{k+1}$ from the previous approximation $B_k$?
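The following is a minimal sketch of one such iteration in Python with NumPy (the language and the simple backtracking line search are illustrative assumptions, not something prescribed by the methods described here); it solves the linear system for the direction, takes a step, and forms the vectors $s_k$ and $y_k$ that are defined and used below.

    import numpy as np

    def quasi_newton_step(f, grad_f, x, B):
        """One quasi-Newton iteration with approximate Hessian B."""
        g = grad_f(x)
        p = np.linalg.solve(B, -g)                # direction from B_k p_k = -grad f(x_k)
        alpha, fx = 1.0, f(x)
        while f(x + alpha * p) > fx + 1e-4 * alpha * g.dot(p):
            alpha *= 0.5                          # simple backtracking (Armijo) line search
        x_new = x + alpha * p
        s = x_new - x                             # s_k = x_{k+1} - x_k
        y = grad_f(x_new) - g                     # y_k = grad f(x_{k+1}) - grad f(x_k)
        return x_new, s, y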
The key to answering this question lies in what is sometimes called the fundamental theorem of integral calculus. If we define

$$ s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), $$

then this theorem implies that

$$ y_k = \left\{ \int_0^1 \nabla^2 f(x_k + t\, s_k)\, dt \right\} s_k . $$

The matrix in braces can be interpreted as the average of the Hessian matrix on the line segment $[x_k, x_{k+1}]$. This result states that when this average matrix is multiplied by the vector $s_k$, the resulting vector is $y_k$. In view of these observations, we can make $B_{k+1}$ mimic the behavior of the Hessian matrix by enforcing the quasi-Newton condition

$$ B_{k+1} s_k = y_k . $$
This condition can be satisfied by making a simple low-rank update to $B_k$. The most commonly used family of updates is the Broyden class of rank-two updates, which have the form

$$ B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi \, (s_k^T B_k s_k) \, v_k v_k^T , $$

where $\phi \in [0, 1]$ and

$$ v_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k} . $$

The choice $\phi = 0$ gives the Broyden-Fletcher-Goldfarb-Shanno update, which practical experience, and some theoretical analysis, has shown to be the method of choice in most circumstances. The Davidon-Fletcher-Powell update, which was proposed earlier, is obtained by setting $\phi = 1$. These two update formulae are known universally by their initials, BFGS and DFP, respectively.
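As a concrete illustration, the update can be written in a few lines; the sketch below (Python/NumPy, again an assumption made only for the example) mirrors the symbols above, with phi = 0 giving BFGS and phi = 1 giving DFP.

    import numpy as np

    def broyden_update(B, s, y, phi=0.0):
        """Broyden-class rank-two update of the Hessian approximation B."""
        Bs = B @ s
        sBs = s @ Bs                              # s^T B s
        ys = y @ s                                # y^T s
        v = y / ys - Bs / sBs
        return (B
                - np.outer(Bs, Bs) / sBs          # - B s s^T B / (s^T B s)
                + np.outer(y, y) / ys             # + y y^T / (y^T s)
                + phi * sBs * np.outer(v, v))     # + phi (s^T B s) v v^T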
Updates in the Broyden class remain positive definite as long as $y_k^T s_k > 0$. Although this condition holds automatically if $f$ is strictly convex, it can be enforced for all functions by requiring that the step length $\alpha_k$ satisfy the curvature condition $\nabla f(x_k + \alpha_k p_k)^T p_k \ge \beta \, \nabla f(x_k)^T p_k$ for some constant $\beta < 1$. Some codes avoid enforcing the curvature condition by simply skipping the update whenever $y_k^T s_k$ is not sufficiently positive.
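In code, this skipping strategy is just a guard around the update; the sketch below reuses broyden_update from the example above, and the relative tolerance is an illustrative assumption rather than a value taken from any particular package.

    import numpy as np

    def guarded_update(B, s, y, phi=0.0, tol=1e-10):
        """Skip the update when y^T s is not sufficiently positive."""
        if y @ s <= tol * np.linalg.norm(s) * np.linalg.norm(y):
            return B                              # keep the previous approximation
        return broyden_update(B, s, y, phi)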
GAUSS, IMSL, MATLAB, NAG(FORTRAN), NAG(C), OPTIMA, and PROC NLP implement quasi-Newton methods. These codes differ in the choice of update (usually BFGS), the line-search procedure, and the way in which $B_k$ is stored and updated. We can update $B_k$ either by updating the Cholesky decomposition of $B_k$ or by updating the inverse of $B_k$. In either case, the cost of computing the search direction by solving the system $B_k p_k = -\nabla f(x_k)$ is on the order of $n^2$ operations. Updating the Cholesky factorization is widely regarded as more reliable, while updating the inverse is less complicated. Indeed, if we define $H_k = B_k^{-1}$, then a BFGS update of $B_k$ is equivalent to the following update of $H_k$:

$$ H_{k+1} = \left( I - \rho_k s_k y_k^T \right) H_k \left( I - \rho_k y_k s_k^T \right) + \rho_k s_k s_k^T , \qquad \rho_k = \frac{1}{y_k^T s_k} . $$

When we store $H_k$ explicitly, the direction $p_k$ is obtained from the matrix-vector product $p_k = -H_k \nabla f(x_k)$.
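The inverse form of the update is easy to express directly; the following sketch (Python/NumPy, assumed only for illustration) applies the formula above and then forms the search direction as a matrix-vector product.

    import numpy as np

    def bfgs_inverse_update(H, s, y):
        """BFGS update of the inverse approximation H = B^{-1}."""
        rho = 1.0 / (y @ s)
        I = np.eye(len(s))
        V = I - rho * np.outer(y, s)
        return V.T @ H @ V + rho * np.outer(s, s)   # (I - rho s y^T) H (I - rho y s^T) + rho s s^T

    # With H stored explicitly, the direction is a matrix-vector product:
    # p = -H @ grad_f(x)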
The availability of quasi-Newton methods renders steepest-descent methods obsolete. Both types of algorithms require only first derivatives, and both require a line search. The quasi-Newton algorithms require slightly more operations to calculate an iterate and somewhat more storage, but in almost all cases, these additional costs are outweighed by the advantage of superior convergence.
At first glance, quasi-Newton methods may seem unsuitable for large problems because the approximate Hessian matrices and inverse Hessian matrices are generally dense. This is not the case: the explicit storage of $B_k$ or $H_k$ as $n \times n$ matrices is not necessary. For example, the above expression for the BFGS update of $H_k$ makes it clear that we can compute $H_k \nabla f(x_k)$ if we know the initial matrix $H_0$, the subsequent update vectors $s_j$ and $y_j$, and their inner products $y_j^T s_j$ for $j = 0, \ldots, k-1$. If $H_0$ is chosen to be a diagonal matrix, the necessary information can be stored in about $2kn$ words of memory. Limited-memory quasi-Newton methods make use of these ideas to cut down on storage for large problems. They store only the $s_j$ and $y_j$ vectors from the previous few iterates (typically, five) and compute the vector $H_k \nabla f(x_k)$ by a recursion that requires roughly $16n$ operations. The LBFGS code is an implementation of the limited-memory BFGS algorithm. The codes M1QN2 and M1QN3 are the same as LBFGS, except that they allow the user to specify a preconditioning technique.
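The recursion referred to above is commonly organized as a two-loop procedure; the sketch below (Python/NumPy, an illustrative assumption, with a standard scaled-identity choice of $H_0$) computes $H_k \nabla f(x_k)$ from the stored pairs without ever forming a matrix.

    import numpy as np

    def lbfgs_direction(g, s_list, y_list):
        """Two-loop recursion: return H_k g using only the stored (s_j, y_j) pairs."""
        q = np.array(g, dtype=float)
        alphas, rhos = [], []
        for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            q -= a * y
            alphas.append(a)
            rhos.append(rho)
        if s_list:                                             # scaled-identity H_0
            s, y = s_list[-1], y_list[-1]
            q *= (s @ y) / (y @ y)
        for (s, y), a, rho in zip(zip(s_list, y_list),         # oldest pair first
                                  reversed(alphas), reversed(rhos)):
            b = rho * (y @ q)
            q += (a - b) * s
        return q                                               # search direction: p_k = -q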