LAB #4: Fast & Dangerous Root Finding


TABLE OF CONTENTS

  Introduction
  Stopping Tests
  Regula Falsi
  The Secant Method
  Newton's Method
  Muller's Method
  Brent's Method
  Assignment

Introduction

The bisection method is very reliable, but slow and dull. There are many classic methods that are faster, especially when close to the correct root. Many of these methods don't require a change of sign interval, which can be almost as hard to find as the root itself. However, if used unwisely, these fast methods can crash in a spectacular way, while bisection chugs along safely. Finally, we will look at a modern method, known as Brent's method, that tries to combine speed and safety.

At the end of this lab, there is an assignment which is due by the beginning of the following lab.

Stopping Tests

Root finding routines check after each step to see whether the current result is good enough. The tests that are made are called termination conditions or stopping tests. Common tests include:

  1. the interval size test: the length of the change of sign interval [X(left),X(right)] has dropped below some tolerance;
  2. the step size test: the change in the root estimate from one step to the next has dropped below some tolerance;
  3. the residual size test: the absolute value of the function at the current estimate is no bigger than some tolerance, which we will call FATOL.

Some methods always have a change of sign interval [X(left),X(right)], so the interval size test can be used. Other methods don't, and so we concentrate on the size of the change in the root estimate at each step.

For the bisection method, the most natural test is on the size of the change-of-sign interval. However, in order to make comparisons with the other methods, we will want to use a residual size stopping test for all of them. So we need to re-engineer the bisection code for today.

Exercise: Modify your bisection code. Replace the while statement by a for loop:

for IT = 1 : 20
Then, just before the end statement that terminates the loop, insert these stopping conditions on the residual size:
        
          if ( abs ( fxa ) <= FATOL )
            result = xa;
            return
          elseif ( abs ( fxb ) <= FATOL )
            result = xb;
            return
          end

and set FATOL = 0.000001, that is, 10^(-6). Use your new bisection code on the function
f1(x)=x^2-9
with starting points 0 and 5, and verify that the result has a function value whose absolute value is no bigger than FATOL. How many steps does this take?
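
Pieced together, the modified routine might look something like the following sketch. (This is only a sketch: I am assuming your routine is called bisect.m, and that it keeps the endpoints in variables xa and xb, with function values fxa and fxb. Your names may differ.)

        function result = bisect ( f, xa, xb )
        %BISECT  Bisection with a residual size stopping test (a sketch).

          FATOL = 0.000001;

          fxa = feval ( f, xa );
          fxb = feval ( f, xb );

          for IT = 1 : 20

            xc = ( xa + xb ) / 2;
            fxc = feval ( f, xc );

            if ( fxa * fxc <= 0 )    % keep the half with the sign change
              xb = xc;
              fxb = fxc;
            else
              xa = xc;
              fxa = fxc;
            end

            if ( abs ( fxa ) <= FATOL )
              result = xa;
              return
            elseif ( abs ( fxb ) <= FATOL )
              result = xb;
              return
            end

          end

          result = xc;               % give up after 20 steps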

Regula Falsi

Regula Falsi is very similar to the bisection method: it assumes that we start with a change of sign interval, and each step of the method tries to make this interval smaller. However, where bisection takes the average of the endpoints:

xnew := ( xleft + xright ) / 2
the Regula Falsi method uses the function values at the endpoints as weights:
xnew := ( f(xright) * xleft - f(xleft) * xright ) / ( f(xright) - f(xleft) )
If the function values at the two endpoints are of equal magnitude and opposite sign, then this gets the same result as bisection. But suppose f(xright) is very negative and f(xleft) is slightly positive. Wouldn't you expect the root to be close to xleft? That's where Regula Falsi will look for it.
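
For example, for f1(x) = x^2 - 9 with xleft = 0 and xright = 5, we have f(xleft) = -9 and f(xright) = 16, so

xnew = ( 16 * 0 - (-9) * 5 ) / ( 16 - (-9) ) = 45 / 25 = 1.8

Bisection would have picked 2.5; Regula Falsi leans toward xleft, where the function value is smaller in magnitude.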

One thing to note about Regula Falsi is that it is based on an interpolating model of the function. In other words, the algorithm takes the two known values of the unknown function as "evidence", and builds an entire working model function from this. The model function is very simple, just the linear function that passes through the two data points. But the idea of "building a model" of the object you are studying is very important. Regula Falsi, for instance, could also give us an estimate of the derivative or integral of the function, or its value at an arbitrary point. It could tell us whether a new piece of data is "surprising" or practically what was expected. Bisection can do none of these things. We will see this idea over and over again, in root finding, and in the other topics we study.

Exercise: make a copy of your bisection program, and call it regula.m. You should probably only have to change a single line to implement the Regula Falsi method. Try your code on the f1(x) function with the starting points 0 and 5. How many steps does this take? Try to explain why one side of the interval never seems to change.
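
The single line to change is the one that computes the new point. Instead of the bisection average

        xc = ( xa + xb ) / 2;

you would write something like (assuming, as in the sketch above, that the endpoints are called xa and xb):

        xc = ( fxb * xa - fxa * xb ) / ( fxb - fxa );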

The Secant Method

Regula Falsi is better than bisection for some problems. However, it's easy to defeat it. While it's good at linear functions, it can't handle a function where the second derivative is important. What tends to happen is that one endpoint has a very large function value, which forces the method to look near the other endpoint. This can mean that the method gets bogged down, moving the "small" side of the interval in very slowly, when the problem is at the other end.

The secant method drops the idea of always keeping a change of sign interval. Instead, when it computes a new point, it throws away the older of its current pair of points. To reflect the fact that we don't have a change of sign interval any more, the secant algorithm will use the names x1 and x2 for the pair of points.

The secant method retains the idea of using a linear model of the function. That means that the iteration step is the same as in the Regula Falsi method.

xnew := ( f(x1) * x2 - f(x2) * x1 ) / ( f(x1) - f(x2) )

Thus, each step of the secant algorithm starts with points x1 and x2, and computes xnew. In order to be ready for the next step, the secant algorithm does the following at the end of its step:


        x1 = x2;
        x2 = xnew;
      

One explanation for the speedup of the secant method is the fact that it throws away old data. Thus, if the method is converging, it is constantly using the very best data, and hence accelerating. Conversely, if the method begins to go bad, it very quickly forgets the information that could bring it back to reality, and flies away to infinity!

Exercise: make a copy of your Regula Falsi program, and call it secant.m. Rename the variables and change the code to use the secant method. Try your code on the f1(x) function with the starting points 0 and 5. How many steps does this take?
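
Assembled, the secant routine might look something like this sketch (again, your variable names may differ):

        function result = secant ( f, x1, x2 )
        %SECANT  Secant method with a residual size stopping test (a sketch).

          FATOL = 0.000001;

          fx1 = feval ( f, x1 );
          fx2 = feval ( f, x2 );

          for IT = 1 : 20

            xnew = ( fx1 * x2 - fx2 * x1 ) / ( fx1 - fx2 );

            x1 = x2;                 % discard the older point
            fx1 = fx2;
            x2 = xnew;
            fx2 = feval ( f, x2 );

            if ( abs ( fx2 ) <= FATOL )
              result = x2;
              return
            end

          end

          result = x2;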

Newton's Method

Both Regula Falsi and the secant method use a pair of points, essentially to figure out the slope of the function, and hence where it will cross the axis. Let's suppose the process is converging. In that case, the most recent iterate is the closest to the root, and hence the very best estimate of the slope is the derivative at the current point. Newton's method is still using a (linear) model of the unknown function, but its "evidence" is now all gathered at the latest point.

(Newton's method is even faster than the secant method; this includes the fact that it can "go bad" or diverge faster!)

The iteration step for Newton's method is:

x := x - f(x) / f'(x)
We don't save any old information, so I didn't bother calling the new point xnew. It just immediately replaces the previous value of x.

By the way, if we applied Newton's method to the function

f(x) = x * x - 16
our iteration step would be

        x := x - f(x) / f'(x)
          := x - ( x * x - 16 ) / ( 2 * x )
          := x - 0.5 * x + 0.5 * 16 / x
          := 0.5 * ( x + 16 / x )
      
Does this formula look familiar?

Unfortunately, the form of Newton's method complicates our MATLAB coding, because now we need to write two routines, for the function and its derivative. (There are other ways to do this, of course!) In any case, it means that the first line of our function will probably look like:


        function result = newton ( f, df, x )
      
where df is the name of yet another M file, which has to compute the derivative of the function f.

Exercise: make a copy of your secant program, and call it newton.m. Change your function statement as described above. You'll only need one x variable now. Where you were evaluating the function, you'll now have to do two calls:


        fx = feval ( f, x );
        fp = feval ( df, x );
      
Then insert the new iteration step:

        x = x - fx / fp;
      
Try your code on the f1(x) function with the starting point 5. (The starting point 0 would fail immediately, since f1'(0) = 0, and the very first step would divide by zero.) You will also have to write a derivative routine for f1(x); call that df1.m. How many steps does this take?
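
Put together, the routine might look something like this sketch:

        function result = newton ( f, df, x )
        %NEWTON  Newton's method with a residual size stopping test (a sketch).

          FATOL = 0.000001;

          for IT = 1 : 20

            fx = feval ( f, x );
            fp = feval ( df, x );

            x = x - fx / fp;         % the Newton iteration step

            if ( abs ( feval ( f, x ) ) <= FATOL )
              result = x;
              return
            end

          end

          result = x;

and the derivative routine df1.m would simply be:

        function value = df1 ( x )
        %DF1  Derivative of f1(x) = x^2 - 9.

          value = 2 * x;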

Muller's Method

Muller's method is best understood as another attempt to "model" the unknown function. At each step of the method, three points are used. Muller's method determines the quadratic polynomial that passes through these three points, and then solves for the roots of that polynomial, and chooses one of them to add as its latest point, while discarding the oldest point.
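
To make the step concrete, here is a sketch of a single Muller step. (This is only a sketch, not the code you will be given; I am assuming the three current points are named x0, x1 and x2, with x2 the newest, and that their function values f0, f1 and f2 have already been computed.)

          h1 = x1 - x0;
          h2 = x2 - x1;
          d1 = ( f1 - f0 ) / h1;               % divided differences
          d2 = ( f2 - f1 ) / h2;
          a = ( d2 - d1 ) / ( h2 + h1 );       % q(x) = a*(x-x2)^2 + b*(x-x2) + c
          b = a * h2 + d2;                     % is the quadratic through the
          c = f2;                              % three data points

          rad = sqrt ( b^2 - 4 * a * c );      % complex if no real root exists

          if ( abs ( b - rad ) < abs ( b + rad ) )
            den = b + rad;                     % choose the larger denominator,
          else                                 % so xnew is the root nearer x2
            den = b - rad;
          end

          xnew = x2 - 2 * c / den;

          x0 = x1;                             % discard the oldest point
          x1 = x2;
          x2 = xnew;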

Muller's method can be fast, but there are several things that can go wrong. If the data points are very close, or lie on a straight line, there are accuracy problems in computing the coefficients of the quadratic, and more problems in determining its roots. Also, the coding for Muller's method is significantly more complicated than for other methods we've seen.

A strange feature of Muller's method is that the quadratic polynomial might have no real roots. However, in that case, it may be perfectly acceptable to chase one of the complex roots, and if you allow the method to do this, it works just the same. If you look at the algorithm I am giving you, you only have to delete one line to make it work on a problem with complex roots.

Exercise: get a copy of my version of Muller's method. Try your code on the f1(x) function with the starting points 0, 5 and 10. How many steps does this take? EXPLAIN!

Extra exercise: if you are interested, then figure out which line of the algorithm needs to be deleted so that complex roots can be sought. Then try the code on

f(x) = x^4 + 10 * x^2 + 9

Brent's Method

Richard Brent devised a routine that combines the reliability of bisection with the speed of the secant method, and adds a third method, inverse quadratic interpolation, that can be faster yet. The idea is that you start with a change of sign interval, and you never leave it. You have three choices for your next step:

  1. a bisection step;
  2. a secant step;
  3. an inverse quadratic interpolation step, which passes a quadratic polynomial (with x regarded as a function of y) through the three most recent points, and evaluates it at y = 0.

If the iteration has been proceeding pretty well, then take the "fastest" step you can that does not move you out of the current change of sign interval. However, if you've been trying secant or inverse quadratic steps, and yet the interval hasn't shrunk very much, then fall back on a bisection step.

As you can imagine, this algorithm is pretty complicated to write out. However, it clearly displays more intelligence. It uses a variety of methods, and tries to adjust what it does based on its experience with the current problem. Such a method is called adaptive. Moreover, it is carefully balanced so that it will always get an answer (because it never leaves the change of sign interval, and forces a bisection step every so often), and tries to get the answer fast.

Exercise: get a copy of my version of Brent's method. Try your code on the f1(x) function with the starting points 0 and 5. How many steps does this take?
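
By the way, if you would like an independent check on your answer, MATLAB includes a built-in root finder called fzero, which uses the same combination of bisection, secant, and inverse quadratic interpolation steps. Assuming your version of MATLAB accepts a bracketing interval as the second argument, the call would look like:

        root = fzero ( 'f1', [ 0, 5 ] );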

Assignment

One disadvantage of Newton's method is that we have to supply not only the function, but also a derivative. In this assignment, we will try to make life a little easier.

The idea is that we want to approximate the derivative of the function. One way would be to consider a nearby point

xp = x + dx
evaluate the function there, and then approximate the derivative by
fp = ( fxp - fx ) / ( xp - x )

Here are three ways to pick the stepsize dx:

  1. Simply set xp to the previous iterate. (On the first step, set xp to x+1.0);
  2. Always use a fixed value, say dx = 0.001;
  3. Use a value that gets smaller as you converge, say dx = fx.
We will try idea #2, using a fixed stepsize.

Make a copy of your Newton code, called newton1.m, whose declaration looks like

function root = newton1 ( f, x )
Replace the line
fp = feval ( df, x )
by the appropriate lines to approximate the derivative using a stepsize dx = 0.001. Check out your code on the function f1(x)=x^2-9.
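
In case you get stuck, the replacement might look something like this sketch. (I am assuming, as in the earlier code, that fx has already been computed just above by fx = feval ( f, x );.)

        dx = 0.001;
        fxp = feval ( f, x + dx );
        fp = ( fxp - fx ) / dx;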

Assignment: For the function f2(x)=x^6-x-1 and the starting point x = 5.0 and the absolute function tolerance of 0.000001, compute the number of steps taken to converge to the root x=1.134... using standard Newton (with the exact derivative) and three stepsizes dx:

                              Number of steps

        Newton                __________________   (You must write df2.m!)
 
        dx = 0.1              __________________

        dx = 0.001            __________________

        dx = 0.00001          __________________
      
Write this data in a file called, say, newtons, and mail it to me.


Last revised on 27 September 1999.