MATH2071: LAB #7: Solving Linear Systems

Introduction
Test Matrices
The Linear System Problem
The Determinant
The Inverse Matrix
Gauss Elimination
PLU Factors
PLU Solution
Matrix Multiplication
ASSIGNMENT

Introduction

We now begin the fundamental topic in Numerical Analysis, Numerical Linear Algebra. We assume you are familiar with the concepts of vectors and matrices, matrix multiplication, vector dot products, and the theory of the solution of systems of linear equations, the matrix inverse, eigenvalues and so on. We now concentrate on the practical aspects of carrying out the computations required to do linear algebra in a numerical setting.

We begin by noting three special test matrices that we will refer to when trying out algorithms. Then we state the linear system problem and consider three methods of solution, using the determinant, the inverse matrix, or Gauss factorization.

Test Matrices

We will now consider a few simple test problems that we can use to study the behavior of the algorithms for solution of a linear system. We can use such problems as a quick test that we haven't made a serious programming error, but more importantly, we also use them to study the accuracy of the algorithms. We want test problems which can be made any size, and for which the exact values of the determinant, inverse, or eigenvalues can be determined.

The three test problems I have chosen for us are:

The second difference matrix:
The Frank matrix,
The Hilbert matrix.

We saw the second difference matrix when we were approximating the second derivative on equally spaced data. The formula is:

A(I,I) = -2
A(I,I-1) = A(I,I+1) = 1, where these columns exist.
A(I,J) = 0, otherwise.

Example:

        -2  1  0  0  0
         1 -2  1  0  0
         0  1 -2  1  0
         0  0  1 -2  1
         0  0  0  1 -2

The matrix is positive definite, symmetric, and tridiagonal. The determinant is plus or minus N+1.

Row I of the Frank matrix has the formula:

A(I,J) = 0, for J = 1, ..., I-2;
A(I,J) = N+1-I for J = I-1;
A(I,J) = N+1-J for J = I, I+1, ..., N.

Example:

The determinant of the Frank matrix is 1, but this is difficult to compute correctly.

The Hilbert matrix comes up when doing polynomial approximation on the interval [0,1]. The formula is

A(I,J) = 1 / ( I + J - 1 )

Example:

        1/1  1/2  1/3  1/4  1/5
        1/2  1/3  1/4  1/5  1/6
        1/3  1/4  1/5  1/6  1/7
        1/4  1/5  1/6  1/7  1/8
        1/5  1/6  1/7  1/8  1/9

The polynomial coefficients c that represent a function f are found by computing a matrix A and right hand side b and solving the linear system. The (I,J) entry of the matrix is actually the integral of the polynomial x^i+j-2 while the right hand side entry b(i) is the integral of f(x)*x^i-1. Theoretically, this is is a perfectly reasonable approach, but it results in disaster.

M Files - Copy the files dif2.m, frank.m and hilbert.m Try out the codes by generating the examples of order 5 for each test matrix.

The Linear System Problem

The linear system problem may be posed as follows:

Given an M by N matrix A and M-vector b, find an N-vector x so that A * x = b.

You probably already know that there is "usually" a solution if the matrix is square, (that is, if M=N). We will concentrate on this case for now. But we may wonder if there is an intelligent response to this problem for the cases of a square but singular matrix, or a rectangular system, whether over- or underdetermined.

For the standard problem, you already know several algorithms for producing a solution, including the use of determinants, the inverse matrix, Gauss-Jordan elimination and Gauss factorization. We will consider some of these methods, and discover that there are good reasons to prefer the Gauss factorization.

We will be concerned about several topics:

Efficiency: what algorithms produce a result with less work?
Accuracy: what algorithms produce an answer that is likely to be more accurate?
Difficulty: what makes a problem impossible or difficult to solve?
Special cases: how do we solve problems that are big? symmetric? banded? singular? rectangular?

The Determinant

A classic result from linear algebra is the following:

The linear system problem is uniquely solvable if and only if the determinant of the matrix A is nonzero. In this case the solution x can be expressed in a formula involving determinants of matrices related to A.

This seems like a very helpful result. It is one way to solve systems of order 2, 3, even 4 equations. However, the determinant computation is:

expensive, requiring the summation of N! numbers, each the product of N matrix entries;
complex, involving addition and subtraction of similar quantities, that is hard to do accurately;
sensitive, small errors in the data or arithmetic can make enormous changes in the result.

Exercise: copy the M file determinant.m from the web page. This function returns the determinant of a matrix, using the definition. The MATLAB function flops reports the total number of floating point operations carried out so far. Use this to count the work involved in the determinant function by issuing commands like this:

work = flops
determinant(dif2(1)))
work = flops - work

work = flops
determinant(dif2(2))
work = flops - work
...(and so on)

To compute the work involved, take the difference of the FLOPS count before and after each task.

        N    DETERMINANT(DIF2(N))  WORK 
                                    (operations)

        1    _____________          __________
        2    _____________          __________
        3    _____________          __________
        4    _____________          __________
        5    _____________          __________
        6    _____________          __________

As N increases, how is the value of WORK increasing? If it takes 1 second to compute determinant(dif2(10)) this way, then how long do you expect that it would take to compute the determinant(dif2(20))? 100 seconds? 10,000 seconds? 30,000,000 seconds (= 1 year)?

Exercise: MATLAB has a command called det(A) which computes a numerical approximation to the determinant. Let's repeat the exercise using this approach.

        N    DETERMINANT(DIF2(N))  WORK 
                                    (operations)

        1    __________             __________
        2    __________             __________
        4    __________             __________
        8    __________             __________
       16    __________             __________
       32    __________             __________
       64    __________             __________

Do the values of det and determinant agree? As we double N, what happens to the amount of work required? Suppose it takes a second to compute the determinant for N=10. How long do you expect that it would take to compute the determinant for N=20? 100 seconds? 10,000 seconds? 30,000,000 seconds?

The results from the determinant function should convince you that we can't work with the determinant as defined. You might think that we could use the MATLAB det function instead. But MATLAB uses Gauss factorization to get the determinant, and that's where we're actually going to end up.

We won't even begin to consider the nasty formulas that are required to solve a linear system using determinants. Instead, we will now move on to the next classic (and flawed) method.

The Inverse Matrix

Another classic result from linear algebra is the following:

The linear system problem is uniquely solvable if and only if the inverse matrix A^-1 exists. In this case the solution x can be expressed as x=A^-1*b.

This result is even nicer than the one involving determinants. Consider that once we have computed the inverse matrix, we can solve as many linear systems involving the matrix A as we want, simply by multiplying. If we had used determinants, we'd have to start our computation all over again.

But what's the catch? Well, in particular, the way you are usually taught to find an inverse is by computing determinants. And that's not an option. However, let's suppose that we actually had a formula for the inverse matrix. (This is a completely unrealistic idea!). In that case, we could go ahead and solve the linear system. For our three test matrices, we happen to to have such a formula.

Exercise - Copy the M files dif2_inv.m, frank_inv.m and hilbert_inv.m. These compute the inverse of each matrix, exactly. Test the use of the inverse to solve a linear system by setting up a 10 by 10 matrix, multiplying it by the vector [1,2,...,10] to get the right hand side. I've written the right hand sides of the statements you will need, to get you started:


          A =       __________
          x =       __________
          b =       __________
          Ainv =    __________
          x_sol =   __________
          max_err = __________

Repeat for a larger N and for the Frank and Hilbert matrices.

        Matrix       N        Maximum Absolute Error

        DIF2        10        __________
                    20        __________
        FRANK       10        __________
                    20        __________
        HILBERT     10        __________
                    20        __________

These results look pretty good, until we get to the Hilbert matrix, which we will come to see as the "bad boy" of linear algebra. You can pretty much expect every linear solution algorithm to fail on this matrix unless you are extremely careful. So we won't criticize the inverse matrix technique on the basis of the bad results for the Hilbert matrix.

But in the real world, we are very unlikely to have an exact formula for the inverse of a matrix. It is possible to compute the inverse, or more properly, to approximate it. However, this exposes us to an additional source of errors. (We already know that the Hilbert matrix was hopeless when we had the exact inverse, so we'll worry now about what happens with the other two.) Presumably, the errors we make in computing the inverse will then be reflected as additional errors in the solution of our linear system. Let's get a feeling for how worse this might make things.

Exercise - the MATLAB command inv(A) computes an approximation for the inverse of a matrix A. Let's repeat the previous experiment, but now using the approximate inverse instead of the exact one.

        Matrix       N        Maximum Absolute Error

        DIF2        10        __________
                    20        __________
        FRANK       10        __________
                    20        __________
        HILBERT     10        __________
                    20        __________

Particularly for the larger problems, you should start to see significant errors cropping up. Watch out for cases where MATLAB prints a vector using scientific notation. A large multiplier like 1.0e+03 may appear in front of a vector of numbers that look small by themselves!

Gauss Elimination

It is now time to talk about the standard algorithm for solving linear systems, usually called "Gauss elimination". I really don't want to belabor this idea. The details are excruciatingly boring, although it is actually a very tricky algorithm to code correctly. Yet we all learned it in high school when our minds were surely on other things!

Just as an extremely simple example, think about solving the problem

        [ 1   9 ]  [ x1 ]   [ 19 ]
        [ 2   4 ]  [ x2 ] = [ 10 ]

First, we will prefer to think of the process as having two separable parts, the factorization and back substitution steps. The important things to recall include:

in the K-th step of factorization, one of the 'free' equations is chosen to become the pivot equation, to be associated with x_K.
the pivot equation is used to eliminate references to x_K in the remaining free equations.
after N-1 steps of factorization, back substitution begins. The last equation involves a single variable, and can be solved for x_N. But then equation N-1 can be solved for x_N-1, because x_N is now known, and similarly, the other equations are solved in backwards order.

The simplest version of Gauss factorization is called Gauss factorization with no pivoting, because it has a very simple way of choosing the pivot equation. The first equation is always associated with the first variable, and so on. If on step K, the coefficient of x_K is zero in equation K, this method will fail, even though a solution may be exist.

The method of Gauss factorization with partial pivoting chooses the pivot equation as the free equation with largest coefficient for x_K. To keep the calculation orderly, this pivot row is actually moved into the K-th row of the matrix. In this case, the matrix gradually is transformed into upper triangular shape. For exact arithmetic, if there is a solution, this method is guaranteed to reach it. Moreover, for inexact arithmetic, this method has better accuracy properties.

On step K we actually can choose not only which equation we want to work with, but which variable we want to eliminate. The method of Gauss factorization with complete pivoting chooses the coefficient A_I,J of largest absolute magnitude in the free equations, associates equation I with variable J and then eliminates x_J from the other free equations. This method has more bookkeeping than the partial pivoting method, and doesn't produce much more improvement, so it is little used.

Exercise - to start with, let's treat Gauss factorization and back substitution as a complete "black box". In that case, to solve the linear system A*x=b, we simply type the MATLAB command

x = A \ b

Try setting up and solving the difference, Frank and Hilbert systems using this approach.

        Matrix       N        Maximum Absolute Error

        DIF2        10        __________
                    20        __________
        FRANK       10        __________
                    20        __________
        HILBERT     10        __________
                    20        __________

This is the third time we've solved the problem. We probably can't get a smaller error than with the exact inverse. But did we do significantly better or worse than when we used the approximate inverse?

(I'd expect you to do a little better this time, actually. MATLAB computes the approximate inverse by PLU factorization and solution; then we multiply the inverse times the right hand side. It's usually faster and more accurate to compute the solution directly!)

PLU Factors

The first step of Gauss elimination actually can be viewed as factoring the original matrix A into the form:

A = P * L * U

where

P is a permutation matrix;
L is a unit lower triangular matrix;
U is an upper triangular matrix;

Then backward substitution step of Gauss elimination can be thought of as using this factor information to "peel away" the multiplication a step at a time:


          P * L * U * x =                b
              L * U * x =           P^-1 * b
                  U * x =      L^-1 * P^-1 * b
                      x = U^-1 * L^-1 * P^-1 * b

However, instead of explicitly computing the inverses of the matrices, we use special facts about their form. It's not the inverse matrix itself that we want, but the inverse of the right hand side.

Hand calculation: For our extremely simple example, the PLU factorization is:

        [ 1   9 ]    [ 0  1 ]   [ 1   0 ]   [ 2  4 ]
        [ 2   4 ] =  [ 1  0 ] * [ 1/2 1 ] * [ 0  7 ]

Look over the right hand side and verify that the three factors have the right form. Multiply P*L*U and verify that the result is A. (Usually, of course, the factors are much uglier than this!)

Exercise: Copy the file ge_pp.m from the web page. It computes a factorization of a matrix by carrying out Gauss elimination with partial pivoting. To try the code out, carry out the following steps:

Set A to be the difference matrix of size 5;
Use ge_pp to factor A into matrices P, L and U;
Look at each of the three matrices and notice their properties;
Verify that A=P*L*U;

You might wonder why this routine stops at factorization. We will see shortly that we can use the PLU factorization for many different tasks, not just for a single linear solve.

PLU Solution

We have labored hard to determine the PLU factorization of a matrix A. It is time to make this effort pay off. We are going to design and construct an algorithm for solving the linear system problem A*x=b assuming we have already computed the PLU factorization of A.

Let's consider the factored linear system once more. Just like in the case of systems of ODE's, it helps to make up some names for variables and look at the problem in a different way:


          P * ( L *   U * x ) = b  
          P * z               = b

                L * ( U * x ) = P^-1 * b
                L *       y   = z

                      U * x   =      L^-1 * P^-1 * b
                      U * x   = y

                          x   = U^-1 * L^-1 * P^-1 * b

Notice that all I have done is to use parentheses to group factors, and name them. But you should now be able to see what an algorithm for solving this problem might look like.

PLU Solution Algorithm: To solve A*x=b given factors P, L, U and right hand side b,

Solve P * z = b;
Solve L * y = z;
Solve U * x = y.

Ouch! I promised a simple algorithm, but now instead of having to solve one system, we have to solve three, and we have three different solution vectors running around. But I promise you, things actually have gotten better, because these are really simple systems to solve:

The solution of P * z = b is z = P' * b;
It's easy to solve L * y = z, starting with row 1;
It's almost as easy to solve U * x = y, starting with row N.

Hand calculation: For our extremely simple example, the PLU factorization is

        [ 1   9 ]    [ 0  1 ]   [ 1   0 ]   [ 2  4 ]
        [ 2   4 ] =  [ 1  0 ] * [ 1/2 1 ] * [ 0  7 ]

Try going through the steps of solving the system, with right hand side [19, 10]', using this factorization:

         Step
           0    B = [          19,         10 ]
           1    Z = [  __________, __________ ]
           2    Y = [  __________, __________ ]
           3    X = [  __________, __________ ]

Writing down the solution steps may help you with your program assignment.

Matrix Multiplication

If you're comfortable handling vectors and matrices in programs, you can skip ahead to the assignment. Otherwise, let's pause for a moment and try to write a program to compute the product of a matrix and a vector, to get comfortable with a few ideas.

To start with, our function will be stored in the file matvec.m and have the form:

function b = matvec ( A, x )

computing the product of A*x

If I'm given a matrix A, I can ask MATLAB to tell me its dimensions with the size command:

[ m, n ] = size ( A )

Our result vector b should have m rows and and 1 column. To zero it out, we could use a for loop, but a better way, which ensures that it has the right shape as well is:

b = zeros ( m, 1 )

We think of the index i as associated with rows. Each element b(i) is defined as the sum over j of A(i,j) * x(j). So this code is straightforward, except I won't actually write the punchline out:

        
          for i = 1 : m
            for j = 1 : n
              b(i) = ?
            end
          end

Exercise - Try to put together this routine and compute the product of the Frank matrix with the vector of all 1's. It should be easy to check your work using MATLAB.

If you'd like some more practice, try to write the related routine matmat.m which allows you to compute C=A*B, for the matrices A and B of order LxM and MxN respectively. Again, it's easy to check your results by using MATLAB.

Assignment

Assignment - Write a routine plu_solve.m, to solve the linear system A*x=b by solving the permutation, lower and upper triangular systems. The routine should have the form:

function x = plu_solve ( P, L, U, b )

Design suggestions: - To start this code, set up three sections of code, with the comments

% Solve P*z=b
% Solve L*y=z
% Solve U*x=y

Then fill in the code for each section:

Solving the permutation matrix system is easy, because P is orthogonal, hence its inverse...is its transpose! So we can use the fact that the solution of a linear system can be computed by multiplying the right hand side by the inverse.

To solve L*y=z, write something like:

            y = zeros ( n, 1 )
            for j = 1 : n
              y(j) = ?               ...Solve equation j
              for i = j+1 : n
                z(i) = z(i) - ?      ...Update the right hand sides
              end
            end

You may use the fact that the diagonal entries of L are 1.

To solve U*x=y, write something like:

            x = zeros ( n, 1 )
            for j = n : -1 : 1
              x(j) = ?                ...Solve equation j
              for i = 1 : j-1
                y(i) = y(i) - ?       ...Update the right hand sides
              end
            end

The diagonal entries of U are not 1, by the way.

Test: When you think your code is working, carry out the following test:

A = frank ( 5 )
[P,L,U] = ge_pp ( A )
x = [1:5]'
b = A * x
x2 = plu_solve ( P, L, U, b )

Results: If your code has passed the test I just descrbed, it is probably working correctly. I want to take a look at your code (that's all). Mail your plu_solve.m code to me.

Last revised on 21 February 2000.