We know we can use the PLU factorization to solve a linear system, ... provided that the system is square, and that it is nonsingular, and that it is not too badly conditioned. And in fact, this is usually the case.
However, if we want to handle problems with a bad condition number, or that are singular, or even rectangular, we are going to need to come up with a different approach. In this lab, we will look at two versions of the QR factorization:
A = Q * R

where Q is an "orthogonal" matrix, and R is upper triangular.
We will see that this factorization can be used to solve the same problems that the PLU factorization handles, but can be extended to do several other tasks for which the PLU factorization is useless.
In situations where we are concerned about controlling roundoff error, the QR factorization can be more accurate than the PLU factorization, and is even a little easier to use when solving linear systems, since the inverse of Q is simply Q^{T}.
The QR factorization also gives us a way to solve rectangular linear systems, and is the key idea behind a method for reliably computing all the eigenvalues and eigenvectors of a matrix.
Definition: An orthogonal matrix is a real matrix whose inverse is equal to its transpose.
By convention, an orthogonal matrix is usually denoted by the symbol Q. The definition of an orthogonal matrix immediately implies that:
Q*Q^{T}=Q^{T}*Q=I
From the definition of an orthogonal matrix, and from the fact that the L2 vector norm of x can be defined by:
||x||_{2} = sqrt ( x^{T} * x )

and the fact that

(A*x)^{T} = x^{T} * A^{T}

you should be able to deduce an amazing fact about orthogonal matrices, namely:

||Q*x||_{2} = ||x||_{2}

If I multiply x by Q and its L2 norm doesn't change, then Q*x must lie on the sphere centered at the origin whose radius is ||x||_{2}. In other words, the only thing that has happened is that Q*x has been spun around the origin by some angle. That means that an orthogonal matrix represents a rotation or reflection.
Exercise: copy the file random_orthogonal.m from the web page. It computes a random orthogonal matrix of given dimension. Each time you call it, you should get a different matrix, so you can use this routine for experiments. Using your wits, and the experimental data, verify, contradict, or improve the following statements about a random orthogonal matrix Q:
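For instance, assuming the routine is called as Q = random_orthogonal ( n ) and returns an n by n orthogonal matrix, an experiment along these lines checks some of the properties just discussed:

```matlab
n = 5;
Q = random_orthogonal ( n );      % from the web page; assumed calling convention

norm ( Q' * Q - eye ( n ) )       % should be nearly zero
abs ( det ( Q ) )                 % should be nearly 1 (det(Q) is +1 or -1)

x = rand ( n, 1 );
norm ( Q * x ) - norm ( x )       % should be nearly zero: Q preserves the L2 norm
```

Because Q is computed in floating point arithmetic, expect these quantities to be on the order of roundoff, not exactly zero.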
Suppose we have N vectors x_{i}, each of length M. Here are some common and related problems:
Since we actually plan to do these calculations numerically we also need to be concerned about how we deal with numerical error. For instance, in exact arithmetic, it's enough to say a matrix is not singular. In numerical calculations, we would like to be able to say that a particular matrix, while not singular, is "nearly singular".
The Gram-Schmidt method can be thought of as a process which analyzes a set of vectors X, producing an "equivalent" (and possibly smaller) set of vectors Q which span the same space, have unit L2 norm, and are pairwise orthogonal.
Note that if we can produce such a set of vectors, then we can easily answer many of the questions in our set of problems. In particular, the size of the set Q tells us whether the set X was linearly independent, and the dimension of the space spanned and the rank of a matrix constructed from the vectors. And of course, the vectors in Q are the orthonormal basis we were seeking. So you should believe that being able to compute the vectors Q is a valuable ability. In fact, the QR factorization can help us with all the problems on our list, primarily because it's possible to look at every number in the factorization and understand what that number is telling us about the linear system.
In words, the Gram-Schmidt process goes like this:
Here is a sketch of the Gram-Schmidt process as an algorithm. Assume that n is the number of x vectors:

  nq = 0
  for j = 1 to n
    for i = 1 to nq
      r_{ij} = q_{i}' * x_{j}
      x_{j} = x_{j} - q_{i} * r_{ij}
    end
    r_{jj} = sqrt ( x_{j}' * x_{j} )
    if ( r_{jj} ~= 0 )
      nq = nq + 1
      q_{nq} = x_{j} / r_{jj}
    end
  end

You should be able to match this algorithm to the previous verbal description. How is the L2 norm of a vector being computed?
M-file - implement the Gram-Schmidt process in an M-file called gram_schmidt.m. Assume the vectors are stored as columns of a matrix called X. Your function should have the form:
function Q = gram_schmidt ( X )

You can reference the vector formed by the j-th column of X by using the notation X(:,j).
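The pseudocode above translates almost line for line into MATLAB. Here is one possible sketch (try writing your own version before looking too closely):

```matlab
function Q = gram_schmidt ( X )
%GRAM_SCHMIDT  Orthonormalize the columns of X, dropping dependent columns.
  [ m, n ] = size ( X );
  Q = [];
  nq = 0;
  for j = 1 : n
    xj = X(:,j);
    for i = 1 : nq
      r = Q(:,i)' * xj;          % component of x_j along q_i
      xj = xj - Q(:,i) * r;      % subtract that component off
    end
    rjj = sqrt ( xj' * xj );     % L2 norm of what is left
    if ( rjj ~= 0 )
      nq = nq + 1;
      Q(:,nq) = xj / rjj;        % accept a new unit vector
    end
  end
end
```

In a numerical code, the test ( rjj ~= 0 ) would usually be replaced by a comparison against a small tolerance, for the "nearly singular" reasons discussed above.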
Exercise - Test your Gram-Schmidt code using the following input:
X = (  2 -1  0 )
    ( -1  2 -1 )
    (  0 -1  2 )
    (  0  0 -1 )

If your code is working correctly, you should compute approximately:
Q = (  0.89  0.36  0.20 )
    ( -0.45  0.72  0.39 )
    (  0.00 -0.60  0.59 )
    (  0.00  0.00 -0.68 )

You should verify that the columns of Q have L2 norm 1, and are pairwise orthogonal. What is an easy way to check this?
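One compact way to carry out such a check in MATLAB (this essentially answers the question, so think about it yourself first):

```matlab
Q' * Q                          % should be close to the 3 by 3 identity matrix
norm ( Q' * Q - eye ( 3 ) )     % should be nearly zero
```

Entry (i,j) of Q'*Q is exactly the dot product of columns i and j of Q, so this one product tests both the unit norms (diagonal) and the pairwise orthogonality (off-diagonal).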
Explain the following statements about the matrix Q you just computed:
We need to take a closer look at the Gram-Schmidt process. Recall how the process of Gauss elimination could actually be regarded as a process of factorization. This insight enabled us to solve many other problems. In the same way, the Gram-Schmidt process is actually carrying out a different factorization that will give us the key to other problems.
Just to keep our heads on straight, let me point out that we're about to stop thinking of X as a bunch of vectors, and instead regard it as a matrix. Since our easily-confused brain likes matrices to be called A, that's what we'll call our set of vectors from now on.
Now, in the Gram-Schmidt algorithm, the numbers that we called r_{ij} and r_{jj}, which we computed, used, and discarded, actually record important information. They can be regarded as the nonzero elements of an upper triangular matrix R. The Gram-Schmidt process actually produces a factorization of the matrix A of the form:
A = Q * R

Here, the matrix Q has the same M by N "shape" as A, so it's only square if A is. The matrix R will be square (N by N), and upper triangular.
Now that we're trying to produce a factorization of a matrix, we need to modify the Gram-Schmidt algorithm of the previous section. Every time we consider a vector x_{j} at the beginning of a loop, we will now always produce a vector q_{j} at the end of the loop. Hence, if r_{jj}, the norm of vector x_{j}, is zero, we will simply set q_{j} to the zero vector.
M-file: make a copy of your previous M-file and call it gs_factor.m. Modify it to compute the Q and R factors of a rectangular matrix A. It should have the form
function [ Q, R ] = gs_factor ( A )
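A possible sketch of gs_factor, which records the r values instead of discarding them, and which sets q_j to zero when r_{jj} is zero, as described above:

```matlab
function [ Q, R ] = gs_factor ( A )
%GS_FACTOR  Gram-Schmidt QR factorization of a rectangular M by N matrix A.
  [ m, n ] = size ( A );
  Q = zeros ( m, n );
  R = zeros ( n, n );
  for j = 1 : n
    v = A(:,j);
    for i = 1 : j - 1
      R(i,j) = Q(:,i)' * v;      % record the multiplier in R
      v = v - Q(:,i) * R(i,j);
    end
    R(j,j) = sqrt ( v' * v );
    if ( R(j,j) ~= 0 )
      Q(:,j) = v / R(j,j);
    else
      Q(:,j) = zeros ( m, 1 );   % q_j is simply the zero vector in this case
    end
  end
end
```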
Exercise: when you think your code is correct, use it to compute the QR factorization of the Frank matrix of order 4. To verify your results, the following statements should be checked:
Another approach to the QR factorization of a matrix proceeds through a series of partial factorizations A=Q_{k}*R_{k}, where the first Q is the identity matrix, and the first R is the matrix A. When we begin the k-th step of factorization, our factor R_{k-1} is only upper triangular in columns 1 to k-1. Our goal on the k-th step is to find a better factor R_{k} which is upper triangular through column k. If we can do this process n-1 times, we're done.
Suppose, then, that we've partially factored the matrix A, up to column k-1. In other words, we have a factorization
A = Q_{k-1} * R_{k-1}

for which the matrix Q_{k-1} is orthogonal, but for which the matrix R_{k-1} is only upper triangular for the first k-1 columns.
To proceed from our partial factorization, we're going to consider taking some orthogonal matrix H_{k}, and "inserting" it into the factorization as follows:
A = Q_{k-1} * R_{k-1} = Q_{k-1} * H^{T} * H * R_{k-1}

Then, if we define

Q_{k} = Q_{k-1} * H^{T}
R_{k} = H * R_{k-1}

it will again be the case that

A = Q_{k} * R_{k}

and we're guaranteed that Q_{k} is still orthogonal. The interesting question is: if we pick the right H, can we "improve" the appearance of R_{k}, so that it is actually upper triangular all the way through column k?
In fact, we can do this by picking an appropriate Householder matrix. The formulas for a Householder matrix are a little complicated, and we won't go into them here. However, what you should understand is that, for the problem we've just described, we can determine a Householder matrix H with the property that R_{k}=H*R_{k-1} is "slightly more upper triangular" than R_{k-1}.
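For reference, one standard construction (not necessarily the one used in the householder.m routine below) builds H = I - 2*v*v'/(v'*v) from the part of column k on and below the diagonal:

```matlab
% Sketch: build a Householder matrix H so that H*R has zeros below the
% diagonal in column k.  This is one standard construction, shown only
% for reference; it may differ from the householder.m file on the web page.
m = size ( R, 1 );
x = R(k:m,k);                               % the part of column k to be zeroed
v = x;
v(1) = v(1) + sign ( x(1) ) * norm ( x );   % sign choice avoids cancellation
H = eye ( m );
if ( norm ( v ) ~= 0 )
  H(k:m,k:m) = eye ( m - k + 1 ) - 2 * ( v * v' ) / ( v' * v );
end
```

The matrix I - 2*v*v'/(v'*v) is symmetric and orthogonal (it is a reflection), so H^{T} = H here.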
M-file - copy the file householder.m from the web page. This function accepts a matrix R and the index of a column k, and returns a Householder matrix H such that H*R has only zero entries below the diagonal in column k.
Exercise - define the matrix A to be the Frank matrix of order 5. Compute H_{1}, the Householder matrix that knocks out the subdiagonal entries in column 1 of A, and then compute A1=H_{1}*A. Does the result have the proper form?
Now let's compute H_{2}, the Householder matrix that knocks out the subdiagonal entries in column 2 of A, and compute A2=H_{2}*A. This matrix should have subdiagonal zeros in column 2. You should be convinced that you can zero out any subdiagonal column you want.
Now let's try to zero out all the subdiagonal entries. Proceed as follows:
A = frank ( 5 )
Q = eye ( 5 )
R = A

H = householder ( R, 1 )
Q = Q * H'
R = H * R
Q * R

H = householder ( R, 2 )
Q = Q * H'
R = H * R
Q * R
...

As you watch the calculation, you should see that the R matrix is gradually being zeroed out below the diagonal, and that the product of Q and R still equals A. Once we have completed the 4-th step, we're done.
For a rectangular M by N matrix A, the Householder QR factorization has the form
A = Q * R

where the matrix Q is M by M (hence square and truly orthogonal) and the matrix R is M by N, and upper triangular (or upper trapezoidal, if you want to be more accurate).
If the matrix A is not square, then this definition is different from the Gram-Schmidt factorization we discussed before. The obvious difference is the shape of the factors. Here, it's the Q matrix that is square. The other difference, which you'll have to take on faith, is that the Householder factorization is generally more accurate, and easier to define compactly.
Householder QR Factorization Algorithm:

  Q = I (the M by M identity)
  R = A
  for k = 1 to min ( M-1, N )
    compute the Householder matrix H that zeroes out
      the subdiagonal entries in column k of R
    Q = Q * H'
    R = H * R
  end
M-file - Make an M-file h_factor.m. It should have the form
function [ Q, R ] = h_factor ( A )

Use the routine householder.m that you copied earlier, in order to compute the H matrix that you need at each step.
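The body of h_factor is essentially the step-by-step calculation from the previous exercise, wrapped in a loop. One possible sketch:

```matlab
function [ Q, R ] = h_factor ( A )
%H_FACTOR  Householder QR factorization of a rectangular M by N matrix A.
  [ m, n ] = size ( A );
  Q = eye ( m );
  R = A;
  for k = 1 : min ( m - 1, n )
    H = householder ( R, k );   % the routine copied from the web page
    Q = Q * H';                 % accumulate the orthogonal factor
    R = H * R;                  % zero out column k below the diagonal
  end
end
```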
Exercise - Test your code by computing the QR factorization of the Hilbert matrix of order 4. You should get something like this:
Q = 0.84 -0.52  0.15  0.03      R = 1.19  0.67  0.47  0.37
    0.42  0.44 -0.73  0.32          0     0.12  0.13  0.12
    0.28  0.53  0.14  0.79          0     0     0.01  0.01
    0.21  0.50  0.65  0.53          0     0     0     0.00
By the way, there is plenty of information in this factorization. To make the 4-th column of A, for instance, we use 37 percent of column 1 of Q, 12 percent of column 2, 1 percent of column 3, and practically nothing of column 4. The fact that the diagonal entry is so small compared to the other entries in the 4-th column allows us to assign a sort of measure to how linearly dependent the 4-th column of A is on the previous 3 columns (since the space spanned by the first three columns of A is the same as that spanned by the first three columns of Q).
Exercise - You may have written code that only works properly for a square matrix. In order to check that you have set things up to work for any rectangular matrix A, carry out the following tests:
A = rand ( 2, 4 )
[Q,R] = h_factor ( A )
A2 = Q * R;
norm ( A - A2, inf )

and repeat the test for random matrices of sizes (3,4), (4,4), (5,4) and (6,4). If you get a large error on any of these computations, you'll need to go back and rethink your code.
If we have computed the Householder QR factorization of a matrix without encountering any singularities, then it is easy to solve linear systems. We use the property of the Q factor that Q^{T}*Q=I:
A * x = b
Q * R * x = b
Q^{T} * Q * R * x = Q^{T} * b
R * x = Q^{T} * b

so all we have to do is form the new right hand side Q^{T} * b and then solve the upper triangular system. And that's easy because it's the same as the last step of our old PLU solution algorithm.
M-file - Make a copy of your file plu_solve.m, calling it h_solve.m. It should have the form
function x = h_solve ( Q, R, b )

Assume that the QR factors come from the h_factor routine, so that, if A is rectangular, then so is R, while Q will always be square. Set up your code to compute Q^{T} * b and then solve the upper triangular system R * x = Q^{T} * b. For now, don't worry too much about the possibility that the matrix is rectangular.
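For the square case, a sketch of what h_solve needs to do:

```matlab
function x = h_solve ( Q, R, b )
%H_SOLVE  Solve A*x=b given the Householder factors A = Q * R (square case).
  c = Q' * b;                  % new right hand side, since inv(Q) = Q'
  n = size ( R, 2 );
  x = zeros ( n, 1 );
  for i = n : -1 : 1           % back substitution, as in the PLU solver
    x(i) = ( c(i) - R(i,i+1:n) * x(i+1:n) ) / R(i,i);
  end
end
```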
Exercise: When you think your solver is working, test it out on a square system as follows:
n = 5
A = frank ( n )
x = [ 1 : n ]'
b = A * x
[ Q, R ] = h_factor ( A )
x2 = h_solve ( Q, R, b )
Instead of the Householder form of the factorization, we may use the Gram-Schmidt QR factorization. There is not much difference for the case where the system matrix is square. But if the system was rectangular, the matrix Q is rectangular, and what we do from there depends on whether the number of rows is greater or less than the number of columns.
For either method of factorization, a further complication occurs when the number of columns is greater than the number of rows. In this underdetermined situation, there are multiple solutions. Assuming we simply want one solution, any solution, the typical procedure is to set the extra degrees of freedom to zero.
We haven't had time here to discuss the issues of singularity. If the square matrix A is actually singular, what happens? We may get a very small diagonal R value, or possibly a zero value. We haven't planned on handling such a case. Similarly, in the rectangular matrix cases, it's often desirable to do what amounts to pivoting, in order to process the "best" columns of data first.
You will need to rewrite your h_solve routine so that it can handle a rectangular system. Actually, the routine only has to solve an upper triangular system. Determine the size [M,N] of the matrix R, and then do an upper triangular solve as though the matrix was of size min(M,N).
Roughly speaking, you should expect that you can get some solution x with zero residual, whenever M is no greater than N, and you will usually get an answer, with a nonzero residual, whenever M is strictly bigger than N.
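One way to modify the back substitution along the lines just described, so that it handles a rectangular R:

```matlab
function x = h_solve ( Q, R, b )
%H_SOLVE  Solve a (possibly rectangular) system using the factors A = Q * R.
  c = Q' * b;
  [ m, n ] = size ( R );
  p = min ( m, n );           % solve a triangular system of this size
  x = zeros ( n, 1 );         % any extra unknowns (when n > m) stay zero
  for i = p : -1 : 1
    x(i) = ( c(i) - R(i,i+1:n) * x(i+1:n) ) / R(i,i);
  end
end
```

When m > n, the trailing entries of c are simply ignored by the solve; they are what produce the nonzero residual in the overdetermined case.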
Assignment: Given the system matrix A and right hand side b, use your QR factor and solve routines to compute a solution x, and the residual r=A*x-b.
System #1:

A = 1 2 3    b =  6
    4 5 6        15

System #2:

A = hilbert(16)    b = A * ones(16,1)

System #3:

A =  1.00  1.0    b = 1.98
     2.05 -1.0        0.95
     3.06  1.0        3.98
    -1.02  2.0        0.92
     4.08 -1.0        2.90

System #4:

A = 1  3    b = 18
    5  2        25
    4 -1         7
Fill out the following table. For system #2 only, don't list all 16 elements of the solution, just the maximum solution error.
System    Solution           ||A*x-b||_{inf}

#1        _______________    _______________
#2        _______________    _______________
#3        _______________    _______________
#4        _______________    _______________

and then mail the results to me.
Comments #1 - properties of an orthogonal matrix?
Properties of an orthogonal matrix:
(Q_{1}*Q_{2})^{T} = Q_{2}^{T} * Q_{1}^{T}

with a similar statement for inverses. The important thing to notice is that the multiplication order gets reversed. Using these ideas, you can prove, for example, that the product of two orthogonal matrices is itself orthogonal.
Comments #2 - can a rectangular matrix be orthogonal?
A matrix Q is orthogonal if it is square and it is the case that Q^{T}*Q=Q*Q^{T}=I. From this second condition, we can conclude that an orthogonal matrix has columns of unit L2 norm which are pairwise orthogonal.
Conversely, if a matrix has columns of unit L2 norm which are pairwise orthogonal...it may loosely be called an orthogonal matrix, but it isn't, unless the matrix is square. If the matrix is M by N, with M>N (the usual case), then you will get Q^{T}*Q=I but not Q*Q^{T}=I!
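A tiny example makes the asymmetry concrete:

```matlab
Q = [ 1 0 ;
      0 1 ;
      0 0 ];    % columns have unit norm and are pairwise orthogonal, but Q is 3 by 2

Q' * Q          % the 2 by 2 identity
Q * Q'          % NOT the 3 by 3 identity (its last diagonal entry is 0)
```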
We will carelessly call such a rectangular matrix orthogonal. This is much more convenient than calling it a matrix "with columns of unit norm that are pairwise orthogonal". However, we have to remember that this is incorrect terminology, and that certain things that are true of an orthogonal matrix will not be true for rectangular "orthogonal" matrices.