# Linear Algebra Glossary

This file defines common terms from linear algebra.

## A-Orthogonal Vectors

Two vectors u and v are said to be A-orthogonal if

( u, A * v ) = 0.

Here A should be a positive definite symmetric matrix, which in turn guarantees that the expression ( u, A * v ) may be regarded as an inner product of the vectors u and v, with the usual properties.

This concept is useful in the analysis of the conjugate gradient method.

An adjacency matrix of an (undirected) graph is a matrix whose order is the number of nodes, and whose entries record which nodes are connected to each other by a link or edge of the graph.

If two nodes I and J are connected by an edge, then Ai,j=Aj,i=1. All other entries of the matrix are 0. Thus, an adjacency matrix is a zero-one matrix. The usual convention is that a node is not connected to itself, and hence the diagonal of the matrix is zero.

The product A2=A*A is a matrix which records the number of paths between nodes I and J. If it is possible to reach one node from another, it must be possible in a path of no more than n-1 links. Hence, the reachability matrix, which records whether it is possible to get from node I to node J in one or more steps, can be determined by taking the logical sum of the matrices I, A, A2, ..., An-1.

(Loops:) If edges are allowed which begin and end at the same node, (sometimes called loops) then the adjacency matrix can have some 1's on the diagonal.

(Directed graphs:) If edges are assigned a direction, then it is possible that there is an edge from node I to node J, but not in the reverse direrection. In this case of a "directed graph", an adjacency matrix can be defined as well. The only difference is that the adjacency matrix now need not be symmetric. The reachability matrix can still be defined in the same way.

(Multigraphs:) If multiple edges are allowed between two nodes, (which might be distinguished by color in a drawing), then the adjacency matrix can record this information by storing in Ai,j the number of edges between the two nodes. The reachability matrix can still be defined in the same way, and still counts the number of different routes correctly.

The adjoint matrix of a square matrix A has the property that:

A * adjoint ( A ) = adjoint ( A ) * A = det(A) * I.

Note that, if A is not invertible, then det(A)=0. We can take the convention that in this case, the adjoint of the matrix is the zero matrix.

The adjoint of a matrix A is "almost" its inverse. If we know that A is invertible, then the inverse of A, denoted inverse ( A ) or A-1, can be written explicitly as:

A-1 = ( 1 / det(A) ) * adjoint ( A ).

This formula is only of use if we can figure out how to evaluate the things on the right hand side. The first factor is not so bad, since we do know a formula for the determinant of a matrix.

But there is also a way of evaluating the second factor. The adjoint matrix is defined in terms of the cofactor matrix of A:

adjoint ( A ) = ( cofactor ( A ) )'.

## Alternating Sign Matrix

An alternating sign matrix is an integer matrix with the properties that

• the entries are only 0, +1, or -1;
• each row sum is 1;
• each column sum is 1;
• the first (and last) nonzero entry in each row and column is 1;
• the nonzero entries in any row or column alternate in sign.

Example:

0   1   0   0   0
1  -1   0   1   0
0   1   0  -1   1
0   0   0   1   0
0   0   1   0   0

Obviously, alternating sign matrices include the identity matrix and any permutation matrix. From the definitions, you should see that the first row must contain a single entry which is 1, with the other values being 0. The nonzeroes in the second row can be a single 1, or the values, 1, -1, 1, in that order, with some intervening zeroes possible. In the third row, the value -1 may occur up to as many times as there were 1's in preceding rows, which means the most interesting row could be 1, -1, 1, -1, 1, -1, 1. Thus the number of possible nonzeroes grows until the central row of the matrix is reached. Since the same restrictions apply from the bottom reading up, the number of possible nonzeroes must now decrease. Similar reasoning controls the nonzero population of the columns.

If we let An denote the number of distinct alternating sign matrices of order n, then it has only recently been proved that

An = Product ( 0 <= I <= N-1 ) (3*I+1)! / (N+I)!
giving the sequence 1, 2, 7, 42, 429, 7436, 218348, 10850216, ...

Reference:

1. David Robbins,
The Story of 1, 2, 7, 42, 429, 7436, ...,
Mathematical Intelligencer,
Volume 13, Number 2, pages 12-19.
2. David Bressoud,
Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture,
Cambridge University Press, 1999.

## Anticirculant Matrix

An anticirculant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the left, with the first value "wrapping around" to the end.

Here is an example of an anticirculant matrix:

1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3

and here is an example of a rectangular anticirculant matrix:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2

Simple facts about a anticirculant matrix A:

• A is constant along any antidiagonal;
• A is a special kind of Hankel matrix;
• If A is square, then A is normal, hence unitarily diagonalizable.
• If A is square, then the vector (1,1,...,1) is an eigenvector of the matrix, with eigenvalue equal to the constant row sum.

## Antisymmetric Matrix

A square matrix A is antisymmetric if it is equal to the negative of its transpose:

A = - A'.

Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:

A = B + C = (1/2) * ( ( A - A' ) + ( A + A' ) )

Simple facts about an antisymmetric matrix A:

• A has a zero diagonal;
• A has pure imaginary eigenvalues;
• A is normal, hence unitarily diagonalizable.
• I - A is not singular;
• The matrix ( I + A ) * Inverse ( I - A ) is orthogonal;
• If the order of A is odd, then the determinant is 0.

An antisymmetric matrix is also called skew symmetric.

In complex arithmetic, the corresponding object is a skew Hermitian matrix.

## Arnoldi Iteration

The Arnoldi iteration is an iterative method that seeks to approximate some of the eigenvalues of a very large matrix.

If the matrix is symmetric or Hermitian, then the Arnoldi iteration is usually reformulated into the more efficient Lanczos iteration.

We start by considering the unitary similarity transform that converts the m by m matrix A into upper Hessenberg form:

A = Q H Q*

Now we let Qn indicate the m by n matrix formed from the first n columns of Q; similarly, Hn is the n+1 by n+1 submatrix of H. Then we write

A Qn = Qn+1 Hn

In particular, we can look at the subsystem involving the n-th column of Qn:

A*qn = h1,n q1 + ...
+ hn,n qn
+ hn+1,n qn+1

This can be interpreted as a recursive relationship for qn+1 in terms of the previous vectors. The Arnoldi iteration allows us to build up the matrices Qn and Hn. For reasons not completely understood, even for n much smaller than m, the eigenvalues of Hn will tend to some of the eigenvalues of A, and typically the extreme ones.

The new vector qn+1 is a Krylov vector; actually, because of orthogonalization, it represents the new component introduced into the Krylov subspace by the product A*qn.

Thus, a typical procedure is to compute Hn and apply a standard eigenvalue procedure for Hessenberg matrices to extract its eigenvalues.

An efficient form for the Arnoldi iteration is:

Let b be an arbitrary initial vector;
Set q(1) = b / ||b||;
for n = 1, ...
v = A * q(n)
for j = 1 to n
h(j,n) = conjugate ( q(j) ) * v
v = v - q(j) * h(j,n)
h(n+1,n) = ||v||
q(n+1) = v / h(n+1,n)
end for
end for

## Band Matrix

A band matrix is a matrix whose entries are all zero except for the diagonal and a few of the immediately adjacent diagonals.

If the band matrix is large enough, then many significant efficiencies can be achieved in storage and matrix operations. Because so many elements are zero, the band matrix can be stored more compactly using band matrix storage. And because so many elements are zero, many algorithms can be speeded up to execute more quickly, including matrix multiplication and Gauss elimination.

Special cases of band matrices include those which are also symmetric or positive definite.

Band matrices with very tight bandwidths are of interest, and include matrices which are diagonal, bidiagonal, tridiagonal, and in some cases pentadiagonal.

Here is an example of a band matrix:

11 12  0  0  0  0
21 22 23  0  0  0
31 32 33 34  0  0
0 42 43 44 45  0
0  0 53 54 55 56
0  0  0 64 65 66

This matrix has an upper bandwidth of 1, and lower bandwidth of 2, and an overall bandwidth of 4.

In some cases, a band matrix can be LU-factored without pivoting. This is always true if the matrix is positive definite. If a banded matrix A with lower and upper bandwidths ml and mu can be LU factored without pivoting, then its factorization can be written as

A = L * U

The usual P factor is missing because there is no pivoting, but more is true. The factor L, which is unit lower triangular, is in fact also a band matrix, with lower bandwith ml, and U, the upper triangular factor, is also band matrix with upper bandwith mu.

If the matrix A is positive definite symmetric banded, with a half-bandwidth of m, then the banded LU factorization becomes:

A = L * U

with L and U having m as their lower and upper bandwidths respectively. Moreover, if we "factor out" the non-unit diagonal of U by writing
U = D * V

where V is unit upper triangular, we may write
A = L * U = L * D * V = L * D2 * D2 * V = R' * R

where D2 is the obvious "square root" of D (which must have positive entries because A is positive definite, remember!), and R is an upper triangular banded matrix with upper bandwidth m. Another way of looking at this is to say that the Cholesky factor of a banded positive definite symmetric matrix is itself banded with the same (half) bandwidth as the original matrix.

LAPACK and LINPACK include special routines for a variety of band matrices. These routines can compute the LU factorization, the determinant, inverse, or solution of a linear system.

LAPACK and EISPACK have routines for computing the eigenvalues of a symmetric banded matrix.

## Band Matrix Storage

Band matrix storage is a matrix storage format suitable for efficiently storing the nonzero entries of a band matrix.

Most band storage schemes are column oriented. The nonzero entries of the matrix "slide" downwards, while remaining in their original column. If the original matrix looked like this:

11 12  0  0  0  0
21 22 23  0  0  0
31 32 33 34  0  0
0 42 43 44 45  0
0  0 53 54 55 56
0  0  0 64 65 66

the matrix would be saved in column band matrix storage as:

0 12 23 34  0  0
11 22 33 44 45  0
21 32 43 54 55 56
31 42 53 64 65 66

Note that the zeroes in the above array are there just as padding. They don't correspond to any entries of the original array, and are simply necessary to make the array rectangular.

If the matrix is to be handled by a Gauss elimination routine that uses pivoting, then there is a possibility of fill in; that is, nonzero entries may need to be stored in places where zeroes had been. Band matrix storage can still be used, but we need to include in the compressed matrix some extra entries representing the diagonals along which the fill in entries may occur. It turns out that the number of extra diagonals required is simply the number of nonzero subdiagonals in the original matrix. For our example, this would mean the matrix would be stored as:

0  0  0  0  0  0
0  0  0  0  0  0
0 12 23 34  0  0
11 22 33 44 45  0
21 32 43 54 55 56
31 42 53 64 65 66

## Bandwidth

The bandwidth of a band matrix is, roughly speaking, the number of diagonals that contain nonzero entries.

More precisely, define ML, the lower bandwidth of a matrix A to be the maximum value of ( I - J ), and MU to be the maximum value of ( I - J ), for all nonzero matrix entries Ai,j. Then the bandwidth M is defined by:

M = ML + 1 + MU.

This definition always treats the (main) diagonal as nonzero, and is not misled by a matrix which has only two nonzero diagonals, which are actually widely separated. All the territory between the diagonals must be included when measuring bandwidth.

## Basis

A basis for a linear space X of dimension N is a set of N vectors, {v(i) | 1 <= i <= N } from which all the elements of X can be constructed by linear combinations.

Naturally, we require that each of the vectors v(i) be an element of the space X. Moreover, it is not enough that these vectors span the space; we also require that they be linearly independent, that is, there should be no redundant vectors in the set.

The columns of the identity matrix form a basis for the linear space of vectors of dimension N. A square matrix of order N is not defective exactly when its eigenvectors form a basis for the linear space of vectors of dimension N.

Given a particular basis, the representation of a vector x is the unique set of coefficients c(i) so that

x = Sum ( 1 <= I <= N ) c(i) * v(i)
The coefficients must be unique, otherwise you can prove that the basis is not linearly independent!

If the basis vectors are pairwise orthogonal, then the basis is called an orthogonal basis. If the basis vectors have unit length in the Euclidean norm, the basis is a normal basis. If both properties apply, it is an orthonormal basis. The columns of an orthogonal matrix are an orthonormal basis for the linear space of vectors of dimension N.

## Bidiagonal Matrix

A bidiagonal matrix has only two nonzero diagonals. The matrix is called upper bidiagonal if these are the main diagonal and the principal superdiagonal. The matrix is called lower bidiagonal if these are the main diagonal and the principal subdiagonal.

A simple example of an upper bidiagonal matrix is:

1 2 0 0 0
0 3 4 0 0
0 0 5 6 0
0 0 0 7 8
0 0 0 0 9

The Jordan Canonical Form is an example of an upper bidiagonal matrix.

A bidiagonal matrix is automatically a:

with all the rights and privileges appertaining thereunto.

If a tridiagonal matrix can be LU-factored without pivoting, then the L factor is lower bidiagonal, and the U factor is upper bidiagonal.

## Basic Linear Algebra Subprograms (BLAS)

The BLAS or Basic Linear Algebra Subprograms, are a set of routines offering vector and matrix utilities. They are extensively used as part of LINPACK and LAPACK, to simplify algorithms, and to make them run more quickly.

The Level 1 BLAS provide basic vector operations such as the dot product, vector norm, and scaling. Level 2 BLAS provide operations involving a matrix and a vector, and Level 3 BLAS provide matrix-matrix operations.

There are also sets of sparse BLAS and parallel BLAS available.

Here are the Level 1 BLAS routines for single precision real vectors:

• ISAMAX returns the index of maximum absolute value in vector SX;
• SASUM computes the sum of absolute values of vector SX;
• SAXPY adds a scalar multiple of one vector to another;
• SCOPY copies vector SX into SY;
• SDOT computes the dot product of two vectors;
• SMACH estimates the largest and smallest machine values, and roundoff value;
• SNRM2 computes the Euclidean norm of a vector SX;
• SROT applies a Givens rotation;
• SROTG generates a Givens Rotation;
• SSCAL scales a vector by a constant;
• SSWAP interchanges two vectors SX and SY.

## Block Matrix

A block matrix is a matrix which is described as being built up of smaller matrices.

For example, a tridiagonal block matrix might look like this:

2 4 | 3 9 | 0 0
4 6 | 0 3 | 0 0
---------------
1 0 | 2 4 | 3 9
5 5 | 4 6 | 0 3
---------------
0 0 | 1 0 | 2 4
0 0 | 5 5 | 4 6

but for certain purposes, it might help us to see this matrix as "really" being a tridiagonal matrix, whose elements are themselves little matrices:
a | b | 0
---------
c | a | b
---------
0 | c | a

An algorithm suitable for a tridiagonal matrix can often be extended, in a natural manner, to handle a block tridiagonal matrix. Similar extensions can be made in some cases for other types of block matrices. A block banded matrix can be factored by a variant of banded Gauss elimination, for instance.

## Border Banded Matrix

A border banded matrix is a 2 by 2 block matrix comprising a (large) leading block which is a square banded matrix, two dense rectangular side strips, and a (small) trailing block which is a square dense matrix.

For example, a "toy" border banded matrix might look like this:

2 -1  0  0  0  0  0 | 1 2
-1  2 -1  0  0  0  0 | 2 5
0 -1  2 -1  0  0  0 | 7 8
0  0 -1  2 -1  0  0 | 3 3
0  0  0 -1  2 -1  0 | 4 2
0  0  0  0 -1  2 -1 | 3 1
0  0  0  0  0 -1  2 | 7 8
-------------------------
3  7  8  3  2  3  1 | 5 2
1  2  4  7  9  2  4 | 3 6

which we can regard as being made up of the blocks:
A11 | A12
---------
A21 | A22
where, as we specified, A11 is a square banded matrix, A22 is a square dense matrix, and A21 and A12 are rectangular strips.

It is desirable to take advantage of the banded structure of A11. We can specify an algorithm for solving a linear system A * x = b that can be written in terms of operations involving the sub-matrices A11, A12, A21 and A22, which will achieve this goal, at the expense of a little extra work. One problem with this technique is that it will fail if certain combinations of the matrices A11 and A22 are singular, which can happen even when A is not singular.

The algorithm for solving A * x = b rewrites the system as:

A11 * X1 + A12 * X2 = B1
A21 * X1 + A22 * X2 = B2
The first equation can be solved for X1 in terms of X2:
X1 = - A11-1 * A12 * X2 + A11-1 * B1
allowing us to rewrite the second equation for X2:
( A22 - A21 * A11-1 * A12 ) X2 = B2 - A21 * A11-1 * B1
which can be solved as:
X2 = ( A22 - A21 * A11-1 * A12 )-1 * ( B2 - A21 * A11-1 * B1 )

The actual algorithm doesn't compute the inverse, of course, but rather factors the matrices A11 and A22 - A21 * A11-1 * A12.

## Cartesian Basis Vectors

The Cartesian basis vectors are simply the N columns of the identity matrix, regarded as individual column vectors.

These vectors form the standard basis for the set of vectors RN. The vector corresponding to the I-th column of the identity matrix is often symbolized by ei.

Thus, if we are working in a space with dimension N of 4, the basis vectors e1 through e4 would be:

1     0     0    0
0     1     0    0
0     0     1    0
0     0     0    1

Facts about the Cartesian basis vectors:

• The vectors have unit Euclidean norm, are pairwise orthogonal, and form a basis for the space of N dimensional vectors;
• A * ej yields a vector which is the J-th column of the matrix A;
• the outer product of ei and ej a matrix A which is all zero except for the single entry Ai,j=1.

## The Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality is a relationship between a vector inner product and a vector norm derived from that inner product. In particular, if the norm ||*|| is defined by an inner product (*,*) as follows:

|| x || = sqrt ( x, x ),
then the Cauchy-Schwarz inequality guarantees that for any vectors x and y it is the case that:
| ( x, y ) | <= || x || * || y ||.

## The Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem guarantees that every (square) matrix satisfies its own characteristic equation.

For example, if A is the matrix:

2 3
1 4

then the characteristic equation is
lambda2 - 6 * lambda + 5 = 0.
which is not true for all values lambda, but just a few special values known as eigenvalues. The Cayley-Hamilton theorem guarantees that the matrix version of the characteristic equation, with A taking the place of lambda, is guaranteed to be true:
A2 - 6 * A + 5 * I = 0.

## CentroSymmetric Matrix

A centrosymmetric matrix is one which is symmetric about its center; that is,

Ai,j = Am+1-i,n+1-j

Example:

1 10  8 11  5
13  2  9  4 12
6  7  3  7  6
12  4  9  2 13
5 11  8 10  1

A centrosymmetric matrix A satisfies the following equation involving the Exchange matrix J:

J*A*J = A
.

## Characteristic Equation

The characteristic equation of a (square) matrix A is the polynomial equation:

det ( A - lambda * I ) = 0
where lambda is an unknown scalar value.

The left hand side of the equation is known as the characteristic polynomial of the matrix. If A is of order N, then there are N roots of the characteristic equation, possibly repeated, and possibly complex.

For example, if A is the matrix:

2 3
1 4

then the characteristic equation is
(    2 - lambda     3           )
det (    1              4 - lambda  )  = 0

or
lambda2 - 6 * lambda + 5 = 0.
This equation has roots lambda = 1 or 5.

Values of the scalar lambda which satisfy the characteristic equation are known as eigenvalues of the matrix.

Some facts about the characteristic equation of A:

• A and A' have the same characteristic equation.

The Cayley-Hamilton Theorem guarantees that the matrix itself also satisfies the matrix version of its characteristic equation.

## Cholesky Factorization

The Cholesky factorization of a positive semidefinite symmetric matrix A has the form:

A = L * L'
where L is a lower triangular matrix, or equivalently:
A = R' * R
where R is an upper triangular matrix.

If a matrix is symmetric, then it is possible to determine whether or not the matrix is positive definite simply by trying to compute its Cholesky factorization: if the matrix has a zero eigenvalue, then it is positive semidefinite, and the algorithm should theoretically spot this by computing a zero diagonal element; if the matrix actually has a negative eigenvalue, then at a particular point in the algorithm, the square root of a negative number will be computed.

Software to compute the Cholesky factorization often saves space by using symmetric matrix storage, and overwriting the original matrix A by its Cholesky factor L.

As long as the matrix A is positive definite, the Cholesky factorization can be computed from an LDL factorization, or, if no pivoting was done, from an LU factorization.

The Cholesky factorization can be used to compute the square root of the matrix.

The LINPACK routines SCHDC, SCHUD, SCHDD, SCHEX, SPOCO, SPOFA, SPODI, and SPOSL compute and use the Cholesky factorization.

## Circulant Matrix

A circulant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the right, with the end value "wrapping around".

Here is a square circulant matrix:

1 2 3 4
4 1 2 3
3 4 1 2
2 3 4 1

a "wide" rectangular circulant matrix:
1 2 3 4 5
5 1 2 3 4
4 5 1 2 3

a "tall" rectangular circulant matrix:
1 2 3
5 1 2
4 5 1
3 4 5
2 3 4

Simple facts about a (rectangular) circulant matrix A:

Simple facts about a square circulant matrix A:

• The Identity matrix is a circulant matrix;
• A is normal, hence unitarily diagonalizable.
• The product of two circulant matrices is a circulant matrix.
• The inverse of a (nonsingular) circulant matrix is a circulant matrix.
• Any two circulant matrices commute;
• The transpose of a circulant matrix is a circulant matrix.
• Every circulant matrix is diagonalized by the Fourier matrix.
• The columns of the Fourier matrix are the eigenvectors of (every) the circulant matrix.
• The vector (1,1,...,1) is an eigenvector of the matrix, with eigenvalue equal to the row sum.
• If W is an N-th root of unity, then
Y = A(1,1) + A(1,2)*W + A(1,3)*W2 + ... + A(1,N)*WN-1
is an eigenvalue of A, with (right) eigenvector:
( 1, W, W2, ..., WN-1 )
and left eigenvector:
( WN-1, WN-2, ..., W2, W, 1 ).
Although there are exactly N distinct N-th roots of unity, the circulant may have repeated eigenvalues, because of the behavior of the polynomial. However, the matrix is guaranteed to have N linearly independent eigenvectors.

Compare the concept of an anticirculant matrix.

## Cofactor Matrix

The cofactor matrix of a square matrix A is generally used to define the adjoint matrix, or to represent the determinant.

For a given matrix A, the cofactor matrix is the transpose of the adjoint matrix:

cofactor ( A ) = ( adjoint ( A ) )'

The determinant det(A) can be represented as the product of each of the entries of any given row or column times their corresponding cofactor entries. In particular, consider the first row:

det(A) = A(1,1) * cofactor(A)(1,1) + A(1,2) * cofactor(A)(1,2) + ... + A(1,N) * cofactor(A)(1,N)

The formula for the (I,J) entry of the cofactor matrix of A is:

cofactor(A)(I,J) = (-1)(I+J) * det ( M(A,I,J) )
where M(A,I,J) is the minor matrix of A, constructed by deleting row I and column J.

## Column Echelon Form

Column echelon form is a special matrix structure which is usually arrived at by Gauss elimination.

Any matrix can be transformed into this form, using a series of elementary column operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.

A matrix (whether square or rectangular) is in column echelon form if:

• Each nonzero column of the matrix has a 1 as its first nonzero entry.
• The leading 1 in a given column occurs in a row below the leading 1 in the previous column.
• Columns that are completely zero occur last.

A matrix is in column reduced echelon form if it is in column echelon form, and it is also true that:

• Each row containing a leading 1 has no other nonzero entries.

Column echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.

## Commuting Matrices

Two square matrices, A and B, are said to commute if

A * B = B * A

• If a (real) matrix commutes with its transpose, it is a normal matrix.
• Every (invertible) matrix commutes with its inverse.
• Every circulant matrix commutes with all other circulant matrices of the same order.

## Companion Matrix

The companion matrix for a monic polynomial P(X) of degree N is a matrix of order N whose characteristic polynomial is P(X).

If the polynomial P(X) is represented as:

P(X) = XN + C(1) * X(N-1) + ... + C(N-1) * X + C(N).
then the companion matrix for this polynomial has the form:
0 0 0 ... 0 0 -C(N)
1 0 0 ... 0 0 -C(N-1)
0 1 0 ... 0 0 -C(N-2)
...................
0 0 0 ... 1 0 -C(2)
0 0 0 ... 0 1 -C(1)

Note that the determinant of the companion matrix is either -C(N) or +C(N), so it is never zero unless the polynomial itself is degenerate (has a zero leading coefficient).

The characteristic polynomial of a companion matrix is, in fact, the polynomial that was used to define the matrix originally. Thus it is always possible to construct a matrix with any desired set of eigenvalues, by constructing the corresponding characteristic polynomial, and then the companion matrix.

The companion matrix can also be used to perform a decomposition of a matrix A. If x is a vector, and K the Krylov matrix

K = Krylov ( A, x, n )
whose columns are the successive products x, A*x, A2*x, and so on, and if K is nonsingular, then
A = K * C * K-1
where the matrix C is the companion matrix of A.

(There are several equivalent forms of the companion matrix, with the coefficients running along the top, the bottom, or the first column of the matrix.)

## Compatible Norms

A matrix norm and a vector norm are compatible if it is true, for all vectors x and matrices A that

||A*x|| <= ||A|| * ||x||
In some texts, the word consistent is used in this sense, instead of compatible.

In particular, if you have not verified that a pair of norms are compatible, then the above inequality is not guaranteed to hold. For any vector norm, it is possible to define at least one compatible matrix norm, namely, the matrix norm defined by:

||A|| = supremum ||A*x|| / ||x||
where the supremum (roughly, the "maximum") is taken over all nonzero vectors x. If a matrix norm can be derived from a vector norm in this way, it is termed a vector-bound matrix norm. Such a relationship is stronger than is required by compatibility.

If a matrix norm is compatible with some vector norm, then it is also true that

||A*B|| <= ||A|| * ||B||
where both A and B are matrices.

## Complex Number Representation

Complex numbers have the form a+bi, where i is a special quantity with the property that i2=-1.

It is possible to devise real matrices that behave like complex numbers. Let the value "1" be represented by the identity matrix of order 2, and the value "i" be represented by

0  1
-1  0

Then it is easy to show that these matrices obey the rules of complex numbers. In particular, "i" * "i" = - "1". In general, the complex number a+bi is represented by
a  b
-b  a

and multiplication and inversion have the correct properties.

## Condition Number

The condition number of the coefficient matrix A of a linear system is a (nonnegative) number used to estimate the amount by which small errors in the right hand side b, or in A itself, can change the solution x.

This analysis ignores arithmetic roundoff, which is hard to analyze, and focusses on easily measurable quantities known beforehand, and how they will amplify or diminish the roundoff errors.

Small values of the condition number suggest that the algorithm will not be sensitive to errors, but large values indicate that small data or arithmetic errors may explode into enormous errors in the answer.

The condition number is defined in terms of a particular matrix norm. Many different matrix norms may be chosen, and the actual value of the condition number will vary depending on the norm chosen. However, the general rule that large condition numbers indicate sensitivity will hold true no matter what norm is chosen.

The condition number for a matrix A is usually defined as

condition ( A ) = || A || * || A-1 ||.
If A is not invertible, the condition number is infinite.

Simple facts about the condition number:

• The condition number is always at least 1;
• The condition number of the identity matrix is 1;
• For the L2 matrix norm, the condition number of any orthogonal or unitary matrix is 1;

LINPACK routines such as SGECO return RCOND, an estimate of the reciprocal of the condition number in the L1 matrix norm.

Turing's M condition number, M(A), for a matrix of order N, is defined as

M(A) = N * max | Ai,j | * max | A-1i,j |.

Turing's N condition number, N(A) is

N(A) = Frob ( A ) * Frob ( A-1 ) / N
where Frob(A) is the Frobenius matrix norm.

The Von Neumann and Goldstine P condition number is

P(A) = | lambda_Max / lambda_Min |
where lambda_Max and lambda_Min are the eigenvalues of largest and smallest magnitude, which is equivalent to using the spectral radius of A and A-1.

There is also a condition number defined for the eigenvalue problem, which attempts to estimate the amount of error to be expected when finding the eigenvalues of a matrix A.

## Congruent Matrix

Congruent matrices A and B are related by a nonsingular matrix P such that

A = P' * B * P.

Congruent matrices have the same inertia.

Congruence is of little interest by itself, but the case where P is also an orthogonal matrix is much more important.

The conjugate gradient method is designed to solve linear systems

A * x = b
when the matrix A is symmetric, and positive definite.

The classical method is not an iterative method, but rather a direct method which produces an approximation to the solution after N steps.

Because of numerical inaccuracies and instabilities, many implementations of the method repeat the computation several times, until the residual error is deemed small enough.

On the other hand, if N is very large, and a suitable preconditioner is used, it is possible to treat the conjugate gradient method as "essentially" an iterative method, in the sense that it may be the case that a good approximate solution can be found by stopping the algorithm early. The matrix is assumed to be symmetric positive definite; if it is also large and sparse, a suitable preconditioner may be the Incomplete Cholesky Factorization. In this case, the algorithm is sometimes referred to as ICCG(0): that is, the Incomplete Cholesky - Conjugate Gradient algorithm.

The method is ideally suited for use with large sparse systems, because the matrix A is only accessed to compute a single matrix-vector product on each step. This involves no fill in or overwriting of the data structure that describes A. On the other hand, if the matrix A is dense, the conjugate gradient method costs roughly 3 times the number of operations for direct Gauss elimination.

The conjugate gradient method can be considered as a minimization of the functional f(x), defined by

f(x) = x' * ( 0.5 * A * x - b )
which achieves its minimum value when x solves the linear system.

Here are the formulas for the basic conjugate gradient method. Brackets indicate the value of an iterative quantity. X[0] is the initial value of the vector X, X[1] the value after one iteration, and so on.

X[0] = 0
For K = 1 to N

Compute the residual error:
R[K-1] = B - A * X[K-1]

Compute the direction vector:
If K = 1 then
P[K] = R[0]
else
BETA = - P'[K-1] * A * R[K-1]
/ ( P'[K-1] * A * P[K-1] )
P[K] = R[K-1] + BETA * P[K-1]
end if

Compute the location of the next iterate:
ALPHA = ( R'[K-1] ) * R[K-1] / ( P'[K] * A * P[K] )
X[K] = X[K-1] + ALPHA * P[K]
end for

Conjugate gradient algorithms are available in IMSL, ITPACK and NSPCG.

## Conjugate Matrix

The conjugate matrix of a complex matrix A, denoted by A* or conjugate ( A ), is the matrix obtained by replacing each entry of A by its complex conjugate.

(In this document, the form conjugate ( A ) is preferred, because the A* is easily confused with multiplication.

The complex conjugate transpose, sometimes called the Hermitian or tranjugate of A, is derived from A by complex conjugation, followed by transposition, and is denoted by AH.

## Conjugate of a Complex Number

The conjugate of a complex number z = a + b * i is the complex number

conjugate ( z ) = a - b * i.

The conjugate is frequently represented by placing a bar over the quantity, or occasionally a star after it, as in z*.

The complex conjugate can be used in a formula for the norm or magnitude of a complex number, which must always be a real nonnegative value:

norm ( z ) = sqrt ( z * conjugate ( z ) ) = sqrt ( a2 + b2 ).

For complex vectors, an inner product with the correct properties may be defined as:

V dot W = ( V, W ) = sum ( 1 <= I <= N ) conjugate ( V(I) ) * W(I).
This inner product is computed in the BLAS function CDOTC, for example, and yields another relationship with the Euclidean vector norm:
|| V || = sqrt ( V dot V )

## Conjunctive Matrix

Two (complex) matrices A and B are said to be conjunctive if there is some nonsingular matrix P so that

A = ( conjugate ( P ) )' * B * P,

This is the extension to complex matrices of the concept of a congruent real matrix.

## Consistent System

A consistent linear system is an M by N linear system A * x = b, with A and b given, for which there is at least one solution vector x.

A simple example of an inconsistent system is:

1 0 1        1
0 2 3 * x =  2
1 2 4        4

The last row of A is the sum of the previous two rows, but the last entry of b is not the sum of the previous two entries, so there can be no solution.

By contrast, the system

1  2  3         14
4  5  6  * x  = 34

is consistent, having as one solution (of many) the vector (1,2,3), and the system
1  0          5
1  2          5
2  4  * x =  10
3  6         15
4  8         20

is consistent, having the unique solution ( 5, 0 )

Facts about a consistent linear system:

• If the matrix A is square, and invertible, then any linear system involving A will be consistent.
• If the matrix A is square, and noninvertible, then there is guaranteed to be some right hand side b for which the corresponding linear system will be inconsistent.
• For a square or rectangular matrix A, the linear system will be consistent if and only if the right hand side b lies in the range of A which is equivalent to requiring that b lie in the column space of A.
• A rectangular system of any order can be consistent or inconsistent.

If a linear system is not consistent, then there is no solution x. But in such a case, it may be of interest to determine an approximate solution, which satisfies some condition, such as minimizing some norm of the residual. One technique for doing this involves the linear least squares method..

## Convergent Matrix

A convergent matrix A is a square matrix for which the limit as n goes to infinity of An is zero.

A matrix is convergent if and only if the spectral radius rho(A) satisfies

rho(A) < 1

A semiconvergent matrix A is a square matrix A for which the limit as n goes to infinity of An exists. If a matrix is semiconvergent, then either it is convergent, (rho(A) < 1) or else rho(A) = 1. In the second case, it must be further true that the only eigenvalue of norm 1 is 1, and that this eigenvalue is semisimple.

The simplest example of a semiconvergent matrix is the identity matrix.

## Correlation Matrix

A correlation matrix is a square positive semidefinite symmetric matrix with unit diagonal entries, and off-diagonal entries whose magnitude is no greater than 1.

In statistical applications, entry Ai,j represents the strength of the correlation between variables I and J. The strongest correlation is +1; a correlation of 0 means the variables are apparently completely independent. If a correlation is negative, it means that variable I is positively correlated to the negative of the value of variable J.

The entries of the correlation matrix can be represented as

Ai,j = X(I) dot X(J) / ( ||X(I)|| ||X(J)|| )
where the norm used is derived from the dot product, and the dot product takes the form appropriate for the kind of data being handled, such as finite vectors, or continuous functions.

## Cross Product

The cross product of two vectors u and v, denoted u x v, is a vector w which is perpendicular to u and v, pointing in the direction so that (u,v,w) forms a right handed coordinate system, and whose length is equal to the area of the parallelogram two of whose sides are u and v.

Algebraically,

w(1) = u(2) * v(3) - u(3) * v(2)
w(2) = u(3) * v(1) - u(1) * v(3)
w(3) = u(1) * v(2) - u(2) * v(1)

If the unit vectors in the coordinate directions are denoted by i, j and k, then the cross product vector can also be regarded as the (vector) value of the following "determinant":

|   i    j    k  |
w = u x v = det | u(1) u(2) u(3) |
| v(1) v(2) v(3) |

## Cyclic Reduction

Cyclic reduction is a method for solving a linear system A*x=b in the special case where A is a tridiagonal matrix.

On a parallel computer, this method solves the system in LOG(N) "steps" where N is the order of A. A standard Gauss elimination method for a tridiagonal system would require roughly N "steps" instead.

A tridiagonal system has some very special properties that will allow us to carry this operation out. Consider this system of 7 equations:

A11 x1 + A12 x2                                              = y1
A21 x1 + A22 x2 + A23 x3                                     = y2
A32 x2 + A33 x3 + A34 x4                            = y3
A43 x3 + A44 x4 + A43 x5                   = y4
A54 x4 + A55 x5 + A56 x6          = y5
A65 x5 + A66 x6 + A67 x7 = y6
A76 x6 + A77 x7 = y7

The first equation can be used to eliminate the coefficient A21 in the second equation, and the third equation to eliminate the coefficient A23 in the second equation. This knocks out variables x1 and x3 in the second equation, but adds x4 into that equation.

By the same method, x3 and x5 can be eliminated from the equation for x4, and so on. By eliminating the odd variables from the even equations, a smaller tridiagonal system system is derived, with half the equations and variables.

If elimination is applied to this set, the number of equations is again reduced by half; this reduction may be repeated until a single equation in one variable is reached. Backsubstitution then produces the values of all the variables.

The reason this method might have an advantage over Gauss elimination is that, at each step of the elimination phase, the parts of the step are independent. If many computer processors are available, then each can be working on a separate portion of the elimination. If the number of processors is large enough, the system can really be solved in LOG(N) time.

Cyclic reduction routines are available in the NCAR software library, the SLATEC library, and the Cray SCILIB library.

## Cyclic Tridiagonal Matrix

A cyclic tridiagonal matrix is a generalization of a tridiagonal matrix which includes an extra last entry in the first row, and an extra first entry in the last row.

An example of a cyclic tridiagonal matrix:

-2  1  0  0  1
1 -2  1  0  0
0  1 -2  1  0
0  0  1 -2  1
1  0  0  1 -2

A cyclic tridiagonal matrix is not a tridiagonal matrix. If the matrix is constant along the three generalized diagonals, a cyclic tridiagonal matrix is a circulant matrix. A cyclic tridiagonal matrix can arise in situations where a periodic boundary condition is applied.

It is very disappointing that a cyclic tridiagonal matrix is not a tridiagonal matrix, since there are so many good methods for solving tridiagonal linear systems. One way to solve a cyclic tridiagonal system is to use the Sherman Morrison Formula and view the matrix as a rank one perturbation of a tridiagonal matrix. Another approach is to view it as a border banded matrix.

A cyclic tridiagonal matrix may also be called a periodic tridiagonal matrix.

## Defective Matrix

A defective matrix is a (square) matrix that does not have a full set of N linearly independent eigenvectors.

For every eigenvalue, there is always at least one eigenvector, and eigenvectors corresponding to distinct eigenvalues are linearly independent. If the N eigenvalues of a matrix are distinct, then it surely has N linearly independent eigenvectors, and so cannot be defective.

Conversely, if a matrix is defective, then it must have at least one repeated eigenvalue, that is, an eigenvalue of algebraic multiplicity greater than 1. A matrix is defective if and only if its Jordan Canonical Form has at least one nonzero entry on the superdiagonal.

Thus, a simple example of a defective matrix is:

1 1
0 1

which has the single eigenvalue of 1, with algebraic multiplicity 2, but geometric multiplicity 1. The only eigenvector is ( 0, 1 ).

If a matrix is not defective, then its eigenvectors form a basis for the entire linear space. In other words, any vector y can be written as

y = X * c
where X is the array of eigenvectors of A.

If a matrix A is not defective, then it is similar to its diagonal eigenvalue matrix:

A = X * LAMBDA * X-1
and the similarity transformation matrix X is actually the eigenvector matrix.

This in turn allows us to make interesting statements about the inverse, tranpose, and powers of A. For instance, we see that

A2 = ( X * LAMBDA * X-1 ) * ( X * LAMBDA * X-1 )
= X * LAMBDA2 * X-1
leading us to the statement that for a nondefective matrix, the square has the same eigenvectors, and the square of the eigenvalues.

## Deflation

Deflation is a technique for "removing" a known eigenvalue from a matrix, in order to facilitate the determination of other eigenvalues.

For example, the power method is able to estimate the eigenvalue of largest modulus of a matrix A. Once this is computed, it might be desired to find the next largest eigenvalue. Deflation can be used to essentially create a new matrix, A', which has the same eigenvalues as A, except that the largest eigenvalue has been dropped (and the order of A' reduced by 1) or the largest eigenvalue is replaced by 0. In either case, the power method applied to A' will produce the next largest eigenvalue of A.

To eliminate a known eigenvalue, lambda, it is necessary to know its eigenvector x, which we will assume has been scaled to have unit Euclidean norm. By the properties of eigenvectors, we know that

A * x = lambda * x.
Now define the matrix A^ so that:
A^ = A - lambda * x * x'.
Now x is an eigenvector of A^ with eigenvalue 0, because:
A^ * x = ( A - lambda * x * x' ) * x
= A * x - lambda * x * x' * x
= lambda * x - lambda * x
= 0

If the power method is being employed, then the new iteration should try to "factor out" any component of the eigenvector x; otherwise, small errors in the computation of the first eigenvalue and eigenvector will interfere with the next results.

Theoretically, this process may be repeated as often as desired, eliminating each eigenvalue as it is discovered. Practically, however, accumulated errors in the eigenvalues and eigenvectors make the computation more and more unreliable with each step of deflation. Thus, if more than a few eigenvalues are desired, it is more appropriate to use a standard technique.

## Derogatory Matrix

A derogatory matrix is a matrix whose minimal polynomial is of lower degree than its characteristic polynomial

For this to happen, of course, the matrix must have at least one eigenvector with algebraic multiplicity greater than one.

Surprisingly, the identity matrix is derogatory, (ignoring the case N=1). A matrix can be guaranteed to be nonderogatory if the geometric multiplicity of every eigenvalue is 1.

Perhaps the only reason that the term is worth knowing is this fact: every nonderogatory matrix is similar to the companion matrix of its characteristic polynomial.

## Determinant of a Matrix

The determinant of a square matrix is a scalar value whose most common usage is to indicate whether a matrix is singular or not.

If a matrix is singular, then it doesn't have an inverse and linear systems involving this matrix cannot be reliably solved.

In numerical work, the determinant is not really a reliable indicator of singularity, and other data, such as the relative size of the matrix elements encountered during pivoting, are preferred when checking for singularity.

The determinant also occurs in the definition of the eigenvalue problem.

An explicit formula for the determinant of a matrix A is:

det ( A ) = sum [ over all P ] sign(P) * A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).
where the sum ranges over all possible permutations P of the numbers 1 through N, and sign(P) is +1 for an even permutation, and -1 for an odd permutation. (Any permutation may be accomplished by a sequence of switching pairs of objects. The permutation is called even or odd, depending on whether the number of switches is even or odd).

A numerical method for finding the determinant comes as a byproduct of the LU factorization used in Gaussian elimination. Typically, this factorization has the form

A = P * L * U,
and the value of the determinant is simply
det ( A ) = det ( P ) * product ( 1 <= I <= N ) U(I,I).
where det ( P ) is +1 or -1, again determined by the sign of the permutation.

• the determinant of the identity matrix is 1;
• the determinant of a permutation matrix is +1 or -1;
• the determinant of an orthogonal matrix is +1 or -1;
• the determinant of a diagonal matrix is the product of the diagonal entries;
• the determinant of a unit upper or lower triangular matrix is 1;
• the determinant of an upper or lower triangular matrix is the product of the diagonal entries;
• there is a formula for the determinant of the identity matrix plus a rank one matrix: det ( I + uv' ) = 1 + u'v;
• the product of the eigenvalues of a matrix equals the determinant;
• det ( A' ) = det ( A );
• det ( s*A ) = sN*det ( A ), where s is a scalar multiplier, and N is the order of A;
• det ( A * B ) = det ( A ) * det ( B );
• det ( A-1 ) = 1 / det ( A ).

A single elementary row operation has the following effect on the determinant:

• Interchanging two rows multiplies the determinant by -1;
• Multiplying a row by the nonzero scalar s multiplies the determinant by s;
• Adding a multiple of one row to another leaves the determinant unchanged.

For small matrices, the exact determinant is simple to compute by hand. The determinant of a 2 by 2 matrix

a  b
c  d

is a*d-b*c, while the determinant of a 3 by 3 matrix:
a  b  c
e  f  g
h  i  j

is a * (f*j-g*i) - b * (e*j-g*h) + c * (e*i-f*h).

If absolutely necessary, the determinant of a matrix of order N can be computed recursively in terms of determinants of minor matrices. Let M(A,I,J) stand for the (I,J) minor matrix of A. Then the determinant of A is

det ( A ) = sum ( 1 <= J <= N ) (-1)(J+1) * Ai,j * det ( M(A,I,J) ).
Of course, now we need to compute the determinants of the N minor matrices, but the order of these matrices has been reduced by 1. Theoretically, we can represent the determinant of any of these matrices of order N-1 by a similar sum involving minor matrices of order N-2, and this process can be repeated until we reach matrices of order 1 or 2, whose determinants are easy to compute. In practice, this method is never used except in simple classroom exercises.

There is a geometric interpretation of the determinant. If the rows or columns of A are regarded as vectors in N dimensional space, then the determinant is the volume of a the parallelepiped, or "slanted cube" whose one corner is defined by these vectors.

LAPACK and LINPACK provide routines for computing the determinant of a matrix, after the matrix has been decomposed into LU factors.

## Diagonal Dominance

A matrix is (column) diagonally dominant if, for every column, the sum of the absolute values of the offdiagonal elements is never greater than the absolute value of the diagonal element.

The matrix is strictly diagonally dominant if the offdiagonal sum is always strictly less than the absolute value of the diagonal element.

The same definitions can be used to consider rows instead of columns. The terms column diagonally dominant and row diagonally dominant may be used, if necessary, to specify which case is being considered.

A strictly diagonally dominant matrix cannot be singular, by Gershgorin's Theorem.

A diagonally dominant matrix which is also irreducible cannot be singular.

Here is a diagonally dominant matrix which is not strictly diagonally dominant:

-2  1  0  0  0
1 -2  1  0  0
0  1 -2  1  0
0  0  1 -2  1
0  0  0  1 -2

For a linear system A * x = b, if the matrix A is strictly diagonally dominant, then both Jacobi iteration and Gauss Seidel iteration are guaranteed to converge to the solution.

## Diagonal Matrix

A diagonal matrix is one whose only nonzero entries are along the main diagonal. For example:

3 0 0
0 4 0
0 0 7

Simple facts about a diagonal matrix A:

• A is singular if and only if any diagonal entry is zero;
• The eigenvalues of A are the diagonal entries, and the eigenvectors are the columns of the identity matrix;
• The determinant of A is the product of the diagonal entries;

## Diagonalizable Matrix

A diagonalizable matrix is any (square) matrix A which is similar to a diagonal matrix:

A = P * D * P-1.

This concept is important in the study of eigenvectors To see the relationship, post-multiply the equation by P:

A * P = P * D.
Looking at the columns of P as eigenvectors, and the diagonal entries of D as eigenvalues, this shows that a matrix is diagonalizable exactly when it has N linearly independent eigenvectors.

In certain cases, not only is a matrix diagonalizable, but the matrix P has a special form. The most interesting case is that of any (real) symmetric matrix A; not only can such a matrix be diagonalized, but the similarity matrix is orthogonal:

A = Q * D * Q-1 = Q * D * Q',
This fact can be interpreted to show that not only does every symmetric matrix have a complete set of eigenvectors, but the eigenvectors and eigenvalues are real, and the eigenvectors are pairwise orthogonal.

Similarly, a complex matrix that is a Hermitian has a complete set of eigenvectors, and the eigenvalues are real. (The eigenvectors in this case will not generally be real).

Simple facts:

• If the eigenvalues of A are distinct, then it can be diagonalized;
• If a matrix is symmetric, then it is diagonalizable, and the transformation matrix can be chosen to be orthogonal.
• A matrix is diagonalizable if, and only if, it has a set of N linearly independent eigenvectors; in other words, if it is not defective.
• If a matrix is normal, then it is not only diagonalizable, but the transformation matrix is unitary.

## Downshift Matrix

The downshift matrix A circularly shifts all vector entries or matrix rows down 1 position.

Example:

0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0

Facts about the downshift matrix A:

## Eigenvalues

Eigenvalues are special values associated with a (square) matrix, which can be used to analyze its behavior in multiplying any vector.

The formal definition of an eigenvalue of a matrix A is that it is any value lambda which is a root of the characteristic equation of the matrix,

det ( A - lambda * I ) = 0.

lambda is an eigenvalue of A if and only if there is a nonzero vector x, known as an eigenvector (or sometimes a "right" eigenvector), with the property that

A * x = lambda * x.
Note that there must also be a "left" eigenvector y, with the property
y * A = A' * y = lambda * y.

The characteristic equation has exactly N roots, so a matrix has N eigenvalues. An important consideration is whether any eigenvalue is a repeated root, which determines how hard the eigenvector computation will be.

If a matrix has the maximum possible number of linearly independent eigenvectors (namely N, the order of the matrix), then the eigenvalues and eigenvectors can be used to diagonalize the matrix. This only happens when the matrix is normal.

If the NxN matrix A has N linearly independent eigenvectors, we may collect the eigenvectors as columns of a matrix X, the eigenvalues as entries in the diagonal matrix Lambda, we may first write

A * X = X * Lambda
and then, because the eigenvectors are presumed to be linearly independent, we have the eigen decomposition or the spectral decomposition of A:
A = X * Lambda * X-1

Simple facts about eigenvalues of A:

Simple algorithms for computing eigenvalues include the power method and the inverse power method. The QR method is a more powerful method that can handle complex and multiple eigenvalues.

LAPACK and EISPACK include algorithms for computing the eigenvalues and eigenvectors of a variety of types of matrix, as well as methods that can be applied to more general eigenvalue problems.

## Eigenvectors

A nonzero vector x is an eigenvector of the square matrix A if

A * x = lambda * x
for some scalar value lambda, called the associated eigenvalue.

Sometimes this eigenvector is more particularly described as a right eigenvector, so that we may also consider left eigenvectors, that is, vectors y for which it is true that

y * A = A' * y = mu * y
for some scalar mu.

For every eigenvalue of a matrix, there is at least one eigenvector. Every nonzero multiple of this eigenvector is also an eigenvector, but in an uninteresting way. If, and only if, an eigenvalue is a repeated root, then there may be more than one linearly independent eigenvector associated with that eigenvalue. In particular, if an eigenvalue is repeated 3 times, then there will be 1, 2 or 3 linearly independent eigenvectors corresponding to that eigenvalue.

• If x is an eigenvector, so is s*x, for any nonzero scalar s.
• If A is singular, then it has an eigenvector associated with the eigenvalue 0, so that A * x = 0.
• If x is a right eigenvector for eigenvalue lambda, and y is a left eigenvector for eigenvalue mu, and lambda and mu are distinct, then x and y are orthogonal. This property is sometimes call biorthogonality.

## EISPACK

EISPACK is a package of routines for handling the standard and generalized eigenvalue problems.

The beginning user who is not interested in trying to learn the details of EISPACK and simply wants the answer to an eigenvalue problem quickly should call one of the main driver routines. Each of these is tailored to handle a given problem completely with a single subroutine call. For more advanced work, it may be worth investigating some of the underlying routines.

Driver routines to solve A*x=lambda*x include:

• CG, complex general matrix.
• CH, complex Hermitian matrix.
• RG, real general matrix.
• RS, real symmetric matrix.
• RSB, real symmetric band matrix.
• RSP, real symmetric, packed storage matrix.
• RSPP, real symmetric packed storage matrix, some eigenvectors.
• RST, real symmetric tridiagonal matrix.
• RT, real sign-symmetric tridiagonal.

For the generalized eigenvalue problem:

• RGG solves A*x=lambda*B*x for A and B real, general matrices.
• RSG solves A*x=lambda*B*x for A real symmetric, B real positive definite.
• RSGAB solves A*B*x=lambda*x for A real symmetric, B real positive definite.
• RSGBA solves B*A*x=lambda*x for A real symmetric, B real positive definite.

## EISPACK Matrix Norm

The EISPACK matrix norm is used in the EISPACK eigenvalue package.

The definition of the norm for an M by N matrix is:

||A|| = sum ( I = 1 to M, J = 1 to N ) | Ai,j |

It's a simple exercise to verify that this quantity satisifes the requirements for a matrix norm.

This norm is easy to calculate, and was used in EISPACK in order to have a standard against which to compare the size of matrix elements that were being driven to zero. I haven't seen it used anywhere else in practice.

## Elementary Column Operations

Elementary column operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to column echelon form.

Restricting the operations to a simple set makes it easy to:

• guarantee that each step is legitimate;
• record each step using a simple notation;
• compute the inverse of the total set of operations.

The three elementary column operations include:

• interchange any two columns;
• multiply any column by a nonzero value;
• add a multiple of any column to another column.

Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as postmultiplication by a concatenation of elementary matrices:

B = A * E(1) * E(2) * ... * E(k)
which may be abbreviated as:
B = A * C
Since C will be guaranteed to be invertible, we also know that,
B * C-1 = A
which yields a factorization of A.

## Elementary Matrix

An elementary matrix E is one which, when pre-multiplying another matrix A, produces a product matrix E * A which has exactly one of the following properties:

• rows R1 and R2 are interchanged, or
• row R1 is multiplied by a nonzero constant s, or
• a multiple of row R2 is added to row R1.

The matrix E which interchanges rows R1 and R2 of matrix A has the form E(I,J)=:

• 1, for I = R1, J = R2, or I = R2, J = R1;
• 1, for I = J and I not equal to R1 or R2;
• 0, otherwise.
The inverse of this matrix is simply its transpose

The matrix E which multiplies row R1 of A by the constant s has the form E(I,J)=:

• s, if I = J = R1;
• 1, if I = J =/= R1;
• 0, otherwise.
The inverse of this matrix is constructed by negating the value of s.

The matrix E which adds s * row R2 to row R1 of A has the form E(I,J) =:

• 1, if I = J;
• s, if I = R1 and J = R2;
• 0, otherwise.
The inverse of this matrix is constructed in the same way, using 1/s.

If a matrix F can be represented as the product of elementary matrices,

F = E1 * E2 * ... * EM,
then its inverse is:
F-1 = EM-1 * EM-1-1 * ... * E1-1.

An elementary similarity transformation uses a matrix F which is the product of elementary matrices, and transforms the matrix A into the similar matrix B by the formula

B = F-1 * A * F.

## Elementary Row Operations

Elementary row operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to row echelon form.

Restricting the operations to a simple set makes it easy to:

• guarantee that each step is legitimate;
• record each step using a simple notation;
• compute the inverse of the total set of operations.

The three elementary row operations include:

• interchange any two rows;
• multiply any row by a nonzero value;
• add a multiple of any row to another row.

Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as premultiplication by a concatenation of elementary matrices:

B = E(k) * E(k-1) * ... * E(2) * E(1) * A
which may be abbreviated as:
B = C * A.
Since C will be guaranteed to be invertible, we also know that,
C-1 * B = A
which yields a factorization of A.

## Ellipsoids

An ellipsoid is an N dimensional generalization of an ellipse. The formula for an ellipsoid may be written as:

sum ( I = 1 to N, J = 1 to N ) Ai,j * Xi * Xj = 1.
where A is a positive definite symmetric matrix.

A principal axis of an ellipsoid is any N dimensional point X on the ellipsoid such that the vector from the origin to X is normal to the ellipsoid.

In the general case, there are exactly N principal axes (plus their negatives). In degenerate cases, there may be an entire plane of vectors that satisfy the requirement, but it is always possible to choose a set of N principal axes which are linearly independent.

Moreover, in the general case, the principal axes are pairwise orthogonal vectors, and in the degenerate case, may be chosen pairwise orthogonal.

Moreover, it is always true that the principal axes are eigenvectors of the matrix A of ellipsoid coefficients. The length of the principal axis vector associated with an eigenvalue lambda(I) is 1 / Sqrt ( lambda(I) ).

These facts have a strong relationship to the formulation of the conjugate gradient method.

## Equilibration

Equilibration is the technique of balancing the rows or columns of a matrix by rescaling them.

Consider, for instance, the fact that the following two equations are equivalent:

0.0001 * x + 0.0001 * y = 0.0001
and
1000 * x + 1000 * y = 1000
However, the large coefficients in the second equation will bias a Gauss elimination routine to choose that equation as its pivot. Actually, it's more important in this case that the chosen row be as "linearly independent as possible" from the other rows, and this is more likely to occur if we ensure that all the rows start out with an equal norm. This can be done very simply, by finding the element of maximum absolute value in each row and dividing that row (and its right hand side) by that value. Such a technique is called row equilibration. It is not necessary that the rows have precisely the same norm; it is desirable that the norms of the rows be maintained within some controlled range.

Equilibration is useful in many areas of linear algebra, including eigenvalue calculations. In some cases, column equilibration is preferred, and in other cases, the norms of both the rows and columns are to be controlled.

## Equivalent Matrix

Matrices A and B are said to be equivalent if there are nonsingular matrices P and Q so that

A = P * B * Q.

• Every matrix is equivalent to a diagonal matrix;
• If A and B are equivalent, then they are both singular or both nonsingular.

Equivalence is a very loose concept of relatedness. A stronger and more useful concept is similarity.

## Exchange Matrix

The exchange matrix J is constructed from the identity matrix by reversing the order of the columns.

For example, the matrix J of order 4:

0 0 0 1
0 0 1 0
0 1 0 0
1 0 0 0

Facts about the exchange matrix J:

The exchange matrix is sometimes called the anti-identity matrix or the counter-identity matrix or the reversal matrix.

## External Storage Algorithms

An external storage algorithm is a method of solving a problem that is too large to be loaded into computer memory as a whole.

Instead, the problem is solved incrementally, with most of the problem data residing, at any one time, in computer files, also called "disk storage" or "external storage". It is a considerable difficulty just to rewrite an algorithm that can handle a situation where, say, part of a matrix is in one place, and part is in another, remote place.

However, such algorithms must also be aware that data transfers between memory and disk are very slow. Hence, if the algorithm is to be of any use, it must do as much processing as possible on the portion of the data that resides in memory, and read the external problem data into memory as rarely as possible, and in large contiguous "chunks".

## Fourier Matrix

The Fourier matrix represents the linear operator that transforms a vector of data into a vector of Fourier coefficients.

Let w indicate an N-th root of unity. Then, if we choose N=4, the matrix F will be:

1  1    1    1
1  w    w^2  w^3
1  w^2  w^4  w^6
1  w^3  w^6  w^9

which simplifies to
1  1    1    1
1  w    w^2  w^3
1  w^2  1    w^2
1  w^3  w^2  w^1

However, we will choose to scale F by 1/sqrt(N).

Facts about the Fourier matrix F:

## Frobenius Matrix Norm

The Frobenius matrix norm is a matrix norm that has the simple formula: ||A|| = the square root of the sum of the squares of all the entries of the matrix.

The Frobenius matrix norm is not a vector-bound matrix norm, although it is compatible with the L2 vector norm, and much easier to compute that the L2 matrix norm.

The Frobenius matrix norm is sometimes called the Schur matrix norm or Euclidean matrix norm

## Gauss Elimination

Gauss elimination has the goal of producing a solution x to the system of linear equations A*x=b, where A is matrix of order N, and b a vector of length N. The standard version of Gauss elimination used in most algorithms employs partial pivoting.

Gauss elimination accomplishes its goal by decomposing the original matrix A into three factors, a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U. The factors are related to the original matrix by the formula

A = P * L * U.

Once the matrix is factored, it is a simple matter to solve A*x=b, by solving instead P * ( L * ( U * x ) ) ) = b, because each of the three factors is easy to invert.

Moreover, once the factors are known, the user may solve several linear systems involving A, with different right hand sides.

The determinant of A is equal to the product of the determinants of the factors, and hence is easily computed: the determinant of P is plus or minus 1, and that of L is 1, and that of U is simply the product of its diagonal elements.

The inverse matrix could be solved for, if necessary, by solving the N linear systems A * X(I) = E(I), where E(I) is the I-th Cartesian basis vector The vectors X(1), X(2), ..., X(N) then are the columns of the inverse of A.

As an example of the Gauss elimination of a matrix, suppose we start with with the the matrix:

A

1 2 3
4 5 6
7 8 0

We can write an imperfect PLU factorization as:
P          L          U

1 0 0      1 0 0      1 2 3
0 1 0      0 1 0      4 5 6
0 0 1      0 0 1      7 8 0

The factorization is imperfect because, although A = P*L*U, the matrix U is not upper triangular. We will now modify the matrix U, and update the factors P and L, so that it is always true that A=P*L*U, while the matrix U gradually is transformed into the correct upper triangular form.

Step 1.1: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 1 of the matrix U. The elementary permutation matrix P(1,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:

A = P * L * U
= [ P * P-1(1,3) ] * [ P(1,3) * L * P-1(1,3) ] * [ P(1,3) * U ]
= P' * L' * U'

The resulting factorization is now:
P          L          U

0 0 1      1 0 0      7 8 0
0 1 0      0 1 0      4 5 6
1 0 0      0 0 1      1 2 3

Step 1.2: Eliminate U2,1 by subtracting 4/7 of row 1 from row 2. To do this, we construct the elementary matrix L(1,2,4/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.

A = P * L * U
= P * [ L * L-1(1,2,4/7) ] * [ L(1,2,4/7) * U ]
= P * L' * U'

The resulting factorization is now:
P            L           U

0 0 1       1  0 0      7  8  0
0 1 0      4/7 1 0      0 3/7 6
1 0 0       0  0 1      1  2  3

Step 1.3: Eliminate U3,1 by subtracting 1/7 of row 1 from row 3. To do this, we construct the elementary matrix L(1,3,1/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.

A = P * L * U
= P * [ L * L-1(1,3,1/7) ] * [ L(1,3,1/7) * U ]
= P * L' * U'

The resulting factorization is now:
P           L           U

0 0 1      1  0 0      7  8  0
0 1 0     4/7 1 0      0 3/7 6
1 0 0     1/7 0 1      0 6/7 3

Step 2.2: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 2 of the U. The elementary permutation matrix P(2,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:

A = P * L * U
= [ P * P-1(2,3) ] * [ P(2,3) * L * P-1(2,3) ] * [ P(2,3) * U ]
= P' * L' * U'

The resulting factorization is now:
P            L           U

0 1 0       1  0 0      7  8  0
0 0 1      1/7 1 0      0 6/7 3
1 0 0      4/7 0 1      0 3/7 6

Step 2.3: Eliminate U3,2 by subtracting 1/2 of row 2 from row 3. To do this, we construct the elementary matrix L(2,3,1/2), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.

A = P * L * U
= P * [ L * L-1(2,3,1/2) ] * [ L(2,3,1/2) * U ]
= P * L' * U'

The resulting factorization is now:
P            L            U

0 1 0      1   0  0      7  8   0
0 0 1     1/7  1  0      0 6/7  3
1 0 0     4/7 1/2 1      0  0  9/2

The PLU factorization is now correct.

You should be able to see a formula for the final factors:

P = I * P-1(1,3) * P-1(2,3)
and
L = P(2,3) * P(1,3) * I * P-1(1,3) * L-1(1,2,4/7) * L-1(1,3,1/7) * P-1(2,3) * L-1(2,3,1/2)
and
U = L(2,3,1/2) * P(2,3) * L(1,3,1/7) * L(1,2,4/7) * P(1,3) * A
You should see that the form of these matrices guarantees that A=P*L*U. The way we carried out the steps guarantees that P stays a permutation matrix, L stays a lower triangular matrix, and U becomes an upper triangular matrix.

## Gauss Jordan Elimination

Gauss Jordan elimination is a method for solving a system of linear equations A * x = b for x, or for computing the inverse matrix of A.

Gauss elimination and Gauss Jordan elimination are very closely related. Gauss elimination reduces A to an upper triangular matrix, and saves the elimination factors in a lower triangular matrix. Gauss Jordan elimination proceeds relentlessly until A has been converted into the identity matrix.

Thus, unlike the Gauss elimination procedure, the Gauss Jordan elimination does not produce a factorization of the matrix, but only a solution to the linear system. This means that if a second linear system has to be solved, the matrix has to be set up and eliminated all over again.

The simplest way to describe Gauss Jordan is to note that to solve, say, the linear system A * x = b, the right hand side is appended as an extra column of the matrix. Then, on step I of the elimination, we choose a pivot row, move it to row I, divide it through by the pivot value, and then eliminate the matrix entries in column I from all other rows, rather than simply from rows I+1 through N. When the process is completed, the solution x has overwritten the right hand side b that was stored in column N+1.

Several right hand sides can be handled at once, by appending all of them to the coefficient matrix; the inverse can be computed by appending a copy of the identity matrix to the coefficient matrix before beginning elimination.

Gauss Jordan elimination is primarily used as a teaching tool, and for small linear systems. In practical computation, standard Gauss elimination is universally preferred.

## Gauss Seidel Iteration For Linear Equations

The Gauss Seidel iteration for linear equations is an iterative method for solving linear systems of equations A*x=b. It is similar to the the Jacobi algorithm and the successive overrelaxation method (SOR).

The Gauss Seidel iteration should only be used for matrices which are symmetric and positive definite, or for a matrix which is strictly diagonally dominant.

Each step of the Gauss Seidel iteration begins with an approximate answer x, and produces a new approximation y. Each component of y is computed in order, using the formula:

y(i) = [ b(i)
- a(i,1)*y(1)
- a(i,2)*y(2)
...
- a(i,i-1)*y(i-1)
- a(i,i+1)*x(i+1)
...
- a(i,n)*x(n) ] / a(i,i)

The calculation of each entry of y is dependent on the calculation of the entries of lower index. Thus, the value of y(1) is calculated first, and then y(2) is calculated based on the values of y(1) as well as x(3) through x(n), and so on.

The process is to be repeated until the residual error is small, or the change in the approximate solution is negligible.

The Gauss Seidel iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:

A = L + D + U
then the method is equivalent to the iteration
( L + D ) * xnew = b - U * x.
which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
- ( L + D )-1 * U,
which in turn may best be understood by looking at the eigenvalues.

If the original coefficient matrix A is symmetric, then it may be preferred to use the symmetric Gauss Seidel iteration or SGS. In this case, the iteration consists of pairs of Gauss Seidel steps. The odd steps are the same as the usual iteration. But in the even steps, the variables are solved for in reverse order. Each pair of such steps is a single step of the SGS iteration, which has the property that its iteration matrix is similar to a symmetric matrix (though not necessarily symmetric itself). Among other things, this means that SGS can be used as a preconditioner for certain other problems.

## General Matrix Storage

A general matrix is one which has no special matrix structure or matrix symmetry.

In such a case, there are no space advantages to be gained by using a special matrix storage format, and so the matrix entries are stored using the standard two dimensional array format provided by the programming language.

For general matrices, the only remaining issue concerns the problem that occurs when the matrix storage must be set aside before the size of the matrix is known. In FORTRAN, for example, it is common to specify a maximum matrix size of, say, 100 by 100. If the actual problem to be solved is of size 25 by 25, then it may be necessary to describe the data with both the matrix order of 25, and the leading dimension of the storage array, which is 100. In LINPACK and LAPACK, variables containing leading dimension information have names like LDA, LDB and so on.

LAPACK and LINPACK provide routines, with the prefix SGE, which apply to matrices in general storage.

## Generalized Permutation Matrix

A generalized permutation matrix is a square matrix A with at most one nonzero entry in each row, and in each column.

A standard permutation matrix, of course, has exactly one nonzero entry in each row and column, and that entry has value 1.

An interesting fact: if A is a nonsingular nonnegative matrix, then the inverse of A is also nonnegative if and only if A is a generalized permutation matrix. In other words, it's very hard for both A and A-1 to be nonnegative, and essentially can only happen if A is diagonal.

So suppose A >= 0 but A is not a generalized permutation matrix. For an arbitrary vector b >= 0 , can we say that the solution x of A * x = b is nonnegative? No, because we know that the inverse of A is not nonnegative. Therefore A-1 contains at least one negative entry, say entry (i,j). Choose b = E(J), where E(J) is the J-th Cartesian basis vector. Then x = A-1 * b and it is easy to see that x(i) is negative.

## Gershgorin Disks

The method of Gershgorin disks provides an estimate of the size of the eigenvalues of a matrix. The accuracy of the estimate varies wildly, depending on the size of the elements of the matrix. It is most useful for matrices that are diagonally dominant or sparse.

Gershgorin's theorem states that the eigenvalues of any matrix A lie in the space covered by the disks Di:

Di = ( x: sqrt ( x - A(I,I) )2 <= Ri )
where Ri is the sum of the absolute values of the off-diagonal elements of row I:
Ri = sum ( J =/= I ) | Ai,j |.

The theorem may also be applied using columns instead of rows.

## Givens Rotation Matrix

A Givens rotation is a linear transformation applied to two vectors, or two rows or columns of a matrix, which can be interpreted as a coordinate axis rotation. The intent of the rotation is to zero out an entry of the vector or matrix using an orthogonal transformation.

A Givens rotation is similar to the elementary row operation that adds a multiple of one row to another, but because a Givens rotation is an orthogonal similarity transformation, it offers greater stability and easy invertibility.

A Givens rotation matrix G has the form:

1  0  0  0  0  0
0  c  0  0  s  0  <-- row i
0  0  1  0  0  0
0  0  0  1  0  0
0 -s  0  0  c  0  <-- row j
0  0  0  0  0  1

^        ^
col i     col j

where c = cosine(theta) and s = sin(theta) for some angle theta.

Premultiplying A by G has the following effect:

row i  of A is replaced by   c*row i + s*row j  in G*A;
row j  of A is replaced by  -s*row i + c*row j  in G*A.

while postmultiplication, A*G, would carry out a similar operation on the columns of A.

As an example, to zero out entry Ai,j of a matrix requires a Givens rotation with values of cosine and sine so that:

- s * Ai,i + c * Ai,j = 0.
It's not actually necessary to compute the underlying rotation angle theta, since c and s can be computed directly:
s = Ai,j / sqrt ( Ai,j2 + Ai,i2 )
c = Ai,i / sqrt ( Ai,j2 + Ai,i2 )

For instance, to zero out the 3,1 entry of this matrix:

4 2 0
0 4 5
3 8 1

the sine and cosine are 3/5, 4/5, yielding a Givens matrix G of:
0.8  0  0.6
0   1   0
-0.6  0  0.8

and the product G * A:
5.0 6.4 6.0
0   4   5
0  5.2 8.0

It is possible to zero out entries of a matrix, one by one, using Givens rotations, similar to the way that Householder matrices are used, to reduce a matrix to a simpler form. The process can be used to zero out the entire lower triangle of a matrix, but further operations on the upper triangle would reintroduce zeroes in the lower triangle. Nonetheless, zeroing out the lower triangle means that Givens rotations can be used to produce the QR factorization of the matrix.

## Gram Matrix

Given a set of M vectors Vi, each of dimension N, the M by M Gram matrix is formed by all the pairwise dot products of the vectors:

Gi,j = Vi dot Vj

The Gram matrix is a positive semidefinite symmetric. If any vector is linearly dependent on the others, the matrix is singular. If the vectors are linearly independent, the Gram matrix has full rank M. If the vectors form a basis for the space, the Gram matrix has rank N.

The Gram matrix can be used to construct a projection matrix into the space spanned by the vectors.

## Gram Schmidt Orthogonalization

Gram Schmidt orthogonalization is a process which starts with a set of N vectors X(I), each with M components, and produces a set of N2 orthonormal vectors Y which span the linear space of the original vectors.

N2 is less than N if the vectors X(I) are linearly dependent, and equal to N if they are linearly independent. Thus, one use for the Gram Schmidt process is simply to determine if a set of vectors are linearly dependent; a second is to determine the dimension of the space spanned by a linearly dependent set.

The Gram Schmidt process may be defined iteratively:

for I = 1 to N

N2 = I

Y(I) = X(I)

for J = 1 to I-1
C(J) = dot_product ( X(I), Y(J) )
Y(I) = Y(I) - C(J) * Y(J)
end for

Norm = sqrt ( dot_product ( Y(I), Y(I) ) )

if ( Norm = 0 ) exit

Y(I) = Y(I) / Norm

end for

Another way of looking at the process is to use the vectors to form the columns of a matrix A. The Gram Schmidt process can then be used to construct one version of the QR factorization of the matrix A:

A = Q * R
where Q is orthogonal and R is upper triangular.

The Hadamard product of matrices A and B is a matrix C created by elementwise multiplication:

Ci,j = Ai,j * Bi,j

The Hadamard product is defined for any pair of rectangular matrices, as long as they have the same "shape", that is, the same number of rows, and the same number of columns.

1 2 3  *   7  8  9 =  7 16 27
4 5 6     10 11 12   40 55 72

Hadamard's inequality provides an upper bound on the size of the determinant of a matrix. It is related to the fact that the determinant represents the volume of an N-dimensional parallelepiped.

Let ||C(I)|| designate the Euclidean norm of column I of the matrix A. Hadamard's inequality states that

det ( A ) <= ||C(1)|| * ||C(2)|| * ... * ||C(N)||,
with equality holding only if one of the C(I)'s is zero, (yielding the minimum possible value of 0), or if all the C(I)'s are pairwise orthogonal vectors, (yielding the largest possible value).

The theorem may also be applied using rows instead of columns.

## Hamiltonian Matrix

A Hamiltonian matrix is a 2Nx2N matrix of the form

A  G
Q -A

where the NxN matrices Q and G are symmetric.

## Hankel Matrix

A Hankel matrix is a matrix which is constant along each of its anti-diagonals.

Here is an example of a square Hankel matrix:

7 6 5 4
6 5 4 3
5 4 3 2
4 3 2 1

and a rectangular Hankel matrix:
7 6 5 4 3 2
6 5 4 3 2 1
5 4 3 2 1 0

Simple facts about a Hankel matrix A:

• A is symmetric;
• the inverse of A is symmetric, but need not be a Hankel matrix;

Compare the concepts of Toeplitz Matrix, an Anticirculant Matrix, and a Persymmetric Matrix.

## Harwell Boeing Sparse Matrix Collection (HBSMC)

The Harwell Boeing Sparse Matrix Collection, or HBSMC, is a set of 43 data files describing a standard set of test sparse matrices for sparse matrix calculations.

The test set comprises linear systems, least squares problems, and eigenvalue calculations from a wide variety of disciplines. The set is offered as a standard benchmark for comparison of algorithms.

Here is an overview of the source and size of the various matrices:

Discipline                          Number   Largest  Largest number
of       order    of nonzeroes
matrices

Counter examples, small matrices       3       11        76
Original Harwell test set             36      822      4841
Air traffic control                    1     2873     15032
Astrophysics                           2      765     24382
Chemical Engineering                  16     2021      7353
Circuit simulation                     1      991      6027
Demography                             3     3140    543162
Economic modelling                    11     2529     90158
Nuclear reactor core modelling         3     1374      8606
Optimal power flow problems            3     4929     47369
Stochastic modelling                   7     1107      5664
Acoustic scattering                    4      841      4089
Oil reservoir modelling               19     5005     20033
Stiff ODE problems                    10      760      5976
George and Liu test problems          21     3466     13681
Model PDE problems                     3      900      4322
Navier Stokes problems                 7     3937     25407
Unassembled finite element matrices   10     5976     15680
Oceanography                           4     1919     17159
Power network matrices                14     5300     13571
Everstine test set, ship structures   30     2680     23853
Structures, eigenproblems             22    15439    133840
Structures, linear equations          36    44609   1029655
Least squares problems                 4     1850     10608

The SPARSKIT package includes utilities for conversion of matrices in the Harwell Boeing format into other formats, such as that used by ELLPACK and ITPACK.

## HBSMC Sparse Matrix Storage

The Harwell Boeing Sparse Matrix Collection uses a matrix storage format which is a special kind of sparse matrix storage for most of the matrices in the collection.

The standard sparse matrix format is column oriented. That is, the matrix is represented by a sequence of columns. Each column is held as a sparse vector, represented by a list of row indices of the entries in an integer array and a list of the corresponding values in a separate real array. A single integer array and a single real array are used to store the row indices and the values, respectively, for all the columns.

Data for each column are stored in consecutive locations. The columns are stored in order, and there is no space between columns. A separate integer array holds the location of the first entry of each column, and the first free location. For symmetric and Hermitian matrices, only the entries of the lower triangle are stored, including the diagonal. For antisymmetric matrices, only the strict lower triangle is stored.

Here is a simple example of a 5 by 5 matrix:

1.0 -3.0  0.0 -1.0  0.0
0.0  0.0 -2.0  0.0  3.0
2.0  0.0  0.0  0.0  0.0
0.0  4.0  0.0 -4.0  0.0
5.0  0.0 -5.0  0.0  6.0

This matrix would be stored in the arrays
• COLPTR (location of the first entry of a column),
• ROWIND (row indices) and
• VALUES (numerical values)
as follows:
Subscripts:   1    2    3    4    5    6    7    8    9   10   11

COLPTR        1    4    6    8   10   12
ROWIND        1    3    5    1    4    2    5    1    4    2    5
VALUES      1.0  2.0  5.0 -3.0  4.0 -2.0 -5.0 -1.0 -4.0  3.0  6.0

We can generate column 5, say, by observing that its first entry is in position COLPTR(5)=10 of arrays ROWIND and VALUES. This entry is in row ROWIND(10)=2 and has value VALUES(10)=3.0. Other entries in column 5 are found by scanning ROWIND and VALUES to position COLPTR(6)-1, that is, position 11. Thus, the only other entry in column 5 is in row ROWIND(11)=5 with value VALUES(11)=6.

## HBSMC Finite Element Matrix Storage

The HBSMC finite element storage format is a special matrix storage format for those matrices in the collection which derive from a finite element problem.

Matrices arising in finite element applications are usually assembled from numerous small elemental matrices. The collection includes a few sparse matrices in original unassembled form. The storage of the individual unassembled matrices is based on the general sparse format, which stores a matrix as a list of matrix columns. The elemental representation stores the matrix as a list of elemental matrices. Each elemental matrix is represented by a list of the row/column indices (variables) associated with the element and by a small dense matrix giving the numerical values by columns, or in the symmetric case, only the lower triangular part. The lists of indices are held contiguously, just as for the lists of row indices in the standard format. The dense matrices are held contiguously in a separate array, with each matrix held by columns. Although there is not a one to one correspondence between the arrays of integer and numerical values, the representation does not hold the pointers to the beginning of the real values for each element. These pointers can be created from the index start pointers (ELTPTR) after noting that an element with NU variables has NU*NU real values, or (NU*(NU+1))/2 in the symmetric case.

We illustrate the elemental storage scheme with a small, 5 by 5 example:

5.0  0.0  0.0  1.0  2.0
0.0  4.0  3.0  0.0  6.0
0.0  3.0  7.0  8.0  1.0
1.0  0.0  8.0  9.0  0.0
2.0  6.0  1.0  0.0 10.0

generated from four elemental matrices:
1   4         1   5        2   3   5        3   4
1 (2.0 1.0)   1 (3.0 2.0)  2 (4.0 3.0 6.0)  3 (2.0 8.0)
4 (1.0 7.0)   5 (2.0 8.0)  3 (3.0 5.0 1.0)  4 (8.0 2.0)
5 (6.0 1.0 2.0)

where the variable indices are indicated by the integers marking the rows and columns. This matrix would be stored in the ELTPTR (location of first entry), VARIND (variable indices) and VALUES (numerical values) arrays as follows:

Subscripts:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
ELTPTR:      1   3   5   8  10
VARIND:      1   4   1   5   2   3   5   3   4
VALUES:      2.  1.  7.  3.  2.  8.  4.  3.  6.  5.  1.  2.  2.  8.  2.

## Hermite Normal Form

A nonsingular integer matrix is in Hermite Normal Form if it is lower triangular, all entries are non-negative, and each row has a unique maximum element which is located on the main diagonal.

(In some definitions, the matrix is required to be upper triangular instead.)

Any nonsingular integer matrix can be transformed to Hermite Normal Form using a series of unimodular column operations:

• add an integer multiple of one column to another;
• exchange two columns;
• multiple any column by -1.

For example, given the matrix A:

5   2  1
-4   2  4
0  -3  6

its Hermite normal form is:
1   0   0
4   6   0
6  15  30

## Hermitian Matrix

A Hermitian matrix A is a complex matrix that is equal to its complex conjugate transpose, symbolized by a "*" or "H":

A = AH
or
A = A*

Here is a Hermitian matrix:

1     1+2i  3-4i
1-2i  4       6i
3+4i   -6i  8

Simple facts about a Hermitian matrix A:

• The diagonal entries of A must be real.
• The eigenvalues of A are real, although the eigenvectors are generally complex.
• Eigenvectors of A that correspond to different eigenvectors will be orthogonal vectors.
• A can be decomposed into the form A = U*LAMBDA*UH where U is unitary and LAMBDA is real and diagonal.

The corresponding concept for real matrices is symmetric.

## Hessenberg Matrix

An upper Hessenberg matrix is a matrix which is entirely zero below the first subdiagonal.

An upper Hessenberg matrix is "almost" upper triangular. A lower Hessenberg matrix is, of course, entirely zero above the first superdiagonal. Upper Hessenberg matrices occur so often that they are frequently simply called "Hessenberg" matrices.

An example of an upper Hessenberg matrix is:

1 2 3 4 5
6 7 8 9 1
0 5 4 3 2
0 0 8 3 7
0 0 0 9 1

Eigenvalue programs typically transform a matrix into upper Hessenberg form, and then carry out the QR method on this matrix, which converges rapidly to a matrix which is diagonal except for 2 by 2 blocks corresponding to complex eigenvalues.

The reason for transforming a matrix into upper Hessenberg form is that the QR method is much less expensive if carried out on an upper Hessenberg matrix. It is actually cheaper to go to the additional trouble of transforming a matrix to upper Hessenberg form, and then carrying out the QR method on that matrix, rather than carrying out the QR method on the original matrix.

## Hölder Norms (p-norms)

The Hölder norm of a vector, sometimes called the p-norm, is a vector norm of the form ||x||p = ( sum ( 1 <<= i <= n ) |xi|p )(1/p) where p is any real value greater than or equal to 1.

Although the formula does not make sense, there is also a Hölder norm for p equal to infinity; it's simply the maximum of the absolute values of the entries of the vector.

The L1 norm, the L2 norm, and the L Infinity norm are exactly the Hölder norms with p equal to 1, 2 and infinity.

## Householder Matrix

A Householder matrix for a given vector v has the form:

H = I - 2 * v * v' / ( norm2 ( v ) )2
or, in the common case where the Euclidean norm of v is 1, we may write:
H = I - 2 * v * v'
Note that the Householder matrix is a sort of "overenthusiastic" projection matrix; instead of removing the component in the direction of v, it reverses it.

For the simple case where w = (1/3, 2/3, 2/3), here is what H would look like:

(1 0 0)       (1/9 2/9 2/9)   ( 7/9 -4/9 -4/9)
(0 1 0) - 2 * (2/9 4/9 4/9) = (-4/9  1/9 -8/9)
(0 0 1)       (2/9 4/9 4/9)   (-4/9 -8/9  1/9)

A little "reflection" will convince you that the Householder matrix for any vector v will always be symmetric, just like this example.

The Householder matrix is also an orthogonal matrix:

H * H' = H * H = I.

Householder matrices can be used to compute the QR factorization of a matrix. A Householder matrix can be found which will "wipe out" all the subdiagonal entries of the first column of the original matrix. Another Householder matrix can be found which will "wipe out" all the subdiagonal entries of the second column, and so on. At the end of N-1 steps of this process, we have computed

Hn-1 * ... * H2 * H1 * A = R
where R is upper triangular. But the product
H = Hn-1 * ... * H2 * H1
is an orthogonal matrix. We can multiply both sides by the transpose of H, which is also its inverse, to get
A = H' * R
or, if we define Q to be H':
A = Q * R.

The Householder matrix is sometimes called an elementary reflector.

## Idempotent Matrix

An idempotent matrix A has the property that

A * A = A
An idempotent matrix is "not quite" a projection matrix.

Simple facts about an idempotent matrix A:

• The identity matrix is idempotent;
• Every point of the form A*x is a fixed point of A, that is A * ( A * x ) = A * x;
• If A is idempotent, and it is not the identity matrix, it must be singular;
• The only eigenvalues of A are 0 and 1;
• the determinant is either 1 (in which case A is the identity) or 0.
• I-A is also idempotent;
• rank(A) = trace(A);
• I-2*A is involutory;

## The Identity Matrix

The identity matrix, usually denoted I, is a square matrix with 1's on the main diagonal and 0's elsewhere. The identity matrix behaves like the number 1 in matrix multiplication. For any matrix A:

A * I = I * A = A.

Here is the 3 by 3 identity matrix:

1 0 0
0 1 0
0 0 1

Simple facts about the identity matrix I:

## Ill Conditioned Linear System

When solving a linear system A * x = b, the matrix A is said to be ill conditioned if small errors or perturbations in the coefficient matrix A or right hand side b correspond to large errors or perturbations in the solution x.

A numerical scale for ill conditioning is provided by the condition number.

Linear systems which are extremely ill conditioned may be impossible to solve accurately. A standard example of an ill conditioned matrix is the Hilbert Matrix, with Ai,j = 1 / ( I + J ).

## The Incomplete Cholesky factorization

The incomplete Cholesky factorization is an approximate Cholesky factorization, of a matrix A, comprising a lower triangular matrix L for which A is approximately equal to L*L'.

The incomplete Cholesky factorization is a special case of the Incomplete LU Factorization, modified for symmetric positive definite matrices. There are known cases where it will fail.

Typically, the matrix A is large and sparse, (as well as being symmetric and positive definite) and an iterative scheme is being used to solve the linear system A*x=b. The incomplete Cholesky factorization is intended as a preconditioner, which modifies the linear system, improving the convergence rate of the iterative scheme. In particular, the preconditioner matrix is M=L*L', and we are well advised to store L instead of M, since this gives us an easy way of solving linear systems associated with M.

The computation of the incomplete Cholesky factorization is similar to that of the usual Cholesky factor except for two points. First, we assume that no pivoting is required. Secondly, if the original matrix A has a zero entry, then we require that the corresponding entry of the L matrix be zero. This means that there is no fill in (and it is also why L*L' will not equal A). Because there is no fill in, the factor L can actually be stored in a data structure that is the same as that of A.

Because no pivoting is allowed, the algorithm may break down because of negative elements that may appear on the diagonal. See the Kershaw matrix for a simple example.

## The Incomplete LU factorization

The incomplete LU factorization is an approximate LU factorization of a matrix A, comprising a unit lower triangular matrix L and an upper triangular matrix U.

Typically, the matrix A is large and sparse, and an iterative scheme is being used to solve the linear system A*x=b. The incomplete LU factorization is intended as a preconditioner, which modifies the linear system, improving the convergence rate of the iterative scheme. In particular, the preconditioner matrix is M=L*U, and we are well advised to store L and U instead of M, since this gives us an easy way of solving linear systems associated with M.

The computation of the incomplete LU factorization is similar to that of the usual PLU factors except for two points. First, we assume that no pivoting is required, hence the P factor can be omitted. Secondly, if the original matrix A has a zero entry, then we require that the corresponding entry of the U matrix be zero. This means that there is no fill in (and it is also why L*U will not equal A). Because there is no fill in, the factors L and U can actually be stored in a data structure that is the same as that of A. The elements of L are stored in the positions devoted to lower triangular elements of A, while U goes in the diagonal and upper triangular locations.

If the matrix A is symmetric and positive definite, then the related Incomplete Cholesky Factorization may be tried.

## Inertia

The oddly named inertia of a (square) matrix is the numbers of negative, zero, and positive eigenvalues.

Sylvester's Law of Inertia states that if A and B are congruent matrices, then we cannot guarantee that they have the same eigenvalues, but they do have the same inertias.

This theorem allows us to determine if a symmetric matrix is positive definite. Because the matrix is symmetric, we can compute an LDL factorization:

A = L * D * L'
where L is unit lower triangular, and D is diagonal. This means that A is congruent to the diagonal matrix D. But the eigenvalues of D are easily determined, from which we can get the inertia of D. This is equal to the inertia of A; in particular, if D has only positive eigenvalues, then so does A, which is therefore positive definite.

Moreover, if such a factorization is cheap, as for a tridiagonal symmetric matrix, then we can search for eigenvalues by seeking diagonal shifts of the matrix that cause the number of negative eigenvalues to change by 1.

## Inner Product

An inner product is a scalar-valued function of two vectors x and y, denoted (x,y), with the properties that:

• ( x, y ) = ( y, x ), symmetry
• ( x, s * y ) = s * ( x, y ) for any scalar s, linearity
• ( x, y + z ) = ( x, y ) + ( x, z ) additivity
• ( x, x ) >= 0, and equal to zero only if x = 0, positivity.
• ( x, A * y ) = ( A' * x, y ).

A vector inner product (x,y) can be used to define a corresponding vector norm ||x||:

|| x || = sqrt ( x, x ).
If the inner product and norm are related in this way, then the Cauchy-Schwarz inequality relates them.

The inner product is sometimes referred to as the dot product (because it is often represented as x dot y), or as the scalar product (because its result is a scalar value).

## Integer Matrix

An integer matrix is simple a matrix whose entries are all integers.

Examples of integer matrices include

If A is an integer matrix, it is natural to ask whether this tells us something about quantities related to A.

It is no surprise that if A is an integer matrix, then so will be the trace and its transpose of A.

moreover, if A and B are integer matrices, then it is obvious that the product A*B will also be an integer matrix.

Since the determinant is a sum of products of entries of the matrix, if A is an integer matrix then det(A) must be an integer.

The adjoint matrix and the cofactor matrix, being formed from determinants of submatrices of A, will be integer matrices if A is.

Since the inverse matrix can be formed using Cramer's rule, which involves the ratio of determinants of a submatrix of A divided by the determinant of A, it is not necessarily the case that inverse(A) is an integer matrix, but it must be the case that det(A)*inverse(A) is an integer matrix.

For similar reasons, if A is an integer matrix, and b is an integer vector, then the solution x of the linear system A*x=b is not necessarily an integer vector, but det(A)*x will be an integer vector.

If A is triangular matrix with unit diagonal and is an integer matrix, then inverse(A) will have the same format and will be an integer matrix too. (Of course, this also follows from the fact that det(A) must be 1!)

## Inverse Matrix

The inverse matrix of a square matrix A, if it exists, is a matrix denoted A-1 with the property that

A * A-1 = A-1 * A = I.

If the inverse matrix exists, it is unique, and A is said to be nonsingular or invertible. Otherwise, A is singular.

If the inverse of A exists, then the solution of

A * x = b
can be immediately written down:
x = A-1 * b.

However, it's not a good idea to solve a linear system in this way. The inverse is relatively expensive to compute, and subject to greater inaccuracies than other solution methods. This is not to say that the inverse isn't useful. Orthogonal and unitary transformations are so popular in numerical linear algebra because their inverses "come for free"; (and their inverses are very well conditioned).

Simple facts:

• if lambda is an eigenvalue of A, then 1/lambda is an eigenvalue of A-1;
• if x is an eigenvector of A, x is also an eigenvector of A-1.

LAPACK and LINPACK include routines for explicitly computing the inverse of a given matrix.

## Inverse Power Method

The inverse power method is a technique for solving the eigenvalue problem.

The inverse power method is related to the power method, but is more flexible. Through the use of shifts, it can be "aimed" to seek the eigenvalue that is closest to any particular target value. Moreover, once a rough estimate of an eigenvalue is made, the shift can be set to this value, which will increase the convergence rate of the algorithm.

The inverse power method only requires that the user be able to repeatedly solve linear systems of the form:

A * x = b
or, when shifts are used,
( A - shift * I ) * x = b.
Thus, for instance, if A is a band matrix or a sparse matrix, the user can employ a storage method and solution algorithm appropriate to the particular form of the problem.

The inverse power method gets its name from the fact that, when the shift is zero, it is equivalent to using the power method on the matrix A-1.

The inverse power method begins by picking a starting guess for the eigenvector x, and the eigenvalue lambda. If the shift will not vary then a fixed value should be set now. Then repeat the following steps as often as necessary:

• Set norm = the norm of x;
• Set x = x / norm;
• Set lambda_Old = lambda;
• Set lambda = 1 / norm;
• If the shift may vary, set shift = lambda;
• Solve ( A - shift * I ) * xnew = x.
• If no convergence, set xnew = x and repeat.

## Invertible Matrix

An invertible matrix A is a (square) matrix for which there exists an inverse matrix B, called the inverse of A.

A square matrix A is invertible if and only if:

• no eigenvalue of A is zero;
• no singular value of A is zero;
• every linear system A * x = b has a unique solution;
• the null space of A is exactly the zero vector;
• the rows (and the columns) of A are linearly independent;
• the determinant of A is nonzero;
• Gauss elimination, with partial pivoting, and exact arithmetic, never encounters a zero pivot (including the "unused" pivot on the last step).

A common situation involves a "small" perturbation matrix added to the identity matrix. The matrix I+A is guaranteed to be invertible if it is true that ||A|| < 1 for some vector-bound matrix norm.

## Involutory Matrix

An involutory matrix A has the property that

A * A = I.

Simple facts about an involutory matrix A:

• The identity matrix is involutory;
• A is invertible, and A-1=A;
• the determinant is +1 or -1.

## Irreducible Matrix

An irreducible matrix is a (square) matrix which is not reducible.

Definition One: A reducible matrix is one which can be rearranged into the following block form:

( P Q )
( 0 R )

where P and R are square sub-blocks, and "0" represents a (nonempty) rectangular block of zero entries. The rearrangement can be done by simultaneous row and column permutations. An irreducible matrix is one which is not reducible.

Definition Two: A matrix is irreducible if and only if, for any row index i and column index j, there is always a nonnegative integer p (which may be 0) and a sequence of integers k1, ..., kp so that the product

Ai,k1 * Ak1,k2 * ... * Akp,j
is nonzero.

Theorem: A nonnegative matrix A is irreducible if and only if, for any vector x>0 it is the case that A*x>0.

Definition Three: A matrix of order N is irreducible if, for any division of the integers between 1 and N into two disjoint sets K1 and K2, there is always a nonzero element Ai,j with I in K1 and J in K2.

The concept of an irreducible matrix is mainly of interest in the analysis of the convergence of certain iterative schemes for linear equations. One key idea is the following: if the matrix A is irreducible and diagonally dominant, then A is nonsingular.

If you only know that the matrix is diagonally dominant, then Gershgorin's theorem would still not rule out an eigenvalue of zero, and hence singularity. It's the irreducibility that guarantees no zero eigenvalue here. The "1, -2, 1" tridiagonal matrix is an example where this theorem applies.

Here's an example of an irreducible matrix with zero main diagonal:

0 1 0 0
1 0 1 0
0 1 0 1
0 0 1 0

If you're familiar with graph theory, then a matrix A is irreducible if and only if the digraph with corresponding adjacency matrix is strongly connected. (Consider node I to be connected to node J if Ai,j is nonzero. The digraph is strongly connected if you can get from any node to any other node following directed edges.)

A simple algorithm to determine if a matrix is irreducible can be guided by the idea of the strong connectivity of the corresponding adjacency matrix. First, we choose any node P, and ensure that it can be reached from at least one other node. (We're looking at the P column of the matrix.) If that is not the case, then the matrix is reducible, and we're done. Otherwise, we determine all the nodes we can reach from node P (that's the nonzero entries in row P). Now we take all those nodes, and see what new nodes we can reach. We repeat this operation until no new nodes are found. If any node wasn't reached, the graph is not strongly connected, and the matrix is reducible. Otherwise, the matrix is irreducible.

## Iterative Methods for Eigenvalues

A method is called iterative when it consists of a basic series of operations which are carried out over and over again, until the answer that is produced is no longer significantly changing, or some exceptional error occurs, or the limit on the number of steps is exceeded.

All eigenvalue problems are solved by iterative methods, except for the "toy" problems presented in textbooks. This is because the computation of eigenvalues is equivalent to finding the roots of a polynomial, and there is no explicit method for finding the roots of a general polynomial of degree 5 or higher.

The best known methods for the eigenvalue problem include:

## Iterative Methods for Linear Equations

A method is called iterative when it consists of a basic series of operations which are carried out over and over again, until the answer that is produced is no longer significantly changing, or some exceptional error occurs, or the limit on the number of steps is exceeded.

A direct method is the "opposite" of an iterative method. A fixed number of operations are carried out once, at the end of which the solution is produced. Gauss elimination on a linear system is an example of such a direct method. Direct methods are the primary method for solving small or dense or nonsymmetric linear systems.

The most common reason for using an iterative method is that it can require far less storage than a direct method. An iterative method typically only sets aside storage for the original nonzero entries of the matrix; no fill in occurs. Standard direct methods must set aside an entry for every possible position in the matrix, though some reduction in this requirement is possible if the matrix is banded.

Secondly, each iteration typically takes much less time than a full direct solve; thus, it is possible, for some problems, that an iterative method will actually converge to an acceptable answer more quickly than a direct method.

An iterative method has numerous disadvantages. You will need a starting point, and if you pick a poor one that may slow down convergence. Your system matrix usually needs to satisfy extra conditions beyond merely being nonsingular. The rate of convergence may be extremely slow, although this can be helped by a suitable preconditioner. If you have a very large problem and are storing the matrix in a compact form, the programming and computational cost involved in storing and retrieving coefficient data can exceed that of the solution phase.

Iterative methods are generally only suitable for certain kinds of system matrices. The most common requirements are that the matrix be positive definite symmetric or strictly diagonally dominant.

Iterative methods for solving systems of linear equations include:

## Iterative Refinement

Iterative refinement is an attempt to "improve" a computed solution x0 for the linear system of equations A*x=b.

The algorithm is occasionally effective when the coefficient matrix is ill conditioned, a problem that may become evident if the residual error is computed, and seen to be relatively large.

A single step of the algorithm involves computing the residual error:

r := b - A * x,
solving the linear system:
A * dx = r,
and adding the correction dx to the original solution x.
x := x + dx.

In order to achieve any improvement in accuracy, the residual error should be calculated in higher arithmetic precision. The rest of the calculation can be done in single precision, allowing the use of the already computed LU factorization of A, if available. However, to compute the residual error, the original matrix A is needed. It is preferable to figure out how to use the LU factors to compute the residual error, rather than keeping two copies of the matrix, one factored and one untouched.

If the residual error of the new solution is still too large, but somewhat better, then the procedure may be repeated as often as desired.

A subroutine SGEIM for carrying out iterative refinement using the LINPACK routines is described in the LINPACK User's Guide.

## Jacobi Algorithm for Eigenvalues

The Jacobi algorithm for computing eigenvalues is an iterative method, which may be applied to a symmetric matrix A, to compute its eigenvalues and eigenvectors.

The method will produce an orthogonal matrix Q and an "approximately" diagonal matrix lambda such that

A = Q' * lambda * Q.

The steps of the iteration compute a sequence of orthogonal matrices Q1, Q2, ... which transform the original matrix A into matrices A1, A2, ... each of which has the same eigenvalues as A, but each of which is "more" diagonal than A was. Starting with A, we determine Q1, and produce A1, which is orthogonally similar to A:

A1 = Q1' * A * Q1.
We then determine a matrix Q2 that produces A2:
A2 = Q2' * A1 * Q2
= ( Q1 * Q2 )' * A * ( Q1 * Q2 ).

The point of the algorithm is how we choose the matrices Q at each step. Q1 is chosen in such a way as to "annihilate" the (1,2) and (2,1) elements of A. That is, the (1,2) and (2,1) elements of A1 will be zero. Q2 will eliminate the (1,3) and (3,1) elements of A1. Thus A2 will have zeroes in the (1,3) and (3,1) positions. Unfortunately, the (1,2) and (2,1) positions of A2, which we just zeroed out on the previous step, will not remain zero, but will fill in again! However, the fill in values are generally smaller than the original values. As we can sweep through the entire matrix, we repeatedly annihilate the off-diagonal elements until they all have decreased below some tolerance.

It can be shown that the sum of the squares of the off diagonal elements always decreases to zero with the iteration. Thus, for some iteration step M, one can expect to have a matrix AM and a matrix

Q = Q1*Q2*...*QM
so that
AM = Q' * A * Q
where AM is "essentially" diagonal. At that point, we can rewrite this equation as
A * Q = lambda * Q
where lambda is the diagonal entries of AM, and is the eigenvalues of A, and the transformation matrix Q is the matrix of eigenvectors of A. Thus, if the off diagonal elements disappear as promised, we have approximately solved our eigenvalue problem.

So which orthogonal matrix Q zeroes out a specific pair of entries? The formula is fairly simple. To annihilate the arbitrary entries Ai,j and Aj,i, the matrix Q is equal to the identity matrix, except for:

Q(I,I) =     C    Q(I,J) = S
Q(J,I) =   - S    Q(J,J) = C

where C and S are the cosine and sine of some rotation angle THETA. Thus, each matrix Q is a Givens rotation matrix. We can compute C and S directly (skipping the computation of THETA) by the following formula:

U = ( A(J,J) - A(I,I) ) / ( 2 * Ai,j )
T = sign ( U ) / ( | U | + sqrt ( U2 + 1 ) )
C = 1 / sqrt ( T2 + 1 )
S = T / sqrt ( T2 + 1 )

The Jacobi method is simple and easy to program, but is usually slower than the QR method.

## The Jacobi Algorithm for Linear Equations

The Jacobi algorithm for linear equations is an iterative method for solving linear systems of equations A * x = b.

The method is similar to the Gauss Seidel and Successive Overrelation Method (SOR), but has the distinction of being the only simple iterative method that is easy to program in parallel.

The Jacobi iteration is only appropriate for matrices which are strictly diagonally dominant or else symmetric and positive definite.

Each step of the Jacobi iteration begins with an approximate solution x. An improved approximate solution xnew is computed by solving N independent linear equations:

xnew(i) = [ b(i)
- A(i,1)   * x(1)
- A(i,2)   * x(2)
...
- A(i,i-1) * x(i-1)
- 0.0      * x(i)    <-- note that we are skipping x(i) here!
- A(i,i+1) * x(i+1)
...
- A(i,n)   * x(n) ] / A(i,i)

Once xnew is computed, the residual error is calculated:

r = A * xnew - b
and the solution increment:
dx = xnew - x.
If the norms of these vectors are satisfactorily small, the iteration may be halted. Otherwise, xnew becomes the new starting guess x for the next step of the iteration.

The Jacobi iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:

A = L + D + U
then the method is equivalent to the iteration
D * xnew = b - ( L + U ) * x.
which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
- D-1 * ( L + U ),
which in turn may best be understood by looking at the eigenvalues. Note that if A is symmetric, then so is the iteration matrix.

## The Jacobi Preconditioner

The Jacobi preconditioner is a very simple preconditioning matrix for use with an iterative method for linear equations.

For a given system matrix A, the Jacobi preconditioner matrix M is the diagonal matrix whose entries are defined by

M = diag ( A )
In effect, using the Jacobi preconditioner amounts to dividing each equation and right hand side entry by the corresponding diagonal entry of the original coefficient matrix.

## Jordan Canonical Form

The Jordan canonical form of a matrix A is an upper bidiagonal matrix whose main diagonal contains the eigenvalues of A, and whose superdiagonal contains only zeroes or ones. The location of the ones in the superdiagonal is determined by the algebraic and geometric multiplicity of the eigenvalues.

Here is an example of a Jordan canonical form:

4 0 0 0 0
0 2 1 0 0
0 0 2 1 0
0 0 0 2 0
0 0 0 0 3

Every matrix is unitarily similar to its Jordan Canonical Form. That is, for any matrix A, there exists a unitary matrix U so that

A = U* * J * U
where J has Jordan canonical form. This form can also be regarded as a matrix factorization.

If A is real, but has complex eigenvalues, the matrix J has complex entries.

The Jordan canonical form is of little interest to computational linear algebraists. Unless exact arithmetic is used, it is extremely sensitive to small errors, and the information it provides can be computed more reliably in other ways.

## Kershaw Matrix

The Kershaw matrix is a simple 4 by 4 positive definite symmetric matrix, which is an example for which the symmetric version of the incomplete Cholesky factorization gives a negative pivot, causing the algorithm to break down.

The Kershaw matrix:

3  -2   0   2
-2   3  -2   0
0  -2   3  -2
2   0  -2   3

Reference:

1. David Kershaw,
The Incomplete Cholesky-Conjugate Gradient Method for the Iterative Solution of Systems of Linear Equations,
Journal of Computational Physics,
Volume 26, pages 43-65, 1978.

## Krylov Matrix

Given a square matrix A and some initial vector b, the first k elements of the sequence of Krylov vectors are:

b, Ab, A2b, A3b, ... Ak-1b;

the corresponding Krylov subspace is the subspace spanned by these vectors; the Krylov matrix is the matrix whose columns are formed by the Krylov vectors.

## L Matrix

An L matrix is matrix whose diagonal elements are strictly positive, and whose off-diagonal elements are nonpositive.

The identity matrix is a "trivial" example of an L matrix.

## L1 Matrix Norm

The L1 matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L1 vector norm.

Thus, the formal definition of the norm is

||A|| = max ( || A*x || / ||x|| )
where the vector norm used on the right hand side is the L1 vector norm, and the maximum is taken over all nonzero vectors x.

However, it is easy to show that the L1 matrix norm has a simpler formula: ||A|| = the maximum, over all matrix columns, of the sum of the absolute values of the entries in the column.

## L1 Vector Norm

The L1 vector norm is a vector norm defined as

||x|| = sum ( 1 <= I <= N ) |x(i)|.

The L1 vector norm is an example of a Hölder norm.

## L2 Matrix Norm

The L2 matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L2 vector norm.

Thus, the formal definition of the norm is

||A|| = max ( || A*x || / ||x|| )
where the vector norm used on the right hand side is the L2 vector norm, and the maximum is taken over all nonzero vectors x.

The L2 matrix norm has another formulation: ||A|| = the square root of the maximum absolute value of the eigenvalues of A' * A.

The computation of the L2 norm is expensive, and so it is often simpler to use the easily-computed Frobenius matrix norm, which is not vector-bound to the L2 vector norm, but is compatible with it.

## L2 Vector Norm

The L2 vector norm is a vector norm defined as

||x|| = sqrt ( sum ( 1 <= I <= N ) x(i)2 )

The L2 vector norm is also known as the Euclidean vector norm or the root-mean-square vector norm.

The L2 vector norm is an example of a Hölder norm.

## L Infinity Matrix Norm

The L Infinity matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L Infinity vector norm.

Thus, the formal definition of the norm is

||A|| = max ( || A*x || / ||x|| )
where the vector norm used on the right hand side is the L Infinity vector norm, and the maximum is taken over all nonzero vectors x.

However, it is easy to show that the L Infinity matrix norm has a simpler formula: ||A|| = the maximum, over all matrix rows, of the sum of the absolute values of the entries in the row.

## L Infinity Vector Norm

The L Infinity vector norm is a vector norm defined as

||x|| = max ( 1 <= I <= N ) |x(i)|.

The L Infinity vector norm is an example of a Hölder norm.

## The Lanczos Iteration

The Lanczos iteration is an iterative method that seeks to approximate some of the eigenvalues of a very large real symmetric (or Hermitian) matrix.

The Lanczos iteration is a version of the Arnoldi iteration that takes advantage of the special properties associated with a real symmetric or Hermitian matrix.

Consider the unitary similarity transform that converts the m by m symmetric or Hermitian matrix A into tridiagonal form:

A = Q T Q*

Let Qn indicate the m by n matrix formed from the first n columns of Q; similarly, Tn is the n by n submatrix of T. Then we write

Tn = Qn* An Qn

As we pass from n to n+1, we simply add a new column qn+1 to Qn, and to Tn we add a new diagonal value alphan+1 and an offdiagonal value betan. Thus, the iteration allows us to gradually build up the matrices Qn and Tn. For reasons not completely understood, even for n much smaller than m, the eigenvalues of Tn will tend to some of the eigenvalues of A, and typically the extreme ones.

The new vector qn+1 is a Krylov vector; actually, because of orthogonalization, it represents the new component introduced into the Krylov subspace by the product A*qn.

Thus, a typical procedure is to compute Tn and apply a standard eigenvalue procedure for tridiagonal matrices to extract its eigenvalues.

The entries of the matrix Tn are denoted by
 alpha1 beta1 beta1 alpha2 beta2 .... .... .... .... betan-2 alphan-1 betan-1 betan-1 alphan

An efficient form for the Lanczos iteration is:

beta0 = 0;
q(0) = 0;
Let b be an arbitrary initial vector;
Set q(1) = b / ||b||;
for n = 1, ...
v = A * q(n)
alpha(n) = q(n)' * v
v = v - beta(n-1) * q(n-1) - alpha(n) * q(n)
beta(n) = ||v||
q(n+1) = v / beta(n)
end for

## LAPACK

LAPACK is a set of linear algebra routines, intended as the replacement for LINPACK and EISPACK. It is a project sponsored by the Argonne National Laboratories and the Numerical Algorithms Group (NAG). The LAPACK routines are intended to achieve optimal performance on various machines by calling the level 3 BLAS, which use block methods to achieve high performance.

## LDL Factorization

The LDL factorization of a symmetric matrix is a decomposition of the form:

A = L * D * L',
involving a unit lower triangular matrix L, and a diagonal matrix.

The LDL factorization is a special case of the LU Factorization, in which we give up the option of Pivoting in order to get a very simple factorization. If the matrix A has a zero pivot, the factorization is still valid; we just can't guarantee that we can solve linear systems.

If the matrix A is actually positive definite, then we can get the even stronger Cholesky factorization.

## Linear Dependence

A set of M vectors, each of order N, is called linearly dependent if there is some linear combination of the vectors, with at least one nonzero coefficient, which equals the zero vector.

If the I-th vector is used as row I of an M by N matrix A, then this is equivalent to saying there is a nonzero vector C such that

A * C = 0
If no such combination is possible, then the vectors are linearly independent.

Simple facts:

• if any of the vectors is the zero vector, then the set is linearly dependent;
• if M is greater than N, the set is linearly dependent;
• if M = N, the vectors are linearly independent if and only if the matrix A is nonsingular.

## Linear Least Squares

A linear least squares problem seeks an "approximate solution" x to a linear system A * x = b, where x is to be chosen in such a way as to minimize the L2 or Euclidean norm of the residual error b - A * x.

A least squares approach is usually tried when the standard algorithms for solving the linear system, such as Gauss Elimination, are not suitable, perhaps because

• the matrix is square, but singular;
• the matrix is square, nonsingular, but with a high condition number;
• the matrix is rectangular, with more rows than columns (probably overdetermined);
• the matrix is rectangular, with more columns than rows ( underdetermined);

In the most common case, there are more equations than unknowns, so that A is actually a matrix with rectangular order of M rows by N columns. Then the system may or may not be consistent, and A itself may perhaps not have the fullest possible rank.

Despite the fact that this problem can't be solved by the usual means, it is still the case that:

1. a unique exact solution x may exist;
2. many exact solutions might exist;
3. there might be no exact solutions, but clearly some vectors x are "better" than others, in the sense that they produce a "smaller" residual error b - A * x.

By choosing to use the L2 vector norm, we are using a least-squares approach. Two advantages are that in case 2, there will always exist a unique solution of minimal norm, and in case 3, there are good algorithms for finding the unique approximate solution which has a residual of minimum norm.

If the columns of the coefficient matrix are linearly independent, then a simple approach to solving the problem is to form and solve the normal equations, although this incurs a significant penalty in terms of the condition number.

Other, superior, methods of solving this problem are preferred, usually via the QR factorization or the pseudoinverse. Such methods can also handle the case where the matrix does not have maximal rank, or in which the number of columns is greater than the number of rows.

## Linear Space

A linear space is a collection X of "vectors", a scalar field F, (usually the real or complex field), and the operations of vector addition and scalar multiplication, with the properties that:

• X includes the zero vector;
• if x is in X, then so is alpha * x, for any scalar alpha;
• if x and y are in X, then so is x + y.

Examples of linear spaces include:

• The set R^n of N dimensional vectors;
• the null space of a matrix;
• the eigenspace associated with an eigenvalue of a matrix, the set of all eigenvectors associated with a particular eigenvalue of a matrix (plus the 0 vector, which we usually don't count as an eigenvector);
• the set of all vectors perpendicular to a given vector.

A linear space has a dimension. The dimension of a linear space can be thought of as the smallest number of vectors from which the space can be reconstructed, or the cardinality of the smallest set of vectors that spans the space, or the cardinality of any basis for the space.

## Linear Transformation

A Linear Transformation is, formally, an operator A applied to elements x of some linear space X, with the properties that

• A ( r * x ) = r * A ( x ) for any x in X, and any real number r;
• A ( x1 + x2 ) = A ( x1 ) + A ( x2 ) for any x1 and x2 in X;

If X is a finite dimensional vector space with an orthonormal basis, then any element x can be represented by an N dimensional vector, and any linear transformation on X can be represented by a matrix. It is usually this matrix that people think of when they speak of a linear transformation.

Linear transformations are frequently used, for example, in computer graphics. It is interesting to note that for an arbitrary linear transformation matrix A, the Gram Schmidt factorization allows us to write A as the product of an orthogonal matrix Q and an upper triangular matrix R. If we "factor out" the diagonal entries of R, we then can view A as:

A = Q * D * S
where
• Q is an orthogonal matrix, and represents a set of rotations;
• D is a diagonal matrix, and represents a set of dilations;
• S is unit upper triangular, and represents a set of shears.

## LINPACK

LINPACK is a standard linear algebra package for factoring matrices, computing matrix determinants, condition numbers, and inverses, and for solving linear systems. Additional capabilities include least squares solutions, QR and singular value decomposition.

Many matrix storage modes are allowed, including dense, banded, symmetric, positive definite, symmetric banded, but LINPACK does not handle sparse matrices, nor does it employ iterative methods.

Here are the routines available for single precision computations. There is a related set available for complex matrices, with names which begin with C instead of S.

• SCHDC computes the Cholesky Factorization of a positive definite matrix in general storage.
• SCHDD "downdates" a Cholesky Factorization.
• SCHEX updates the Cholesky Factorization of a permuted matrix.
• SCHUD updates a Cholesky Factorization.
• SGBCO factors a general band matrix and estimates its condition number.
• SGBDI computes the determinant of a matrix factored by SGBCO or SGBFA.
• SGBFA factors a general band matrix.
• SGBSL solves a linear system involving a general band matrix.
• SGECO factors a general matrix and estimates its condition number.
• SGEDI gets determinant or inverse of a matrix factored by SGECO or SGEFA.
• SGEFA factors a general matrix.
• SGESL solves a linear system factored by SGECO or SGEFA.
• SGTSL solves a linear system, unfactored tridiagonal matrix.
• SPBCO factors a positive definite band matrix and estimates its condition number.
• SPBDI computes the determinant or inverse of a matrix factored by SPBCO or SPBFA.
• SPBFA factors a positive definite band matrix.
• SPBSL solves a linear system factored by SPBCO or SPBFA.
• SPOCO factors a positive definite matrix and estimates its condition number.
• SPODI computes the determinant or inverse of a matrix factored by SPOCO or SPOFA.
• SPOFA factors a positive definite matrix.
• SPOSL solves a linear system factored by SPOCO or SPOFA.
• SPPCO factors a positive definite packed matrix and estimates its condition number.
• SPPDI computes the determinant or inverse of a matrix factored by SPPCO or SPPFA.
• SPPFA factors a positive definite packed matrix.
• SPPSL solves a linear system factored by SPPCO or SPPFA.
• SPTSL solves a linear system for a positive definite tridiagonal matrix.
• SQRDC computes the QR factorization of a general matrix.
• SQRSL solves an overdetermined system, given QR factorization.
• SSICO factors a symmetric indefinite matrix and estimates its condition number.
• SSIDI computes the determinant or inverse of a matrix factored by SSICO or SSIFA.
• SSIFA factors a symmetric indefinite matrix.
• SSISL solves a linear system factored by SSIFA or SSICO.
• SSPCO factors a symmetric indefinite packed matrix and estimates its condition number.
• SSPDI computes the determinant or inverse of a matrix factored by SSPCO or SSPFA.
• SSPFA factors a symmetric indefinite packed matrix.
• SSPSL solves a linear system factored by SSPFA or SSPCO.
• SSVDC computes the singular value decomposition of a general matrix.
• STRCO estimates the condition number of a triangular matrix.
• STRDI finds the inverse or determinant of a triangular matrix.
• STRSL solves a triangular linear system.

## Lonesum Matrix

A lonesum matrix is a (possibly rectangular) zero one matrix with the property that every entry can be determined using only the list of row and column sums.

An example of a square lonesum matrix is:

0 1 1
0 0 1
1 1 1

because it is possible to write it, with its row and column sums, as:
0 1 1 | 2
0 0 1 | 1
1 1 1 | 3
------+
1 2 3

and then to hide the entries:
* * * | 2
* * * | 1
* * * | 3
------+
1 2 3

and to logically figure out what every entry must have been, using only the row and column sums and the fact that the entries are either 0 or 1.

## LU Factorization

The LU factorization of a matrix is a decomposition of the form:

A = P * L * U,
involving a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U.

P records the pivoting operations carried out in Gauss elimination, L the row multipliers used during elimination and U contains the pivot values and other information.

The factors are typically computed via Gauss elimination. Once the factors are computed, they may be used to

• solve one or more linear systems A*x=b;
• solve one or more transposed linear systems A'*x=b;
• compute the inverse;
• compute the determinant.

The LU factorization is generally computed only for a nonsingular square matrix, but the LU factorization exists and is useful even if the matrix is singular, or rectangular.

While the LU factorization is usually quite satisfactory, it is possible to prefer a factorization of the form

A = P * L * U * Q,
by using Gaussian elimination with complete pivoting, in which case Q is a permutation matrix selecting the variable (or column) to be eliminated at each step. Another rival factorization is the QR factorization
A = Q *R,
which is slightly more expensive, but has better stability and accuracy properties.

LAPACK and LINPACK include routines to compute the LU factorization of a matrix stored in a variety of formats. Note that the P, L and U factors themselves are scrambled and compressed in the data storage used by these routines, so that it's difficult to determine their actual values.

## M Matrix

An M matrix is a (real) (square) invertible matrix whose offdiagonal elements are nonpositive, and whose inverse is a nonnegative matrix.

An M matrix is also called a Minkowski matrix.

There are many definitions of an M matrix. Another one is that a matrix A is an M matrix if there exists a nonnegative matrix B, with a maximal eigenvalue r, such that

A = c * I - B
where c >= r. From this definition, it should be clear that an M matrix must have a nonnegative diagonal, and nonpositive offdiagonal.

Facts about an M matrix A:

• All of the eigenvalues of A have nonnegative real part; (and if a matrix has nonpositive offdiagonal elements, and all eigenvalues have nonnegative real part, it is an M matrix);
• Every real eigenvalue of A is nonnegative;
• Every principal submatrix of Ais an M matrix;
• Every principal minor matrix of A is nonnegative;
• The inverse of A is a nonnegative matrix.

A symmetric M matrix is called a Stieltjes matrix. A Stieltjes matrix is positive definite.

## Magic Square

A magic square can be regarded as a square matrix with an associated constant mu such that all row sums and column sums are equal to mu. In most cases the main and antidiagonal elements also sum to mu.

Most magic squares of order n are made from the consecutive integers from 1 to n2. In this case, it is common that an entry k and its "complement" n+1-k are located symmetrically with respect to the center of the matrix. Such a magic square is called regular.

Example:

1 14 15  4
12  7  6  9
8 11 10  5
13  2  3 16

When a magic square is regarded as a matrix, it may exhibit some interesting properties. For instance, if we let J be the Exchange matrix, then

A + J * A * J = 2 * ( mu / n ) * E
where E is the matrix all of whose entries are 1.

Because the entries of A are positive, the Perron Frobenius theorem guarantees that there is a positive eigenvalue, equal to the spectral radius of the matrix, and which is a simple eigenvalue (having algebraic multiplicity of 1), and an associated eigenvector all of whose entries are positive.

In fact, the dominant simple eigenvalue is mu, and the associated positive eigenvector is e=(1,1,...,1)'.

## Matrix Exponential

The matrix exponential of a square matrix A is a matrix B(A,t) = exp ( A * t ), which has properties similar to those of the exponential function of a scalar argument.

In particular:

• B(A,0) = I, the identity matrix;
• d B(A,t)/dt = A * B(A,t);
• Lambda is an eigenvalue of A if and only if exp ( Lambda * t ) is an eigenvalue of B(A,t).
• B(A,t) is never singular, for any values of A or t;
• B(A,t) = sum ( 0 <= I < Infinity ) ( A * t )I / I!;
• Inverse ( B(A,t) ) = B(A,-t);
• If A = M * J * Inverse(M) is the Jordan canonical form of A, then B(A,t) = M * exp ( J*t ) * Inverse(M). If J is diagonal, then exp(J*t) is a diagonal matrix whose entries are the exponentials of J(I,I)*t;

If the matrix A can be is orthogonally diagonalizable, so that

A = Q * Lambda * Q',
then, using the power series definition of B(A,t) results in:
B(A,t) = Q * sum ( 0 <= I < Infinity ) ( Lambda * t )I / I! * Q';
or
B(A,t) = Q * exp ( Lambda, t ) * Q',
where exp ( Lambda, t ) is the diagonal matrix whose I-th diagonal entry is exp(lambda(i)), the exponential of the I-th eigenvalue of A.

In the general case where A cannot be orthogonally diagonalized, or where the eigenvalues and eigenvectors cannot be reliably computed, the computation of the matrix exponential is difficult.

## Matrix Factorization

Matrix factorization is the process of rewriting a matrix A as a product of factors with certain special properties. Matrix factorization is a key technique in solving linear systems, determining eigenvalues, and many other tasks.

Useful matrix factorizations include:

## Matrix Multiplication

Matrix multiplication is the computation of the matrix-vector product A * x or the matrix-matrix product A * B, where A and B are (possibly rectangular) matrices, and x is a vector.

A matrix product is only meaningful when the factors are conformable. This is a condition on the dimensions of the factors. If we are computing A * x or A * B, then the column order of A must equal the order of x or the row order of B.

To multiply the L by M array A times the M by N array B, the product C will have L rows and N columns, and a typical entry Ci,j is:

Ci,j = sum ( K = 1 to M ) Ai,k * Bk,j.

Matrix multiplication is not commutative. A*B and B*A are not even of the same order unless both matrices are square. Even if they are square, the two products will general have different numerical values.

Matrix multiplication is associative: (A*B)*C=A*(B*C). While the result is not affected by the order of operations, the amount of work can change, if the matrices are of different orders. For products of a large number of matrices, techniques of dynamic programming are required to determine the most efficient order in which to carry out the operations.

As a computational task, matrix multiplication is relatively expensive. To multiply two matrices of order N takes roughly 2*N3 floating point operations, which is more expensive than Gauss elimination of a matrix of order N. On vector and parallel machines, it is important to write a multiplication algorithm carefully, to take advantage of the potential for speedup.

## Matrix Norm

A matrix norm is a scalar quantity, which may be thought of as a sort of "magnitude" of the matrix. The norm can be used to estimate the effect of multiplying the matrix times a vector, solving a linear system, or other matrix operations. The norm also is used in the analysis of error and convergence

A matrix norm ||*|| must satisfy the following properties:

• ||A|| > 0, unless A = 0 (in which case ||A|| = 0);
• || s * A || = |s| * ||A|| for any real number s;
• || A + B || <= ||A|| + ||B|| (triangle inequality);
• || A * B || <= ||A|| * ||B|| (submultiplicativity).

Matrix norms are most often needed when dealing with combinations of matrices and vectors. In such a case, it is important that the matrix norm and vector norm that are being used are compatible.

Any given vector norm can be used to derive a corresponding matrix norm, guaranteed to be compatible. This matrix norm is known as the vector-bound matrix norm.

Only if the matrix norm and vector norm are compatible can we write a useful bound like:

||A*x|| <= ||A|| * ||x||

Matrix norms include:

The spectral radius rho(A) is not a matrix norm, although it is often thought of that way.

## Matrix Order

The order of a square matrix is the number of rows and columns. Thus a matrix of order 5 is a square matrix with 5 rows and 5 columns.

The order of a rectangular matrix is described by giving both the number of rows and the number of columns. Thus a rectangular matrix might have order 5 by 4.

The order of a matrix refers to a property of the mathematical object, which does not depend on how the information is stored in a computer. The actual numbers representing the matrix may be stored in a rectangular two dimensional array, whose row and column lengths are equal to or greater than the mathematical orders, a rectangular array of lesser size (for band matrix storage, say), or even in a collection of several separate one dimensional arrays.

## Matrix Properties

Matrix properties are any features of a matrix which may be of use in choosing a storage scheme or algorithm, or analyzing convergence or error properties, but which are not immediately evident from the simple arrangement of zeroes in the matrix, or any symmetries among the nonzero elements.

Matrix properties include:

## Matrix Rank

The rank of a matrix is a measure of the linear independence of its rows and columns.

The row rank of a matrix of order M by N is the number of linearly independent rows in the matrix, while the column rank is the number of linearly independent columns. The row rank will be between 0 and M, the column rank between 0 and N. If the row rank is equal to M, the matrix is said to have maximal or full row rank; there are corresponding terms for column rank.

For a "square" matrix, of order N, the row and column ranks will be equal. A square matrix is nonsingular if and only if it has maximal row rank; in other words, if no row is a linear combination of the other rows, and similarly for columns. A square matrix with full rank has an inverse, a nonzero determinant, and Gauss elimination with pivoting can be used to solve linear systems involving the matrix.

Every singular matrix is "almost" nonsingular. That is, using any matrix norm you like, and no matter how small you specify the (positive) tolerance epsilon, there is a nonsingular matrix closer than epsilon to the singular matrix. Another way to look at this is to realize that it is always possible, by making tiny changes to the entries of a singular matrix, to turn it into a nonsingular matrix. (Consider the zero matrix; add epsilons to the diagonal entries, and it's nonsingular.) Thus, given the roundoff implicit in computation, the determination of matrix rank is not a reliable process.

If the rank of a matrix is actually desired, a reasonable method is to compute the QR factorization. Very small diagonal terms in the R factor may indicate linear dependence of the corresponding columns of the matrix. The singular value decomposition will also give this sort of information.

## Matrix Splitting

A matrix splitting is a decomposition of the system matrix:

A = M - N
in order to analyze the behavior of an iterative method for solving the linear system A*x = b. The M matrix is the multiplier of the next iterate, and the N matrix is the multiplier of the current iterate.

For instance, consider the Jacobi iteration. If we decompose the matrix into its strictly lower triangular, diagonal, and strictly upper triangular parts, we can write:

A = L + D + U
The Jacobi iteration can be written as:
D * xnew = - ( L + U ) * x - b
Hence, in the language of matrix splittings,
M = D
N = ( L + U )
.

The matrix splitting gives us a convenient way of expressing the iteration matrix, which is the multiplier of the current iterate in the (explicit) formula for the next one. In terms of the matrix splitting, this iteration matrix always has the form M-1 * N. For instance, for the Jacobi iteration, the explicit formula for xnew is:

xnew = - D-1 * ( L + U ) * x - D-1 * b
and so the iteration matrix is D-1 * ( L + U ).

## Matrix Storage

Matrix storage formats are the schemes for storing the values of a mathematical matrix into memory locations in a computer, or, in some cases, for writing this information to an external file.

A special storage format may be chosen because of the overall matrix structure of zero entries, or because of the matrix symmetry involving nonzero entries.

Common matrix storage formats include:

In addition, many linear programming problems are stored in files that are formatted according to the MPS Storage Format.

## Matrix Structure

Matrix structure is the classification of matrices according to the pattern of its zero entries.

If a matrix has a known structure, it may be possible to use a specialized storage scheme that takes less space, or an algorithm that can execute more quickly.

A matrix about which nothing is known, or which exhibits no special pattern, may be called full or dense or general.

Matrix structure patterns that can occur include:

## Matrix Symmetry

Matrix symmetry classifies certain common patterns that may relate the nonzero values of the matrix.

This classification occurs after the study of the basic matrix structure induced by the pattern of zero elements.

As with matrix structure, any matrix symmetries that are present may influence the choice of storage used for the matrix data, and the algorithms suitable to be applied to the matrix.

Matrix symmetries include:

## Minimal Polynomial

The minimal polynomial of a matrix A is the monic polynomial P(X) of least degree with the property that P(A)=0.

The Cayley-Hamilton theorem asserts that every matrix satisfies its own characteristic equation. In other words, the polynomial

P ( lambda ) = det ( A - lambda * I ),
which is zero when lambda is equal to any of the numbers which are eigenvalues of the matrix A, is equal to the zero MATRIX when lambda is replaced in the explicit formula for P(lambda) by A.

Thus, every matrix A of order N is guaranteed to be the root of a polynomial P(X) of degree N. Therefore the minimal polynomial of A is either the characteristic equation, or else some polynomial Q(X) of degree less than N. As a simple example, the characteristic polynomial of the identity matrix I is XN-1, but the minimal polynomial is X-1.

A matrix is diagonalizable if and only if its minimal polynomial consists of distinct linear factors.

The minimal polynomial of a matrix is used to define a derogatory matrix, which is important when considering the eigenvalues and eigenvectors of a matrix.

## Minor Matrix

A minor matrix is derived from a matrix A by removing some rows and columns. Usually both A and the minor matrix are square, and usually only one row and one column are removed from A at a time.

For instance, if A is

1 2 3
4 5 6
7 8 9

then the minor matrix derived by deleting row 2 and column 1 is
2 3
8 9

The principal minor matrices or principal minors of a square matrix of order N are a set of N matrices of orders 1, 2, ..., N; the M-th matrix has upper left entry A1,1 and lower right entry AM,M.

Simple facts involving minor matrices:

• the cofactor matrix of A is derived by replacing each element of A by determinants of minor matrices;
• the determinant of A can be computed by an expansion that involves determinants of minor matrices;
• If the determinants of all the leading principal minors of a matrix are positive, then the matrix is positive definite.

## Monic Polynomial

A monic polynomial is a polynomial whose leading coefficient is 1.

The leading coefficient, of course, is the coefficient of the highest power of X, (or whatever the independent variable happens to be). Thus, the polynomial X2+3*X+17 is monic, but X2+3*X3 is not.

The main reason for the idea of a monic polynomial is to be able to specify a unique polynomial with a given property. Thus, for instance, there are many polynomials of degree 2 which are zero at 1, 2, and 3, but they are all multiples of each other. In order to make a specific choice, we may specify that we mean the monic polynomial with these properties.

## The MPS Storage Format

The MPS data format is an industry standard for the description of a variety of linear programming problems.

Reference:

1. Bruce Murtagh,
Advanced Linear Programming: Computation and Practice,
McGraw-Hill, 1981.
2. J L Nazareth,
Computer Solution of Linear Programs,
Monographs on Numerical Analysis,
Oxford University Press, 1996.

An MPS file is organized into named sections, which appear in the following order:

1. NAME
2. ROWS
3. COLUMNS
4. RHS
5. RANGES (optional)
6. BOUNDS (optional)
7. ENDATA
Each section begins with a single line containing the name of the section in the Indicator field, and may be followed by a number of data lines, which have a very particular format.

An MPS file may also have comment lines. Such lines may occur anywhere in the file. They begin with an asterisk '*' in column 1. They may contain information intended to explain the problem, or other facts of interest to a human, and of no interest to a program parsing the file.

The data lines of an MPS file are divided into fields of a specific width:
Field 1Field 2Field 3 Field 4Field 5Field 6
Columns 2-35-1215-22 25-3640-4750-61
Contents IndicatorName 1Name 2 Value 1Name 3Value 2
Indicator and Name fields contain names of things; Value fields contain numeric values. The particular use of these fields depends on the section in which the data line occurs.

We now discuss in detail the format of the data lines that occur in each data section.

1. NAME: This section consists of just one card, with the word NAME in columns 1-4, and the title of the problem in columns 15-22.
2. ROWS: In this section all the row labels are defined, as well as the row type. The row type is entered in field 1 (in column 2 or 3) and the row label is entered in field 2 (columns 5-12).

The code for specifying row type is as follows:
TypeSymbolMeaning
= EEqual to.
< LLess than or equal to.
> GGreater than or equal to.
ObjectiveNObjective function.
Free NNo restriction imposed.

This section of data begins with a card with ROWS in columns 1-4, followed by a data card for each row. The first N-type row encountered is regarded as the objective function.

Linear combinations of rows may also be specified. In this case, the above row types are denoted respectively by the codes DE, DL, DG, and DN, in columns 2-3. Field 2 contains the linear combination rowname. Fields 3-6 contain the rowname(s) in fields 3 and 5, and their multiplier(s) in fields 4 and 6. A linear combination of three or more rows requires additional cards, following the first card contiguously. In the additional cards, field 1 is empty. The right-hand sides of a linear combination row must be specified in the RHS section, described below.

3. COLUMNS: This section defines the names of the variable, the coefficients of the objective, and all the nonzero matrix elements Ai,j. The data is entered column by column, and all the data cards for the nonzero entries in each column must be grouped contiguously. The section begins with a card with COLUMNS in columns 1-7, followed by data cards which may have one or two matrix elements per card.

The data card has the column label in field 2 (columns 5-12), the row label in field 3 (columns 15-22), and the value of the coefficient Ai,j or Cj in field 4 (columns 25-36), including a decimal point). If more than one nonzero row entry for the same column is to be made on the card, then field 5 (columns 40-47) has the next row label and field 6 (columns 50-61) has its corresponding coefficient value. The use of fields 5 and 6 is optional.

There is no need to specify columns for slack variables; this is taken care of automatically, based on the definitions of the row types.

4. RHS: This section contains the elements of the right-hand side. The section begins with a card with RHS in columns 1-3. Since the right-hand side can be regarded as another column of the matrix, the data cards specifying the nonzero entries are in exactly the same format as the COLUMNS data cards, except that field 2 (columns 5-12) has a label for the right-hand side. More than one right-hand side may thus be specified in this section.
5. RANGES: (optional). This section is for constraints of the form:
hi <= Ai,1x1 + Ai,2x2 + ... + Ai,nxn <= ui
that is, both an upper and lower bound are specified for the row. The range of the constraint is
ri = ui - hi
The value of ui or hi is specified in the RHS section data, and the value of ri is specified in the RANGES section data. This information, plus the row type specified in the ROWS section, defines the bounds ui and hi.

If bi is the number entered in the RHS section and ri is the number specified in the RANGES section, the ui and hi are defined as follows:
Row Symbol Meaning Sign of ri Lower limit hi Upper limit ui
G >= + or - bi bi+|ri|
L <= + or - bi-|ri| bi
E = + bi bi+|ri|
E = - bi-|ri| bi

The section begins with a card with RANGES in columns 1-6. The data cards specifying the values of ri are in exactly the same format as the COLUMNS data cards, except that field 2 (columns 5-12) has a label for the column of ranges (which can also be regarded as another column of the matrix). More than one column of ranges may be specified, but all the data cards for each column must be grouped contiguously.

6. BOUNDS: (optional). In this section, bounds on the variables are specified. The section begins with a card with BOUNDS in columns 1-6. The bounds are entered as a row, with a corresponding row label. The nonzero entries in this row vector correspond to columns in the matrix and must be in the same order in which the column names appear in the COLUMNS section. When bounds are not specified for a column, or if the entire BOUNDS section is omitted, the usual bounds are assumed,
0 <= xj < infinity

More than one bound for a particular variable may be entered, that is, both a lower and an upper bound; when only one bound is specified the other is assumed to be one of the default values of 0 or infinity as shown in parentheses below.

• Field 1 (columns 2-3) specifies the type of bound:
Symbol Meaning Form of the bound
LO lower bound bj <= xj ( < infinity )
UP upper bound ( 0 <= ) xj <= bj
FX fixed variable xj = bj
FR free variable the value of xj is not restricted.
MI lower bound is negative infinity -infinity <= xj ( <= 0 )
PL upper bound is +infinity (default) ( 0 <= ) xj < infinity
• Field 2 (columns 5-12) specifies the bounds row label. More than one bounds row may be entered, but the data must be grouped together contiguously for each bounds row.
• Field 3 (columns 15-22) specifies the column label (j) corresponding to the variable xj;
• Field 4 (columns 25-36) specifies the bound value bj;
• Note that the handling of the MI type is apparently not standardized. In some implementations, the upper bound is assumed to be +infinity, rather than 0 as in the table above. In such cases, the upper bound may be reset through the use of an UP bound elsewhere in the MPS file.
7. ENDATA: This section is just one card, with ENDATA in columns 1-6, signalling the end of input.

Example: the file afiro.mps.

NAME          AFIRO
ROWS
E  R09
E  R10
L  X05
L  X21
E  R12
E  R13
L  X17
L  X18
L  X19
L  X20
E  R19
E  R20
L  X27
L  X44
E  R22
E  R23
L  X40
L  X41
L  X42
L  X43
L  X45
L  X46
L  X47
L  X48
L  X49
L  X50
L  X51
N  COST
COLUMNS
X01       X48               .301   R09                -1.
X01       R10              -1.06   X05                 1.
X02       X21                -1.   R09                 1.
X02       COST               -.4
X03       X46                -1.   R09                 1.
X04       X50                 1.   R10                 1.
X06       X49               .301   R12                -1.
X06       R13              -1.06   X17                 1.
X07       X49               .313   R12                -1.
X07       R13              -1.06   X18                 1.
X08       X49               .313   R12                -1.
X08       R13               -.96   X19                 1.
X09       X49               .326   R12                -1.
X09       R13               -.86   X20                 1.
X10       X45              2.364   X17                -1.
X11       X45              2.386   X18                -1.
X12       X45              2.408   X19                -1.
X13       X45              2.429   X20                -1.
X14       X21                1.4   R12                 1.
X14       COST              -.32
X15       X47                -1.   R12                 1.
X16       X51                 1.   R13                 1.
X22       X46               .109   R19                -1.
X22       R20               -.43   X27                 1.
X23       X44                -1.   R19                 1.
X23       COST               -.6
X24       X48                -1.   R19                 1.
X25       X45                -1.   R19                 1.
X26       X50                 1.   R20                 1.
X28       X47               .109   R22               -.43
X28       R23                 1.   X40                 1.
X29       X47               .108   R22               -.43
X29       R23                 1.   X41                 1.
X30       X47               .108   R22               -.39
X30       R23                 1.   X42                 1.
X31       X47               .107   R22               -.37
X31       R23                 1.   X43                 1.
X32       X45              2.191   X40                -1.
X33       X45              2.219   X41                -1.
X34       X45              2.249   X42                -1.
X35       X45              2.279   X43                -1.
X36       X44                1.4   R23                -1.
X36       COST              -.48
X37       X49                -1.   R23                 1.
X38       X51                 1.   R22                 1.
X39       R23                 1.   COST               10.
RHS
B         X50               310.   X51               300.
B         X05                80.   X17                80.
B         X27               500.   R23                44.
B         X40               500.
ENDATA

## Multiplicity

The algebraic multiplicity of a root lambda of a polynomial equation p(x)=0 is the "number of times lambda is a root".

More precisely, the algebraic multiplicity is the exponent ma of the factor (x-lambda) in the factorization of p(x). A root is known as a simple root if it has algebraic multiplicity 1; otherwise it is a repeated root.

Eigenvalues are defined as the roots of the characteristic equation of a matrix, and so an eigenvalue has an algebraic multiplicity. The behavior of the eigenvalue problem depends in part on the multiplicity of the eigenvalues. In particular:

• Every (distinct) eigenvalue has a corresponding linearly independent eigenvector;
• If all eigenvalues are simple (ma=1), then the matrix is guaranteed to have a complete set of n linearly independent eigenvectors;
• the number of linearly independent eigenvectors for a given eigenvalue, sybmolized as mg, is known as the geometric multiplicity of the eigenvalue, and represents the dimension of the linear subspace spanned by those eigenvalues;
• If an eigenvalue has algebraic multiplicity ma, it must be the case that 1 <= mg <= ma;
• If, for at least one eigenvalue, mg < ma, then the matrix does not have a complete set of eigenvectors, and is termed defective.

For example, for the matrix

7 1
0 7

7 is an eigenvalue of algebraic muliplicity 2, but there is only a single eigenvector x = ( 0, 1 ), so 7 has a geometric multiplicity of 1.

## The Neumann Series

The Neumann series for a square matrix A is the infinite sum

I + A + A2 + A3 + A4 + ...

If the spectral radius of the matrix A satisfies rho(A) < 1, then I-A is nonsingular, and the Neumann series converges to (I-A)-1.

Conversely, if the Neumann series converges, then the spectral radius of A must be less than 1.

## Nilpotent Matrix

A nilpotent matrix is one for which the square, cube, or some finite power equals zero. For instance, any strictly lower triangular matrix is nilpotent.

Consider the following matrix A and its powers:

0 0 0          0 0 0          0 0 0
A    = 2 0 0   A**2 = 0 0 0   A**3 = 0 0 0
3 4 0          8 0 0          0 0 0

Simple facts about a nilpotent matrix A:

• the lowest power of A which equals 0 must be N or less, where N is the order of the matrix;
• A is singular;
• Every eigenvalue of A is zero.

## Nonnegative Matrix

A nonnegative matrix A has only nonnegative entries; that is, for all indices I and J,

Ai,j >= 0.

Similar terms, with obvious definitions, include matrices that are positive, negative and nonpositive. It's easy to check if a matrix is nonnegative; this is much simpler than checking whether a matrix is positive definite. The expression A >= 0 is sometimes used to express the notion that the matrix A is nonnegative.

Facts about a nonnegative matrix A:

• if x is nonnegative, then the vector A * x is nonnegative;
• every nonnegative power AI is a nonnegative matrix.
• if A is tridiagonal, all the eigenvalues are real;
• A has at least one real, positive, eigenvalue, and a corresponding eigenvector with nonnegative entries;
• if A is also irreducible, then much more is known about its eigensystem. See the Perron Frobenius Theorem.

## Normal Equations

The normal equations for an M by N rectangular linear system

A * x = b
are computed by multiplying both sides by the transpose matrix:
A' * A * x = A' * b.

In the case where N < M, and the N columns of A are linearly independent, the matrix

B = A' * A
will be an invertible symmetric N by N matrix, and the normal equations can be solved by Gauss elimination. The matrix B is sometimes called the Gram matrix.

However, this method of producing an approximate solution of an overdetermined (and usually inconsistent) linear system is usually not recommended. The matrix B has a condition number that is the square of the condition number of A. Hence we may expect our solution to have a considerable error component. We can do better. There are more accurate methods which employ the QR factorization or the singular value decomposition.

## Normal Matrix

A normal matrix A is a real matrix that commutes with its transpose:

A * A' = A' * A;
or a complex matrix that commutes with its transpose conjugate:
A * AH = AH * A.

Why define such an odd property? Because it's easy to check, and it turns out that a matrix is normal if and only if it is unitarily similar to a diagonal matrix. That's the same as saying the matrix is not only diagonalizable (similar to a diagonal matrix), but has a complete set of orthonormal eigenvectors.

A matrix is automatically normal if it is:

Note that an upper or lower triangular matrix is not normal, (unless it is actually diagonal!); such a matrix may have a complete set of eigenvectors (for example, if the eigenvalues are distinct), but the eigenvectors cannot be orthonormal.

## Null Space

The null space of a matrix A is the set of all null vectors x such that

A * x = 0.

Simple facts:

• The null space is a linear space;
• A square matrix is nonsingular if and only if its null space is exactly the zero vector;
• If x is a solution of A * x = b, then so is ( x + y ), where y is any vector in the null space of A; linear solutions are only guaranteed to exist and to be unique when the null space is 0;
• The eigenspace corresponding to an eigenvalue lambda is the null space of ( A - lambda * I ).

## Null Vector

A null vector of a matrix A is a non-zero vector x with the property that A * x = 0.

• A nonzero multiple of a null vector is also a null vector;
• The sum of two null vectors is a null vector;
• The set of all null vectors of a matrix forms the null space of the matrix;
• If x solves A * x = b, and y is a null vector of A, then x + y is another solution, that is, A * ( x + y ) = b;
• A square matrix A is singular if and only if it has a null vector;
• For a square matrix A, a null vector is an eigenvector for the eigenvalue 0.

## Ones Matrix

The ones matrix is a matrix all of whose entries are one.

A square ones matrix of order 2 or greater is singular.

A ones matrix, which can be easily defined in Matlab by

A = ones ( m, n )
is often a convenient tool for various algorithms or matrix definitions.

## Order of a Matrix

When we speak of the order of a matrix, it is usually understood that the matrix is square, with N rows and N columns, and the order is simply the number N.

If the matrix is rectangular, having M rows and N columns, it is less common to speak of its order, but in this case, the order would be the pair of values, that is, (M,N).

## Orthogonal Matrix

An orthogonal matrix A is a square, invertible matrix for which it is true that:

A' = A-1
which implies:
A' * A = A * A' = I.

Facts about an orthogonal matrix A:

• Every row and column of A has unit Euclidean norm;
• Distinct columns (or rows) of A are orthogonal vectors;
• The action of A on a vector is like a rotation or reflection;
• For any vector x, ||A*x||2=||x||2;
• The determinant of A is +1 or -1;
• All eigenvalues of A have unit magnitude;
• The singular values of A are all equal to 1;
• The eigenvectors of A have unit length in the Euclidean norm;
• The product of two orthogonal matrices is an orthogonal matrix;

The QR factorization of a rectangular M by N matrix A, with M>N has two forms:

• Q is M by N, R is N by N;
• Q is M by M, R is M by N;
For both factorizations, it is common to refer to the matrix Q as "orthogonal", but it is important to realize that only a square matrix can be orthogonal. In the first factorization, the rectangular matrix Q has columns that are of unit length and pairwise orthogonal, and it is true that Q'*Q=I, but the rows are generally not of unit length, nor pairwise orthogonal, nor is it true that Q*Q'=I.

In complex arithmetic, the corresponding concept is a unitary matrix.

## Orthogonal Similarity Transformation

An orthogonal similarity transformation is a similarity relationship between two real matrices A and B, carried out by an orthogonal matrix U, of the form:

A = U-1 * B * U = U' * B * U.

A and B are said to be orthogonally similar.

Orthogonal transformations are very common, particularly in eigenvalue computations. A general matrix A may be reduced to upper Hessenberg form by such transformations, so that Q*A=B. The nice thing about this is that if we later wish to reverse this transformation, the inverse of Q, because it is just the transpose of Q! This means that orthogonal transformations are very easy to apply and invert.

Another nice feature of orthogonal transformations is that they may be built up gradually as the product of a series of Householder matrices or Givens rotation matrices.

## Orthogonal Vectors

Two vectors u and v are orthogonal if their dot product is 0:

u' * v = sum ( 1 <= i <= m ) ui * vi = 0,
Of course, we are assuming here that both vectors have the same spatial dimension.

A set of n vectors V is pairwise orthogonal if it is true that every pair of vectors in the set is orthogonal. If an m by n matrix A is formed using pairwise orthogonal vectors for its columns, then the product matrix A'*A is diagonal, with the diagonal entries being the squares of the L2 vector norms of each vector.

Note that, from the definition, the zero vector is orthogonal to every vector, even to itself! Hence, in some cases, it is necessary to phrase a statement as "Let u and v be two nonzero orthogonal vectors..."

## Orthonormal Vectors

A set of orthonormal vectors is a collection of M vectors X(I), each of order N, each of length 1 in the Euclidean norm:

X(I)' * X(I) = ( X(I), X(I) ) = 1,
and pairwise orthogonal, so that if I and J are distinct:
X(I)' * X(J) = ( X(I), X(J) ) = 0.

Given any set of vectors, Gram Schmidt othogonalization or the QR factorization can produce an orthonormal set of vectors, possibly fewer in number, that form a basis for the linear space that is spanned by the original set.

## Outer Product

An outer product of two vectors x of dimension M and y of dimension N is a matrix A of order M by N whose entries are defined by:

A = x * y',
or, entrywise:
Ai,j = xi * yj

A matrix defined as the outer product of two vectors will usually have rank equal to 1. The only other possibility is that the matrix might actually have rank 0. If an outer product of two vectors is added to a matrix, this operation is known as a rank one update.

## Overdetermined System

An overdetermined linear system A * x = b is, loosely speaking, a set of M linear equations in N variables, with M > N.

Actually, it is possible for many of the equations to be redundant, so that what looks formally like an overdetermined system could "actually" be determined or underdetermined, if we'd just throw away the redundant equations. However, just recognizing and removing the redundancy is a difficult task, except for small problems and exact arithmetic. Thus, we will simply use the term "overdetermined" for any situation in which there are more equations than variables.

Except in special artificial cases, an overdetermined linear system has no solution. In such a case, we might be interested in an approximate solution which satisfies as many equations exactly as possible, which we might find simply by using pivoting to choose the equations to be satisfied, or a solution x which minimizes the residual norm A*x-b, which we might find using the QR factorization.

## P Matrix

A P matrix is a square matrix with the property that every principal minor matrix has a (strictly) positive determinant.

A P0 matrix is a square matrix with the property that every principal minor matrix has a determinant that is positive or zero.

A nonsingular M matrix is guaranteed to be a P matrix.

A matrix which is both a P matrix and a Z matrix is guaranteed to be a nonsingular M matrix.

## Permanent of a Matrix

The permanent of a square matrix is a scalar value defined in a way similar to the determinant.

An explicit formula for the permanent of a matrix A is:

permanent ( A ) = sum [ over all P ] A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).
where the sum ranges over all possible permutations P of the numbers 1 through N. This differs from the definition of the determinant only in that the sign of the permutation is not taken into account.

For example, suppose A is

1 2 3
4 5 6
7 8 9

Then the permanent of A is

permanent(A) = 1 * 5 * 9 + 2 * 6 * 7 + 3 * 4 * 8  (diagonals)
+ 1 * 6 * 8 + 2 * 4 * 9 + 3 * 5 * 7  (antidiagonals)

The trick of doing all diagonals and antidiagonals works for the 3 by 3 case, but no other. When going to the 4 by 4 case, for instance, you would do best by expanding along a row, and computing the permanents of the 3 by 3 minors, similar to the way that determinants of 4 by 4 or 5 by 5 matrices can be computed by (patient) hand.

Determinants are preserved by elementary row or column operations, but permanents are not. Thus, to compute a determinant efficiently, it is actually cheaper to carry out Gauss elimination to the point where we have triangularized the matrix. Then the product of the diagonal elements, times the sign of the permutation, gives us the determinant. But for permanents, this option is not available!

## Permutation Matrix

A permutation matrix is a square matrix for which all the entries are 0 or 1, with the value 1 occuring exactly once in each row and column. Such a matrix, when premultiplying or postmultiplying another matrix, will simply permute the rows or columns of that matrix.

For example, the following is a permutation matrix:

0 1 0
0 0 1
1 0 0

If A is the matrix
11 12 13
21 22 23
31 32 33

then P*A permutes the rows of A:
21 22 23
32 32 33
11 12 13

while A*P permutes the columns:
13 11 12
23 21 22
33 31 32

Simple facts about a permutation matrix P:

• P has a determinant of +1 or -1.
• P' = P-1.
• A permutation matrix that interchanges just two indices is an elementary matrix.

The use of pivoting during Gauss elimination means that along with the LU factors of A there is also a permutation matrix factor P:

P * L * U = A.

Similarly, column pivoting may be used during the QR factorization of a matrix, and in that case, the actual factorization is not A=Q*R, but A=Q*R*P, for some permutation matrix P.

## Perron-Frobenius Theorem

The Perron-Frobenius Theorem tells us a great deal about the largest eigenvalue of a positive or nonnegative matrix.

The Perron-Frobenius Theorem for a positive matrix: If the square matrix A is positive, then

• the spectral radius rho(A) > 0;
• there is an eigenvalue lambda equal to rho(A);
• there is a corresponding positive eigenvector x, normalized so that the sum of its entries is one, called the "Perron vector", so that A * x = lambda * x = rho(A) * x;
• lambda = rho(A) is an algebraically and geometrically simple eigenvalue of A;
• all other eigenvalues of A are strictly smaller in magnitude than lambda;
• [A/rho(A)]m -> L > 0 as m -> infinity, where L=x*y', with x the (right) eigenvector of lambda, and y a left eigenvector of lambda, with the properties that y > 0 and x'*y=1.

The Perron-Frobenius Theorem for a nonnegative matrix: If the nontrivial matrix A is nonnegative, then

• there is an eigenvalue lambda equal to rho(A);
• there is a corresponding nonnegative (and nonzero) eigenvector x, so that A * x = lambda * x = rho(A) * x;

The Perron-Frobenius Theorem for a nonnegative irreducible matrix: If the nontrivial matrix A is nonnegative and irreducible, then in addition:

• the spectral radius rho(A) > 0;
• there is an eigenvalue lambda equal to rho(A);
• there is a corresponding positive eigenvector x, so that A * x = lambda * x = rho(A) * x;
• lambda = rho(A) is an algebraically and geometrically simple eigenvalue of A;

The Perron-Frobenius Theorem for a nonnegative irreducible primitive matrix: If the nontrivial matrix A is nonnegative and irreducible and primitive, then in addition:

• [A/rho(A)]m -> L > 0 as m -> infinity, where L=x*y', with x the (right) eigenvector of lambda, and y a left eigenvector of lambda, with the properties that y > 0 and x'*y=1.

## Persymmetric Matrix

A persymmetric matrix A is a square matrix whose values are "reflected" across its main anti-diagonal, that is, for all I and J:

Ai,j = A(N+1-J,N+1-I).

Here is an example of a persymmetric matrix:

4 3 2 1
7 6 5 2
9 8 6 3
10 9 7 4

## Pivoting

Pivoting is the attempt to improve the accuracy of a linear algebra algorithm by choosing the most suitable column or row for use during a single step.

The most common instance of pivoting occurs in Gauss elimination. During the first step, we wish to eliminate all the entries in column one except for one entry. We will do this by adding carefully chosen multiples of row one to the other rows. For example, if the first two rows were

2 3 5 8
6 8 9 2

we could add -3 times row 1 to row 2, and thus zero out the "6" in the first column of row 2. This scheme would not work, however, if the first entry in the first row were zero, because no multiple of zero can be anything but zero.

Thus, we allow ourselves to interchange the first row with some other row which has a nonzero entry in the first column. But since we're going to interchange rows anyway, it turns out that best accuracy occurs if we choose to bring in the entry with the largest absolute value. Such a scheme, which considers each column in order, and uses the largest entry in the row as the pivot, is called partial pivoting. This form of pivoting is used in most linear algebra software.

It can be shown that Gauss elimination can be carried out without pivoting if each of the principal minors of the matrix is nonsingular. If the matrix is positive definite, then this condition is guaranteed, and it is common to factor the matrix without pivoting.

A complete pivoting or full pivoting scheme would search for the entry of largest magnitude anywhere in the uneliminated portion of the matrix, and use that entry as the next pivot. Such a scheme requires a lot more searching, and auxiliary storage of the row and column numbers of the pivots. It is little used, since it does not seem to bring a great improvement in accuracy.

QR factorization can also use pivoting. At step I, columns I through N are examined. The column with the greatest norm is interchanged with column I, and then the QR operations for that step are carried out. In this case as well, pivoting is done to try to ensure stability.

## Polar Decomposition

The polar decomposition of any matrix A has the form

A = P * Q
where P is positive semidefinite and has the same rank as A, and Q is unitary.

The polar decomposition of a matrix can be determined from its singular value decomposition:

A = U * D * V'
= ( U * D * U' ) * ( U * V' )
= P * Q
where we have
P = U * D * U'
Q = U * V'

## Positive definite matrix

A (complex) matrix A is positive definite if it is true that for every nonzero (complex) vector x, the product:

x* * A * x > 0,
where, in particular, we are requiring that this product be a real number.

A real matrix A is restricted positive definite if it is true that for every nonzero real vector x:

x' * A * x > 0.
If a real matrix is positive definite, then it is restricted positive definite.

Every complex positive definite matrix is guaranteed to be Hermitian. Thus, the phrase positive definite Hermitian is, strictly speaking, redundant. However, a restricted positive definite matrix is not guaranteed to be symmetric! This can happen because we only use real vectors in the test product.

Here is a simple example of a restricted positive definite matrix which is not symmetric:

1  1
-1  1

because x' * A * x = x12+x22

Whenever real positive definite matrices occur in practical applications, however, they are assumed or required to be symmetric. If necessary, one can decompose a positive definite matrix in the restricted sense into its symmetric and antisymmetric parts. The symmetric part will actually be (fully) positive definite.

Simple facts about a positive definite matrix A:

• A is nonsingular;
• The inverse of A is positive definite;
• every eigenvalue of A is positive;
• Gauss elimination can be performed on A without pivoting;
• if A is symmetric as well, it has a Cholesky factorization A = L * L', where L is lower triangular.

A matrix about which no such information is known is called indefinite. A matrix for which the product is always nonnegative is sometimes called positive indefinite or positive semidefinite. There are similar definitions of negative definite and negative indefinite.

## Positive Matrix

A positive matrix A has only positive entries; that is, for all indices I and J,

Ai,j > 0.

Similar terms, with obvious definitions, include matrices that are nonnegative, negative and nonpositive. It's easy to check if a matrix is positive; this is much simpler than checking whether a matrix is positive definite.

Simple facts about a positive matrix A:

## The Power Method

The power method is a simple iterative algorithm for computing the eigenvalue of largest magnitude in a matrix, and its corresponding eigenvector.

The algorithm for an arbitrary matrix A works as follows:

• Initialize the vector x arbitrarily;
• Divide each entry of x by ||x||.
• Compute xnew = A * x.
• Compute lambda = ||xnew|| / ||x||.
• If the estimated eigenvalue has changed significantly since the last iteration, then replace x by xnew, and repeat the preceding steps.

The speed of convergence of the algorithm depends largely on the ratio between the magnitude of the largest and the second largest eigenvalues. If the second largest eigenvalue is close in magnitude to the largest, convergence will be slow. If A has several distinct eigenvalues of the same magnitude, say -2 and 2, or 5, 3+4i and 3-4i, then the algorithm will fail.

While this algorithm is very easy to program, it is difficult to adapt it to the case where other eigenvalues are desired. The use of deflation is a possibility, but still forces the user to compute the eigenvalues one at a time, and in order of size, making it rather difficult to find, say, just the smallest eigenvalue.

The inverse power method, a generalization of the power method, has the advantage of being able to find the other eigenvalues of the matrix.

## Preconditioner

A preconditioner is a matrix M most commonly used to improve the performance of an iterative method for linear equations.

The simplest preconditioning is done using a left preconditioner. The original linear system is left-multiplied by the inverse of the preconditioning matrix, resulting in the system

M-1*A*x=M-1*b
For more complicated preconditioning, both left and right preconditioners, M1 and M2 may be used, so that the system is transformed to
M1-1*A*M2-1*M2*x=M-1*b
A left and right preconditioning like this can be used if the original matrix is symmetric, and it is desired to preserve this symmetry in the transformed system.

The convergence of the iterative scheme depends on properties of the system matrix. A suitable choice of a preconditioner can mean that the transformed problem converges much more rapidly than the original one.

Although the definition of the preconditioner matrix suggests that we have to compute its inverse, it is usually the case that we have a factorization of the preconditioner which allows us to solve linear systems. Any computations that formally involve the inverse can therefore be replaced by linear system solves.

Desirable properties of a preconditioner matrix include:

• M should approximate A in some way;
• It should be significantly easier to solve linear systems involving M than A (otherwise the problem has just gotten harder);
• M should not require much more storage space than A.

Examples of preconditioners include

## Primitive Matrix

A primitive matrix is a (square) matrix which is irreducible and which has only one eigenvalue of maximum modulus.

Note that we are not requiring that the eigenvalue be simple (that is, of multiplicity 1). Rather, we are requiring that there not be two distinct eigenvalues of the same modulus.

The Perron Frobenius Theorem applies to positive matrices. It is possible to extend most of the Perron-Frobenius theorem to a nonnegative matrix, if it is assumed to be a primitive matrix.

## Projection Matrix

A matrix A is a projection matrix if it is idempotent and hermitian (or symmetric in the real case).

A projection matrix is sometimes referred to as an orthogonal projection matrix, but in this case, the matrix is not generally what is termed an orthogonal matrix; rather, what is being referred to is the fact that A may be used to determine the projection of a vector x onto the orthogonal complement of a given subspace. Thus, the vectors A*x and (I-A)*x comprise a decomposition of x into two orthogonal components, one lying in the space spanned by the columns of A, and one in the space orthogonal to that one.

Simple facts about a projection matrix A:

• I - A is also a projection matrix;
• A * A * x = A * x (idempotent);
• A may be written as A = sum ( 1 <= i <= k ) vi viH, where the vi are an orthonormal set of vectors that span the range of A.

Given a set of k linearly independent vectors vi, with F=[v1, v2,...,vk] then the projection onto the space spanned by these vectors is

A = F * inverse ( FH * F ) * FH

The previous statement be rephrased as follows: Given an m by n rectangular matrix F, with m>=n, with maximal column rank (that is, the columns are linearly independent), then the matrix

A = F * inverse ( FH * F ) * FH
can be used to determine the projection of any vector onto the column space of F.

If, in fact, the columns of F happen to be of unit L2 norm and pairwise orthogonal, (that is, the columns are an orthonormal set of vectors), then FH * F = I, so the formula for the projector simplifies to:

A = F * FH
This particular formulation often occurs when a QR factorization has been used to extract an orthonormal basis for the column space.

Note, in particular, that it must be the case that A * F = F. When solving the overdetermined linear system,

F * x = b
it is natural to consider the related projected linear system
A * F * y = F * y = A * b
which can be solved exactly, because A*b lies in the column space of F. Moreover, this solution y is the best solution to the original problem F * x = b, in the least squares sense, that is, it achieves the residual of minimum L2 norm. The matrix A is sometimes called the hat matrix for matrix F. It is often used in statistics; in some cases, the diagonal entries of A are of considerable interest.

In the simple case where we are given one (nonzero) vector v, then F=v, and the projection onto the space spanned by v is

A = v * inverse ( v' * v ) * v' = v * v' / ||v||2
using the l2 norm; hence the matrix that projects a vector into the orthogonal complement of v is is
I - A = I - v * v' / ||v||2,
or, if v is already of unit l2 norm, simply I-v*v'.

## Property A

A matrix A has property A if the indices 1 to N can be divided into two sets S and T so that, for any Ai,j which is not zero, it must be the case that:

• I = J, or
• I is in S and J is in T, or
• I is in T and J is in S.

This is equivalent to saying that, by listing the S rows and columns first, the matrix can be rewritten in the block form:

| D1  F  |
| G   D2 |

where D1 and D2 are diagonal.

## The Pseudoinverse

The pseudoinverse is a generalization of the idea of the inverse matrix, for cases where the standard inverse matrix cannot be applied. Such cases include matrices A which are singular, or rectangular.

The pseudoinverse is sometimes called the Moore Penrose inverse or the generalized inverse.

The pseudoinverse of an M by N rectangular matrix A is defined as the unique matrix N by M matrix A+ which satisfies the four conditions:

A * A+ * A = A
A+ * A * A+ = A+
(A * A+)' = A * A+
(A+ * A)' = A+ * A
Note that if A is a square invertible matrix, then the pseudo inverse is actually the inverse.

The pseudoinverse can be used in a way similar to the way an inverse is used. For instance, given the rectangular set of linear equations

A * x = y
a "solution" can be computed as:
x = A+ * y.
If the equations form a consistent system, then x will actually satisfy the equations. Otherwise, x will be a "best possible" solution, in the sense that it minimizes the Euclidean norm of the residual error.

The pseudoinverse can be computed from the information contained in the singular value decomposition, which has the form:

A = U * S * V'
where
• A is an M by N rectangular matrix,
• U is an M by M orthogonal matrix,
• S is an M by N diagonal matrix,
• V is an N by N orthogonal matrix.
The formula for the pseudoinverse of A is then:
A+ = V * S^ * U'
where S^ is an M by N diagonal matrix, whose diagonal entries are the inverse of the diagonal entries of S, except that where S has zero entries, so does S^.

## QR Factorization

The QR factorization factors a matrix A into an orthogonal matrix Q and an upper triangular matrix R, so that

A = Q * R.
The factorization can also be applied to rectangular matrices, in which case one of the factors is no longer square.

The QR factorization can be useful for solving the full variety of linear systems, whether nonsingular, under-determined, over-determined or ill conditioned. It can be used to carry out the Gram Schmidt orthogonalization of a set of vectors constituting the columns of A. The QR factorization is also used repeatedly in an iterative solution of eigenvalue problems.

The QR factorization can be produced incrementally, by a series of transformations involving Householder matrices or Givens rotation matrices

As an example of QR factorization, the matrix A:

1 1 0
1 0 1
0 1 1

can be factored into the orthogonal matrix Q:
SQRT(1/2)  SQRT(1/6) -SQRT(1/3)
SQRT(1/2) -SQRT(1/6)  SQRT(1/3)
0       SQRT(2/3)  SQRT(1/3)

and the upper triangular matrix R:
SQRT(2) SQRT(1/2) SQRT(1/2)
0    SQRT(3/2) SQRT(1/6)
0       0      SQRT(4/3)

LAPACK and LINPACK include routines for computing the QR factorization.

## QR Method for Eigenvalues

The QR method is used to find the eigenvalues of a square matrix.

For efficiency, the method begins by transforming the matrix A using Householder matrices until we have determined an upper Hessenberg matrix A1 which is orthogonally similar to A.

The QR factorization of A1 is then computed, A1 = Q1*R1. Then the factors Q1 and R1 are reversed, to compute a new matrix A2 = R1*Q1. A2 itself can be QR factored into A2 = Q2*R2, and then reversal of factors results in a matrix A3, and so on.

Each matrix A1, A2, and so on is orthogonally similar to A, and so shares the same eigenvalues. The sequence of matrices A, A1, A2, A3, ... will generally converge to a matrix B of a very simple type: B will consist entirely of diagonal entries and two by two blocks. The diagonal entries are the real eigenvalues, and the eigenvalues of the 2 by 2 blocks are the complex eigenvalues.

In the particular case where the eigenvalues are all real, (and hence B is diagonal), and if the orthogonal transformations have been accumulated along the way, the result can be written:

A = Q * B * Q'
which is an orthogonal similarity transformation of A into a diagonal matrix, or
A * Q = Q * B
which means that the columns of Q are the eigenvectors.

If the matrix B is not diagonal, then the pair of columns corresponding to any 2 by 2 block can be used to construct the pair of corresponding complex eigenvectors. Alternatively, once the eigenvalues have been determined, the inverse power method can be used to compute any particular eigenvector.

## Quaternion Representation

Quaternions, discovered by William Hamilton, have the form a+bi+cj+dk, where i, j and k are "special" quantities.

The properties of "1" and the 3 special quantities are best displayed in a multiplication table:
1ijk
11ijk
ii-1k-j
jj-k-1i
kkj-i-1

It is possible to devise matrices that behave like quaternions. Let the value "1" be represented by the identity matrix of order 4, and the value "i" be represented by

0  1  0  0
-1  0  0  0
0  0  0 -1
0  0  1  0

the value "j" by
0  0  1  0
0  0  0  1
-1  0  0  0
0 -1  0  0

the value "k" by
0  0  0  1
0  0 -1  0
0  1  0  0
-1  0  0  0

Then it is easy to show that these matrices, and linear combinations of them, obey the rules of quaternions.

## Rank One Matrix

A rank one matrix is simply a matrix with rank one. The simplest example would be any matrix with a single nonzero entry. Another simple example is any matrix whose columns (or rows) are all equal (and not entirely zero). Finally, any matrix whose colums (or rows) are all a multiple of the first column (or row) is a rank one matrix.

These matrices are not simply a peculiarity. They can be represented in an interesting way. An M by N rank one matrix A must have the representation

A = u * v'
where u is an M by 1 vector and v is an N by 1 vector. This linear algebraic representation means that rank one matrices can be used to analyze a number of interesting structures, including the singular value decomposition.

## Rank One Update

A rank one update to a matrix is a simple but useful perturbation whose effects can be calculated.

If we suppose that we have an m by n matrix A, then a rank one update is accomplished by adding the outer product of an m vector u and an n vector v:

B = A + u v'

If the matrix A is actually square, then the rank of A only changes by +1, -1 or 0 when a rank one update is applied. If A is actually invertible, then the Sherman Morrison Formula shows how to cheaply compute the inverse of the updated matrix.

If the matrix A is the identity matrix, then we have

det ( I + u v' ) = 1 + v'u.

## Rayleigh Quotient

The Rayleigh quotient of a matrix A and vector x is

R(A,x) = (x'*A*x) / (x'*x).

If the matrix A is symmetric, then the Rayleigh quotient is an excellent estimate of the "nearest" eigenvalue; in particular,

Lambdamin <= R(A,x) <= Lambdamax

On the other hand, if the matrix is not symmetric, then we cannot guarantee that the eigenvalues will be real; moreover, we cannot guarantee that the eigenvectors will be orthogonal. Thus for a general matrix, the information obtained from the Rayleigh quotient must be interpreted more carefully.

A frequent use of the Rayleigh quotient is in the inverse power method. Starting with approximations for the eigenvalue and eigenvector, one step of the inverse power method gives an improved estimate of the eigenvector. Then the Rayleigh quotient improves the eigenvalue estimate, and will be used to shift the next step of the inverse power method. This pair of steps can be rapidly convergent.

If the matrix A is actually positive definite symmetric, then the quantity (y'*A*x) has all the properties of an inner product. In that case, the Rayleigh quotient may be generalized to involve any pair of vectors:

R2(y,A,x) = ( y'*A * x ) / ( y'*x ).

If the matrix is not symmetric, but we still want a sensible way to estimate the eigenvalues, then the "Unsymmetric Rayleigh Quotient" could be defined as:

URQ(A,x) = sqrt ( ( ( A * x)'*(A * x) ) / ( x'*x ) ).
This amounts to the square root of the Rayleigh quotient for the symmetric matrix A'*A, but can also be regarded as the ratio of the L2 norms of A*x and x.

## Rectangular Matrix

A rectangular matrix is a matrix which is not "square", that is, a matrix whose row order and column order are different.

While many operations and algorithms of linear algebra only apply to a square matrix, a rectangular matrix does have an LU factorization, and a singular value decomposition.

A rectangular matrix can occur when solving an under-determined or over-determined linear system.

## Reflection Matrix

A reflection matrix A has the property it carries out the reflection (or negation) of the portion of every vector that is perpendicular to some hyperplane, while leaving the parallel portion of the vector unchanged.

A reflection matrix A is involutory, that is,

A * A = I,
which strongly restricts the eigenvalues.

Examples of reflection matrices include the identity matrix, a diagonal matrix whose diagonal entries are +1 or -1, any matrix which rotates two coordinate axes by 180 degrees, and the Householder matrices.

Simple facts about a reflection matrix A:

• A = A-1;
• the matrix I - A is an idempotent matrix;
• All eigenvalues of A have magnitude 1.

## Residual Error

The residual error is a measure of the error that occurs when a given approximate solution vector x is substituted into the equation of the problem being solved.

For a system of linear equations, the residual error is the vector r defined as

r = b - A * x

For the eigenvalue problem, the residual error is a vector which is a function of the approximate eigenvector x and the approximate eigenvalue lambda:

r = lambda * x - A * x.

When carrying out an iterative solution process, it is common to compute the residual error for each new approximate solution, and to terminate the iteration successfully if the vector norm of the residual error decreases below some tolerance.

An important fact to realize is that the residual error is not, by itself, a reliable estimate of the error in the solution of a linear system. If the residual error has small norm, we can really only hope that the solution error (between our computed x and the true solution x*) is small. We need to know the norm of the inverse of the matrix, in which case the following restriction holds:

||x - x*|| <= ||A-1|| * || A ( x - x* ) ||
= ||A-1|| * || A * x - b + b - A * x* ||
= ||A-1|| * ||r||,
so that when the norm of the residual error, ||r|| is small, we have a precise upper bound on the error in the solution. Since such matrix norms are generally not known or computable, what we really have is a promise of continuity in the errors: as we drive the residual down, we are forcing down the upper limit on the approximation error.

## Root of Unity

An N-th root of unity is any complex number W such that WN = 1.

For a given N, there are N such roots, which can be summarized as:

THETA = 2 * pi / N
WJ = cos ( ( J - 1 ) * THETA ) + i * sin ( ( J - 1 ) * THETA )

Roots of unity are especially useful in discussing Fourier matrices and Circulant matrices.

## Rotation

A rotation is a linear transformation which preserves (Euclidean) distances.

Because a rotation R is a linear transformation, the value of R * 0 must be 0; in other words, the origin does not move. Because a rotation preserves distances, it must be the case that, in the Euclidean vector norm,

|| R * x || = || x ||
for every vector x in the space. From this fact, we can conclude that:
• R has an L2 matrix norm of 1;
• every eigenvalue of R has magnitude 1.

Common examples of matrices which embody rotations include:

## Row Echelon Form

Row echelon form is a special matrix structure which is usually arrived at by a form of Gauss elimination.

Any matrix, including singular and rectangular matrices, can be transformed into this form, using a series of elementary row operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.

Moreover, the process can be considered a matrix factorization, of the form

A = B * E
where B is nonsingular, and E is in row echelon form.

A matrix (whether square or rectangular) is in row echelon form if:

• Each nonzero row of the matrix has a 1 as its first nonzero entry.
• The leading 1 in a given row occurs in a column to the right of the leading 1 in the previous row.
• Rows that are completely zero occur last.

A matrix is in row reduced echelon form if it is in row echelon form, and it is also true that:

• Each column containing a leading 1 has no other nonzero entries.

Row echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.

## Row Rank

The row rank of a matrix is the number of linearly independent rows it contains.

The matrix

1  2  3  4
5  6  7  8
9 10 11 12

has row rank 2, because row 3 is equal to twice row 2 minus row 1.

For any matrix, the row rank is the same as the column rank. This common value is also equal to the rank of the matrix, which is defined for both square and rectangular matrices.

For a square matrix of order n, the rank is a number between 0 and n. A square matrix whose rank is n is said to be nonsingular. For a rectangular matrix of order m by n, the rank must be a number between 0 and the minimum of m and n. Rectangular matrices which attain this maximal value are not called "nonsingular". Instead, they are said to have full row rank, full column rank, or maximal rank.

## Row Space

The row space of an M by N matrix A is the set of all possible linear combinations of rows of A.

If the N vector v is a linear combination of rows of A, then there is a M vector c of coefficients with the property that

v = c * A
In other words, the row space is the set of all possible results of premultiplying A by an arbitrary vector.

## Row Sum

The row sum vector of an M by N matrix A is the vector R of length M whose I-th entry is the sum of the elements of row I of the matrix.

A matrix is said to have constant row sum if all the elements of R are equal.

The operation of computing the row sum vector R is equivalent to multiplying the matrix A times a vector X whose entries are all 1:

R = A * X

Therefore, if a square matrix A has a constant row sum r, then A has the eigenvalue r with eigenvector X:

A * X = R = r * X

Matrices with constant row sum include

## The Schur Complement

Suppose we are working with a block matrix Aof the form

A11 A12
A21 A22

where A11 is square and nonsingular. The Schur complement of A11 in A is the matrix

S = A22 - A21 A-111 A12

The Schur complement can be thought of as the data that would replace A22 after Gauss elimination (without pivoting) had been carried out on the matrix A, up to the point where A11 had been triangularized.

## The Schur Decomposition

The Schur decomposition is a decomposition of a square matrix. For a complex matrix, the decomposition involves a unitary factor, while for a real marix, there is an orthogonal factor.

The eigenvalue / eigenvector system for a matrix, when it exists, results in a decomposition that includes a diagonal factor (the eigenvalues); the corresponding factor in the Schur decomposition is a tridiagonal factor. However, in many cases the eigen decomposition does not exist for a given matrix, while the Schur decomposition can always be formed, as long as the matrix is square. For a more general and more useful decomposition, see the singular value decomposition.

### The Unitary Schur Decomposition of a Complex Square Matrix

The Schur decomposition of a complex square matrix A is a factorization of the form

A = U * T * U*
where T is an upper triangular matrix, and U is unitary. The decomposition is also known as the Schur normal form. The form of the decomposition shows that the matrix T is unitarily similar to A.

Some facts about the Schur decomposition U * T * U* of a complex square matrix A:

• Every A has such a Schur decomposition;
• The eigenvalues of A are the diagonal elements of T;
• If A is Hermitian, then the triangular matrix T is actually a diagonal matrix.
• A is normal if and only if T is diagonal.
• If T is diagonal, then the rows of U are the eigenvectors of A.

### The Orthogonal Schur Decomposition of a Real Matrix

If A is a real matrix, then it has the unitary Schur decomposition described above, but it also has an orthogonal decomposition that is "almost" upper triangular:

A = Q * T2 * Q'
where T2 is a block upper triangular matrix, and Q is an orthogonal matrix.

Some facts about the Schur orthogonal decomposition Q * T2 * Q' of a real square matrix A:

• Every real square matrix A has a Schur orthogonal decomposition;
• The diagonal blocks of T2 are either singletons, corresponding to real eigenvalues, or 2 by 2 blocks, corresponding to complex conjugate pairs of complex eigenvalues.

### The Orthogonal Schur Decomposition of a Real Matrix with Real Eigenvalues

If A is a real square matrix, and has only real eigenvalues, then the matrix T2 in the orthogonal Schur decomposition must actually be a diagonal matrix, and so the decomposition has the form:

A = Q * LAMBDA * Q'

Some facts about the Schur decomposition Q * LAMBDA * Q' of a real square matrix A with only real eigenvalues:

• Every real square matrix has a Schur decomposition;
• The eigenvalues of A are the diagonal elements of LAMBDA;
• The rows of Q are the eigenvectors of A.

## Sherman Morrison Formula

The Sherman Morrison formula applies to a matrix which has been perturbed in a simple way, and produces the inverse, or the solution of a linear system for the perturbed system, based on information already computed for the original system.

The perturbation of the coefficient matrix A is assume to be of the form:

C = A + u * v',
where the perturbation is an outer product of the vectors u and v. Modifying a matrix in this way is sometimes called a rank one update. Changing a single entry Ai,j of a matrix can be viewed in this way, for instance.

The Sherman Morrison formula for the inverse of the perturbed matrix is:

C-1 = A-1 - z * w' / ( 1 + alpha ),
where:
z = A-1 * u;
w = (A-1)' * v;
alpha = v' * z.

In the common case where the solution of C * x = b is desired, and the LU factorization of A has been computed, so that linear systems involving the original matrix A can be easily solved, the procedure is:

solve A * z = u for z;
solve A' * w = v for w;
set alpha = v' * z;
set beta = w' * b;
solve A * x = b for x;
adjust x := x - beta * z / ( 1 + alpha ).

The method will fail if 1 + alpha is 0, which can indicate singularity of the matrix C or a more technical problem.

## Sign Symmetric Matrix

A sign symmetric tridiagonal matrix A is one for which the signs of every pair of corresponding off-diagonal entries are equal:

sign ( A(I,I+1) ) = sign ( A(I+1,I) )
for I = 1 to N-1. (A value of zero matches any sign of the other entry.)

A symmetric tridiagonal matrix is always sign symmetric. Some EISPACK routines handle the case of a sign symmetric tridiagonal matrix directly. A matrix is strictly sign symmetric if it is sign symmetric and zero entries only occur in pairs.

The following tridiagonal matrix is not symmetric, is sign symmetric, and not strictly sign symmetric:

1  2  0  0  0
3  4 -7  0  0
0 -3  4  2  0
0  0  5  3  0
0  0  0  1  9

## Similar Matrix

Two matrices A and B are similar if B is related to A by a matrix P in the following way:

B = P-1 * A * P.
In this case, P is said to be the similarity transformation matrix.

Matrices which are similar have the same eigenvalues. Special cases include the similarity matrix P being an elementary transformation matrix, or an orthogonal matrix or a unitary matrix.

Many algorithms try to improve speed or efficiency by using similarity transforms on an input matrix A, so as to find a simpler matrix B for which the problem can be more easily or more quickly solved. It may then be necessary to take the answer for the problem about B and "backtransform" it to an answer for the problem about A. For example, if we get the eigenvalues and eigenvectors of B, A will have the same eigenvalues, but will have different eigenvectors, related by the similarity transform.

Every symmetric matrix A is orthogonally similar to a diagonal matrix. Since the inverse of an orthogonal matrix is its transpose, this relationship may be written

B = Q-1 * A * Q = Q' * A * Q.

## Simple Eigenvalue

A simple eigenvalue is an eigenvalue, or root of the characteristic equation, which has the property that it is not a repeated root. Another way of stating this is that a simple eigenvalue has an algebraic multiplicity of 1.

Since every eigenvalue has at least one eigenvector, and the geometric multiplicity is never greater than the algebraic multiplicity, we have that a simple eigenvector is associated with a 1-dimensional eigenspace.

A semisimple eigenvalue is an eigenvalue for which the algebraic and geometric multiplicities are equal. A semisimple eigenvector is associated with a K-dimensional eigenspace, and there are K linearly independent eigenvectors. Of course, any simple eigenvalue is also a semisimple eigenvalue.

A matrix is diagonalizable if and only if every eigenvalue is semisimple.

The remaining possibility, called a nonsimple eigenvalue, is one for which the algebraic multiplicity is greater than one, and the geometric multiplicity is smaller than the algebraic multiplicity. In this case, the eigenvalue is associated with a K-dimensional eigenspace, but there are less than K linearly independent eigenvectors.

## Singular Matrix

A singular matrix is a square matrix that does not have an inverse.

Facts about a singular matrix A:

• A has a null vector.
• A has a zero eigenvalue.
• The determinant of A is zero.
• There is at least one nonzero vector b for which the linear system A*x=b has no solution x.
• The singular linear system A*x=0 has a nonzero solution x.

## Singular Value Decomposition

The singular value decomposition of a real rectangular M by N matrix A is a factorization of the form:

A = U * S * V'
where, in the "reduced SVD" or EISPACK version:
• U is an M by N matrix with columns that are pairwise orthogonal;
• S is an N by N diagonal matrix, containing the nonnegative singular values of A;
• V is an N by N orthogonal matrix.
or, in the "standard SVD" used by LINPACK and LAPACK:
• U is an M by M orthogonal matrix;
• S is an M by N rectangular matrix whose diagonal contains the nonnegative singular values of A;
• V is an N by N orthogonal matrix.

The EISPACK version of the factorization is sometimes called "the economy size" since it omits information that actually doesn't have an effect on the result.

The LINPACK/LAPACK form has the advantage that U and V are both orthogonal, and S retains the "shape" of A. Moreover, this format allows us to consider U and V to be composed of left and right singular vectors, in analogy to the eigenvalue factorization (when it exists) of square matrix. Also, the extra information in the full SVD gives a basis for the null space of A.

One use for the singular value decomposition is that it allows us to "solve" any linear system A*x=B, whether singular or nonsingular, overdetermined or undetermined. Except for the nonsingular square case, the solution is only done in a generalized sense. In particular, if there are multiple solutions, then the singular value decomposition can be used to find the solution of minimum l2 norm. If there are no exact solutions, then the singular value decomposition can be used to find a "solution" for which the residual has minimum l2 norm.

The generalized solution of A * x = b is equal to

x = V * S^ * U' * b
where S^ is a diagonal matrix whose entries are the inverse of the nonzero entries of S, or 0 when the corresponding entry of S is zero.

For any column I no greater than the minimum of M and N, let ui be the I-th column of U, and vi be the I-th column of V, and sii be the I-th diagonal element of S. Then it is a fact that

A * vi = sii * ui
and
A' * ui = sii * vi
which allows us to conclude that
A * A' * ui = sii * sii * ui
and
A' * A * vi = sii * sii * vi
In other words, U, V and S contain information about the eigenvalues and eigenvectors of A * A'.

Conversely, if we know the eigenvalues and eigenvectors of A * A', then we know the squares of the singular values of A, and the left singular vectors of A (the U matrix).

The columns of U and V' can be thought of as forming a sequence of min ( M,N ) rank one matrices A1, A2, and so on, with

A1 = U(1:N,1) * S(1,1) * V(1:M,1)',
A2 = U(1:N,2) * S(2,2) * V(1:M,2)',...
These rank one matrices have the property that they sum up to A. Moreover, A1 is the rank one matrix which best approximates A in the l2 norm, while A1+A2 is the best rank two approximant, and so on. This property means that initial column subsets of U and V' can be used for a reduced order model of A.

The singular value decomposition can be used to construct the the pseudoinverse of the rectangular or singular matrix A.

Routines for singular value decomposition are included in EISPACK, LAPACK, and LINPACK.

## Skew CentroSymmetric Matrix

A skew centrosymmetric matrix is one which is antisymmetric about its center; that is,

Ai,j = - Am+1-i,n+1-j

Example:

1  10   8  11  -5
-13   2   9   4 -12
-6  -7   0   7   6
12  -4  -9  -2  13
5 -11  -8 -10  -1

Facts about a skew centrosymmetric matrix A:

• If lambda is an eigenvalue of A, then so is -lambda;
• If x is an eigenvector of A, then so is J*x where J is the Exchange matrix;

## Skew Circulant Matrix

A matrix A is skew circulant if it can be formed by negating the strict lower triangle of a circulant matrix.

Example:

a  b  c  d  e  f
-f  a  b  c  d  e
-e -f  a  b  c  d
-d -e -f  a  b  c
-c -d -e -f  a  b
-b -c -d -e -f  a

## Skew Hermitian Matrix

A complex matrix A is skew Hermitian if it is equal to the negative of its transpose complex conjugate:

A = - ( conjugate ( A ) )'.

A skew Hermitian matrix must have a purely imaginary diagonal.

## Skyline Matrix Storage

Skyline storage is a matrix storage method for storing a particular kind of sparse matrix. The format is simple, compact, and suitable for use with Gaussian elimination.

Skyline storage is most typically used with symmetric matrices derived from a finite element problem. In this setting, it is common to encounter a matrix which is "nearly" banded, or "raggedly banded". That is, the nonzero elements of the matrix are always near the diagonal, but the number of such elements varies from column to column.

Here is a matrix suitable for skyline storage:

A11 A12  0  A14  0
A21 A22  0  A24  0
0   0  A33  0   0
A41 A42  0  A44 A45
0   0   0  A54 A55

If the matrix is symmetric, we can regard the matrix as a sequence of columns, starting with the first nonzero element in a column, and proceeding to the diagonal, including every entry in between, whether or not it is zero. Thus, this matrix could be regarded as equivalent to the following five column vectors:

A14
A24
A12       A34  A45
A11  A22  A33  A44  A55

(Note that we have added a location for A34. It's storing a zero, but it's between a nonzero entry, A24, and the diagonal A44). This is the heart of the idea behind skyline storage. We simply cram all these columns into a single vector:

A11, A12, A22, A33, A14, A24, A34, A44, A45, A55,

and then figure out a way to address the individual entries. The obvious route is to have a way of pointing to the diagonal entries:
A11 is in entry 1,
A22 in entry 3,
A33 in entry 4,
A44 in entry 8,
A55 in entry 10.

and we can store these values in an array called "INDEX". Then, we know that column 4 of the matrix is contained in locations INDEX(3)+1 through INDEX(4).

The reason that skyline storage is ideal for Gaussian elimination is that we have already set aside all the entries we will ever need to take care of "fill in", as long as we are not performing any pivoting.

For instance, consider what happens when we are eliminating row 1. We will add multiples of row 1 to rows 2 and 4. If there is a nonzero entry in a particular column of row 1, then that entry could cause "fill in" when added to row 2 or 4. But there is no problem, because if column J of row 1 is nonzero, then we're sure we've already set aside a entry for column J of rows 2 through row J. So we will never have to modify our storage scheme because of unexpected nonzero entries generated by fillin.

On the other hand, it should be clear that indexing entries of the matrix can be tedious or cumbersome. In particular, it is much harder to find entries of a row of the matrix than of a column. Writing a routine to actually carry out Gauss elimination with such a data structure can be something of a trial.

Thus skyline storage achieves simplicity and compactness in storage, at the cost of complexity in coding.

## Span of a Set of Vectors

The span of a set of vectors { v(i)| 1 <= I <= N } is the set of vectors that can be constructed from the set via linear combinations. The span is sometimes called the spanning space.

The span of a set of vectors is a linear space; it includes the 0 vector, each of the original vectors, and sums and multiples of them.

As an example of a vector span, consider the eigenvectors associated with a particular eigenvalue of a matrix. If the eigenvalue has algebraic multiplicity of 1, then we usually think there is only one associated eigenvector; if the eigenvalue has a higher multiplicity, there may be 2 or more linearly independent eigenvectors. In any case, the eigenspace associated with the eigenvalue is the span of these eigenvectors. Any element of this span is itself also an eigenvector for the eigenvalue.

Given a set of vectors which define a spanning space, it is often desired to know whether a subset of those vectors will define the same spanning space; that is, whether we can toss out some of the vectors because they don't add anything to the span. If we toss out as many vectors as possible, we end up with a basis for the span; the number of vectors remaining tells us the dimension of that space.

## The Sparse BLAS

The Sparse BLAS are a set of vector oriented linear algebra routines useful for sparse matrix problems.

The success of the Basic Linear Algebra Subprograms (BLAS) motivated the creation of a small set of similar routines for use with sparse vectors. The routines enable the interaction of sparse and full vectors. The sparse vectors are assumed to be stored as a vector of values, and a vector of indices. For example, a sparse vector might be represented by the pair

X = (1.0, 2.0, 3.0)
IX = (9, 1, 200)

which represents a vector that is entirely zero, except that entry 1 equals 2.0, entry 9 equals 1.0 and entry 200 equals 3.0.

FORTRAN subroutines which implement the sparse BLAS are available through the NETLIB web site.

The single precision Sparse BLAS routines include:

• SAXPYI adds a multiple of a sparse vector to a dense vector.
• SDOTI computes the dot product of a dense and a sparse vector.
• SGTHR gathers entries of a dense vector into a sparse vector.
• SGTHRZ gathers entries of a dense vector into a sparse vector, zeroing first.
• SROTI applies a Givens rotation to a sparse vector.
• SSCTR scatters entries of a sparse vector into a dense vector.

## Sparse Matrix

A sparse matrix is a matrix with so many zero entries that it is profitable to use special schemes to store the matrix and solve problems involving it. A rule of thumb is that if less than 5% of the matrix is nonzero, the matrix can be called sparse.

Not only does a sparse matrix have few nonzeroes, but these nonzeroes are typically scattered seemingly at random throughout the matrix. Thus, unlike a band matrix, (which also is "mostly zero") we have to expend a significant amount of effort simply recording where every nonzero entry occurs.

## Sparse Matrix Storage

A sparse storage scheme is a matrix storage method of storing the nonzero entries of a sparse matrix in a convenient and efficient way.

There are many different kinds of sparse matrix storage scheme. Their features depend on the properties of the underlying problem. In this discussion, a few simple non-specific schemes will be discussed.

If a direct linear equation solver will be used, storage must also be made available for "fill-in" values that will occur during the factorization process. An iterative solver, on the other hand, has no fill-in values, and never needs to change the entries of the matrix.

A simple storage scheme would require the user to supply NZ, the number of nonzero entries, and three arrays of size NZ:

• IROW(I) and ICOL(I) specify the row and column indices for entry I;
• A(I) specifies the value of entry I. Thus the first nonzero entry of the matrix occurs in row IROW(1), column ICOL(1), with the value A(1).

If this scheme seems excessive, the elements of the matrix could be listed in order of rows, the ROW array could be replaced by a ROWEND array of length N. ROWEND(I) is the index in A of the last nonzero element of row I of the original matrix. This small saving comes at the price of complicating access to the matrix.

## SPARSKIT

SPARSKIT is a tool kit for sparse matrix computations.

SPARSKIT can manipulate sparse matrices in a variety of formats, and can convert from one to another. For example, a matrix can be converted from the generalized diagonal format used by ELLPACK and ITPACK to the format used by the Harwell-Boeing Sparse Matrix Collection (HBSMC) or even into LINPACK banded format.

Utilities available include converting data structures, printing simple statistics on a matrix, plotting a matrix profile, performing basic linear algebra operations (similar to the BLAS for dense matrix), and so on.

The spectral radius of a matrix is the magnitude of the largest eigenvalue of a matrix.

The spectral radius is often easy to compute, and it is a useful measure of the "size" or "strength" of a matrix. However, the spectral radius is not a matrix norm.

The spectral radius violates several rules for matrix norms. In particular, it is possible to have any of the following situations:

• rho(A)=0 but A is not the zero matrix;
• rho(A+B) > rho(A) + rho(B);
• rho(A*B) > rho(A) * rho(B);

On the other hand, the spectral radius is a lower bound for the value of any vector-bound matrix norm on A, because there must be an eigenvalue lambda and a vector of unit norm x with the property that

A * x = lambda * x
so the norm of A*x divided by the norm of x is lambda. Therefore, the matrix norm of A must be at least | lambda |.

The Euclidean norm of a real symmetric matrix is equal to its spectral radius.

## Spectrum

The spectrum of a square matrix is the set of eigenvalues.

## Square Matrix

A square matrix is one which has the same number of rows and columns. Most cases (but not all!) requiring the solution of a linear system involve a square coefficient matrix.

Only a square matrix has a determinant, an inverse (if not singular!) a trace, powers, a matrix exponential, and eigenvalues.

If a matrix is not square, it is called rectangular.

## Square Root of a Matrix

Given a matrix A, we say that the matrix B is the square root of A if it is true that A = B * B = B2.

It is obvious that A must be a square matrix in order to have a square root.

(Diagonal matrices): If D is a diagonal matrix, then a square root of D can be computed by taking the square roots of each diagonal entry; if any entry was negative, the square root matrix will be complex.

The square root matrix need not be unique.

For example, the 2x2 diagonal matrix with diagonal entries 4 and 9 has the following four distinct square root matrices:

(2  0)  (2  0)  (-2  0)  (-2  0)
(0  3)  (0 -3)  ( 0  3)  ( 0 -3)

(Diagonalizable Matrices): If A is diagonalizable, so that A = X * D * inv(X), then a square root of A can be computed by sqrt(A) = X * sqrt(D) * inv(X). Note that if any entry in D was negative, we will end up with a complex square root matrix.

(Symmetric Matrices): If A is positive semidefinite symmetric, then the square root will be real, and can be determined from the Cholesky factorization as follows:

A = L * L'
Now determine the Singular Value Decomposition of L
L = U * D * V'
and write
A = L * L'
= U * D * V' * ( U * D * V' )'
= U * D * V' * V * D * U'
= U * D * D * U'
= U * D * U' * U * D * U'
= X * X
where X is the desired matrix square root of A. In particular:
X = U * D * U'

If B3=A, we say that B is the cube root or third root of A, and for any integer N, BN=A means that B is called an N-th root of A.

## Stochastic Matrix

A stochastic matrix has only nonnegative entries, with all row sums equal to 1.

A stochastic matrix may also be called a row stochastic matrix or Markov matrix or a transition matrix. These names derive from the fact that the entry Ai,j may be viewed as a probability that a system currently in state I will transition to state J on the next step.

Facts about a (row) stochastic matrix A:

• A is a nonnegative matrix;
• Every eigenvalue of A must be no greater than 1 in modulus;
• The vector (1,1,...,1) is an eigenvector, with eigenvalue 1;
• A is summable;
• If A and B are stochastic, then so is A * B;

A column stochastic matrix has only nonnegative entries, with the entries in each column summing to 1.

A doubly stochastic matrix is both row and column stochastic.

Facts about a doubly stochastic matrix A:

• If A and B are doubly stochastic, then so is A * B;
• If A is doubly stochastic, then A can be written as
A = Sum ( 1 <= I <= N ) c(I) * P(I)
where N is the order of A, the real numbers c(I) sum to 1, and each P(I) is a permutation matrix.

An ergodic matrix is a row stochastic matrix which has no eigenvalues of modulus 1 except for 1 itself.

## Strassen's Algorithm

Strassen's algorithm for matrix multiplication is a method which can produce a matrix product more efficiently than the standard approach.

For simplicity, suppose that the two factors A and B are both of order N. Then the obvious method of producing the product C requires N multiplications and N additions for each entry, for a total of 2 * N3 floating point operations.

Strassen's algorithm is defined recursively. It is easiest to describe if the matrix has an order that is a power of two. In that case, the product of two matrices of order N is described in terms of the product of matrices of order (N/2), and so on, until factors of order 2 are reached.

Now suppose that A and B are each of order 2. The definitions for the entries of the product C are:

C11 = A11 * B11 + A12 * B21        C12 = A11 * B12 + A12 * B22
C21 = A21 * B11 + A22 * B12        C22 = A21 * B12 + A22 * B22

Now compute the following quantities:
P1 = (A11+A22) * (B11+B22)
P2 = (A21+A22) * B11
P3 = A11 * (B12-B22)
P4 = A22 * (B21-B11)
P5 = (A11+A12) * B22
P6 = (A21-A11) * (B11+B12)
P7 = (A12-A22) * (B21+B22)

Then it is simply a matter of substitution to show that
C11 = P1 + P4 - P5 + P7           C12 = P3 + P5
C21 = P2 + P4                     C22 = P1 + P3 - P2 + P6

Instead of 8 multiplications, only 7 are required, at the cost of several more additions.

The reason it does is that we can apply the above formulas recursively. And as we break a matrix of order N into matrices of order N/2, we have to define 7 values, not 8. But each of those 7 values is also a matrix multiplication, and hence can be computed by the algorithm, and requires only 7 multiplications, and not 8. It turns out that the number of quantities we have to define drops precipitously, and so the fact that we have to use a lot of extra additions to define them doesn't matter.

Strassen's algorithm requires an amount of work that increases with N like NLOG2 7 rather than N3. The extra additions in the defining formulas cause the work formula to have a larger constant in front of it, so that for small N, the standard algorithm is faster. But, as we have shown above, there are now implementations of the Strassen algorithm that beat the best implementations of the standard algorithm on the Cray.

Reference:

1. Volker Strassen,
Gaussian Elimination is not Optimal,
Numerische Mathematik,
Volume 13, page 354, 1969.

## Submatrix

A submatrix of a matrix A is any rectangular "chunk" of the matrix.

The chosen entries may all be neighbors, or they may be chosen by choosing any subset of the row and column indices of A, in any order.

As a simple example of the use of submatrices, suppose that A has the form

( A1 A2 )
( 0  A3 )

where A1, A2 and A3 are submatrices. Then a linear system involving A could be solved as two smaller subproblems: solve A3 * x2 = b2, then solve A1 * x1 = b1 - A2 * x2. ( Here b1 and b2 are the parts of the right hand side that correspond to the subdivision of A).

Similarly, for this case, the eigenvalues of A can be determined by finding the eigenvalues of A1 and A3. (The computation of the eigenvectors would be a little more complicated.

## Successive Overrelaxation (SOR)

The successive overrelaxation method or SOR is an iterative method for solving linear systems, and is a generalization of the Gauss Seidel and the Jacobi iterations.

SOR is only appropriate for matrices which are strictly diagonally dominant or else symmetric and positive definite.

To derive SOR, think of both the Jacobi and Gauss Seidel iterations as computing a correction to the current estimate of the solution, so that a step of the method has the form:

X[N+1] = X[N] + dX.
SOR offers the ability to add a greater or smaller proportion of the correction, which we will denote w:
X[N+1] = X[N] + w * dX.

Surprisingly, for the appropriate choice of w, the SOR method can converge faster than the Gauss Seidel or Jacobi methods. It can be shown that the SOR method will only be convergent for 0 < w < 2. Values of w less than 1 result in an underrelaxed iteration, while values greater than 1 (the usual case) correspond to an overrelaxed iteration.

For a given coefficient matrix, convergence of the SOR method is optimal for some value of w, but it is generally not easy to determine this value. A variety of schemes are available for estimating and adjusting the value used during a particular iteration.

The SOR iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:

A = L + D + U
then the method is equivalent to the iteration
xnew = x - w * ( L + D )-1 * ( A * x - b ).
which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
I - w * ( L + D )-1 * A.

If the original coefficient matrix A is symmetric, then it may be preferred to use the symmetric SOR iteration or SSOR. In this case, the iteration consists of pairs of SOR steps. The odd steps are the same as the usual iteration. But in the even steps, the variables are solved for in reverse order. Each pair of such steps is a single step of the SSOR iteration, which has the property that its iteration matrix is similar to a symmetric matrix (though not necessarily symmetric itself). Among other things, this means that SSOR can be used as a preconditioner for certain other problems.

## Summable Matrix

A (square) matrix A is said to be summable if the limit as k goes to infinity exists for the following:

(1/k) * ( I + A + A2 + ... + Ak-1)

If A is convergent or semiconvergent, it is summable. However, a matrix can be summable without being convergent.

It should be clear that for a matrix to be summable, the spectral radius rho(A) must be less than or equal to 1. But if the spectral radius is less than one, the matrix is convergent, and we know how that works. The interesting case occurs when rho(A) = 1.

It turns out that if rho(A) = 1, then matrix A will be summable if and only if it has only semisimple eigenvalues.

Note that all stochastic matrices are summable.

## Symmetric Matrix

A symmetric matrix A is equal to its transpose, that is,

A = A'
or, for every pair of indices I and J:
Ai,j = Aj,i

Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:

A = B + C = (1/2) * ( ( A - A' ) + ( A + A' ) )

Here is an example of a symmetric matrix:

1 2 0 4
2 9 4 8
0 4 5 3
4 8 3 7

Simple facts about a symmetric matrix A:

The eigenvalues of successive members of the sequence of principal minors of a symmetric matrix have the Sturm sequence property, a strict interlacing relationship. The k+1 eigenvalues of the principal minor of order k+1 are strictly separated by the k eigenvalues of the minor of order k.

LAPACK, LINPACK and EISPACK include specialized routines for symmetric matrices, and include the use of symmetric matrix storage to save space.

## Symmetric Matrix Storage

Symmetric storage is a matrix storage method of storing a symmetric matrix economically, omitting the repeated elements.

The strict lower triangle of a symmetric or Hermitian matrix is redundant. A symmetric storage scheme packs the upper triangle of the matrix into a linear vector of length ( N * ( N + 1 ) ) / 2. The data is organized by columns, with each column starting in row 1 of the original matrix, and proceeding down to the diagonal.

If A was the matrix:

11 12 13 14 15
12 22 23 24 25
13 23 33 34 35
14 24 34 44 45
15 25 35 45 55

then A could be symmetrically stored as:
1   2       3           4               5
( 11, 12, 22, 13, 23, 33, 14, 24, 34, 44, 15, 25, 35, 45, 55 ).

LAPACK, LINPACK and EISPACK include routines which can operate on data stored in this format.

## TEST_MATRIX

TEST_MATRIX is a collection of subroutines useful for generating and manipulating test matrices.

Many sample matrices are available with known inverse, determinant, eigenvalues, rank, symmetry, and other properties. These matrices may be used to test software for correctness, or for classroom demonstrations.

Most of the matrices come from a MATLAB M file collection developed by Nicholas Higham, Department of Mathematics, University of Manchester, and maintained in the "testmatrix" file somewhere at the MATLAB web site.

An earlier version of the collection is available, again as MATLAB M files, in ACM TOMS Algorithm 694, in the TOMS directory of the NETLIB web site.

I have a FORTRAN version of the source code available in the TEST_MATRIX page.

## Toeplitz Matrix

A Toeplitz matrix is a matrix which is constant along each of its diagonals.

Here is an example of a square Toeplitz matrix:

4 5 6 7
3 4 5 6
2 3 4 5
1 2 3 4

a "wide" rectangular Toeplitz matrix:
3 4 5 6 7
2 3 4 5 6
1 2 3 4 5

a "tall" rectangular Toeplitz matrix:
5 6 7
4 5 6
3 4 5
2 3 4
1 2 3

Facts about a Toeplitz matrix A:

• A is persymmetric.
• the inverse of A is not, in general, a Toeplitz matrix, but is persymmetric.

Compare the concepts of a Hankel Matrix, a Symmetric Matrix and a Circulant Matrix.

## Tournament Matrix

A Tournament matrix is a (square) matrix whose diagonal is all zero, and such that, for every pair of corresponding off-diagonal entries Ai,j and Aj,i, exactly one is -1 and one is 1.

The tournament matrix can be thought of as recording the wins and losses of a set of players, each of whom plays every other player.

Example:

0  1 -1  1 -1
-1  0 -1 -1  1
1  1  0  1  1
-1  1 -1  0 -1
1 -1 -1  1  0

Facts about a Tournament matrix A:

• A is antisymmetric.
• A is singular if and only if the order of A is odd.
• if the order of A is odd, the dimension of the null space is exactly 1.
• A is the incidence matrix of a complete oriented graph (in which every pair of distinct nodes shares exactly one directed edge)

## Tournament Win Matrix

A Tournament win matrix is a (square) matrix whose diagonal is all zero, and such that, for every pair of corresponding off-diagonal entries Ai,j and Aj,i, exactly one is 0 and one is 1.

The tournament win matrix can be thought of as recording the wins of a set of players, each of whom plays every other player.

Example:

0  1  0  1  0
0  0  0  0  1
1  1  0  1  1
0  1  0  0  0
1  0  0  1  0

Facts about a Tournament Win matrix A:

• A is a zero-one matrix.
• A+A'+I is a matrix of all ones.

## Trace of a Matrix

The trace of a (square) matrix is the sum of the diagonal elements.

Simple facts about the trace of a matrix A:

• the trace is equal to the sum of the eigenvalues of A;
• if A is similar to B, then trace ( A ) = trace ( B ).
• For square matrices A and B, trace ( A * B ) = trace ( B * A ).

The trace of the following matrix is 4:

1 -1  0
-1  2 -1
0 -1  1

and it has three eigenvalues, 0, 1 and 3, whose sum is also 4.

## Transpose

The transpose of a matrix A is obtained by switching all pairs of values Ai,j and Aj,i.

In printed text, the transpose is usually denoted by a superscript T, as in AT, while in running text the symbols A', A^T, transpose ( A ), or trans ( A ) might be used.

For the square matrix A:

1 2 0 4
4 8 9 2
5 5 6 3
7 0 0 3

the transpose A' is:
1 4 5 7
2 8 5 0
0 9 6 0
4 2 3 3

For the "wide" rectangular matrix A:

11 12 13 14 15
21 22 23 24 25
31 32 33 34 35

the transpose A' is the "tall" rectangular matrix:
11 21 31
12 22 32
13 23 33
14 24 34
15 25 35

Simple facts about the transpose of a matrix A:

• A is singular if and only if A' is singular;
• A and A' have the same determinant;
• A and A' have the same characteristic equation;
• A and A' have the same eigenvalues;
• the left eigenvectors of A are the right eigenvectors of A', and vice versa;
• the transpose of an M by N rectangular matrix is N by M.
• the tranpose of a sum is the sum of the transposes:
(A + B)' = A' + B'
• the tranpose of a product is the product of the transposes in reverse order:
(A * B)' = B' * A'

The LU factorization of a matrix A allows the solution of linear systems involving A' as well. LAPACK and LINPACK linear solution software takes advantage of this fact.

## Trapezoidal Matrix

A trapezoidal matrix is essentially a rectangular triangular matrix. Thus, a trapezoidal matrix has a different number of rows than columns. If, in addition, Ai,j = 0 whenever I > J, the matrix is called upper trapezoidal. If Ai,j = 0 whenever I < J, the matrix is called lower trapezoidal.

Here is a "wide" upper trapezoidal matrix:

11 12 13 14
0 22 23 24
0  0 33 34

Here is a "tall" upper trapezoidal matrix:
11 12 13 14
0 22 23 24
0  0 33 34
0  0  0 44
0  0  0  0
0  0  0  0

You might encounter an upper trapezoidal matrix when carrying out Gauss Jordan Elimination on a square matrix, or computing the QR factorization of a rectangular matrix.

## Triangular Matrix

An upper triangular matrix is entirely zero below the main diagonal while a lower triangular matrix is zero above the main diagonal.

It the entries of the main diagonal are all equal to 1, the matrix is said to be unit upper triangular or unit lower triangular. For example, you will encounter a unit lower triangular matrix in Gauss elimination.

Simple facts about a triangular matrix A:

• The determinant of A is the product of the diagonal entries;
• The eigenvalues of A are the diagonal entries;
• The inverse of A is also a triangular matrix;
• The linear system A * x = y is very easy to solve;
• Unless A is actually a diagonal matrix, it is not normal, and hence not unitarily or orthogonally diagonalizable.

## Tridiagonal Matrix

A tridiagonal matrix is a matrix whose only nonzero entries occur on the main diagonal or on the two diagonals which are immediate neighbors of the main diagonal.

The diagonals immediately below and above the main diagonal are referred to as the (principal) subdiagonal and (principal) superdiagonal respectively.

Here is an example of a tridiagonal matrix which is also positive definite symmetric:

-2  1  0  0
1 -2  1  0
0  1 -2  1
0  0  1 -2

Simple facts about a tridiagonal matrix A:

• A is a band matrix.
• Every matrix is similar to a tridiagonal matrix.
• A tridiagonal matrix A is irreducible if and only if every subdiagonal and superdiagonal element is nonzero.
• If Gauss elimination can be performed on A without using pivoting, then the L factor is zero except for the diagonal and first subdiagonal, and the U factor is zero except for the diagonal and first superdiagonal, in other words, L and U are lower and upper bidiagonal matrices.
• A-1 is generally not tridiagonal.

If it is true that none of the subdiagonal elements are zero, and none of the superdiagonal elements are zero, and

|A1,1| > |A1,2|
|An,n| > |An,n-1|
and, for 2 <= i <= n-1,
|Ai,i| >= |Ai,i-1| + |Ai,i-1|
then A is nonsingular, and Gauss elimination can be performed without pivoting. THis is true, for instance, for the "-1,2,-1" matrix used to approximate the second derivative, and the "2,1;1,4,1;1,2" matrix used in cubic spline interpolation.

If a (real) matrix A is tridiagonal and irreducible, then its eigenvalues are real and distinct. Moreover, the eigenvalues of the sequence of principal minors of A have a strict interlacing property: the k+1 eigenvalues of the principal minor of order k+1 are strictly separated by the k eigenvalues of the minor of order k. Since the determinant of the minors can be easily computed recursively, and since the sign sequence of these determinants carries information about the number of negative eigenvalues associated with each minor, this suggests how a bisection method can be employed to hunt for the eigenvalues of a tridiagonal matrix.

LAPACK and LINPACK include special routines for efficiently solving linear systems with a tridiagonal coefficient matrix. LAPACK and EISPACK have routines for finding eigenvalues of a tridiagonal matrix.

Cyclic reduction is a method of solving a tridiagonal system of equations which can give a large speedup on vector or parallel processors.

## Underdetermined System

An underdetermined linear system is a linear system A * x = b in which the values of N variables are sought, but for fewer than N linearly independent equations are given.

Of course, it is possible to set up a linear system where the number of equations M equals or exceeds the number of variables, but for which the number of linearly independent equations is less than N. Thus, you can't assume that M ≥ N rules out the undetermined case. It is the number of linearly independent equations that sets this. But of course, if you start with fewer than N equations, you are guaranteed an undetermined system.

An underdetermined linear system typically has an infinite number of solutions which constitute an affine linear space. The dimension of the linear space is usually called the number of degrees of freedom.

## Unimodular Matrix

A unimodular matrix is a square matrix whose determinant has absolute value 1.

Facts about a unimodular matrix A:

• The inverse matrix A-1 is unimodular;
• If A is an integer matrix, then so is A-1;
• If B is unimodular, so is A*B;

Examples of unimodular matrices:

## Unitary Matrix

A unitary matrix is a complex matrix U whose transpose complex conjugate is equal to its inverse:

U-1 = UH

Facts about a unitary matrix U

• U * UH = UH * U = I
• U is "L2-norm preserving": ||U*x||2 = ||x||2
• the columns of U are pairwise orthogonal vectors, and have unit vector L2 norm.

In real arithmetic, the corresponding concept is an orthogonal matrix.

## Unitary Similarity Transformation

A unitary transformation is a relationship between two complex matrices A and B, and a unitary matrix V, of the form:

A = V-1 * B * V.

## Unreduced Matrix

A reduced matrix is a (usually square) matrix A which may be broken up into submatrices

A11 | A12
----+---
0   | A22

where A11 is square, and the submatrix below it is completely zero.

An unreduced matrix is of course a matrix which does not have this form.

An irreducible matrix is one which cannot be put into reduced form by a renumbering of the variables. (The same permutation must be applied to both rows and columns.)

An unreduced tridiagonal matrix is one which has no "extra" zeroes, that is, the immediate superdiagonal and subdiagonal have no zero entries.

An unreduced upper Hessenberg matrix is one which has no "extra" zeroes, that is, the immediate subdiagonal has no zero entries.

Facts about a (square) reduced matrix A:

• A is singular if and only if one or both of A11 and A22 are singular.
• det(A) = det(A11) * det(A22)
• the eigenvalues of A are the eigenvalues of A11 plus the eigenvalues of A22.

## Upshift Matrix

The upshift matrix A circularly shifts all vector entries or matrix rows up 1 position.

Example:

0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0

Facts about the upshift matrix A:

## Vector-Bound Matrix Norm

A vector-bound matrix norm is a matrix norm that has been (or can be) derived from a vector norm by the following formula:

||A|| = supremum ||A*x|| / ||x||
where the supremum (roughly, the "maximum") is taken over all nonzero vectors x.

If such a relationship holds, then expressions involving the matrix norm and vector norm can be mingled to produce useful inequalities, based on the guaranteed compatiblity relationship:

||A*x|| <= ||A|| * ||x||

Matrix norms which are vector bound with some vector norm include the L1 matrix norm, the L2 matrix norm, and the L Infinity matrix norm.

## Vector Norm

A vector norm is a function ||*|| that measures the "size" of a vector.

A vector norm must have the following properties:

• || V || >= 0, and || V || = 0 if and only if V is the zero vector (positivity);
• || s * V || = | s | * || V || for any scalar s (linearity).
• || V + W || <= || V || + || W || for any vectors V and W (triangle inequality).

Commonly used vector norms include:

Given two points in space, x and y, we can define the distance between the points, d(x,y), in terms of a vector norm operating on the vectors of position coordinates:

d(x,y) = || x - y ||
and this quantity will have the expected properties of a distance function.

For a given vector norm, it is important to know which matrix norms are compatible, so that expressions like

||A*x|| <= ||A|| * ||x||
may be asserted.

## Z Matrix

A Z matrix is a real (square) matrix with nonpositive off-diagonal entries.

If the diagonal entries of a Z matrix are strictly positive, then it is also an L matrix.

A Z matrix is a "candidate" to be an M matrix, and over 50 different conditions are known that will guarantee that a Z matrix is, in fact, an M matrix.

## Zero Matrix

The zero matrix is a matrix all of whose entries are zero.

A zero matrix is sometimes called a trivial matrix. A matrix which has at least one nonzero entry is called a nontrivial matrix or nonzero matrix. (It is not required that all entries be nonzero, just that at least one of them is nonzero!)

## Zero One Matrix

A zero one matrix is a matrix whose entries are equal to 0 or 1.

A few examples of zero one matrices include:

Many combinatorial problems can be formulated in terms of a zero one matrix.

The permanent of a zero one matrix is equal to the number of generalized diagonals that contain no 0 element. The permanent of a zero one matrix is zero if and only if it contains an r by s subblock of zeroes, with r+s>=n+1.