Two vectors u and v are said to be A-orthogonal if
( u, A * v ) = 0.
Here A should be a positive definite symmetric matrix, which in turn guarantees that the expression ( u, A * v ) may be regarded as an inner product of the vectors u and v, with the usual properties.
This concept is useful in the analysis of the conjugate gradient method.
Back to TABLE OF CONTENTS.
An adjacency matrix of an (undirected) graph is a matrix whose order is the number of nodes, and whose entries record which nodes are connected to each other by a link or edge of the graph.
If two nodes I and J are connected by an edge, then A_{i,j}=A_{j,i}=1. All other entries of the matrix are 0. Thus, an adjacency matrix is a zero-one matrix. The usual convention is that a node is not connected to itself, and hence the diagonal of the matrix is zero.
The product A^{2}=A*A is a matrix which records the number of paths between nodes I and J. If it is possible to reach one node from another, it must be possible in a path of no more than n-1 links. Hence, the reachability matrix, which records whether it is possible to get from node I to node J in one or more steps, can be determined by taking the logical sum of the matrices I, A, A^{2}, ..., A^{n-1}.
(Loops:) If edges are allowed which begin and end at the same node, (sometimes called loops) then the adjacency matrix can have some 1's on the diagonal.
(Directed graphs:) If edges are assigned a direction, then it is possible that there is an edge from node I to node J, but not in the reverse direrection. In this case of a "directed graph", an adjacency matrix can be defined as well. The only difference is that the adjacency matrix now need not be symmetric. The reachability matrix can still be defined in the same way.
(Multigraphs:) If multiple edges are allowed between two nodes, (which might be distinguished by color in a drawing), then the adjacency matrix can record this information by storing in A_{i,j} the number of edges between the two nodes. The reachability matrix can still be defined in the same way, and still counts the number of different routes correctly.
Back to TABLE OF CONTENTS.
The adjoint matrix of a square matrix A has the property that:
A * adjoint ( A ) = adjoint ( A ) * A = det(A) * I.
Note that, if A is not invertible, then det(A)=0. We can take the convention that in this case, the adjoint of the matrix is the zero matrix.
The adjoint of a matrix A is "almost" its inverse. If we know that A is invertible, then the inverse of A, denoted inverse ( A ) or A^{-1}, can be written explicitly as:
A^{-1} = ( 1 / det(A) ) * adjoint ( A ).
This formula is only of use if we can figure out how to evaluate the things on the right hand side. The first factor is not so bad, since we do know a formula for the determinant of a matrix.
But there is also a way of evaluating the second factor. The adjoint matrix is defined in terms of the cofactor matrix of A:
adjoint ( A ) = ( cofactor ( A ) )'.
Back to TABLE OF CONTENTS.
An alternating sign matrix is an integer matrix with the properties that
Example:
0 1 0 0 0 1 -1 0 1 0 0 1 0 -1 1 0 0 0 1 0 0 0 1 0 0
Obviously, alternating sign matrices include the identity matrix and any permutation matrix. From the definitions, you should see that the first row must contain a single entry which is 1, with the other values being 0. The nonzeroes in the second row can be a single 1, or the values, 1, -1, 1, in that order, with some intervening zeroes possible. In the third row, the value -1 may occur up to as many times as there were 1's in preceding rows, which means the most interesting row could be 1, -1, 1, -1, 1, -1, 1. Thus the number of possible nonzeroes grows until the central row of the matrix is reached. Since the same restrictions apply from the bottom reading up, the number of possible nonzeroes must now decrease. Similar reasoning controls the nonzero population of the columns.
If we let A_{n} denote the number of distinct alternating sign matrices of order n, then it has only recently been proved that
A_{n} = Product ( 0 <= I <= N-1 ) (3*I+1)! / (N+I)!giving the sequence 1, 2, 7, 42, 429, 7436, 218348, 10850216, ...
Reference:
Back to TABLE OF CONTENTS.
An anticirculant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the left, with the first value "wrapping around" to the end.
Here is an example of an anticirculant matrix:
1 2 3 4 2 3 4 1 3 4 1 2 4 1 2 3and here is an example of a rectangular anticirculant matrix:
1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
Simple facts about a anticirculant matrix A:
Back to TABLE OF CONTENTS.
A square matrix A is antisymmetric if it is equal to the negative of its transpose:
A = - A'.
Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:
A = B + C = (1/2) * ( ( A - A' ) + ( A + A' ) )
Simple facts about an antisymmetric matrix A:
An antisymmetric matrix is also called skew symmetric.
In complex arithmetic, the corresponding object is a skew Hermitian matrix.
Back to TABLE OF CONTENTS.
The Arnoldi iteration is an iterative method that seeks to approximate some of the eigenvalues of a very large matrix.
If the matrix is symmetric or Hermitian, then the Arnoldi iteration is usually reformulated into the more efficient Lanczos iteration.
We start by considering the unitary similarity transform that converts the m by m matrix A into upper Hessenberg form:
A = Q H Q*Now we let Q_{n} indicate the m by n matrix formed from the first n columns of Q; similarly, H_{n} is the n+1 by n+1 submatrix of H. Then we write
A Q_{n} = Q_{n+1} H_{n}In particular, we can look at the subsystem involving the n-th column of Q_{n}:
A*q_{n} = h_{1,n} q_{1} + ... + h_{n,n} q_{n} + h_{n+1,n} q_{n+1}
This can be interpreted as a recursive relationship for q_{n+1} in terms of the previous vectors. The Arnoldi iteration allows us to build up the matrices Q_{n} and H_{n}. For reasons not completely understood, even for n much smaller than m, the eigenvalues of H_{n} will tend to some of the eigenvalues of A, and typically the extreme ones.
The new vector q_{n+1} is a Krylov vector; actually, because of orthogonalization, it represents the new component introduced into the Krylov subspace by the product A*q_{n}.
Thus, a typical procedure is to compute H_{n} and apply a standard eigenvalue procedure for Hessenberg matrices to extract its eigenvalues.
An efficient form for the Arnoldi iteration is:
Let b be an arbitrary initial vector; Set q(1) = b / ||b||; for n = 1, ... v = A * q(n) for j = 1 to n h(j,n) = conjugate ( q(j) ) * v v = v - q(j) * h(j,n) h(n+1,n) = ||v|| q(n+1) = v / h(n+1,n) end for end for
Back to TABLE OF CONTENTS.
A band matrix is a matrix whose entries are all zero except for the diagonal and a few of the immediately adjacent diagonals.
If the band matrix is large enough, then many significant efficiencies can be achieved in storage and matrix operations. Because so many elements are zero, the band matrix can be stored more compactly using band matrix storage. And because so many elements are zero, many algorithms can be speeded up to execute more quickly, including matrix multiplication and Gauss elimination.
Special cases of band matrices include those which are also symmetric or positive definite.
Band matrices with very tight bandwidths are of interest, and include matrices which are diagonal, bidiagonal, tridiagonal, and in some cases pentadiagonal.
Here is an example of a band matrix:
11 12 0 0 0 0 21 22 23 0 0 0 31 32 33 34 0 0 0 42 43 44 45 0 0 0 53 54 55 56 0 0 0 64 65 66This matrix has an upper bandwidth of 1, and lower bandwidth of 2, and an overall bandwidth of 4.
In some cases, a band matrix can be LU-factored without pivoting. This is always true if the matrix is positive definite. If a banded matrix A with lower and upper bandwidths ml and mu can be LU factored without pivoting, then its factorization can be written as
A = L * UThe usual P factor is missing because there is no pivoting, but more is true. The factor L, which is unit lower triangular, is in fact also a band matrix, with lower bandwith ml, and U, the upper triangular factor, is also band matrix with upper bandwith mu.
If the matrix A is positive definite symmetric banded, with a half-bandwidth of m, then the banded LU factorization becomes:
A = L * Uwith L and U having m as their lower and upper bandwidths respectively. Moreover, if we "factor out" the non-unit diagonal of U by writing
U = D * Vwhere V is unit upper triangular, we may write
A = L * U = L * D * V = L * D2 * D2 * V = R' * Rwhere D2 is the obvious "square root" of D (which must have positive entries because A is positive definite, remember!), and R is an upper triangular banded matrix with upper bandwidth m. Another way of looking at this is to say that the Cholesky factor of a banded positive definite symmetric matrix is itself banded with the same (half) bandwidth as the original matrix.
LAPACK and LINPACK include special routines for a variety of band matrices. These routines can compute the LU factorization, the determinant, inverse, or solution of a linear system.
LAPACK and EISPACK have routines for computing the eigenvalues of a symmetric banded matrix.
Back to TABLE OF CONTENTS.
Band matrix storage is a matrix storage format suitable for efficiently storing the nonzero entries of a band matrix.
Most band storage schemes are column oriented. The nonzero entries of the matrix "slide" downwards, while remaining in their original column. If the original matrix looked like this:
11 12 0 0 0 0 21 22 23 0 0 0 31 32 33 34 0 0 0 42 43 44 45 0 0 0 53 54 55 56 0 0 0 64 65 66the matrix would be saved in column band matrix storage as:
0 12 23 34 0 0 11 22 33 44 45 0 21 32 43 54 55 56 31 42 53 64 65 66Note that the zeroes in the above array are there just as padding. They don't correspond to any entries of the original array, and are simply necessary to make the array rectangular.
If the matrix is to be handled by a Gauss elimination routine that uses pivoting, then there is a possibility of fill in; that is, nonzero entries may need to be stored in places where zeroes had been. Band matrix storage can still be used, but we need to include in the compressed matrix some extra entries representing the diagonals along which the fill in entries may occur. It turns out that the number of extra diagonals required is simply the number of nonzero subdiagonals in the original matrix. For our example, this would mean the matrix would be stored as:
0 0 0 0 0 0 0 0 0 0 0 0 0 12 23 34 0 0 11 22 33 44 45 0 21 32 43 54 55 56 31 42 53 64 65 66
Back to TABLE OF CONTENTS.
The bandwidth of a band matrix is, roughly speaking, the number of diagonals that contain nonzero entries.
More precisely, define ML, the lower bandwidth of a matrix A to be the maximum value of ( I - J ), and MU to be the maximum value of ( I - J ), for all nonzero matrix entries A_{i,j}. Then the bandwidth M is defined by:
M = ML + 1 + MU.
This definition always treats the (main) diagonal as nonzero, and is not misled by a matrix which has only two nonzero diagonals, which are actually widely separated. All the territory between the diagonals must be included when measuring bandwidth.
Back to TABLE OF CONTENTS.
A basis for a linear space X of dimension N is a set of N vectors, {v(i) | 1 <= i <= N } from which all the elements of X can be constructed by linear combinations.
Naturally, we require that each of the vectors v(i) be an element of the space X. Moreover, it is not enough that these vectors span the space; we also require that they be linearly independent, that is, there should be no redundant vectors in the set.
The columns of the identity matrix form a basis for the linear space of vectors of dimension N. A square matrix of order N is not defective exactly when its eigenvectors form a basis for the linear space of vectors of dimension N.
Given a particular basis, the representation of a vector x is the unique set of coefficients c(i) so that
x = Sum ( 1 <= I <= N ) c(i) * v(i)The coefficients must be unique, otherwise you can prove that the basis is not linearly independent!
If the basis vectors are pairwise orthogonal, then the basis is called an orthogonal basis. If the basis vectors have unit length in the Euclidean norm, the basis is a normal basis. If both properties apply, it is an orthonormal basis. The columns of an orthogonal matrix are an orthonormal basis for the linear space of vectors of dimension N.
Back to TABLE OF CONTENTS.
A bidiagonal matrix has only two nonzero diagonals. The matrix is called upper bidiagonal if these are the main diagonal and the principal superdiagonal. The matrix is called lower bidiagonal if these are the main diagonal and the principal subdiagonal.
A simple example of an upper bidiagonal matrix is:
1 2 0 0 0 0 3 4 0 0 0 0 5 6 0 0 0 0 7 8 0 0 0 0 9
The Jordan Canonical Form is an example of an upper bidiagonal matrix.
A bidiagonal matrix is automatically a:
with all the rights and privileges appertaining thereunto.If a tridiagonal matrix can be LU-factored without pivoting, then the L factor is lower bidiagonal, and the U factor is upper bidiagonal.
Back to TABLE OF CONTENTS.
The BLAS or Basic Linear Algebra Subprograms, are a set of routines offering vector and matrix utilities. They are extensively used as part of LINPACK and LAPACK, to simplify algorithms, and to make them run more quickly.
The Level 1 BLAS provide basic vector operations such as the dot product, vector norm, and scaling. Level 2 BLAS provide operations involving a matrix and a vector, and Level 3 BLAS provide matrix-matrix operations.
There are also sets of sparse BLAS and parallel BLAS available.
Here are the Level 1 BLAS routines for single precision real vectors:
Back to TABLE OF CONTENTS.
A block matrix is a matrix which is described as being built up of smaller matrices.
For example, a tridiagonal block matrix might look like this:
2 4 | 3 9 | 0 0 4 6 | 0 3 | 0 0 --------------- 1 0 | 2 4 | 3 9 5 5 | 4 6 | 0 3 --------------- 0 0 | 1 0 | 2 4 0 0 | 5 5 | 4 6but for certain purposes, it might help us to see this matrix as "really" being a tridiagonal matrix, whose elements are themselves little matrices:
a | b | 0 --------- c | a | b --------- 0 | c | a
An algorithm suitable for a tridiagonal matrix can often be extended, in a natural manner, to handle a block tridiagonal matrix. Similar extensions can be made in some cases for other types of block matrices. A block banded matrix can be factored by a variant of banded Gauss elimination, for instance.
Back to TABLE OF CONTENTS.
A border banded matrix is a 2 by 2 block matrix comprising a (large) leading block which is a square banded matrix, two dense rectangular side strips, and a (small) trailing block which is a square dense matrix.
For example, a "toy" border banded matrix might look like this:
2 -1 0 0 0 0 0 | 1 2 -1 2 -1 0 0 0 0 | 2 5 0 -1 2 -1 0 0 0 | 7 8 0 0 -1 2 -1 0 0 | 3 3 0 0 0 -1 2 -1 0 | 4 2 0 0 0 0 -1 2 -1 | 3 1 0 0 0 0 0 -1 2 | 7 8 ------------------------- 3 7 8 3 2 3 1 | 5 2 1 2 4 7 9 2 4 | 3 6which we can regard as being made up of the blocks:
A11 | A12where, as we specified, A11 is a square banded matrix, A22 is a square dense matrix, and A21 and A12 are rectangular strips.
---------
A21 | A22
It is desirable to take advantage of the banded structure of A11. We can specify an algorithm for solving a linear system A * x = b that can be written in terms of operations involving the sub-matrices A11, A12, A21 and A22, which will achieve this goal, at the expense of a little extra work. One problem with this technique is that it will fail if certain combinations of the matrices A11 and A22 are singular, which can happen even when A is not singular.
The algorithm for solving A * x = b rewrites the system as:
A11 * X1 + A12 * X2 = B1The first equation can be solved for X1 in terms of X2:
A21 * X1 + A22 * X2 = B2
X1 = - A11^{-1} * A12 * X2 + A11^{-1} * B1allowing us to rewrite the second equation for X2:
( A22 - A21 * A11^{-1} * A12 ) X2 = B2 - A21 * A11^{-1} * B1which can be solved as:
X2 = ( A22 - A21 * A11^{-1} * A12 )^{-1} * ( B2 - A21 * A11^{-1} * B1 )
The actual algorithm doesn't compute the inverse, of course, but rather factors the matrices A11 and A22 - A21 * A11^{-1} * A12.
Back to TABLE OF CONTENTS.
The Cartesian basis vectors are simply the N columns of the identity matrix, regarded as individual column vectors.
These vectors form the standard basis for the set of vectors R^{N}. The vector corresponding to the I-th column of the identity matrix is often symbolized by e_{i}.
Thus, if we are working in a space with dimension N of 4, the basis vectors e_{1} through e_{4} would be:
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
Facts about the Cartesian basis vectors:
Back to TABLE OF CONTENTS.
The Cauchy-Schwarz Inequality is a relationship between a vector inner product and a vector norm derived from that inner product. In particular, if the norm ||*|| is defined by an inner product (*,*) as follows:
|| x || = sqrt ( x, x ),then the Cauchy-Schwarz inequality guarantees that for any vectors x and y it is the case that:
| ( x, y ) | <= || x || * || y ||.
Back to TABLE OF CONTENTS.
The Cayley-Hamilton Theorem guarantees that every (square) matrix satisfies its own characteristic equation.
For example, if A is the matrix:
2 3 1 4then the characteristic equation is
lambda^{2} - 6 * lambda + 5 = 0.which is not true for all values lambda, but just a few special values known as eigenvalues. The Cayley-Hamilton theorem guarantees that the matrix version of the characteristic equation, with A taking the place of lambda, is guaranteed to be true:
A^{2} - 6 * A + 5 * I = 0.
Back to TABLE OF CONTENTS.
A centrosymmetric matrix is one which is symmetric about its center; that is,
A_{i,j} = A_{m+1-i,n+1-j}
Example:
1 10 8 11 5 13 2 9 4 12 6 7 3 7 6 12 4 9 2 13 5 11 8 10 1
A centrosymmetric matrix A satisfies the following equation involving the Exchange matrix J:
J*A*J = A.
Back to TABLE OF CONTENTS.
The characteristic equation of a (square) matrix A is the polynomial equation:
det ( A - lambda * I ) = 0where lambda is an unknown scalar value.
The left hand side of the equation is known as the characteristic polynomial of the matrix. If A is of order N, then there are N roots of the characteristic equation, possibly repeated, and possibly complex.
For example, if A is the matrix:
2 3 1 4then the characteristic equation is
( 2 - lambda 3 ) det ( 1 4 - lambda ) = 0or
lambda^{2} - 6 * lambda + 5 = 0.This equation has roots lambda = 1 or 5.
Values of the scalar lambda which satisfy the characteristic equation are known as eigenvalues of the matrix.
Some facts about the characteristic equation of A:
The Cayley-Hamilton Theorem guarantees that the matrix itself also satisfies the matrix version of its characteristic equation.
Back to TABLE OF CONTENTS.
The Cholesky factorization of a positive semidefinite symmetric matrix A has the form:
A = L * L'where L is a lower triangular matrix, or equivalently:
A = R' * Rwhere R is an upper triangular matrix.
If a matrix is symmetric, then it is possible to determine whether or not the matrix is positive definite simply by trying to compute its Cholesky factorization: if the matrix has a zero eigenvalue, then it is positive semidefinite, and the algorithm should theoretically spot this by computing a zero diagonal element; if the matrix actually has a negative eigenvalue, then at a particular point in the algorithm, the square root of a negative number will be computed.
Software to compute the Cholesky factorization often saves space by using symmetric matrix storage, and overwriting the original matrix A by its Cholesky factor L.
As long as the matrix A is positive definite, the Cholesky factorization can be computed from an LDL factorization, or, if no pivoting was done, from an LU factorization.
The Cholesky factorization can be used to compute the square root of the matrix.
The LINPACK routines SCHDC, SCHUD, SCHDD, SCHEX, SPOCO, SPOFA, SPODI, and SPOSL compute and use the Cholesky factorization.
Back to TABLE OF CONTENTS.
A circulant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the right, with the end value "wrapping around".
Here is a square circulant matrix:
1 2 3 4 4 1 2 3 3 4 1 2 2 3 4 1a "wide" rectangular circulant matrix:
1 2 3 4 5 5 1 2 3 4 4 5 1 2 3a "tall" rectangular circulant matrix:
1 2 3 5 1 2 4 5 1 3 4 5 2 3 4
Simple facts about a (rectangular) circulant matrix A:
Simple facts about a square circulant matrix A:
Y = A(1,1) + A(1,2)*W + A(1,3)*W^{2} + ... + A(1,N)*W^{N-1}is an eigenvalue of A, with (right) eigenvector:
( 1, W, W^{2}, ..., W^{N-1} )and left eigenvector:
( W^{N-1}, W^{N-2}, ..., W^{2}, W, 1 ).Although there are exactly N distinct N-th roots of unity, the circulant may have repeated eigenvalues, because of the behavior of the polynomial. However, the matrix is guaranteed to have N linearly independent eigenvectors.
Compare the concept of an anticirculant matrix.
Back to TABLE OF CONTENTS.
The cofactor matrix of a square matrix A is generally used to define the adjoint matrix, or to represent the determinant.
For a given matrix A, the cofactor matrix is the transpose of the adjoint matrix:
cofactor ( A ) = ( adjoint ( A ) )'
The determinant det(A) can be represented as the product of each of the entries of any given row or column times their corresponding cofactor entries. In particular, consider the first row:
det(A) = A(1,1) * cofactor(A)(1,1) + A(1,2) * cofactor(A)(1,2) + ... + A(1,N) * cofactor(A)(1,N)
The formula for the (I,J) entry of the cofactor matrix of A is:
cofactor(A)(I,J) = (-1)^{(I+J)} * det ( M(A,I,J) )where M(A,I,J) is the minor matrix of A, constructed by deleting row I and column J.
Back to TABLE OF CONTENTS.
Column echelon form is a special matrix structure which is usually arrived at by Gauss elimination.
Any matrix can be transformed into this form, using a series of elementary column operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.
A matrix (whether square or rectangular) is in column echelon form if:
A matrix is in column reduced echelon form if it is in column echelon form, and it is also true that:
Column echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.
Back to TABLE OF CONTENTS.
Two square matrices, A and B, are said to commute if
A * B = B * A
Facts about commuting matrices:
Back to TABLE OF CONTENTS.
The companion matrix for a monic polynomial P(X) of degree N is a matrix of order N whose characteristic polynomial is P(X).
If the polynomial P(X) is represented as:
P(X) = X^{N} + C(1) * X^{(N-1)} + ... + C(N-1) * X + C(N).then the companion matrix for this polynomial has the form:
0 0 0 ... 0 0 -C(N) 1 0 0 ... 0 0 -C(N-1) 0 1 0 ... 0 0 -C(N-2) ................... 0 0 0 ... 1 0 -C(2) 0 0 0 ... 0 1 -C(1)
Note that the determinant of the companion matrix is either -C(N) or +C(N), so it is never zero unless the polynomial itself is degenerate (has a zero leading coefficient).
The characteristic polynomial of a companion matrix is, in fact, the polynomial that was used to define the matrix originally. Thus it is always possible to construct a matrix with any desired set of eigenvalues, by constructing the corresponding characteristic polynomial, and then the companion matrix.
The companion matrix can also be used to perform a decomposition of a matrix A. If x is a vector, and K the Krylov matrix
K = Krylov ( A, x, n )whose columns are the successive products x, A*x, A^{2}*x, and so on, and if K is nonsingular, then
A = K * C * K^{-1}where the matrix C is the companion matrix of A.
(There are several equivalent forms of the companion matrix, with the coefficients running along the top, the bottom, or the first column of the matrix.)
Back to TABLE OF CONTENTS.
A matrix norm and a vector norm are compatible if it is true, for all vectors x and matrices A that
||A*x|| <= ||A|| * ||x||In some texts, the word consistent is used in this sense, instead of compatible.
In particular, if you have not verified that a pair of norms are compatible, then the above inequality is not guaranteed to hold. For any vector norm, it is possible to define at least one compatible matrix norm, namely, the matrix norm defined by:
||A|| = supremum ||A*x|| / ||x||where the supremum (roughly, the "maximum") is taken over all nonzero vectors x. If a matrix norm can be derived from a vector norm in this way, it is termed a vector-bound matrix norm. Such a relationship is stronger than is required by compatibility.
If a matrix norm is compatible with some vector norm, then it is also true that
||A*B|| <= ||A|| * ||B||where both A and B are matrices.
Back to TABLE OF CONTENTS.
Complex numbers have the form a+bi, where i is a special quantity with the property that i^{2}=-1.
It is possible to devise real matrices that behave like complex numbers. Let the value "1" be represented by the identity matrix of order 2, and the value "i" be represented by
0 1 -1 0Then it is easy to show that these matrices obey the rules of complex numbers. In particular, "i" * "i" = - "1". In general, the complex number a+bi is represented by
a b -b aand multiplication and inversion have the correct properties.
Back to TABLE OF CONTENTS.
The condition number of the coefficient matrix A of a linear system is a (nonnegative) number used to estimate the amount by which small errors in the right hand side b, or in A itself, can change the solution x.
This analysis ignores arithmetic roundoff, which is hard to analyze, and focusses on easily measurable quantities known beforehand, and how they will amplify or diminish the roundoff errors.
Small values of the condition number suggest that the algorithm will not be sensitive to errors, but large values indicate that small data or arithmetic errors may explode into enormous errors in the answer.
The condition number is defined in terms of a particular matrix norm. Many different matrix norms may be chosen, and the actual value of the condition number will vary depending on the norm chosen. However, the general rule that large condition numbers indicate sensitivity will hold true no matter what norm is chosen.
The condition number for a matrix A is usually defined as
condition ( A ) = || A || * || A^{-1} ||.If A is not invertible, the condition number is infinite.
Simple facts about the condition number:
LINPACK routines such as SGECO return RCOND, an estimate of the reciprocal of the condition number in the L1 matrix norm.
Turing's M condition number, M(A), for a matrix of order N, is defined as
M(A) = N * max | A_{i,j} | * max | A^{-1}_{i,j} |.
Turing's N condition number, N(A) is
N(A) = Frob ( A ) * Frob ( A^{-1} ) / Nwhere Frob(A) is the Frobenius matrix norm.
The Von Neumann and Goldstine P condition number is
P(A) = | lambda_Max / lambda_Min |where lambda_Max and lambda_Min are the eigenvalues of largest and smallest magnitude, which is equivalent to using the spectral radius of A and A^{-1}.
There is also a condition number defined for the eigenvalue problem, which attempts to estimate the amount of error to be expected when finding the eigenvalues of a matrix A.
Back to TABLE OF CONTENTS.
Congruent matrices A and B are related by a nonsingular matrix P such that
A = P' * B * P.
Congruent matrices have the same inertia.
Congruence is of little interest by itself, but the case where P is also an orthogonal matrix is much more important.
Back to TABLE OF CONTENTS.
The conjugate gradient method is designed to solve linear systems
A * x = bwhen the matrix A is symmetric, and positive definite.
The classical method is not an iterative method, but rather a direct method which produces an approximation to the solution after N steps.
Because of numerical inaccuracies and instabilities, many implementations of the method repeat the computation several times, until the residual error is deemed small enough.
On the other hand, if N is very large, and a suitable preconditioner is used, it is possible to treat the conjugate gradient method as "essentially" an iterative method, in the sense that it may be the case that a good approximate solution can be found by stopping the algorithm early. The matrix is assumed to be symmetric positive definite; if it is also large and sparse, a suitable preconditioner may be the Incomplete Cholesky Factorization. In this case, the algorithm is sometimes referred to as ICCG(0): that is, the Incomplete Cholesky - Conjugate Gradient algorithm.
The method is ideally suited for use with large sparse systems, because the matrix A is only accessed to compute a single matrix-vector product on each step. This involves no fill in or overwriting of the data structure that describes A. On the other hand, if the matrix A is dense, the conjugate gradient method costs roughly 3 times the number of operations for direct Gauss elimination.
The conjugate gradient method can be considered as a minimization of the functional f(x), defined by
f(x) = x' * ( 0.5 * A * x - b )which achieves its minimum value when x solves the linear system.
Here are the formulas for the basic conjugate gradient method. Brackets indicate the value of an iterative quantity. X[0] is the initial value of the vector X, X[1] the value after one iteration, and so on.
X[0] = 0 For K = 1 to NCompute the residual error:
R[K-1] = B - A * X[K-1]Compute the direction vector:
If K = 1 then P[K] = R[0] else BETA = - P'[K-1] * A * R[K-1] / ( P'[K-1] * A * P[K-1] ) P[K] = R[K-1] + BETA * P[K-1] end ifCompute the location of the next iterate:
ALPHA = ( R'[K-1] ) * R[K-1] / ( P'[K] * A * P[K] ) X[K] = X[K-1] + ALPHA * P[K] end for
Conjugate gradient algorithms are available in IMSL, ITPACK and NSPCG.
Back to TABLE OF CONTENTS.
The conjugate matrix of a complex matrix A, denoted by A* or conjugate ( A ), is the matrix obtained by replacing each entry of A by its complex conjugate.
(In this document, the form conjugate ( A ) is preferred, because the A* is easily confused with multiplication.
The complex conjugate transpose, sometimes called the Hermitian or tranjugate of A, is derived from A by complex conjugation, followed by transposition, and is denoted by A^{H}.
Back to TABLE OF CONTENTS.
The conjugate of a complex number z = a + b * i is the complex number
conjugate ( z ) = a - b * i.
The conjugate is frequently represented by placing a bar over the quantity, or occasionally a star after it, as in z*.
The complex conjugate can be used in a formula for the norm or magnitude of a complex number, which must always be a real nonnegative value:
norm ( z ) = sqrt ( z * conjugate ( z ) ) = sqrt ( a^{2} + b^{2} ).
For complex vectors, an inner product with the correct properties may be defined as:
V dot W = ( V, W ) = sum ( 1 <= I <= N ) conjugate ( V(I) ) * W(I).This inner product is computed in the BLAS function CDOTC, for example, and yields another relationship with the Euclidean vector norm:
|| V || = sqrt ( V dot V )
Back to TABLE OF CONTENTS.
Two (complex) matrices A and B are said to be conjunctive if there is some nonsingular matrix P so that
A = ( conjugate ( P ) )' * B * P,
This is the extension to complex matrices of the concept of a congruent real matrix.
Back to TABLE OF CONTENTS.
A consistent linear system is an M by N linear system A * x = b, with A and b given, for which there is at least one solution vector x.
A simple example of an inconsistent system is:
1 0 1 1 0 2 3 * x = 2 1 2 4 4The last row of A is the sum of the previous two rows, but the last entry of b is not the sum of the previous two entries, so there can be no solution.
By contrast, the system
1 2 3 14 4 5 6 * x = 34is consistent, having as one solution (of many) the vector (1,2,3), and the system
1 0 5 1 2 5 2 4 * x = 10 3 6 15 4 8 20is consistent, having the unique solution ( 5, 0 )
Facts about a consistent linear system:
If a linear system is not consistent, then there is no solution x. But in such a case, it may be of interest to determine an approximate solution, which satisfies some condition, such as minimizing some norm of the residual. One technique for doing this involves the linear least squares method..
Back to TABLE OF CONTENTS.
A convergent matrix A is a square matrix for which the limit as n goes to infinity of A^{n} is zero.
A matrix is convergent if and only if the spectral radius rho(A) satisfies
rho(A) < 1
A semiconvergent matrix A is a square matrix A for which the limit as n goes to infinity of A^{n} exists. If a matrix is semiconvergent, then either it is convergent, (rho(A) < 1) or else rho(A) = 1. In the second case, it must be further true that the only eigenvalue of norm 1 is 1, and that this eigenvalue is semisimple.
The simplest example of a semiconvergent matrix is the identity matrix.
Back to TABLE OF CONTENTS.
A correlation matrix is a square positive semidefinite symmetric matrix with unit diagonal entries, and off-diagonal entries whose magnitude is no greater than 1.
In statistical applications, entry A_{i,j} represents the strength of the correlation between variables I and J. The strongest correlation is +1; a correlation of 0 means the variables are apparently completely independent. If a correlation is negative, it means that variable I is positively correlated to the negative of the value of variable J.
The entries of the correlation matrix can be represented as
A_{i,j} = X(I) dot X(J) / ( ||X(I)|| ||X(J)|| )where the norm used is derived from the dot product, and the dot product takes the form appropriate for the kind of data being handled, such as finite vectors, or continuous functions.
Back to TABLE OF CONTENTS.
The cross product of two vectors u and v, denoted u x v, is a vector w which is perpendicular to u and v, pointing in the direction so that (u,v,w) forms a right handed coordinate system, and whose length is equal to the area of the parallelogram two of whose sides are u and v.
Algebraically,
w(1) = u(2) * v(3) - u(3) * v(2)
w(2) = u(3) * v(1) - u(1) * v(3)
w(3) = u(1) * v(2) - u(2) * v(1)
If the unit vectors in the coordinate directions are denoted by i, j and k, then the cross product vector can also be regarded as the (vector) value of the following "determinant":
| i j k | w = u x v = det | u(1) u(2) u(3) | | v(1) v(2) v(3) |
Back to TABLE OF CONTENTS.
Cyclic reduction is a method for solving a linear system A*x=b in the special case where A is a tridiagonal matrix.
On a parallel computer, this method solves the system in LOG(N) "steps" where N is the order of A. A standard Gauss elimination method for a tridiagonal system would require roughly N "steps" instead.
A tridiagonal system has some very special properties that will allow us to carry this operation out. Consider this system of 7 equations:
A11 x1 + A12 x2 = y1 A21 x1 + A22 x2 + A23 x3 = y2 A32 x2 + A33 x3 + A34 x4 = y3 A43 x3 + A44 x4 + A43 x5 = y4 A54 x4 + A55 x5 + A56 x6 = y5 A65 x5 + A66 x6 + A67 x7 = y6 A76 x6 + A77 x7 = y7
The first equation can be used to eliminate the coefficient A21 in the second equation, and the third equation to eliminate the coefficient A23 in the second equation. This knocks out variables x1 and x3 in the second equation, but adds x4 into that equation.
By the same method, x3 and x5 can be eliminated from the equation for x4, and so on. By eliminating the odd variables from the even equations, a smaller tridiagonal system system is derived, with half the equations and variables.
If elimination is applied to this set, the number of equations is again reduced by half; this reduction may be repeated until a single equation in one variable is reached. Backsubstitution then produces the values of all the variables.
The reason this method might have an advantage over Gauss elimination is that, at each step of the elimination phase, the parts of the step are independent. If many computer processors are available, then each can be working on a separate portion of the elimination. If the number of processors is large enough, the system can really be solved in LOG(N) time.
Cyclic reduction routines are available in the NCAR software library, the SLATEC library, and the Cray SCILIB library.
Back to TABLE OF CONTENTS.
A cyclic tridiagonal matrix is a generalization of a tridiagonal matrix which includes an extra last entry in the first row, and an extra first entry in the last row.
An example of a cyclic tridiagonal matrix:
-2 1 0 0 1 1 -2 1 0 0 0 1 -2 1 0 0 0 1 -2 1 1 0 0 1 -2
A cyclic tridiagonal matrix is not a tridiagonal matrix. If the matrix is constant along the three generalized diagonals, a cyclic tridiagonal matrix is a circulant matrix. A cyclic tridiagonal matrix can arise in situations where a periodic boundary condition is applied.
It is very disappointing that a cyclic tridiagonal matrix is not a tridiagonal matrix, since there are so many good methods for solving tridiagonal linear systems. One way to solve a cyclic tridiagonal system is to use the Sherman Morrison Formula and view the matrix as a rank one perturbation of a tridiagonal matrix. Another approach is to view it as a border banded matrix.
A cyclic tridiagonal matrix may also be called a periodic tridiagonal matrix.
Back to TABLE OF CONTENTS.
A defective matrix is a (square) matrix that does not have a full set of N linearly independent eigenvectors.
For every eigenvalue, there is always at least one eigenvector, and eigenvectors corresponding to distinct eigenvalues are linearly independent. If the N eigenvalues of a matrix are distinct, then it surely has N linearly independent eigenvectors, and so cannot be defective.
Conversely, if a matrix is defective, then it must have at least one repeated eigenvalue, that is, an eigenvalue of algebraic multiplicity greater than 1. A matrix is defective if and only if its Jordan Canonical Form has at least one nonzero entry on the superdiagonal.
Thus, a simple example of a defective matrix is:
1 1 0 1which has the single eigenvalue of 1, with algebraic multiplicity 2, but geometric multiplicity 1. The only eigenvector is ( 0, 1 ).
If a matrix is not defective, then its eigenvectors form a basis for the entire linear space. In other words, any vector y can be written as
y = X * cwhere X is the array of eigenvectors of A.
If a matrix A is not defective, then it is similar to its diagonal eigenvalue matrix:
A = X * LAMBDA * X^{-1}and the similarity transformation matrix X is actually the eigenvector matrix.
This in turn allows us to make interesting statements about the inverse, tranpose, and powers of A. For instance, we see that
A^{2} = ( X * LAMBDA * X^{-1} ) * ( X * LAMBDA * X^{-1} )leading us to the statement that for a nondefective matrix, the square has the same eigenvectors, and the square of the eigenvalues.
= X * LAMBDA^{2} * X^{-1}
Back to TABLE OF CONTENTS.
Deflation is a technique for "removing" a known eigenvalue from a matrix, in order to facilitate the determination of other eigenvalues.
For example, the power method is able to estimate the eigenvalue of largest modulus of a matrix A. Once this is computed, it might be desired to find the next largest eigenvalue. Deflation can be used to essentially create a new matrix, A', which has the same eigenvalues as A, except that the largest eigenvalue has been dropped (and the order of A' reduced by 1) or the largest eigenvalue is replaced by 0. In either case, the power method applied to A' will produce the next largest eigenvalue of A.
To eliminate a known eigenvalue, lambda, it is necessary to know its eigenvector x, which we will assume has been scaled to have unit Euclidean norm. By the properties of eigenvectors, we know that
A * x = lambda * x.Now define the matrix A^ so that:
A^ = A - lambda * x * x'.Now x is an eigenvector of A^ with eigenvalue 0, because:
A^ * x = ( A - lambda * x * x' ) * x
= A * x - lambda * x * x' * x
= lambda * x - lambda * x
= 0
If the power method is being employed, then the new iteration should try to "factor out" any component of the eigenvector x; otherwise, small errors in the computation of the first eigenvalue and eigenvector will interfere with the next results.
Theoretically, this process may be repeated as often as desired, eliminating each eigenvalue as it is discovered. Practically, however, accumulated errors in the eigenvalues and eigenvectors make the computation more and more unreliable with each step of deflation. Thus, if more than a few eigenvalues are desired, it is more appropriate to use a standard technique.
Back to TABLE OF CONTENTS.
A derogatory matrix is a matrix whose minimal polynomial is of lower degree than its characteristic polynomial
For this to happen, of course, the matrix must have at least one eigenvector with algebraic multiplicity greater than one.
Surprisingly, the identity matrix is derogatory, (ignoring the case N=1). A matrix can be guaranteed to be nonderogatory if the geometric multiplicity of every eigenvalue is 1.Perhaps the only reason that the term is worth knowing is this fact: every nonderogatory matrix is similar to the companion matrix of its characteristic polynomial.
Back to TABLE OF CONTENTS.
The determinant of a square matrix is a scalar value whose most common usage is to indicate whether a matrix is singular or not.
If a matrix is singular, then it doesn't have an inverse and linear systems involving this matrix cannot be reliably solved.
In numerical work, the determinant is not really a reliable indicator of singularity, and other data, such as the relative size of the matrix elements encountered during pivoting, are preferred when checking for singularity.
The determinant also occurs in the definition of the eigenvalue problem.
An explicit formula for the determinant of a matrix A is:
det ( A ) = sum [ over all P ] sign(P) * A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).where the sum ranges over all possible permutations P of the numbers 1 through N, and sign(P) is +1 for an even permutation, and -1 for an odd permutation. (Any permutation may be accomplished by a sequence of switching pairs of objects. The permutation is called even or odd, depending on whether the number of switches is even or odd).
A numerical method for finding the determinant comes as a byproduct of the LU factorization used in Gaussian elimination. Typically, this factorization has the form
A = P * L * U,and the value of the determinant is simply
det ( A ) = det ( P ) * product ( 1 <= I <= N ) U(I,I).where det ( P ) is +1 or -1, again determined by the sign of the permutation.
Simple facts about the determinant:
A single elementary row operation has the following effect on the determinant:
For small matrices, the exact determinant is simple to compute by hand. The determinant of a 2 by 2 matrix
a b c dis a*d-b*c, while the determinant of a 3 by 3 matrix:
a b c e f g h i jis a * (f*j-g*i) - b * (e*j-g*h) + c * (e*i-f*h).
If absolutely necessary, the determinant of a matrix of order N can be computed recursively in terms of determinants of minor matrices. Let M(A,I,J) stand for the (I,J) minor matrix of A. Then the determinant of A is
det ( A ) = sum ( 1 <= J <= N ) (-1)^{(J+1)} * A_{i,j} * det ( M(A,I,J) ).Of course, now we need to compute the determinants of the N minor matrices, but the order of these matrices has been reduced by 1. Theoretically, we can represent the determinant of any of these matrices of order N-1 by a similar sum involving minor matrices of order N-2, and this process can be repeated until we reach matrices of order 1 or 2, whose determinants are easy to compute. In practice, this method is never used except in simple classroom exercises.
There is a geometric interpretation of the determinant. If the rows or columns of A are regarded as vectors in N dimensional space, then the determinant is the volume of a the parallelepiped, or "slanted cube" whose one corner is defined by these vectors.
LAPACK and LINPACK provide routines for computing the determinant of a matrix, after the matrix has been decomposed into LU factors.
Back to TABLE OF CONTENTS.
A matrix is (column) diagonally dominant if, for every column, the sum of the absolute values of the offdiagonal elements is never greater than the absolute value of the diagonal element.
The matrix is strictly diagonally dominant if the offdiagonal sum is always strictly less than the absolute value of the diagonal element.
The same definitions can be used to consider rows instead of columns. The terms column diagonally dominant and row diagonally dominant may be used, if necessary, to specify which case is being considered.
A strictly diagonally dominant matrix cannot be singular, by Gershgorin's Theorem.
A diagonally dominant matrix which is also irreducible cannot be singular.
Here is a diagonally dominant matrix which is not strictly diagonally dominant:
-2 1 0 0 0 1 -2 1 0 0 0 1 -2 1 0 0 0 1 -2 1 0 0 0 1 -2
For a linear system A * x = b, if the matrix A is strictly diagonally dominant, then both Jacobi iteration and Gauss Seidel iteration are guaranteed to converge to the solution.
Back to TABLE OF CONTENTS.
A diagonal matrix is one whose only nonzero entries are along the main diagonal. For example:
3 0 0 0 4 0 0 0 7
Simple facts about a diagonal matrix A:
Back to TABLE OF CONTENTS.
A diagonalizable matrix is any (square) matrix A which is similar to a diagonal matrix:
A = P * D * P^{-1}.
This concept is important in the study of eigenvectors To see the relationship, post-multiply the equation by P:
A * P = P * D.Looking at the columns of P as eigenvectors, and the diagonal entries of D as eigenvalues, this shows that a matrix is diagonalizable exactly when it has N linearly independent eigenvectors.
In certain cases, not only is a matrix diagonalizable, but the matrix P has a special form. The most interesting case is that of any (real) symmetric matrix A; not only can such a matrix be diagonalized, but the similarity matrix is orthogonal:
A = Q * D * Q^{-1} = Q * D * Q',This fact can be interpreted to show that not only does every symmetric matrix have a complete set of eigenvectors, but the eigenvectors and eigenvalues are real, and the eigenvectors are pairwise orthogonal.
Similarly, a complex matrix that is a Hermitian has a complete set of eigenvectors, and the eigenvalues are real. (The eigenvectors in this case will not generally be real).
Simple facts:
Back to TABLE OF CONTENTS.
The downshift matrix A circularly shifts all vector entries or matrix rows down 1 position.
Example:
0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0
Facts about the downshift matrix A:
Back to TABLE OF CONTENTS.
Eigenvalues are special values associated with a (square) matrix, which can be used to analyze its behavior in multiplying any vector.
The formal definition of an eigenvalue of a matrix A is that it is any value lambda which is a root of the characteristic equation of the matrix,
det ( A - lambda * I ) = 0.
lambda is an eigenvalue of A if and only if there is a nonzero vector x, known as an eigenvector (or sometimes a "right" eigenvector), with the property that
A * x = lambda * x.Note that there must also be a "left" eigenvector y, with the property
y * A = A' * y = lambda * y.
The characteristic equation has exactly N roots, so a matrix has N eigenvalues. An important consideration is whether any eigenvalue is a repeated root, which determines how hard the eigenvector computation will be.
If a matrix has the maximum possible number of linearly independent eigenvectors (namely N, the order of the matrix), then the eigenvalues and eigenvectors can be used to diagonalize the matrix. This only happens when the matrix is normal.
If the NxN matrix A has N linearly independent eigenvectors, we may collect the eigenvectors as columns of a matrix X, the eigenvalues as entries in the diagonal matrix Lambda, we may first write
A * X = X * Lambdaand then, because the eigenvectors are presumed to be linearly independent, we have the eigen decomposition or the spectral decomposition of A:
A = X * Lambda * X^{-1}
Simple facts about eigenvalues of A:
Simple algorithms for computing eigenvalues include the power method and the inverse power method. The QR method is a more powerful method that can handle complex and multiple eigenvalues.
LAPACK and EISPACK include algorithms for computing the eigenvalues and eigenvectors of a variety of types of matrix, as well as methods that can be applied to more general eigenvalue problems.
Back to TABLE OF CONTENTS.
A nonzero vector x is an eigenvector of the square matrix A if
A * x = lambda * xfor some scalar value lambda, called the associated eigenvalue.
Sometimes this eigenvector is more particularly described as a right eigenvector, so that we may also consider left eigenvectors, that is, vectors y for which it is true that
y * A = A' * y = mu * yfor some scalar mu.
For every eigenvalue of a matrix, there is at least one eigenvector. Every nonzero multiple of this eigenvector is also an eigenvector, but in an uninteresting way. If, and only if, an eigenvalue is a repeated root, then there may be more than one linearly independent eigenvector associated with that eigenvalue. In particular, if an eigenvalue is repeated 3 times, then there will be 1, 2 or 3 linearly independent eigenvectors corresponding to that eigenvalue.
Facts about eigenvectors:
Back to TABLE OF CONTENTS.
EISPACK is a package of routines for handling the standard and generalized eigenvalue problems.
The beginning user who is not interested in trying to learn the details of EISPACK and simply wants the answer to an eigenvalue problem quickly should call one of the main driver routines. Each of these is tailored to handle a given problem completely with a single subroutine call. For more advanced work, it may be worth investigating some of the underlying routines.
Driver routines to solve A*x=lambda*x include:
For the generalized eigenvalue problem:
Back to TABLE OF CONTENTS.
The EISPACK matrix norm is used in the EISPACK eigenvalue package.
The definition of the norm for an M by N matrix is:
||A|| = sum ( I = 1 to M, J = 1 to N ) | A_{i,j} |
It's a simple exercise to verify that this quantity satisifes the requirements for a matrix norm.
This norm is easy to calculate, and was used in EISPACK in order to have a standard against which to compare the size of matrix elements that were being driven to zero. I haven't seen it used anywhere else in practice.
Back to TABLE OF CONTENTS.
Elementary column operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to column echelon form.
Restricting the operations to a simple set makes it easy to:
The three elementary column operations include:
Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as postmultiplication by a concatenation of elementary matrices:
B = A * E(1) * E(2) * ... * E(k)which may be abbreviated as:
B = A * CSince C will be guaranteed to be invertible, we also know that,
B * C^{-1} = Awhich yields a factorization of A.
Back to TABLE OF CONTENTS.
An elementary matrix E is one which, when pre-multiplying another matrix A, produces a product matrix E * A which has exactly one of the following properties:
The matrix E which interchanges rows R1 and R2 of matrix A has the form E(I,J)=:
The matrix E which multiplies row R1 of A by the constant s has the form E(I,J)=:
The matrix E which adds s * row R2 to row R1 of A has the form E(I,J) =:
If a matrix F can be represented as the product of elementary matrices,
F = E1 * E2 * ... * EM,then its inverse is:
F^{-1} = EM^{-1} * EM-1^{-1} * ... * E1^{-1}.
An elementary similarity transformation uses a matrix F which is the product of elementary matrices, and transforms the matrix A into the similar matrix B by the formula
B = F^{-1} * A * F.
Back to TABLE OF CONTENTS.
Elementary row operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to row echelon form.
Restricting the operations to a simple set makes it easy to:
The three elementary row operations include:
Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as premultiplication by a concatenation of elementary matrices:
B = E(k) * E(k-1) * ... * E(2) * E(1) * Awhich may be abbreviated as:
B = C * A.Since C will be guaranteed to be invertible, we also know that,
C^{-1} * B = Awhich yields a factorization of A.
Back to TABLE OF CONTENTS.
An ellipsoid is an N dimensional generalization of an ellipse. The formula for an ellipsoid may be written as:
sum ( I = 1 to N, J = 1 to N ) A_{i,j} * X_{i} * X_{j} = 1.where A is a positive definite symmetric matrix.
A principal axis of an ellipsoid is any N dimensional point X on the ellipsoid such that the vector from the origin to X is normal to the ellipsoid.
In the general case, there are exactly N principal axes (plus their negatives). In degenerate cases, there may be an entire plane of vectors that satisfy the requirement, but it is always possible to choose a set of N principal axes which are linearly independent.
Moreover, in the general case, the principal axes are pairwise orthogonal vectors, and in the degenerate case, may be chosen pairwise orthogonal.
Moreover, it is always true that the principal axes are eigenvectors of the matrix A of ellipsoid coefficients. The length of the principal axis vector associated with an eigenvalue lambda(I) is 1 / Sqrt ( lambda(I) ).
These facts have a strong relationship to the formulation of the conjugate gradient method.
Back to TABLE OF CONTENTS.
Equilibration is the technique of balancing the rows or columns of a matrix by rescaling them.
Consider, for instance, the fact that the following two equations are equivalent:
0.0001 * x + 0.0001 * y = 0.0001and
1000 * x + 1000 * y = 1000However, the large coefficients in the second equation will bias a Gauss elimination routine to choose that equation as its pivot. Actually, it's more important in this case that the chosen row be as "linearly independent as possible" from the other rows, and this is more likely to occur if we ensure that all the rows start out with an equal norm. This can be done very simply, by finding the element of maximum absolute value in each row and dividing that row (and its right hand side) by that value. Such a technique is called row equilibration. It is not necessary that the rows have precisely the same norm; it is desirable that the norms of the rows be maintained within some controlled range.
Equilibration is useful in many areas of linear algebra, including eigenvalue calculations. In some cases, column equilibration is preferred, and in other cases, the norms of both the rows and columns are to be controlled.
Back to TABLE OF CONTENTS.
Matrices A and B are said to be equivalent if there are nonsingular matrices P and Q so that
A = P * B * Q.
Simple facts about equivalence:
Equivalence is a very loose concept of relatedness. A stronger and more useful concept is similarity.
Back to TABLE OF CONTENTS.
The exchange matrix J is constructed from the identity matrix by reversing the order of the columns.
For example, the matrix J of order 4:
0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0
Facts about the exchange matrix J:
The exchange matrix is sometimes called the anti-identity matrix or the counter-identity matrix or the reversal matrix.
Back to TABLE OF CONTENTS.
An external storage algorithm is a method of solving a problem that is too large to be loaded into computer memory as a whole.
Instead, the problem is solved incrementally, with most of the problem data residing, at any one time, in computer files, also called "disk storage" or "external storage". It is a considerable difficulty just to rewrite an algorithm that can handle a situation where, say, part of a matrix is in one place, and part is in another, remote place.
However, such algorithms must also be aware that data transfers between memory and disk are very slow. Hence, if the algorithm is to be of any use, it must do as much processing as possible on the portion of the data that resides in memory, and read the external problem data into memory as rarely as possible, and in large contiguous "chunks".
Back to TABLE OF CONTENTS.
The Fourier matrix represents the linear operator that transforms a vector of data into a vector of Fourier coefficients.
Let w indicate an N-th root of unity. Then, if we choose N=4, the matrix F will be:
1 1 1 1 1 w w^2 w^3 1 w^2 w^4 w^6 1 w^3 w^6 w^9which simplifies to
1 1 1 1 1 w w^2 w^3 1 w^2 1 w^2 1 w^3 w^2 w^1However, we will choose to scale F by 1/sqrt(N).
Facts about the Fourier matrix F:
Back to TABLE OF CONTENTS.
The Frobenius matrix norm is a matrix norm that has the simple formula: ||A|| = the square root of the sum of the squares of all the entries of the matrix.
The Frobenius matrix norm is not a vector-bound matrix norm, although it is compatible with the L2 vector norm, and much easier to compute that the L2 matrix norm.
The Frobenius matrix norm is sometimes called the Schur matrix norm or Euclidean matrix norm
Back to TABLE OF CONTENTS.
Gauss elimination has the goal of producing a solution x to the system of linear equations A*x=b, where A is matrix of order N, and b a vector of length N. The standard version of Gauss elimination used in most algorithms employs partial pivoting.
Gauss elimination accomplishes its goal by decomposing the original matrix A into three factors, a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U. The factors are related to the original matrix by the formula
A = P * L * U.
Once the matrix is factored, it is a simple matter to solve A*x=b, by solving instead P * ( L * ( U * x ) ) ) = b, because each of the three factors is easy to invert.
Moreover, once the factors are known, the user may solve several linear systems involving A, with different right hand sides.
The determinant of A is equal to the product of the determinants of the factors, and hence is easily computed: the determinant of P is plus or minus 1, and that of L is 1, and that of U is simply the product of its diagonal elements.
The inverse matrix could be solved for, if necessary, by solving the N linear systems A * X(I) = E(I), where E(I) is the I-th Cartesian basis vector The vectors X(1), X(2), ..., X(N) then are the columns of the inverse of A.
As an example of the Gauss elimination of a matrix, suppose we start with with the the matrix:
A 1 2 3 4 5 6 7 8 0We can write an imperfect PLU factorization as:
P L U 1 0 0 1 0 0 1 2 3 0 1 0 0 1 0 4 5 6 0 0 1 0 0 1 7 8 0The factorization is imperfect because, although A = P*L*U, the matrix U is not upper triangular. We will now modify the matrix U, and update the factors P and L, so that it is always true that A=P*L*U, while the matrix U gradually is transformed into the correct upper triangular form.
Step 1.1: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 1 of the matrix U. The elementary permutation matrix P(1,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:
A = P * L * U = [ P * P^{-1}(1,3) ] * [ P(1,3) * L * P^{-1}(1,3) ] * [ P(1,3) * U ] = P' * L' * U'The resulting factorization is now:
P L U 0 0 1 1 0 0 7 8 0 0 1 0 0 1 0 4 5 6 1 0 0 0 0 1 1 2 3Step 1.2: Eliminate U_{2,1} by subtracting 4/7 of row 1 from row 2. To do this, we construct the elementary matrix L(1,2,4/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
A = P * L * U = P * [ L * L^{-1}(1,2,4/7) ] * [ L(1,2,4/7) * U ] = P * L' * U'The resulting factorization is now:
P L U 0 0 1 1 0 0 7 8 0 0 1 0 4/7 1 0 0 3/7 6 1 0 0 0 0 1 1 2 3Step 1.3: Eliminate U_{3,1} by subtracting 1/7 of row 1 from row 3. To do this, we construct the elementary matrix L(1,3,1/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
A = P * L * U = P * [ L * L^{-1}(1,3,1/7) ] * [ L(1,3,1/7) * U ] = P * L' * U'The resulting factorization is now:
P L U 0 0 1 1 0 0 7 8 0 0 1 0 4/7 1 0 0 3/7 6 1 0 0 1/7 0 1 0 6/7 3Step 2.2: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 2 of the U. The elementary permutation matrix P(2,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:
A = P * L * U = [ P * P^{-1}(2,3) ] * [ P(2,3) * L * P^{-1}(2,3) ] * [ P(2,3) * U ] = P' * L' * U'The resulting factorization is now:
P L U 0 1 0 1 0 0 7 8 0 0 0 1 1/7 1 0 0 6/7 3 1 0 0 4/7 0 1 0 3/7 6Step 2.3: Eliminate U_{3,2} by subtracting 1/2 of row 2 from row 3. To do this, we construct the elementary matrix L(2,3,1/2), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
A = P * L * U = P * [ L * L^{-1}(2,3,1/2) ] * [ L(2,3,1/2) * U ] = P * L' * U'The resulting factorization is now:
P L U 0 1 0 1 0 0 7 8 0 0 0 1 1/7 1 0 0 6/7 3 1 0 0 4/7 1/2 1 0 0 9/2The PLU factorization is now correct.
You should be able to see a formula for the final factors:
P = I * P^{-1}(1,3) * P^{-1}(2,3)and
L = P(2,3) * P(1,3) * I * P^{-1}(1,3) * L^{-1}(1,2,4/7) * L^{-1}(1,3,1/7) * P^{-1}(2,3) * L^{-1}(2,3,1/2)and
U = L(2,3,1/2) * P(2,3) * L(1,3,1/7) * L(1,2,4/7) * P(1,3) * AYou should see that the form of these matrices guarantees that A=P*L*U. The way we carried out the steps guarantees that P stays a permutation matrix, L stays a lower triangular matrix, and U becomes an upper triangular matrix.
Back to TABLE OF CONTENTS.
Gauss Jordan elimination is a method for solving a system of linear equations A * x = b for x, or for computing the inverse matrix of A.
Gauss elimination and Gauss Jordan elimination are very closely related. Gauss elimination reduces A to an upper triangular matrix, and saves the elimination factors in a lower triangular matrix. Gauss Jordan elimination proceeds relentlessly until A has been converted into the identity matrix.
Thus, unlike the Gauss elimination procedure, the Gauss Jordan elimination does not produce a factorization of the matrix, but only a solution to the linear system. This means that if a second linear system has to be solved, the matrix has to be set up and eliminated all over again.
The simplest way to describe Gauss Jordan is to note that to solve, say, the linear system A * x = b, the right hand side is appended as an extra column of the matrix. Then, on step I of the elimination, we choose a pivot row, move it to row I, divide it through by the pivot value, and then eliminate the matrix entries in column I from all other rows, rather than simply from rows I+1 through N. When the process is completed, the solution x has overwritten the right hand side b that was stored in column N+1.
Several right hand sides can be handled at once, by appending all of them to the coefficient matrix; the inverse can be computed by appending a copy of the identity matrix to the coefficient matrix before beginning elimination.
Gauss Jordan elimination is primarily used as a teaching tool, and for small linear systems. In practical computation, standard Gauss elimination is universally preferred.
Back to TABLE OF CONTENTS.
The Gauss Seidel iteration for linear equations is an iterative method for solving linear systems of equations A*x=b. It is similar to the the Jacobi algorithm and the successive overrelaxation method (SOR).
The Gauss Seidel iteration should only be used for matrices which are symmetric and positive definite, or for a matrix which is strictly diagonally dominant.
Each step of the Gauss Seidel iteration begins with an approximate answer x, and produces a new approximation y. Each component of y is computed in order, using the formula:
y(i) = [ b(i) - a(i,1)*y(1) - a(i,2)*y(2) ... - a(i,i-1)*y(i-1) - a(i,i+1)*x(i+1) ... - a(i,n)*x(n) ] / a(i,i)The calculation of each entry of y is dependent on the calculation of the entries of lower index. Thus, the value of y(1) is calculated first, and then y(2) is calculated based on the values of y(1) as well as x(3) through x(n), and so on.
The process is to be repeated until the residual error is small, or the change in the approximate solution is negligible.
The Gauss Seidel iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:
A = L + D + Uthen the method is equivalent to the iteration
( L + D ) * xnew = b - U * x.which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
- ( L + D )^{-1} * U,which in turn may best be understood by looking at the eigenvalues.
If the original coefficient matrix A is symmetric, then it may be preferred to use the symmetric Gauss Seidel iteration or SGS. In this case, the iteration consists of pairs of Gauss Seidel steps. The odd steps are the same as the usual iteration. But in the even steps, the variables are solved for in reverse order. Each pair of such steps is a single step of the SGS iteration, which has the property that its iteration matrix is similar to a symmetric matrix (though not necessarily symmetric itself). Among other things, this means that SGS can be used as a preconditioner for certain other problems.
Back to TABLE OF CONTENTS.
A general matrix is one which has no special matrix structure or matrix symmetry.
In such a case, there are no space advantages to be gained by using a special matrix storage format, and so the matrix entries are stored using the standard two dimensional array format provided by the programming language.
For general matrices, the only remaining issue concerns the problem that occurs when the matrix storage must be set aside before the size of the matrix is known. In FORTRAN, for example, it is common to specify a maximum matrix size of, say, 100 by 100. If the actual problem to be solved is of size 25 by 25, then it may be necessary to describe the data with both the matrix order of 25, and the leading dimension of the storage array, which is 100. In LINPACK and LAPACK, variables containing leading dimension information have names like LDA, LDB and so on.
LAPACK and LINPACK provide routines, with the prefix SGE, which apply to matrices in general storage.
Back to TABLE OF CONTENTS.
A generalized permutation matrix is a square matrix A with at most one nonzero entry in each row, and in each column.
A standard permutation matrix, of course, has exactly one nonzero entry in each row and column, and that entry has value 1.
An interesting fact: if A is a nonsingular nonnegative matrix, then the inverse of A is also nonnegative if and only if A is a generalized permutation matrix. In other words, it's very hard for both A and A^{-1} to be nonnegative, and essentially can only happen if A is diagonal.
So suppose A >= 0 but A is not a generalized permutation matrix. For an arbitrary vector b >= 0 , can we say that the solution x of A * x = b is nonnegative? No, because we know that the inverse of A is not nonnegative. Therefore A^{-1} contains at least one negative entry, say entry (i,j). Choose b = E(J), where E(J) is the J-th Cartesian basis vector. Then x = A^{-1} * b and it is easy to see that x(i) is negative.
Back to TABLE OF CONTENTS.
The method of Gershgorin disks provides an estimate of the size of the eigenvalues of a matrix. The accuracy of the estimate varies wildly, depending on the size of the elements of the matrix. It is most useful for matrices that are diagonally dominant or sparse.
Gershgorin's theorem states that the eigenvalues of any matrix A lie in the space covered by the disks D_{i}:
D_{i} = ( x: sqrt ( x - A(I,I) )^{2} <= R_{i} )where R_{i} is the sum of the absolute values of the off-diagonal elements of row I:
R_{i} = sum ( J =/= I ) | A_{i,j} |.
The theorem may also be applied using columns instead of rows.
Back to TABLE OF CONTENTS.
A Givens rotation is a linear transformation applied to two vectors, or two rows or columns of a matrix, which can be interpreted as a coordinate axis rotation. The intent of the rotation is to zero out an entry of the vector or matrix using an orthogonal transformation.
A Givens rotation is similar to the elementary row operation that adds a multiple of one row to another, but because a Givens rotation is an orthogonal similarity transformation, it offers greater stability and easy invertibility.
A Givens rotation matrix G has the form:
1 0 0 0 0 0 0 c 0 0 s 0 <-- row i 0 0 1 0 0 0 0 0 0 1 0 0 0 -s 0 0 c 0 <-- row j 0 0 0 0 0 1 ^ ^ col i col jwhere c = cosine(theta) and s = sin(theta) for some angle theta.
Premultiplying A by G has the following effect:
row i of A is replaced by c*row i + s*row j in G*A; row j of A is replaced by -s*row i + c*row j in G*A.while postmultiplication, A*G, would carry out a similar operation on the columns of A.
As an example, to zero out entry A_{i,j} of a matrix requires a Givens rotation with values of cosine and sine so that:
- s * A_{i,i} + c * A_{i,j} = 0.It's not actually necessary to compute the underlying rotation angle theta, since c and s can be computed directly:
s = A_{i,j} / sqrt ( A_{i,j}^{2} + A_{i,i}^{2} )
c = A_{i,i} / sqrt ( A_{i,j}^{2} + A_{i,i}^{2} )
For instance, to zero out the 3,1 entry of this matrix:
4 2 0 0 4 5 3 8 1the sine and cosine are 3/5, 4/5, yielding a Givens matrix G of:
0.8 0 0.6 0 1 0 -0.6 0 0.8and the product G * A:
5.0 6.4 6.0 0 4 5 0 5.2 8.0
It is possible to zero out entries of a matrix, one by one, using Givens rotations, similar to the way that Householder matrices are used, to reduce a matrix to a simpler form. The process can be used to zero out the entire lower triangle of a matrix, but further operations on the upper triangle would reintroduce zeroes in the lower triangle. Nonetheless, zeroing out the lower triangle means that Givens rotations can be used to produce the QR factorization of the matrix.
Back to TABLE OF CONTENTS.
Given a set of M vectors V_{i}, each of dimension N, the M by M Gram matrix is formed by all the pairwise dot products of the vectors:
G_{i,j} = V_{i} dot V_{j}
The Gram matrix is a positive semidefinite symmetric. If any vector is linearly dependent on the others, the matrix is singular. If the vectors are linearly independent, the Gram matrix has full rank M. If the vectors form a basis for the space, the Gram matrix has rank N.
The Gram matrix can be used to construct a projection matrix into the space spanned by the vectors.
Back to TABLE OF CONTENTS.
Gram Schmidt orthogonalization is a process which starts with a set of N vectors X(I), each with M components, and produces a set of N2 orthonormal vectors Y which span the linear space of the original vectors.
N2 is less than N if the vectors X(I) are linearly dependent, and equal to N if they are linearly independent. Thus, one use for the Gram Schmidt process is simply to determine if a set of vectors are linearly dependent; a second is to determine the dimension of the space spanned by a linearly dependent set.
The Gram Schmidt process may be defined iteratively:
for I = 1 to N N2 = I Y(I) = X(I) for J = 1 to I-1 C(J) = dot_product ( X(I), Y(J) ) Y(I) = Y(I) - C(J) * Y(J) end for Norm = sqrt ( dot_product ( Y(I), Y(I) ) ) if ( Norm = 0 ) exit Y(I) = Y(I) / Norm end for
Another way of looking at the process is to use the vectors to form the columns of a matrix A. The Gram Schmidt process can then be used to construct one version of the QR factorization of the matrix A:
A = Q * Rwhere Q is orthogonal and R is upper triangular.
Back to TABLE OF CONTENTS.
The Hadamard product of matrices A and B is a matrix C created by elementwise multiplication:
C_{i,j} = A_{i,j} * B_{i,j}
The Hadamard product is defined for any pair of rectangular matrices, as long as they have the same "shape", that is, the same number of rows, and the same number of columns.
Example of a Hadamard product:
1 2 3 * 7 8 9 = 7 16 27 4 5 6 10 11 12 40 55 72
Back to TABLE OF CONTENTS.
Hadamard's inequality provides an upper bound on the size of the determinant of a matrix. It is related to the fact that the determinant represents the volume of an N-dimensional parallelepiped.
Let ||C(I)|| designate the Euclidean norm of column I of the matrix A. Hadamard's inequality states that
det ( A ) <= ||C(1)|| * ||C(2)|| * ... * ||C(N)||,with equality holding only if one of the C(I)'s is zero, (yielding the minimum possible value of 0), or if all the C(I)'s are pairwise orthogonal vectors, (yielding the largest possible value).
The theorem may also be applied using rows instead of columns.
Back to TABLE OF CONTENTS.
A Hamiltonian matrix is a 2Nx2N matrix of the form
A G Q -Awhere the NxN matrices Q and G are symmetric.
Back to TABLE OF CONTENTS.
A Hankel matrix is a matrix which is constant along each of its anti-diagonals.
Here is an example of a square Hankel matrix:
7 6 5 4 6 5 4 3 5 4 3 2 4 3 2 1and a rectangular Hankel matrix:
7 6 5 4 3 2 6 5 4 3 2 1 5 4 3 2 1 0
Simple facts about a Hankel matrix A:
Compare the concepts of Toeplitz Matrix, an Anticirculant Matrix, and a Persymmetric Matrix.
Back to TABLE OF CONTENTS.
The Harwell Boeing Sparse Matrix Collection, or HBSMC, is a set of 43 data files describing a standard set of test sparse matrices for sparse matrix calculations.
The test set comprises linear systems, least squares problems, and eigenvalue calculations from a wide variety of disciplines. The set is offered as a standard benchmark for comparison of algorithms.
Here is an overview of the source and size of the various matrices:
Discipline Number Largest Largest number of order of nonzeroes matrices Counter examples, small matrices 3 11 76 Original Harwell test set 36 822 4841 Air traffic control 1 2873 15032 Astrophysics 2 765 24382 Chemical Engineering 16 2021 7353 Circuit simulation 1 991 6027 Demography 3 3140 543162 Economic modelling 11 2529 90158 Nuclear reactor core modelling 3 1374 8606 Optimal power flow problems 3 4929 47369 Stochastic modelling 7 1107 5664 Acoustic scattering 4 841 4089 Oil reservoir modelling 19 5005 20033 Stiff ODE problems 10 760 5976 George and Liu test problems 21 3466 13681 Model PDE problems 3 900 4322 Navier Stokes problems 7 3937 25407 Unassembled finite element matrices 10 5976 15680 Oceanography 4 1919 17159 Power network matrices 14 5300 13571 Everstine test set, ship structures 30 2680 23853 Structures, eigenproblems 22 15439 133840 Structures, linear equations 36 44609 1029655 Least squares problems 4 1850 10608
The SPARSKIT package includes utilities for conversion of matrices in the Harwell Boeing format into other formats, such as that used by ELLPACK and ITPACK.
Back to TABLE OF CONTENTS.
The Harwell Boeing Sparse Matrix Collection uses a matrix storage format which is a special kind of sparse matrix storage for most of the matrices in the collection.
The standard sparse matrix format is column oriented. That is, the matrix is represented by a sequence of columns. Each column is held as a sparse vector, represented by a list of row indices of the entries in an integer array and a list of the corresponding values in a separate real array. A single integer array and a single real array are used to store the row indices and the values, respectively, for all the columns.
Data for each column are stored in consecutive locations. The columns are stored in order, and there is no space between columns. A separate integer array holds the location of the first entry of each column, and the first free location. For symmetric and Hermitian matrices, only the entries of the lower triangle are stored, including the diagonal. For antisymmetric matrices, only the strict lower triangle is stored.
Here is a simple example of a 5 by 5 matrix:
1.0 -3.0 0.0 -1.0 0.0 0.0 0.0 -2.0 0.0 3.0 2.0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 -4.0 0.0 5.0 0.0 -5.0 0.0 6.0This matrix would be stored in the arrays
Subscripts: 1 2 3 4 5 6 7 8 9 10 11 COLPTR 1 4 6 8 10 12 ROWIND 1 3 5 1 4 2 5 1 4 2 5 VALUES 1.0 2.0 5.0 -3.0 4.0 -2.0 -5.0 -1.0 -4.0 3.0 6.0We can generate column 5, say, by observing that its first entry is in position COLPTR(5)=10 of arrays ROWIND and VALUES. This entry is in row ROWIND(10)=2 and has value VALUES(10)=3.0. Other entries in column 5 are found by scanning ROWIND and VALUES to position COLPTR(6)-1, that is, position 11. Thus, the only other entry in column 5 is in row ROWIND(11)=5 with value VALUES(11)=6.
Back to TABLE OF CONTENTS.
The HBSMC finite element storage format is a special matrix storage format for those matrices in the collection which derive from a finite element problem.
Matrices arising in finite element applications are usually assembled from numerous small elemental matrices. The collection includes a few sparse matrices in original unassembled form. The storage of the individual unassembled matrices is based on the general sparse format, which stores a matrix as a list of matrix columns. The elemental representation stores the matrix as a list of elemental matrices. Each elemental matrix is represented by a list of the row/column indices (variables) associated with the element and by a small dense matrix giving the numerical values by columns, or in the symmetric case, only the lower triangular part. The lists of indices are held contiguously, just as for the lists of row indices in the standard format. The dense matrices are held contiguously in a separate array, with each matrix held by columns. Although there is not a one to one correspondence between the arrays of integer and numerical values, the representation does not hold the pointers to the beginning of the real values for each element. These pointers can be created from the index start pointers (ELTPTR) after noting that an element with NU variables has NU*NU real values, or (NU*(NU+1))/2 in the symmetric case.
We illustrate the elemental storage scheme with a small, 5 by 5 example:
5.0 0.0 0.0 1.0 2.0 0.0 4.0 3.0 0.0 6.0 0.0 3.0 7.0 8.0 1.0 1.0 0.0 8.0 9.0 0.0 2.0 6.0 1.0 0.0 10.0generated from four elemental matrices:
1 4 1 5 2 3 5 3 4 1 (2.0 1.0) 1 (3.0 2.0) 2 (4.0 3.0 6.0) 3 (2.0 8.0) 4 (1.0 7.0) 5 (2.0 8.0) 3 (3.0 5.0 1.0) 4 (8.0 2.0) 5 (6.0 1.0 2.0)where the variable indices are indicated by the integers marking the rows and columns. This matrix would be stored in the ELTPTR (location of first entry), VARIND (variable indices) and VALUES (numerical values) arrays as follows:
Subscripts: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ELTPTR: 1 3 5 8 10 VARIND: 1 4 1 5 2 3 5 3 4 VALUES: 2. 1. 7. 3. 2. 8. 4. 3. 6. 5. 1. 2. 2. 8. 2.
Back to TABLE OF CONTENTS.
A nonsingular integer matrix is in Hermite Normal Form if it is lower triangular, all entries are non-negative, and each row has a unique maximum element which is located on the main diagonal.
(In some definitions, the matrix is required to be upper triangular instead.)
Any nonsingular integer matrix can be transformed to Hermite Normal Form using a series of unimodular column operations:
For example, given the matrix A:
5 2 1 -4 2 4 0 -3 6its Hermite normal form is:
1 0 0 4 6 0 6 15 30
Back to TABLE OF CONTENTS.
A Hermitian matrix A is a complex matrix that is equal to its complex conjugate transpose, symbolized by a "*" or "^{H}":
A = A^{H}or
A = A*
Here is a Hermitian matrix:
1 1+2i 3-4i 1-2i 4 6i 3+4i -6i 8
Simple facts about a Hermitian matrix A:
The corresponding concept for real matrices is symmetric.
Back to TABLE OF CONTENTS.
An upper Hessenberg matrix is a matrix which is entirely zero below the first subdiagonal.
An upper Hessenberg matrix is "almost" upper triangular. A lower Hessenberg matrix is, of course, entirely zero above the first superdiagonal. Upper Hessenberg matrices occur so often that they are frequently simply called "Hessenberg" matrices.
An example of an upper Hessenberg matrix is:
1 2 3 4 5 6 7 8 9 1 0 5 4 3 2 0 0 8 3 7 0 0 0 9 1
Eigenvalue programs typically transform a matrix into upper Hessenberg form, and then carry out the QR method on this matrix, which converges rapidly to a matrix which is diagonal except for 2 by 2 blocks corresponding to complex eigenvalues.
The reason for transforming a matrix into upper Hessenberg form is that the QR method is much less expensive if carried out on an upper Hessenberg matrix. It is actually cheaper to go to the additional trouble of transforming a matrix to upper Hessenberg form, and then carrying out the QR method on that matrix, rather than carrying out the QR method on the original matrix.
Back to TABLE OF CONTENTS.
The Hölder norm of a vector, sometimes called the p-norm, is a vector norm of the form ||x||_{p} = ( sum ( 1 <<= i <= n ) |x_{i}|^{p} )^{(1/p)} where p is any real value greater than or equal to 1.
Although the formula does not make sense, there is also a Hölder norm for p equal to infinity; it's simply the maximum of the absolute values of the entries of the vector.
The L1 norm, the L2 norm, and the L Infinity norm are exactly the Hölder norms with p equal to 1, 2 and infinity.
Back to TABLE OF CONTENTS.
A Householder matrix for a given vector v has the form:
H = I - 2 * v * v' / ( norm2 ( v ) )^{2}or, in the common case where the Euclidean norm of v is 1, we may write:
H = I - 2 * v * v'Note that the Householder matrix is a sort of "overenthusiastic" projection matrix; instead of removing the component in the direction of v, it reverses it.
For the simple case where w = (1/3, 2/3, 2/3), here is what H would look like:
(1 0 0) (1/9 2/9 2/9) ( 7/9 -4/9 -4/9) (0 1 0) - 2 * (2/9 4/9 4/9) = (-4/9 1/9 -8/9) (0 0 1) (2/9 4/9 4/9) (-4/9 -8/9 1/9)
A little "reflection" will convince you that the Householder matrix for any vector v will always be symmetric, just like this example.
The Householder matrix is also an orthogonal matrix:
H * H' = H * H = I.
Householder matrices can be used to compute the QR factorization of a matrix. A Householder matrix can be found which will "wipe out" all the subdiagonal entries of the first column of the original matrix. Another Householder matrix can be found which will "wipe out" all the subdiagonal entries of the second column, and so on. At the end of N-1 steps of this process, we have computed
H_{n-1} * ... * H_{2} * H_{1} * A = Rwhere R is upper triangular. But the product
H = H_{n-1} * ... * H_{2} * H_{1}is an orthogonal matrix. We can multiply both sides by the transpose of H, which is also its inverse, to get
A = H' * Ror, if we define Q to be H':
A = Q * R.
The Householder matrix is sometimes called an elementary reflector.
Back to TABLE OF CONTENTS.
An idempotent matrix A has the property that
A * A = AAn idempotent matrix is "not quite" a projection matrix.
Simple facts about an idempotent matrix A:
Back to TABLE OF CONTENTS.
The identity matrix, usually denoted I, is a square matrix with 1's on the main diagonal and 0's elsewhere. The identity matrix behaves like the number 1 in matrix multiplication. For any matrix A:
A * I = I * A = A.
Here is the 3 by 3 identity matrix:
1 0 0 0 1 0 0 0 1
Simple facts about the identity matrix I:
Back to TABLE OF CONTENTS.
When solving a linear system A * x = b, the matrix A is said to be ill conditioned if small errors or perturbations in the coefficient matrix A or right hand side b correspond to large errors or perturbations in the solution x.
A numerical scale for ill conditioning is provided by the condition number.
Linear systems which are extremely ill conditioned may be impossible to solve accurately. A standard example of an ill conditioned matrix is the Hilbert Matrix, with A_{i,j} = 1 / ( I + J ).
Back to TABLE OF CONTENTS.
The incomplete Cholesky factorization is an approximate Cholesky factorization, of a matrix A, comprising a lower triangular matrix L for which A is approximately equal to L*L'.
The incomplete Cholesky factorization is a special case of the Incomplete LU Factorization, modified for symmetric positive definite matrices. There are known cases where it will fail.
Typically, the matrix A is large and sparse, (as well as being symmetric and positive definite) and an iterative scheme is being used to solve the linear system A*x=b. The incomplete Cholesky factorization is intended as a preconditioner, which modifies the linear system, improving the convergence rate of the iterative scheme. In particular, the preconditioner matrix is M=L*L', and we are well advised to store L instead of M, since this gives us an easy way of solving linear systems associated with M.
The computation of the incomplete Cholesky factorization is similar to that of the usual Cholesky factor except for two points. First, we assume that no pivoting is required. Secondly, if the original matrix A has a zero entry, then we require that the corresponding entry of the L matrix be zero. This means that there is no fill in (and it is also why L*L' will not equal A). Because there is no fill in, the factor L can actually be stored in a data structure that is the same as that of A.
Because no pivoting is allowed, the algorithm may break down because of negative elements that may appear on the diagonal. See the Kershaw matrix for a simple example.
Back to TABLE OF CONTENTS.
The incomplete LU factorization is an approximate LU factorization of a matrix A, comprising a unit lower triangular matrix L and an upper triangular matrix U.
Typically, the matrix A is large and sparse, and an iterative scheme is being used to solve the linear system A*x=b. The incomplete LU factorization is intended as a preconditioner, which modifies the linear system, improving the convergence rate of the iterative scheme. In particular, the preconditioner matrix is M=L*U, and we are well advised to store L and U instead of M, since this gives us an easy way of solving linear systems associated with M.
The computation of the incomplete LU factorization is similar to that of the usual PLU factors except for two points. First, we assume that no pivoting is required, hence the P factor can be omitted. Secondly, if the original matrix A has a zero entry, then we require that the corresponding entry of the U matrix be zero. This means that there is no fill in (and it is also why L*U will not equal A). Because there is no fill in, the factors L and U can actually be stored in a data structure that is the same as that of A. The elements of L are stored in the positions devoted to lower triangular elements of A, while U goes in the diagonal and upper triangular locations.
If the matrix A is symmetric and positive definite, then the related Incomplete Cholesky Factorization may be tried.
Back to TABLE OF CONTENTS.
The oddly named inertia of a (square) matrix is the numbers of negative, zero, and positive eigenvalues.
Sylvester's Law of Inertia states that if A and B are congruent matrices, then we cannot guarantee that they have the same eigenvalues, but they do have the same inertias.
This theorem allows us to determine if a symmetric matrix is positive definite. Because the matrix is symmetric, we can compute an LDL factorization:
A = L * D * L'where L is unit lower triangular, and D is diagonal. This means that A is congruent to the diagonal matrix D. But the eigenvalues of D are easily determined, from which we can get the inertia of D. This is equal to the inertia of A; in particular, if D has only positive eigenvalues, then so does A, which is therefore positive definite.
Moreover, if such a factorization is cheap, as for a tridiagonal symmetric matrix, then we can search for eigenvalues by seeking diagonal shifts of the matrix that cause the number of negative eigenvalues to change by 1.
Back to TABLE OF CONTENTS.
An inner product is a scalar-valued function of two vectors x and y, denoted (x,y), with the properties that:
A vector inner product (x,y) can be used to define a corresponding vector norm ||x||:
|| x || = sqrt ( x, x ).If the inner product and norm are related in this way, then the Cauchy-Schwarz inequality relates them.
The inner product is sometimes referred to as the dot product (because it is often represented as x dot y), or as the scalar product (because its result is a scalar value).
Back to TABLE OF CONTENTS.
An integer matrix is simple a matrix whose entries are all integers.
Examples of integer matrices include
If A is an integer matrix, it is natural to ask whether this tells us something about quantities related to A.
It is no surprise that if A is an integer matrix, then so will be the trace and its transpose of A.
moreover, if A and B are integer matrices, then it is obvious that the product A*B will also be an integer matrix.
Since the determinant is a sum of products of entries of the matrix, if A is an integer matrix then det(A) must be an integer.
The adjoint matrix and the cofactor matrix, being formed from determinants of submatrices of A, will be integer matrices if A is.
Since the inverse matrix can be formed using Cramer's rule, which involves the ratio of determinants of a submatrix of A divided by the determinant of A, it is not necessarily the case that inverse(A) is an integer matrix, but it must be the case that det(A)*inverse(A) is an integer matrix.
For similar reasons, if A is an integer matrix, and b is an integer vector, then the solution x of the linear system A*x=b is not necessarily an integer vector, but det(A)*x will be an integer vector.
If A is triangular matrix with unit diagonal and is an integer matrix, then inverse(A) will have the same format and will be an integer matrix too. (Of course, this also follows from the fact that det(A) must be 1!)
Back to TABLE OF CONTENTS.
The inverse matrix of a square matrix A, if it exists, is a matrix denoted A^{-1} with the property that
A * A^{-1} = A^{-1} * A = I.
If the inverse matrix exists, it is unique, and A is said to be nonsingular or invertible. Otherwise, A is singular.
If the inverse of A exists, then the solution of
A * x = bcan be immediately written down:
x = A^{-1} * b.
However, it's not a good idea to solve a linear system in this way. The inverse is relatively expensive to compute, and subject to greater inaccuracies than other solution methods. This is not to say that the inverse isn't useful. Orthogonal and unitary transformations are so popular in numerical linear algebra because their inverses "come for free"; (and their inverses are very well conditioned).
Simple facts:
LAPACK and LINPACK include routines for explicitly computing the inverse of a given matrix.
Back to TABLE OF CONTENTS.
The inverse power method is a technique for solving the eigenvalue problem.
The inverse power method is related to the power method, but is more flexible. Through the use of shifts, it can be "aimed" to seek the eigenvalue that is closest to any particular target value. Moreover, once a rough estimate of an eigenvalue is made, the shift can be set to this value, which will increase the convergence rate of the algorithm.
The inverse power method only requires that the user be able to repeatedly solve linear systems of the form:
A * x = bor, when shifts are used,
( A - shift * I ) * x = b.Thus, for instance, if A is a band matrix or a sparse matrix, the user can employ a storage method and solution algorithm appropriate to the particular form of the problem.
The inverse power method gets its name from the fact that, when the shift is zero, it is equivalent to using the power method on the matrix A^{-1}.
The inverse power method begins by picking a starting guess for the eigenvector x, and the eigenvalue lambda. If the shift will not vary then a fixed value should be set now. Then repeat the following steps as often as necessary:
Back to TABLE OF CONTENTS.
An invertible matrix A is a (square) matrix for which there exists an inverse matrix B, called the inverse of A.
A square matrix A is invertible if and only if:
A common situation involves a "small" perturbation matrix added to the identity matrix. The matrix I+A is guaranteed to be invertible if it is true that ||A|| < 1 for some vector-bound matrix norm.
Back to TABLE OF CONTENTS.
An involutory matrix A has the property that
A * A = I.
Simple facts about an involutory matrix A:
Back to TABLE OF CONTENTS.
An irreducible matrix is a (square) matrix which is not reducible.
Definition One: A reducible matrix is one which can be rearranged into the following block form:
( P Q ) ( 0 R )where P and R are square sub-blocks, and "0" represents a (nonempty) rectangular block of zero entries. The rearrangement can be done by simultaneous row and column permutations. An irreducible matrix is one which is not reducible.
Definition Two: A matrix is irreducible if and only if, for any row index i and column index j, there is always a nonnegative integer p (which may be 0) and a sequence of integers k1, ..., kp so that the product
A_{i,k1} * A_{k1,k2} * ... * A_{kp,j}is nonzero.
Theorem: A nonnegative matrix A is irreducible if and only if, for any vector x>0 it is the case that A*x>0.
Definition Three: A matrix of order N is irreducible if, for any division of the integers between 1 and N into two disjoint sets K1 and K2, there is always a nonzero element A_{i,j} with I in K1 and J in K2.
The concept of an irreducible matrix is mainly of interest in the analysis of the convergence of certain iterative schemes for linear equations. One key idea is the following: if the matrix A is irreducible and diagonally dominant, then A is nonsingular.
If you only know that the matrix is diagonally dominant, then Gershgorin's theorem would still not rule out an eigenvalue of zero, and hence singularity. It's the irreducibility that guarantees no zero eigenvalue here. The "1, -2, 1" tridiagonal matrix is an example where this theorem applies.
Here's an example of an irreducible matrix with zero main diagonal:
0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0
If you're familiar with graph theory, then a matrix A is irreducible if and only if the digraph with corresponding adjacency matrix is strongly connected. (Consider node I to be connected to node J if A_{i,j} is nonzero. The digraph is strongly connected if you can get from any node to any other node following directed edges.)
A simple algorithm to determine if a matrix is irreducible can be guided by the idea of the strong connectivity of the corresponding adjacency matrix. First, we choose any node P, and ensure that it can be reached from at least one other node. (We're looking at the P column of the matrix.) If that is not the case, then the matrix is reducible, and we're done. Otherwise, we determine all the nodes we can reach from node P (that's the nonzero entries in row P). Now we take all those nodes, and see what new nodes we can reach. We repeat this operation until no new nodes are found. If any node wasn't reached, the graph is not strongly connected, and the matrix is reducible. Otherwise, the matrix is irreducible.
Back to TABLE OF CONTENTS.
A method is called iterative when it consists of a basic series of operations which are carried out over and over again, until the answer that is produced is no longer significantly changing, or some exceptional error occurs, or the limit on the number of steps is exceeded.
All eigenvalue problems are solved by iterative methods, except for the "toy" problems presented in textbooks. This is because the computation of eigenvalues is equivalent to finding the roots of a polynomial, and there is no explicit method for finding the roots of a general polynomial of degree 5 or higher.
The best known methods for the eigenvalue problem include:
Back to TABLE OF CONTENTS.
A method is called iterative when it consists of a basic series of operations which are carried out over and over again, until the answer that is produced is no longer significantly changing, or some exceptional error occurs, or the limit on the number of steps is exceeded.
A direct method is the "opposite" of an iterative method. A fixed number of operations are carried out once, at the end of which the solution is produced. Gauss elimination on a linear system is an example of such a direct method. Direct methods are the primary method for solving small or dense or nonsymmetric linear systems.
The most common reason for using an iterative method is that it can require far less storage than a direct method. An iterative method typically only sets aside storage for the original nonzero entries of the matrix; no fill in occurs. Standard direct methods must set aside an entry for every possible position in the matrix, though some reduction in this requirement is possible if the matrix is banded.
Secondly, each iteration typically takes much less time than a full direct solve; thus, it is possible, for some problems, that an iterative method will actually converge to an acceptable answer more quickly than a direct method.
An iterative method has numerous disadvantages. You will need a starting point, and if you pick a poor one that may slow down convergence. Your system matrix usually needs to satisfy extra conditions beyond merely being nonsingular. The rate of convergence may be extremely slow, although this can be helped by a suitable preconditioner. If you have a very large problem and are storing the matrix in a compact form, the programming and computational cost involved in storing and retrieving coefficient data can exceed that of the solution phase.
Iterative methods are generally only suitable for certain kinds of system matrices. The most common requirements are that the matrix be positive definite symmetric or strictly diagonally dominant.
Iterative methods for solving systems of linear equations include:
Back to TABLE OF CONTENTS.
Iterative refinement is an attempt to "improve" a computed solution x0 for the linear system of equations A*x=b.
The algorithm is occasionally effective when the coefficient matrix is ill conditioned, a problem that may become evident if the residual error is computed, and seen to be relatively large.
A single step of the algorithm involves computing the residual error:
r := b - A * x,solving the linear system:
A * dx = r,and adding the correction dx to the original solution x.
x := x + dx.
In order to achieve any improvement in accuracy, the residual error should be calculated in higher arithmetic precision. The rest of the calculation can be done in single precision, allowing the use of the already computed LU factorization of A, if available. However, to compute the residual error, the original matrix A is needed. It is preferable to figure out how to use the LU factors to compute the residual error, rather than keeping two copies of the matrix, one factored and one untouched.
If the residual error of the new solution is still too large, but somewhat better, then the procedure may be repeated as often as desired.
A subroutine SGEIM for carrying out iterative refinement using the LINPACK routines is described in the LINPACK User's Guide.
Back to TABLE OF CONTENTS.
The Jacobi algorithm for computing eigenvalues is an iterative method, which may be applied to a symmetric matrix A, to compute its eigenvalues and eigenvectors.
The method will produce an orthogonal matrix Q and an "approximately" diagonal matrix lambda such that
A = Q' * lambda * Q.
The steps of the iteration compute a sequence of orthogonal matrices Q1, Q2, ... which transform the original matrix A into matrices A1, A2, ... each of which has the same eigenvalues as A, but each of which is "more" diagonal than A was. Starting with A, we determine Q1, and produce A1, which is orthogonally similar to A:
A1 = Q1' * A * Q1.We then determine a matrix Q2 that produces A2:
A2 = Q2' * A1 * Q2
= ( Q1 * Q2 )' * A * ( Q1 * Q2 ).
The point of the algorithm is how we choose the matrices Q at each step. Q1 is chosen in such a way as to "annihilate" the (1,2) and (2,1) elements of A. That is, the (1,2) and (2,1) elements of A1 will be zero. Q2 will eliminate the (1,3) and (3,1) elements of A1. Thus A2 will have zeroes in the (1,3) and (3,1) positions. Unfortunately, the (1,2) and (2,1) positions of A2, which we just zeroed out on the previous step, will not remain zero, but will fill in again! However, the fill in values are generally smaller than the original values. As we can sweep through the entire matrix, we repeatedly annihilate the off-diagonal elements until they all have decreased below some tolerance.
It can be shown that the sum of the squares of the off diagonal elements always decreases to zero with the iteration. Thus, for some iteration step M, one can expect to have a matrix AM and a matrix
Q = Q1*Q2*...*QMso that
AM = Q' * A * Qwhere AM is "essentially" diagonal. At that point, we can rewrite this equation as
A * Q = lambda * Qwhere lambda is the diagonal entries of AM, and is the eigenvalues of A, and the transformation matrix Q is the matrix of eigenvectors of A. Thus, if the off diagonal elements disappear as promised, we have approximately solved our eigenvalue problem.
So which orthogonal matrix Q zeroes out a specific pair of entries? The formula is fairly simple. To annihilate the arbitrary entries A_{i,j} and A_{j,i}, the matrix Q is equal to the identity matrix, except for:
Q(I,I) = C Q(I,J) = S Q(J,I) = - S Q(J,J) = Cwhere C and S are the cosine and sine of some rotation angle THETA. Thus, each matrix Q is a Givens rotation matrix. We can compute C and S directly (skipping the computation of THETA) by the following formula:
U = ( A(J,J) - A(I,I) ) / ( 2 * A_{i,j} ) T = sign ( U ) / ( | U | + sqrt ( U^{2} + 1 ) ) C = 1 / sqrt ( T^{2} + 1 ) S = T / sqrt ( T^{2} + 1 )
The Jacobi method is simple and easy to program, but is usually slower than the QR method.
Back to TABLE OF CONTENTS.
The Jacobi algorithm for linear equations is an iterative method for solving linear systems of equations A * x = b.
The method is similar to the Gauss Seidel and Successive Overrelation Method (SOR), but has the distinction of being the only simple iterative method that is easy to program in parallel.
The Jacobi iteration is only appropriate for matrices which are strictly diagonally dominant or else symmetric and positive definite.
Each step of the Jacobi iteration begins with an approximate solution x. An improved approximate solution xnew is computed by solving N independent linear equations:
xnew(i) = [ b(i) - A(i,1) * x(1) - A(i,2) * x(2) ... - A(i,i-1) * x(i-1) - 0.0 * x(i) <-- note that we are skipping x(i) here! - A(i,i+1) * x(i+1) ... - A(i,n) * x(n) ] / A(i,i)
Once xnew is computed, the residual error is calculated:
r = A * xnew - band the solution increment:
dx = xnew - x.If the norms of these vectors are satisfactorily small, the iteration may be halted. Otherwise, xnew becomes the new starting guess x for the next step of the iteration.
The Jacobi iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:
A = L + D + Uthen the method is equivalent to the iteration
D * xnew = b - ( L + U ) * x.which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
- D^{-1} * ( L + U ),which in turn may best be understood by looking at the eigenvalues. Note that if A is symmetric, then so is the iteration matrix.
Back to TABLE OF CONTENTS.
The Jacobi preconditioner is a very simple preconditioning matrix for use with an iterative method for linear equations.
For a given system matrix A, the Jacobi preconditioner matrix M is the diagonal matrix whose entries are defined by
M = diag ( A )In effect, using the Jacobi preconditioner amounts to dividing each equation and right hand side entry by the corresponding diagonal entry of the original coefficient matrix.
Back to TABLE OF CONTENTS.
The Jordan canonical form of a matrix A is an upper bidiagonal matrix whose main diagonal contains the eigenvalues of A, and whose superdiagonal contains only zeroes or ones. The location of the ones in the superdiagonal is determined by the algebraic and geometric multiplicity of the eigenvalues.
Here is an example of a Jordan canonical form:
4 0 0 0 0 0 2 1 0 0 0 0 2 1 0 0 0 0 2 0 0 0 0 0 3
Every matrix is unitarily similar to its Jordan Canonical Form. That is, for any matrix A, there exists a unitary matrix U so that
A = U^{*} * J * Uwhere J has Jordan canonical form. This form can also be regarded as a matrix factorization.
If A is real, but has complex eigenvalues, the matrix J has complex entries.
The Jordan canonical form is of little interest to computational linear algebraists. Unless exact arithmetic is used, it is extremely sensitive to small errors, and the information it provides can be computed more reliably in other ways.
Back to TABLE OF CONTENTS.
The Kershaw matrix is a simple 4 by 4 positive definite symmetric matrix, which is an example for which the symmetric version of the incomplete Cholesky factorization gives a negative pivot, causing the algorithm to break down.
The Kershaw matrix:
3 -2 0 2 -2 3 -2 0 0 -2 3 -2 2 0 -2 3
Reference:
Back to TABLE OF CONTENTS.
Given a square matrix A and some initial vector b, the first k elements of the sequence of Krylov vectors are:
b, Ab, A^{2}b, A^{3}b, ... A^{k-1}b;the corresponding Krylov subspace is the subspace spanned by these vectors; the Krylov matrix is the matrix whose columns are formed by the Krylov vectors.
Back to TABLE OF CONTENTS.
An L matrix is matrix whose diagonal elements are strictly positive, and whose off-diagonal elements are nonpositive.
The identity matrix is a "trivial" example of an L matrix.
Back to TABLE OF CONTENTS.
The L1 matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L1 vector norm.
Thus, the formal definition of the norm is
||A|| = max ( || A*x || / ||x|| )where the vector norm used on the right hand side is the L1 vector norm, and the maximum is taken over all nonzero vectors x.
However, it is easy to show that the L1 matrix norm has a simpler formula: ||A|| = the maximum, over all matrix columns, of the sum of the absolute values of the entries in the column.
Back to TABLE OF CONTENTS.
The L1 vector norm is a vector norm defined as
||x|| = sum ( 1 <= I <= N ) |x(i)|.
The L1 vector norm is an example of a Hölder norm.
Back to TABLE OF CONTENTS.
The L2 matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L2 vector norm.
Thus, the formal definition of the norm is
||A|| = max ( || A*x || / ||x|| )where the vector norm used on the right hand side is the L2 vector norm, and the maximum is taken over all nonzero vectors x.
The L2 matrix norm has another formulation: ||A|| = the square root of the maximum absolute value of the eigenvalues of A' * A.
The computation of the L2 norm is expensive, and so it is often simpler to use the easily-computed Frobenius matrix norm, which is not vector-bound to the L2 vector norm, but is compatible with it.
Back to TABLE OF CONTENTS.
The L2 vector norm is a vector norm defined as
||x|| = sqrt ( sum ( 1 <= I <= N ) x(i)^{2} )
The L2 vector norm is also known as the Euclidean vector norm or the root-mean-square vector norm.
The L2 vector norm is an example of a Hölder norm.
Back to TABLE OF CONTENTS.
The L Infinity matrix norm is a matrix norm that is vector-bound to, and hence compatible with, the L Infinity vector norm.
Thus, the formal definition of the norm is
||A|| = max ( || A*x || / ||x|| )where the vector norm used on the right hand side is the L Infinity vector norm, and the maximum is taken over all nonzero vectors x.
However, it is easy to show that the L Infinity matrix norm has a simpler formula: ||A|| = the maximum, over all matrix rows, of the sum of the absolute values of the entries in the row.
Back to TABLE OF CONTENTS.
The L Infinity vector norm is a vector norm defined as
||x|| = max ( 1 <= I <= N ) |x(i)|.
The L Infinity vector norm is an example of a Hölder norm.
Back to TABLE OF CONTENTS.
The Lanczos iteration is an iterative method that seeks to approximate some of the eigenvalues of a very large real symmetric (or Hermitian) matrix.
The Lanczos iteration is a version of the Arnoldi iteration that takes advantage of the special properties associated with a real symmetric or Hermitian matrix.
Consider the unitary similarity transform that converts the m by m symmetric or Hermitian matrix A into tridiagonal form:
A = Q T Q*Let Q_{n} indicate the m by n matrix formed from the first n columns of Q; similarly, T_{n} is the n by n submatrix of T. Then we write
T_{n} = Q_{n}* A_{n} Q_{n}
As we pass from n to n+1, we simply add a new column q_{n+1} to Q_{n}, and to T_{n} we add a new diagonal value alpha_{n+1} and an offdiagonal value beta_{n}. Thus, the iteration allows us to gradually build up the matrices Q_{n} and T_{n}. For reasons not completely understood, even for n much smaller than m, the eigenvalues of T_{n} will tend to some of the eigenvalues of A, and typically the extreme ones.
The new vector q_{n+1} is a Krylov vector; actually, because of orthogonalization, it represents the new component introduced into the Krylov subspace by the product A*q_{n}.
Thus, a typical procedure is to compute T_{n} and apply a standard eigenvalue procedure for tridiagonal matrices to extract its eigenvalues.
The entries of the matrix T_{n} are denoted by
alpha_{1} | beta_{1} | ||
beta_{1} | alpha_{2} | beta_{2} | |
.... | .... | .... | .... |
beta_{n-2} | alpha_{n-1} | beta_{n-1} | |
beta_{n-1} | alpha_{n} |
An efficient form for the Lanczos iteration is:
beta_{0} = 0; q(0) = 0; Let b be an arbitrary initial vector; Set q(1) = b / ||b||; for n = 1, ... v = A * q(n) alpha(n) = q(n)' * v v = v - beta(n-1) * q(n-1) - alpha(n) * q(n) beta(n) = ||v|| q(n+1) = v / beta(n) end for
Back to TABLE OF CONTENTS.
LAPACK is a set of linear algebra routines, intended as the replacement for LINPACK and EISPACK. It is a project sponsored by the Argonne National Laboratories and the Numerical Algorithms Group (NAG). The LAPACK routines are intended to achieve optimal performance on various machines by calling the level 3 BLAS, which use block methods to achieve high performance.
Back to TABLE OF CONTENTS.
The LDL factorization of a symmetric matrix is a decomposition of the form:
A = L * D * L',involving a unit lower
The LDL factorization is a special case of the
If the matrix A is actually positive definite, then we can get the even stronger Cholesky factorization.
A set of M vectors, each of order N, is called linearly dependent if there is some linear combination of the vectors, with at least one nonzero coefficient, which equals the zero vector.
If the I-th vector is used as row I of an M by N matrix A, then this is equivalent to saying there is a nonzero vector C such that
A * C = 0If no such combination is possible, then the vectors are linearly independent.
Simple facts:
Back to TABLE OF CONTENTS.
A linear least squares problem seeks an "approximate solution" x to a linear system A * x = b, where x is to be chosen in such a way as to minimize the L2 or Euclidean norm of the residual error b - A * x.
A least squares approach is usually tried when the standard algorithms for solving the linear system, such as Gauss Elimination, are not suitable, perhaps because
In the most common case, there are more equations than unknowns, so that A is actually a matrix with rectangular order of M rows by N columns. Then the system may or may not be consistent, and A itself may perhaps not have the fullest possible rank.
Despite the fact that this problem can't be solved by the usual means, it is still the case that:
By choosing to use the L2 vector norm, we are using a least-squares approach. Two advantages are that in case 2, there will always exist a unique solution of minimal norm, and in case 3, there are good algorithms for finding the unique approximate solution which has a residual of minimum norm.
If the columns of the coefficient matrix are linearly independent, then a simple approach to solving the problem is to form and solve the normal equations, although this incurs a significant penalty in terms of the condition number.
Other, superior, methods of solving this problem are preferred, usually via the QR factorization or the pseudoinverse. Such methods can also handle the case where the matrix does not have maximal rank, or in which the number of columns is greater than the number of rows.
Back to TABLE OF CONTENTS.
A linear space is a collection X of "vectors", a scalar field F, (usually the real or complex field), and the operations of vector addition and scalar multiplication, with the properties that:
Examples of linear spaces include:
A linear space has a dimension. The dimension of a linear space can be thought of as the smallest number of vectors from which the space can be reconstructed, or the cardinality of the smallest set of vectors that spans the space, or the cardinality of any basis for the space.
Back to TABLE OF CONTENTS.
A Linear Transformation is, formally, an operator A applied to elements x of some linear space X, with the properties that
If X is a finite dimensional vector space with an orthonormal basis, then any element x can be represented by an N dimensional vector, and any linear transformation on X can be represented by a matrix. It is usually this matrix that people think of when they speak of a linear transformation.
Linear transformations are frequently used, for example, in computer graphics. It is interesting to note that for an arbitrary linear transformation matrix A, the Gram Schmidt factorization allows us to write A as the product of an orthogonal matrix Q and an upper triangular matrix R. If we "factor out" the diagonal entries of R, we then can view A as:
A = Q * D * Swhere
Back to TABLE OF CONTENTS.
LINPACK is a standard linear algebra package for factoring matrices, computing matrix determinants, condition numbers, and inverses, and for solving linear systems. Additional capabilities include least squares solutions, QR and singular value decomposition.
Many matrix storage modes are allowed, including dense, banded, symmetric, positive definite, symmetric banded, but LINPACK does not handle sparse matrices, nor does it employ iterative methods.
Here are the routines available for single precision computations. There is a related set available for complex matrices, with names which begin with C instead of S.
Back to TABLE OF CONTENTS.
A lonesum matrix is a (possibly rectangular) zero one matrix with the property that every entry can be determined using only the list of row and column sums.
An example of a square lonesum matrix is:
0 1 1 0 0 1 1 1 1because it is possible to write it, with its row and column sums, as:
0 1 1 | 2 0 0 1 | 1 1 1 1 | 3 ------+ 1 2 3and then to hide the entries:
* * * | 2 * * * | 1 * * * | 3 ------+ 1 2 3and to logically figure out what every entry must have been, using only the row and column sums and the fact that the entries are either 0 or 1.
Back to TABLE OF CONTENTS.
The LU factorization of a matrix is a decomposition of the form:
A = P * L * U,involving a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U.
P records the pivoting operations carried out in Gauss elimination, L the row multipliers used during elimination and U contains the pivot values and other information.
The factors are typically computed via Gauss elimination. Once the factors are computed, they may be used to
The LU factorization is generally computed only for a nonsingular square matrix, but the LU factorization exists and is useful even if the matrix is singular, or rectangular.
While the LU factorization is usually quite satisfactory, it is possible to prefer a factorization of the form
A = P * L * U * Q,by using Gaussian elimination with complete pivoting, in which case Q is a permutation matrix selecting the variable (or column) to be eliminated at each step. Another rival factorization is the QR factorization
A = Q *R,which is slightly more expensive, but has better stability and accuracy properties.
LAPACK and LINPACK include routines to compute the LU factorization of a matrix stored in a variety of formats. Note that the P, L and U factors themselves are scrambled and compressed in the data storage used by these routines, so that it's difficult to determine their actual values.
Back to TABLE OF CONTENTS.
An M matrix is a (real) (square) invertible matrix whose offdiagonal elements are nonpositive, and whose inverse is a nonnegative matrix.
An M matrix is also called a Minkowski matrix.
There are many definitions of an M matrix. Another one is that a matrix A is an M matrix if there exists a nonnegative matrix B, with a maximal eigenvalue r, such that
A = c * I - Bwhere c >= r. From this definition, it should be clear that an M matrix must have a nonnegative diagonal, and nonpositive offdiagonal.
Facts about an M matrix A:
A symmetric M matrix is called a Stieltjes matrix. A Stieltjes matrix is positive definite.
Back to TABLE OF CONTENTS.
A magic square can be regarded as a square matrix with an associated constant mu such that all row sums and column sums are equal to mu. In most cases the main and antidiagonal elements also sum to mu.
Most magic squares of order n are made from the consecutive integers from 1 to n^{2}. In this case, it is common that an entry k and its "complement" n+1-k are located symmetrically with respect to the center of the matrix. Such a magic square is called regular.
Example:
1 14 15 4 12 7 6 9 8 11 10 5 13 2 3 16
When a magic square is regarded as a matrix, it may exhibit some interesting properties. For instance, if we let J be the Exchange matrix, then
A + J * A * J = 2 * ( mu / n ) * Ewhere E is the matrix all of whose entries are 1.
Because the entries of A are positive, the Perron Frobenius theorem guarantees that there is a positive eigenvalue, equal to the spectral radius of the matrix, and which is a simple eigenvalue (having algebraic multiplicity of 1), and an associated eigenvector all of whose entries are positive.
In fact, the dominant simple eigenvalue is mu, and the associated positive eigenvector is e=(1,1,...,1)'.
Back to TABLE OF CONTENTS.
The matrix exponential of a square matrix A is a matrix B(A,t) = exp ( A * t ), which has properties similar to those of the exponential function of a scalar argument.
In particular:
If the matrix A can be is orthogonally diagonalizable, so that
A = Q * Lambda * Q',then, using the power series definition of B(A,t) results in:
B(A,t) = Q * sum ( 0 <= I < Infinity ) ( Lambda * t )^{I} / I! * Q';or
B(A,t) = Q * exp ( Lambda, t ) * Q',where exp ( Lambda, t ) is the diagonal matrix whose I-th diagonal entry is exp(lambda(i)), the exponential of the I-th eigenvalue of A.
In the general case where A cannot be orthogonally diagonalized, or where the eigenvalues and eigenvectors cannot be reliably computed, the computation of the matrix exponential is difficult.
Back to TABLE OF CONTENTS.
Matrix factorization is the process of rewriting a matrix A as a product of factors with certain special properties. Matrix factorization is a key technique in solving linear systems, determining eigenvalues, and many other tasks.
Useful matrix factorizations include:
Back to TABLE OF CONTENTS.
Matrix multiplication is the computation of the matrix-vector product A * x or the matrix-matrix product A * B, where A and B are (possibly rectangular) matrices, and x is a vector.
A matrix product is only meaningful when the factors are conformable. This is a condition on the dimensions of the factors. If we are computing A * x or A * B, then the column order of A must equal the order of x or the row order of B.
To multiply the L by M array A times the M by N array B, the product C will have L rows and N columns, and a typical entry C_{i,j} is:
C_{i,j} = sum ( K = 1 to M ) A_{i,k} * B_{k,j}.
Matrix multiplication is not commutative. A*B and B*A are not even of the same order unless both matrices are square. Even if they are square, the two products will general have different numerical values.
Matrix multiplication is associative: (A*B)*C=A*(B*C). While the result is not affected by the order of operations, the amount of work can change, if the matrices are of different orders. For products of a large number of matrices, techniques of dynamic programming are required to determine the most efficient order in which to carry out the operations.
As a computational task, matrix multiplication is relatively expensive. To multiply two matrices of order N takes roughly 2*N^{3} floating point operations, which is more expensive than Gauss elimination of a matrix of order N. On vector and parallel machines, it is important to write a multiplication algorithm carefully, to take advantage of the potential for speedup.
Back to TABLE OF CONTENTS.
A matrix norm is a scalar quantity, which may be thought of as a sort of "magnitude" of the matrix. The norm can be used to estimate the effect of multiplying the matrix times a vector, solving a linear system, or other matrix operations. The norm also is used in the analysis of error and convergence
A matrix norm ||*|| must satisfy the following properties:
Matrix norms are most often needed when dealing with combinations of matrices and vectors. In such a case, it is important that the matrix norm and vector norm that are being used are compatible.
Any given vector norm can be used to derive a corresponding matrix norm, guaranteed to be compatible. This matrix norm is known as the vector-bound matrix norm.
Only if the matrix norm and vector norm are compatible can we write a useful bound like:
||A*x|| <= ||A|| * ||x||
Matrix norms include:
The spectral radius rho(A) is not a matrix norm, although it is often thought of that way.
Back to TABLE OF CONTENTS.
The order of a square matrix is the number of rows and columns. Thus a matrix of order 5 is a square matrix with 5 rows and 5 columns.
The order of a rectangular matrix is described by giving both the number of rows and the number of columns. Thus a rectangular matrix might have order 5 by 4.
The order of a matrix refers to a property of the mathematical object, which does not depend on how the information is stored in a computer. The actual numbers representing the matrix may be stored in a rectangular two dimensional array, whose row and column lengths are equal to or greater than the mathematical orders, a rectangular array of lesser size (for band matrix storage, say), or even in a collection of several separate one dimensional arrays.
Back to TABLE OF CONTENTS.
Matrix properties are any features of a matrix which may be of use in choosing a storage scheme or algorithm, or analyzing convergence or error properties, but which are not immediately evident from the simple arrangement of zeroes in the matrix, or any symmetries among the nonzero elements.
Matrix properties include:
Back to TABLE OF CONTENTS.
The rank of a matrix is a measure of the linear independence of its rows and columns.
The row rank of a matrix of order M by N is the number of linearly independent rows in the matrix, while the column rank is the number of linearly independent columns. The row rank will be between 0 and M, the column rank between 0 and N. If the row rank is equal to M, the matrix is said to have maximal or full row rank; there are corresponding terms for column rank.
For a "square" matrix, of order N, the row and column ranks will be equal. A square matrix is nonsingular if and only if it has maximal row rank; in other words, if no row is a linear combination of the other rows, and similarly for columns. A square matrix with full rank has an inverse, a nonzero determinant, and Gauss elimination with pivoting can be used to solve linear systems involving the matrix.
Every singular matrix is "almost" nonsingular. That is, using any matrix norm you like, and no matter how small you specify the (positive) tolerance epsilon, there is a nonsingular matrix closer than epsilon to the singular matrix. Another way to look at this is to realize that it is always possible, by making tiny changes to the entries of a singular matrix, to turn it into a nonsingular matrix. (Consider the zero matrix; add epsilons to the diagonal entries, and it's nonsingular.) Thus, given the roundoff implicit in computation, the determination of matrix rank is not a reliable process.
If the rank of a matrix is actually desired, a reasonable method is to compute the QR factorization. Very small diagonal terms in the R factor may indicate linear dependence of the corresponding columns of the matrix. The singular value decomposition will also give this sort of information.
Back to TABLE OF CONTENTS.
A matrix splitting is a decomposition of the system matrix:
A = M - Nin order to analyze the behavior of an iterative method for solving the linear system A*x = b. The M matrix is the multiplier of the next iterate, and the N matrix is the multiplier of the current iterate.
For instance, consider the Jacobi iteration. If we decompose the matrix into its strictly lower triangular, diagonal, and strictly upper triangular parts, we can write:
A = L + D + UThe Jacobi iteration can be written as:
D * xnew = - ( L + U ) * x - bHence, in the language of matrix splittings,
M = D.
N = ( L + U )
The matrix splitting gives us a convenient way of expressing the iteration matrix, which is the multiplier of the current iterate in the (explicit) formula for the next one. In terms of the matrix splitting, this iteration matrix always has the form M^{-1} * N. For instance, for the Jacobi iteration, the explicit formula for xnew is:
xnew = - D^{-1} * ( L + U ) * x - D^{-1} * band so the iteration matrix is D^{-1} * ( L + U ).
Back to TABLE OF CONTENTS.
Matrix storage formats are the schemes for storing the values of a mathematical matrix into memory locations in a computer, or, in some cases, for writing this information to an external file.
A special storage format may be chosen because of the overall matrix structure of zero entries, or because of the matrix symmetry involving nonzero entries.
Common matrix storage formats include:
In addition, many linear programming problems are stored in files that are formatted according to the MPS Storage Format.
Back to TABLE OF CONTENTS.
Matrix structure is the classification of matrices according to the pattern of its zero entries.
If a matrix has a known structure, it may be possible to use a specialized storage scheme that takes less space, or an algorithm that can execute more quickly.
A matrix about which nothing is known, or which exhibits no special pattern, may be called full or dense or general.
Matrix structure patterns that can occur include:
Back to TABLE OF CONTENTS.
Matrix symmetry classifies certain common patterns that may relate the nonzero values of the matrix.
This classification occurs after the study of the basic matrix structure induced by the pattern of zero elements.
As with matrix structure, any matrix symmetries that are present may influence the choice of storage used for the matrix data, and the algorithms suitable to be applied to the matrix.
Matrix symmetries include:
Back to TABLE OF CONTENTS.
The minimal polynomial of a matrix A is the monic polynomial P(X) of least degree with the property that P(A)=0.
The Cayley-Hamilton theorem asserts that every matrix satisfies its own characteristic equation. In other words, the polynomial
P ( lambda ) = det ( A - lambda * I ),which is zero when lambda is equal to any of the numbers which are eigenvalues of the matrix A, is equal to the zero MATRIX when lambda is replaced in the explicit formula for P(lambda) by A.
Thus, every matrix A of order N is guaranteed to be the root of a polynomial P(X) of degree N. Therefore the minimal polynomial of A is either the characteristic equation, or else some polynomial Q(X) of degree less than N. As a simple example, the characteristic polynomial of the identity matrix I is X^{N}-1, but the minimal polynomial is X-1.
A matrix is diagonalizable if and only if its minimal polynomial consists of distinct linear factors.
The minimal polynomial of a matrix is used to define a derogatory matrix, which is important when considering the eigenvalues and eigenvectors of a matrix.
Back to TABLE OF CONTENTS.
A minor matrix is derived from a matrix A by removing some rows and columns. Usually both A and the minor matrix are square, and usually only one row and one column are removed from A at a time.
For instance, if A is
1 2 3 4 5 6 7 8 9then the minor matrix derived by deleting row 2 and column 1 is
2 3 8 9
The principal minor matrices or principal minors of a square matrix of order N are a set of N matrices of orders 1, 2, ..., N; the M-th matrix has upper left entry A_{1,1} and lower right entry A_{M,M}.
Simple facts involving minor matrices:
Back to TABLE OF CONTENTS.
A monic polynomial is a polynomial whose leading coefficient is 1.
The leading coefficient, of course, is the coefficient of the highest power of X, (or whatever the independent variable happens to be). Thus, the polynomial X^{2}+3*X+17 is monic, but X^{2}+3*X^{3} is not.
The main reason for the idea of a monic polynomial is to be able to specify a unique polynomial with a given property. Thus, for instance, there are many polynomials of degree 2 which are zero at 1, 2, and 3, but they are all multiples of each other. In order to make a specific choice, we may specify that we mean the monic polynomial with these properties.
Back to TABLE OF CONTENTS.
The MPS data format is an industry standard for the description of a variety of linear programming problems.
Reference:
An MPS file is organized into named sections, which appear in the following order:
An MPS file may also have comment lines. Such lines may occur anywhere in the file. They begin with an asterisk '*' in column 1. They may contain information intended to explain the problem, or other facts of interest to a human, and of no interest to a program parsing the file.
The data lines of an MPS file are divided into fields of a specific width:
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | |
---|---|---|---|---|---|---|
Columns | 2-3 | 5-12 | 15-22 | 25-36 | 40-47 | 50-61 |
Contents | Indicator | Name 1 | Name 2 | Value 1 | Name 3 | Value 2 |
We now discuss in detail the format of the data lines that occur in each data section.
The code for specifying row type is as follows:
Type | Symbol | Meaning |
---|---|---|
= | E | Equal to. |
< | L | Less than or equal to. |
> | G | Greater than or equal to. |
Objective | N | Objective function. |
Free | N | No restriction imposed. |
This section of data begins with a card with ROWS in columns 1-4, followed by a data card for each row. The first N-type row encountered is regarded as the objective function.
Linear combinations of rows may also be specified. In this case, the above row types are denoted respectively by the codes DE, DL, DG, and DN, in columns 2-3. Field 2 contains the linear combination rowname. Fields 3-6 contain the rowname(s) in fields 3 and 5, and their multiplier(s) in fields 4 and 6. A linear combination of three or more rows requires additional cards, following the first card contiguously. In the additional cards, field 1 is empty. The right-hand sides of a linear combination row must be specified in the RHS section, described below.
The data card has the column label in field 2 (columns 5-12), the row label in field 3 (columns 15-22), and the value of the coefficient A_{i,j} or C_{j} in field 4 (columns 25-36), including a decimal point). If more than one nonzero row entry for the same column is to be made on the card, then field 5 (columns 40-47) has the next row label and field 6 (columns 50-61) has its corresponding coefficient value. The use of fields 5 and 6 is optional.
There is no need to specify columns for slack variables; this is taken care of automatically, based on the definitions of the row types.
h_{i} <= A_{i,1}x_{1} + A_{i,2}x_{2} + ... + A_{i,n}x_{n} <= u_{i}that is, both an upper and lower bound are specified for the row. The range of the constraint is
r_{i} = u_{i} - h_{i}The value of u_{i} or h_{i} is specified in the RHS section data, and the value of r_{i} is specified in the RANGES section data. This information, plus the row type specified in the ROWS section, defines the bounds u_{i} and h_{i}.
If b_{i} is the number entered in the RHS section and r_{i} is the number specified in the RANGES section, the u_{i} and h_{i} are defined as follows:
Row Symbol | Meaning | Sign of r_{i} | Lower limit h_{i} | Upper limit u_{i} |
---|---|---|---|---|
G | >= | + or - | b_{i} | b_{i}+|r_{i}| |
L | <= | + or - | b_{i}-|r_{i}| | b_{i} |
E | = | + | b_{i} | b_{i}+|r_{i}| |
E | = | - | b_{i}-|r_{i}| | b_{i} |
The section begins with a card with RANGES in columns 1-6. The data cards specifying the values of r_{i} are in exactly the same format as the COLUMNS data cards, except that field 2 (columns 5-12) has a label for the column of ranges (which can also be regarded as another column of the matrix). More than one column of ranges may be specified, but all the data cards for each column must be grouped contiguously.
0 <= x_{j} < infinity
More than one bound for a particular variable may be entered, that is, both a lower and an upper bound; when only one bound is specified the other is assumed to be one of the default values of 0 or infinity as shown in parentheses below.
Symbol | Meaning | Form of the bound |
---|---|---|
LO | lower bound | b_{j} <= x_{j} ( < infinity ) |
UP | upper bound | ( 0 <= ) x_{j} <= b_{j} |
FX | fixed variable | x_{j} = b_{j} |
FR | free variable | the value of x_{j} is not restricted. |
MI | lower bound is negative infinity | -infinity <= x_{j} ( <= 0 ) |
PL | upper bound is +infinity (default) | ( 0 <= ) x_{j} < infinity |
Example: the file afiro.mps.
NAME AFIRO ROWS E R09 E R10 L X05 L X21 E R12 E R13 L X17 L X18 L X19 L X20 E R19 E R20 L X27 L X44 E R22 E R23 L X40 L X41 L X42 L X43 L X45 L X46 L X47 L X48 L X49 L X50 L X51 N COST COLUMNS X01 X48 .301 R09 -1. X01 R10 -1.06 X05 1. X02 X21 -1. R09 1. X02 COST -.4 X03 X46 -1. R09 1. X04 X50 1. R10 1. X06 X49 .301 R12 -1. X06 R13 -1.06 X17 1. X07 X49 .313 R12 -1. X07 R13 -1.06 X18 1. X08 X49 .313 R12 -1. X08 R13 -.96 X19 1. X09 X49 .326 R12 -1. X09 R13 -.86 X20 1. X10 X45 2.364 X17 -1. X11 X45 2.386 X18 -1. X12 X45 2.408 X19 -1. X13 X45 2.429 X20 -1. X14 X21 1.4 R12 1. X14 COST -.32 X15 X47 -1. R12 1. X16 X51 1. R13 1. X22 X46 .109 R19 -1. X22 R20 -.43 X27 1. X23 X44 -1. R19 1. X23 COST -.6 X24 X48 -1. R19 1. X25 X45 -1. R19 1. X26 X50 1. R20 1. X28 X47 .109 R22 -.43 X28 R23 1. X40 1. X29 X47 .108 R22 -.43 X29 R23 1. X41 1. X30 X47 .108 R22 -.39 X30 R23 1. X42 1. X31 X47 .107 R22 -.37 X31 R23 1. X43 1. X32 X45 2.191 X40 -1. X33 X45 2.219 X41 -1. X34 X45 2.249 X42 -1. X35 X45 2.279 X43 -1. X36 X44 1.4 R23 -1. X36 COST -.48 X37 X49 -1. R23 1. X38 X51 1. R22 1. X39 R23 1. COST 10. RHS B X50 310. X51 300. B X05 80. X17 80. B X27 500. R23 44. B X40 500. ENDATA
Back to TABLE OF CONTENTS.
The algebraic multiplicity of a root lambda of a polynomial equation p(x)=0 is the "number of times lambda is a root".
More precisely, the algebraic multiplicity is the exponent ma of the factor (x-lambda) in the factorization of p(x). A root is known as a simple root if it has algebraic multiplicity 1; otherwise it is a repeated root.
Eigenvalues are defined as the roots of the characteristic equation of a matrix, and so an eigenvalue has an algebraic multiplicity. The behavior of the eigenvalue problem depends in part on the multiplicity of the eigenvalues. In particular:
For example, for the matrix
7 1 0 77 is an eigenvalue of algebraic muliplicity 2, but there is only a single eigenvector x = ( 0, 1 ), so 7 has a geometric multiplicity of 1.
Back to TABLE OF CONTENTS.
The Neumann series for a square matrix A is the infinite sum
I + A + A^{2} + A^{3} + A^{4} + ...
If the spectral radius of the matrix A satisfies rho(A) < 1, then I-A is nonsingular, and the Neumann series converges to (I-A)^{-1}.
Conversely, if the Neumann series converges, then the spectral radius of A must be less than 1.
Back to TABLE OF CONTENTS.
A nilpotent matrix is one for which the square, cube, or some finite power equals zero. For instance, any strictly lower triangular matrix is nilpotent.
Consider the following matrix A and its powers:
0 0 0 0 0 0 0 0 0 A = 2 0 0 A**2 = 0 0 0 A**3 = 0 0 0 3 4 0 8 0 0 0 0 0
Simple facts about a nilpotent matrix A:
Back to TABLE OF CONTENTS.
A nonnegative matrix A has only nonnegative entries; that is, for all indices I and J,
A_{i,j} >= 0.
Similar terms, with obvious definitions, include matrices that are positive, negative and nonpositive. It's easy to check if a matrix is nonnegative; this is much simpler than checking whether a matrix is positive definite. The expression A >= 0 is sometimes used to express the notion that the matrix A is nonnegative.
Facts about a nonnegative matrix A:
Back to TABLE OF CONTENTS.
The normal equations for an M by N rectangular linear system
A * x = bare computed by multiplying both sides by the transpose matrix:
A' * A * x = A' * b.
In the case where N < M, and the N columns of A are linearly independent, the matrix
B = A' * Awill be an invertible symmetric N by N matrix, and the normal equations can be solved by Gauss elimination. The matrix B is sometimes called the Gram matrix.
However, this method of producing an approximate solution of an overdetermined (and usually inconsistent) linear system is usually not recommended. The matrix B has a condition number that is the square of the condition number of A. Hence we may expect our solution to have a considerable error component. We can do better. There are more accurate methods which employ the QR factorization or the singular value decomposition.
Back to TABLE OF CONTENTS.
A normal matrix A is a real matrix that commutes with its transpose:
A * A' = A' * A;or a complex matrix that commutes with its transpose conjugate:
A * A^{H} = A^{H} * A.
Why define such an odd property? Because it's easy to check, and it turns out that a matrix is normal if and only if it is unitarily similar to a diagonal matrix. That's the same as saying the matrix is not only diagonalizable (similar to a diagonal matrix), but has a complete set of orthonormal eigenvectors.
A matrix is automatically normal if it is:
Note that an upper or lower triangular matrix is not normal, (unless it is actually diagonal!); such a matrix may have a complete set of eigenvectors (for example, if the eigenvalues are distinct), but the eigenvectors cannot be orthonormal.
Back to TABLE OF CONTENTS.
The null space of a matrix A is the set of all null vectors x such that
A * x = 0.
Simple facts:
Back to TABLE OF CONTENTS.
A null vector of a matrix A is a non-zero vector x with the property that A * x = 0.
Facts about a null vector:
Back to TABLE OF CONTENTS.
The ones matrix is a matrix all of whose entries are one.
A square ones matrix of order 2 or greater is singular.
A ones matrix, which can be easily defined in Matlab by
A = ones ( m, n )is often a convenient tool for various algorithms or matrix definitions.
Back to TABLE OF CONTENTS.
Back to TABLE OF CONTENTS.
When we speak of the order of a matrix, it is usually understood that the matrix is square, with N rows and N columns, and the order is simply the number N.
If the matrix is rectangular, having M rows and N columns, it is less common to speak of its order, but in this case, the order would be the pair of values, that is, (M,N).
An orthogonal matrix A is a square, invertible matrix for which it is true that:
A' = A^{-1}which implies:
A' * A = A * A' = I.
Facts about an orthogonal matrix A:
The QR factorization of a rectangular M by N matrix A, with M>N has two forms:
In complex arithmetic, the corresponding concept is a unitary matrix.
Back to TABLE OF CONTENTS.
An orthogonal similarity transformation is a similarity relationship between two real matrices A and B, carried out by an orthogonal matrix U, of the form:
A = U^{-1} * B * U = U' * B * U.
A and B are said to be orthogonally similar.
Orthogonal transformations are very common, particularly in eigenvalue computations. A general matrix A may be reduced to upper Hessenberg form by such transformations, so that Q*A=B. The nice thing about this is that if we later wish to reverse this transformation, the inverse of Q, because it is just the transpose of Q! This means that orthogonal transformations are very easy to apply and invert.
Another nice feature of orthogonal transformations is that they may be built up gradually as the product of a series of Householder matrices or Givens rotation matrices.
Back to TABLE OF CONTENTS.
Two vectors u and v are orthogonal if their dot product is 0:
u' * v = sum ( 1 <= i <= m ) u_{i} * v_{i} = 0,Of course, we are assuming here that both vectors have the same spatial dimension.
A set of n vectors V is pairwise orthogonal if it is true that every pair of vectors in the set is orthogonal. If an m by n matrix A is formed using pairwise orthogonal vectors for its columns, then the product matrix A'*A is diagonal, with the diagonal entries being the squares of the L2 vector norms of each vector.
Note that, from the definition, the zero vector is orthogonal to every vector, even to itself! Hence, in some cases, it is necessary to phrase a statement as "Let u and v be two nonzero orthogonal vectors..."
Back to TABLE OF CONTENTS.
A set of orthonormal vectors is a collection of M vectors X(I), each of order N, each of length 1 in the Euclidean norm:
X(I)' * X(I) = ( X(I), X(I) ) = 1,and pairwise orthogonal, so that if I and J are distinct:
X(I)' * X(J) = ( X(I), X(J) ) = 0.
Given any set of vectors, Gram Schmidt othogonalization or the QR factorization can produce an orthonormal set of vectors, possibly fewer in number, that form a basis for the linear space that is spanned by the original set.
Back to TABLE OF CONTENTS.
An outer product of two vectors x of dimension M and y of dimension N is a matrix A of order M by N whose entries are defined by:
A = x * y',or, entrywise:
A_{i,j} = x_{i} * y_{j}
A matrix defined as the outer product of two vectors will usually have rank equal to 1. The only other possibility is that the matrix might actually have rank 0. If an outer product of two vectors is added to a matrix, this operation is known as a rank one update.
An overdetermined linear system A * x = b is, loosely speaking, a set of M linear equations in N variables, with M > N.
Actually, it is possible for many of the equations to be redundant, so that what looks formally like an overdetermined system could "actually" be determined or underdetermined, if we'd just throw away the redundant equations. However, just recognizing and removing the redundancy is a difficult task, except for small problems and exact arithmetic. Thus, we will simply use the term "overdetermined" for any situation in which there are more equations than variables.
Except in special artificial cases, an overdetermined linear system has no solution. In such a case, we might be interested in an approximate solution which satisfies as many equations exactly as possible, which we might find simply by using pivoting to choose the equations to be satisfied, or a solution x which minimizes the residual norm A*x-b, which we might find using the QR factorization.
Back to TABLE OF CONTENTS.
A P matrix is a square matrix with the property that every principal minor matrix has a (strictly) positive determinant.
A P0 matrix is a square matrix with the property that every principal minor matrix has a determinant that is positive or zero.
A nonsingular M matrix is guaranteed to be a P matrix.
A matrix which is both a P matrix and a Z matrix is guaranteed to be a nonsingular M matrix.
Back to TABLE OF CONTENTS.
The permanent of a square matrix is a scalar value defined in a way similar to the determinant.
An explicit formula for the permanent of a matrix A is:
permanent ( A ) = sum [ over all P ] A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).where the sum ranges over all possible permutations P of the numbers 1 through N. This differs from the definition of the determinant only in that the sign of the permutation is not taken into account.
For example, suppose A is
1 2 3 4 5 6 7 8 9
Then the permanent of A is
permanent(A) = 1 * 5 * 9 + 2 * 6 * 7 + 3 * 4 * 8 (diagonals) + 1 * 6 * 8 + 2 * 4 * 9 + 3 * 5 * 7 (antidiagonals)
The trick of doing all diagonals and antidiagonals works for the 3 by 3 case, but no other. When going to the 4 by 4 case, for instance, you would do best by expanding along a row, and computing the permanents of the 3 by 3 minors, similar to the way that determinants of 4 by 4 or 5 by 5 matrices can be computed by (patient) hand.
Determinants are preserved by elementary row or column operations, but permanents are not. Thus, to compute a determinant efficiently, it is actually cheaper to carry out Gauss elimination to the point where we have triangularized the matrix. Then the product of the diagonal elements, times the sign of the permutation, gives us the determinant. But for permanents, this option is not available!
Back to TABLE OF CONTENTS.
A permutation matrix is a square matrix for which all the entries are 0 or 1, with the value 1 occuring exactly once in each row and column. Such a matrix, when premultiplying or postmultiplying another matrix, will simply permute the rows or columns of that matrix.
For example, the following is a permutation matrix:
0 1 0 0 0 1 1 0 0If A is the matrix
11 12 13 21 22 23 31 32 33then P*A permutes the rows of A:
21 22 23 32 32 33 11 12 13while A*P permutes the columns:
13 11 12 23 21 22 33 31 32
Simple facts about a permutation matrix P:
The use of pivoting during Gauss elimination means that along with the LU factors of A there is also a permutation matrix factor P:
P * L * U = A.
Similarly, column pivoting may be used during the QR factorization of a matrix, and in that case, the actual factorization is not A=Q*R, but A=Q*R*P, for some permutation matrix P.
Back to TABLE OF CONTENTS.
The Perron-Frobenius Theorem tells us a great deal about the largest eigenvalue of a positive or nonnegative matrix.
The Perron-Frobenius Theorem for a positive matrix: If the square matrix A is positive, then
The Perron-Frobenius Theorem for a nonnegative matrix: If the nontrivial matrix A is nonnegative, then
The Perron-Frobenius Theorem for a nonnegative irreducible matrix: If the nontrivial matrix A is nonnegative and irreducible, then in addition:
The Perron-Frobenius Theorem for a nonnegative irreducible primitive matrix: If the nontrivial matrix A is nonnegative and irreducible and primitive, then in addition:
Back to TABLE OF CONTENTS.
A persymmetric matrix A is a square matrix whose values are "reflected" across its main anti-diagonal, that is, for all I and J:
A_{i,j} = A(N+1-J,N+1-I).
Here is an example of a persymmetric matrix:
4 3 2 1 7 6 5 2 9 8 6 3 10 9 7 4
Back to TABLE OF CONTENTS.
Pivoting is the attempt to improve the accuracy of a linear algebra algorithm by choosing the most suitable column or row for use during a single step.
The most common instance of pivoting occurs in Gauss elimination. During the first step, we wish to eliminate all the entries in column one except for one entry. We will do this by adding carefully chosen multiples of row one to the other rows. For example, if the first two rows were
2 3 5 8 6 8 9 2we could add -3 times row 1 to row 2, and thus zero out the "6" in the first column of row 2. This scheme would not work, however, if the first entry in the first row were zero, because no multiple of zero can be anything but zero.
Thus, we allow ourselves to interchange the first row with some other row which has a nonzero entry in the first column. But since we're going to interchange rows anyway, it turns out that best accuracy occurs if we choose to bring in the entry with the largest absolute value. Such a scheme, which considers each column in order, and uses the largest entry in the row as the pivot, is called partial pivoting. This form of pivoting is used in most linear algebra software.
It can be shown that Gauss elimination can be carried out without pivoting if each of the principal minors of the matrix is nonsingular. If the matrix is positive definite, then this condition is guaranteed, and it is common to factor the matrix without pivoting.
A complete pivoting or full pivoting scheme would search for the entry of largest magnitude anywhere in the uneliminated portion of the matrix, and use that entry as the next pivot. Such a scheme requires a lot more searching, and auxiliary storage of the row and column numbers of the pivots. It is little used, since it does not seem to bring a great improvement in accuracy.
QR factorization can also use pivoting. At step I, columns I through N are examined. The column with the greatest norm is interchanged with column I, and then the QR operations for that step are carried out. In this case as well, pivoting is done to try to ensure stability.
Back to TABLE OF CONTENTS.
The polar decomposition of any matrix A has the form
A = P * Qwhere P is positive semidefinite and has the same rank as A, and Q is unitary.
The polar decomposition of a matrix can be determined from its singular value decomposition:
A = U * D * V'where we have
= ( U * D * U' ) * ( U * V' )
= P * Q
P = U * D * U'
Q = U * V'
Back to TABLE OF CONTENTS.
A (complex) matrix A is positive definite if it is true that for every nonzero (complex) vector x, the product:
x^{*} * A * x > 0,where, in particular, we are requiring that this product be a real number.
A real matrix A is restricted positive definite if it is true that for every nonzero real vector x:
x' * A * x > 0.If a real matrix is positive definite, then it is restricted positive definite.
Every complex positive definite matrix is guaranteed to be Hermitian. Thus, the phrase positive definite Hermitian is, strictly speaking, redundant. However, a restricted positive definite matrix is not guaranteed to be symmetric! This can happen because we only use real vectors in the test product.
Here is a simple example of a restricted positive definite matrix which is not symmetric:
1 1 -1 1because x' * A * x = x_{1}^{2}+x_{2}^{2}
Whenever real positive definite matrices occur in practical applications, however, they are assumed or required to be symmetric. If necessary, one can decompose a positive definite matrix in the restricted sense into its symmetric and antisymmetric parts. The symmetric part will actually be (fully) positive definite.
Simple facts about a positive definite matrix A:
A matrix about which no such information is known is called indefinite. A matrix for which the product is always nonnegative is sometimes called positive indefinite or positive semidefinite. There are similar definitions of negative definite and negative indefinite.
Back to TABLE OF CONTENTS.
A positive matrix A has only positive entries; that is, for all indices I and J,
A_{i,j} > 0.
Similar terms, with obvious definitions, include matrices that are nonnegative, negative and nonpositive. It's easy to check if a matrix is positive; this is much simpler than checking whether a matrix is positive definite.
Simple facts about a positive matrix A:
Back to TABLE OF CONTENTS.
The power method is a simple iterative algorithm for computing the eigenvalue of largest magnitude in a matrix, and its corresponding eigenvector.
The algorithm for an arbitrary matrix A works as follows:
The speed of convergence of the algorithm depends largely on the ratio between the magnitude of the largest and the second largest eigenvalues. If the second largest eigenvalue is close in magnitude to the largest, convergence will be slow. If A has several distinct eigenvalues of the same magnitude, say -2 and 2, or 5, 3+4i and 3-4i, then the algorithm will fail.
While this algorithm is very easy to program, it is difficult to adapt it to the case where other eigenvalues are desired. The use of deflation is a possibility, but still forces the user to compute the eigenvalues one at a time, and in order of size, making it rather difficult to find, say, just the smallest eigenvalue.
The inverse power method, a generalization of the power method, has the advantage of being able to find the other eigenvalues of the matrix.
Back to TABLE OF CONTENTS.
A preconditioner is a matrix M most commonly used to improve the performance of an iterative method for linear equations.
The simplest preconditioning is done using a left preconditioner. The original linear system is left-multiplied by the inverse of the preconditioning matrix, resulting in the system
M^{-1}*A*x=M^{-1}*bFor more complicated preconditioning, both left and right preconditioners, M_{1} and M_{2} may be used, so that the system is transformed to
M_{1}^{-1}*A*M_{2}^{-1}*M_{2}*x=M^{-1}*bA left and right preconditioning like this can be used if the original matrix is symmetric, and it is desired to preserve this symmetry in the transformed system.
The convergence of the iterative scheme depends on properties of the system matrix. A suitable choice of a preconditioner can mean that the transformed problem converges much more rapidly than the original one.
Although the definition of the preconditioner matrix suggests that we have to compute its inverse, it is usually the case that we have a factorization of the preconditioner which allows us to solve linear systems. Any computations that formally involve the inverse can therefore be replaced by linear system solves.
Desirable properties of a preconditioner matrix include:
Examples of preconditioners include
Back to TABLE OF CONTENTS.
A primitive matrix is a (square) matrix which is irreducible and which has only one eigenvalue of maximum modulus.
Note that we are not requiring that the eigenvalue be simple (that is, of multiplicity 1). Rather, we are requiring that there not be two distinct eigenvalues of the same modulus.
The Perron Frobenius Theorem applies to positive matrices. It is possible to extend most of the Perron-Frobenius theorem to a nonnegative matrix, if it is assumed to be a primitive matrix.
Back to TABLE OF CONTENTS.
A matrix A is a projection matrix if it is idempotent and hermitian (or symmetric in the real case).
A projection matrix is sometimes referred to as an orthogonal projection matrix, but in this case, the matrix is not generally what is termed an orthogonal matrix; rather, what is being referred to is the fact that A may be used to determine the projection of a vector x onto the orthogonal complement of a given subspace. Thus, the vectors A*x and (I-A)*x comprise a decomposition of x into two orthogonal components, one lying in the space spanned by the columns of A, and one in the space orthogonal to that one.
Simple facts about a projection matrix A:
Given a set of k linearly independent vectors v_{i}, with F=[v_{1}, v_{2},...,v_{k}] then the projection onto the space spanned by these vectors is
A = F * inverse ( F^{H} * F ) * F^{H}
The previous statement be rephrased as follows: Given an m by n rectangular matrix F, with m>=n, with maximal column rank (that is, the columns are linearly independent), then the matrix
A = F * inverse ( F^{H} * F ) * F^{H}can be used to determine the projection of any vector onto the column space of F.
If, in fact, the columns of F happen to be of unit L2 norm and pairwise orthogonal, (that is, the columns are an orthonormal set of vectors), then F^{H} * F = I, so the formula for the projector simplifies to:
A = F * F^{H}This particular formulation often occurs when a QR factorization has been used to extract an orthonormal basis for the column space.
Note, in particular, that it must be the case that A * F = F. When solving the overdetermined linear system,
F * x = bit is natural to consider the related projected linear system
A * F * y = F * y = A * bwhich can be solved exactly, because A*b lies in the column space of F. Moreover, this solution y is the best solution to the original problem F * x = b, in the least squares sense, that is, it achieves the residual of minimum L2 norm. The matrix A is sometimes called the hat matrix for matrix F. It is often used in statistics; in some cases, the diagonal entries of A are of considerable interest.
In the simple case where we are given one (nonzero) vector v, then F=v, and the projection onto the space spanned by v is
A = v * inverse ( v' * v ) * v' = v * v' / ||v||^{2}using the l2 norm; hence the matrix that projects a vector into the orthogonal complement of v is is
I - A = I - v * v' / ||v||^{2},or, if v is already of unit l2 norm, simply I-v*v'.
Back to TABLE OF CONTENTS.
A matrix A has property A if the indices 1 to N can be divided into two sets S and T so that, for any A_{i,j} which is not zero, it must be the case that:
This is equivalent to saying that, by listing the S rows and columns first, the matrix can be rewritten in the block form:
| D1 F | | G D2 |where D1 and D2 are diagonal.
Back to TABLE OF CONTENTS.
The pseudoinverse is a generalization of the idea of the inverse matrix, for cases where the standard inverse matrix cannot be applied. Such cases include matrices A which are singular, or rectangular.
The pseudoinverse is sometimes called the Moore Penrose inverse or the generalized inverse.
The pseudoinverse of an M by N rectangular matrix A is defined as the unique matrix N by M matrix A^{+} which satisfies the four conditions:
A * A^{+} * A = ANote that if A is a square invertible matrix, then the pseudo inverse is actually the inverse.
A^{+} * A * A^{+} = A^{+}
(A * A^{+})' = A * A^{+}
(A^{+} * A)' = A^{+} * A
The pseudoinverse can be used in a way similar to the way an inverse is used. For instance, given the rectangular set of linear equations
A * x = ya "solution" can be computed as:
x = A^{+} * y.If the equations form a consistent system, then x will actually satisfy the equations. Otherwise, x will be a "best possible" solution, in the sense that it minimizes the Euclidean norm of the residual error.
The pseudoinverse can be computed from the information contained in the singular value decomposition, which has the form:
A = U * S * V'where
A^{+} = V * S^ * U'where S^ is an M by N diagonal matrix, whose diagonal entries are the inverse of the diagonal entries of S, except that where S has zero entries, so does S^.
Back to TABLE OF CONTENTS.
The QR factorization factors a matrix A into an orthogonal matrix Q and an upper triangular matrix R, so that
A = Q * R.The factorization can also be applied to rectangular matrices, in which case one of the factors is no longer square.
The QR factorization can be useful for solving the full variety of linear systems, whether nonsingular, under-determined, over-determined or ill conditioned. It can be used to carry out the Gram Schmidt orthogonalization of a set of vectors constituting the columns of A. The QR factorization is also used repeatedly in an iterative solution of eigenvalue problems.
The QR factorization can be produced incrementally, by a series of transformations involving Householder matrices or Givens rotation matrices
As an example of QR factorization, the matrix A:
1 1 0 1 0 1 0 1 1can be factored into the orthogonal matrix Q:
SQRT(1/2) SQRT(1/6) -SQRT(1/3) SQRT(1/2) -SQRT(1/6) SQRT(1/3) 0 SQRT(2/3) SQRT(1/3)and the upper triangular matrix R:
SQRT(2) SQRT(1/2) SQRT(1/2) 0 SQRT(3/2) SQRT(1/6) 0 0 SQRT(4/3)
LAPACK and LINPACK include routines for computing the QR factorization.
Back to TABLE OF CONTENTS.
The QR method is used to find the eigenvalues of a square matrix.
For efficiency, the method begins by transforming the matrix A using Householder matrices until we have determined an upper Hessenberg matrix A1 which is orthogonally similar to A.
The QR factorization of A1 is then computed, A1 = Q1*R1. Then the factors Q1 and R1 are reversed, to compute a new matrix A2 = R1*Q1. A2 itself can be QR factored into A2 = Q2*R2, and then reversal of factors results in a matrix A3, and so on.
Each matrix A1, A2, and so on is orthogonally similar to A, and so shares the same eigenvalues. The sequence of matrices A, A1, A2, A3, ... will generally converge to a matrix B of a very simple type: B will consist entirely of diagonal entries and two by two blocks. The diagonal entries are the real eigenvalues, and the eigenvalues of the 2 by 2 blocks are the complex eigenvalues.
In the particular case where the eigenvalues are all real, (and hence B is diagonal), and if the orthogonal transformations have been accumulated along the way, the result can be written:
A = Q * B * Q'which is an orthogonal similarity transformation of A into a diagonal matrix, or
A * Q = Q * Bwhich means that the columns of Q are the eigenvectors.
If the matrix B is not diagonal, then the pair of columns corresponding to any 2 by 2 block can be used to construct the pair of corresponding complex eigenvectors. Alternatively, once the eigenvalues have been determined, the inverse power method can be used to compute any particular eigenvector.
Back to TABLE OF CONTENTS.
Quaternions, discovered by William Hamilton, have the form a+bi+cj+dk, where i, j and k are "special" quantities.
The properties of "1" and the 3 special quantities are best displayed in a multiplication table:
1 | i | j | k | |
---|---|---|---|---|
1 | 1 | i | j | k |
i | i | -1 | k | -j |
j | j | -k | -1 | i |
k | k | j | -i | -1 |
It is possible to devise matrices that behave like quaternions. Let the value "1" be represented by the identity matrix of order 4, and the value "i" be represented by
0 1 0 0 -1 0 0 0 0 0 0 -1 0 0 1 0the value "j" by
0 0 1 0 0 0 0 1 -1 0 0 0 0 -1 0 0the value "k" by
0 0 0 1 0 0 -1 0 0 1 0 0 -1 0 0 0Then it is easy to show that these matrices, and linear combinations of them, obey the rules of quaternions.
Back to TABLE OF CONTENTS.
A rank one matrix is simply a matrix with rank one. The simplest example would be any matrix with a single nonzero entry. Another simple example is any matrix whose columns (or rows) are all equal (and not entirely zero). Finally, any matrix whose colums (or rows) are all a multiple of the first column (or row) is a rank one matrix.
These matrices are not simply a peculiarity. They can be represented in an interesting way. An M by N rank one matrix A must have the representation
A = u * v'where u is an M by 1 vector and v is an N by 1 vector. This linear algebraic representation means that rank one matrices can be used to analyze a number of interesting structures, including the singular value decomposition.
A rank one update to a matrix is a simple but useful perturbation whose effects can be calculated.
If we suppose that we have an m by n matrix A, then a rank one update is accomplished by adding the outer product of an m vector u and an n vector v:
B = A + u v'
If the matrix A is actually square, then the rank of A only changes by +1, -1 or 0 when a rank one update is applied. If A is actually invertible, then the Sherman Morrison Formula shows how to cheaply compute the inverse of the updated matrix.
If the matrix A is the identity matrix, then we have
det ( I + u v' ) = 1 + v'u.
Back to TABLE OF CONTENTS.
The Rayleigh quotient of a matrix A and vector x is
R(A,x) = (x'*A*x) / (x'*x).
If the matrix A is symmetric, then the Rayleigh quotient is an excellent estimate of the "nearest" eigenvalue; in particular,
Lambda_{min} <= R(A,x) <= Lambda_{max}
On the other hand, if the matrix is not symmetric, then we cannot guarantee that the eigenvalues will be real; moreover, we cannot guarantee that the eigenvectors will be orthogonal. Thus for a general matrix, the information obtained from the Rayleigh quotient must be interpreted more carefully.
A frequent use of the Rayleigh quotient is in the inverse power method. Starting with approximations for the eigenvalue and eigenvector, one step of the inverse power method gives an improved estimate of the eigenvector. Then the Rayleigh quotient improves the eigenvalue estimate, and will be used to shift the next step of the inverse power method. This pair of steps can be rapidly convergent.
If the matrix A is actually positive definite symmetric, then the quantity (y'*A*x) has all the properties of an inner product. In that case, the Rayleigh quotient may be generalized to involve any pair of vectors:
R2(y,A,x) = ( y'*A * x ) / ( y'*x ).
If the matrix is not symmetric, but we still want a sensible way to estimate the eigenvalues, then the "Unsymmetric Rayleigh Quotient" could be defined as:
URQ(A,x) = sqrt ( ( ( A * x)'*(A * x) ) / ( x'*x ) ).This amounts to the square root of the Rayleigh quotient for the symmetric matrix A'*A, but can also be regarded as the ratio of the L2 norms of A*x and x.
Back to TABLE OF CONTENTS.
A rectangular matrix is a matrix which is not "square", that is, a matrix whose row order and column order are different.
While many operations and algorithms of linear algebra only apply to a square matrix, a rectangular matrix does have an LU factorization, and a singular value decomposition.
A rectangular matrix can occur when solving an under-determined or over-determined linear system.
Back to TABLE OF CONTENTS.
A reflection matrix A has the property it carries out the reflection (or negation) of the portion of every vector that is perpendicular to some hyperplane, while leaving the parallel portion of the vector unchanged.
A reflection matrix A is involutory, that is,
A * A = I,which strongly restricts the eigenvalues.
Examples of reflection matrices include the identity matrix, a diagonal matrix whose diagonal entries are +1 or -1, any matrix which rotates two coordinate axes by 180 degrees, and the Householder matrices.
Simple facts about a reflection matrix A:
Back to TABLE OF CONTENTS.
The residual error is a measure of the error that occurs when a given approximate solution vector x is substituted into the equation of the problem being solved.
For a system of linear equations, the residual error is the vector r defined as
r = b - A * x
For the eigenvalue problem, the residual error is a vector which is a function of the approximate eigenvector x and the approximate eigenvalue lambda:
r = lambda * x - A * x.
When carrying out an iterative solution process, it is common to compute the residual error for each new approximate solution, and to terminate the iteration successfully if the vector norm of the residual error decreases below some tolerance.
An important fact to realize is that the residual error is not, by itself, a reliable estimate of the error in the solution of a linear system. If the residual error has small norm, we can really only hope that the solution error (between our computed x and the true solution x*) is small. We need to know the norm of the inverse of the matrix, in which case the following restriction holds:
||x - x*|| <= ||A^{-1}|| * || A ( x - x* ) ||so that when the norm of the residual error, ||r|| is small, we have a precise upper bound on the error in the solution. Since such matrix norms are generally not known or computable, what we really have is a promise of continuity in the errors: as we drive the residual down, we are forcing down the upper limit on the approximation error.
= ||A^{-1}|| * || A * x - b + b - A * x* ||
= ||A^{-1}|| * ||r||,
Back to TABLE OF CONTENTS.
An N-th root of unity is any complex number W such that W^{N} = 1.
For a given N, there are N such roots, which can be summarized as:
THETA = 2 * pi / N
W_{J} = cos ( ( J - 1 ) * THETA ) + i * sin ( ( J - 1 ) * THETA )
Roots of unity are especially useful in discussing Fourier matrices and Circulant matrices.
Back to TABLE OF CONTENTS.
A rotation is a linear transformation which preserves (Euclidean) distances.
Because a rotation R is a linear transformation, the value of R * 0 must be 0; in other words, the origin does not move. Because a rotation preserves distances, it must be the case that, in the Euclidean vector norm,
|| R * x || = || x ||for every vector x in the space. From this fact, we can conclude that:
Common examples of matrices which embody rotations include:
Back to TABLE OF CONTENTS.
Row echelon form is a special matrix structure which is usually arrived at by a form of Gauss elimination.
Any matrix, including singular and rectangular matrices, can be transformed into this form, using a series of elementary row operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.
Moreover, the process can be considered a matrix factorization, of the form
A = B * Ewhere B is nonsingular, and E is in row echelon form.
A matrix (whether square or rectangular) is in row echelon form if:
A matrix is in row reduced echelon form if it is in row echelon form, and it is also true that:
Row echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.
Back to TABLE OF CONTENTS.
The row rank of a matrix is the number of linearly independent rows it contains.
The matrix
1 2 3 4 5 6 7 8 9 10 11 12has row rank 2, because row 3 is equal to twice row 2 minus row 1.
For any matrix, the row rank is the same as the column rank. This common value is also equal to the rank of the matrix, which is defined for both square and rectangular matrices.
For a square matrix of order n, the rank is a number between 0 and n. A square matrix whose rank is n is said to be nonsingular. For a rectangular matrix of order m by n, the rank must be a number between 0 and the minimum of m and n. Rectangular matrices which attain this maximal value are not called "nonsingular". Instead, they are said to have full row rank, full column rank, or maximal rank.
Back to TABLE OF CONTENTS.
The row space of an M by N matrix A is the set of all possible linear combinations of rows of A.
If the N vector v is a linear combination of rows of A, then there is a M vector c of coefficients with the property that
v = c * AIn other words, the row space is the set of all possible results of premultiplying A by an arbitrary vector.
Back to TABLE OF CONTENTS.
The row sum vector of an M by N matrix A is the vector R of length M whose I-th entry is the sum of the elements of row I of the matrix.
A matrix is said to have constant row sum if all the elements of R are equal.
The operation of computing the row sum vector R is equivalent to multiplying the matrix A times a vector X whose entries are all 1:
R = A * X
Therefore, if a square matrix A has a constant row sum r, then A has the eigenvalue r with eigenvector X:
A * X = R = r * X
Matrices with constant row sum include
Back to TABLE OF CONTENTS.
Suppose we are working with a block matrix Aof the form
A_{11} A_{12} A_{21} A_{22}where A_{11} is square and nonsingular. The Schur complement of A_{11} in A is the matrix
S = A_{22} - A_{21} A^{-1}_{11} A_{12}
The Schur complement can be thought of as the data that would replace A_{22} after Gauss elimination (without pivoting) had been carried out on the matrix A, up to the point where A_{11} had been triangularized.
Back to TABLE OF CONTENTS.
The Schur decomposition is a decomposition of a square matrix. For a complex matrix, the decomposition involves a unitary factor, while for a real marix, there is an orthogonal factor.
The eigenvalue / eigenvector system for a matrix, when it exists, results in a decomposition that includes a diagonal factor (the eigenvalues); the corresponding factor in the Schur decomposition is a tridiagonal factor. However, in many cases the eigen decomposition does not exist for a given matrix, while the Schur decomposition can always be formed, as long as the matrix is square. For a more general and more useful decomposition, see the singular value decomposition.
The Schur decomposition of a complex square matrix A is a factorization of the form
A = U * T * U^{*}where T is an upper triangular matrix, and U is unitary. The decomposition is also known as the Schur normal form. The form of the decomposition shows that the matrix T is unitarily similar to A.
Some facts about the Schur decomposition U * T * U^{*} of a complex square matrix A:
If A is a real matrix, then it has the unitary Schur decomposition described above, but it also has an orthogonal decomposition that is "almost" upper triangular:
A = Q * T_{2} * Q'where T_{2} is a block upper triangular matrix, and Q is an orthogonal matrix.
Some facts about the Schur orthogonal decomposition Q * T_{2} * Q' of a real square matrix A:
If A is a real square matrix, and has only real eigenvalues, then the matrix T_{2} in the orthogonal Schur decomposition must actually be a diagonal matrix, and so the decomposition has the form:
A = Q * LAMBDA * Q'
Some facts about the Schur decomposition Q * LAMBDA * Q' of a real square matrix A with only real eigenvalues:
Back to TABLE OF CONTENTS.
The Sherman Morrison formula applies to a matrix which has been perturbed in a simple way, and produces the inverse, or the solution of a linear system for the perturbed system, based on information already computed for the original system.
The perturbation of the coefficient matrix A is assume to be of the form:
C = A + u * v',where the perturbation is an outer product of the vectors u and v. Modifying a matrix in this way is sometimes called a rank one update. Changing a single entry A_{i,j} of a matrix can be viewed in this way, for instance.
The Sherman Morrison formula for the inverse of the perturbed matrix is:
C^{-1} = A^{-1} - z * w' / ( 1 + alpha ),where:
z = A^{-1} * u;
w = (A^{-1})' * v;
alpha = v' * z.
In the common case where the solution of C * x = b is desired, and the LU factorization of A has been computed, so that linear systems involving the original matrix A can be easily solved, the procedure is:
solve A * z = u for z;
solve A' * w = v for w;
set alpha = v' * z;
set beta = w' * b;
solve A * x = b for x;
adjust x := x - beta * z / ( 1 + alpha ).
The method will fail if 1 + alpha is 0, which can indicate singularity of the matrix C or a more technical problem.
Back to TABLE OF CONTENTS.
A sign symmetric tridiagonal matrix A is one for which the signs of every pair of corresponding off-diagonal entries are equal:
sign ( A(I,I+1) ) = sign ( A(I+1,I) )for I = 1 to N-1. (A value of zero matches any sign of the other entry.)
A symmetric tridiagonal matrix is always sign symmetric. Some EISPACK routines handle the case of a sign symmetric tridiagonal matrix directly. A matrix is strictly sign symmetric if it is sign symmetric and zero entries only occur in pairs.
The following tridiagonal matrix is not symmetric, is sign symmetric, and not strictly sign symmetric:
1 2 0 0 0 3 4 -7 0 0 0 -3 4 2 0 0 0 5 3 0 0 0 0 1 9
Back to TABLE OF CONTENTS.
Two matrices A and B are similar if B is related to A by a matrix P in the following way:
B = P^{-1} * A * P.In this case, P is said to be the similarity transformation matrix.
Matrices which are similar have the same eigenvalues. Special cases include the similarity matrix P being an elementary transformation matrix, or an orthogonal matrix or a unitary matrix.
Many algorithms try to improve speed or efficiency by using similarity transforms on an input matrix A, so as to find a simpler matrix B for which the problem can be more easily or more quickly solved. It may then be necessary to take the answer for the problem about B and "backtransform" it to an answer for the problem about A. For example, if we get the eigenvalues and eigenvectors of B, A will have the same eigenvalues, but will have different eigenvectors, related by the similarity transform.
Every symmetric matrix A is orthogonally similar to a diagonal matrix. Since the inverse of an orthogonal matrix is its transpose, this relationship may be written
B = Q^{-1} * A * Q = Q' * A * Q.
Back to TABLE OF CONTENTS.
A simple eigenvalue is an eigenvalue, or root of the characteristic equation, which has the property that it is not a repeated root. Another way of stating this is that a simple eigenvalue has an algebraic multiplicity of 1.
Since every eigenvalue has at least one eigenvector, and the geometric multiplicity is never greater than the algebraic multiplicity, we have that a simple eigenvector is associated with a 1-dimensional eigenspace.
A semisimple eigenvalue is an eigenvalue for which the algebraic and geometric multiplicities are equal. A semisimple eigenvector is associated with a K-dimensional eigenspace, and there are K linearly independent eigenvectors. Of course, any simple eigenvalue is also a semisimple eigenvalue.
A matrix is diagonalizable if and only if every eigenvalue is semisimple.
The remaining possibility, called a nonsimple eigenvalue, is one for which the algebraic multiplicity is greater than one, and the geometric multiplicity is smaller than the algebraic multiplicity. In this case, the eigenvalue is associated with a K-dimensional eigenspace, but there are less than K linearly independent eigenvectors.
Back to TABLE OF CONTENTS.
A singular matrix is a square matrix that does not have an inverse.
Facts about a singular matrix A:
Back to TABLE OF CONTENTS.
The singular value decomposition of a real rectangular M by N matrix A is a factorization of the form:
A = U * S * V'where, in the "reduced SVD" or EISPACK version:
The EISPACK version of the factorization is sometimes called "the economy size" since it omits information that actually doesn't have an effect on the result.
The LINPACK/LAPACK form has the advantage that U and V are both orthogonal, and S retains the "shape" of A. Moreover, this format allows us to consider U and V to be composed of left and right singular vectors, in analogy to the eigenvalue factorization (when it exists) of square matrix. Also, the extra information in the full SVD gives a basis for the null space of A.
One use for the singular value decomposition is that it allows us to "solve" any linear system A*x=B, whether singular or nonsingular, overdetermined or undetermined. Except for the nonsingular square case, the solution is only done in a generalized sense. In particular, if there are multiple solutions, then the singular value decomposition can be used to find the solution of minimum l2 norm. If there are no exact solutions, then the singular value decomposition can be used to find a "solution" for which the residual has minimum l2 norm.
The generalized solution of A * x = b is equal to
x = V * S^ * U' * bwhere S^ is a diagonal matrix whose entries are the inverse of the nonzero entries of S, or 0 when the corresponding entry of S is zero.
For any column I no greater than the minimum of M and N, let u_{i} be the I-th column of U, and v_{i} be the I-th column of V, and s_{ii} be the I-th diagonal element of S. Then it is a fact that
A * v_{i} = s_{ii} * u_{i}and
A' * u_{i} = s_{ii} * v_{i}which allows us to conclude that
A * A' * u_{i} = s_{ii} * s_{ii} * u_{i}and
A' * A * v_{i} = s_{ii} * s_{ii} * v_{i}In other words, U, V and S contain information about the eigenvalues and eigenvectors of A * A'.
Conversely, if we know the eigenvalues and eigenvectors of A * A', then we know the squares of the singular values of A, and the left singular vectors of A (the U matrix).
The columns of U and V' can be thought of as forming a sequence of min ( M,N ) rank one matrices A_{1}, A_{2}, and so on, with
A_{1} = U(1:N,1) * S(1,1) * V(1:M,1)',These rank one matrices have the property that they sum up to A. Moreover, A_{1} is the rank one matrix which best approximates A in the l2 norm, while A_{1}+A_{2} is the best rank two approximant, and so on. This property means that initial column subsets of U and V' can be used for a reduced order model of A.
A_{2} = U(1:N,2) * S(2,2) * V(1:M,2)',...
The singular value decomposition can be used to construct the the pseudoinverse of the rectangular or singular matrix A.
Routines for singular value decomposition are included in EISPACK, LAPACK, and LINPACK.
Back to TABLE OF CONTENTS.
A skew centrosymmetric matrix is one which is antisymmetric about its center; that is,
A_{i,j} = - A_{m+1-i,n+1-j}
Example:
1 10 8 11 -5 -13 2 9 4 -12 -6 -7 0 7 6 12 -4 -9 -2 13 5 -11 -8 -10 -1
Facts about a skew centrosymmetric matrix A:
A matrix A is skew circulant if it can be formed by negating the strict lower triangle of a circulant matrix.
Example:
a b c d e f -f a b c d e -e -f a b c d -d -e -f a b c -c -d -e -f a b -b -c -d -e -f a
A skew circulant matrix is a TABLE OF CONTENTS.
A complex matrix A is skew Hermitian if it is equal to the negative of its transpose complex conjugate:
A = - ( conjugate ( A ) )'.
A skew Hermitian matrix must have a purely imaginary diagonal.
Back to TABLE OF CONTENTS.
Skyline storage is a matrix storage method for storing a particular kind of sparse matrix. The format is simple, compact, and suitable for use with Gaussian elimination.
Skyline storage is most typically used with symmetric matrices derived from a finite element problem. In this setting, it is common to encounter a matrix which is "nearly" banded, or "raggedly banded". That is, the nonzero elements of the matrix are always near the diagonal, but the number of such elements varies from column to column.
Here is a matrix suitable for skyline storage:
A11 A12 0 A14 0 A21 A22 0 A24 0 0 0 A33 0 0 A41 A42 0 A44 A45 0 0 0 A54 A55
If the matrix is symmetric, we can regard the matrix as a sequence of columns, starting with the first nonzero element in a column, and proceeding to the diagonal, including every entry in between, whether or not it is zero. Thus, this matrix could be regarded as equivalent to the following five column vectors:
A14 A24 A12 A34 A45 A11 A22 A33 A44 A55
(Note that we have added a location for A34. It's storing a zero, but it's between a nonzero entry, A24, and the diagonal A44). This is the heart of the idea behind skyline storage. We simply cram all these columns into a single vector:
A11, A12, A22, A33, A14, A24, A34, A44, A45, A55,and then figure out a way to address the individual entries. The obvious route is to have a way of pointing to the diagonal entries:
A11 is in entry 1, A22 in entry 3, A33 in entry 4, A44 in entry 8, A55 in entry 10.and we can store these values in an array called "INDEX". Then, we know that column 4 of the matrix is contained in locations INDEX(3)+1 through INDEX(4).
The reason that skyline storage is ideal for Gaussian elimination is that we have already set aside all the entries we will ever need to take care of "fill in", as long as we are not performing any pivoting.
For instance, consider what happens when we are eliminating row 1. We will add multiples of row 1 to rows 2 and 4. If there is a nonzero entry in a particular column of row 1, then that entry could cause "fill in" when added to row 2 or 4. But there is no problem, because if column J of row 1 is nonzero, then we're sure we've already set aside a entry for column J of rows 2 through row J. So we will never have to modify our storage scheme because of unexpected nonzero entries generated by fillin.
On the other hand, it should be clear that indexing entries of the matrix can be tedious or cumbersome. In particular, it is much harder to find entries of a row of the matrix than of a column. Writing a routine to actually carry out Gauss elimination with such a data structure can be something of a trial.
Thus skyline storage achieves simplicity and compactness in storage, at the cost of complexity in coding.
Back to TABLE OF CONTENTS.
The span of a set of vectors { v(i)| 1 <= I <= N } is the set of vectors that can be constructed from the set via linear combinations. The span is sometimes called the spanning space.
The span of a set of vectors is a linear space; it includes the 0 vector, each of the original vectors, and sums and multiples of them.
As an example of a vector span, consider the eigenvectors associated with a particular eigenvalue of a matrix. If the eigenvalue has algebraic multiplicity of 1, then we usually think there is only one associated eigenvector; if the eigenvalue has a higher multiplicity, there may be 2 or more linearly independent eigenvectors. In any case, the eigenspace associated with the eigenvalue is the span of these eigenvectors. Any element of this span is itself also an eigenvector for the eigenvalue.
Given a set of vectors which define a spanning space, it is often desired to know whether a subset of those vectors will define the same spanning space; that is, whether we can toss out some of the vectors because they don't add anything to the span. If we toss out as many vectors as possible, we end up with a basis for the span; the number of vectors remaining tells us the dimension of that space.
Back to TABLE OF CONTENTS.
The Sparse BLAS are a set of vector oriented linear algebra routines useful for sparse matrix problems.
The success of the Basic Linear Algebra Subprograms (BLAS) motivated the creation of a small set of similar routines for use with sparse vectors. The routines enable the interaction of sparse and full vectors. The sparse vectors are assumed to be stored as a vector of values, and a vector of indices. For example, a sparse vector might be represented by the pair
X = (1.0, 2.0, 3.0) IX = (9, 1, 200)which represents a vector that is entirely zero, except that entry 1 equals 2.0, entry 9 equals 1.0 and entry 200 equals 3.0.
FORTRAN subroutines which implement the sparse BLAS are available through the NETLIB web site.
The single precision Sparse BLAS routines include:
Back to TABLE OF CONTENTS.
A sparse matrix is a matrix with so many zero entries that it is profitable to use special schemes to store the matrix and solve problems involving it. A rule of thumb is that if less than 5% of the matrix is nonzero, the matrix can be called sparse.
Not only does a sparse matrix have few nonzeroes, but these nonzeroes are typically scattered seemingly at random throughout the matrix. Thus, unlike a band matrix, (which also is "mostly zero") we have to expend a significant amount of effort simply recording where every nonzero entry occurs.
Back to TABLE OF CONTENTS.
A sparse storage scheme is a matrix storage method of storing the nonzero entries of a sparse matrix in a convenient and efficient way.
There are many different kinds of sparse matrix storage scheme. Their features depend on the properties of the underlying problem. In this discussion, a few simple non-specific schemes will be discussed.
If a direct linear equation solver will be used, storage must also be made available for "fill-in" values that will occur during the factorization process. An iterative solver, on the other hand, has no fill-in values, and never needs to change the entries of the matrix.
A simple storage scheme would require the user to supply NZ, the number of nonzero entries, and three arrays of size NZ:
If this scheme seems excessive, the elements of the matrix could be listed in order of rows, the ROW array could be replaced by a ROWEND array of length N. ROWEND(I) is the index in A of the last nonzero element of row I of the original matrix. This small saving comes at the price of complicating access to the matrix.
Back to TABLE OF CONTENTS.
SPARSKIT is a tool kit for sparse matrix computations.
SPARSKIT can manipulate sparse matrices in a variety of formats, and can convert from one to another. For example, a matrix can be converted from the generalized diagonal format used by ELLPACK and ITPACK to the format used by the Harwell-Boeing Sparse Matrix Collection (HBSMC) or even into LINPACK banded format.
Utilities available include converting data structures, printing simple statistics on a matrix, plotting a matrix profile, performing basic linear algebra operations (similar to the BLAS for dense matrix), and so on.
Back to TABLE OF CONTENTS.
The spectral radius of a matrix is the magnitude of the largest eigenvalue of a matrix.
The spectral radius is often easy to compute, and it is a useful measure of the "size" or "strength" of a matrix. However, the spectral radius is not a matrix norm.
The spectral radius violates several rules for matrix norms. In particular, it is possible to have any of the following situations:
On the other hand, the spectral radius is a lower bound for the value of any vector-bound matrix norm on A, because there must be an eigenvalue lambda and a vector of unit norm x with the property that
A * x = lambda * xso the norm of A*x divided by the norm of x is lambda. Therefore, the matrix norm of A must be at least | lambda |.
The Euclidean norm of a real symmetric matrix is equal to its spectral radius.
Back to TABLE OF CONTENTS.
The spectrum of a square matrix is the set of eigenvalues.
Back to TABLE OF CONTENTS.
A square matrix is one which has the same number of rows and columns. Most cases (but not all!) requiring the solution of a linear system involve a square coefficient matrix.
Only a square matrix has a determinant, an inverse (if not singular!) a trace, powers, a matrix exponential, and eigenvalues.
If a matrix is not square, it is called rectangular.
Back to TABLE OF CONTENTS.
Given a matrix A, we say that the matrix B is the square root of A if it is true that A = B * B = B^{2}.
It is obvious that A must be a square matrix in order to have a square root.
(Diagonal matrices): If D is a diagonal matrix, then a square root of D can be computed by taking the square roots of each diagonal entry; if any entry was negative, the square root matrix will be complex.
The square root matrix need not be unique.
For example, the 2x2 diagonal matrix with diagonal entries 4 and 9 has the following four distinct square root matrices:
(2 0) (2 0) (-2 0) (-2 0) (0 3) (0 -3) ( 0 3) ( 0 -3)
(Diagonalizable Matrices): If A is diagonalizable, so that A = X * D * inv(X), then a square root of A can be computed by sqrt(A) = X * sqrt(D) * inv(X). Note that if any entry in D was negative, we will end up with a complex square root matrix.
(Symmetric Matrices): If A is positive semidefinite symmetric, then the square root will be real, and can be determined from the Cholesky factorization as follows:
A = L * L'Now determine the Singular Value Decomposition of L
L = U * D * V'and write
A = L * L'where X is the desired matrix square root of A. In particular:
= U * D * V' * ( U * D * V' )'
= U * D * V' * V * D * U'
= U * D * D * U'
= U * D * U' * U * D * U'
= X * X
X = U * D * U'
If B^{3}=A, we say that B is the cube root or third root of A, and for any integer N, B^{N}=A means that B is called an N-th root of A.
Back to TABLE OF CONTENTS.
A stochastic matrix has only nonnegative entries, with all row sums equal to 1.
A stochastic matrix may also be called a row stochastic matrix or Markov matrix or a transition matrix. These names derive from the fact that the entry A_{i,j} may be viewed as a probability that a system currently in state I will transition to state J on the next step.
Facts about a (row) stochastic matrix A:
A column stochastic matrix has only nonnegative entries, with the entries in each column summing to 1.
A doubly stochastic matrix is both row and column stochastic.
Facts about a doubly stochastic matrix A:
A = Sum ( 1 <= I <= N ) c(I) * P(I)where N is the order of A, the real numbers c(I) sum to 1, and each P(I) is a permutation matrix.
An ergodic matrix is a row stochastic matrix which has no eigenvalues of modulus 1 except for 1 itself.
Back to TABLE OF CONTENTS.
Strassen's algorithm for matrix multiplication is a method which can produce a matrix product more efficiently than the standard approach.
For simplicity, suppose that the two factors A and B are both of order N. Then the obvious method of producing the product C requires N multiplications and N additions for each entry, for a total of 2 * N^{3} floating point operations.
Strassen's algorithm is defined recursively. It is easiest to describe if the matrix has an order that is a power of two. In that case, the product of two matrices of order N is described in terms of the product of matrices of order (N/2), and so on, until factors of order 2 are reached.
Now suppose that A and B are each of order 2. The definitions for the entries of the product C are:
C11 = A11 * B11 + A12 * B21 C12 = A11 * B12 + A12 * B22 C21 = A21 * B11 + A22 * B12 C22 = A21 * B12 + A22 * B22Now compute the following quantities:
P1 = (A11+A22) * (B11+B22) P2 = (A21+A22) * B11 P3 = A11 * (B12-B22) P4 = A22 * (B21-B11) P5 = (A11+A12) * B22 P6 = (A21-A11) * (B11+B12) P7 = (A12-A22) * (B21+B22)Then it is simply a matter of substitution to show that
C11 = P1 + P4 - P5 + P7 C12 = P3 + P5 C21 = P2 + P4 C22 = P1 + P3 - P2 + P6Instead of 8 multiplications, only 7 are required, at the cost of several more additions.
The reason it does is that we can apply the above formulas recursively. And as we break a matrix of order N into matrices of order N/2, we have to define 7 values, not 8. But each of those 7 values is also a matrix multiplication, and hence can be computed by the algorithm, and requires only 7 multiplications, and not 8. It turns out that the number of quantities we have to define drops precipitously, and so the fact that we have to use a lot of extra additions to define them doesn't matter.
Strassen's algorithm requires an amount of work that increases with N like N^{LOG2 7} rather than N^{3}. The extra additions in the defining formulas cause the work formula to have a larger constant in front of it, so that for small N, the standard algorithm is faster. But, as we have shown above, there are now implementations of the Strassen algorithm that beat the best implementations of the standard algorithm on the Cray.
Reference:
Back to TABLE OF CONTENTS.
A submatrix of a matrix A is any rectangular "chunk" of the matrix.
The chosen entries may all be neighbors, or they may be chosen by choosing any subset of the row and column indices of A, in any order.
As a simple example of the use of submatrices, suppose that A has the form
( A1 A2 ) ( 0 A3 )where A1, A2 and A3 are submatrices. Then a linear system involving A could be solved as two smaller subproblems: solve A3 * x2 = b2, then solve A1 * x1 = b1 - A2 * x2. ( Here b1 and b2 are the parts of the right hand side that correspond to the subdivision of A).
Similarly, for this case, the eigenvalues of A can be determined by finding the eigenvalues of A1 and A3. (The computation of the eigenvectors would be a little more complicated.
Back to TABLE OF CONTENTS.
The successive overrelaxation method or SOR is an iterative method for solving linear systems, and is a generalization of the Gauss Seidel and the Jacobi iterations.
SOR is only appropriate for matrices which are strictly diagonally dominant or else symmetric and positive definite.
To derive SOR, think of both the Jacobi and Gauss Seidel iterations as computing a correction to the current estimate of the solution, so that a step of the method has the form:
X[N+1] = X[N] + dX.SOR offers the ability to add a greater or smaller proportion of the correction, which we will denote w:
X[N+1] = X[N] + w * dX.
Surprisingly, for the appropriate choice of w, the SOR method can converge faster than the Gauss Seidel or Jacobi methods. It can be shown that the SOR method will only be convergent for 0 < w < 2. Values of w less than 1 result in an underrelaxed iteration, while values greater than 1 (the usual case) correspond to an overrelaxed iteration.
For a given coefficient matrix, convergence of the SOR method is optimal for some value of w, but it is generally not easy to determine this value. A variety of schemes are available for estimating and adjusting the value used during a particular iteration.
The SOR iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:
A = L + D + Uthen the method is equivalent to the iteration
xnew = x - w * ( L + D )^{-1} * ( A * x - b ).which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
I - w * ( L + D )^{-1} * A.
If the original coefficient matrix A is symmetric, then it may be preferred to use the symmetric SOR iteration or SSOR. In this case, the iteration consists of pairs of SOR steps. The odd steps are the same as the usual iteration. But in the even steps, the variables are solved for in reverse order. Each pair of such steps is a single step of the SSOR iteration, which has the property that its iteration matrix is similar to a symmetric matrix (though not necessarily symmetric itself). Among other things, this means that SSOR can be used as a preconditioner for certain other problems.
Back to TABLE OF CONTENTS.
A (square) matrix A is said to be summable if the limit as k goes to infinity exists for the following:
(1/k) * ( I + A + A^{2} + ... + A^{k-1})
If A is convergent or semiconvergent, it is summable. However, a matrix can be summable without being convergent.
It should be clear that for a matrix to be summable, the spectral radius rho(A) must be less than or equal to 1. But if the spectral radius is less than one, the matrix is convergent, and we know how that works. The interesting case occurs when rho(A) = 1.
It turns out that if rho(A) = 1, then matrix A will be summable if and only if it has only semisimple eigenvalues.
Note that all stochastic matrices are summable.
Back to TABLE OF CONTENTS.
A symmetric matrix A is equal to its transpose, that is,
A = A'or, for every pair of indices I and J:
A_{i,j} = A_{j,i}
Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:
A = B + C = (1/2) * ( ( A - A' ) + ( A + A' ) )
Here is an example of a symmetric matrix:
1 2 0 4 2 9 4 8 0 4 5 3 4 8 3 7
Simple facts about a symmetric matrix A:
The eigenvalues of successive members of the sequence of principal minors of a symmetric matrix have the Sturm sequence property, a strict interlacing relationship. The k+1 eigenvalues of the principal minor of order k+1 are strictly separated by the k eigenvalues of the minor of order k.
LAPACK, LINPACK and EISPACK include specialized routines for symmetric matrices, and include the use of symmetric matrix storage to save space.
Back to TABLE OF CONTENTS.
Symmetric storage is a matrix storage method of storing a symmetric matrix economically, omitting the repeated elements.
The strict lower triangle of a symmetric or Hermitian matrix is redundant. A symmetric storage scheme packs the upper triangle of the matrix into a linear vector of length ( N * ( N + 1 ) ) / 2. The data is organized by columns, with each column starting in row 1 of the original matrix, and proceeding down to the diagonal.
If A was the matrix:
11 12 13 14 15 12 22 23 24 25 13 23 33 34 35 14 24 34 44 45 15 25 35 45 55then A could be symmetrically stored as:
1 2 3 4 5 ( 11, 12, 22, 13, 23, 33, 14, 24, 34, 44, 15, 25, 35, 45, 55 ).
LAPACK, LINPACK and EISPACK include routines which can operate on data stored in this format.
Back to TABLE OF CONTENTS.
TEST_MATRIX is a collection of subroutines useful for generating and manipulating test matrices.
Many sample matrices are available with known inverse, determinant, eigenvalues, rank, symmetry, and other properties. These matrices may be used to test software for correctness, or for classroom demonstrations.
Most of the matrices come from a MATLAB M file collection developed by Nicholas Higham, Department of Mathematics, University of Manchester, and maintained in the "testmatrix" file somewhere at the MATLAB web site.
An earlier version of the collection is available, again as MATLAB M files, in ACM TOMS Algorithm 694, in the TOMS directory of the NETLIB web site.
I have a FORTRAN version of the source code available in the TEST_MATRIX page.
Back to TABLE OF CONTENTS.
A Toeplitz matrix is a matrix which is constant along each of its diagonals.
Here is an example of a square Toeplitz matrix:
4 5 6 7 3 4 5 6 2 3 4 5 1 2 3 4a "wide" rectangular Toeplitz matrix:
3 4 5 6 7 2 3 4 5 6 1 2 3 4 5a "tall" rectangular Toeplitz matrix:
5 6 7 4 5 6 3 4 5 2 3 4 1 2 3
Facts about a Toeplitz matrix A:
Compare the concepts of a Hankel Matrix, a Symmetric Matrix and a Circulant Matrix.
Back to TABLE OF CONTENTS.
A Tournament matrix is a (square) matrix whose diagonal is all zero, and such that, for every pair of corresponding off-diagonal entries A_{i,j} and A_{j,i}, exactly one is -1 and one is 1.
The tournament matrix can be thought of as recording the wins and losses of a set of players, each of whom plays every other player.
Example:
0 1 -1 1 -1 -1 0 -1 -1 1 1 1 0 1 1 -1 1 -1 0 -1 1 -1 -1 1 0
Facts about a Tournament matrix A:
Back to TABLE OF CONTENTS.
A Tournament win matrix is a (square) matrix whose diagonal is all zero, and such that, for every pair of corresponding off-diagonal entries A_{i,j} and A_{j,i}, exactly one is 0 and one is 1.
The tournament win matrix can be thought of as recording the wins of a set of players, each of whom plays every other player.
Example:
0 1 0 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 0 0 1 0 0 1 0
Facts about a Tournament Win matrix A:
Back to TABLE OF CONTENTS.
The trace of a (square) matrix is the sum of the diagonal elements.
Simple facts about the trace of a matrix A:
The trace of the following matrix is 4:
1 -1 0 -1 2 -1 0 -1 1and it has three eigenvalues, 0, 1 and 3, whose sum is also 4.
Back to TABLE OF CONTENTS.
The transpose of a matrix A is obtained by switching all pairs of values A_{i,j} and A_{j,i}.
In printed text, the transpose is usually denoted by a superscript T, as in A^{T}, while in running text the symbols A', A^T, transpose ( A ), or trans ( A ) might be used.
For the square matrix A:
1 2 0 4 4 8 9 2 5 5 6 3 7 0 0 3the transpose A' is:
1 4 5 7 2 8 5 0 0 9 6 0 4 2 3 3
For the "wide" rectangular matrix A:
11 12 13 14 15 21 22 23 24 25 31 32 33 34 35the transpose A' is the "tall" rectangular matrix:
11 21 31 12 22 32 13 23 33 14 24 34 15 25 35
Simple facts about the transpose of a matrix A:
(A + B)' = A' + B'
(A * B)' = B' * A'
The LU factorization of a matrix A allows the solution of linear systems involving A' as well. LAPACK and LINPACK linear solution software takes advantage of this fact.
Back to TABLE OF CONTENTS.
A trapezoidal matrix is essentially a rectangular triangular matrix. Thus, a trapezoidal matrix has a different number of rows than columns. If, in addition, A_{i,j} = 0 whenever I > J, the matrix is called upper trapezoidal. If A_{i,j} = 0 whenever I < J, the matrix is called lower trapezoidal.
Here is a "wide" upper trapezoidal matrix:
11 12 13 14 0 22 23 24 0 0 33 34Here is a "tall" upper trapezoidal matrix:
11 12 13 14 0 22 23 24 0 0 33 34 0 0 0 44 0 0 0 0 0 0 0 0
You might encounter an upper trapezoidal matrix when carrying out Gauss Jordan Elimination on a square matrix, or computing the QR factorization of a rectangular matrix.
Back to TABLE OF CONTENTS.
An upper triangular matrix is entirely zero below the main diagonal while a lower triangular matrix is zero above the main diagonal.
It the entries of the main diagonal are all equal to 1, the matrix is said to be unit upper triangular or unit lower triangular. For example, you will encounter a unit lower triangular matrix in Gauss elimination.
Simple facts about a triangular matrix A:
Back to TABLE OF CONTENTS.
A tridiagonal matrix is a matrix whose only nonzero entries occur on the main diagonal or on the two diagonals which are immediate neighbors of the main diagonal.
The diagonals immediately below and above the main diagonal are referred to as the (principal) subdiagonal and (principal) superdiagonal respectively.
Here is an example of a tridiagonal matrix which is also positive definite symmetric:
-2 1 0 0 1 -2 1 0 0 1 -2 1 0 0 1 -2
Simple facts about a tridiagonal matrix A:
If it is true that none of the subdiagonal elements are zero, and none of the superdiagonal elements are zero, and
|A_{1,1}| > |A_{1,2}|and, for 2 <= i <= n-1,
|A_{n,n}| > |A_{n,n-1}|
|A_{i,i}| >= |A_{i,i-1}| + |A_{i,i-1}|then A is nonsingular, and Gauss elimination can be performed without pivoting. THis is true, for instance, for the "-1,2,-1" matrix used to approximate the second derivative, and the "2,1;1,4,1;1,2" matrix used in cubic spline interpolation.
If a (real) matrix A is tridiagonal and irreducible, then its eigenvalues are real and distinct. Moreover, the eigenvalues of the sequence of principal minors of A have a strict interlacing property: the k+1 eigenvalues of the principal minor of order k+1 are strictly separated by the k eigenvalues of the minor of order k. Since the determinant of the minors can be easily computed recursively, and since the sign sequence of these determinants carries information about the number of negative eigenvalues associated with each minor, this suggests how a bisection method can be employed to hunt for the eigenvalues of a tridiagonal matrix.
LAPACK and LINPACK include special routines for efficiently solving linear systems with a tridiagonal coefficient matrix. LAPACK and EISPACK have routines for finding eigenvalues of a tridiagonal matrix.
Cyclic reduction is a method of solving a tridiagonal system of equations which can give a large speedup on vector or parallel processors.
Back to TABLE OF CONTENTS.
An underdetermined linear system is a linear system A * x = b in which the values of N variables are sought, but for fewer than N linearly independent equations are given.
Of course, it is possible to set up a linear system where the number of equations M equals or exceeds the number of variables, but for which the number of linearly independent equations is less than N. Thus, you can't assume that M ≥ N rules out the undetermined case. It is the number of linearly independent equations that sets this. But of course, if you start with fewer than N equations, you are guaranteed an undetermined system.
An underdetermined linear system typically has an infinite number of solutions which constitute an affine linear space. The dimension of the linear space is usually called the number of degrees of freedom.
Back to TABLE OF CONTENTS.
A unimodular matrix is a square matrix whose determinant has absolute value 1.
Facts about a unimodular matrix A:
Examples of unimodular matrices:
Back to TABLE OF CONTENTS.
A unitary matrix is a complex matrix U whose transpose complex conjugate is equal to its inverse:
U^{-1} = U^{H}
Facts about a unitary matrix U
In real arithmetic, the corresponding concept is an orthogonal matrix.
Back to TABLE OF CONTENTS.
A unitary transformation is a relationship between two complex matrices A and B, and a unitary matrix V, of the form:
A = V^{-1} * B * V.
Back to TABLE OF CONTENTS.
A reduced matrix is a (usually square) matrix A which may be broken up into submatrices
A11 | A12 ----+--- 0 | A22where A11 is square, and the submatrix below it is completely zero.
An unreduced matrix is of course a matrix which does not have this form.
An irreducible matrix is one which cannot be put into reduced form by a renumbering of the variables. (The same permutation must be applied to both rows and columns.)
An unreduced tridiagonal matrix is one which has no "extra" zeroes, that is, the immediate superdiagonal and subdiagonal have no zero entries.
An unreduced upper Hessenberg matrix is one which has no "extra" zeroes, that is, the immediate subdiagonal has no zero entries.
Facts about a (square) reduced matrix A:
Back to TABLE OF CONTENTS.
The upshift matrix A circularly shifts all vector entries or matrix rows up 1 position.
Example:
0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0
Facts about the upshift matrix A:
Back to TABLE OF CONTENTS.
A vector-bound matrix norm is a matrix norm that has been (or can be) derived from a vector norm by the following formula:
||A|| = supremum ||A*x|| / ||x||where the supremum (roughly, the "maximum") is taken over all nonzero vectors x.
If such a relationship holds, then expressions involving the matrix norm and vector norm can be mingled to produce useful inequalities, based on the guaranteed compatiblity relationship:
||A*x|| <= ||A|| * ||x||
Matrix norms which are vector bound with some vector norm include the L1 matrix norm, the L2 matrix norm, and the L Infinity matrix norm.
Back to TABLE OF CONTENTS.
A vector norm is a function ||*|| that measures the "size" of a vector.
A vector norm must have the following properties:
Commonly used vector norms include:
Given two points in space, x and y, we can define the distance between the points, d(x,y), in terms of a vector norm operating on the vectors of position coordinates:
d(x,y) = || x - y ||and this quantity will have the expected properties of a distance function.
For a given vector norm, it is important to know which matrix norms are compatible, so that expressions like
||A*x|| <= ||A|| * ||x||may be asserted.
Back to TABLE OF CONTENTS.
A Z matrix is a real (square) matrix with nonpositive off-diagonal entries.
If the diagonal entries of a Z matrix are strictly positive, then it is also an L matrix.
A Z matrix is a "candidate" to be an M matrix, and over 50 different conditions are known that will guarantee that a Z matrix is, in fact, an M matrix.
Back to TABLE OF CONTENTS.
The zero matrix is a matrix all of whose entries are zero.
A zero matrix is sometimes called a trivial matrix. A matrix which has at least one nonzero entry is called a nontrivial matrix or nonzero matrix. (It is not required that all entries be nonzero, just that at least one of them is nonzero!)
Back to TABLE OF CONTENTS.
A zero one matrix is a matrix whose entries are equal to 0 or 1.
A few examples of zero one matrices include:
The permanent of a zero one matrix is equal to the number of generalized diagonals that contain no 0 element. The permanent of a zero one matrix is zero if and only if it contains an r by s subblock of zeroes, with r+s>=n+1.
Back to TABLE OF CONTENTS.
You can return to the HTML web page.