Sparse Linear Algebra Package

DLAP is a FORTRAN90 library which implements a number of routines for solving sparse linear systems, by Anne Greenbaum and Mark Seager.

DLAP contains "core" routines for the iterative solution of symmetric and non-symmetric positive definite and positive semi-definite linear systems.

Included in this package are core routines to do

The DLAP core routines generally do not interact directly with the sparse matrix. Instead, they require the user to supply two routines:

This allows the core routines to be written in a way that makes them independent of the matrix data structure. For each core routine there are several drivers and support routines that allow the user to utilize Diagonal Scaling and Incomplete Cholesky factorization or Incomplete LU factorization as preconditioners with no coding. The price for this convenience is that one must use the a specific matrix data structure: DLAP Column or DLAP Triad format.

This document contains the specifications for the DLAP Version 2.0 package, a Fortran package for the solution of large sparse linear systems, Ax = b, via preconditioned iterative methods.

Included in this package are core routines to do

These core routines do not require a fixed data structure for storing the matrix A and the preconditioning matrix M. The user is free to choose any structure that facilitates efficient solution of the problem at hand. The drawback to this approach is that the user must also supply at least two routines (MATVEC and MSOLVE, say).

MATVEC must calculate y = A*x, given x and the user's data structure for A. MSOLVE must solve, r = M*z, for z (*NOT* r) given r and the user's data structure for M (or its inverse). The user should choose M so that M*A is approximately the identity and the solution step r = M*z is "easy" to solve.

For some of the "core" routines (Orthomin, BiConjugate Gradient and Conjugate Gradient on the normal equations) the user must also supply a matrix transpose times vector routine (MTTVEC, say) and (possibly, depending on the "core" method) a routine that solves the transpose of the preconditioning step (MTSOLV, say). Specifically, MTTVEC is a routine which calculates y = A'x, given x and the user's data structure for A (A' is the transpose of A). MTSOLV is a routine which solves the system r = M'z for z given r and the user's data structure for M.

This process of writing the matrix vector operations can be time consuming and error prone. To alleviate these problems we have written drivers for the "core" methods that assume the user supplies one of two specific data structures (DLAP Triad and DLAP Column format), see below. Utilizing these data structures we have augmented each "core" method with two preconditioners: Diagonal Scaling and Incomplete Factorization. Diagonal scaling is easy to implement, vectorizes very well and for problems that are not too ill-conditioned reduces the number of iterations enough to warrant its use. On the other hand, an Incomplete factorization (Incomplete Cholesky for symmetric systems and Incomplete LU for nonsymmetric systems) may take much longer to calculate, but it reduces the iteration count (for most problems) significantly. Our implementations of IC and ILU vectorize for machines with hardware gather scatter, but the vector lengths can be quite short if the number of nonzeros in a column is not large.

DLAP Triad format

In the DLAP Triad format only the non-zeros are stored. They may appear in any order. The user supplies three arrays of length NELT, where NELT is the number of non-zeros in the matrix: (IA(NELT), JA(NELT), A(NELT)). If the matrix is symmetric then one need only store the lower triangle (including the diagonal) and NELT would be the corresponding number of non-zeros stored. For each non-zero the user puts the row and column index of that matrix element in the IA and JA arrays. The value of the non-zero matrix element is placed in the corresponding location of the A array. This is an extremely easy data structure to generate. On the other hand, it is not very efficient on vector computers for the iterative solution of linear systems. Hence, DLAP changes this input data structure to the DLAP Column format for the iteration (but does not change it back).

Here is an example of the DLAP Triad storage format for a nonsymmetric 5x5 Matrix. NELT=11. Recall that the entries may appear in any order.

      |11 12  0  0 15|
      |21 22  0  0  0|
      | 0  0 33  0 35|
      | 0  0  0 44  0|
      |51  0 53  0 55|
Here is the DLAP Triad format for the same 5x5 matrix:
           1  2  3  4  5  6  7  8  9 10 11
       A: 51 12 11 33 15 53 55 22 35 44 21
      IA:  5  1  1  3  1  5  5  2  3  4  2
      JA:  1  2  1  3  5  3  5  2  5  4  1

DLAP Column format

In the DLAP Column format the non-zeros are stored counting down columns (except for the diagonal entry, which must appear first in each "column") and are stored in the real array A. In other words, for each column in the matrix first put the diagonal entry in A. Then put in the other non-zero elements going down the column (except the diagonal) in order. The IA array holds the row index for each non-zero. The JA array holds the offsets into the IA, A arrays for the beginning of each column. That is, IA(JA(ICOL)), A(JA(ICOL)) are the first elements of the ICOL-th column in IA and A. IA(JA(ICOL+1)-1), A(JA(ICOL+1)-1) are the last elements of the ICOL-th column. Note that we always have JA(N+1) = NELT+1, where N is the number of columns in the matrix and NELT is the number of non-zeros in the matrix. If the matrix is symmetric one need only store the lower triangle (including the diagonal) and NELT would be the corresponding number of non-zeros stored.

Here is an example of the DLAP Column storage format for a nonsymmetric 5x5 Matrix (in the A and IA arrays '|' denotes the end of a column):

      |11 12  0  0 15|
      |21 22  0  0  0|
      | 0  0 33  0 35|
      | 0  0  0 44  0|
      |51  0 53  0 55|
Here is the DLAP Column format for the 5x5 matrix:
           1  2  3    4  5    6  7    8    9 10 11
       A: 11 21 51 | 22 12 | 33 53 | 44 | 55 15 35
      IA:  1  2  5 |  2  1 |  3  5 |  4 |  5  1  3
      JA:  1  4  6    8  9   12

Naming Conventions

DLAP iterative methods, matrix vector and preconditioner calculation routines follow a naming convention which, when understood, allows one to determine the iterative method and data structure(s) used. The subroutine naming convention takes the following form:

where P stands for the precision (or data type) of the routine and is required in all names, S denotes whether or not the routine requires the DLAP Triad or Column format (it does if the second letter of the name is S and does not otherwise), the optional M stands for the type of preconditioner used (only appears in drivers for "core" routines) and DESC is some number of letters describing the method or purpose of the routine. In this incarnation of DLAP only single precision data types are supported (no double precision or complex data type routines have been written). Hence, all routines start with an S, boring. The brackets around S and M designate that these fields are optional.

The following is a list of the "DESC" fields for iterative methods and their meaning: BCG: BiConjugate Gradient; CG: Conjugate Gradient; CGN: Conjugate Gradient on the Normal equations; CGS, CS: biConjugate Gradient Squared; GMRES, GMR, GM: Generalized Minimum RESidual; IR, R: Iterative Refinement; JAC: JACobi's method; GS: Gauss-Seidel; OMN, OM: Orthomin.

Here are some examples of the routines:

  1. SBCG: Single precision BiConjugate Gradient "core" routine. One can deduce that this is a "core" routine, because the S and M fields are missing and BiConjugate Gradient is an iterative method.
  2. SSDBCG: Single precision, DLAP data structure BCG with Diagonal scaling.
  3. SSLUBC: Single precision, BCG with incomplete LU factorization as the preconditioning.
  4. SCG: Single precision Conjugate Gradient "core" routine.
  5. SSDCG: Single precision, DLAP data structure Conjugate Gradient with Diagonal scaling.
  6. SSICCG: Single precision, DLAP data structure Conjugate Gradient with Incomplete Cholesky factorization preconditioning.

Which Method To Use

In solving a large sparse linear system Ax = b using an iterative method, it is not necessary to actually store the matrix A. Rather, what is needed is a procedure for multiplying the matrix A times a given vector y to obtain the matrix-vector product, Ay. DLAP has been written to take advantage of this fact. The higher level routines in the package require storage only of the nonzero elements of A (and their positions), and even this can be avoided, if the user writes his own subroutine for multiplying the matrix times a vector and calls the lower-level iterative routines in the package.

If the matrix A is ill-conditioned, then most iterative methods will be slow to converge (if they converge at all!). To improve the convergence rate, one may use a "matrix splitting," or, "preconditioning matrix," say, M. It is then necessary to solve, at each iteration, a linear system with coefficient matrix M. A good preconditioner M should have two properties: (1) M should "approximate" A, in the sense that the matrix inv(M)*A (or some variant thereof) is better conditioned than the original matrix A; and (2) linear systems with coefficient matrix M should be much easier to solve than the original system with coefficient matrix A. Preconditioning routines in the DLAP package are separate from the iterative routines, so that any of the preconditioners provided in the package, or one that the user codes, can be used with any of the iterative routines.

If you are willing to live with either the DLAP Triad or Column matrix data structure you can then choose one of two types of preconditioners to use: diagonal scaling or incomplete factorization. To choose between these two methods requires knowing something about the computer you're going to run these codes on and how well incomplete factorization approximates the inverse of your matrix.

Let's suppose you have a scalar machine. Then, unless the incomplete factorization is very, very poor this is *GENERALLY* the method to choose. It will reduce the number of iterations significantly and is not all that expensive to compute. So if you have just one linear system to solve and "just want to get the job done" then try incomplete factorization first. If you are thinking of integrating some DLAP iterative method into your favorite "production code" then try incomplete factorization first, but also check to see that diagonal scaling is indeed slower for a large sample of test problems.

Let's now suppose you have a vector computer with hardware gather/scatter support (Cray X-MP, Y-MP, SCS-40 or Cyber 205, ETA 10, ETA Piper or Convex C-1, etc.). Then it's much harder to choose between the two methods. The versions of incomplete factorization in DLAP do in fact vectorize, but have short vector lengths and the factorization step is relatively more expensive. Hence, for most problems (i.e., unless your problem is ill conditioned, sic!) diagonal scaling is faster, with its very fast set up time and vectorized (with long vectors) preconditioning step (even though it may take more iterations). If you have several systems (or right hand sides) to solve that can utilize the same preconditioner then the cost of the incomplete factorization can be amortized over these several solutions. This situation gives more advantage to the incomplete factorization methods. If you have a vector machine without hardware gather/scatter (Cray 1, Cray 2 & Cray 3) then the advantages for incomplete factorization are even less.

If you're trying to shoehorn DLAP into your favorite "production code" and can not easily generate either the DLAP Triad or Column format, then you are left to your own devices in terms of preconditioning. Also, you may find that the preconditioners supplied with DLAP are not sufficient for your problem. In this situation, we would recommend that you talk with a numerical analyst versed in iterative methods about writing other preconditioning subroutines (e.g., polynomial preconditioning, shifted incomplete factorization, SOR or SSOR iteration). You can always "roll your own" by using the "core" iterative methods and supplying your own MSOLVE and MATVEC (and possibly MTSOLV and MTTVEC) routines. If you do develop a new preconditioner for the DLAP data structure send the code to us (if you can do that with no strings attached!, i.e. copyright restrictions) and we'll add it to the package!

If your matrix is symmetric then you would want to use one of the symmetric system solvers. If your system is also positive definite, (Ax,x) (Ax dot product with x) is positive for all non-zero vectors x, then use Conjugate Gradient (SCG, SSDCG, SSICSG). If you're not sure it's SPD (symmetric and Positive Definite) then try SCG anyway and if it works, fine. If you're sure your matrix is not positive definite then you may want to try the iterative refinement methods (SIR) or the GMRES code (SGMRES) if SIR converges too slowly.

Nonsymmetric systems are an area of active research in numerical analysis and there are new strategies being developed. Consequently take the following advice with a grain of salt. If your matrix is positive definite, (Ax,x) (Ax dot product with x is positive for all non-zero vectors x), then you can use any of the methods for nonsymmetric systems (Orthomin, GMRES, BiConjugate Gradient, BiConjugate Gradient Squared and Conjugate Gradient applied to the normal equations). If your system is not too ill conditioned then try BiConjugate Gradient Squared (BCGS) or GMRES (SGMRES). Both of these methods converge very quickly and do not require A' or M' (' denotes transpose) information. SGMRES does require some additional storage, though. If the system is very ill conditioned or nearly positive indefinite ((Ax,x) is positive, but may be very small), then GMRES should be the first choice, but try the other methods if you have to fine tune the solution process for a "production code". If you have a great preconditioner for the normal equations (i.e., M is an approximation to the inverse of AA' rather than just A) then this is not a bad route to travel. Old wisdom would say that the normal equations are a disaster (since it squares the condition number of the system and SCG convergence is linked to this number of infamy), but some preconditioners (like incomplete factorization) can reduce the condition number back below that of the original system.

DLAP can store a sparse matrix in a file, using SLAP/DLAP Sparse Matrix File Format. Such a file can be written by the routine DTOUT, or read back in by DTIN.

There is also a routine DBHIN which can read in a file in Harwell Boeing Sparse Matrix File Format.


DLAP is available in a FORTRAN90 version.

Related Data and Programs:

CC, a data directory which contains examples of the Compressed Column (CC) sparse matrix file format;

CG, a FORTRAN90 library which implements the conjugate gradient method for solving a positive definite sparse linear system A*x=b, using reverse communication.

CR, a data directory which contains examples of the Compressed Row (CR) sparse matrix file format;

DLAP_IO, a FORTRAN90 library which reads and writes files describing sparse matrices used by DLAP.

DSP, a data directory which contains a description and examples of the DSP format for storing sparse matrices, which is equivalent to the SLAP/DLAP Triad format.

HB_TO_ST, a FORTRAN90 program which converts the sparse matrix information stored in a Harwell-Boeing file into a sparse triplet file.

LINPLUS, a FORTRAN90 library which manipulates matrices in a variety of formats.

MACHINE, a FORTRAN90 library which supplies certain machine arithmetic constants. A copy of this library is used by DLAP.

MGMRES, a FORTRAN90 library which applies the restarted GMRES algorithm to solve a sparse linear system, by Lili Ju.

SLATEC, a FORTRAN90 library which includes DLAP.

SPARSEKIT, a FORTRAN90 library which carries out sparse matrix operations, by Yousef Saad.

SPARSEPAK, a FORTRAN90 library which forms an obsolete version of the Waterloo Sparse Matrix Package.

SUPERLU, FORTRAN90 programs which illustrate how to use the SUPERLU library, which applies a fast direct solution method to solve sparse linear systems, by James Demmel, John Gilbert, and Xiaoye Li.


Anne Greenbaum, Mark Seager


  1. Peter Brown, Alan Hindmarsh,
    Reduced Storage Matrix Methods In Stiff ODE Systems,
    Technical Report UCRL-95088, Revision 1,
    Lawrence Livermore National Laboratory, June 1987.
  2. Paul Concus, Gene Golub, Dianne OLeary,
    A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations,
    in Symposium on Sparse Matrix Computations,
    edited by James Bunch, Donald Rose,
    Academic Press, 1979,
    ISBN: 0121410501,
    LC: QA188.S9.
  3. Gene Golub, Charles VanLoan,
    Matrix Computations,
    Third Edition,
    Johns Hopkins, 1996,
    ISBN: 0-8018-4513-X,
    LC: QA188.G65.
  4. Louis Hageman, David Young,
    Applied Iterative Methods,
    Academic Press, 1981,
    ISBN: 0-12-313340-8,
    LC: QA297.8.H34.
  5. Ron Jones, David Kahaner,
    XERROR, The SLATEC Error Handling Package,
    Technical Report SAND82-0800,
    Sandia National Laboratories, 1982.
  6. Ron Jones, David Kahaner,
    XERROR, The SLATEC Error Handling Package,
    Software: Practice and Experience,
    Volume 13, Number 3, 1983, pages 251-257.
  7. Erik Kaasschieter,
    The solution of non-symmetric linear systems by bi-conjugate gradients or conjugate gradients squared,
    Technical Report 86-21,
    Delft University of Technology Report, 1986.
  8. Charles Lawson, Richard Hanson, David Kincaid, Fred Krogh,
    Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage,
    ACM Transactions on Mathematical Software,
    Volume 5, Number 3, September 1979, pages 308-323.
  9. Mark Seager,
    A SLAP for the Masses,
    Technical Report: UCRL-100267,
    Lawrence Livermore National Laboratory, December 1988.
  10. Richard Singleton,
    Algorithm 347: An Efficient Algorithm for Sorting with Minimal Storage,
    Communications of the ACM,
    Volume 12, Number 3, March 1969, pages 185-187.
  11. Peter Sonneveld,
    CGS, a fast Lanczos-type solver for nonsymmetric linear systems,
    Technical Report 84-16,
    Department of Mathematics and Informatics,
    Delft University of Technology, 1984.
  12. http://www.netlib.org/slap/index.html, NETLIB web site.

Source Code:

Examples and Tests:

DLAP_PRB is a formal test program that checks all the DLAP routines. However, it's not very useful as a template for writing your own program!

DLAP_DGMRES_PRB is a "simple" test program applies the GMRES algorithm to the [-1,2,-1] matrix. No preconditioning is used. The entries of the matrix are stored naively, in DLAP Triad format. You might find this example easier to adapt to your own purposes.

List of Routines:

You can go up one level to the FORTRAN90 source codes.

Last revised on 01 January 2010.