# SVD_BASIS_WEIGHT: Extract singular vectors from weighted data

SVD_BASIS_WEIGHT is a FORTRAN90 program which applies the singular value decomposition ("SVD") to a set of weighted data vectors, to extract the leading "modes" of the data.

This procedure, originally devised by Karl Pearson, has arisen repeatedly in a variety of fields, and hence is known under various names, including:

• the Hotelling transform;
• the discrete Karhunen-Loève transform (KLT);
• Principal Component Analysis (PCA);
• Principal Orthogonal Direction (POD);
• Proper Orthogonal Decomposition (POD);
• Singular Value Decomposition (SVD).

The role of the weights is to assign greater importance to some vectors. Vectors with larger weights contribute more to the decomposition, so modes associated with them are more likely to appear in the output basis.

This program is intended as an intermediate application, in the following situation:

1. A "high fidelity" or "high resolution" PDE solver is used to determine many (say N = 500) solutions of a discretized PDE at various times or parameter values. Each solution may be regarded as an M-vector. Typically, each solution involves an M by M linear system, greatly reduced in complexity because of bandedness or sparsity.
2. The user determines a weight vector W, with one value assigned to each solution vector. Depending on the problem, the weights might be chosen beforehand or computed automatically from some natural quantity, such as a varying time step size.
3. This program is applied to extract L dominant modes from the N weighted solutions. This is done using the singular value decomposition of the M by N matrix, each of whose columns is one of the original solution vectors scaled by its weight.
4. A "reduced order model" program may then attempt to solve a discretized version of the PDE, using the L dominant modes as basis vectors. Typically, this means that a dense L by L linear system will be involved.

Thus, the program might read in 500 solution files, and a weight file, and write out 5 or 10 files of the corresponding size and "shape", representing the dominant solution modes.

The optional normalization step involves computing the average of all the solution vectors and subtracting that average from each solution. In this case, the average vector is treated as a special "mode 0", and also written out to a file.
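
The averaging step can be sketched in a few lines. The snippet below is only an illustration in Python, not the program's FORTRAN90 source. The function name `subtract_average` and the list-of-lists layout are assumptions made for the example, and it uses the plain (unweighted) average described above.

```python
def subtract_average(vectors):
    """Return (average, centered) for a list of equal-length vectors."""
    n = len(vectors)
    m = len(vectors[0])
    # Component-wise average over all N solution vectors ("mode 0").
    average = [sum(v[i] for v in vectors) / n for i in range(m)]
    # Subtract the average from every solution vector.
    centered = [[v[i] - average[i] for i in range(m)] for v in vectors]
    return average, centered


solutions = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 10.0]]
mode0, centered = subtract_average(solutions)
print(mode0)     # [3.0, 4.0, 6.0]
print(centered)  # [[-2.0, -2.0, -3.0], [0.0, 0.0, -1.0], [2.0, 2.0, 4.0]]
```

After this step the centered vectors sum to zero in every component, so the extracted modes describe variation about the average rather than the average itself.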

To compute the singular value decomposition, we first construct the M by N matrix A using the individual solution vectors X1, X2, ..., XN as columns, after multiplication by the weights w1, w2, ..., wN:

A = [ w1 * X1 | w2 * X2 | ... | wN * XN ]
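
As a small illustration of this construction, the following Python sketch builds A as a list of rows. The helper name `weighted_matrix` and the nested-list storage are assumptions for the example; the actual program stores A in a FORTRAN90 array.

```python
def weighted_matrix(vectors, weights):
    """Build the M-by-N matrix whose j-th column is weights[j] * vectors[j]."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per solution vector")
    m = len(vectors[0])
    # Row i of A collects component i of every scaled solution vector.
    return [[w * x[i] for x, w in zip(vectors, weights)] for i in range(m)]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # N = 2 vectors of length M = 3
W = [0.5, 2.0]
A = weighted_matrix(X, W)
print(A)  # [[0.5, 8.0], [1.0, 10.0], [1.5, 12.0]]
```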

The singular value decomposition has the form:

A = U * S * V'
and is determined using the DGESVD routine from the linear algebra package LAPACK. The leading L columns of the orthogonal M by M matrix U, which correspond to the L largest singular values on the diagonal of S, are chosen to form the basis.
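
To make the idea of a "leading mode" concrete, here is a hedged Python sketch that approximates only the first column of U and the largest singular value. It uses power iteration on A * A' rather than DGESVD, so it is a different (and much cruder) technique than the one the program relies on. The function name `leading_mode` and the fixed iteration count are assumptions for the example.

```python
import math


def leading_mode(A, iterations=200):
    """Approximate the leading left singular vector and singular value of A
    by power iteration on A * A' (a stand-in for a full SVD such as DGESVD)."""
    m, n = len(A), len(A[0])
    u = [1.0] * m                      # arbitrary nonzero starting guess
    sigma = 0.0
    for _ in range(iterations):
        # t = A' * u (length N), then v = A * t (length M)
        t = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        v = [sum(A[i][j] * t[j] for j in range(n)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break                      # u is orthogonal to every column of A
        u = [x / norm for x in v]
        sigma = math.sqrt(norm)        # ||A A' u|| tends to sigma_1 squared
    return u, sigma


# Rank-one test matrix: both columns are multiples of (1, 2, 2).
A = [[1.0, 2.0], [2.0, 4.0], [2.0, 4.0]]
u, sigma = leading_mode(A)
print([round(x, 6) for x in u])  # [0.333333, 0.666667, 0.666667]
print(round(sigma, 6))           # 6.708204
```

The starting guess must not be orthogonal to the true leading vector, and convergence slows when the two largest singular values are close. A library SVD avoids both problems and returns all L modes at once.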

In most PDEs, the solution vector has some structure; perhaps there are 100 nodes, and at each node the solution has perhaps 4 components (horizontal and vertical velocity, pressure, and temperature, say). While the solution is therefore a vector of length 400, it's more natural to think of it as a sort of table of 100 items, each with 4 components. You can use that idea to organize your solution data files; in other words, your data files can each have 100 lines, containing 4 values on each line. As long as every line has the same number of values, and every data file has the same form, the program can figure out what's going on.

The program assumes that each solution vector is stored in a separate data file and that the files are numbered consecutively, such as data01.txt, data02.txt, ... In a data file, comments (beginning with '#') and blank lines are allowed. Except for comment lines, each line of the file is assumed to represent all the component values of the solution at a particular node.
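
Consecutive numbering means that each file name can be derived from the previous one. The Python sketch below shows one way to do that; it is not the program's FILE_NAME_INC routine. The name `next_file_name`, the choice to increment the last run of digits, and the wrap-around from 99 to 00 are all assumptions made for illustration.

```python
import re


def next_file_name(name):
    """Increment the last run of digits in name, keeping its width."""
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        raise ValueError("file name contains no digits to increment")
    last = matches[-1]
    digits = last.group()
    # Wrap around (99 -> 00) rather than widen the field; this is an assumption.
    bumped = str((int(digits) + 1) % 10 ** len(digits)).zfill(len(digits))
    return name[:last.start()] + bumped + name[last.end():]


print(next_file_name("data01.txt"))  # data02.txt
print(next_file_name("data09.txt"))  # data10.txt
```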

Here, for instance, is one data file for a problem with just 3 nodes, and 4 solution components at each node:

```
#  This is solution file number 1
#
1   2   3   4
5   6   7   8
9  10  11  12
```
As far as the program is concerned, this file contains a vector of length 12, the first data item. Presumably, many more files will be supplied.
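
The reading rules described above (skip comments and blank lines, require the same number of values on every line, then treat the table as one long vector) can be sketched as follows. This is an illustrative Python version, not the program's own reader. The name `read_data_vector` is hypothetical, and flattening node by node is an assumption about the ordering.

```python
def read_data_vector(text):
    """Parse data-file text into (rows, columns, flat_vector)."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # skip blank lines and comments
        rows.append([float(tok) for tok in stripped.split()])
    if not rows:
        raise ValueError("no data lines found")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("every data line must have the same number of values")
    flat = [value for row in rows for value in row]
    return len(rows), width, flat


sample = """#  This is solution file number 1
#
1   2   3   4
5   6   7   8
9  10  11  12
"""
nodes, components, vector = read_data_vector(sample)
print(nodes, components, len(vector))  # 3 4 12
```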

A separate file must be supplied for the weights, with one weight for each data item. The magnitude of the weight is important, but the sign is meaningless. For a set of 500 data vectors, the weight file might have 500 lines of text (ignoring comments) such as:

```
# weight file
#
0.15
0.50
1.25
0.01
1.95
...
0.77
```
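
A short derivation shows why only the magnitude of each weight matters. Writing the transpose (denoted ' elsewhere in this document) as a superscript T, and using the fact that V is orthogonal, the product of A with its transpose is:

```latex
A A^{T} = \sum_{j=1}^{N} w_j^{2} \, X_j X_j^{T}
        = U S V^{T} V S^{T} U^{T}
        = U \, (S S^{T}) \, U^{T}
```

The columns of U are therefore eigenvectors of a matrix in which each weight appears only as its square, so replacing a weight by its negative leaves the modes unchanged, up to the usual sign ambiguity of singular vectors.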

The program is interactive, but requires only a very small amount of input:

• L, the number of basis vectors to be extracted from the data;
• the name of the first input data file in the first set;
• the name of the first input data file in the second set, if any (you may define a master data set composed of several groups of files, each consisting of a sequence of consecutive file names);
• a BLANK line, when there are no more sets of data to be added;
• the name of the WEIGHT file;
• "Y" if the vectors should be averaged, the average subtracted from all vectors, and the average written out as an extra "mode 0" vector;
• "Y" if the output files may include some initial comment lines, which will be indicated by initial "#" characters.

The program computes L basis vectors and writes each one to a separate file, named svd_001.txt, svd_002.txt, and so on. The basis vectors are written with the same component and node structure that was found in the solution files. Each vector will have unit Euclidean norm.
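
The output layout can be illustrated with a brief Python sketch. It is not the program's BASIS_WRITE routine: the helper names `format_basis_vector` and `output_name`, the numeric format, and the comment text are assumptions for the example. Only the svd_001.txt naming pattern, the per-node line structure, and the unit norm come from the description above.

```python
import math


def format_basis_vector(vector, components, comment=None):
    """Lay out a flat vector as text with `components` values per line."""
    if len(vector) % components != 0:
        raise ValueError("vector length must be a multiple of components")
    lines = []
    if comment is not None:
        lines.append("#  " + comment)
    for start in range(0, len(vector), components):
        row = vector[start:start + components]
        lines.append("  ".join(f"{value:14.6e}" for value in row))
    return "\n".join(lines) + "\n"


def output_name(index):
    """File name for basis vector `index`, following svd_001.txt, svd_002.txt."""
    return f"svd_{index:03d}.txt"


raw = [3.0, 0.0, 4.0, 0.0]
norm = math.sqrt(sum(v * v for v in raw))
unit = [v / norm for v in raw]          # scaled to unit Euclidean norm
text = format_basis_vector(unit, components=2, comment="hypothetical basis vector")
print(output_name(1))
print(text, end="")
```

Because the output uses the same node and component structure as the input, a reader like the one used for the solution files can load the basis vectors back in without changes.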

### Languages:

SVD_BASIS_WEIGHT is available in a FORTRAN90 version.

### Related Data and Programs:

BRAIN_SENSOR_POD, a MATLAB program which applies the method of Proper Orthogonal Decomposition to seek underlying patterns in sets of 40 sensor readings of brain activity.

BURGERS, a dataset which contains a set of 40 successive solutions to the Burgers equation. This data can be analyzed using SVD_BASIS_WEIGHT.

LAPACK_EXAMPLES, a FORTRAN90 program which demonstrates the use of the LAPACK linear algebra library.

POD_BASIS_FLOW, a FORTRAN90 program which uses the same algorithm used by SVD_BASIS_WEIGHT, but specialized to handle solution data from a particular set of fluid flow problems.

SVD_BASIS, a FORTRAN90 program which is similar to SVD_BASIS_WEIGHT but works on the case where all data has the same weight.

SVD_DEMO, a FORTRAN90 program which demonstrates the singular value decomposition for a simple example.

SVD_SNOWFALL, a FORTRAN90 program which reads a file containing historical snowfall data and analyzes the data with the Singular Value Decomposition (SVD), displaying the results in plots created by GNUPLOT.

### Reference:

1. Edward Anderson, Zhaojun Bai, Christian Bischof, Susan Blackford, James Demmel, Jack Dongarra, Jeremy DuCroz, Anne Greenbaum, Sven Hammarling, Alan McKenney, Danny Sorensen,
LAPACK Users' Guide,
Third Edition,
SIAM, 1999,
ISBN: 0898714478,
LC: QA76.73.F25L36
2. Gal Berkooz, Philip Holmes, John Lumley,
The proper orthogonal decomposition in the analysis of turbulent flows,
Annual Review of Fluid Mechanics,
Volume 25, 1993, pages 539-575.
3. John Burkardt, Max Gunzburger, Hyung-Chun Lee,
Centroidal Voronoi Tessellation-Based Reduced-Order Modelling of Complex Systems,
SIAM Journal on Scientific Computing,
Volume 28, Number 2, 2006, pages 459-484.
4. Lawrence Sirovich,
Turbulence and the dynamics of coherent structures, Parts I-III,
Quarterly of Applied Mathematics,
Volume XLV, Number 3, 1987, pages 561-590.

### Examples and Tests:

The user's input, and the program's output are here:

The input data consists of 5 files:

There are two options for the weight files:

EXAMPLE 1 uses no averaging and even weights. The output data consists of 4 files, all of which are SVD basis vectors:

EXAMPLE 2 uses no averaging and uneven weights. The output data consists of 4 files, all of which are SVD basis vectors:

EXAMPLE 3 uses averaging and even weights. The output data consists of 5 files, all of which are SVD basis vectors:

### List of Routines:

• MAIN is the main program for SVD_BASIS_WEIGHT.
• BASIS_WRITE writes a basis vector to a file.
• CH_CAP capitalizes a single character.
• CH_EQI is a case insensitive comparison of two characters for equality.
• CH_IS_DIGIT is TRUE if a character is a decimal digit.
• CH_TO_DIGIT returns the integer value of a base 10 digit.
• DIGIT_INC increments a decimal digit.
• DIGIT_TO_CH returns the character representation of a decimal digit.
• FILE_COLUMN_COUNT counts the number of columns in the first line of a file.
• FILE_EXIST reports whether a file exists.
• FILE_NAME_INC generates the next filename in a series.
• FILE_ROW_COUNT counts the number of row records in a file.
• GET_UNIT returns a free FORTRAN unit number.
• I4_INPUT prints a prompt string and reads an I4 from the user.