cvt_basis_flow

cvt_basis_flow, a Fortran90 code which extracts representative solution modes from a set of solutions to a fluid flow PDE.

The selection process uses K-Means clustering, which can be regarded as a discrete version of the Centroidal Voronoi Tessellation (CVT) algorithm.

The selected modes will generally be "well spread out" in the space spanned by the set of solutions. Such a set of modes might be useful as a basis for a low-dimensional approximation of new solutions, provided that the new solutions can be assumed to have no significant components that were absent from the original solution data.

Specifically, a partial differential equation (PDE) has been defined that specifies the time-dependent flow of a fluid through a region. The PDE includes a parameter ALPHA whose value strongly affects the behavior of the flow. The steady state solution S0 (called SS below) is computed for a particular value of ALPHA. The time-dependent problem is then solved over a fixed time interval, with ALPHA varying over time, and a set of several hundred solutions S(T(I),ALPHA(I)) is saved.

The goal is to extract from this solution data the typical modes of behavior of the solution. Such a set of modes may then be used as a finite element basis that is highly tuned to the physics of the problem, so that a very small set of basis functions can closely approximate the solution over a range of values of ALPHA.

The method of extracting information from the solution data uses a form of K-Means clustering. The program tries to cluster the data, that is, to organize it by defining a number of cluster centers, which are also points in N-dimensional space (N being the number of nodes times the number of solution components), and assigning each data vector to the cluster associated with one of those centers.

The assignment aims to minimize the cluster energy, defined as the sum of the squared distances of each data point from its cluster center.
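A minimal Python sketch of the energy and of one clustering strategy may help here. It is not the program's Fortran code: it alternates between assigning each data vector to its nearest center and moving each center to the centroid of its cluster (the batch "H-means" style step), and it restarts from several random choices of initial centers, keeping the lowest-energy result, much as the cluster_it_max input suggests. Using k distinct data vectors as starting centers is an assumption; the program's CLUSTER_INITIALIZE_RAW may choose them differently, and its KMEANS routines also move individual points between clusters, which this sketch omits. The names hmeans and best_clustering are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def energy(data, centers, assign):
    """Cluster energy: sum of squared distances from each point to its center."""
    return sum(sq_dist(p, centers[a]) for p, a in zip(data, assign))

def hmeans(data, k, it_max=50, rng=random):
    """One clustering attempt (hypothetical sketch): random initial centers,
    then alternate nearest-center assignment and centroid updates."""
    centers = [list(p) for p in rng.sample(data, k)]
    assign = [nearest(p, centers) for p in data]
    for _ in range(it_max):
        # Move each center to the centroid of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(data, assign) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        new_assign = [nearest(p, centers) for p in data]
        if new_assign == assign:
            break  # assignments are stable: a local minimum of the energy
        assign = new_assign
    return centers, assign, energy(data, centers, assign)

def best_clustering(data, k, n_tries=15, it_max=50, seed=0):
    """Repeat the clustering from several random starts; keep the lowest energy."""
    rng = random.Random(seed)
    return min((hmeans(data, k, it_max, rng) for _ in range(n_tries)),
               key=lambda result: result[2])
```

For example, calling best_clustering(data, 2) on two well-separated groups of points should put each group in its own cluster. Restarting matters because each attempt only reaches a local minimum of the energy.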

In some contexts, the usual Euclidean distance is appropriate. In others, it may make more sense to replace each data record by a normalized version, and to measure distance by the angle between the unit vectors.
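For the normalized option, each vector is scaled to unit length before distances are computed. The Python sketch below (hypothetical names) shows two related ways to compare unit vectors: the angle between them, and the squared Euclidean distance between them, which equals 2 - 2 cos(angle) and therefore increases monotonically with the angle. Which of these the program's DISTANCE_NORMAL_SQ actually computes is not stated on this page, so treat the choice as an assumption.

```python
import math

def unit(v):
    """Scale v to unit Euclidean length (v must be nonzero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def angle(u, v):
    """Angle in radians between u and v, computed from their unit vectors."""
    cos_t = sum(a * b for a, b in zip(unit(u), unit(v)))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error

def unit_distance_sq(u, v):
    """Squared Euclidean distance between the unit vectors of u and v.
    Equals 2 - 2 cos(angle(u, v)), so it ranks pairs the same way as the angle."""
    return sum((a - b) ** 2 for a, b in zip(unit(u), unit(v)))
```

Note that vectors differing only by a positive scale factor are at distance zero under this measure.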

Because the data comes from a finite element computation, and the results may be used as a new reduced basis, it may be desirable to carry out mass matrix preconditioning of the data, so that the output vectors (cluster generators) are pairwise orthogonal in the L2 inner product (the integral over the domain of the product of two finite element functions).
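To illustrate what the L2 inner product means in terms of nodal values, here is a Python sketch in the simplest possible setting, piecewise-linear elements on a 1D grid, rather than the program's six-node triangles. The consistent mass matrix M gives (u, v)_L2 = u^T M v. If M = L L^T is a Cholesky factorization, then the transformed vectors L^T u have ordinary Euclidean dot products equal to these L2 inner products, which is one standard way to let Euclidean clustering respect the L2 geometry; whether the program transforms its data in exactly this way is an assumption. All function names are hypothetical.

```python
import math

def mass_matrix_1d(nodes):
    """Consistent mass matrix for piecewise-linear elements on sorted 1D nodes.
    Each element of length h contributes (h/6) * [[2, 1], [1, 2]]."""
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for e in range(n - 1):
        h = nodes[e + 1] - nodes[e]
        for a in range(2):
            for b in range(2):
                m[e + a][e + b] += h / 6.0 * (2.0 if a == b else 1.0)
    return m

def l2_inner(u, v, m):
    """L2 inner product of two finite element functions from nodal values: u^T M v."""
    return sum(u[i] * m[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def cholesky(m):
    """Dense Cholesky factor L (lower triangular) with M = L L^T; M must be SPD."""
    n = len(m)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def precondition(u, l):
    """Map u to L^T u, so Euclidean dot products reproduce u^T M v."""
    n = len(u)
    return [sum(l[k][i] * u[k] for k in range(i, n)) for i in range(n)]
```

For example, with nodal values of f(x) = x on [0, 1], l2_inner gives the integral of x squared, 1/3, because the consistent mass matrix integrates products of piecewise-linear functions exactly.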

Because the results may be used as a new reduced basis, it may also be desirable, once they have been computed, to apply Gram-Schmidt orthogonalization, so that the basis vectors have unit Euclidean norm and are pairwise orthogonal.
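A short Python sketch of modified Gram-Schmidt in the Euclidean inner product follows; the modified variant is usually preferred over the classical one for its better numerical behavior. Dropping vectors that become numerically zero (because they depend linearly on earlier ones) is a choice made for this sketch, and the program may handle that case differently. The function name is hypothetical.

```python
import math

def gram_schmidt(vectors, tol=1e-12):
    """Modified Gram-Schmidt: return orthonormal vectors spanning the same
    space as `vectors`. Vectors that become (numerically) zero are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Project out q using the current, already-updated w.
            c = sum(a * b for a, b in zip(q, w))
            w = [a - c * b for a, b in zip(w, q)]
        n = math.sqrt(sum(a * a for a in w))
        if n > tol:
            basis.append([a / n for a in w])
    return basis
```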

The current version of the program assumes that a steady state solution SS of the PDE is known, and that a multiple of SS is to be subtracted from each solution vector before processing.

FILES: the program assumes that the following files exist. (The actual file names are specified by the user at run time; the names used here are just suggestions.)

xy.txt, contains the coordinates of each node, with one pair of coordinates per line of the file;
ss.txt, contains the steady state solution values at each node; normally, there are two values per node (horizontal and vertical velocity). However, the program will also accept scalar data, or data with more than two components. Most of the ensuing discussion assumes two components only because that is the problem we usually work on;
uv01.txt, uv02.txt, ..., contains the solution values at each node for solution 1, 2, and so on; the number of components (normally 2) must be the same as in the steady state solution file;
element.txt, contains the indices of the six nodes that make up each element, with one set of six indices per line of the file (only needed if mass matrix preconditioning is used).
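The files above are plain text tables with one node per line. A minimal Python reader is sketched below; it assumes that values are separated by whitespace and, by analogy with the output files, that lines beginning with "#" in column 1 are comments. It checks that every row has the same number of columns, since the solution files must match the steady state file in their number of components, and it flattens a table into one long data vector by interleaving the components node by node, which is an assumption about the program's internal ordering. The function names are hypothetical.

```python
def parse_table(lines):
    """Parse whitespace-separated real values, one row per line.
    Blank lines and lines starting with '#' in column 1 are skipped.
    Raises ValueError if the rows do not all have the same length."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rows.append([float(x) for x in line.split()])
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows have differing numbers of columns")
    return rows

def read_table(path):
    """Read a table file such as xy.txt, ss.txt or uv01.txt."""
    with open(path) as f:
        return parse_table(f)

def flatten(rows):
    """Concatenate per-node rows (u1, v1), (u2, v2), ... into one vector."""
    return [x for row in rows for x in row]
```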

INPUT: at run time, the user specifies:

run_type describes how we subtract off the steady state, whether we drop some data, and other options. The current values range from 1 to 8. The most common value is 6, used with the TCELL data:
1. no steady state file is used, no preprocessing is carried out;
2. no steady state file is used, no preprocessing is carried out;
3. subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201, and 1/3 SS from solutions 202 through 401.
4. subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201, and 1/3 SS from solutions 202 through 401, and drop the even-numbered data.
5. subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201, and 1/3 SS from solutions 202 through 401, and skip half the data and normalize it.
6. subtract 5/3 SS from solutions 1 to 250, and 1/3 SS from solutions 251 through 500, do not normalize.
7. subtract 5/3 SS from solutions 1 to 250, and 1/3 SS from solutions 251 through 500, normalize the data.
8. subtract 5/3 SS from solutions 1 to 250, and 1/3 SS from solutions 251 through 500, then drop the odd-numbered data, do not normalize.
xy_file, the name of the xy file containing the node coordinates;
steady_file, the name of the steady state solution file, or "none" if the data does not need to be preprocessed (run_type 1 or 2);
uv0_file, the name of the first solution file (the program assumes that all the files in a family are numbered consecutively). You may specify more than one family of solution files: after each family, enter either the name of the first file in the next family, or "none" if there are no more families. Up to 10 separate families of files are allowed.
cluster_lo, cluster_hi, the range of cluster sizes to check. In most cases, you simply want to specify the same number for both these values, namely, the requested basis size.
cluster_it_max, the number of separate attempts to cluster the data, each from a different starting point; I often use 15.
energy_it_max, the number of times you want to try to improve a given clustering by swapping points from one cluster to another; I often use 50 or 100.
element_file, the name of the element file, if mass matrix preconditioning is desired, or else "none".
normal, 0 to use the raw data, 1 to normalize; after subtracting the steady state and preconditioning the data vectors, the program can also scale each data vector to unit norm before clustering. At the moment, I'm working with the raw data.
comment, "Y" if initial comments may be included at the beginning of the output files. These comments always start with a "#" character in column 1.
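As an illustration of run_types 6 through 8, here is a Python sketch (hypothetical names, not the program's code). It splits the solutions at the halfway point, which reproduces the 250/500 split described above and would also cover the 400/800 split used with the INOUT2 data below; whether the program computes the split this way is an assumption. The other run_types and the separate normal input are not modelled.

```python
import math

def ss_multiple(i, n_total):
    """Multiple of the steady state subtracted from solution i (1-based):
    5/3 for the first half of the solutions, 1/3 for the second half."""
    return 5.0 / 3.0 if i <= n_total // 2 else 1.0 / 3.0

def preprocess(solutions, ss, run_type):
    """Hypothetical sketch of run_type 6, 7 or 8. `solutions` holds solution
    vectors numbered 1..len(solutions); `ss` is the steady state vector."""
    if run_type not in (6, 7, 8):
        raise ValueError("this sketch only models run_types 6, 7 and 8")
    n = len(solutions)
    out = []
    for i, s in enumerate(solutions, start=1):
        if run_type == 8 and i % 2 == 1:
            continue  # run_type 8 drops the odd-numbered solutions
        c = ss_multiple(i, n)
        v = [a - c * b for a, b in zip(s, ss)]
        if run_type == 7:
            # run_type 7 normalizes each vector (assumed nonzero).
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        out.append(v)
    return out
```

The multiple depends on the original solution number, so dropping the odd-numbered solutions in run_type 8 does not shift where the 5/3 and 1/3 halves meet.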

OUTPUT: the program computes basis_num basis vectors. The first vector is written to the file gen_001.txt. As with the input files, the output vectors are written with two values per line, representing the two components of velocity at a particular node.
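The input families rely on consecutive numbering (uv01.txt, uv02.txt, ...), and the output files are presumably numbered the same way starting from gen_001.txt. A Python sketch of incrementing the digit field of such a name is shown below; it is only loosely modelled on what a routine like FILE_NAME_INC might do, and its treatment of overflow (raising an error on "uv99.txt") is an assumption. The function name is hypothetical.

```python
def file_name_inc(name):
    """Increment the rightmost run of digits in a file name, carrying as needed,
    e.g. 'uv09.txt' -> 'uv10.txt'. Raises ValueError if there are no digits
    or if the digit field overflows (e.g. 'uv99.txt')."""
    chars = list(name)
    # Position of the last digit in the name, or -1 if there is none.
    i = max((k for k, ch in enumerate(chars) if ch.isdigit()), default=-1)
    if i < 0:
        raise ValueError("no digits in file name")
    while i >= 0 and chars[i].isdigit():
        if chars[i] != "9":
            chars[i] = str(int(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "0"  # carry into the next digit to the left
        i -= 1
    raise ValueError("digit field overflowed")
```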

Linkage: The program calls numerous LAPACK routines to process the mass matrix. The source code for these routines is not included, so the compiled program must be linked to the LAPACK library.

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

cvt_basis_flow is available in a Fortran90 version.

Related Data and Programs:

brain_sensor_pod, a MATLAB program which applies the method of Proper Orthogonal Decomposition to seek underlying patterns in sets of 40 sensor readings of brain activity.

cvt_basis, a Fortran90 program which is similar to CVT_BASIS_FLOW, but handles any general set of data vectors.

cvtp, a Fortran90 library which creates a CVTP, that is, a Centroidal Voronoi Tessellation on a periodic domain.

pod_basis_flow, a Fortran90 program which is similar to CVT_BASIS_FLOW, but uses POD methods to extract representative modes from the data.

Source Code:

cvt_basis_flow.f90, the source code.

Examples and Tests:

PDE solution datasets you may copy include:

../../datasets/cavity_flow/cavity_flow.html, the driven cavity;
../../datasets/inout_flow/inout_flow.html, flow in and out of a chamber;
../../datasets/inout_flow2/inout_flow2.html, flow in and out of a chamber, using a finer grid and more timesteps;
../../datasets/tcell_flow/tcell_flow.html, flow through a T-cell;

This program has been run with a number of different datasets, and with various requirements as to normalization and so on. The purpose of most of the runs is to find a generator set of a given size. The input and output of each run is stored in a separate subdirectory.

The first set of runs works with 500 flow solutions in the TCELL region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions.

run 22, 2 elements;
run 13, 4 elements;
run 14, 8 elements;
run 15, 16 elements;

The next set of runs works with 500 flow solutions in the TCELL region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. Now we NORMALIZE the PDE solutions before processing them.

run 23, 2 elements;
run 16, 4 elements;
run 17, 8 elements;
run 18, 16 elements;

The next set of runs works with 500 flow solutions in the TCELL region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions. We discard half the data, keeping the EVEN steps 2, 4, ..., 500.

run 24, 2 elements;
run 19, 4 elements;
run 20, 8 elements;
run 21, 16 elements;

The next set of runs works with 500 flow solutions in the INOUT region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions.

run 25, 2 elements;
run 26, 4 elements;
run 27, 8 elements;
run 28, 16 elements;

The next set of runs works with 500 flow solutions in the INOUT region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We NORMALIZE the PDE solutions.

run 29, 2 elements;
run 30, 4 elements;
run 31, 8 elements;
run 32, 16 elements;

The next set of runs works with 500 flow solutions in the INOUT region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions. Before we proceed, we DROP the ODD numbered PDE solutions.

run 33, 2 elements;
run 34, 4 elements;
run 35, 8 elements;
run 36, 16 elements;

The next set of runs works with 500 flow solutions in the CAVITY region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions.

run 37, 2 elements;
run 38, 4 elements;
run 39, 8 elements;
run 40, 16 elements;

The next set of runs works with 500 flow solutions in the CAVITY region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We NORMALIZE the PDE solutions.

run 41, 2 elements;
run 42, 4 elements;
run 43, 8 elements;
run 44, 16 elements;

The next set of runs works with 500 flow solutions in the CAVITY region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We DON'T normalize the PDE solutions. Before we proceed, we DROP the ODD numbered PDE solutions.

run 45, 2 elements;
run 46, 4 elements;
run 47, 8 elements;
run 48, 16 elements;

The next set of runs works with 500 flow solutions in the CAVITY region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 49, 4 elements;
run 50, 8 elements;
run 51, 16 elements;

The next set of runs works with 500 flow solutions in the INOUT region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 52, 4 elements;
run 53, 8 elements;
run 54, 16 elements;

The next set of runs works with 500 flow solutions in the TCELL region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 55, 4 elements;
run 76, 5 elements;
run 77, 7 elements;
run 56, 8 elements;
run 57, 16 elements;

The next set of runs works with 500 flow solutions in the CAVITY region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We do not normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 58, 4 elements;
run 59, 8 elements;
run 60, 16 elements;

The next set of runs works with 500 flow solutions in the INOUT region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We do not normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 61, 4 elements;
run 62, 8 elements;
run 63, 16 elements;

The next set of runs works with 500 flow solutions in the TCELL region. We subtract 5/3 of the steady solution from solutions 1 through 250, and 1/3 from solutions 251 through 500. We do not normalize the PDE solutions. We use MASS MATRIX preconditioning.

run 64, 4 elements;
run 78, 5 elements;
run 81, 6 elements;
run 79, 7 elements;
run 65, 8 elements;
run 82, 9 elements;
run 80, 10 elements;
run 83, 11 elements;
run 84, 12 elements;
run 85, 13 elements;
run 86, 14 elements;
run 87, 15 elements;
run 66, 16 elements;
run 88, 17 elements;
run 89, 18 elements;
run 90, 19 elements;
run 91, 20 elements;

run 67, 4 elements;
run 68, 8 elements;
run 69, 16 elements;

run 70, 4 elements;
run 71, 8 elements;
run 72, 16 elements;

run 73, 4 elements;
run 74, 8 elements;
run 75, 16 elements;

The next set of runs works with 800 flow solutions in the INOUT2 region. We subtract 5/3 of the steady solution from solutions 1 through 400, and 1/3 from solutions 401 through 800. We DON'T normalize the PDE solutions.

run 92, 2 elements;
run 93, 4 elements;
run 94, 8 elements;
run 95, 16 elements;

The next set of runs works with 800 flow solutions in the INOUT2 region. We subtract 5/3 of the steady solution from solutions 1 through 400, and 1/3 from solutions 401 through 800. We DON'T normalize the PDE solutions. We use mass matrix preconditioning.

run 96, 2 elements;
run 97, 4 elements;
run 98, 8 elements;
run 99, 16 elements;

The next set of runs works with 40 scalar solutions of the one-dimensional BURGERS equation.

run 100, 4 elements;

List of Routines:

MAIN is the main routine for the CVT_BASIS_FLOW program.
ANALYSIS_NORMAL computes the energy for a range of cluster counts.
ANALYSIS_RAW computes the energy for a range of cluster counts.
BANDWIDTH_DETERMINE computes the lower bandwidth of a finite element matrix.
CH_CAP capitalizes a single character.
CH_EQI is a case insensitive comparison of two characters for equality.
CH_IS_DIGIT returns .TRUE. if a character is a decimal digit.
CH_TO_DIGIT returns the integer value of a base 10 digit.
CLUSTER_CENSUS computes and prints the population of each cluster.
CLUSTER_INITIALIZE_RAW initializes the cluster centers to random values.
CLUSTER_LIST prints out the assignments.
DATA_TO_GNUPLOT writes data to a file suitable for processing by GNUPLOT.
DIGIT_INC increments a decimal digit.
DIGIT_TO_CH returns the character representation of a decimal digit.
DISTANCE_NORMAL_SQ computes the distance between normalized vectors.
DTABLE_DATA_READ reads data from a double precision table file.
DTABLE_DATA_WRITE writes data to a double precision table file.
DTABLE_HEADER_READ reads the header from a double precision table file.
DTABLE_HEADER_WRITE writes the header to a double precision table file.
DTABLE_WRITE writes a double precision table file.
ENERGY_NORMAL computes the total energy of a given clustering.
ENERGY_RAW computes the total energy of a given clustering.
FILE_COLUMN_COUNT counts the number of columns in the first line of a file.
FILE_EXIST reports whether a file exists.
FILE_NAME_INC generates the next filename in a series.
FILE_ROW_COUNT counts the number of row records in a file.
GET_UNIT returns a free Fortran unit number.
HMEANS_NORMAL seeks the minimal energy of a cluster of a given size.
HMEANS_RAW seeks the minimal energy of a cluster of a given size.
I4_INPUT prints a prompt string and reads an integer from the user.
I4_RANGE_INPUT reads a pair of integers from the user, representing a range.
I4_UNIFORM returns a scaled pseudorandom I4.
ITABLE_DATA_READ reads data from an integer table file.
ITABLE_HEADER_READ reads the header from an integer table file.
I4VEC_PRINT prints an integer vector.
KMEANS_NORMAL tries to improve a partition of points.
KMEANS_RAW tries to improve a partition of points.
MASS_MATRIX computes the mass matrix.
NEAREST_CLUSTER_NORMAL finds the cluster nearest to a data point.
NEAREST_CLUSTER_RAW finds the cluster nearest to a data point.
NEAREST_POINT finds the center point nearest a data point.
POINT_GENERATE generates data points for the problem.
POINT_PRINT prints out the values of the data points.
R8VEC_NORM2 returns the 2-norm of a vector.
R8VEC_RANGE_INPUT reads two DP vectors from the user, representing a range.
R8VEC_UNIT_EUCLIDEAN normalizes an N-vector in the Euclidean norm.
RANDOM_INITIALIZE initializes the Fortran 90 random number seed.
REFQBF evaluates a reference element quadratic basis function.
S_BLANK_DELETE removes blanks from a string, left justifying the remainder.
S_EQI is a case insensitive comparison of two strings for equality.
S_INPUT prints a prompt string and reads a string from the user.
S_OF_I4 converts an integer to a left-justified string.
S_REP_CH replaces all occurrences of one character by another.
S_TO_I4 reads an I4 from a string.
S_TO_I4VEC reads an integer vector from a string.
S_TO_R8 reads an R8 from a string.
S_TO_R8VEC reads an R8VEC from a string.
S_WORD_COUNT counts the number of "words" in a string.
TIMESTAMP prints the current YMDHMS date as a time stamp.
TIMESTRING writes the current YMDHMS date into a string.
TRIANGLE_UNIT_SET sets a quadrature rule in a unit triangle.
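The element file lists six nodes per element, which suggests quadratic triangles, and REFQBF evaluates a quadratic basis function on a reference element. A Python sketch of the standard six quadratic basis functions on the reference triangle with vertices (0,0), (1,0), (0,1) follows; the node ordering (the three vertices, then the midpoints of the edges joining vertices 1 and 2, 2 and 3, and 3 and 1) is an assumption and may not match the program's element files. The names p2_basis and P2_NODES are hypothetical.

```python
def p2_basis(i, x, y):
    """Quadratic basis function i (0-5) on the reference triangle with
    vertices (0,0), (1,0), (0,1). Nodes 0-2 are the vertices; nodes 3-5 are
    the midpoints of edges (0,1), (1,2), (2,0). The ordering is assumed."""
    l = (1.0 - x - y, x, y)  # barycentric coordinates of (x, y)
    if i < 3:
        return l[i] * (2.0 * l[i] - 1.0)  # vertex function
    a, b = ((0, 1), (1, 2), (2, 0))[i - 3]
    return 4.0 * l[a] * l[b]  # edge-midpoint function

# Reference coordinates of the six nodes, in the same assumed order.
P2_NODES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
```

Each function equals 1 at its own node and 0 at the other five, and the six functions sum to 1 everywhere on the triangle.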

Last revised on 12 November 2006.