cvt_basis

cvt_basis, a Fortran90 code which uses discrete Centroidal Voronoi Tessellation (CVT) techniques to produce a small set of basis vectors that are good cluster centers for a large set of data vectors.

The clustering process uses the K-Means algorithm, which can be considered to be a discrete version of the CVT algorithm.
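As an illustration of the K-Means idea, here is a minimal Python sketch, not taken from cvt_basis.f90. The function names, the random choice of starting centers, and the stopping rule are all assumptions made for this example. Each sweep assigns every data vector to its nearest center, then moves each center to the mean (centroid) of the vectors assigned to it, which is the discrete analogue of a CVT iteration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, it_max=50, seed=0):
    """Hypothetical K-Means sketch: returns (centers, assignment)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data vectors chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(it_max):
        # Assign each point to the index of its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        # Move each center to the centroid of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assignment

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = k_means(data, k=2)
print(sorted(centers))
```

For this small data set the two groups of three points are well separated, so the sketch ends with one center at the centroid of each group.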

The data is a collection of vectors, with each vector stored in a separate file. The files are presumed to have "sequential" names, such as "fred01.txt", "fred02.txt", and so on. Each file must be a TABLE file, that is, a series of N lines with M values on every line, although comment lines may be inserted as well.
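The file conventions above can be sketched in Python as follows. This is an illustration only: the helper names next_file_name and parse_table are hypothetical, the rule for incrementing the name (bump the last run of digits, keeping its width) is an assumption about what "sequential" means, and comment lines are assumed to begin with "#".

```python
import re

def next_file_name(name):
    """Increment the last run of digits in a file name, keeping its width.

    Assumption: this is how "sequential" names advance; the source does not
    say what cvt_basis does when the digits overflow (here the field widens).
    """
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        raise ValueError("file name contains no digits: " + name)
    last = matches[-1]
    digits = last.group()
    incremented = str(int(digits) + 1).zfill(len(digits))
    return name[:last.start()] + incremented + name[last.end():]

def parse_table(text):
    """Parse TABLE-style text into a list of N rows of M floats.

    Blank lines and lines starting with '#' (assumed comment marker)
    are skipped. Every data line must hold the same number of values.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(value) for value in line.split()])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("data lines do not all have the same number of values")
    return rows

print(next_file_name("fred01.txt"))   # fred02.txt
sample = "# a comment line\n1.0 2.0 3.0\n4.0 5.0 6.0\n"
print(parse_table(sample))            # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

A driver could start from the first file name, read each file in turn with a parser like this, and stop at the first name in the sequence that does not exist.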

The code is given the name of the first file in the sequence. It reads the data from each file in the sequence, and carries out the K-Means clustering process to determine K cluster centers. It writes each of these cluster centers out to a separate file.

The cluster centers will generally be "well spread out" in the space spanned by the set of data. Such a set might be useful, for instance, in determining a basis for a low-dimensional approximation of the data.
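To make the "low-dimensional approximation" idea concrete, here is a small Python sketch that is not part of cvt_basis. It assumes the cluster centers are used as a (generally non-orthogonal) basis, and finds the coefficients of the best least-squares approximation of a data vector by solving the normal equations. The function names and the tiny example vectors are hypothetical.

```python
def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination
    with partial pivoting. Assumes the matrix is nonsingular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def best_coefficients(basis, v):
    """Coefficients c minimizing || v - sum_j c[j] * basis[j] ||_2,
    found from the normal equations G c = r with G[i][j] = b_i . b_j."""
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, v) for bi in basis]
    return solve(gram, rhs)

# Hypothetical basis of K = 2 vectors in a space of dimension 3.
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
v = [2.0, 3.0, 5.0]
coeffs = best_coefficients(basis, v)
approx = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(v))]
print(coeffs, approx)
```

In this example the basis spans only the first two coordinates, so the approximation reproduces those two components of v and drops the third. The normal equations are adequate for a small, well-conditioned basis; a QR or SVD based solver would be the more robust choice in practice.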

INPUT: at run time, the user specifies:

uv0_file, the name of the first data file in a family; the code assumes that the files in the family are numbered consecutively. More than one family of files may be specified: after each family, enter the name of the first file in the next family, or "none" if there are no more families. Up to 10 separate families of files are allowed.
cluster_lo, cluster_hi, the range of cluster counts (values of K) to check. In most cases, you simply want to specify the same number for both values, namely, the requested basis size.
cluster_it_max, the number of different times you want to try to cluster the data; I often use 15.
energy_it_max, the number of times you want to try to improve a given clustering by swapping points from one cluster to another; I often use 50 or 100.
comment, "Y" if initial comments may be included at the beginning of the output files. These comments always start with a "#" character in column 1.
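One plausible reading of how these parameters interact is sketched below in Python. This is an assumption for illustration, not the logic of cvt_basis.f90: for each cluster count from cluster_lo to cluster_hi, the data is clustered cluster_it_max times from different random starts, each attempt is improved for at most energy_it_max sweeps, and the attempt with the lowest energy is kept. Here "energy" is taken to be the sum of squared distances from each point to its cluster center, and plain K-Means sweeps stand in for the point-swapping improvement step.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def energy(points, centers, labels):
    """Assumed energy: total squared distance of points to their centers."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

def one_clustering(points, k, energy_it_max, rng):
    """One attempt from a random start; energy_it_max must be at least 1."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(energy_it_max):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point moved, so no further improvement is possible
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels

def best_clusterings(points, cluster_lo, cluster_hi,
                     cluster_it_max, energy_it_max, seed=0):
    """Return {k: (lowest energy found, its centers)} for each k in range."""
    rng = random.Random(seed)
    best = {}
    for k in range(cluster_lo, cluster_hi + 1):
        for _ in range(cluster_it_max):
            centers, labels = one_clustering(points, k, energy_it_max, rng)
            e = energy(points, centers, labels)
            if k not in best or e < best[k][0]:
                best[k] = (e, centers)
    return best

points = [[0.0], [1.0], [10.0], [11.0]]
best = best_clusterings(points, cluster_lo=1, cluster_hi=2,
                        cluster_it_max=15, energy_it_max=50)
for k in sorted(best):
    print(k, best[k][0], sorted(best[k][1]))
```

Repeating the clustering from several starts matters because K-Means only finds a local minimum of the energy; keeping the best of cluster_it_max attempts reduces the chance of reporting a poor one.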

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

cvt_basis is available in a Fortran90 version.

Related Data and Codes:

cvt_basis_test

brain_sensor_pod, a MATLAB code which applies the method of Proper Orthogonal Decomposition to seek underlying patterns in sets of 40 sensor readings of brain activity.

burgers, a dataset directory which contains solutions of the 1-dimensional Burgers equation.

cavity_flow, a dataset directory which contains solutions of a driven cavity flow in 2D.

cvt_basis_flow, a Fortran90 code which is similar to cvt_basis, but is specialized to handle a particular family of fluid flow solutions.

cvtp, a Fortran90 code which creates a CVTP, that is, a Centroidal Voronoi Tessellation on a periodic domain.

inout_flow, a dataset directory which contains solutions for flow in and out of a chamber in 2D.

inout_flow2, a dataset directory which contains solutions for flow in and out of a chamber in 2D, using a finer grid and more timesteps.

svd_basis, a Fortran90 code which uses the singular value decomposition to extract representative modes from a set of data vectors.

tcell_flow, a dataset directory which contains solutions for flow through a T-cell in 2D.

Reference:

Franz Aurenhammer,
Voronoi diagrams - a study of a fundamental geometric data structure,
ACM Computing Surveys,
Volume 23, Number 3, pages 345-405, September 1991.

John Burkardt, Max Gunzburger, Hyung-Chun Lee,
Centroidal Voronoi Tessellation-Based Reduced-Order Modelling of Complex Systems,
SIAM Journal on Scientific Computing,
Volume 28, Number 2, 2006, pages 459-484.

John Burkardt, Max Gunzburger, Janet Peterson, Rebecca Brannon,
User Manual and Supporting Information for Library of Codes for Centroidal Voronoi Placement and Associated Zeroth, First, and Second Moment Determination,
Sandia National Laboratories Technical Report SAND2002-0099,
February 2002.

Qiang Du, Vance Faber, Max Gunzburger,
Centroidal Voronoi Tessellations: Applications and Algorithms,
SIAM Review,
Volume 41, 1999, pages 637-676.

Lili Ju, Qiang Du, Max Gunzburger,
Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations,
Parallel Computing,
Volume 28, 2002, pages 1477-1500.

Wendy Martinez, Angel Martinez,
Computational Statistics Handbook with MATLAB,
Chapman and Hall / CRC, 2002.

Source Code:

cvt_basis.f90, the source code.
cvt_basis.sh, compiles the source code.

Last revised on 30 April 2022.