Data Clustering by K-Means Techniques

CVT_BASIS is a FORTRAN90 program which computes good cluster centers for a set of data.

The clustering process uses the K-Means algorithm, which can be considered to be a discrete version of the CVT algorithm (Centroidal Voronoi Tessellation).

The data is a collection of vectors, with each vector stored in a separate file. The files are presumed to have "sequential" names, such as "fred01.txt", "fred02.txt", and so on. Each file must be a TABLE file, that is a series of N lines, with M values on every line (although comment lines may be inserted as well.)

The program is given the name of the first file in the sequence. It reads the data from each file in the sequence, and carries out the K Means clustering process to determine K cluster centers. It writes each of these cluster centers out to a separate file.

The cluster centers will generally be "well spread out" in the space spanned by the set of data. Such a set might be useful, for instance, in determining a basis for a low-dimensional approximation of the data.

INPUT: at run time, the user specifies:


The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.


CVT_BASIS is available in a FORTRAN90 version.

Related Data and Programs:

BRAIN_SENSOR_POD, a MATLAB program which applies the method of Proper Orthogonal Decomposition to seek underlying patterns in sets of 40 sensor readings of brain activity.

BURGERS, a data set directory which contains solutions of the 1 dimensional Burgers equation;

CAVITY_FLOW, a dataset directory which contains solutions of a driven cavity flow in 2D;

CVT_BASIS_FLOW, a FORTRAN90 program which is similar to CVT_BASIS, but is specialized to handle a particular family of fluid flow solutions.

CVTP, a FORTRAN90 library which creates a CVTP, that is, a Centroidal Voronoi Tessellation on a periodic domain.

INOUT_FLOW, a dataset directory which contains solutions for flow in and out of a chamber in 2D;

INOUT_FLOW2, a dataset directory which contains solutions for flow in and out of a chamber in 2D, using a finer grid and more timesteps;

SVD_BASIS, a FORTRAN90 program which uses the singular value decomposition to extract representative modes from a set of data vectors.

TCELL_FLOW, a dataset directory which contains solutions for flow through a T-cell in 2D;


  1. Franz Aurenhammer,
    Voronoi diagrams - a study of a fundamental geometric data structure,
    ACM Computing Surveys,
    Volume 23, Number 3, pages 345-405, September 1991.
  2. John Burkardt, Max Gunzburger, Hyung-Chun Lee,
    Centroidal Voronoi Tessellation-Based Reduced-Order Modelling of Complex Systems,
    SIAM Journal on Scientific Computing,
    Volume 28, Number 2, 2006, pages 459-484.
  3. John Burkardt, Max Gunzburger, Janet Peterson, Rebecca Brannon,
    User Manual and Supporting Information for Library of Codes for Centroidal Voronoi Placement and Associated Zeroth, First, and Second Moment Determination,
    Sandia National Laboratories Technical Report SAND2002-0099,
    February 2002.
  4. Qiang Du, Vance Faber, Max Gunzburger,
    Centroidal Voronoi Tessellations: Applications and Algorithms,
    SIAM Review, Volume 41, 1999, pages 637-676.
  5. Lili Ju, Qiang Du, Max Gunzburger,
    Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations,
    Parallel Computing,
    Volume 28, 2002, pages 1477-1500.
  6. Wendy Martinez, Angel Martinez,
    Computational Statistics Handbook with MATLAB,
    Chapman and Hall / CRC, 2002.

Source Code:

Examples and Tests:

List of Routines:

You can go up one level to the FORTRAN90 source codes.

Last revised on 27 November 2012.