cvt_basis is a FORTRAN90 code which uses discrete Centroidal Voronoi Tessellation (CVT) techniques to produce a small set of basis vectors that are good cluster centers for a large set of data vectors.
The clustering process uses the K-Means algorithm, which can be considered to be a discrete version of the CVT algorithm.
The data is a collection of vectors, with each vector stored in a separate file. The files are presumed to have "sequential" names, such as "fred01.txt", "fred02.txt", and so on. Each file must be a TABLE file, that is, a series of N lines with M values on every line (although comment lines may be inserted as well).
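The file conventions above can be illustrated with a minimal Python sketch. This is not the FORTRAN90 source of cvt_basis; the function names next_filename and parse_table are hypothetical, and two details are assumptions not stated in the text: that the sequence is advanced by incrementing the last run of digits in the file name, and that comment lines begin with "#".

```python
import re

def next_filename(name):
    """Return the next name in a sequence such as fred01.txt -> fred02.txt.

    Assumption: the counter is the LAST run of digits in the name, and
    its zero-padded width is preserved.
    """
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        raise ValueError(f"no numeric field in {name!r}")
    m = matches[-1]
    digits = m.group()
    bumped = str(int(digits) + 1).zfill(len(digits))
    return name[:m.start()] + bumped + name[m.end():]

def parse_table(text, comment="#"):
    """Parse TABLE-style text: N data lines with M values on every line.

    Assumption: blank lines and lines starting with `comment` are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment):
            continue
        rows.append([float(value) for value in line.split()])
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("every data line must have the same number of values")
    return rows
```

For example, next_filename("fred09.txt") gives "fred10.txt", and parse_table applied to the text of one file gives a list of N rows with M floating-point values each.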
The code is given the name of the first file in the sequence. It reads the data from each file in the sequence, and carries out the K-Means clustering process to determine K cluster centers. It writes each of these cluster centers out to a separate file.
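The clustering step can be sketched in plain Python as follows. This is a generic K-Means (Lloyd) iteration under stated assumptions, not the algorithm as implemented in cvt_basis: the function name kmeans is hypothetical, each data vector is assumed to be a flat list of numbers (for instance, one file's N by M values flattened), initial centers are assumed to be a random sample of the data, and an empty cluster simply keeps its previous center.

```python
import random

def kmeans(vectors, k, max_iter=100, seed=0):
    """Return (centers, assignment) for a basic K-Means iteration."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data vectors chosen at random.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every vector to its nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no vector changed cluster, so the centers are stable
        assignment = new_assignment
        # Move each center to the mean of the vectors assigned to it.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment
```

In a workflow like the one described, the vectors would come from the sequence of data files, and each returned center would then be written to its own output file.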
The cluster centers will generally be "well spread out" in the space spanned by the set of data. Such a set might be useful, for instance, in determining a basis for a low-dimensional approximation of the data.
INPUT: at run time, the user specifies:
The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.
cvt_basis is available in a FORTRAN90 version.
