cluster_energy


cluster_energy is a FORTRAN90 code which seeks to organize data into a given number of clusters in a way that minimizes the total cluster energy.

Specifically, suppose we are given a set of data points in NUM_DIM-dimensional space, and are told to use C_NUM clusters. Each cluster is represented by a center point, and each data point is assigned to exactly one cluster. The energy of a cluster is the sum of the squared distances from each of its data points to its center point, and the total energy is the sum of the cluster energies.
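The definition above can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not come from the code: x_i is the i-th data point, z_j is the center of cluster j, and c(i) is the index of the cluster to which point i is assigned.

```latex
E = \sum_{j=1}^{\mathrm{C\_NUM}} E_j ,
\qquad
E_j = \sum_{i \,:\, c(i) = j} \left\| x_i - z_j \right\|^2 ,
\qquad
x_i ,\, z_j \in \mathbb{R}^{\mathrm{NUM\_DIM}} .
```

For a fixed assignment c, each E_j is smallest when z_j is the mean of the points assigned to cluster j, which is why K-means style methods update each center to its cluster mean.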

This code allows the user to specify the spatial dimension, the number of data points, the range of the data, a range of cluster counts to try, and the number of cluster iterations to carry out. It then tries to compute the minimal cluster energy for the given data at each number of clusters in that range.
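The workflow described above might be sketched in Python as follows. This is only an illustration, not the FORTRAN90 code itself: it assumes a plain K-means style iteration (assign each point to its nearest center, then move each center to the mean of its points), and every function and parameter name here (energy_sweep, kmeans_energy, c_lo, c_hi, it_num, seed) is hypothetical.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]

def total_energy(points, centers, labels):
    # Sum over all points of the squared distance to the assigned center.
    return sum(squared_distance(p, centers[j]) for p, j in zip(points, labels))

def kmeans_energy(points, c_num, it_num, rng):
    # Assumption: centers start at c_num distinct data points chosen at random.
    centers = rng.sample(points, c_num)
    for _ in range(it_num):
        labels = assign(points, centers)
        for j in range(c_num):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    labels = assign(points, centers)
    return total_energy(points, centers, labels)

def energy_sweep(points, c_lo, c_hi, it_num, seed=0):
    # Energy found for each cluster count in the inclusive range c_lo..c_hi.
    rng = random.Random(seed)
    return {c: kmeans_energy(points, c, it_num, rng)
            for c in range(c_lo, c_hi + 1)}

if __name__ == "__main__":
    rng = random.Random(123)
    num_dim, point_num = 2, 100
    # Random data in the range [0, 10] in each coordinate.
    points = [tuple(rng.uniform(0.0, 10.0) for _ in range(num_dim))
              for _ in range(point_num)]
    for c_num, energy in energy_sweep(points, 1, 5, 20).items():
        print(c_num, energy)
```

An iteration of this kind only finds a local minimum, so the energy reported for a given cluster count depends on the starting centers. Repeating the sweep with several seeds and keeping the smallest energy per cluster count is one simple way to get closer to the true minimum.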

Licensing:

The computer code and data files made available on this web page are distributed under the MIT license.

Languages:

cluster_energy is available in a FORTRAN90 version.

Related Data and codes:

ASA058, a FORTRAN90 code which implements the K-means algorithm of Sparks.

ASA136, a FORTRAN90 code which contains the original text of the Hartigan and Wong clustering algorithm ASA136.

cluster_energy_test

CVT_BASIS, a FORTRAN90 code which uses the CVT algorithm to cluster data.

CVT_BASIS_FLOW, a FORTRAN90 code which uses the CVT algorithm to cluster data related to fluid flow.

KMEANS, a FORTRAN90 code which uses the K-Means algorithm to cluster data.

LAU_NP, a FORTRAN90 code which contains heuristic algorithms for the K-center and K-median problems.

POD_BASIS_FLOW, a FORTRAN90 code which uses the POD algorithm to cluster data related to fluid flow.

POINT_MERGE, a FORTRAN90 code which considers N points in M dimensional space, and counts or indexes the unique or "tolerably unique" items.

SPAETH, a FORTRAN90 code which can cluster data according to various principles.

SPAETH, a dataset directory which contains a set of test data.

SPAETH2, a FORTRAN90 code which can cluster data according to various principles.

SPAETH2, a dataset directory which contains a set of test data.

SVD_BASIS, a FORTRAN90 code which uses the Singular Value Decomposition to cluster data.

Reference:

  1. John Hartigan, Manchek Wong,
    Algorithm AS 136: A K-Means Clustering Algorithm,
    Applied Statistics,
    Volume 28, Number 1, 1979, pages 100-108.
  2. Wendy Martinez, Angel Martinez,
    Computational Statistics Handbook with MATLAB,
    Chapman and Hall / CRC, 2002.
  3. David Sparks,
    Algorithm AS 58: Euclidean Cluster Analysis,
    Applied Statistics,
    Volume 22, Number 1, 1973, pages 126-130.

Last revised on 10 June 2020.