The Finite Minimization Algorithm


Given a set of points X, we want to group them into K clusters. Each cluster j has a special point Zj, and a point Xi belongs to cluster j if Zj is the nearest of the special points.

So far, the points Zj have been the centroids of their clusters. We have had to iterate to make this happen: for each cluster of points we compute a centroid, then with the new centroids we recluster the points, and repeat.

One disadvantage of this approach is that the special point Z is usually not a member of the set. This doesn't matter much when the points are points in space. But if our "points" are words, or proteins, or the like, it's not so clear what the best way to compute distance is, and it would be very nice if the representative point of the cluster were the same kind of thing as the points that define the cluster. In other words, if we take a list of words among which we can somehow define distances, and we find a cluster of "dog", "cat", and "fox", we may not know how to average these three words to get a centroid. But it may be good enough to just pick the "most central" word out of the list.
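For instance, with an assumed word metric (here a sketch using Levenshtein edit distance; the distance choice and function names are my own, not part of the exercise), picking the "most central" word means picking the one with the smallest total distance to the others:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def most_central(words):
    # The "most central" word: smallest summed distance to the rest.
    return min(words, key=lambda w: sum(edit_distance(w, v) for v in words))

print(most_central(["dog", "cat", "fox", "dot"]))
```

The result is always one of the input words, which is exactly the property we want when averaging makes no sense.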

So, for a finite set of points, we may proceed as follows. We have a set of N points X, and a subset of K of these points chosen as centers. For each point X, we measure its distance to each of the center points, and put it into the cluster whose center is nearest. Now it's time to update the cluster centers. For each cluster, we consider using each of its points as the new cluster center; we compute the corresponding cluster energy, and take as our center whichever point makes this energy smallest.

So this algorithm looks just like the centroid algorithm: group the data around the centers, update the centers, and repeat. It differs only in how we update the centers. This algorithm is termed the finite minimization algorithm because the only candidates for a center are the finite set of data points, not the infinite number of points "near" the data points.
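As a sketch of the loop just described, assuming squared Euclidean distance and defining a cluster's energy as the sum of squared distances from its members to its center (all function and variable names below are my own):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two 2D points.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def finite_minimization(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # centers start as actual data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Grouping step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[j].append(p)
        # Update step: for each cluster, try every member as the new center
        # and keep whichever one minimizes the cluster energy.
        centers = [min(cl, key=lambda z: sum(dist2(p, z) for p in cl))
                   if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

# 100 random points in [0,100] x [0,100], as in the exercise below.
rng = random.Random(1)
data = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
centers, clusters = finite_minimization(data, k=4)
energy = sum(dist2(p, z) for z, cl in zip(centers, clusters) for p in cl)
```

Note that the update step never invents a new point: every center it returns is one of the cluster's own members.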

Try making a copy of your clustering algorithm that uses this finite minimization technique. To verify that it's working properly, do an energy calculation, working with our 2D dataset of 100 points in [0,100] by [0,100]. Can you predict, beforehand, a relationship between the energies computed for the full minimization algorithm and for the finite minimization algorithm?

A big advantage of this algorithm is that the center points of each cluster are guaranteed to be data points. That means that our clustering can always be described in terms of a set of K "most typical" data points and their neighbors.


Last modified on 02 July 2001.