The Voronoi Project


Your initial task is to write a MATLAB code for centroidal Voronoi tessellation (CVT), starting from an existing program in another language.

To do this, start by copying the CVT Fortran program. The program consists of 12 routines. Execution begins and ends in the main program, which calls the other routines to carry out the calculation. A number of data files are created by the program when it runs.

The MATLAB program you are going to create will put each routine in a separate file; you won't need some of the Fortran routines at all. Here is a rough sketch of the basic routines we need:

To begin with:

From the plots you showed me, it looks like you have a working code. It would be a good idea to freeze a copy of the current code: give it a name, put it somewhere, and print it. I'd also like to have an online copy.

Stage 2: It's time to add some complications to the code that will allow us to investigate a number of different possibilities.

Our new goal is to work on Voronoi clustering of discrete data sets. We're thinking of problems where the quantities X, which we like to think of as geometric points, actually represent DNA sequences, solutions of differential equations, or other kinds of abstract data, and we are interested in how we can efficiently group such data together.

In general, we're going to be looking at cases where we have N points X, each of which is a vector of M numbers; we will be trying to group this data into K Voronoi clusters.

We have an energy measurement associated with the clustering. If we are using K clusters, we write

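    E = sum ( over each cluster I ) sum ( over points J in cluster I ) || X(J) - C(I) ||^2

where C(I) is the centroid of cluster I, and || X(J) - C(I) ||^2 is the squared Euclidean distance between the point X(J) and that centroid.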
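
To make the energy concrete, here is the same formula in standard notation, followed by a tiny worked case. The index set V_i (the points belonging to cluster i) and the four sample points are illustrative assumptions, not part of the project data.

```latex
E = \sum_{i=1}^{K} \sum_{j \in V_i} \| x_j - c_i \|^2 ,
\qquad
c_i = \frac{1}{|V_i|} \sum_{j \in V_i} x_j .

% Worked case: N = 4 points in the plane (M = 2), K = 2 clusters.
%   V_1 = \{ (0,0), (0,1) \},      c_1 = (0, 0.5)
%   V_2 = \{ (10,10), (10,11) \},  c_2 = (10, 10.5)
% Every point lies at distance 0.5 from its centroid, so
E = 4 \times (0.5)^2 = 1 .
```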

For our first example, we just want to try to sketch out some code that will work, and make sure it runs properly. To start with, let's assume we've picked a value for K, and we want to compute the Voronoi clusters. The code might look something like this:

    function [ C, Energy ] = K_Clusters ( K, X )

    Initialize by setting C(1) = X(1),...C(K) = X(K).

    Do

      For each point X(J), find the nearest center C(I).

      Replace each C(I) by the average of all the points X(J) that were closest to it

      Compute the new energy

      If the difference between the new energy and the previous one is "small",
      then quit

    End
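
The sketch above could be turned into runnable MATLAB along the following lines. This is only one possible version, under some assumptions that are not in the notes: X is stored as an N-by-M array with one point per row, C is K-by-M, "small" is taken to mean a relative change below a tolerance tol, the iteration is capped at it_max steps, and a center that attracts no points is simply left where it is. The names tol, it_max, D2, delta, nearest and members are made up for this example.

```matlab
function [ C, Energy ] = K_Clusters ( K, X )
% K_CLUSTERS  Sketch of discrete Voronoi clustering of the rows of X.
%
%   Assumptions (not from the project notes): X is N-by-M with one data
%   point per row, C is K-by-M, and TOL and IT_MAX are illustrative values.

  [ N, M ] = size ( X );
  if ( K < 1 || N < K )
    error ( 'K_Clusters: need 1 <= K <= N.' );
  end

  tol = 1.0e-8;      % hypothetical relative tolerance for "small"
  it_max = 100;      % hypothetical cap on the number of iterations

  C = X(1:K,:);      % initialize the centers with the first K points
  Energy_old = Inf;

  for it = 1 : it_max

    % Squared distance from every point to every center: D2 is N-by-K.
    D2 = zeros ( N, K );
    for i = 1 : K
      delta = X - repmat ( C(i,:), N, 1 );
      D2(:,i) = sum ( delta.^2, 2 );
    end

    % For each point, find the index of the nearest center.
    [ ~, nearest ] = min ( D2, [], 2 );

    % Replace each center by the average of the points assigned to it.
    % A center with no points is left unchanged.
    for i = 1 : K
      members = ( nearest == i );
      if ( any ( members ) )
        C(i,:) = mean ( X(members,:), 1 );
      end
    end

    % Energy of the current assignment, measured from the updated centers.
    Energy = 0.0;
    for i = 1 : K
      members = ( nearest == i );
      delta = X(members,:) - repmat ( C(i,:), sum ( members ), 1 );
      Energy = Energy + sum ( delta(:).^2 );
    end

    % Quit when the energy has stopped changing appreciably.
    if ( abs ( Energy_old - Energy ) <= tol * max ( Energy, 1.0 ) )
      break
    end
    Energy_old = Energy;

  end
end
```

As a quick check, calling `[ C, Energy ] = K_Clusters ( 2, [ 0 0; 0 1; 10 10; 10 11 ] )` should return the centers (0, 0.5) and (10, 10.5) with Energy equal to 1. Starting from the first K points is the simplest initialization; different starting centers can lead to different final clusters, so it is worth trying more than one.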
  

Here is some more information on the Gene Expression data. We will want to try out the finite minimization algorithm on this data.

You have proposed using the same algorithm to look at the clustering of stock market data, gray scales and RGB values in images, amplitudes in a sampled sound file, and other groups of data.
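
For each of these, the main preparation step is to arrange the data as N points of M numbers each. A minimal sketch of that step follows, assuming the points are stored one per row in an N-by-M array; the tiny image and sound arrays are invented placeholders, not project data, and averaging the three color channels is just one simple way to get a gray scale.

```matlab
% Hypothetical 2-by-3 RGB image, stored as a rows-by-columns-by-3 array.
img = zeros ( 2, 3, 3 );
img(:,:,1) = [ 200 210  10; 190 205  20 ];   % red channel
img(:,:,2) = [  10  20  10;  15  25  30 ];   % green channel
img(:,:,3) = [  30  25 220;  20  35 230 ];   % blue channel

% RGB clustering: one row per pixel, so N = 6 and M = 3.
X_rgb = reshape ( img, [], 3 );

% Gray-scale clustering: one row per pixel, so N = 6 and M = 1.
gray = mean ( img, 3 );      % channel average, an illustrative choice
X_gray = gray(:);

% Sampled sound: hypothetical amplitudes, so N = 8 and M = 1.
amplitude = sin ( 2 * pi * ( 0 : 7 )' / 8 );
X_sound = amplitude;
```

Once the data is in this form, the same clustering routine can be applied to any of the three arrays without change.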


Last modified on 01 August 2001.