The Voronoi Project

Your initial task is to write a centroidal Voronoi tessellation (CVT) code in MATLAB, starting from an existing program in another language.

To do this, start by copying the CVT Fortran program. The program consists of 12 routines. Execution begins and ends in the main program, which calls the other routines to carry out the calculation. A number of data files are created by the program when it runs.

The MATLAB program you are going to create will put each routine in a separate file; you won't need some of the Fortran routines at all. Here is a rough sketch of the basic routines we need:

CVT_MAIN is the main program.
CVT_ITERATION takes one step.
FIND_CLOSEST finds the nearest point.
REGION_SAMPLER gets sample points from the region.
TEST_REGION determines if a point is in the region. (We're going to change this routine so that the region is a circle.)
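
As a starting point for the circular version of TEST_REGION, here is a minimal sketch. It assumes the region is the unit circle centered at the origin and that points are stored one per column in a 2-by-N array; neither choice comes from the Fortran program, and the lowercase name test_region is only a suggestion.

```matlab
function inside = test_region ( x )
% TEST_REGION reports whether each point lies in the unit circle.
%   x is a 2-by-N array, one point per column (an assumed layout).
%   inside is a 1-by-N logical array, true for points in the circle.
  inside = ( sum ( x.^2, 1 ) <= 1.0 );
end
```

A call such as test_region ( [ 0 2; 0 0 ] ) tests the two points (0,0) and (2,0) at once.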

To begin with:

Print out the program and get an idea of what's going on. There is lots of information in the program, but also lots of stuff we're not going to use at all.
Make a simple flow diagram that shows how the 5 "important" routines are related, that is, how one routine calls other routines. From this, try to understand how the program carries out its task.
For each of the "important" routines, make a list of its parameters, separated into input and output. In some cases, a parameter is both input and output, and we will have to do something about that. I want to see a written or printed copy of this information.
Try to write MATLAB versions of the 5 routines. This will probably be a painful process, but you should be able to read the Fortran routines and figure out some of what's going on. Talk to me to fill in the details. Start with TEST_REGION, but converted to a circle, then REGION_SAMPLER, and FIND_CLOSEST.
We'll talk together about CVT_ITERATION and compare it to the algorithm that Professor Gunzburger discussed. Then we'll do CVT_MAIN. We'll need to learn how to print out information and how to save information into files.
When this is done, we'll try to make some plots of the data.
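
To see how the sampling, closest-point and averaging pieces might fit together, here is a rough sketch of one sampling-based CVT step on the unit circle. It is not a transcription of the Fortran code: the function name cvt_step_sketch, the argument n_sample, and the choice to store generators as columns of a 2-by-K array are all assumptions made for illustration.

```matlab
function g_new = cvt_step_sketch ( g, n_sample )
% CVT_STEP_SKETCH takes one sampling-based CVT step in the unit circle.
%   g is a 2-by-K array of generators (assumed layout).
%   n_sample is the number of candidate sample points to draw.
  k = size ( g, 2 );
  % Region sampling: draw from the square [-1,1]^2, keep points in the circle.
  s = 2.0 * rand ( 2, n_sample ) - 1.0;
  s = s(:, sum ( s.^2, 1 ) <= 1.0 );
  % Find closest: index of the nearest generator for every sample point.
  nearest = zeros ( 1, size ( s, 2 ) );
  for j = 1 : size ( s, 2 )
    d2 = ( g(1,:) - s(1,j) ).^2 + ( g(2,:) - s(2,j) ).^2;
    [ ~, nearest(j) ] = min ( d2 );
  end
  % Move each generator to the average of the samples it captured.
  % A generator that captured no samples is left where it was.
  g_new = g;
  for i = 1 : k
    mine = ( nearest == i );
    if ( any ( mine ) )
      g_new(:,i) = mean ( s(:,mine), 2 );
    end
  end
end
```

Calling this repeatedly, for example g = cvt_step_sketch ( g, 10000 ) inside a loop, should move the generators toward a centroidal arrangement; how many steps and samples are needed is something to experiment with.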

From the plots you showed me, it looks like you have a working code. It would be a good idea to freeze a copy of the current code: give it a name, put it somewhere safe, and print it. I'd also like to have an online copy.

Stage 2: It's time to add some complications to the code that will allow us to investigate a number of different possibilities.

To make sure that your code can handle different regions, try to program the "tuning fork" region.
Try a different density function. This is the function "rho". Let's use the function
rho(x,y) = exp ( - 10 * ( x^2 + y^2 ) )
with the square [-1,1] by [-1,1] as your region. Try to get 64 cells. This should make a big difference in your results.
Make the initialization of your generators an option with three possibilities: random (what we've done so far), grid (evenly spaced points) and Halton. For details of how to set up the Halton calculation, check out the Halton Sequence writeup.
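
For the Halton option, here is a small sketch of the standard radical-inverse construction. The writeup may use a different starting index or different bases, so treat the details here as assumptions: the function name halton_sketch is made up, the points start at index 1, and the bases are whatever primes you pass in.

```matlab
function h = halton_sketch ( n, bases )
% HALTON_SKETCH returns the first n Halton points, one per column.
%   bases is a vector of distinct primes, one per dimension, e.g. [ 2, 3 ].
%   h is a numel(bases)-by-n array with entries strictly between 0 and 1.
  m = numel ( bases );
  h = zeros ( m, n );
  for d = 1 : m
    b = bases(d);
    for j = 1 : n
      i = j;          % index whose base-b digits will be reversed
      f = 1.0 / b;    % weight of the current digit
      while ( i > 0 )
        h(d,j) = h(d,j) + f * mod ( i, b );
        i = floor ( i / b );
        f = f / b;
      end
    end
  end
end
```

The points land in the unit square, so for the region [-1,1] by [-1,1] you could use 2 * halton_sketch ( 64, [ 2, 3 ] ) - 1 as the initial generators.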

Our new goal is to work on Voronoi clustering of discrete data sets. We're thinking of problems where the quantities X, which we like to think of as geometric points, actually represent DNA sequences, solutions of differential equations, or other kinds of abstract data, and we are interested in how we can efficiently group such data together.

In general, we're going to be looking at cases where we have N points X, each of which is a vector of M numbers; we will be trying to group this data into K Voronoi clusters.

We have an energy measurement associated with the clustering. If we are using K clusters, we write

    E = sum ( over each cluster I ) sum ( points J in cluster I ) || X_J - C_I ||²

where C_I is the centroid of cluster I.
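
In MATLAB, the energy can be computed in a couple of lines once each point knows which cluster it belongs to. This sketch assumes the data is stored as an M-by-N array with one point per column, the centroids as an M-by-K array, and the cluster assignments as a vector label of length N; these names and layouts are illustrative choices.

```matlab
function energy = cluster_energy_sketch ( x, c, label )
% CLUSTER_ENERGY_SKETCH sums the squared distances from points to centroids.
%   x is M-by-N (one point per column), c is M-by-K (one centroid per column),
%   label(j) is the index of the cluster that point j belongs to.
  r = x - c(:,label);          % column j holds X_J - C_I for I = label(j)
  energy = sum ( r(:).^2 );
end
```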

For our first example, we just want to try to sketch out some code that will work, and make sure it runs properly. To start with, let's assume we've picked a value for K, and we want to compute the Voronoi clusters. The code might look something like this:

    function [ C, Energy ] = K_Clusters ( K, X )

    Initialize by setting C(1) = X(1),...C(K) = X(K).

    Do

      For each X, find the nearest C(I).
      
      Replace each C(I) by the average of all the points that were closest to it

      Compute the new energy

      If the difference between the new energy and the previous one is "small",
      then quit

    End
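
One way the outline above might be turned into running MATLAB is sketched here. The extra arguments tol (what counts as a "small" energy change) and it_max (a cap on the number of passes) are additions that the outline does not specify, and the data layout (X as an M-by-N array, one point per column) is an assumption.

```matlab
function [ c, energy ] = k_clusters_sketch ( k, x, tol, it_max )
% K_CLUSTERS_SKETCH groups the columns of x into k Voronoi clusters.
%   x is M-by-N, one point per column; c is M-by-K, one centroid per column.
%   tol and it_max are assumed stopping parameters, not part of the outline.
  n = size ( x, 2 );
  c = x(:,1:k);                 % initialize with the first k points
  energy_old = Inf;
  for it = 1 : it_max
    % Squared distance from every point to every centroid (K-by-N).
    d2 = zeros ( k, n );
    for i = 1 : k
      d2(i,:) = sum ( ( x - repmat ( c(:,i), 1, n ) ).^2, 1 );
    end
    [ ~, label ] = min ( d2, [], 1 );
    % Replace each centroid by the average of the points closest to it.
    for i = 1 : k
      if ( any ( label == i ) )
        c(:,i) = mean ( x(:,label==i), 2 );
      end
    end
    % Energy of the current assignment measured from the new centroids.
    energy = sum ( sum ( ( x - c(:,label) ).^2 ) );
    if ( abs ( energy_old - energy ) <= tol )
      break
    end
    energy_old = energy;
  end
end
```

A small check is [ c, e ] = k_clusters_sketch ( 2, [ 0 1 10 11 ], 1.0e-12, 100 ), where the four points on a line should split into the pairs {0, 1} and {10, 11}. Starting from the first K points is simple but can be a poor choice for some data sets, so it is worth trying other starting values too.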

Here is some more information on the Gene Expression data. We will want to try out the finite minimization algorithm on this data.

You have proposed using the same algorithm to look at the clustering of stock market data, gray scales and RGB values in images, amplitudes in a sampled sound file, and other groups of data.

Last modified on 01 August 2001.