Tessellation and Clustering by Mixture Models

Clustering and tessellation are basic tools in data mining. Centroidal Voronoi tessellation (CVT) and mixture-model-based clustering can both be used to tessellate a region according to any sample functions defined on that region. One of the most important algorithms in mixture-model-based clustering is the EM (Expectation-Maximization) algorithm. We introduced a new clustering strategy that shares features with both the EM and CVT algorithms. It also leads to more general tessellations of a region with respect to a continuous, and possibly anisotropic, density distribution. We also proposed probabilistic methods that play an important role in constructing these tessellations and in their parallel implementation on distributed-memory systems.
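For orientation, the CVT side of this comparison can be illustrated with the standard Lloyd iteration, which on a finite sample is the familiar k-means procedure. The sketch below is a minimal illustration of that baseline only. It is not the new strategy described above. The function name lloyd, its parameters, and the random initialization are assumptions made for this example.

```python
import random


def lloyd(points, k, iters=50, seed=0):
    """Lloyd iteration on a finite sample (standard k-means / discrete CVT).

    points: list of equal-length tuples of floats.
    Returns (centers, labels). All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k sample points as the first generators.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Voronoi step: assign each point to its nearest generator.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(pt, centers[j])),
            )
            for pt in points
        ]
        # Centroid step: move each generator to the mean of its cell.
        new_centers = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # An empty cell keeps its previous generator.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # generators coincide with cell centroids: a fixed point
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = lloyd(sample, k=2)
    print(centers, labels)
```

Each point is assigned to exactly one generator here. An EM iteration for a mixture model would instead weight every point by its posterior probability of belonging to each component.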

Examples:

[Figure: clustering results for k=20. Panels: CEM, our method, k-means.]

Comparing the clustering results from CEM, our method and k-means.

 

Method        Handwriting recognition rate
Our method    85.7%
CEM           78.7%
k-means       79.8%

Comparing the handwriting recognition rate of our method, CEM and k-means.

 

References

  1. Tessellation and Clustering by Mixture Models and Their Parallel Implementations, with Qiang Du, Proceedings of the Fourth SIAM International Conference on Data Mining (regular paper), Orlando, FL, 2004, SIAM, pp. 257-268.