Tessellation and Clustering by Mixture Models
Clustering and tessellation are basic tools in data mining. Centroidal Voronoi tessellation (CVT) and mixture-model-based clustering can both be used to tessellate a region based on sample functions defined on that region. One of the most important algorithms in mixture-model-based clustering is the EM (Expectation Maximization) algorithm. We introduced a new clustering strategy that shares features with both the EM and CVT algorithms. It also leads to more general tessellations of a region with respect to a continuous, and possibly anisotropic, density distribution. We also proposed probabilistic methods that play an important role in constructing these tessellations and in their parallel implementation on distributed-memory systems.
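To make the contrast between the two families concrete, here is a minimal Python sketch of one hard-assignment update (the Lloyd-type step behind k-means and discrete CVT) next to one soft-assignment EM update for a Gaussian mixture. It is not the strategy proposed in the paper. The function names `lloyd_step` and `em_step`, the one-dimensional toy data, the equal mixing weights and the fixed `sigma` are all assumptions chosen for illustration.

```python
import math


def lloyd_step(points, centers):
    """One hard-assignment update: each point goes to its nearest center,
    then every center moves to the mean of its cell."""
    k = len(centers)
    sums = [0.0] * k
    counts = [0] * k
    for x in points:
        j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
        sums[j] += x
        counts[j] += 1
    # An empty cell keeps its previous center.
    return [sums[i] / counts[i] if counts[i] else centers[i] for i in range(k)]


def em_step(points, centers, sigma=1.0):
    """One soft-assignment EM update of the means only, assuming equal
    mixing weights and a fixed, shared standard deviation sigma."""
    k = len(centers)
    weighted = [0.0] * k
    totals = [0.0] * k
    for x in points:
        d2 = [(x - c) ** 2 for c in centers]
        d2_min = min(d2)
        # Shift by the smallest distance so the largest term is exp(0) = 1.
        likes = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
        norm = sum(likes)
        for i in range(k):
            r = likes[i] / norm  # responsibility of component i for x
            weighted[i] += r * x
            totals[i] += r
    # A component with zero total responsibility keeps its previous mean.
    return [weighted[i] / totals[i] if totals[i] > 0.0 else centers[i]
            for i in range(k)]


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 4.0, 4.2, 4.4]  # toy data, two visible groups
    start = [1.0, 3.0]
    print("hard update:", lloyd_step(data, start))
    print("soft update:", em_step(data, start))
```

In the hard update each point influences exactly one center, so the cells form a tessellation of the data. In the soft update every point contributes to every center in proportion to its responsibility, which is the feature a mixture-model approach adds.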
[Figure: clustering results for k=20, with one panel each for CEM, our method, and k-means.]
Comparing the clustering results from CEM, our method and k-means.
Method        Handwriting recognition rate
Our method    85.7%
CEM           78.7%
k-means       79.8%
Comparing the handwriting recognition rate of our method, CEM and k-means.
- Tessellation and Clustering by Mixture Models and Their Parallel Implementations, with Qiang Du, Proceedings of the Fourth SIAM International Conference on Data Mining (regular paper), Orlando, FL, 2004, SIAM, pp. 257-268.