LCVT Latinized CVT Datasets

LCVT is a dataset directory which contains points generated by "Latinizing" an M-dimensional Centroidal Voronoi Tessellation.

Each dataset contains N points in M dimensions, contained within the unit hypercube, and having the centered Latin hypercube property: for each coordinate 1 <= I <= M, and each of the N intervals of the form [(J-1)/N, J/N], for 1 <= J <= N, there is exactly one point K whose I-th coordinate lies at the center of that interval, that is, at (2*J-1)/(2*N).
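The centered Latin hypercube property can be checked mechanically. The following is a minimal Python sketch, not part of the LCVT software; the function name is_centered_latin and the tolerance tol are assumptions made for illustration. It sorts each coordinate column and compares it with the N interval centers, using the ten points of the example file lcvt_02_00010.txt as test data.

```python
def is_centered_latin(points, tol=1e-6):
    """Return True if each coordinate takes every interval center exactly once.

    points: list of N points, each a sequence of M coordinates.
    tol:    allowed rounding error (hypothetical parameter; the files store
            six decimal digits).
    """
    n = len(points)
    if n == 0:
        return False
    m = len(points[0])
    # Centers of the intervals [(J-1)/N, J/N] for J = 1..N.
    centers = [(2 * j - 1) / (2 * n) for j in range(1, n + 1)]
    for i in range(m):
        column = sorted(p[i] for p in points)
        # After sorting, the I-th coordinates must match the centers one by one.
        if any(abs(x - c) > tol for x, c in zip(column, centers)):
            return False
    return True


# The ten points listed in the example file lcvt_02_00010.txt.
example = [
    (0.15, 0.25), (0.45, 0.05), (0.75, 0.15), (0.65, 0.65), (0.25, 0.55),
    (0.85, 0.85), (0.35, 0.95), (0.95, 0.45), (0.55, 0.35), (0.05, 0.75),
]
print(is_centered_latin(example))  # prints True
```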

It is easy to generate datasets with this property. For example, the "diagonal dataset" works: its first point has all coordinates equal to 1/(2*N), its second point has all coordinates equal to 3/(2*N), and so on. Other examples abound, but they usually do relatively poorly in terms of point distribution in the original M-dimensional space.
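As an illustration, the diagonal dataset can be written down directly. This is a small Python sketch; the function name diagonal_dataset is an assumption and does not come from the LCVT software. Every point lies on the main diagonal of the hypercube, which is why the set has the Latin property but fills the M-dimensional space poorly.

```python
def diagonal_dataset(m, n):
    """Return N points in M dimensions; point K (counting from 1) has all
    coordinates equal to (2*K-1)/(2*N)."""
    return [[(2 * k + 1) / (2 * n)] * m for k in range(n)]


for point in diagonal_dataset(2, 4):
    print(point)
# [0.125, 0.125]
# [0.375, 0.375]
# [0.625, 0.625]
# [0.875, 0.875]
```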

A Latinized CVT attempts to achieve good dispersion in two opposing senses. The first is the Latin hypercube sense, which considers each dimension separately, and which is achieved exactly by these datasets. The second is the CVT sense, which considers dispersion in the original M-dimensional space. It is approximately achieved by the starting CVT dataset used to begin the Latin computation, but we can only hope that the final Latinized set roughly "inherits" it.
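One simple way to "Latinize" a point set is to rank the points along each coordinate and move each one to the center of the interval matching its rank. The Python sketch below shows that idea in a single pass. It is an assumption made for illustration, not necessarily the procedure used to produce these files, which record a number of Latin iteration steps. The function name latinize is hypothetical.

```python
def latinize(points):
    """Return a copy of points in which, for each coordinate, the point with
    the smallest value moves to 1/(2N), the next to 3/(2N), and so on.

    Ties are broken by the original point order (sorted() is stable).
    """
    n = len(points)
    m = len(points[0])
    result = [list(p) for p in points]
    for i in range(m):
        # Indices of the points, ordered by their I-th coordinate.
        order = sorted(range(n), key=lambda k: points[k][i])
        for rank, k in enumerate(order):
            result[k][i] = (2 * rank + 1) / (2 * n)
    return result


print(latinize([[0.9, 0.1], [0.2, 0.8], [0.5, 0.4]]))
```

Because only the ordering along each axis is used, points that were well separated in M-dimensional space tend to stay roughly where they were, but nothing guarantees that the CVT energy is preserved.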

The datasets are distinguished by the values of the following parameters:

• M, the spatial dimension;
• N, the number of points to generate;
• SEED, the initial seed for the random number routine;
• INITIALIZE, UNIFORM/RANDOM/HALTON/GRID/FILE, the method of initializing the generators;
• SAMPLE, UNIFORM/RANDOM/HALTON/GRID, the method of sampling the Voronoi diagram;
• SAMPLE_NUM, the number of points used to sample the region to estimate area, energy, and the Voronoi diagram;
• CVT_IT, the number of CVT iteration steps;
• CVT_ENERGY, the clustering energy of the CVT dataset used to initialize the final Latin computation;
• LATIN_IT, the number of Latin iteration steps;
• LATIN_ENERGY, the clustering energy of the final Latinized CVT dataset.

The values of M and N are specified in the dataset file names.

The values of the clustering energy are approximated by averaging the squared distance between each sampling point and the nearest element of the dataset.
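The energy estimate described above can be sketched in a few lines of Python. This is an illustrative Monte Carlo version that assumes uniform random sampling of the unit hypercube; the function name clustering_energy and its default arguments are assumptions, and the sample count is kept far below the 500,000 used for the datasets so that the example runs quickly.

```python
import random


def clustering_energy(points, sample_num=20000, seed=123456789):
    """Estimate the clustering energy: the average, over uniform random
    sample points in the unit hypercube, of the squared distance to the
    nearest dataset point."""
    rng = random.Random(seed)
    m = len(points[0])
    total = 0.0
    for _ in range(sample_num):
        s = [rng.random() for _ in range(m)]
        # Squared Euclidean distance to the nearest dataset point.
        total += min(sum((a - b) ** 2 for a, b in zip(s, p)) for p in points)
    return total / sample_num


# For a single point at the center of the unit square the exact value is
# 1/6, so the estimate should be close to 0.1667.
print(clustering_energy([[0.5, 0.5]]))
```

This brute-force search costs on the order of N * M operations per sample, which is adequate for small datasets but slow for N = 10000.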

Related Data and Programs:

CVT, a dataset directory which contains examples of CVT (Centroidal Voronoi Tessellation) datasets.

LATIN_CENTER, a dataset directory which contains examples of Latin Center datasets.

LATIN_EDGE, a dataset directory which contains examples of Latin Edge datasets.

LATIN_RANDOM, a dataset directory which contains examples of Latin Random datasets.

LCVT, a C++ library which computes a Latinized Centroidal Voronoi Tessellation (CVT).

LCVT_DATASET, a C++ program which computes a Latinized Centroidal Voronoi Tessellation (CVT) and writes it to a file.

LCVTP, a dataset directory which contains examples of Latinized CVTs on periodic regions.

PLOT_POINTS, a FORTRAN90 program which can plot two dimensional datasets, making Encapsulated PostScript images.

TABLE_LATINIZE, a FORTRAN90 program which can read a TABLE file of points and "latinize" the points, that is, "gently" rearrange them so that they are regularly spaced in every coordinate direction.

TABLE_TOP, a FORTRAN90 program which can be used to analyze datasets of any dimension, by creating images of pairwise coordinates.

Example dataset:

A typical (but small) dataset looks like this:

#  lcvt_02_00010.txt
#  created by routine LCVT_WRITE in LCVT_DATASET.F90
#  at November 12 2003   4:33:43.028 PM
#
#  Spatial dimension M =   2
#  Number of points N =     10
#  EPSILON (unit roundoff) =   0.119209E-06
#
#  Initial SEED =    123456789
#
#  Initialization by UNIFORM.
#  Sampling by UNIFORM.
#  Number of sample points =        500000
#  Number of CVT iterations =     25
#  Energy of CVT dataset =   0.168880E-01
#  Number of Latin iterations =     10
#  Energy of Latinized CVT dataset =   0.192302E-01
#
0.150000  0.250000
0.450000  0.050000
0.750000  0.150000
0.650000  0.650000
0.250000  0.550000
0.850000  0.850000
0.350000  0.950000
0.950000  0.450000
0.550000  0.350000
0.050000  0.750000
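A file in this layout is straightforward to read: lines beginning with "#" are comments, and every other non-blank line holds the M coordinates of one point. The Python sketch below assumes exactly that layout and the file name pattern lcvt_MM_NNNNN.txt seen in the dataset list; the helper names read_table and parse_name are hypothetical.

```python
import re


def parse_name(name):
    """Extract (M, N) from a file name such as lcvt_02_00010.txt."""
    match = re.fullmatch(r"lcvt_(\d+)_(\d+)\.txt", name)
    if match is None:
        raise ValueError("unexpected file name: " + name)
    return int(match.group(1)), int(match.group(2))


def read_table(text):
    """Return the list of points in the file contents, skipping comment
    lines (starting with '#') and blank lines."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        points.append([float(value) for value in line.split()])
    return points


# A shortened excerpt of the example file, used here in place of disk I/O.
sample = """#  lcvt_02_00010.txt
#  Spatial dimension M =   2
#
0.150000  0.250000
0.450000  0.050000
"""
print(parse_name("lcvt_02_00010.txt"))  # prints (2, 10)
print(read_table(sample))               # prints [[0.15, 0.25], [0.45, 0.05]]
```

To read a real file, the contents can be passed in with open(name).read().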

Reference:

1. John Burkardt, Max Gunzburger, Janet Peterson and Rebecca Brannon,
User Manual and Supporting Information for Library of Codes for Centroidal Voronoi Placement and Associated Zeroth, First, and Second Moment Determination,
Sandia National Laboratories Technical Report SAND2002-0099,
February 2002.
2. Charles Colbourn, Jeffrey Dinitz,
CRC Handbook of Combinatorial Designs,
CRC Press, 1996,
ISBN: 0849389488.
3. Qiang Du, Vance Faber, Max Gunzburger,
Centroidal Voronoi Tessellations: Applications and Algorithms,
SIAM Review,
Volume 41, Number 4, December 1999, pages 637-676.
4. Michael McKay, William Conover, Richard Beckman,
A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code,
Technometrics,
Volume 21, 1979, pages 239-245.
5. Herbert Ryser,
Combinatorial Mathematics,
Mathematical Association of America, 1963,
ISBN: 0883850141,
LC: QA165.R95.

Datasets:

The first family of datasets in M = 2 dimensions includes:

• lcvt_02_00010.txt, M = 2, N = 10, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.017054, LATIN_IT = 5, LATIN_ENERGY = 0.019797;
• lcvt_02_00010.png, a PNG image of the dataset;
• lcvt_02_00100.txt, M = 2, N = 100, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.001694, LATIN_IT = 5, LATIN_ENERGY = 0.001847;
• lcvt_02_00100.png, a PNG image of the dataset;
• lcvt_02_01000.txt, M = 2, N = 1000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.000164, LATIN_IT = 5, LATIN_ENERGY = 0.000169;
• lcvt_02_01000.png, a PNG image of the dataset;
• lcvt_02_10000.txt, M = 2, N = 10000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.000017, LATIN_IT = 5, LATIN_ENERGY = 0.000017;

A second family of datasets in M = 2 dimensions was computed using different seeds:

The first family of datasets in M = 3 dimensions includes:

• lcvt_03_00010.txt, M = 3, N = 10, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.066311, LATIN_IT = 2, LATIN_ENERGY = 0.068743;
• lcvt_03_00100.txt, M = 3, N = 100, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.011400, LATIN_IT = 2, LATIN_ENERGY = 0.013140;
• lcvt_03_01000.txt, M = 3, N = 1000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.002437, LATIN_IT = 2, LATIN_ENERGY = 0.002592;
• lcvt_03_10000.txt, M = 3, N = 10000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.000529, LATIN_IT = 2, LATIN_ENERGY = 0.000543;

The first family of datasets in M = 7 dimensions includes:

• lcvt_07_00010.txt, M = 7, N = 10, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.343148, LATIN_IT = 2, LATIN_ENERGY = 0.445605;
• lcvt_07_00100.txt, M = 7, N = 100, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.155165, LATIN_IT = 2, LATIN_ENERGY = 0.229383;
• lcvt_07_01000.txt, M = 7, N = 1000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.079601, LATIN_IT = 2, LATIN_ENERGY = 0.094108;
• lcvt_07_10000.txt, M = 7, N = 10000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.040774, LATIN_IT = 2, LATIN_ENERGY = 0.045106;

The first family of datasets in M = 16 dimensions includes:

• lcvt_16_00010.txt, M = 16, N = 10, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 1.057923, LATIN_IT = 2, LATIN_ENERGY = 1.451406;
• lcvt_16_00100.txt, M = 16, N = 100, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.828592, LATIN_IT = 2, LATIN_ENERGY = 1.042796;
• lcvt_16_01000.txt, M = 16, N = 1000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.590259, LATIN_IT = 2, LATIN_ENERGY = 0.714075;
• lcvt_16_10000.txt, M = 16, N = 10000, SEED = 123456789, INITIALIZE = UNIFORM, SAMPLE = UNIFORM, SAMPLE_NUM = 500,000, CVT_IT = 25, CVT_ENERGY = 0.415456, LATIN_IT = 2, LATIN_ENERGY = 0.502405;

Last revised on 02 November 2005.