RUNNING BIG PROBLEMS

25 April 2001

To get an idea of how hard it is to solve a problem with roughly a
million generators, we set up a sequence of problems, each about 10
times larger than the last, and recorded the required CPU time.  To
get quicker results, we reduced the number of CVT iteration loops to 1
and the number of sampling points per generator to 100.  We also did a
moment calculation, which costs about the same as one CVT iteration,
so effectively we are timing 2 CVT iterations.

We used bins to compute the nearest neighbors.  The problem had
physical dimensions of 100 x 100 x 20, and we used bins that were
cubes.  The number of bins had to be increased as the problem size
increased.  At first we planned to use 25 x 25 x 5 bins for all
problems, but we quickly discovered that the execution time then
increased dramatically.  We then realized that one factor in the
computation is the average number of points in a nonempty bin, since
this is roughly the number of points that must be compared in each
nearest neighbor search.  By adjusting the bins as the problem grew so
that this value stayed roughly constant, the execution time grew as we
expected, roughly linearly with the number of generators.  (Sketches
of the bin-based search and of the bin-sizing rule appear after the
tables.)

Here are the numbers:

  Generators  Bins              Points/Nonempty Bin  CPU seconds  CPU Hours
  ----------  ----------------  -------------------  -----------  ---------
       5,000   25 x  25 x  5              7                  150       0.04
      50,000   50 x  50 x 10              9                 2222       0.62
     500,000  100 x 100 x 20             13                29030       8.06

As an illustration of what can happen if you don't increase the number
of bins, we compare two runs:

  Generators  Bins              Points/Nonempty Bin  CPU seconds  CPU Hours
  ----------  ----------------  -------------------  -----------  ---------
      50,000   25 x  25 x  5             69                 6977       1.93
      50,000   50 x  50 x 10              9                 2222       0.62
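
As a concrete illustration of the bin-based search, here is a minimal
Python sketch.  It is not the code used for these runs: the names
(bin_key, build_bins, nearest_generator) and the example parameters
are invented for illustration, and it checks only the query's bin and
its 26 neighbors, whereas a production search would expand the shell
of bins outward until the nearest neighbor is provably found.

    from collections import defaultdict
    import random

    def bin_key(p, nbins, box):
        # Map a 3D point to the (i,j,k) index of its bin.
        return tuple(min(int(p[d] / box[d] * nbins[d]), nbins[d] - 1)
                     for d in range(3))

    def build_bins(points, nbins, box):
        # Hash each generator into a cell of a regular nbins grid.
        bins = defaultdict(list)
        for i, p in enumerate(points):
            bins[bin_key(p, nbins, box)].append(i)
        return bins

    def nearest_generator(q, points, bins, nbins, box):
        # Nearest generator to sample point q, checking q's bin and
        # its 26 neighbors only.
        ci, cj, ck = bin_key(q, nbins, box)
        best, best_d2 = -1, float("inf")
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    for i in bins.get((ci + di, cj + dj, ck + dk), ()):
                        d2 = sum((q[d] - points[i][d]) ** 2
                                 for d in range(3))
                        if d2 < best_d2:
                            best, best_d2 = i, d2
        return best

    # Example: 5,000 generators in the 100 x 100 x 20 box,
    # hashed into 25 x 25 x 5 bins.
    box, nbins = (100.0, 100.0, 20.0), (25, 25, 5)
    generators = [tuple(random.uniform(0.0, box[d]) for d in range(3))
                  for _ in range(5000)]
    bins = build_bins(generators, nbins, box)
    print(nearest_generator((12.3, 45.6, 7.8), generators, bins, nbins, box))

Each query touches a constant number of bins, so its cost is governed
by the bin occupancy, which is the Points/Nonempty Bin column in the
tables above.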
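
The bin-sizing rule can be read off the tables: keep the average
number of generators per bin fixed, so the total number of bins grows
linearly with the number of generators.  The sketch below is one
reading of that rule, not code from the run; target_per_bin is a
hypothetical tuning knob, and a target of about 2 generators per bin
roughly reproduces the grids in the first table (exactly so for the
50,000-generator run).

    def choose_bins(n_generators, box, target_per_bin=2.0):
        # Pick a cubic bin side h so the average generator count per
        # bin stays near target_per_bin as the problem grows; assumes
        # the generators are spread roughly uniformly over the box.
        volume = box[0] * box[1] * box[2]
        h = (target_per_bin * volume / n_generators) ** (1.0 / 3.0)
        return tuple(max(1, round(box[d] / h)) for d in range(3))

    # choose_bins(50000, (100.0, 100.0, 20.0))  ->  (50, 50, 10)

With bins chosen this way, each query does a constant amount of work,
so a full pass over the sampling points is linear in the number of
generators, which matches the roughly linear growth in the first
table.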