Scientific Communication Outline

An Improved Implementation of the Nijenhuis-Wilf Random K-Subset Algorithm
John Burkardt

Abstract (written already)

Introduction
    The task is common to many computations.
    This is a widely used algorithm with special advantages in storage and work.
    It uses a clever but complicated trick to avoid O(K^2) work.
    The resulting algorithm is subtle.
    The published implementation is obsolete, unreadable and undocumented.

The Random K Subset Problem
    Requires a random number generator
    Storage, Work, Sorting.
    Uses
        cryptography
        statistical sampling
        a robust cluster has N servers; a user asks for K; need to pick K at random to guarantee work is evenly assigned.
        reliability simulation: a system is required to continue to function if up to K out of N subsystems fail.

Alternative Algorithms
    Drawing with replacement
    Drawing without replacement
    Reservoir
    Assign index to every K-subset, choose index at random.

The Nijenhuis-Wilf Algorithm
    Discuss steps, try to be clear.

The Published Implementation
    Defects

Proposed Improvements

Performance Measurements
    K = 1, 2, 3,
    K = M versus K = N - M,
    K = N - 2, N - 1, N
    Average work = O(K)?
    Comparison to other methods.

Uniformity Measurements
    Histogram of number of times each element occurred
    Chi-Squared test

Sample Codes
    Original F77 code
    Revised F77 code
    C and Python versions

References
    N & W book
    Kreher and Stinson
    Pemmaraju & Skiena (Mathematica, Combinatorica) RandomKSubset(I,k)
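
Illustrative sketch for the "Alternative Algorithms" and "Uniformity Measurements" items

The "Reservoir" alternative and the planned histogram and Chi-Squared checks can be sketched in a few lines of Python. This is not the Nijenhuis-Wilf algorithm and not the published implementation; it is a minimal sketch of standard reservoir sampling (Algorithm R), which visits all N items and so does O(N) work. The function names reservoir_k_subset, element_counts and chi_squared are hypothetical, chosen only for this illustration.

```python
import random
from collections import Counter


def reservoir_k_subset(n, k, rng):
    """Return a sorted random k-subset of {1, ..., n} by reservoir sampling.

    Hypothetical helper for illustration; rng is a random.Random instance.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    reservoir = list(range(1, k + 1))   # start with the first k items
    for i in range(k + 1, n + 1):
        j = rng.randrange(i)            # uniform integer in 0 .. i-1
        if j < k:                       # happens with probability k / i
            reservoir[j] = i            # item i replaces a random slot
    return sorted(reservoir)


def element_counts(n, k, trials, seed=0):
    """Histogram: how many times each element 1..n appeared over all trials."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_k_subset(n, k, rng))
    return [counts[i] for i in range(1, n + 1)]


def chi_squared(counts, expected):
    """Raw chi-squared statistic of the counts against one expected value."""
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    n, k, trials = 10, 3, 10000
    counts = element_counts(n, k, trials)
    expected = trials * k / n           # each element should appear this often
    print("counts:", counts)
    print("chi-squared statistic:", chi_squared(counts, expected))
```

The sketch reports only the raw statistic. Interpreting it needs some care, because the K elements drawn in a single trial are not independent of one another, so the usual Chi-Squared reference distribution applies only approximately.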