OA is a dataset directory which contains a collection of orthogonal arrays (OA's), as computed by Art Owen (AO).
These files were copied from STATLIB (http://lib.stat.cmu.edu/index.php), where more such files are available. There is also an OA library and a set of OA executables which manipulate OA's.
The data files provided here happen to conform to the TABLE format, and there are a variety of utilities available for reading and manipulating such data.
An orthogonal array A is a matrix of n rows and k columns, with every element being one of the q symbols 0 through q-1. The array has strength t if, in every n by t submatrix, the q^t possible distinct rows all appear the same number of times. This number is the index of the array, commonly denoted lambda. Clearly,
lambda * q^t = n
Geometrically, if one were to "plot" the submatrix with one plotting axis for each of the t columns and one point in t dimensional space for each row, the result would be a grid of q^t distinct points. There would be lambda "overstrikes" at each point of the grid.
The notation for such an array is OA ( n, k, q, t ).
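The definition can be checked directly by counting rows in every choice of t columns. The following is a minimal sketch, not part of the OA library; the function name oa_index and the small hand-written example array are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def oa_index(array, q, t):
    """Return the index lambda if `array` has strength t on symbols 0..q-1, else None."""
    n, k = len(array), len(array[0])
    # Every entry must be one of the q symbols 0 through q-1.
    if any(s not in range(q) for row in array for s in row):
        return None
    # lambda * q^t = n must hold for some integer lambda.
    if n % q**t != 0:
        return None
    lam = n // q**t
    for cols in combinations(range(k), t):
        counts = Counter(tuple(row[c] for c in cols) for row in array)
        # All q^t possible rows must appear, each exactly lambda times.
        if len(counts) != q**t or any(v != lam for v in counts.values()):
            return None
    return lam

# A hand-written OA(4, 3, 2, 2): 4 runs, 3 columns, 2 symbols, strength 2.
example = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(oa_index(example, q=2, t=2))  # prints 1
```

The same array has strength 1 with index 2, and the check returns None for t=3 because 4 is not a multiple of 2^3.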
If
n <= q^(t+1)
then the n rows "should" plot as n distinct points in every n by (t+1) subarray. When this fails to hold, the array has the "coincidence defect".
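As a rough illustration, the coincidence defect can be detected by looking for repeated rows in every set of t+1 columns. The function name has_coincidence_defect is hypothetical, and the sketch assumes the caller has already confirmed that n <= q^(t+1).

```python
from itertools import combinations

def has_coincidence_defect(array, t):
    """True if some set of t+1 columns contains a repeated row."""
    k = len(array[0])
    for cols in combinations(range(k), t + 1):
        projected = [tuple(row[c] for c in cols) for row in array]
        # Fewer distinct points than rows means two rows coincide.
        if len(set(projected)) < len(projected):
            return True
    return False

# A hand-written OA(4, 3, 2, 2); its 4 rows are distinct in all 3 columns.
example = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# Stacking it twice gives 8 = 2^3 rows of strength 2 in which every row repeats.
doubled = example + example
print(has_coincidence_defect(example, t=2), has_coincidence_defect(doubled, t=2))
```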
Owen (1992,1994) describes some uses for randomized orthogonal arrays in numerical integration, computer experiments and visualization of functions. Those papers cite further literature that explains these uses in more detail. A strength 1 randomized orthogonal array is a Latin hypercube sample, essentially so or exactly so, depending on the definition used for Latin hypercube sampling. The arrays constructed here have strength 2 or more, it being much easier to construct arrays of strength 1.
The randomization is achieved by independent uniform permutation of the symbols in each column.
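A minimal sketch of this randomization is given below. The function name randomize_oa and the optional rng argument are assumptions for illustration, not part of the OA executables.

```python
import random

def randomize_oa(array, q, rng=None):
    """Apply an independent uniform random permutation of 0..q-1 to each column."""
    if rng is None:
        rng = random.Random()
    k = len(array[0])
    perms = []
    for _ in range(k):
        perm = list(range(q))
        rng.shuffle(perm)          # uniform over all q! relabelings
        perms.append(perm)
    # Relabel symbol s in column j as perms[j][s]; rows stay in place.
    return [tuple(perms[j][row[j]] for j in range(k)) for row in array]

example = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
randomized = randomize_oa(example, q=2, rng=random.Random(0))
```

Because each column is only relabeled, every pair of columns still contains each pair of symbols equally often, so the strength is unchanged.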
To investigate a function f of d variables, one has to have an array with k greater than or equal to d. One may also have a maximum value of n in mind and a minimum value for the number q of distinct levels to investigate. It is entirely possible that there is no array of strength t greater than 1 that is compatible with these conditions. The programs here provide some choices to pick from, hopefully without too much of a compromise.
The constructions used are based on published algorithms that exploit properties of Galois fields. Because of this, the number of levels q must be a prime power. That is
q = p^r
where p is prime and r is a positive integer.
The Galois field arithmetic for the prime powers is based on tables published by Knuth and Alanen (1964). The resulting fields have been tested by the methods described in Appendix 2 of that paper and they passed.
Visualization: Given a function in d dimensions, one might want to run it at N points and then use interactive data analysis on the output. For example if the function computes switching speed and breakdown voltage of a semiconductor device given d=10 process settings one might select those points with a large enough breakdown voltage and a large enough speed and then make plots in lower dimensions of the corresponding process settings. Full 10 dimensional grids are infeasible for this. Latin hypercube samples can miss settings where there are strong effects in the corners. For example if one setting is oxidation temperature and another is oxidation time we expect that the corners (hi,hi) and (lo,lo) will be significant. It can happen that Latin hypercube samples have gaps in these corners. There are 4*C(d,2)=2d(d-1) "bivariate" corners to investigate and random designs or Latin hypercube samples can easily miss some of them.
Integration: The average of function values over a Latin hypercube sample is a good estimate of the integral over the input cube, for functions that are nearly additive. Stein shows how this works. Owen gives a central limit theorem for the estimate and shows how to estimate the variance. The average over an orthogonal array of strength 2 gives a similarly good estimate of the integral for functions dominated by main effects and two factor interactions among the input variables. Owen discusses this.
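The sketch below illustrates the idea on a toy integrand made of one main effect and one two factor interaction. It makes several assumptions that are not from the source: symbol s is mapped to the cell midpoint (s + 0.5)/q, no randomization is applied, and the names prime_oa and oa_mean are hypothetical. The array is built with arithmetic modulo a prime q.

```python
def prime_oa(q):
    """OA(q^2, q+1, q, 2) for prime q: columns a, b, a + m*b (mod q), m = 1..q-1."""
    return [tuple([a, b] + [(a + m * b) % q for m in range(1, q)])
            for a in range(q) for b in range(q)]

def oa_mean(f, array, q):
    """Average f over the points whose coordinates are the midpoints (s + 0.5) / q."""
    total = 0.0
    for row in array:
        x = [(s + 0.5) / q for s in row]
        total += f(x)
    return total / len(array)

def f(x):
    # One main effect plus one two factor interaction.
    return x[0] + x[1] * x[2]

design = [row[:3] for row in prime_oa(5)]   # 25 runs in 3 variables
estimate = oa_mean(f, design, q=5)
print(estimate)
```

The exact integral of this f over the unit cube is 0.75. The estimate agrees up to rounding because every pair of columns visits the full 5 by 5 grid of midpoints and f is linear in each variable; a general integrand would be estimated with some error.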
Computer Experiments: In many computer experiments the visualization methods outlined above are sufficiently informative. Sometimes one would like to find response surface models for the function, perhaps for predictive purposes, or for interpolation to a finer visualization data set. Least squares regression methods can be applied here. The population coefficients are determined from certain integrals over the input space. These can be estimated by the sum over a sample (independent, Latin hypercube, orthogonal array). More accurate sample integrals imply more accurately estimated response surfaces.
Multivariate Nonparametric Regression: These arrays might form good designs for fitting models like MARS (Friedman).
Suppose that q, the number of distinct values per axis, is a prime or a prime raised to a power. Then there exists a Galois field with q elements, GF(q). Using this field one can construct OA( q^2,q+1,q,2 ) the orthogonal array with N=q^2 rows, k=q+1 columns, q symbols and strength 2. The construction is a special case of the one given in Raghavarao 2.4 and appears to be very old. Let column 1 be q 0's, q 1's,...,q (q-1)'s and let column 2 be q repetitions of (0,1,...,q-1). Then for 1 <= k <= q-1 column k+2 is (column 1) + k * (column 2) where the addition and multiplication take place in GF(q). If q is prime these are simply addition and multiplication modulo q. Random permutation of the labels within columns preserves the orthogonality, breaks up the "planes" and provides a basis for randomization inference.
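For prime q the construction can be sketched with ordinary modular arithmetic. The function name gf_prime_oa is hypothetical, and the row order produced here is not claimed to match the gf.* files. Prime powers such as q=4, 8, 9 need genuine GF(q) tables and are not handled.

```python
def gf_prime_oa(q):
    """Build OA(q^2, q+1, q, 2) for a prime q using arithmetic modulo q."""
    if q < 2 or any(q % d == 0 for d in range(2, int(q**0.5) + 1)):
        raise ValueError("this sketch only handles prime q")
    rows = []
    for a in range(q):              # column 1: q 0's, q 1's, ..., q (q-1)'s
        for b in range(q):          # column 2: q repetitions of (0, 1, ..., q-1)
            row = [a, b]
            for m in range(1, q):   # column m+2 = (column 1) + m * (column 2)
                row.append((a + m * b) % q)
            rows.append(tuple(row))
    return rows

oa9 = gf_prime_oa(3)
print(len(oa9), len(oa9[0]))  # prints 9 4
```

Any two columns determine a and b uniquely, which is why each pair of symbols appears exactly once in every pair of columns.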
The files gf.02, gf.03, gf.04 ... contain these arrays with q=2,3,4,5,7,8,9,11,13,16. Note that gf.q contains q^3+q^2 numbers so files with q=17,25,27,32 come in their own shar files.
When q is not a prime power the largest attainable number of columns k can be much less than q+1. In general if c mutually orthogonal latin squares of side q can be found, then an orthogonal array of k=c+2 columns may be constructed. For q=6, k=3 is the limit.
These designs are 1/q^(q-1) fractional replications of q^(q+1) factorials with resolution III. That is, they might be denoted q_III^[(q+1)-(q-1)].
In practice one might use the first 5 columns of gf.16 to get 256 runs in 5 variables. If some input variables only take 2 values (on/off say) then they can be coded as a column of gf.q for any even number q by mapping 0..q/2-1 to on and q/2..q-1 to off.
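Both steps are simple to express in code. In this sketch the helper names first_columns and to_on_off are hypothetical, and the array is assumed to be already loaded as a list of rows of integers.

```python
def first_columns(array, d):
    """Keep only the first d columns of an array given as a list of rows."""
    return [tuple(row[:d]) for row in array]

def to_on_off(column, q):
    """Code a q-level column (q even) as two levels: 0..q/2-1 -> on, q/2..q-1 -> off."""
    if q % 2 != 0:
        raise ValueError("q must be even")
    return ["on" if s < q // 2 else "off" for s in column]

# A 4-level column coded as an on/off factor.
print(to_on_off([0, 1, 2, 3], q=4))  # prints ['on', 'on', 'off', 'off']
```

Since each symbol appears equally often in a column, the on and off levels are also balanced after the mapping.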
Taguchi's L9 has the same form as gf.03, L25 has the form of gf.05. The array gf.32 could be called L1024.
The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.
Files you may copy include:
File | Levels | Runs | Variables |
---|---|---|---|
gf.02 | 2 | 4 | 3 |
gf.03 | 3 | 9 | 4 |
gf.04 | 4 | 16 | 5 |
gf.05 | 5 | 25 | 6 |
gf.07 | 7 | 49 | 8 |
gf.08 | 8 | 64 | 9 |
gf.09 | 9 | 81 | 10 |
gf.11 | 11 | 121 | 12 |
gf.13 | 13 | 169 | 14 |
gf.16 | 16 | 256 | 17 |
(gf.17) | 17 | 289 | 18 |
(gf.25) | 25 | 625 | 26 |
(gf.32) | 32 | 1024 | 33 |
With N=q^2 only q+1 columns are possible. Addelman and Kempthorne show how to get 2q+1 columns in N=2q^2 rows. As before, q is a prime or a power of a prime. The files ak.02,...,ak.11 contain these designs for q=2,3,5,7,9,11. The algorithm for even q is not as easy to code, but the bb2 arrays below fill that role. The array L18 has the same form as ak.03. This array was known to Bose and Bush before Addelman and Kempthorne, and is alluded to in a note added in proof to Plackett and Burman. These designs are OA( 2q^2,2q+1,q,2 ).
Files you may copy include:
File | Levels | Runs | Variables |
---|---|---|---|
ak.02 | 2 | 8 | 5 |
ak.03 | 3 | 18 | 7 |
ak.05 | 5 | 50 | 11 |
ak.07 | 7 | 98 | 15 |
ak.09 | 9 | 162 | 19 |
(ak.11) | 11 | 242 | 23 |
Bose and Bush show how to construct OA( lambda x q^2, lambda x q, q, 2 ) where q is a prime power and lambda is a power of the same prime. The designs bb2.02,...,bb2.16 are of this form with lambda=2, except that they are augmented with a (2q+1)st column using a method Bose and Bush discuss. So they are of the form OA( 2q^2, 2q+1, q, 2 ) where q is a power of 2, augmenting the designs available from Addelman and Kempthorne.
Files you may copy include:
File | Levels | Runs | Variables |
---|---|---|---|
bb2.02 | 2 | 8 | 5 |
bb2.04 | 4 | 32 | 9 |
bb2.08 | 8 | 128 | 17 |
(bb2.16) | 16 | 512 | 33 |
It is possible to construct arrays of strength 3. These might be useful for the same purposes as arrays of strength 2, but they require much larger numbers N of runs.
If you want q levels and q is not a power of a prime, the MacNeish-Mann theorem might help. For example, if q=100=25*4, you can get an array with 5 columns of 4 symbols (gf.04) and an array with 26 columns of 25 symbols (gf.25). Take the first 5 columns of gf.25. Then make an array with 100^2=10000 rows and 5 columns of 100 symbols, in which the kth column is obtained by taking each element of the kth column of gf.25 and replacing it by a vector of length 16, formed by taking 4 times the element of gf.25 and adding the 16 values in the kth column of gf.04.
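The same product construction can be tried on a smaller case, q=6=3*2, which needs only prime fields. This is a sketch under stated assumptions: prime_oa and product_oa are hypothetical names, and the component arrays are rebuilt with arithmetic modulo a prime rather than read from the gf.* files.

```python
def prime_oa(q):
    """OA(q^2, q+1, q, 2) for prime q: columns a, b, a + m*b (mod q), m = 1..q-1."""
    return [tuple([a, b] + [(a + m * b) % q for m in range(1, q)])
            for a in range(q) for b in range(q)]

def product_oa(big, small, q_small, k):
    """Replace each row of `big` by len(small) rows with symbols q_small*big + small."""
    return [tuple(q_small * rb[j] + rs[j] for j in range(k))
            for rb in big     # each row of the larger-symbol array ...
            for rs in small]  # ... is expanded by every row of the smaller one

# 3 columns are shared by the 3-level array (4 columns) and the 2-level array (3 columns).
oa6 = product_oa(prime_oa(3), prime_oa(2), q_small=2, k=3)
print(len(oa6), len(oa6[0]))  # prints 36 3
```

The result has 36 = 6^2 rows, 3 columns and symbols 0 through 5, and every pair of columns contains each of the 36 symbol pairs exactly once, so it is an OA( 36, 3, 6, 2 ).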