MARTINEZ
Computational Statistics Datasets


MARTINEZ is a dataset directory which contains data associated with a book on computational statistics and MATLAB.

The original data files are available as MATLAB M files, and as text files. The original text files were broken up so that each variable is now in its own file, with no extraneous text or blank lines. This may facilitate the use of the data by a variety of programs.
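Because each variable sits in its own plain text file, a few lines of code are enough to read one. The following is a minimal sketch, assuming the values are whitespace-separated numbers with one observation per line; the function name `read_variable` and any file name passed to it are hypothetical and not part of this directory.

```python
from pathlib import Path


def read_variable(path):
    """Read one variable from a text file of whitespace-separated numbers.

    Each non-empty line becomes one row (a list of floats), so a
    single-column file yields rows of length 1 and a multi-column
    file yields wider rows.
    """
    rows = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:  # tolerate a stray blank line, though none are expected
            rows.append([float(f) for f in fields])
    return rows
```

A call such as `rows = read_variable("some_variable.txt")` (file name assumed) would then give `len(rows)` observations, which can be checked against the row counts quoted in the descriptions below.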

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Reference:

  1. Wendy Martinez, Angel Martinez,
    Computational Statistics Handbook with MATLAB,
    Chapman and Hall, 2002.
  2. http://lib.stat.cmu.edu,
    the STATLIB web site.
  3. http://www.infinityassociates.com

Datasets:

The ANAEROB data set has 53 observations (rows) of oxygen uptake and expired ventilation. The files you may copy are:

The ABRASION data set has 30 observations (rows). The files you may copy are:

The ANSCOMBE data set has 11 observations (rows) of simulated data used to illustrate the ideas of exploratory data analysis. The files you may copy are:

The BANK data set has 100 observations (rows) of six properties (columns) of banknotes. Observations were made for sets of 100 forged and 100 genuine banknotes. This data can be used to test clustering techniques. The files you may copy are:

The BIOLOGY data set records the number of papers published for 1534 biologists. The number of papers ranges from 1 to 11. The files you may copy are:

The BODMIN data set records the location of 35 granite tors on Bodmin Moor. The files you may copy are:

The BOSTON data set contains 14 measures (columns) of housing data for 506 census tracts (rows) in Boston, taken in 1970. The files you may copy are:

The BROWNLEE data set has 21 observations of a plant operation for the oxidation of ammonia. There are three predictor or "X" variables, and one response or "Y" variable. The files you may copy are:

The CARDIFF data set records the location of the homes of 168 juvenile offenders in Cardiff, Wales. The files you may copy are:

The CEREAL data set contains 11 ratings (columns) of 8 brands (rows) of cereal. The files you may copy are:

The COAL data set counts the number of coal mine disasters per year over 112 years. The files you may copy are:

The CLUSTER data set is an artificial and simple example of 5 points in 2D, which can be grouped into two clusters. This data can be used to test clustering techniques. The files you may copy are:

The COUNTING data set counts the number of scintillations in 72 second intervals arising from the decay of radioactive polonium. The files you may copy are:

The ELDERLY data set contains the height measurements in centimeters of 351 elderly women. The files you may copy are:

The ENVIRON data set contains 111 daily readings of ozone level and wind speed in New York City between May and September 1973. The files you may copy are:

The FILIP data set contains 82 pairs of (x,y) data, used as a standard test for least squares calculations. The files you may copy are:

The FLEA data set contains measurements (rows) of 2 quantities (columns) for each of 3 species of flea. This data can be used to test clustering techniques. The files you may copy are:

The FOREARM data set contains measurements of the length in inches of the forearms of 140 adult males. The files you may copy are:

The GEYSER data set contains the waiting time in minutes between successive eruptions of the Old Faithful geyser. 300 values are recorded. The files you may copy are:

The HELMETS data set has 133 observations of the acceleration of a head after an accident. The files you may copy are:

The HOUSEHOLD data set contains observations of 4 expenditures (columns) for households of single men and single women. This data can be used to test clustering techniques. The files you may copy are:

The HUMAN data set records measurements of the percentage of fat (column 1) and age (column 2). This data can be used to test clustering techniques. The files you may copy are:

The INSECT data set contains 10 measurements (rows) of 3 quantities (columns) for each of 3 species of insect. This data can be used to test clustering techniques. The files you may copy are:

The INSULATE data set contains measurements (rows) of 2 quantities (columns): the average outside temperature in degrees Celsius, and the weekly gas consumption in thousands of cubic feet. One set of data was taken before insulation, and the other after insulation. The files you may copy are:

The IRIS data set contains 50 measurements (rows) of 4 quantities (columns) for each of 3 species of iris. This data can be used to test clustering techniques. The files you may copy are:

The LAW data set is a random sample from the LAWPOP data set. It contains the LSAT scores and GPAs for 15 randomly chosen records. The files you may copy are:

The LAWPOP data set contains the average LSAT scores and GPAs for freshman students at 82 law schools. The files you may copy are:

The LONGLEY data set contains 16 observations (rows) of 7 predictor variables X (one of which is always 1), and a response variable Y. The files you may copy are:

The MEASURE data set contains 20 measurements (rows) of 3 quantities (columns), chest, waist and hips. 10 of the measurements are for men, 10 for women. This data can be used to test clustering techniques. The files you may copy are:

The MOTHS data set contains the number of moths caught in a trap over 24 consecutive nights. The files you may copy are:

The NFL data set contains measurements of the game time until the first score made by kicking the ball between the end posts (X1) and the game time until the first score made by moving the ball into the end zone (X2). 42 observations were made. The files you may copy are:

The OKBLACK data set records the location of thefts by African-American offenders in Oklahoma City in the late 1970s. The files you may copy are:

The OKWHITE data set records the location of thefts by Caucasian offenders in Oklahoma City in the late 1970s. The files you may copy are:

The PEANUTS data set contains measurements of the average level of aflatoxin in a batch of peanuts, and the percentage of non-contaminated peanuts in the batch. 34 observations were made. The files you may copy are:

The POSSE data set contains 6 sets of data generated for simulation studies. Each data set has 400 observations (rows) in 8 dimensions (columns). The files you may copy are:

The QUAKES data set records the time in days between successive earthquakes. 62 intervals are recorded. The files you may copy are:

The REMISS data set contains the remission times for 42 leukemia patients. Some of the patients were treated with the drug 6-mercaptopurine, and the rest were part of the control group. The files you may copy are:

The SNOWFALL data set records the annual snowfall, in inches, in Buffalo, New York, for the 63 years from 1910 to 1972. The files you may copy are:

The SPATIAL data set records the scores of 26 neurologically impaired children on a test of spatial perception. The files you may copy are:

The STEAM data set records the average atmospheric temperature X, and the corresponding amount of steam used per month, Y. 25 observations were made. The files you may copy are:

The THROMBOS data set has measurements of urinary beta-thromboglobulin excretion in 12 normal and 12 diabetic patients. This data can be used to test clustering techniques. The files you may copy are:

The TIBETAN data set contains 32 observations (rows) of 5 measurements (columns) of skull height. 17 of the skulls came from one area, and 15 from another. This data can be used to test clustering techniques. The files you may copy are:

The UGANDA data set records the location of 120 volcano crater centers in west Uganda. The files you may copy are:

The WHISKY data set records the price in dollars of a fifth of whisky in 16 states with state-owned liquor stores and 26 states with private liquor stores. The files you may copy are:
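
Several of the entries above (BANK, CLUSTER, FLEA, HOUSEHOLD, HUMAN, INSECT, IRIS, MEASURE, THROMBOS, TIBETAN) are described as suitable for testing clustering techniques. As one possible illustration, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the method used in the book, the function name `kmeans` is hypothetical, and the demo points are placeholders invented for this example; they are NOT the values in the CLUSTER data set.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid update.

    points  : list of equal-length tuples of floats
    centers : initial centers (one tuple per cluster), chosen by the caller
    Returns (labels, centers).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Placeholder points for illustration only; these are NOT the CLUSTER data.
demo_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
labels, centers = kmeans(demo_points, centers=[demo_points[0], demo_points[3]])
```

The initial centers are passed in explicitly so that the run is reproducible; a real experiment would typically try several starting choices and compare the resulting groupings against the known labels (for example, forged versus genuine in BANK).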



Last revised on 31 August 2005.