HARTIGAN 
 Clustering Algorithm Datasets
    
    
    
      HARTIGAN
      is a dataset directory which
      contains test data for 
      clustering algorithms.
    
    
      The data files are all text files, and have a common, simple format:  
      
        -  
          Initial comment lines, each beginning with a "#";
        
 
        - 
          A title for the data;
        
 
        - 
          The number of attributes for each data item (columns in the table);
        
 
        - 
          The number of data items (rows in the table);
        
 
        - 
          A set of labels for the data;
        
 
        - 
          Each row of data, on a separate line, with data separated by spaces,
          and character data in quotes.
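
      The layout listed above can be read with a short parser. The following
      Python sketch is only an illustration: it assumes the title sits on a
      single line, the two counts each sit on their own line, and all labels
      share one line. None of this is confirmed by the description above. The
      names parse_dataset, to_number and SAMPLE are hypothetical, and the
      sample text is invented rather than taken from any file in this
      directory.

```python
import shlex


def to_number(token):
    """Return token as a float when possible, otherwise as the original string."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_dataset(text):
    """Parse text in the layout described above (line layout is assumed)."""
    # Keep only non-blank lines that are not "#" comments.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    title = lines[0].strip('"')      # assumed: the title occupies one line
    n_attributes = int(lines[1])     # columns in the table
    n_items = int(lines[2])          # rows in the table
    labels = shlex.split(lines[3])   # assumed: all labels are on one line

    rows = []
    for ln in lines[4:4 + n_items]:
        # shlex.split keeps a quoted string such as "type one" as one token.
        # Quoting is lost afterwards, so a quoted number would become a float.
        rows.append([to_number(tok) for tok in shlex.split(ln)])

    return {
        "title": title,
        "n_attributes": n_attributes,
        "n_items": n_items,
        "labels": labels,
        "rows": rows,
    }


# Invented sample, for illustration only.
SAMPLE = '''\
# Hypothetical file contents.
# Comment lines like these are skipped.
"Example Data"
3
2
"X" "Y" "Kind"
1.0 2.5 "type one"
3.0 4.5 "type two"
'''

data = parse_dataset(SAMPLE)
print(data["title"], data["labels"], data["rows"])
```

      If a real file spreads its labels over several lines, or places them
      differently relative to the counts, the fixed line indices used here
      would need to be adjusted.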
        
 
      
    
    
      Licensing:
    
    
      The computer code and data files described and made available on this web page 
      are distributed under
      the GNU LGPL license.
    
    
      Related Data and Programs:
    
    
      
      MDS,
      a dataset directory which
      contains datasets for multidimensional scaling;
    
    
      
      PCL,
      a dataset directory which
      contains datasets from a gene expression experiment on Arabidopsis,
      which are candidates for data cluster analysis;
    
    
      
      SAMMON,
      a dataset directory which
      contains six sets of M-dimensional data for cluster analysis;
    
    
      
      SOKAL_ROHLF,
      a dataset directory which
      contains biological datasets considered by Sokal and Rohlf;
    
    
      
      SPAETH,
      a dataset directory which
      contains datasets for cluster analysis;
    
    
      
      SPAETH2,
      a dataset directory which
      contains datasets for cluster analysis;
    
    
      
      STATS,
      a dataset directory which
      contains datasets for computational statistics;
    
    
      
      TRIOLA,
      a dataset directory which
      contains datasets used for statistical analysis.
    
    
      Reference:
    
    
      
        - 
          John Hartigan,
          Clustering Algorithms,
          Wiley, 1975,
          LC: QA278.H36,
          ISBN: 0-471-35645-X.
         
      
    
    
      Datasets:
    
    
      
        - 
          file01.txt, sightings of minor planets.
        
 
        - 
          file02.txt, animal milk constituent
          percentages.
        
 
        - 
          file03.txt, 1970 crime rates for American
          cities.
        
 
        - 
          file04.txt, 1960-1965 demographic data for
          the South.
        
 
        - 
          file05.txt, measurements of the sides
          of 20 pieces that form a jigsaw puzzle.
        
 
        - 
          file06.txt, measurements of nutrient levels
          in a variety of foods.
        
 
        - 
          file07.txt, life expectancies by age,
          gender and country.
        
 
        - 
          file08.txt, reading and arithmetic
          levels for 4th and 6th grades at a number of schools.
        
 
        - 
          file09.txt, Civil War battles, force
          levels, and casualties.
        
 
        - 
          file10.txt, data about the moons
          in the solar system.
        
 
        - 
          file11.txt, data about the planets
          in the solar system.
        
 
        - 
          file12.txt, Olympic track times.
        
 
        - 
          file13.txt, casualty figures in
          the Vietnam war.
        
 
        - 
          file14.txt, ratings for wine.
        
 
        - 
          file15.txt, vervet sleeping groups.
        
 
        - 
          file16.txt, months in which
          particular forms of British butterflies may be observed.
        
 
        - 
          file17.txt, clusters of animals forming
          a tree.
        
 
        - 
          file18.txt, cost and nutrient
          contributions for selected foods.
        
 
        - 
          file19.txt, dentition of mammals.
        
 
        - 
          file20.txt, frequency of car repairs.
        
 
        - 
          file21.txt, triads based on hardware.
        
 
        - 
          file22.txt, expectation of life in 
          various cities.
        
 
        - 
          file23.txt, relatedness values in 
          "The Boy Has Lost A Dollar".
        
 
        - 
          file24.txt, portable typewriters.
        
 
        - 
          file25.txt, air mile distances between
          30 cities of the world.
        
 
        - 
          file26.txt, birth and death rates
          per 1000 persons.
        
 
        - 
          file27.txt, US per capita income, 1964.
        
 
        - 
          file28.txt, links between states
          (the list of neighbor states for each state).
        
 
        - 
          file29.txt, mutation distances.
        
 
        - 
          file30.txt, stock yields.
        
 
        - 
          file31.txt, Ivy League Football,
          first half of 1965 season.
        
 
        - 
          file32.txt, questionnaire about data
          analysis course.
        
 
        - 
          file33.txt, nails and screws.
        
 
        - 
          file34.txt, ingredients in cakes.
        
 
        - 
          file35.txt, presence of cerci
          (tail appendages) in insects.
        
 
        - 
          file36.txt, amino acid sequence 
          in cytochrome-c for vertebrates.
        
 
        - 
          file37.txt, congressmen by bills 
          (90th Congress).
        
 
        - 
          file38.txt, Indo-European languages.
        
 
        - 
          file39.txt, Republican vote for president.
        
 
        - 
          file40.txt, acidosis patients.
        
 
        - 
          file41.txt, profitability of sectors of
          US economy.
        
 
        - 
          file42.txt, Connecticut votes for
          president.
        
 
        - 
          file43.txt, oxidation-fermentation
          patterns in species of Candida.
        
 
        - 
          file44.txt, Ohio croplands.
        
 
        - 
          file45.txt, European foods.
        
 
        - 
          file46.txt, languages spoken in Europe.
        
 
        - 
          file47.txt, mammal's milk.
        
 
        - 
          file48.txt, selected votes in the 
          United Nations (1969-1970).
        
 
        - 
          file49.txt, correlation between physical
          measurements.
        
 
        - 
          file50.txt, Indian caste measurements.
        
 
        - 
          file51.txt, leukemia mortality rates.
        
 
        - 
          file52.txt, city crime.
        
 
      
    
    
    
    
    
      Last revised on 06 March 2012.