HARTIGAN
Clustering Algorithm Datasets
HARTIGAN is a dataset directory that contains test data for clustering algorithms.
The data files are all text files, and have a common, simple format:
-
Initial comment lines, each beginning with a "#";
-
A title for the data;
-
The number of attributes for each data item (columns in the table);
-
The number of data items (rows in the table);
-
A set of labels for the data;
-
Each row of data, on a separate line, with values separated by spaces
and character data in quotes.
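As a rough illustration of the format described above, here is a minimal Python sketch of a reader. It rests on several assumptions that the description does not confirm: that the title, the two counts, and the labels each occupy a single line, that comment lines appear only at the top, and that quoted fields use double quotes. The names parse_dataset, split_fields, to_value and SAMPLE are hypothetical, and SAMPLE is an invented toy example, not the contents of any file in this directory.

```python
import re

# A field is either a double-quoted string (which may contain spaces)
# or a run of non-whitespace characters.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def split_fields(line):
    """Return (text, was_quoted) pairs for the fields on one line."""
    return [(q, True) if not b else (b, False) for q, b in _FIELD.findall(line)]


def to_value(text, was_quoted):
    """Keep quoted fields as strings; try to read bare fields as numbers."""
    if was_quoted:
        return text
    try:
        return float(text)
    except ValueError:
        return text


def parse_dataset(text):
    """Parse one dataset, assuming the one-item-per-line layout noted above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Skip the initial comment lines.
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    body = lines[start:]

    title = body[0].strip('"')
    n_attributes = int(body[1])   # columns in the table
    n_items = int(body[2])        # rows in the table
    labels = [t for t, _ in split_fields(body[3])]
    rows = [
        [to_value(t, q) for t, q in split_fields(ln)]
        for ln in body[4:4 + n_items]
    ]
    return {
        "title": title,
        "n_attributes": n_attributes,
        "n_items": n_items,
        "labels": labels,
        "rows": rows,
    }


# Invented sample for illustration only.
SAMPLE = """
# invented sample, not one of the HARTIGAN files
"Toy data"
3
2
"name" "x" "y"
"a" 1.0 2.0
"b c" 3.5 4.0
"""

dataset = parse_dataset(SAMPLE)
print(dataset["title"], dataset["labels"], len(dataset["rows"]))
```

A real file may deviate from these assumptions (for example, labels spread over several lines), so it is worth checking the parsed counts against the number of labels and rows before relying on the result.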
Licensing:
The computer code and data files described and made available on this web page
are distributed under the GNU LGPL license.
Related Data and Programs:
The PCL dataset directory contains sample datasets for clustering,
based on gene expression experiments.
The SPAETH dataset directory contains sample datasets for clustering.
The SPAETH2 dataset directory contains sample datasets for clustering.
Reference:
-
John Hartigan,
Clustering Algorithms,
Wiley, 1975,
LC: QA278.H36,
ISBN: 0-471-35645-X.
Datasets:
-
file01.txt, sightings of minor planets.
-
file02.txt, animal milk constituent
percentages.
-
file03.txt, 1970 crime rates for American
cities.
-
file04.txt, 1960-1965 demographic data for
the South.
-
file05.txt, measurements of the sides
of 20 pieces that form a jigsaw puzzle.
-
file06.txt, measurements of nutrient levels
in a variety of foods.
-
file07.txt, life expectancies by age,
gender and country.
-
file08.txt, reading and arithmetic
levels for 4th and 6th grades at a number of schools.
-
file09.txt, Civil War battles, force
levels, and casualties.
-
file10.txt, data about the moons
in the solar system.
-
file11.txt, data about the planets
in the solar system.
-
file12.txt, Olympic track times.
-
file13.txt, casualty figures in
the Vietnam War.
-
file14.txt, ratings for wine.
-
file15.txt, vervet sleeping groups.
-
file16.txt, lists the months in which
particular forms of British butterflies may be observed.
-
file17.txt, clusters of animals forming
a tree.
-
file18.txt, cost and nutrient
contributions for selected foods.
-
file19.txt, dentition of mammals.
-
file20.txt, frequency of car repairs.
-
file21.txt, triads based on hardware.
-
file22.txt, expectation of life in
various cities.
-
file23.txt, relatedness values in
"The Boy Has Lost A Dollar".
-
file24.txt, portable typewriters.
-
file25.txt, airline distances between
principal cities of the world.
-
file26.txt, birth and death rates
per 1000 persons.
-
file27.txt, US per capita income, 1964.
-
file28.txt, links between states
(the list of neighbor states for each state).
-
file29.txt, mutation distances.
-
file30.txt, stock yields.
-
file31.txt, Ivy League football,
first half of 1965 season.
-
file32.txt, questionnaire about data
analysis course.
-
file33.txt, nails and screws.
-
file34.txt, ingredients in cakes.
-
file35.txt, presence of cerci
(tail appendages) in insects.
-
file36.txt, amino acid sequence
in cytochrome-c for vertebrates.
-
file37.txt, congressmen by bills
(90th Congress).
-
file38.txt, Indo-European languages.
-
file39.txt, Republican vote for president.
-
file40.txt, acidosis patients.
-
file41.txt, profitability of sectors of
US economy.
-
file42.txt, Connecticut votes for
president.
-
file43.txt, oxidation-fermentation
patterns in species of Candida.
-
file44.txt, Ohio croplands.
-
file45.txt, European foods.
-
file46.txt, languages spoken in Europe.
-
file47.txt, mammal's milk.
-
file48.txt, selected votes in the
United Nations (1969-1970).
-
file49.txt, correlation between physical
measurements.
-
file50.txt, Indian caste measurements.
-
file51.txt, leukemia mortality rates.
-
file52.txt, city crime.
-
file53.txt, presidential heights.
Last revised on 31 August 2005.