HARTIGAN
Clustering Algorithm Datasets
HARTIGAN
is a dataset directory which
contains test data for
clustering algorithms.
The data files are all text files, and have a common, simple format:
-
Initial comment lines, each beginning with a "#";
-
A title for the data;
-
The number of attributes for each data item (columns in the table);
-
The number of data items (rows in the table);
-
A set of labels for the data;
-
Each row of data on a separate line, with values separated by spaces
and character data in quotes.
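The layout above can be read with a short parser. The following Python sketch is only an illustration: the function name parse_dataset and the sample text are hypothetical, and it assumes details the description does not state (a one-line title, integer counts on one or two lines, and all labels on a single line). Check these assumptions against an actual file before relying on it.

```python
import shlex


def parse_dataset(text):
    """Parse one dataset in the layout described above (assumptions noted inline)."""
    # Drop blank lines and "#" comment lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    # Assumption: the title occupies exactly one line, possibly in quotes.
    title = lines[0].strip('"')

    # Assumption: the two counts are integers, written on one line or on two.
    counts = []
    pos = 1
    while len(counts) < 2:
        counts.extend(int(tok) for tok in lines[pos].split())
        pos += 1
    n_attributes, n_items = counts[0], counts[1]

    # Assumption: all labels sit on a single line.
    labels = shlex.split(lines[pos])
    pos += 1

    # Each following line is one data item; shlex keeps quoted strings whole.
    rows = [shlex.split(ln) for ln in lines[pos:pos + n_items]]
    return title, n_attributes, n_items, labels, rows


# Hypothetical sample text, not taken from any of the files in this directory.
sample = '''# An invented example for illustration only.
"Made-up fruit data"
3
2
"Name" "Weight" "Colour"
"Apple" 150 "red"
"Wild plum" 35 "dark blue"
'''

title, n_attributes, n_items, labels, rows = parse_dataset(sample)
```

All values are returned as strings; converting numeric columns is left to the caller, since the description does not say which columns are numeric.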
Licensing:
The computer code and data files described and made available on this web page
are distributed under
the GNU LGPL license.
Related Data and Programs:
MDS,
a dataset directory which
contains datasets for M-dimensional scaling;
PCL,
a dataset directory which
contains datasets from a gene expression experiment on Arabidopsis,
which are candidates for data cluster analysis;
SAMMON,
a dataset directory which
contains six sets of M-dimensional data for cluster analysis;
SOKAL_ROHLF,
a dataset directory which
contains biological datasets considered by Sokal and Rohlf;
SPAETH,
a dataset directory which
contains datasets for cluster analysis;
SPAETH2,
a dataset directory which
contains datasets for cluster analysis;
STATS,
a dataset directory which
contains datasets for computational statistics;
TRIOLA,
a dataset directory which
contains datasets used for statistical analysis.
Reference:
-
John Hartigan,
Clustering Algorithms,
Wiley, 1975,
LC: QA278.H36,
ISBN: 0-471-35645-X.
Datasets:
-
file01.txt, sightings of minor planets;
-
file02.txt, animal milk constituent
percentages;
-
file03.txt, 1970 crime rates for American
cities;
-
file04.txt, 1960-1965 demographic data for
the South;
-
file05.txt, measurements of the sides
of 20 pieces that form a jigsaw puzzle;
-
file06.txt, measurements of nutrient levels
in a variety of foods;
-
file07.txt, life expectancies by age,
gender and country.
-
file08.txt, reading and arithmetic
levels for 4th and 6th grades at a number of schools.
-
file09.txt, Civil War battles, force
levels, and casualties.
-
file10.txt, data about the moons
in the solar system.
-
file11.txt, data about the planets
in the solar system.
-
file12.txt, Olympic track times.
-
file13.txt, casualty figures in
the Vietnam war.
-
file14.txt, ratings for wine.
-
file15.txt, vervet sleeping groups.
-
file16.txt, lists the months in which
particular forms of British butterflies may be observed.
-
file17.txt, clusters of animals forming
a tree.
-
file18.txt, cost and nutrient
contributions for selected foods.
-
file19.txt, dentition of mammals.
-
file20.txt, frequency of car repairs.
-
file21.txt, triads based on hardware.
-
file22.txt, expectation of life in
various cities.
-
file23.txt, relatedness values in
"The Boy Has Lost A Dollar".
-
file24.txt, portable typewriters.
-
file25.txt, air mile distances between
30 cities of the world.
-
file26.txt, birth and death rates
per 1000 persons.
-
file27.txt, US per capita income, 1964.
-
file28.txt, links between states
(the list of neighbor states for each state).
-
file29.txt, mutation distances.
-
file30.txt, stock yields.
-
file31.txt, Ivy League Football,
first half of 1965 season.
-
file32.txt, questionnaire about data
analysis course.
-
file33.txt, nails and screws.
-
file34.txt, ingredients in cakes.
-
file35.txt, presence of cerci
(tail appendages) in insects.
-
file36.txt, amino acid sequence
in cytochrome-c for vertebrates.
-
file37.txt, congressmen by bills
(90th congress).
-
file38.txt, Indo-European languages.
-
file39.txt, Republican vote for president.
-
file40.txt, acidosis patients.
-
file41.txt, profitability of sectors of
US economy.
-
file42.txt, Connecticut votes for
president.
-
file43.txt, oxidation-fermentation
patterns in species of Candida.
-
file44.txt, Ohio croplands.
-
file45.txt, European foods.
-
file46.txt, languages spoken in Europe.
-
file47.txt, mammal's milk.
-
file48.txt, selected votes in the
United Nations (1969-1970).
-
file49.txt, correlation between physical
measurements.
-
file50.txt, Indian caste measurements.
-
file51.txt, Leukemia mortality rates.
-
file52.txt, city crime.
Last revised on 06 March 2012.