Datasets

adjacency, a dataset directory which contains adjacency matrices associated with an undirected graph.
alphabet_lowercase, a dataset directory which contains large images of the 26 lowercase alphabetic characters.
alphabet_uppercase, a dataset directory which contains large images of the 26 uppercase alphabetic characters.
beale_cipher, a dataset directory which contains the text of the three Beale cipher documents, which are supposed to indicate the location of a hoard of gold and silver.
bin_packing, a dataset directory which contains examples of the bin packing problem, in which a number of objects are to be packed in the minimum possible number of uniform bins.
birthdays, a dataset directory which contains data related to birthdays, such as the birthdays of members of hockey teams, and the number of babies born in the US on each calendar day over an interval of several years.
boston_housing, a dataset directory which stores training and test data about housing prices in Boston. This dataset is also available as a built-in dataset in Keras.
cats, a dataset directory which contains jpg images of cats.
ccs, a data directory which contains examples of sparse matrices stored as Compressed Column Storage (CCS) files, a three-file format.
census, a dataset directory which contains US census data.
chain_letters, a dataset directory which contains examples of a chain letter.
change_making, a dataset directory which contains test data for the change making problem.
cities, a dataset directory which contains sets of information about cities and the distances between them.
clustering, a dataset directory which can be used with clustering algorithms.
color, a dataset directory which contains information about colors in terms of RGB values.
crs, a dataset directory which contains examples of sparse matrices stored in Compressed Row Storage (CRS) format, a three-file format.
csv, a data directory which contains examples of CSV files, a flat file format of Comma Separated Values.
dates, a dataset directory which contains lists of dates in certain calendars.
dogs, a dataset directory which contains images of dogs.
draft_lottery, a dataset directory which contains the numbers assigned to each birthday, for the Selective Service System lotteries for 1970 through 1976.
faces, a dataset directory which contains 10 photographs of each of 40 people, for use in facial recognition experiments.
faces_angela_merkel, a dataset directory which contains images of Angela Merkel for facial recognition applications.
faces_arnold_schwarzenegger, a dataset directory which contains images of Arnold Schwarzenegger for facial recognition applications.
faces_emma_stone, a dataset directory which contains images of Emma Stone for facial recognition applications.
faces_matt_damon, a dataset directory which contains images of Matt Damon for facial recognition applications.
faces_michael_caine, a dataset directory which contains images of Michael Caine for facial recognition applications.
faces_sylvester_stallone, a dataset directory which contains images of Sylvester Stallone for facial recognition applications.
faces_taylor_swift, a dataset directory which contains images of Taylor Swift for facial recognition applications.
fingerprints, a dataset directory which contains a few images of fingerprints.
ge, a dataset directory which contains matrices stored in General (GE) format.
generalized_assignment, a dataset directory which contains test data for the generalized assignment problem.
german, a dataset directory which contains some short texts in German.
graffiti, a dataset directory which contains 195 mathematical graphs, described as a collection of nodes, with edges between some pairs of nodes.
graphics_test, a dataset directory which contains examples of data used to illustrate or test various graphics procedures for presenting and analyzing data.
grid, a dataset directory which contains points generated on an M-dimensional uniform grid, but with "holes".
hand, a dataset directory which contains the 59 (x,y) coordinates of points that outline a hand.
hartigan, a dataset directory which contains datasets for testing clustering algorithms.
hbsmc, a dataset directory which contains the Harwell Boeing Sparse Matrix Collection (HBSMC).
hex_grid, a dataset directory which contains points that form a hexagonal array in the 2D square, or more general 2D regions.
house, a dataset directory which contains the 11 (x,y) coordinates of points that outline a house.
human, a dataset directory which lists the 136 (x,y) coordinates of points traced along the outline of a human body.
imagej, a dataset directory which contains image data suitable for use with the ImageJ program.
incidence, a dataset directory which contains incidence matrices associated with a directed graph.
interpolation, a dataset directory which contains datasets to be interpolated.
iswr, a dataset directory which contains example datasets used for statistical analysis.
knapsack, a dataset directory which contains test data for the knapsack problem. Given n items, each with a value and a weight, and a knapsack with a weight limit, we wish to select the items to store in the knapsack that maximize the total value without exceeding the weight limit.
knapsack_multiple, a dataset directory which contains test data for the multiple knapsack problem.
lcvt, a dataset directory which contains examples of Latinized Centroidal Voronoi Tessellations.
lp, a dataset directory which contains datasets for linear programming, used for programs such as CPLEX, GUROBI and SCIP.
maple, a dataset directory which contains the 1776 (x,y) coordinates of points along the perimeter of a maple leaf.
martinez, a dataset directory which contains datasets for computational statistics.
molecule_xyz, a dataset directory which contains the Cartesian (x,y,z) coordinates of the atoms forming a molecule.
mortality, a dataset directory which lists mortality information for a single year, giving the age at death, number of deaths, male deaths, and female deaths, for a total of 2,423,509 deaths, including 1,203,812 males and 1,219,697 females, between the ages of 0 and 114.
mps, a dataset directory which contains datasets for linear programming.
ngrams, a dataset directory which contains information about the observed frequency of "ngrams" (particular sequences of n letters) in English text.
niederreiter2, a dataset directory which contains examples of the Niederreiter quasirandom sequence using a base of 2.
partition_problem, a dataset directory which contains examples of the partition problem, in which a set of numbers is given, and it is desired to break the set into two subsets with equal sum.
pcl, a dataset directory which contains datasets from a gene expression experiment on Arabidopsis.
polygon, a dataset directory which contains examples of polygons.
population, a dataset directory which contains listings of populations.
presidents, a dataset directory which lists various facts about US presidents.
profile, a dataset directory which lists the 44 (x,y) coordinates of points traced along the profile of a face.
propack, a dataset directory which contains matrices in Harwell-Boeing format, used for testing the SVD package propack().
pyramid_jaskowiec_rule, a dataset directory which contains the definitions of 20 symmetric quadrature rules for the pyramid, by Jan Jaskowiec and Natarajan Sukumar.
quad_mesh, a dataset directory which contains examples of quad meshes.
quadrature_rules, a dataset directory which contains quadrature rules for 1D intervals, 2D rectangles or M-dimensional rectangular regions, stored as a file of abscissas, a file of weights, and a file of region limits.
quadrature_rules_pyramid, a dataset directory which contains quadrature rules for a pyramid with a square base.
quadrature_rules_tet, a dataset directory which contains quadrature rules for tetrahedrons, stored as a file of abscissas, a file of weights, and a file of vertices.
regression, a dataset directory which contains datasets for testing linear regression.
sammon, a dataset directory which contains examples of six kinds of M-dimensional datasets for cluster analysis.
sgb, a dataset directory which contains files used as input data for demonstrations and tests of Donald Knuth's Stanford Graph Base.
sgmga, a dataset directory which contains SGMGA files (Sparse Grid Mixed Growth Anisotropic), that is, M-dimensional Smolyak sparse grids based on a mixture of 1D rules, and with a choice of exponential and linear growth rates for the 1D rules and anisotropic weights for the dimensions.
sokal_rohlf, a dataset directory which contains biological datasets considered by Sokal and Rohlf.
solar_system, a dataset directory of planetary measurements.
spaeth, a dataset directory which contains datasets for cluster analysis.
spaeth2, a dataset directory which contains datasets for cluster analysis.
sphere_design_rule, a dataset directory which contains files defining point sets on the surface of the unit sphere, known as "designs", which can be useful for estimating integrals on the surface, among other uses.
sphere_grid, a dataset directory which contains grids of points, lines, triangles or quadrilaterals on a sphere.
sphere_lebedev_rule, a dataset directory which contains sets of Lebedev points on a sphere which can be used for quadrature rules of a known precision.
sphere_maximum_determinant, a dataset directory which contains files defining maximum determinant rules on the unit sphere, which can be used for interpolation and quadrature.
st, a dataset directory of examples of Sparse Triplet (ST) files, a sparse matrix file format, storing just (I,J,A(I,J)), and using zero-based indexing.
st1, a dataset directory of examples of Sparse Triplet (ST1) files, a sparse matrix file format, storing just (I,J,A(I,J)), and using one-based indexing.
states, a dataset directory which contains some information about the individual American states.
stats, a dataset directory which contains some examples of statistical datasets.
subset_sum, a dataset directory which contains examples of the subset sum problem, in which a set of numbers is given, and it is desired to find at least one subset that sums to a given target value.
svdpack, a dataset directory which contains matrices in Harwell-Boeing format, used for testing the singular value decomposition library svdpack().
symbols, a dataset directory which contains large images of numbers and symbols.
test_approx, a dataset directory which contains sets of data (x,y) for which an approximating formula is desired.
test_con, a dataset directory which contains sequences of points that lie on M-dimensional curves defined by sets of nonlinear equations.
tet_mesh_order4, a dataset directory of examples of order 4 tetrahedral meshes.
tet_mesh_order10, a dataset directory of examples of order 10 tetrahedral meshes.
tet_mesh_order20, a dataset directory of examples of order 20 tetrahedral meshes.
tetrahedron_jaskowiec_rule, a dataset directory which contains the definitions of 20 symmetric quadrature rules for the tetrahedron, by Jan Jaskowiec and Natarajan Sukumar.
tetrahedrons, a dataset directory which contains examples of tetrahedrons.
text, a dataset directory which contains some short texts in English, such as the Gettysburg Address.
tikz, a data directory of examples of tikz files, which are descriptions of drawings to be included in a tex file.
time_series, a data directory of examples of time series, which are simply records of the values of some quantity at a sequence of times.
timelines, a data directory of examples of timelines, that is, dates or durations or lifetimes meant to be displayed in chronological order.
triangles, a dataset directory which contains examples of triangles.
triangulation_order3, a dataset directory which contains examples of order 3 triangulations, a linear triangulation of a set of 2D points, using a pair of files to list the node coordinates and the 3 nodes that make up each triangle.
triangulation_order4, a dataset directory which contains examples of order 4 triangulations, a triangulation of a set of 2D points, using a pair of files to list the node coordinates and the 4 nodes that define each triangle (3 vertices and the centroid).
triangulation_order6, a dataset directory which contains examples of order 6 triangulations, a quadratic triangulation of a set of 2D points, using a pair of files to list the node coordinates and the 6 nodes that make up each triangle. Six-node triangles are used when a higher degree approximation is desired; they may also be used as isoparametric elements that model curved boundaries.
triola, a dataset directory which contains datasets used for statistical analysis.
tsp, a dataset directory which contains examples of the traveling salesperson problem.
van_der_corput, a dataset directory which contains examples of one-dimensional van der Corput sequences, for various bases.
words, a dataset directory which contains lists of words.
xls, a data directory which contains examples of XLS files, used by the Microsoft Excel spreadsheet program.
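
As an illustration of the kind of problem these test datasets describe, the knapsack entry in the list states the task as choosing items of given value and weight to maximize total value under a weight limit. A minimal sketch of one standard way to solve small instances (dynamic programming over capacities) might look like the following. It assumes nonnegative integer weights and an integer limit; the function name knapsack_01 and the sample numbers are hypothetical, and nothing here reflects the file layout of the knapsack directory.

```python
def knapsack_01(values, weights, limit):
    """Solve a 0/1 knapsack instance by dynamic programming.

    Assumes weights and limit are nonnegative integers.
    Returns (best_total_value, sorted list of chosen item indices).
    """
    n = len(values)
    # best[i][w] = maximum value using only the first i items with capacity w
    best = [[0] * (limit + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(limit + 1):
            best[i][w] = best[i - 1][w]          # skip item i-1
            if wt <= w:                          # or take it, if it fits
                best[i][w] = max(best[i][w], best[i - 1][w - wt] + v)

    # Walk the table backwards to recover which items were taken.
    chosen = []
    w = limit
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    chosen.reverse()
    return best[n][limit], chosen


if __name__ == "__main__":
    # Hypothetical instance: three items and a weight limit of 50.
    total, items = knapsack_01([60, 100, 120], [10, 20, 30], 50)
    print(total, items)
```

The table has (n+1) x (limit+1) entries, so this approach is only practical when the weight limit is a modest integer; larger or non-integer instances would need a different method.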

Last revised on 21 January 2025.