REGRESSION
Linear Regression Datasets
REGRESSION
is a dataset directory which
contains test data for
linear regression.
The simplest kind of linear regression involves taking a
set of data (x_{i},y_{i}), and trying
to determine the "best" linear relationship
y = a * x + b
Commonly, we look at the vector of errors:
e_{i} = y_{i} - a * x_{i} - b
and look for values (a,b) that minimize the L1,
L2 or L-infinity norm of the errors. For problems
involving multivariate sets of data, the number a
becomes a matrix, and b a vector, but the idea
is similar.
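As a rough illustration of the idea above, the following Python sketch fits y = a * x + b in the least-squares (L2) sense and then reports the L1, L2 and L-infinity norms of the resulting error vector. The function names fit_line and error_norms and the sample points are made up for this illustration and are not part of the dataset directory; minimizing the L1 or L-infinity norm instead would need a different solver.

```python
import math

def fit_line(xs, ys):
    """Least-squares (L2) fit of y = a * x + b using the closed-form solution."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx  # needs at least two distinct x values
    b = y_mean - a * x_mean
    return a, b

def error_norms(xs, ys, a, b):
    """Return the L1, L2 and L-infinity norms of e_i = y_i - a * x_i - b."""
    errors = [y - a * x - b for x, y in zip(xs, ys)]
    l1 = sum(abs(e) for e in errors)
    l2 = math.sqrt(sum(e * e for e in errors))
    linf = max(abs(e) for e in errors)
    return l1, l2, linf

# Made-up sample points lying on y = 2 * x + 1 (not taken from any data file).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b, error_norms(xs, ys, a, b))
```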
The data files have a simple format:

initial comment lines, each beginning with a "#".

the number of columns of data;

the number of rows of data;

for each column of data, a line containing a column label;
the first column is always "Index" and counts the rows;
if there is a column labeled "A0" it usually contains the
value 1.0;

each row of data, on a separate line, with data separated by spaces.
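The layout described above might be read with a short Python routine along the following lines. This is only a sketch: the name parse_dataset and the sample text are invented for illustration, and it assumes that each count is the first token on its line, so that a trailing word after the number would be ignored.

```python
def parse_dataset(text):
    """Parse one data file in the layout described above into (labels, rows)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Skip the initial comment lines, each beginning with "#".
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    body = lines[start:]
    # Assumption: the count is the first token on its line.
    n_cols = int(body[0].split()[0])
    n_rows = int(body[1].split()[0])
    labels = body[2:2 + n_cols]
    data = body[2 + n_cols:2 + n_cols + n_rows]
    rows = [[float(v) for v in ln.split()] for ln in data]
    if (len(labels) != n_cols or len(rows) != n_rows
            or any(len(r) != n_cols for r in rows)):
        raise ValueError("data does not match the declared shape")
    return labels, rows

# Made-up sample in the described layout (not one of the real files).
sample = """# a tiny invented example
3
2
Index
A0
X
1 1.0 2.5
2 1.0 4.0
"""
labels, rows = parse_dataset(sample)
print(labels, rows)
```

All values, including the Index column, are converted to floating point here; a caller that wants integer row indices would need to convert that column separately.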
There are also some extended examples, which involve an M by N linear
system, a set of linear constraints to be solved exactly, and
a set of linear inequalities. In that case, a master file
lists the sizes of the three sets of data, and the name of
the first file, which contains the linear system.
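To make the three parts of an extended example concrete, the Python sketch below checks a candidate solution x against a linear system A x ~ b, equality constraints C x = d, and inequality constraints E x <= f. The names A, b, C, d, E, f, the direction of the inequality, and the sample numbers are all assumptions made for this illustration; the description above does not specify them, and this is a checker for a given x rather than a solver.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def evaluate(x, A, b, C, d, E, f, tol=1e-9):
    """Return (L1 residual of A x ~ b, equalities hold?, inequalities hold?)."""
    residual_l1 = sum(abs(bi - ri) for bi, ri in zip(b, matvec(A, x)))
    equalities_ok = all(abs(di - ri) <= tol for di, ri in zip(d, matvec(C, x)))
    # Assumed direction: each inequality row requires (E x)_i <= f_i.
    inequalities_ok = all(ri <= fi + tol for fi, ri in zip(f, matvec(E, x)))
    return residual_l1, equalities_ok, inequalities_ok

# Made-up example: 3 system rows, 1 equality row, 1 inequality row.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
C = [[1.0, -1.0]]
d = [-1.0]
E = [[1.0, 0.0]]
f = [2.0]
result = evaluate([1.0, 2.0], A, b, C, d, E, f)
print(result)
```

A constraint set with 0 rows can be passed as an empty list, in which case the corresponding check is trivially satisfied.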
Licensing:
The computer code and data files described and made available on this web page
are distributed under
the GNU LGPL license.
Related Data and Programs:
HARTIGAN,
a dataset directory which
contains datasets for testing clustering algorithms;
ISWR,
a dataset directory which
contains datasets used for statistical analysis, particularly with the R language.
MARTINEZ,
a dataset directory which
contains datasets for computational statistics,
including cluster analysis;
MDS,
a dataset directory which
contains datasets for M-dimensional scaling;
SOKAL_ROHLF,
a dataset directory which
contains biological datasets considered by Sokal and Rohlf.
STATS,
a dataset directory which
contains datasets for computational statistics;
Reference:

I Barrodale, F Roberts,
Algorithm 552:
Solution of the Constrained L1 Approximation Problem,
ACM Transactions on Mathematical Software,
Volume 6, Number 2, pages 231-235, 1980.

Richard Gunst, Robert Mason,
Regression Analysis and Its Applications: a data-oriented approach,
Dekker, 1980,
ISBN: 0824769937,
LC: QA278.2.G85.

David Kahaner, Cleve Moler, Steven Nash,
Numerical Methods and Software,
Prentice Hall, 1989,
ISBN: 0136272584,
LC: TA345.K34.

Helmuth Spaeth,
Mathematical Algorithms for Linear Regression,
Academic Press, 1991,
ISBN: 0126564604.
Datasets:

x01.txt, brain and body weight, 62 rows,
3 columns;

x02.txt, height, weight, catheter length,
12 rows, 4 columns;

x03.txt, age, blood pressure, 30 rows,
4 columns;

x04.txt, catalog print run versus orders,
38 rows, 4 columns;

x05.txt, catalog print run versus orders,
38 rows, 5 columns;

x06.txt, age, water temperature, length
of fish, 44 rows, 4 columns;

x07.txt, retardation, doctor distrust,
degree of illness, 53 rows, 4 columns;

x08.txt, poverty, unemployment, murder
rate, 20 rows, 5 columns;

x09.txt, age, weight, blood fat,
25 rows, 5 columns;

x10.txt, factory operation parameters,
percent of unprocessed ammonia, 21 rows, 5 columns;

x11.txt, pasturage properties and price,
67 rows, 5 columns;

x12.txt, electrical utility data,
16 rows, 6 columns;

x13.txt, production, imports, and
consumption data, 18 rows, 7 columns;

x14.txt, gas tank temperature and
pressure, 32 rows, 6 columns;

x15.txt, gas consumption versus
local conditions, 48 rows, 6 columns;

x16.txt, gas consumption versus
local conditions, 48 rows, 7 columns;

x17.txt, octane rating versus
raw materials, 82 rows, 6 columns;

x18.txt, octane rating versus
raw materials, 82 rows, 7 columns;

x19.txt, livestock market expenses,
19 rows, 7 columns;

x20.txt, population and drinking data,
46 rows, 7 columns;

x21.txt, economic and employment data,
16 rows, 8 columns;

x22.txt, economic and employment data,
16 rows, 9 columns;

x23.txt, office worker satisfaction,
30 rows, 8 columns;

x24.txt, office worker satisfaction,
30 rows, 9 columns;

x25.txt, ground evaporation versus
conditions, 25 rows, 9 columns;

x26.txt, selling price of houses,
28 rows, 13 columns;

x27.txt, selling price of houses,
28 rows, 14 columns;

x28.txt, the death rate as a function
of other variables, 60 rows, 17 columns;
More data files you may copy, involving overdetermined linear
systems, include:

x29.txt, points in the plane, 9 rows,
4 columns;

x30.txt, a linear system, 4 rows, 5 columns;

x31.txt, a linear system, 10 rows, 5 columns;

x32.txt, a linear system, 13 rows, 5 columns;

x33.txt, a linear system, 96 rows, 6 columns;

x34.txt, a linear system, 20 rows, 7 columns;

x35.txt, a linear system, 30 rows, 7 columns;

x36.txt, a linear system, 6 rows, 7 columns;

x37.txt, a linear system, 6 rows, 7 columns;

x38.txt, a linear system, 6 rows, 7 columns;

x39.txt, a linear system, 6 rows, 7 columns;

x40.txt, a linear system, 6 rows, 7 columns;

x41.txt, a linear system, 6 rows, 7 columns;

x42.txt, a linear system, 16 rows, 11 columns;

x60.txt, a linear system, 3 rows, 5 columns;
More data files you may copy, involving overdetermined linear systems with
equality and inequality constraints, include:

x43.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;

x43_01.txt, subsystem #1,
4 rows, 2 columns;

x43_02.txt, subsystem #2,
5 rows, 2 columns;

x43_03.txt, subsystem #3,
3 rows, 2 columns;

x44.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;

x44_01.txt, subsystem #1,
4 rows, 2 columns;

x44_02.txt, subsystem #2,
5 rows, 2 columns;

x44_03.txt, subsystem #3,
3 rows, 2 columns;

x45.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;

x45_01.txt, subsystem #1,
4 rows, 2 columns;

x45_02.txt, subsystem #2,
5 rows, 2 columns;

x45_03.txt, subsystem #3,
3 rows, 2 columns;

x46.txt, a linear system, 3 rows, 2 columns;

x47.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 2 columns;

x47_01.txt, the linear system,
3 rows, 2 columns;

x47_02.txt, the equality constraints,
0 rows, 2 columns;

x47_03.txt, the inequality constraints,
2 rows, 2 columns;

x48.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 2 columns;

x48_01.txt, the linear system,
3 rows, 2 columns;

x48_02.txt, the equality constraints,
0 rows, 2 columns;

x48_03.txt, the inequality constraints,
2 rows, 2 columns;

x49.txt, a system made up of
a linear system, equality and inequality constraints,
4 rows, 2 columns;

x49_01.txt, the linear system,
3 rows, 2 columns;

x49_02.txt, the equality constraints,
1 row, 2 columns;

x49_03.txt, the inequality constraints,
0 rows, 2 columns;

x50.txt, a system made up of
a linear system, equality and inequality constraints,
4 rows, 2 columns;

x50_01.txt, the linear system,
3 rows, 2 columns;

x50_02.txt, the equality constraints,
1 row, 2 columns;

x50_03.txt, the inequality constraints,
0 rows, 2 columns;

x51.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;

x51_01.txt, the linear system,
3 rows, 2 columns;

x51_02.txt, the equality constraints,
1 row, 2 columns;

x51_03.txt, the inequality constraints,
2 rows, 2 columns;

x52.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;

x52_01.txt, the linear system,
3 rows, 2 columns;

x52_02.txt, the equality constraints,
1 row, 2 columns;

x52_03.txt, the inequality constraints,
2 rows, 2 columns;

x53.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;

x53_01.txt, the linear system,
3 rows, 2 columns;

x53_02.txt, the equality constraints,
1 row, 2 columns;

x53_03.txt, the inequality constraints,
2 rows, 2 columns;

x54.txt, a system made up of
a linear system, equality and inequality constraints,
13 rows, 5 columns;

x54_01.txt, the linear system,
8 rows, 5 columns;

x54_02.txt, the equality constraints,
3 rows, 5 columns;

x54_03.txt, the inequality constraints,
2 rows, 5 columns;

x55.txt, a system made up of
a linear system, equality and inequality constraints,
14 rows, 7 columns;

x55_01.txt, the linear system,
9 rows, 7 columns;

x55_02.txt, the equality constraints,
0 rows, 7 columns;

x55_03.txt, the inequality constraints,
5 rows, 7 columns;

x56.txt, a system made up of
a linear system, equality and inequality constraints,
11 rows, 5 columns;

x56_01.txt, the linear system,
6 rows, 5 columns;

x56_02.txt, the equality constraints,
0 rows, 5 columns;

x56_03.txt, the inequality constraints,
5 rows, 5 columns;

x57.txt, a system made up of
a linear system, equality and inequality constraints,
11 rows, 5 columns;

x57_01.txt, the linear system,
6 rows, 5 columns;

x57_02.txt, the equality constraints,
0 rows, 5 columns;

x57_03.txt, the inequality constraints,
5 rows, 5 columns;

x58.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 3 columns;

x58_01.txt, the linear system,
3 rows, 3 columns;

x58_02.txt, the equality constraints,
0 rows, 3 columns;

x58_03.txt, the inequality constraints,
2 rows, 3 columns;

x59.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;

x59_01.txt, the linear system,
3 rows, 2 columns;

x59_02.txt, the equality constraints,
1 row, 2 columns;

x59_03.txt, the inequality constraints,
2 rows, 2 columns;

x61.txt, a system made up of
a linear system, equality and inequality constraints,
13 rows, 7 columns;

x61_01.txt, the linear system,
8 rows, 7 columns;

x61_02.txt, the equality constraints,
3 rows, 7 columns;

x61_03.txt, the inequality constraints,
2 rows, 7 columns;
Miscellaneous data files:

x62.txt, 12 measurements of dye concentration
in a liquid over time, 12 rows, 3 columns;
Last revised on 15 July 2011.