REGRESSION
Linear Regression Datasets
REGRESSION
is a dataset directory which
contains test data for
linear regression.
The simplest kind of linear regression involves taking a
set of data (xi,yi), and trying
to determine the "best" linear relationship
y = a * x + b
Commonly, we look at the vector of errors:
ei = yi - a * xi - b
and look for values (a,b) that minimize the L1,
L2 or L-infinity norm of the errors. For problems
involving multivariate sets of data, the number a
becomes a matrix, and b a vector, but the idea
is similar.
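As a minimal sketch of the idea above, the following Python code fits y = a * x + b in the L2 (least squares) sense using the standard closed-form formulas, and then reports the L1, L2 and L-infinity norms of the error vector. The function names fit_line_l2 and error_norms and the sample points are hypothetical, made up for illustration; they are not taken from any file in this directory. Minimizing the L1 or L-infinity norm generally requires a different method (for example, linear programming), which is not shown here.

```python
import math


def fit_line_l2(x, y):
    """Return (a, b) minimizing the L2 norm of e_i = y_i - a*x_i - b."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Centered sums of squares and cross products.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    if sxx == 0.0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx
    b = ybar - a * xbar
    return a, b


def error_norms(x, y, a, b):
    """Return the L1, L2 and L-infinity norms of the error vector."""
    e = [yi - a * xi - b for xi, yi in zip(x, y)]
    l1 = sum(abs(ei) for ei in e)
    l2 = math.sqrt(sum(ei * ei for ei in e))
    linf = max(abs(ei) for ei in e)
    return l1, l2, linf


if __name__ == "__main__":
    # Made-up sample points lying roughly along y = 2*x + 1.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.1, 2.9, 5.2, 6.8, 9.0]
    a, b = fit_line_l2(x, y)
    print("a =", a, "b =", b)
    print("L1, L2, L-infinity norms:", error_norms(x, y, a, b))
```

The same pair (a,b) will usually not minimize all three norms at once, which is why the choice of norm matters when comparing fits.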
The data files have a simple format:
-
initial comment lines, each beginning with a "#".
-
the number of columns of data;
-
the number of rows of data;
-
for each column of data, a line containing a column label;
the first column is always "Index" and counts the rows;
if there is a column labeled "A0" it usually contains the
value 1.0;
-
each row of data, on a separate line, with data separated by spaces.
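The format described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the function name parse_regression_file and the sample text are hypothetical, and the code assumes that each count is the first whitespace-separated token on its line (so a line such as "3" or "3 columns" would both be accepted) and that each column label occupies exactly one line.

```python
def parse_regression_file(text):
    """Parse the text of a data file into (labels, rows)."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Skip the initial comment lines, each beginning with "#".
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    body = lines[start:]

    # Number of columns, then number of rows (assumed to be the first token).
    ncols = int(body[0].split()[0])
    nrows = int(body[1].split()[0])

    # One label line per column, followed by the rows of data.
    labels = body[2:2 + ncols]
    data_lines = body[2 + ncols:2 + ncols + nrows]
    rows = [[float(v) for v in ln.split()] for ln in data_lines]

    if len(rows) != nrows or any(len(r) != ncols for r in rows):
        raise ValueError("file contents do not match the declared size")
    return labels, rows


# A made-up file in the described format (not one of the datasets listed here).
SAMPLE = """\
# hypothetical sample file
#
3
2
Index
A0
X
1 1.0 2.5
2 1.0 3.5
"""

if __name__ == "__main__":
    labels, rows = parse_regression_file(SAMPLE)
    print(labels)
    print(rows)
```

The rows are returned as lists of floats, so the "Index" column comes back as 1.0, 2.0, and so on; a caller that only needs the data would typically discard that first column.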
There are also some extended examples, each of which involves an M by N linear
system, a set of linear constraints to be solved exactly, and
a set of linear inequalities. In those cases, a master file
lists the sizes of the three sets of data, and the name of
the first file, which contains the linear system.
Licensing:
The computer code and data files described and made available on this web page
are distributed under
the GNU LGPL license.
Related Data and Programs:
HARTIGAN,
a dataset directory which
contains datasets for testing clustering algorithms;
ISWR,
a dataset directory which
contains datasets used for statistical analysis, particularly with the R language;
MARTINEZ,
a dataset directory which
contains datasets for computational statistics,
including cluster analysis;
MDS,
a dataset directory which
contains datasets for M-dimensional scaling;
SOKAL_ROHLF,
a dataset directory which
contains biological datasets considered by Sokal and Rohlf;
STATS,
a dataset directory which
contains datasets for computational statistics;
Reference:
-
I Barrodale, F Roberts,
Algorithm 552:
Solution of the Constrained L1 Approximation Problem,
ACM Transactions on Mathematical Software,
Volume 6, Number 2, pages 231-235, 1980.
-
Richard Gunst, Robert Mason,
Regression Analysis and Its Applications: a data-oriented approach,
Dekker, 1980,
ISBN: 0824769937,
LC: QA278.2.G85.
-
David Kahaner, Cleve Moler, Steven Nash,
Numerical Methods and Software,
Prentice Hall, 1989,
ISBN: 0-13-627258-4,
LC: TA345.K34.
-
Helmuth Spaeth,
Mathematical Algorithms for Linear Regression,
Academic Press, 1991,
ISBN: 0-12-656460-4.
Datasets:
-
x01.txt, brain and body weight, 62 rows,
3 columns;
-
x02.txt, height, weight, catheter length,
12 rows, 4 columns;
-
x03.txt, age, blood pressure, 30 rows,
4 columns;
-
x04.txt, catalog print run versus orders,
38 rows, 4 columns;
-
x05.txt, catalog print run versus orders,
38 rows, 5 columns;
-
x06.txt, age, water temperature, length
of fish, 44 rows, 4 columns;
-
x07.txt, retardation, doctor distrust,
degree of illness, 53 rows, 4 columns;
-
x08.txt, poverty, unemployment, murder
rate, 20 rows, 5 columns;
-
x09.txt, age, weight, blood fat,
25 rows, 5 columns;
-
x10.txt, factory operation parameters,
percent of unprocessed ammonia, 21 rows, 5 columns;
-
x11.txt, pasturage properties and price,
67 rows, 5 columns;
-
x12.txt, electrical utility data,
16 rows, 6 columns;
-
x13.txt, production, imports, and
consumption data, 18 rows, 7 columns;
-
x14.txt, gas tank temperature and
pressure, 32 rows, 6 columns;
-
x15.txt, gas consumption versus
local conditions, 48 rows, 6 columns;
-
x16.txt, gas consumption versus
local conditions, 48 rows, 7 columns;
-
x17.txt, octane rating versus
raw materials, 82 rows, 6 columns;
-
x18.txt, octane rating versus
raw materials, 82 rows, 7 columns;
-
x19.txt, livestock market expenses,
19 rows, 7 columns;
-
x20.txt, population and drinking data,
46 rows, 7 columns;
-
x21.txt, economic and employment data,
16 rows, 8 columns;
-
x22.txt, economic and employment data,
16 rows, 9 columns;
-
x23.txt, office worker satisfaction,
30 rows, 8 columns;
-
x24.txt, office worker satisfaction,
30 rows, 9 columns;
-
x25.txt, ground evaporation versus
conditions, 25 rows, 9 columns;
-
x26.txt, selling price of houses,
28 rows, 13 columns;
-
x27.txt, selling price of houses,
28 rows, 14 columns;
-
x28.txt, the death rate as a function
of other variables, 60 rows, 17 columns;
More data files you may copy, involving overdetermined linear
systems, include:
-
x29.txt, points in the plane, 9 rows,
4 columns;
-
x30.txt, a linear system, 4 rows, 5 columns;
-
x31.txt, a linear system, 10 rows, 5 columns;
-
x32.txt, a linear system, 13 rows, 5 columns;
-
x33.txt, a linear system, 96 rows, 6 columns;
-
x34.txt, a linear system, 20 rows, 7 columns;
-
x35.txt, a linear system, 30 rows, 7 columns;
-
x36.txt, a linear system, 6 rows, 7 columns;
-
x37.txt, a linear system, 6 rows, 7 columns;
-
x38.txt, a linear system, 6 rows, 7 columns;
-
x39.txt, a linear system, 6 rows, 7 columns;
-
x40.txt, a linear system, 6 rows, 7 columns;
-
x41.txt, a linear system, 6 rows, 7 columns;
-
x42.txt, a linear system, 16 rows, 11 columns;
-
x60.txt, a linear system, 3 rows, 5 columns;
More data files you may copy, involving overdetermined linear systems with
equality and inequality constraints, include:
-
x43.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;
-
x43_01.txt, subsystem #1,
4 rows, 2 columns;
-
x43_02.txt, subsystem #2,
5 rows, 2 columns;
-
x43_03.txt, subsystem #3,
3 rows, 2 columns;
-
x44.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;
-
x44_01.txt, subsystem #1,
4 rows, 2 columns;
-
x44_02.txt, subsystem #2,
5 rows, 2 columns;
-
x44_03.txt, subsystem #3,
3 rows, 2 columns;
-
x45.txt, a linear system made up of
3 subsystems, 12 rows, 2 columns;
-
x45_01.txt, subsystem #1,
4 rows, 2 columns;
-
x45_02.txt, subsystem #2,
5 rows, 2 columns;
-
x45_03.txt, subsystem #3,
3 rows, 2 columns;
-
x46.txt, a linear system, 3 rows, 2 columns;
-
x47.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 2 columns;
-
x47_01.txt, the linear system,
3 rows, 2 columns;
-
x47_02.txt, the equality constraints,
0 rows, 2 columns;
-
x47_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x48.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 2 columns;
-
x48_01.txt, the linear system,
3 rows, 2 columns;
-
x48_02.txt, the equality constraints,
0 rows, 2 columns;
-
x48_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x49.txt, a system made up of
a linear system, equality and inequality constraints,
4 rows, 2 columns;
-
x49_01.txt, the linear system,
3 rows, 2 columns;
-
x49_02.txt, the equality constraints,
1 row, 2 columns;
-
x49_03.txt, the inequality constraints,
0 rows, 2 columns;
-
x50.txt, a system made up of
a linear system, equality and inequality constraints,
4 rows, 2 columns;
-
x50_01.txt, the linear system,
3 rows, 2 columns;
-
x50_02.txt, the equality constraints,
1 row, 2 columns;
-
x50_03.txt, the inequality constraints,
0 rows, 2 columns;
-
x51.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;
-
x51_01.txt, the linear system,
3 rows, 2 columns;
-
x51_02.txt, the equality constraints,
1 row, 2 columns;
-
x51_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x52.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;
-
x52_01.txt, the linear system,
3 rows, 2 columns;
-
x52_02.txt, the equality constraints,
1 row, 2 columns;
-
x52_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x53.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;
-
x53_01.txt, the linear system,
3 rows, 2 columns;
-
x53_02.txt, the equality constraints,
1 row, 2 columns;
-
x53_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x54.txt, a system made up of
a linear system, equality and inequality constraints,
13 rows, 5 columns;
-
x54_01.txt, the linear system,
8 rows, 5 columns;
-
x54_02.txt, the equality constraints,
3 rows, 5 columns;
-
x54_03.txt, the inequality constraints,
2 rows, 5 columns;
-
x55.txt, a system made up of
a linear system, equality and inequality constraints,
14 rows, 7 columns;
-
x55_01.txt, the linear system,
9 rows, 7 columns;
-
x55_02.txt, the equality constraints,
0 rows, 7 columns;
-
x55_03.txt, the inequality constraints,
5 rows, 7 columns;
-
x56.txt, a system made up of
a linear system, equality and inequality constraints,
11 rows, 5 columns;
-
x56_01.txt, the linear system,
6 rows, 5 columns;
-
x56_02.txt, the equality constraints,
0 rows, 5 columns;
-
x56_03.txt, the inequality constraints,
5 rows, 5 columns;
-
x57.txt, a system made up of
a linear system, equality and inequality constraints,
11 rows, 5 columns;
-
x57_01.txt, the linear system,
6 rows, 5 columns;
-
x57_02.txt, the equality constraints,
0 rows, 5 columns;
-
x57_03.txt, the inequality constraints,
5 rows, 5 columns;
-
x58.txt, a system made up of
a linear system, equality and inequality constraints,
5 rows, 3 columns;
-
x58_01.txt, the linear system,
3 rows, 3 columns;
-
x58_02.txt, the equality constraints,
0 rows, 3 columns;
-
x58_03.txt, the inequality constraints,
2 rows, 3 columns;
-
x59.txt, a system made up of
a linear system, equality and inequality constraints,
6 rows, 2 columns;
-
x59_01.txt, the linear system,
3 rows, 2 columns;
-
x59_02.txt, the equality constraints,
1 row, 2 columns;
-
x59_03.txt, the inequality constraints,
2 rows, 2 columns;
-
x61.txt, a system made up of
a linear system, equality and inequality constraints,
13 rows, 7 columns;
-
x61_01.txt, the linear system,
8 rows, 7 columns;
-
x61_02.txt, the equality constraints,
3 rows, 7 columns;
-
x61_03.txt, the inequality constraints,
2 rows, 7 columns;
Miscellaneous data files:
-
x62.txt, 12 measurements of dye concentration
in a liquid over time, 12 rows, 3 columns;
Last revised on 15 July 2011.