REGRESSION
Linear Regression Datasets


REGRESSION is a dataset directory which contains test data for linear regression.

The simplest kind of linear regression takes a set of data points (xi,yi) and tries to determine the "best" linear relationship:


        y = a * x + b
      
Commonly, we look at the vector of errors:

        ei = yi - a * xi - b
      
and look for values (a,b) that minimize the L1, L2 or L-infinity norm of the errors. For problems involving multivariate sets of data, the scalar a becomes a matrix and b becomes a vector, but the idea is the same.
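As a rough illustration of the idea above, the following Python sketch fits a line in the L2 (least-squares) sense using the closed-form formulas, then reports the L1, L2 and L-infinity norms of the resulting error vector. The function names `fit_line_l2` and `error_norms` and the sample data are made up for this example and do not come from any file in this directory. Only the L2 fit is shown, because minimizing the L1 or L-infinity norm generally requires a linear programming method instead of a closed-form formula.

```python
import math


def fit_line_l2(xs, ys):
    """Return (a, b) minimizing the sum of squared errors of y = a * x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


def error_norms(xs, ys, a, b):
    """Return the (L1, L2, L-infinity) norms of the errors ei = yi - a * xi - b."""
    errors = [y - a * x - b for x, y in zip(xs, ys)]
    l1 = sum(abs(e) for e in errors)
    l2 = math.sqrt(sum(e * e for e in errors))
    linf = max(abs(e) for e in errors)
    return l1, l2, linf


if __name__ == "__main__":
    # Hypothetical sample data, roughly following y = 2 * x + 1.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.1, 2.9, 5.2, 6.8]
    a, b = fit_line_l2(xs, ys)
    l1, l2, linf = error_norms(xs, ys, a, b)
    print("a =", a, "b =", b)
    print("L1 =", l1, "L2 =", l2, "L-infinity =", linf)
```

The same `error_norms` function can be used to compare candidate pairs (a,b) produced by any other method, since it depends only on the error vector.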

The data files have a simple format:

There are also some extended examples, which involve an M by N linear system, a set of linear constraints to be solved exactly, and a set of linear inequalities. In that case, a master file lists the sizes of the three sets of data, and the name of the first file, which contains the linear system.
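One plausible way to write such an extended problem, following the constrained L1 approximation setting of the Barrodale and Roberts reference, is sketched here. The symbols A, g, C, h, E and k are labels chosen for this illustration, not names taken from the data files. The use of the L1 norm and the direction of the inequality are assumptions as well.

```latex
\min_{x \in \mathbb{R}^{N}} \; \| A x - g \|_{1}
\quad \text{subject to} \quad
C x = h, \qquad E x \le k
```

Here A is the M by N matrix of the linear system with right-hand side g, the system C x = h holds the constraints to be solved exactly, and E x <= k holds the linear inequalities.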

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Related Data and Programs:

HARTIGAN, a dataset directory which contains datasets for testing clustering algorithms;

ISWR, a dataset directory which contains datasets used for statistical analysis, particularly with the R language;

MARTINEZ, a dataset directory which contains datasets for computational statistics, including cluster analysis;

MDS, a dataset directory which contains datasets for M-dimensional scaling;

SOKAL_ROHLF, a dataset directory which contains biological datasets considered by Sokal and Rohlf;

STATS, a dataset directory which contains datasets for computational statistics.

References:

  1. I Barrodale, F Roberts,
    Algorithm 552: Solution of the Constrained L1 Approximation Problem,
    ACM Transactions on Mathematical Software,
    Volume 6, Number 2, pages 231-235, 1980.
  2. Richard Gunst, Robert Mason,
    Regression Analysis and Its Applications: a data-oriented approach,
    Dekker, 1980,
    ISBN: 0824769937,
    LC: QA278.2.G85.
  3. David Kahaner, Cleve Moler, Steven Nash,
    Numerical Methods and Software,
    Prentice Hall, 1989,
    ISBN: 0-13-627258-4,
    LC: TA345.K34.
  4. Helmuth Spaeth,
    Mathematical Algorithms for Linear Regression,
    Academic Press, 1991,
    ISBN: 0-12-656460-4.

Datasets:

More data files you may copy, involving overdetermined linear systems, include:

More data files you may copy, involving overdetermined linear systems with equality and inequality constraints, include:

Miscellaneous data files:



Last revised on 15 July 2011.