REGRESSION
Linear Regression Datasets


REGRESSION is a dataset directory which contains test data for linear regression.

The simplest kind of linear regression takes a set of data points (xi,yi) and tries to determine the "best" linear relationship:


        y = a * x + b
      
Commonly, we look at the vector of errors:

        ei = yi - a * xi - b
      
and look for values (a,b) that minimize the L1, L2 or L-infinity norm of the errors. For problems involving multivariate sets of data, the scalar a becomes a matrix and b becomes a vector, but the idea is similar.
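As an illustration of the error norms described above, here is a minimal Python sketch. It covers only the L2 (least squares) case, using the standard closed-form solution for a single variable, and then reports all three norms of the resulting error vector. The function names `fit_line_l2` and `error_norms` and the sample data are hypothetical, chosen for this example; they are not part of the dataset directory. Minimizing the L1 or L-infinity norm generally needs a different method (for instance, linear programming) and is not shown.

```python
import math


def fit_line_l2(xs, ys):
    """Return (a, b) minimizing the L2 norm of e_i = y_i - a * x_i - b.

    Hypothetical helper: uses the closed-form least squares solution
    for a straight line through one-dimensional data.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


def error_norms(xs, ys, a, b):
    """Return the (L1, L2, L-infinity) norms of the error vector."""
    errors = [y - a * x - b for x, y in zip(xs, ys)]
    l1 = sum(abs(e) for e in errors)
    l2 = math.sqrt(sum(e * e for e in errors))
    linf = max(abs(e) for e in errors)
    return l1, l2, linf


# Made-up sample data, not taken from any file in this directory.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
a, b = fit_line_l2(xs, ys)
l1, l2, linf = error_norms(xs, ys, a, b)
print("a =", a, " b =", b)
print("L1 =", l1, " L2 =", l2, " L-infinity =", linf)
```

The values (a,b) returned here minimize only the L2 norm; the L1 and L-infinity norms printed for that line are generally not the smallest achievable for the same data.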

The data files have a simple format:

There are also some extended examples, which involve an M by N linear system, a set of linear constraints to be satisfied exactly, and a set of linear inequalities. For these examples, a master file lists the sizes of the three sets of data, and the name of the first file, which contains the linear system.

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Reference:

  1. Helmuth Spaeth,
    Mathematical Algorithms for Linear Regression,
    Academic Press, 1991,
    ISBN: 0-12-656460-4.
  2. I Barrodale, F Roberts,
    Algorithm 552: Solution of the Constrained L1 Approximation Problem,
    ACM Transactions on Mathematical Software,
    Volume 6, Number 2, pages 231-235, 1980.

Datasets:

Data files you may copy include:

More data files you may copy, involving overdetermined linear systems, include:

More data files you may copy, involving overdetermined linear systems with equality and inequality constraints, include:



Last revised on 31 August 2005.