A Histogram of Discrete Data

**HISTOGRAM_DISCRETE** is a MATLAB program which creates a histogram of discrete data.

The motivation for this study was to explore whether a histogram could be made without artificially imposing bins, so that the original data would be kept and handled directly.

An attempt was made to come up with a sensible formulation of the probability density function (PDF) for data in the range [X_MIN,X_MAX]. Instead of what amounts to a set of Dirac delta functions, or a discrete bin function, a piecewise linear function was devised to emulate the distribution of probability density along the interval. The corresponding piecewise quadratic cumulative distribution function (CDF) was also constructed.
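To make the idea of a piecewise linear PDF with a piecewise quadratic CDF concrete, here is a minimal sketch of one possible construction. It is an assumption for illustration, and not necessarily the formulation used in pdf_discrete_value.m or cdf_discrete_value.m: each of the N sorted, distinct samples is given probability mass 1/N, spread as a "hat" function over the gaps to its neighbors. The sample values and all variable names below are hypothetical.

```matlab
% Hypothetical sketch: piecewise linear PDF and piecewise quadratic CDF
% built from sorted, distinct samples. Not the package's own code.
x = sort([0.1, 0.4, 0.5, 0.9, 1.6]);    % assumed sample data, no repeats
n = numel(x);

% Width of the support of the hat function centered at each sample.
% End samples get a half hat over the single adjacent gap.
w = [x(2) - x(1), x(3:n) - x(1:n-2), x(n) - x(n-1)];

% Nodal PDF values, chosen so that each hat has area 1/n.
h = (2 / n) ./ w;

% CDF at the samples; the trapezoid rule is exact for a piecewise linear PDF.
c = [0, cumsum(0.5 * (h(1:n-1) + h(2:n)) .* diff(x))];

% Evaluate both functions at arbitrary points in [X_MIN, X_MAX].
xq = linspace(x(1), x(n), 201);
pdf_q = interp1(x, h, xq);               % linear between samples

% Index of the interval [x(k), x(k+1)] containing each query point.
k = min(max(floor(interp1(x, 1:n, xq)), 1), n - 1);

% Integrate the linear PDF segment from the left end of its interval.
dx = xq - x(k);
slope = (h(k+1) - h(k)) ./ (x(k+1) - x(k));
cdf_q = c(k) + h(k) .* dx + 0.5 * slope .* dx.^2;
```

With this choice the PDF integrates to 1 over [X_MIN,X_MAX], and the CDF rises from 0 to 1. Because the nodal values are inversely proportional to the local spacing of the data, closely spaced samples produce tall spikes, which is one way the ragged PDF plots described here can arise.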

Because we don't use bins, we can plot the PDF and CDF at any resolution we like. When we do so, the CDF looks reasonable, but the PDF does not; a particularly unpleasant example occurs in the Gaussian test. This shows that binning has the advantage of smoothing the PDF data, whereas our piecewise linear function is too jumpy. Presumably, one cure would be to retain the piecewise linear PDF while essentially binning the data dynamically, according to the resolution of the requested plot. This remains work for the future.
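One possible reading of "binning dynamically" can be sketched as follows: keep a piecewise linear PDF, but plot its average over each plot cell, obtained by differencing the CDF at the cell edges. This is only a guess at an approach, not something the package implements. The hat-function construction, the synthetic Gaussian samples, the cell count M, and all variable names are assumptions made for illustration.

```matlab
% Hypothetical sketch: smooth a ragged piecewise linear PDF by averaging
% it over M plot cells. Assumes sorted, distinct samples.
x = sort(randn(1, 500));                 % synthetic Gaussian samples
n = numel(x);

% Piecewise linear PDF: each sample carries mass 1/n as a hat function.
w = [x(2) - x(1), x(3:n) - x(1:n-2), x(n) - x(n-1)];
h = (2 / n) ./ w;                        % nodal PDF values
c = [0, cumsum(0.5 * (h(1:n-1) + h(2:n)) .* diff(x))];   % CDF at samples

% Cell edges are set by the plot resolution, not by the data.
m = 40;
edges = linspace(x(1), x(n), m + 1);

% Evaluate the piecewise quadratic CDF at the cell edges.
k = min(max(floor(interp1(x, 1:n, edges)), 1), n - 1);
dx = edges - x(k);
slope = (h(k+1) - h(k)) ./ (x(k+1) - x(k));
cdf_e = c(k) + h(k) .* dx + 0.5 * slope .* dx.^2;

% Average density over each cell: CDF increment divided by cell width.
p_cell = diff(cdf_e) ./ diff(edges);

% To view the result, one could call: stairs(edges, [p_cell, p_cell(end)])
```

Changing M changes only the plot, since the underlying PDF and CDF are untouched. The cell averages still integrate to 1 because they are differences of the same CDF.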

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

**HISTOGRAM_DISCRETE** is available in a MATLAB version.

Related Data and Programs:

HISTOGRAM_DISPLAY, a MATLAB program which makes a bar plot of a set of data stored as columns in a file; the first column is the X values, and all the other columns are Y values to be shown as a stack of bars.

PWC_PLOT, a MATLAB library which converts the definition of a piecewise constant function into plottable data.

Source Code:

- cdf_discrete_value.m, evaluates the discrete cdf.
- pdf_discrete_value.m, evaluates the discrete pdf.
- r8vec_print.m, prints an R8VEC.
- r8vec2_print.m, prints two R8VEC's.
- r8vec3_print.m, prints three R8VEC's.
- setup_discrete.m, takes a vector of sample data and prepares it for use as a discrete histogram.
- timestamp.m, prints the current YMDHMS date as a timestamp.

Examples and Tests:

- histogram_discrete_test.m, calls all the tests.
- histogram_discrete_test_output.txt, the output file.
- bigger_test.m, tests pdf_discrete_value() on a set of data whose size is input.
- gaussian_test.m, looks at a set of data from a Gaussian distribution. Here, we see that the PDF plot looks very ragged, although the CDF is just fine. It is clear that the PDF data needs to be smoothed in some way, which is what binning does, for instance.
- gaussian_cdf_discrete_test.png, a plot of the CDF.
- gaussian_pdf_discrete_test.png, a plot of the PDF.
- cdf_discrete_test.m, tests cdf_discrete_value() on a small set of data.
- cdf_discrete_test.png, a plot of the CDF.
- pdf_discrete_test.m, tests pdf_discrete_value() on a small set of data.
- pdf_discrete_test.png, a plot of the PDF.
- setup_discrete_test.m, tests setup_discrete().
