The Average Puzzle
Solution


  1. The minimizing point is the median, if there are an odd number of data points, and any point between x(n/2) and x((n+1)/2) if there are an even number of points. To see this, consider pairing the low and high terms in the sum. For example, if there were six terms, we'd pair them thus:
                |x6 - x*| + |x* - x1|
                |x5 - x*| + |x* - x2|
                |x4 - x*| + |x* - x3|
              
    Now, if, and only if, x* is greater than the low numbers, and smaller than the high ones, we can drop the absolute value signs and get that the sum is (x6-x1) + (x5-x2) + (x4-x3). A little reflection will convince you that you can't do better than this.
  2. The minimizing point is the average. This can be shown by calculus. Set Total(x) to the sum of the squares of distances from each data point to x, and differentiate and set to zero.
  3. The minimizing point is the average of the maximum and minimum values of the data.

Now, suppose we have the same problem, but in two dimensions! That is, suppose we've scattered N points in the plane, (xi,yi), and your job is to find points (x*,y*) that minimize each of the three quantities considered for the one dimensional case. Can you do it?

Before we proceed, note that there are two different ways of extending the problem to 2D. For instance, in the first problem, do we want to minimize the sum of the absolute values of the coordinate differences:

        |xi-x*| + |yi-y*| 
      
or the sum of the distances:
        sqrt ( (xi-x*)2 + (yi-y*)2).
      

If you want to minimize the coordinate differences quantity, then the problem naturally breaks up into two copies of the 1D problem; minimize the x quanitities with x* and the y quantities with y*, so we would just compute the medians. The distance minimization version of the first problem is more interesting.

The third problem also becomes uninteresting if we try to minimize the maximum value of the coordinate differences, but has more meat on it if we minimize the distance.

Therefore, for interesting 2D problems, we ask you, for an arbitrary set of data, to find values (x*,y*) that will minimize

  1. The sum of sqrt ( (xi-x*)2 + (yi-y*)2).
  2. The sum of ( (xi-x*)2 + (yi-y*)2).
  3. The maximum of ( (xi-x*)2 + (yi-y*)2).

For the second quantity, the calculus trick works the same in 2D (and in ND), so the average will always minimize the sum of the squares of the differences.

For the third quantity, it looks like you need to define a "width" for the set of points, that is, the maximum distance between two parallel planes that just contain the points. Once you find the width, you can sensibly place your "average".

You can go back to the main puzzle page.


Last revised on 08 September 2005.