\section{Example: The Pizza Truck Problem}

The town of Grayville has three pizza trucks, which are painted red,
green, and blue.  By tradition, every house in Grayville has always
ordered from the same truck, which then sends the delivery by scooter. 
\vskip 0.1in
\begin{center}
\begin{tikzpicture} [ initial_style/.style={rectangle,draw=green,fill=green!10,thick }, scale = { 0.25 } ]
  \node (A) at ( 3,28) {\includegraphics[width=0.5in]{red_house.png} };
  \node (B) at ( 9,16) {\includegraphics[width=0.5in]{red_house.png} };
  \node (C) at (12,26) {\includegraphics[width=0.5in]{red_house.png} };
  \node (D) at ( 4, 4) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (E) at (18,30) {\includegraphics[width=0.5in]{green_house.png} };
  \node (F) at (16,13) {\includegraphics[width=0.5in]{green_house.png} };
  \node (G) at (23,26) {\includegraphics[width=0.5in]{green_house.png} };
  \node (H) at (24, 2) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (I) at (26,10) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (J) at (28,18) {\includegraphics[width=0.5in]{red_house.png} };
  \node (K) at (33,30) {\includegraphics[width=0.5in]{green_house.png} };
  \node (L) at (33,10) {\includegraphics[width=0.5in]{green_house.png} };
  \node (M) at (40,24) {\includegraphics[width=0.5in]{green_house.png} };
  \node (N) at (36, 6) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (O) at (15,20) {\includegraphics[width=0.5in]{blue_house.png} };

  \node (P) at (11,15) {\includegraphics[width=0.75in]{red_pizza.png} };
  \node (Q) at (20,25) {\includegraphics[width=0.75in]{green_pizza.png} };
  \node (R) at (20, 5) {\includegraphics[width=0.75in]{blue_pizza.png} };

  \draw [-,very thick,red]   (A) -- (P);
  \draw [-,very thick,red]   (B) -- (P);
  \draw [-,very thick,red]   (C) -- (P);
  \draw [-,very thick,blue]  (D) -- (R);
  \draw [-,very thick,green] (E) -- (Q);
  \draw [-,very thick,green] (F) -- (Q);
  \draw [-,very thick,green] (G) -- (Q);
  \draw [-,very thick,blue]  (H) -- (R);
  \draw [-,very thick,blue]  (I) -- (R);
  \draw [-,very thick,red]   (J) -- (P);
  \draw [-,very thick,green] (K) -- (Q);
  \draw [-,very thick,green] (L) -- (Q);
  \draw [-,very thick,green] (M) -- (Q);
  \draw [-,very thick,blue]  (N) -- (R);
  \draw [-,very thick,blue]  (O) -- (R);
\end{tikzpicture}
\end{center}

The price of gas has risen, and the owner of the pizza trucks asks
a consultant if there is a way to save money.  The consultant
proposes the following procedure:
\begin{packed_enumerate}
  \item{Each house should be served by the nearest pizza truck.
{\it{(Assign each data item to the nearest centroid.)}}
The owner is impressed by this change, which lowers the monthly 
gas bill.  But is that as good as we can do?  It turns out that
under the new assignment, the trucks are no longer well placed.}
  \item{Each truck should be moved to the center of its service area.
{\it{(Replace each centroid by the average of its data items.)}}
``That has got to be it,'' says the owner.  Not quite: moving
the trucks has left some houses slightly closer to a different
truck than the one they were assigned to.}
  \item{Unless things have settled down, so that no house changes trucks, go back to step \#1.}
\end{packed_enumerate}
\begin{center}
\begin{tikzpicture} [ initial_style/.style={rectangle,draw=green,fill=green!10,thick }, scale = { 0.25 } ]
  \node (A) at ( 3,28) {\includegraphics[width=0.5in]{red_house.png} };
  \node (B) at ( 9,16) {\includegraphics[width=0.5in]{red_house.png} };
  \node (C) at (12,26) {\includegraphics[width=0.5in]{red_house.png} };
  \node (D) at ( 4, 4) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (E) at (18,30) {\includegraphics[width=0.5in]{red_house.png} };
  \node (F) at (16,13) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (G) at (23,26) {\includegraphics[width=0.5in]{green_house.png} };
  \node (H) at (24, 2) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (I) at (26,10) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (J) at (28,18) {\includegraphics[width=0.5in]{green_house.png} };
  \node (K) at (33,30) {\includegraphics[width=0.5in]{green_house.png} };
  \node (L) at (33,10) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (M) at (40,24) {\includegraphics[width=0.5in]{green_house.png} };
  \node (N) at (36, 6) {\includegraphics[width=0.5in]{blue_house.png} };
  \node (O) at (15,20) {\includegraphics[width=0.5in]{red_house.png} };

  \node (P) at (11.4, 24.0) {\includegraphics[width=0.75in]{red_pizza.png} };
  \node (Q) at (31.0, 24.5) {\includegraphics[width=0.75in]{green_pizza.png} };
  \node (R) at (23.1,  7.5) {\includegraphics[width=0.75in]{blue_pizza.png} };

  \draw [-,very thick,red]   (A) -- (P);
  \draw [-,very thick,red]   (B) -- (P);
  \draw [-,very thick,red]   (C) -- (P);
  \draw [-,very thick,blue]  (D) -- (R);
  \draw [-,very thick,red]   (E) -- (P);
  \draw [-,very thick,blue]  (F) -- (R);
  \draw [-,very thick,green] (G) -- (Q);
  \draw [-,very thick,blue]  (H) -- (R);
  \draw [-,very thick,blue]  (I) -- (R);
  \draw [-,very thick,green] (J) -- (Q);
  \draw [-,very thick,green] (K) -- (Q);
  \draw [-,very thick,blue]  (L) -- (R);
  \draw [-,very thick,green] (M) -- (Q);
  \draw [-,very thick,blue]  (N) -- (R);
  \draw [-,very thick,red]   (O) -- (P);
\end{tikzpicture}
\end{center}
This simple example suggests how k-means clustering can reorganize data
so that it is grouped more tightly.  In this case, the regrouping simply
reduces the total travel cost of the deliverers.  In other cases, we
expect that the grouping may reflect some meaningful fact about the data.
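\vskip 0.1in
One way to make ``grouped more tightly'' precise is the objective used in
the usual formulation of k-means.  The notation here is introduced only
for illustration and is not part of the story above: house $i$ sits at
$p_i$, truck $j$ sits at $c_j$, $a(i)$ is the truck serving house $i$,
and $S_j = \{ i : a(i) = j \}$ is the set of houses served by truck $j$.
\[
  J = \sum_{i=1}^{n} \| p_i - c_{a(i)} \|^2
\]
With the trucks held fixed, step 1 picks the assignment that minimizes $J$:
\[
  a(i) = \arg\min_{j} \| p_i - c_j \|
\]
With the assignment held fixed, step 2 picks the truck positions that
minimize $J$:
\[
  c_j = \frac{1}{|S_j|} \sum_{i \in S_j} p_i
\]
Neither step can increase $J$.  Note that $J$ sums squared distances,
while a gas bill is more naturally measured by the sum of the plain
distances $\| p_i - c_{a(i)} \|$; the two measures usually fall together,
but only $J$ is guaranteed not to rise at every step.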

\section{Example: Coding the Pizza Truck Problem}

Assume that the arrays {\tt{x}} and {\tt{y}} contain the coordinates 
of each house, that {\tt{s}} and {\tt{t}} contain the coordinates
of each truck, and that {\tt{rc}}, {\tt{gc}} and {\tt{bc}} list
the houses served by the red, green, and blue trucks respectively.
\vskip 0.1in
Our first improvement is to assign each house to the nearest truck.
To do this, we need to compute the distance of each house to each truck,
and update the assignment vectors.  We can also compute the current cost.
\begin{lstlisting}
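  import numpy as np

  # Distance from every house to the red, green, and blue trucks.
  rd = np.sqrt ( ( x - s[0] )**2 + ( y - t[0] )**2 )
  gd = np.sqrt ( ( x - s[1] )**2 + ( y - t[1] )**2 )
  bd = np.sqrt ( ( x - s[2] )**2 + ( y - t[2] )**2 )

  # Indices of the houses that are strictly closest to each truck.
  rc = np.where ( ( rd < bd ) & ( rd < gd ) )
  gc = np.where ( ( gd < bd ) & ( gd < rd ) )
  bc = np.where ( ( bd < rd ) & ( bd < gd ) )

  # Total travel distance under the new assignment.
  cost = np.sum ( rd[rc] ) + np.sum ( gd[gc] ) + np.sum ( bd[bc] )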
\end{lstlisting}
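\vskip 0.1in
The strict comparisons in the listing leave a house unassigned if it
happens to be exactly the same distance from its two nearest trucks.
As a minimal sketch of one alternative, assuming the same arrays
{\tt{x}}, {\tt{y}}, {\tt{s}} and {\tt{t}}, the distances can be stacked
into one table and each house given to the truck with the smallest entry;
{\tt{np.argmin}} resolves a tie in favor of the lowest truck index.
The function name {\tt{assign\_houses}} and the sample coordinates are
made up for illustration and are not taken from {\it{pizza\_kmeans.py}}.

```python
import numpy as np

def assign_houses(x, y, s, t):
    """Return the index of the nearest truck for each house, and that distance."""
    # dist[k, i] is the distance from truck k to house i (via broadcasting).
    dist = np.sqrt((x[None, :] - s[:, None]) ** 2 + (y[None, :] - t[:, None]) ** 2)
    nearest = np.argmin(dist, axis=0)  # a tie goes to the lowest truck index
    return nearest, dist[nearest, np.arange(x.size)]

# Hypothetical data: four houses and three trucks (0 = red, 1 = green, 2 = blue).
x = np.array([0.0, 1.0, 9.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 8.0])
s = np.array([0.0, 10.0, 5.0])
t = np.array([0.0, 0.0, 10.0])

nearest, d = assign_houses(x, y, s, t)

# Recover the three index lists used in the text, and the total cost.
rc = np.where(nearest == 0)
gc = np.where(nearest == 1)
bc = np.where(nearest == 2)
cost = d.sum()
```

Because every house receives exactly one label, the three index lists
always cover all the houses, and the cost is simply the sum of the
selected distances.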
\vskip 0.1in
Because we have reassigned some houses, it makes sense to move
each truck to the center of its set of houses.  We just have to
average all the coordinates:
\begin{lstlisting}
  s[0] = np.mean ( x[rc] )
  t[0] = np.mean ( y[rc] )
  s[1] = np.mean ( x[gc] )
  t[1] = np.mean ( y[gc] )
  s[2] = np.mean ( x[bc] )
  t[2] = np.mean ( y[bc] )
\end{lstlisting}
Because the trucks have moved, we need to recompute the distances
and update the cost.
\vskip 0.1in
These two steps of reassigning houses and moving trucks are repeated
until no house has to be reassigned, or the cost stops changing.
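\vskip 0.1in
The repetition can be sketched as a single loop.  This is only an
illustration under stated assumptions, not the contents of
{\it{pizza\_kmeans.py}}: the function name {\tt{kmeans\_sketch}}, the
iteration cap {\tt{max\_iter}}, and the rule that a truck with no houses
stays where it is are all choices made here, and the coordinates are
read off the first figure (houses A through O, then the red, green,
and blue trucks).

```python
import numpy as np

def kmeans_sketch(x, y, s0, t0, max_iter=100):
    """Alternate 'reassign houses' and 'move trucks' until no house changes truck."""
    s = np.array(s0, dtype=float)  # copies, so the caller's arrays are untouched
    t = np.array(t0, dtype=float)
    labels = None
    costs = []
    for _ in range(max_iter):
        # Reassign: dist[k, i] is the distance from truck k to house i.
        dist = np.sqrt((x[None, :] - s[:, None]) ** 2 + (y[None, :] - t[:, None]) ** 2)
        new_labels = np.argmin(dist, axis=0)
        costs.append(dist[new_labels, np.arange(x.size)].sum())
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no house was reassigned, so the trucks would not move
        labels = new_labels
        # Move: each truck goes to the average position of its houses.
        for k in range(s.size):
            members = labels == k
            if members.any():  # assumption: an unused truck stays put
                s[k] = x[members].mean()
                t[k] = y[members].mean()
    return labels, s, t, costs

# Assumption: house coordinates A..O as drawn in the first figure.
x = np.array([3, 9, 12, 4, 18, 16, 23, 24, 26, 28, 33, 33, 40, 36, 15], dtype=float)
y = np.array([28, 16, 26, 4, 30, 13, 26, 2, 10, 18, 30, 10, 24, 6, 20], dtype=float)
# Initial truck positions: red, green, blue.
s0 = np.array([11.0, 20.0, 20.0])
t0 = np.array([15.0, 25.0, 5.0])

labels, s, t, costs = kmeans_sketch(x, y, s0, t0)
for step, cost in enumerate(costs, start=1):
    print(f"{step}: {cost:8.2f}")
```

The sketch stops as soon as a reassignment changes nothing, so it records
one cost per reassignment and does not compute the cost of the original,
traditional assignment.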
\vskip 0.1in
For the example problem in the illustration, here is how the cost changes:
\begin{lstlisting}
0:  181.49  Initial
1:  146.29  Reassign houses
2:  117.08  Move trucks, reassign houses
3:  116.73  Move trucks, reassign houses
4:  116.73  Move trucks, reassign houses, NO CHANGE
\end{lstlisting}
\vskip 0.1in
You can examine a simple implementation of this procedure in the file 
{\it{pizza\_kmeans.py}}.