{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "cvt_census.ipynb\n", "\n", "Discussion: Compute Florida congressional districts using census data.\n", "\n", "Licensing: This code is distributed under the GNU LGPL license.\n", " \n", "Modified: 14 November 2016\n", "\n", "Author: John Burkardt, Lukas Bystricky" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using matplotlib backend: agg\n" ] } ], "source": [ "# Import necessary libraries and set plot option\n", "%matplotlib\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'svg'\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import math\n", "import scipy.spatial as spatial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 27 Florida Congressional Districts from Census Data #\n", "\n", "Legally, Florida must be divided up into 27 congressional districts of roughly equal population. This process can be manipulated to favor a particular party or candidate. It is interesting to investigate the shape of districts that might be computed by an approach inspired by CVT's. \n", "\n", "We will look at:\n", "* how to read a file containing usable census data;\n", "* how to set up the PDF and CDF data from the census data;\n", "* how to use the CDF data to randomly select a single person at random in Florida;\n", "* how to sample N persons from Florida, with their latitudes and longitudes;\n", "* how to initialize each congressional district's \"generator\" point;\n", "* how to compute the nearest generator point to each selected person;\n", "* how to carry out a CVT iteration that, on each step, selects N random persons, assigns them to nearest generator, and replaces generator by centroid." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Get Florida Polygon Vertices #\n", "\n", "We will want to display some data on a map of Florida, so once again we need to\n", "read the shape of the Florida polygon. We will save the data in FLA_LON and\n", "FLA_LAT." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 -81.5000 30.7200\n", " 1 -81.4800 30.6800\n", " 2 -81.4500 30.7000\n", " 3 -81.4300 30.6800\n", " 4 -81.4500 30.5700\n", " 5 -81.4300 30.5200\n", " 6 -81.4700 30.5500\n", " 7 -81.4800 30.6200\n", " 8 -81.5000 30.6200\n", " 9 -81.5000 30.5500\n", "...\n", " 560 -81.7800 30.7700\n", " 561 -81.7300 30.7700\n", " 562 -81.7200 30.7500\n", " 563 -81.6800 30.7300\n", " 564 -81.6500 30.7300\n", " 565 -81.6300 30.7300\n", " 566 -81.6200 30.7200\n", " 567 -81.5500 30.7000\n", " 568 -81.5200 30.7200\n", " 569 -81.5000 30.7200\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Code to read and display longitude and latitude for Florida polygon vertices\n", "\n", "def florida_shape_read ( ):\n", "\n", " import numpy as np\n", "\n", " input = open ( 'florida_shape.txt', 'r' )\n", "\n", " fla_lon = np.zeros ( 570 )\n", " fla_lat = np.zeros ( 570 )\n", "\n", " i = 0\n", " for line in input:\n", "\n", " if ( line[0] == '#' ):\n", " continue\n", " else:\n", " data = line.split ( )\n", " fla_lon[i] = data[0]\n", " fla_lat[i] = data[1]\n", " i = i + 1\n", "\n", " input.close ( )\n", "\n", " return fla_lon, fla_lat\n", "#\n", "# Test the function.\n", "#\n", "fla_lon, fla_lat = florida_shape_read ( )\n", "for i in range ( 0, 10 ):\n", " print ( ' %3d %8.4f %8.4f' % ( i, fla_lon[i], fla_lat[i] ) ) \n", "print ( '...')\n", "for i in range ( 560, 570 ):\n", " print ( ' %3d %8.4f %8.4f' % ( i, fla_lon[i], fla_lat[i] ) ) \n", "\n", "plt.plot ( fla_lon, fla_lat, 'b-' )\n", "plt.axis ( 'Equal')\n", "plt.title ( 'The outline of Florida')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# A Census Data File#\n", "\n", "The file \"florida_census.txt\" is available from the web page \n", "http://people.sc.fsu.edu/~jburkardt/classes/urop_2016/florida_census.txt\n", "\n", "For 4,245 locations in Florida, the US Census Bureau recorded an identifying\n", "ID number (which we will ignore), the population, the longitude and latitude.\n", "\n", "Note that this file is not the original data from the Census Bureau website.\n", "Many piece of information were removed to reduce the file size and simplify\n", "the processing step. Also, a number of legitimate locations (especially in\n", "Key West!) fall outside the polygon that we use to represent the shape of Florida.\n", "In order to avoid some confusing results, we set the populations of those\n", "sites to zero!\n", "\n", "Now we need to read this file and make its data available to our Python codes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " I Pop Lon Lat\n", "\n", " 0 7431 -82.3334 29.6511\n", " 1 3308 -82.3314 29.6680\n", " 2 2770 -82.3301 29.6842\n", " 3 5665 -82.3081 29.6795\n", " 4 4976 -82.3169 29.6568\n", " 5 4350 -82.2885 29.6558\n", " 6 6075 -82.2902 29.6317\n", " 7 2180 -82.3322 29.6390\n", " 8 3630 -82.3335 29.6274\n", " 9 1364 -82.3493 29.6305\n", "...\n", " 4235 2506 -86.0550 30.3097\n", " 4236 7367 -86.3621 30.3869\n", " 4237 0 -86.1947 30.3021\n", " 4238 1651 -85.5683 30.7151\n", " 4239 3335 -85.5911 30.7873\n", " 4240 3228 -85.4889 30.7741\n", " 4241 3624 -85.7855 30.6721\n", " 4242 2282 -85.5749 30.5019\n", " 4243 6615 -85.7626 30.5286\n", " 4244 4161 -85.5266 30.6417\n" ] } ], "source": [ "# Code to read the census data file.\n", "\n", "def florida_census_read ( ):\n", "\n", " import numpy as np\n", "\n", " input = open ( 'florida_census.txt', 'r' )\n", "\n", " m = 4245\n", " cen_pop = np.zeros ( m, dtype = np.int32 )\n", " cen_lon = np.zeros ( m, dtype = np.float64 )\n", " cen_lat = np.zeros ( m, dtype = np.float64 )\n", "\n", " i = 0\n", " for line in input:\n", "\n", " data = line.split ( )\n", " id = data[0]\n", " cen_pop[i] = data[1]\n", " cen_lon[i] = data[2]\n", " cen_lat[i] = data[3]\n", " i = i + 1\n", "\n", " input.close ( )\n", "\n", " return cen_pop, cen_lon, cen_lat\n", "#\n", "# Test the function.\n", "#\n", "cen_pop, cen_lon, cen_lat = florida_census_read ( )\n", "m = len ( cen_pop )\n", "\n", "print ( '' )\n", "print ( ' I Pop Lon Lat' )\n", "print ( '' )\n", "for i in range ( 0, 10 ):\n", " print ( ' %3d %6d %8.4f %8.4f' % ( i, cen_pop[i], cen_lon[i], cen_lat[i] ) ) \n", "print ( '...')\n", "for i in range ( m - 10, m ):\n", " print ( ' %3d %6d %8.4f %8.4f' % ( i, cen_pop[i], cen_lon[i], cen_lat[i] ) ) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up PDF and CDF from Census Data #\n", "\n", "We have seen earlier that we could sample points from a polygon by computing\n", "areas of triangles and creating PDF and CDF files that represented the \n", "probability and cumulative probability functions. The population data can\n", "be used in a similar way.\n", "\n", "Since we have 4,245 locations, the probability of picking a person at a particular\n", "location is simply the population of that location, divided by the total population.\n", "So we need to create an array called PDF, so that\n", "\n", " pdf[i] = cen_pop[i] / sum ( cen_pop[0:4245] ) (Use REAL arithmetic!)\n", " \n", "The CDF array will record the probability of picking a person from location I,\n", "or any earlier index. So\n", "\n", " cdf[0] = pdf[0] for i = 0\n", " cdf[i] = pdf[i] + cdf[i-1] for i > 0\n", "\n", "If things are done correctly, then cdf[4244] should be 1.0, or very close to it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " I Pop Lon Lat PDF CDF\n", "\n", " 0 7431 -82.3334 29.6511 0.0004105 0.0004\n", " 1 3308 -82.3314 29.6680 0.0001827 0.0006\n", " 2 2770 -82.3301 29.6842 0.000153 0.0007\n", " 3 5665 -82.3081 29.6795 0.0003129 0.0011\n", " 4 4976 -82.3169 29.6568 0.0002749 0.0013\n", " 5 4350 -82.2885 29.6558 0.0002403 0.0016\n", " 6 6075 -82.2902 29.6317 0.0003356 0.0019\n", " 7 2180 -82.3322 29.6390 0.0001204 0.0020\n", " 8 3630 -82.3335 29.6274 0.0002005 0.0022\n", " 9 1364 -82.3493 29.6305 7.534e-05 0.0023\n", "...\n", " 4235 2506 -86.0550 30.3097 0.0001384 0.9982\n", " 4236 7367 -86.3621 30.3869 0.0004069 0.9986\n", " 4237 0 -86.1947 30.3021 0 0.9986\n", " 4238 1651 -85.5683 30.7151 9.119e-05 0.9987\n", " 4239 3335 -85.5911 30.7873 0.0001842 0.9989\n", " 4240 3228 -85.4889 30.7741 0.0001783 0.9991\n", " 4241 3624 -85.7855 30.6721 0.0002002 0.9993\n", " 4242 2282 -85.5749 30.5019 0.000126 0.9994\n", " 4243 6615 -85.7626 30.5286 0.0003654 0.9998\n", " 4244 4161 -85.5266 30.6417 0.0002298 1.0000\n" ] } ], "source": [ "# Code to set up PDF and CDF from Census Data #\n", "\n", "m = len ( cen_pop )\n", "cen_pop_sum = np.sum ( cen_pop )\n", "pdf = cen_pop.copy ( ) / float ( cen_pop_sum )\n", "cdf = np.zeros ( m )\n", "\n", "cdf[0] = pdf[0]\n", "for i in range ( 1, m ):\n", " cdf[i] = pdf[i] + cdf[i-1]\n", "\n", "print ( '' )\n", "print ( ' I Pop Lon Lat PDF CDF' )\n", "print ( '' )\n", "for i in range ( 0, 10 ):\n", " print ( ' %3d %6d %8.4f %8.4f %8.4g %8.4f' % ( i, cen_pop[i], cen_lon[i], cen_lat[i], pdf[i], cdf[i] ) ) \n", "print ( '...')\n", "for i in range ( m - 10, m ):\n", " print ( ' %3d %6d %8.4f %8.4f %8.4g %8.4f' % ( i, cen_pop[i], cen_lon[i], cen_lat[i], pdf[i], cdf[i] ) ) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Selecting Random People #\n", "\n", "The CDF function makes it easy to select N random people from Florida\n", "by location. Any random number R between 0 and 1 can be bracketed between\n", "CDF[I-1] and CDF[I], meaning we should pick a person from location I.\n", "Since we will assume that all the people at location I are at exactly the\n", "same spot, CEN_LON[I], CEN_LAT[I], we know everything we want to about the person now." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Code for selecting N random people\n", "\n", "def florida_sample ( n ):\n", " indx = np.zeros ( n, dtype = np.int32 )\n", " m = len ( cen_lon )\n", " for i in range ( 0, n ):\n", " r = np.random.rand ( )\n", " for j in range ( 0, m ):\n", " if ( r <= cdf[j] ):\n", " indx[i] = j\n", " break\n", "\n", " return indx\n", "#\n", "# Test the code.\n", "#\n", "n = 100\n", "indx = florida_sample ( n )\n", "plt.plot ( fla_lon, fla_lat, 'b-' )\n", "plt.plot ( cen_lon[indx], cen_lat[indx], 'ro' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initialize generators for Congressional Districts #\n", "\n", "A CVT iteration needs starting values for its generators.\n", "We have a list of 27 cities, each of which is inside a\n", "particular Congressional district. Use these as initial\n", "values for G.\n", "\n", " Name Longitude Latitude\n", " ------------------ ---------- ---------\n", " Chumuckla -87.237222 30.776389\n", " Tallahassee -84.253333 30.455000\n", " Gainesville -82.324722 29.651944\n", " Callahan -81.830833 30.560833\n", " Jacksonville -81.661389 30.336944\n", " Jacksonville Beach -81.396111 30.284167\n", " Winter Park -81.346667 28.596111\n", " Rockledge -80.732778 28.325000\n", " Kissimmee -81.412778 28.303889\n", " Orlando -81.298889 28.415833\n", " Spring Hill -82.547778 28.478889\n", " Palm Harbor -82.753889 28.083889\n", " Indian Shores -82.843333 27.850556\n", " Tampa -82.476389 27.968056\n", " Lakeland -81.958889 28.041111\n", " Longboat Key -82.644722 27.396944\n", " Okeechobee -80.833333 27.233333\n", " Jupiter -80.105000 26.926111\n", " Bonita Springs -81.790833 26.349722\n", " Miramar -80.282500 25.978889\n", " Boca Raton -80.100000 26.368611\n", " West Palm Beach -80.064167 26.709722\n", " Weston -80.388056 26.107500\n", " Miami Gardens -80.269722 25.941944\n", " Hialeah -80.293889 25.860556\n", " Kendall -80.356667 25.666667\n", " Miami -80.208889 25.775278" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "g27 = np.array ( [ \\\n", " [ -87.237222, 30.776389], \\\n", " [ -84.253333, 30.455000], \\\n", " [ -82.324722, 29.651944], \\\n", " [ -81.830833, 30.560833], \\\n", " [ -81.661389, 30.336944], \\\n", " [ -81.396111, 30.284167], \\\n", " [ -81.346667, 28.596111], \\\n", " [ -80.732778, 28.325000], \\\n", " [ -81.412778, 28.303889], \\\n", " [ -81.298889, 28.415833], \\\n", " [ -82.547778, 28.478889], \\\n", " [ -82.753889, 28.083889], \\\n", " [ -82.843333, 27.850556], \\\n", " [ -82.476389, 27.968056], \\\n", " [ -81.958889, 28.041111], \\\n", " [ -82.644722, 27.396944], \\\n", " [ -80.833333, 27.233333], \\\n", " [ -80.105000, 26.926111], \\\n", " [ -81.790833, 26.349722], \\\n", " [ -80.282500, 25.978889], \\\n", " [ -80.100000, 26.368611], \\\n", " [ -80.064167, 26.709722], \\\n", " [ -80.388056, 26.107500], \\\n", " [ -80.269722, 25.941944], \\\n", " [ -80.293889, 25.860556], \\\n", " [ -80.356667, 25.666667], \\\n", " [ -80.208889, 25.775278] ] )\n", "#\n", "# Test the code.\n", "#\n", "plt.plot ( fla_lon, fla_lat, 'b-' )\n", "plt.plot ( g27[:,0], g27[:,1], 'ro' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# CVT step for Florida #\n", "\n", "Now we can write the CVT step for Florida.\n", "\n", "Given M current generators G, and using N sample points, we want to\n", "estimate the M centroids C of the corresponding Voronoi regions.\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-86.86943784 30.48801892]\n", " [-84.50954615 30.41549487]\n", " [-82.33535417 29.59923542]\n", " [-81.85998 30.45498 ]\n", " [-81.70468333 30.27136389]\n", " [-81.4043125 30.1104625 ]\n", " [-81.39345085 28.78659068]\n", " [-80.68914667 28.20332 ]\n", " [-81.43954815 28.30446296]\n", " [-81.27099412 28.44352353]\n", " [-82.43539091 28.60199394]\n", " [-82.68866875 28.14381875]\n", " [-82.73148667 27.83667333]\n", " [-82.41316923 27.96674808]\n", " [-81.89438718 28.00787949]\n", " [-82.40257692 27.27038077]\n", " [-80.493955 27.286065 ]\n", " [-80.24657586 27.15567931]\n", " [-81.80344462 26.46772308]\n", " [-80.22822162 26.03854054]\n", " [-80.14554643 26.33770179]\n", " [-80.13587297 26.67935135]\n", " [-80.32211842 26.15567632]\n", " [-80.21556 25.943148 ]\n", " [-80.31535625 25.84950625]\n", " [-80.38364878 25.67422927]\n", " [-80.20846667 25.80198788]]\n" ] } ], "source": [ "# cvt_step_florida()\n", "#\n", "def cvt_step_florida ( g, n ):\n", " import numpy as np\n", " m = g.shape[0]\n", " indx = florida_sample ( n )\n", " s = np.zeros ( [ n, 2 ] )\n", " s[:,0] = cen_lon[indx]\n", " s[:,1] = cen_lat[indx]\n", " ni = np.zeros ( m )\n", " c = np.zeros ( [ m, 2 ])\n", " for i in range ( 0, n ):\n", " k = -1\n", " d = np.Inf\n", " for j in range ( 0, m ):\n", " dj = np.linalg.norm ( s[i,:] - g[j,:] )\n", " if ( dj < d ):\n", " d = dj\n", " k = j\n", " ni[k] = ni[k] + 1\n", " c[k,:] = c[k,:] + s[i,:]\n", " \n", " for i in range ( 0, m ):\n", " c[i,:] = c[i,:] / float ( ni[i] )\n", " \n", " return c\n", "#\n", "# Test the function using 7 Florida cities to start with.\n", "#\n", "import numpy as np\n", "\n", "\n", "n = 1000\n", "c = cvt_step_florida ( g27, n )\n", "print ( c )" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# CVT Iteration for Florida #\n", "\n", "Once we have a CVT step function, the CVT iteration is easy. Try taking 10 steps,\n", "and plot the locations of G before the iteration, and then at the end.\n", "\n", "You may also be able to plot the Voronoi diagram on top of the plot of the cities." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Code for CVT Iteration for Florida\n", "#\n", "def cvt_florida ( g, n, step_num ):\n", "\n", "# vor = spatial.Voronoi ( g )\n", "# spatial.voronoi_plot_2d ( vor )\n", " plt.plot ( g[:,0], g[:,1], 'go' )\n", " plt.plot ( fla_lon, fla_lat, 'b-' )\n", " plt.axis ( 'Equal' )\n", " plt.title ( 'Before CVT iteration')\n", " plt.show ( )\n", "\n", " for step in range ( 0, step_num ):\n", " c = cvt_step_florida ( g, n )\n", " g = c.copy ( )\n", " return g\n", "#\n", "# Test the code.\n", "#\n", "g = g27.copy ( )\n", "n = 5000\n", "step_num = 10\n", "g = cvt_florida ( g, n, step_num )\n", "\n", "#vor = spatial.Voronoi ( g )\n", "#spatial.voronoi_plot_2d ( vor )\n", "plt.plot ( g[:,0], g[:,1], 'ro' )\n", "plt.plot ( fla_lon, fla_lat, 'b-' )\n", "plt.axis ( 'Equal' )\n", "plt.title ( 'After CVT iteration')\n", "plt.show ( )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.8" } }, "nbformat": 4, "nbformat_minor": 0 }