City Distance Datasets

**CITIES**
is a dataset directory which
contains files describing intercity distances.

This data can be used for

- traveling salesman problems (connected path through every city);
- K-means calculations (find M spots that minimize total of the distance from each city to its nearest spot);
- K-medians calculations (make M of the cities "special", to minimize the total distance from each city to its nearest special city);
- Weighted K-means or K-medians (let the population of each city be used as a weight, which makes some cities more important);
- Minimal spanning trees (construct the shortest highway system that connects all the cities, using only straight paths from one city to another (ignore the possibility that two roads could cross, or that a Y-shaped connector between three cities might be cheaper);
- Voronoi diagrams (assign each spot of land to the nearest city, making "provinces");

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

ASA058, a C++ library which implements the K-means algorithm of Sparks.

ASA136, a MATLAB library which implements the Hartigan and Wong clustering algorithm.

CITIES, a FORTRAN90 library which handles various problems associated with a set of "cities" on a map.

DISTANCE_TO_POSITION, a FORTRAN90 program which estimates the positions of cities based on a city-to-city distance table.

DISTANCE_TO_POSITION, a MATLAB program which estimates the positions of cities based on a city-to-city distance table.

DISTANCE_TO_POSITION_SPHERE, a MATLAB program which estimates the positions of cities on a sphere (such as the earth) based on a city-to-city distance table.

KMEANS, a FORTRAN90 library which contains several different algorithms for the K-Means problem.

KMEDIAN, a FORTRAN77 program which handles the K-Median problem.

LAU_NP, a FORTRAN90 library which implements heuristic algorithms for various NP-hard combinatorial problems.

SPAETH, a dataset directory which contains datasets for cluster analysis;

SPAETH, a FORTRAN90 library which can cluster data according to various principles.

SPAETH2, a dataset directory which contains datasets for cluster analysis;

SPAETH2, a FORTRAN90 library which can cluster data according to various principles.

TOMS456, a FORTRAN77 library which implements the routing algorithm; this is ACM TOMS algorithm 456.

TSP, a dataset directory which contains test data for the traveling salesperson problem;

**CO04** gives the distances between 4 cities; the
goal is to choose some of the cities as distribution sites;
for each city, we then choose the FURTHEST distribution site,
so as to MAXIMIZE the total transportation cost to all cities.

- co04_main.txt, is the main file.
- co04_dist.txt, the city-to-city NONSYMMETRIC transportation matrix.
- co04_weight.txt, contains weights that may be applied to the cities.

**GRID04** describes 4 "cities" on a simple grid.

- grid04_main.txt, is the main file.
- grid04_dist.txt, the city-to-city distance matrix.
- grid04_xy.txt, the (X,Y) coordinates of each city.
- grid04_xy.png, a PNG image of the (X,Y) coordinates of each city.

**HA30** describes the airline distances, in hundreds
of miles, between 30 international cities. For this problem,
the spherical geometry is significantly different from the planar
approximation.

- ha30_main.txt, is the main file.
- ha30_code.txt, a code for each city.
- ha30_dist.txt, the city-to-city distance matrix.
- ha30_xyz.txt, XYZ coordinates (in hundreds of miles) that can be assigned to the cities, assuming the earth has a radius of 3,959 miles, so that the great circle distances are approximately those given in the distance table, estimated by the MATLAB program "distance_to_position_sphere".
- ha30_name.txt, the name of each city.

**KN57** describes the distance in (integer) miles between 57 cities.
Since these are "highway miles", the distances are difficult to use
to construct an exact planar map.

- kn57_main.txt, is the main file.
- kn57_dist.txt, the city-to-city distance matrix.
- kn57_xy.txt, the (X,Y) coordinates of each city, estimated from the distance table.
- kn57_xy.png, a PNG image of the (X,Y) coordinates of each city.

**LAU15** describes the distance between 15 cities,
and seeks a solution to the traveling salesman problem.
The XY data was computed as a least squares fit to the DIST data,
and so does not exactly correspond.

- lau15_main.txt, is the main file.
- lau15_dist.txt, the city-to-city distance matrix.
- lau15_tsp.txt, an itinerary that solves the traveling salesman problem.
- lau15_xy.txt, the (X,Y) coordinates of each city, estimated from the distance table.
- lau15_xy.png, a PNG image of the (X,Y) coordinates of each city.

**SGB128** describes 128 cities in North America.

- sgb128_main.txt, is the main file.
- sgb128_dist.txt, the city-to-city road distance matrix.
- sgb128_dms.txt, the latitude and longitude of each city.
- sgb128_map.tex, a TeX file to draw a picture of the cities.
- sgb128_map.png, a PNG image of the map of the cities.
- sgb128_name.txt, the name of each city.
- sgb128_weight.txt, the weight (population) of each city.
- sgb128_xy.txt, the XY coordinates of the cities.
- sgb128_xy.png, a PNG image of the XY coordinates of the cities.

**SH07** gives the distances between 7 cities; the
goal is to choose 3 of the cities as distribution sites
so as to minimize the total transportation cost to all cities.

- sh07_main.txt, is the main file.
- sh07_dist.txt, the city-to-city distance matrix.
- sh07_xy.txt, the XY coordinates of the cities, estimated from the distance table.
- sh07_xy.png, a PNG image of the XY coordinates of the cities.

**SP11** describes 11 artificial cities, with
arbitrary integer distances.

- sp11_main.txt, is the main file.
- sp11_code.txt, contains a code for each city.
- sp11_dist.txt, the city-to-city distance matrix.
- sp11_xy.txt, the (X,Y) coordinates of each city, estimated from the distance table.
- sp11_xy.png, a PNG image of the (X,Y) coordinates of each city.

**UK12** describes 12 cities, with integer distances
representing the travel time, in minutes, between pairs of cities.

- uk12_main.txt, is the main file.
- uk12_code.txt, contains a code for each city.
- uk12_dist.txt, the city-to-city distance matrix.
- uk12_name.txt, the name of each city.
- uk12_xy.txt, the (X,Y) coordinates of each city, estimated from the distance table.
- uk12_xy.png, a PNG image of the (X,Y) coordinates of each city.

**USCA312** describes 312 cities in the US and Canada.
Distances between the city are computed from latitude and longitude,
not from road mileage.

- usca312_main.txt, is the main file.
- usca312_dist.txt, is the city-to-city distance, in miles, computed from the latitude/longitude data and rounded to integers.
- usca312_dms.txt, the latitude and longitude of each city.
- usca312_name.txt, the name of each city.
- usca312_xy.txt, the XY coordinates of the cities.
- usca312_xy.png, a PNG image of the XY coordinates of the cities.

**USCAP** describes the location of the 50 US state capitals.

- uscap_main.txt, is the main file.
- uscap_ll.txt, the latitude and longitude of each capital.
- uscap_name.txt, the name and state of each capital.
- uscap_xy.txt, the (X,Y) coordinates of each capital using a cylindrical projection.
- uscap.png, an image of the capital locations on the cylindrical projection.

**WG22** describes 22 cities in West Germany and comes from Spaeth.

- wg22_main.txt, is the main file.
- wg22_dist.txt, the city-to-city distance matrix.
- wg22_name.txt, the name of each city.
- wg22_xy.txt, the (X,Y) coordinates of each city.
- wg22_xy.png, a PNG image of the (X,Y) coordinates of each city.

**WG59** describes 59 cities in West Germany, and comes from Spaeth.

- wg59_main.txt, is the main file.
- wg59_dist.txt, the city-to-city distance matrix.
- wg59_name.txt, the name of each city.
- wg59_xy.txt, contains the (X,Y) coordinates of each city.
- wg59_xy.png, a PNG image of the (X,Y) coordinates of each city.

You can go up one level to the DATASETS directory.