Creating a Map Based on Distances Between Cities
October 29, 2009 7:03 PM

Mapping question: I have a list of (imaginary) cities. I also know each city's as-the-crow-flies distance from every other city on the list. Based on this data alone, I would like to create a (hypothetical) map that shows all of these cities in their proper locations. I recognize that that the map's "rotation" would be arbitrary based on where & how you start plotting, but it seems that you should be able to plot out the relationships between cities properly. Is there software that can help me do this?

To be a little more concrete, imagine that you have a mileage chart similar to this one. (That chart contains driving distances, but let's assume we have as-the-crow-flies distances instead.) If you pick an arbitrary starting point for one city, you can then plot out where every other city belongs, because you know how far each city is supposed to be from all the other cities.

I can envision at least one simple way to do this manually if you have just a handful of cities. But if you have a lot of cities, it becomes harder and harder. So I'd love to know if there is software that can calculate and draw such a map automatically. (Note: Using latitude/longitude is not an option - the scenario only involves knowing as-the-crow-flies distances between imaginary cities.) Thanks!
posted by Conrad Cornelius o'Donald o'Dell to Computers & Internet (23 answers total) 4 users marked this as a favorite
It seems like a Multidimensional Scaling plot would work.

Not your problem exactly, but check out this map of France "derived from inadequate data".
posted by Rumple at 7:08 PM on October 29, 2009


These images give a better idea of MDS output.
posted by Rumple at 7:09 PM on October 29, 2009


A word of warning: it is possible that a given set of city-to-city distances might contain internal contradictions and be unmappable.
posted by Songdog at 7:15 PM on October 29, 2009


For example, this doesn't work (at least not in the space I'm comfortable with):

A to B: 10 units of distance
B to C: 10 units of distance
A to C: 100 units of distance

Do you already know that your distances are internally consistent?
posted by Songdog at 7:19 PM on October 29, 2009


Odinsdream: I have too many cities to do this manually. I also am not interested in topology - just pure distances. (Maybe I should have just said "nodes" instead of cities.)

Songdog: I have every reason to believe my distances are internally consistent.
posted by Conrad Cornelius o'Donald o'Dell at 7:25 PM on October 29, 2009


Rumple is correct. Multidimensional scaling is exactly what you need. It does precisely this. A simple MDS routine is implemented in R; the command is cmdscale.
posted by shadow vector at 7:29 PM on October 29, 2009


Indeed, it looks like Rumple and shadow vector are exactly correct. This page even uses the same sort of example that I did in my question.

Unfortunately, though, I am not a programmer. Are you aware of any software which might be friendly to an end-user such as myself?
posted by Conrad Cornelius o'Donald o'Dell at 7:35 PM on October 29, 2009


It's funny, I was thinking about this after that thread recently ... I didn't actually get to the point of doing it, but I think you can do this in R, which is a programming language designed for statistics. I had pulled up a bunch of howtos on drawing graphs using R (what you're getting into here is graph plotting, "graph" in the "graph theory" sense, not the more general sense). I was looking at this howto on plotting directed graphs, and also at this documentataion page on plotting graph objects.

What you have, in your list of distances but without directions, is an undirected graph. This is a bit more complex to plot than a directed graph, because you need to decide how you're going to orient each node with respect to the others (or at least its neighbor nodes); otherwise you'd just have a number line. To this end, the graphviz engine has a bunch of algorithms which try to arrange nodes in a more or less neat fashion. You'll probably have to play with them to see which produces the cleanest output.

It might be possible to use graphviz independently of R; I haven't really looked into it all that closely. (I actually just found its homepage while writing this, so I'm off to do a little more reading myself.)
posted by Kadin2048 at 7:38 PM on October 29, 2009


Oops, didn't see that your comment at 10:35 addressed R. That's, unfortunately, the best thing I've found so far.
posted by Kadin2048 at 7:40 PM on October 29, 2009


Kadin: What other thread were you thinking of?
posted by Conrad Cornelius o'Donald o'Dell at 7:48 PM on October 29, 2009


All of the alternatives that I know of software-wise are worse (maybe somebody will pop in with one I'm not aware of). Fortunately, this is pretty straightforward in R, and you don't need to know much about R to get it running. The command is already included in a default package. If you look at the help for the cmdscale command, you'll see the following example:
loc <>
x <>
y <>
plot(x, y, type="n", xlab="", ylab="", main="cmdscale(eurodist)")
text(x, y, rownames(loc), cex=0.8)
This uses a pre-loaded dataset called eurodist which has European city distances. The first command stores the output from cmdscale in the object 'loc'. The output is a set of points in (by default) 2-d space. The next two commands put the first and second columns of 'loc' into their own variables (flipping the sign of y) and then the last two plot and label them, with some extra options to make the graph look nicer. That's the whole process.

All you'll have to do is get your data into R and repeat this, basically.
posted by shadow vector at 7:51 PM on October 29, 2009


Hmm, that ate some of my pre text. You can click the link and scroll down to the bottom.
posted by shadow vector at 7:52 PM on October 29, 2009


I would use the "neato" program in Graphviz to solve this problem.

Neato is intended for drawing graphs. It takes a bunch of nodes and edge weights, and uses an optimization algorithm to try to position each node so the lengths of its edges accurately reflect their weights.

For this application, the edge between two cities should be 1/distance between those cities.

You'll have to format the input data in Graphviz' input language, which will at least require some serious search-and-replace. Anyone want to write a quick script?
posted by miyabo at 7:53 PM on October 29, 2009


Miyabo: Can you explain a bit more about what an "edge weight" is? Thanks.
posted by Conrad Cornelius o'Donald o'Dell at 7:57 PM on October 29, 2009


Indeed, it looks like Rumple and shadow vector are exactly correct. This page even uses the same sort of example that I did in my question.

Unfortunately, though, I am not a programmer. Are you aware of any software which might be friendly to an end-user such as myself?


DUDE!

THE PAGE YOU LINKED TO HAS THE PROGRAM YOU NEED RIGHT ON THAT PAGE.

This is the program:
source("http://personality-project.org/r/useful.r") #get some extra functions, including read.clipboard()


cities <> cities #show the data
city.location <> round(city.location,0) #print the locations to the screen
plot(city.location,type="n", xlab="Dimension 1", ylab="Dimension 2",main ="cmdscale(cities)") #put up a graphics window
text(city.location,labels=names(cities)) #put the cities into the map

Download R (which is free), copy your data to the clipboard, and type that program in. Voila!, you have a map
posted by orthogonality at 8:00 PM on October 29, 2009


Do you really have data for distances between every pair of nodes? As the number of nodes grows, the number of pairs grows pretty quickly: for 10 nodes you'd need 10*9/2 distances. But if you don't have all the pairs, your map/graph may not be unique (think of the sides of a rectangle distorting into a parallelogram).
posted by leahwrenn at 8:27 PM on October 29, 2009


Ortho: I'm aware of what was on the page I linked. I'm not sure there is a need to yell at me about it. As I explained, I am not a programmer; therefore, I had no way to know how easy or difficult it might be to use the program on that page.

Leah: Yes, I have data for distances between every single pair of nodes. I have a table, much like the mileage table linked above, which is completely filled in.
posted by Conrad Cornelius o'Donald o'Dell at 6:53 AM on October 30, 2009


On second thought, I'd go with the R answer. I think it's a better solution for this problem.
posted by miyabo at 9:39 AM on October 30, 2009


slightly OT, but Antigenic Cartography is a variation on this technique, where the "cities" are strains of a virus, and the "distances" are how similar they are immunologically. The resulting maps give insight into how the surface of the virus is changing.
posted by James Scott-Brown at 4:09 PM on October 30, 2009


R's cmdscale function appears to have worked. The program I linked earlier (here) did not work properly - R kept experiencing errors with the "read.clipboard" command. After Googling around, I learned enough to try switching it to a "read.table" command, which looks directly at a file, rather than at what you've got copied to the clipboard. That succeeded. The program spat back a list of coordinates and also created a graphical map based on my matrix. Pretty cool! Special thanks to Rumple and shadow vector.
posted by Conrad Cornelius o'Donald o'Dell at 12:46 PM on November 1, 2009


Thats because read.clipboard() requires the library which is loaded by the first line of the program:

source("http://personality-project.org/r/useful.r")
#get some extra functions, including read.clipboard()
posted by orthogonality at 5:16 PM on November 1, 2009


Interestingly, cmdscale will produce a map for you even if you have wildly inconsistent distances. (I tried with a matrix that was deliberately flawed.) Of course, the map it produces will be "wrong," but it will give you something.
posted by Conrad Cornelius o'Donald o'Dell at 5:50 PM on November 1, 2009


Actually, NO. That is not why I had problems with read.clipboard. I copied and pasted the program exactly as shown on that page, including the library loaded in the first line of the program. I followed the instructions exactly as laid out. Yet even with this library loaded, I kept getting a very specific error:
"Error in read.table(file("clipboard"), header = TRUE, ...) :
more columns than column names
I tried tweaking the mileage distance chart in a variety of ways (adding & removing columns, adding & removing column names, etc.), but R was stubborn. So I switched to read.table, which I found to be a much better solution for my purposes in any event.
posted by Conrad Cornelius o'Donald o'Dell at 6:03 PM on November 1, 2009


« Older Hard drive back from the dead. Should I trust it?   |   Who can I pay to type up my notes? Newer »
This thread is closed to new comments.