Excel filter: Nearest neighbor identification.
August 2, 2010 9:34 AM Subscribe
Excel filter: Nearest neighbor identification. Do I need a UDF?
I have an excel sheet with these fields:
name,lat,lon
For each name, I'd like to add fields identifying the 5 nearest neighbors. For example, if my data set looks like this:
name1,lat1,lon1
name2,lat2,lon2
...
I'd like my output to look like:
name1,lat1,lon1,[name of nearest neighbor],[name of 2nd nearest neighbor],...
name2,lat2,lon2,[name of nearest neighbor],[name of 2nd nearest neighbor],...
...
Is there some easy way to do this?
I have an excel sheet with these fields:
name,lat,lon
For each name, I'd like to add fields identifying the 5 nearest neighbors. For example, if my data set looks like this:
name1,lat1,lon1
name2,lat2,lon2
...
I'd like my output to look like:
name1,lat1,lon1,[name of nearest neighbor],[name of 2nd nearest neighbor],...
name2,lat2,lon2,[name of nearest neighbor],[name of 2nd nearest neighbor],...
...
Is there some easy way to do this?
What is a UDF? How many entries do you have?
You could make a table (n x n entries) and calculate the distance between each neighbor using a straight line distance ( sqrt( (x1 -x)^2 + (y1 - y)^2 ) ), and then pick the five smallest values.
posted by a womble is an active kind of sloth at 9:45 AM on August 2, 2010
You could make a table (n x n entries) and calculate the distance between each neighbor using a straight line distance ( sqrt( (x1 -x)^2 + (y1 - y)^2 ) ), and then pick the five smallest values.
posted by a womble is an active kind of sloth at 9:45 AM on August 2, 2010
Response by poster: Ended up making an autohotkey macro. Picked a record, calculated distances to all other record, sorted on distance, populated nearest neighbor names, rinse and repeat. PM me for details.
posted by sisquoc15 at 11:16 AM on August 2, 2010
posted by sisquoc15 at 11:16 AM on August 2, 2010
Do you have access to ArcGIS? It's pretty easy to do in arc...
posted by stratastar at 2:26 PM on August 2, 2010
posted by stratastar at 2:26 PM on August 2, 2010
Just for reference; this free tool for the not free ArcGIS; is really nice for creating point distance matrices.
posted by stratastar at 2:31 PM on August 2, 2010
posted by stratastar at 2:31 PM on August 2, 2010
Response by poster: I've gotten a few PM's on how my autohotkey worked. Here's a clunky explanation. My data looked like this.
name1, longitude1, latitude1
name2, longitude2, latitude2
...
I added a dummy column of zeros's. And I added a column that contained a formula that calculated the distance (a squared+b squared=c squared) for each point to the point in row 1. Data now looks like:
0,name1,longitude1,latitude1,calculated_distance_to_name1,
0,name2,longitude2,latitude2,calculated_distance_to_name1,
0,name3,longitude3,latitude3,calculated_distance_to_name1,
...
Finally I added columns that simply reported the names (name2, for example) in the rows below. Data:
0,name1, longitude1, latitude1,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
0,name2, longitude2, latitude2,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
0,name3, longitude3, latitude3,distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
...
Ok, now the autohotkey. I won't include the actual program (ask me if you'd like it) but I'll lay out the format.
Begin Loop
activate excel
select all and sort by the "calculated_distance_to_name1" column. This will give you all your records in order of distance to name1. It will also populate the name1 row with [name in one row down, column 2],[name in two rows down,c2],... which are name1's nearest neighbors, in order. Sweet.
change the 0 in row1,column1 to a 1. Copy row one and paste-special it back on itself as values. This locks in the nearest neighbor fields.
Now sort on column 1. This throws the completed row (marked with a one) to the bottom.
End Loop
Set the loop count for the number of rows you have.
I hope this makes sense, I have no experience in databases and I'm sure there is a more elegant way to explain this.
posted by sisquoc15 at 10:19 AM on August 20, 2010
name1, longitude1, latitude1
name2, longitude2, latitude2
...
I added a dummy column of zeros's. And I added a column that contained a formula that calculated the distance (a squared+b squared=c squared) for each point to the point in row 1. Data now looks like:
0,name1,longitude1,latitude1,calculated_distance_to_name1,
0,name2,longitude2,latitude2,calculated_distance_to_name1,
0,name3,longitude3,latitude3,calculated_distance_to_name1,
...
Finally I added columns that simply reported the names (name2, for example) in the rows below. Data:
0,name1, longitude1, latitude1,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
0,name2, longitude2, latitude2,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
0,name3, longitude3, latitude3,distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...
...
Ok, now the autohotkey. I won't include the actual program (ask me if you'd like it) but I'll lay out the format.
Begin Loop
activate excel
select all and sort by the "calculated_distance_to_name1" column. This will give you all your records in order of distance to name1. It will also populate the name1 row with [name in one row down, column 2],[name in two rows down,c2],... which are name1's nearest neighbors, in order. Sweet.
change the 0 in row1,column1 to a 1. Copy row one and paste-special it back on itself as values. This locks in the nearest neighbor fields.
Now sort on column 1. This throws the completed row (marked with a one) to the bottom.
End Loop
Set the loop count for the number of rows you have.
I hope this makes sense, I have no experience in databases and I'm sure there is a more elegant way to explain this.
posted by sisquoc15 at 10:19 AM on August 20, 2010
This thread is closed to new comments.
posted by ChrisHartley at 9:44 AM on August 2, 2010