# Excel filter: Nearest neighbor identification.

August 2, 2010 9:34 AM Subscribe

Excel filter: Nearest neighbor identification. Do I need a UDF?

I have an excel sheet with these fields:

name,lat,lon

For each name, I'd like to add fields identifying the 5 nearest neighbors. For example, if my data set looks like this:

name1,lat1,lon1

name2,lat2,lon2

...

I'd like my output to look like:

name1,lat1,lon1,[name of nearest neighbor],[name of 2nd nearest neighbor],...

name2,lat2,lon2,[name of nearest neighbor],[name of 2nd nearest neighbor],...

...

Is there some easy way to do this?

I have an excel sheet with these fields:

name,lat,lon

For each name, I'd like to add fields identifying the 5 nearest neighbors. For example, if my data set looks like this:

name1,lat1,lon1

name2,lat2,lon2

...

I'd like my output to look like:

name1,lat1,lon1,[name of nearest neighbor],[name of 2nd nearest neighbor],...

name2,lat2,lon2,[name of nearest neighbor],[name of 2nd nearest neighbor],...

...

Is there some easy way to do this?

What is a UDF? How many entries do you have?

You could make a table (n x n entries) and calculate the distance between each neighbor using a straight line distance ( sqrt( (x1 -x)^2 + (y1 - y)^2 ) ), and then pick the five smallest values.

posted by a womble is an active kind of sloth at 9:45 AM on August 2, 2010

You could make a table (n x n entries) and calculate the distance between each neighbor using a straight line distance ( sqrt( (x1 -x)^2 + (y1 - y)^2 ) ), and then pick the five smallest values.

posted by a womble is an active kind of sloth at 9:45 AM on August 2, 2010

Response by poster: Ended up making an autohotkey macro. Picked a record, calculated distances to all other record, sorted on distance, populated nearest neighbor names, rinse and repeat. PM me for details.

posted by sisquoc15 at 11:16 AM on August 2, 2010

posted by sisquoc15 at 11:16 AM on August 2, 2010

Do you have access to ArcGIS? It's pretty easy to do in arc...

posted by stratastar at 2:26 PM on August 2, 2010

posted by stratastar at 2:26 PM on August 2, 2010

Just for reference; this free tool for the not free ArcGIS; is really nice for creating point distance matrices.

posted by stratastar at 2:31 PM on August 2, 2010

posted by stratastar at 2:31 PM on August 2, 2010

Response by poster: I've gotten a few PM's on how my autohotkey worked. Here's a clunky explanation. My data looked like this.

name1, longitude1, latitude1

name2, longitude2, latitude2

...

I added a dummy column of zeros's. And I added a column that contained a formula that calculated the distance (a squared+b squared=c squared) for each point to the point in row 1. Data now looks like:

0,name1,longitude1,latitude1,calculated_distance_to_name1,

0,name2,longitude2,latitude2,calculated_distance_to_name1,

0,name3,longitude3,latitude3,calculated_distance_to_name1,

...

Finally I added columns that simply reported the names (name2, for example) in the rows below. Data:

0,name1, longitude1, latitude1,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

0,name2, longitude2, latitude2,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

0,name3, longitude3, latitude3,distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

...

Ok, now the autohotkey. I won't include the actual program (ask me if you'd like it) but I'll lay out the format.

Begin Loop

activate excel

select all and sort by the "calculated_distance_to_name1" column. This will give you all your records in order of distance to name1. It will also populate the name1 row with [name in one row down, column 2],[name in two rows down,c2],... which are name1's nearest neighbors, in order. Sweet.

change the 0 in row1,column1 to a 1. Copy row one and paste-special it back on itself as values. This locks in the nearest neighbor fields.

Now sort on column 1. This throws the completed row (marked with a one) to the bottom.

End Loop

Set the loop count for the number of rows you have.

I hope this makes sense, I have no experience in databases and I'm sure there is a more elegant way to explain this.

posted by sisquoc15 at 10:19 AM on August 20, 2010

name1, longitude1, latitude1

name2, longitude2, latitude2

...

I added a dummy column of zeros's. And I added a column that contained a formula that calculated the distance (a squared+b squared=c squared) for each point to the point in row 1. Data now looks like:

0,name1,longitude1,latitude1,calculated_distance_to_name1,

0,name2,longitude2,latitude2,calculated_distance_to_name1,

0,name3,longitude3,latitude3,calculated_distance_to_name1,

...

Finally I added columns that simply reported the names (name2, for example) in the rows below. Data:

0,name1, longitude1, latitude1,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

0,name2, longitude2, latitude2,calculated_distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

0,name3, longitude3, latitude3,distance_to_name1,[name in one row down, column 2],[name in two rows down,c2], ...

...

Ok, now the autohotkey. I won't include the actual program (ask me if you'd like it) but I'll lay out the format.

Begin Loop

activate excel

select all and sort by the "calculated_distance_to_name1" column. This will give you all your records in order of distance to name1. It will also populate the name1 row with [name in one row down, column 2],[name in two rows down,c2],... which are name1's nearest neighbors, in order. Sweet.

change the 0 in row1,column1 to a 1. Copy row one and paste-special it back on itself as values. This locks in the nearest neighbor fields.

Now sort on column 1. This throws the completed row (marked with a one) to the bottom.

End Loop

Set the loop count for the number of rows you have.

I hope this makes sense, I have no experience in databases and I'm sure there is a more elegant way to explain this.

posted by sisquoc15 at 10:19 AM on August 20, 2010

This thread is closed to new comments.

posted by ChrisHartley at 9:44 AM on August 2, 2010