Help me find this data analysis tool, so I can process lots of cool data.
I have this giant set of user-entered data describing places that I'm trying to classify. One specific task that I'm having problems with is that people enter very similar but not identical information, so each variant shows up as its own record. So I have records like:
JOE'S GAS STATION
JOE'S GAS STATION PINEHURST DRIVE
JOE'S GAS STATIONS INC
JOES GAS STATION
but it would make my life a lot easier if they were all linked together, so I don't have to classify all of them individually.
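To make the kind of linking I mean concrete, here is a rough Python sketch. It is not the program I'm looking for, and I have no idea whether that program works this way; it just uses the standard library's difflib to compare cleaned-up names. The function names (normalize, group_similar) and the 0.6 similarity cutoff are my own assumptions, picked only so the four records above land together.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def group_similar(names, threshold=0.6):
    """Greedily add each name to the first group whose first member it resembles.

    The threshold is an assumed cutoff on difflib's similarity ratio (0.0 to 1.0),
    not a value taken from any particular tool.
    """
    groups = []  # each group is a list of the original, unmodified names
    for name in names:
        key = normalize(name)
        for group in groups:
            representative = normalize(group[0])
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


if __name__ == "__main__":
    records = [
        "JOE'S GAS STATION",
        "JOE'S GAS STATION PINEHURST DRIVE",
        "JOE'S GAS STATIONS INC",
        "JOES GAS STATION",
    ]
    for group in group_similar(records):
        print(group)
```

Something like this compares every record against every group, so it would crawl on a data set the size of mine, and the cutoff would need tuning by hand. That's why I'm after the real tool rather than rolling my own.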
I remember seeing, quite possibly here, a stand-alone Windows program that did this. It was a data analysis package with a lot of other features and analytical capabilities, but it had robust functions for grouping similar text entries like these together using some sort of algorithm (I think I remember fuzzy clustering, but don't quote me). If memory serves, it was open-source or free, or at least there was a free demo, and I seem to remember it being vaguely affiliated with Google.
I remember there was a modest hubbub when it was released; there was a series of demo videos showing cool features of the program. As I said, I think I may have seen it on the blue, but I follow enough data blogs that I may have seen it elsewhere.