I'll be the most powerful man in Hill Valley, and I'm gonna clean up this data.
September 5, 2012 7:33 PM
Help me find this data analysis tool, so I can process lots of cool data.
I have this giant set of user-entered data describing places that I'm trying to classify. One specific problem I'm having is that people enter very similar but not identical information, so each variant shows up as its own record. So I have records like:
JOE'S GAS
JOE'S GAS STATION
JOE'S GAS STATION PINEHURST DRIVE
JOE'S GAS STATIONS INC
JOES GAS
JOES GAS STATION
but it would make my life a lot easier if they were all linked together, so I don't have to classify all of them individually.
I remember seeing, quite possibly here, a stand-alone Windows program that did this. It was a data analysis package, with a lot of other features and analytical capabilities, but there were robust functions for grouping these sorts of similar texts together using some sort of algorithm (I think I remember fuzzy clustering, but don't quote me). If memory serves, it was open-source or free or at least there was a free demo, and I seem to remember it being vaguely affiliated with Google.
I remember there was a modest hubbub when it was released; there was a series of demo videos showing cool features of the program. As I said, I think I may have seen it on the blue, but I follow enough data blogs that I may have seen it elsewhere.
Google Refine, for sure. It's actually fun, too. I spent 10 hours on it the other day collapsing rows like it was Tetris.
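For anyone curious what that kind of grouping looks like under the hood, here is a rough Python sketch of key-collision ("fingerprint") clustering, the general idea behind one of Refine's clustering methods. It is a simplified illustration and not Refine's actual code; the function names `fingerprint` and `cluster` are made up for this example, and the sample records are the ones from the question.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(text):
    """Build a normalized key so trivially different spellings collide.

    Simplified sketch of the fingerprint idea (an assumption, not Refine's code).
    """
    # Fold accented characters to plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and drop punctuation such as apostrophes.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.strip().lower())
    # Deduplicate and sort the tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(records):
    """Group records that share the same fingerprint key."""
    groups = defaultdict(list)
    for record in records:
        groups[fingerprint(record)].append(record)
    return dict(groups)


records = [
    "JOE'S GAS",
    "JOE'S GAS STATION",
    "JOE'S GAS STATION PINEHURST DRIVE",
    "JOE'S GAS STATIONS INC",
    "JOES GAS",
    "JOES GAS STATION",
]

clusters = cluster(records)
for key, members in clusters.items():
    print(key, "->", members)
```

Note that this only merges variants that differ in case, punctuation, or word order, so "JOE'S GAS" and "JOES GAS" collapse together, but "JOE'S GAS STATIONS INC" stays separate. Linking the longer variants needs a fuzzier, distance-based comparison and some human review, which is the part a dedicated tool makes pleasant.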
posted by iamkimiam at 10:51 PM on September 5, 2012
Response by poster: Of course, Google Refine!
Thanks; I knew the hive mind would figure out in twelve minutes what I'd been bashing my brains in over all afternoon.
posted by Homeboy Trouble at 8:37 AM on September 6, 2012
posted by blahblahblah at 7:45 PM on September 5, 2012 [4 favorites]