Finding patterns using raw computational power
October 27, 2010 10:14 AM
Looking for examples where large amounts of data + large amounts of computer processing power = discovering unexpected patterns.
Essentially, Google-type stuff - finding patterns in massive datasets and turning that into predictive tools. So, obviously, you've got everything from massively complex stuff like trying to predict earthquakes or weather, to (seemingly) simple stuff like Amazon-style customer taste predictions.
It's a big subject, of course - I'm just interested in fun examples of what this approach of throwing computational power at datasets has thrown up recently, and exciting areas where it's being used now...
This Nature paper on anxious temperament was based on some pretty sizeable datasets. Full disclosure: the author of that paper sits a few feet from me.
The fMRI and PET data alone was about a terabyte; coupled with the genetic data, the dataset was immense. It took weeks of CPU time just to do the automated processing, using tools like AFNI, FSL, MATLAB, R, and a metric ton of Python and Bash scripts. Going through the results and analyzing them took even longer.
I know that a much larger NIH-funded study is in the works to collect fMRI data on 4,000+ adult human subjects. That would be roughly 15-20 terabytes of raw data. The idea is that the NIH could make this dataset available to researchers for various kinds of analysis, such as connectivity.
posted by Pogo_Fuzzybutt at 11:36 AM on October 27, 2010
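To give a flavour of what that kind of automated processing looks like, here is a minimal sketch of a batch preprocessing loop in Python. It assumes FSL's command-line tools (bet for skull-stripping, mcflirt for motion correction) are installed and on the PATH; the directory layout and filenames are invented for illustration and are not from the paper's actual pipeline.

# Hypothetical batch preprocessing loop over fMRI subjects.
# Assumes FSL's command-line tools (bet, mcflirt) are installed and on PATH;
# subject IDs, paths, and filenames are illustrative only.
import subprocess
from pathlib import Path

DATA_DIR = Path("/data/study")      # hypothetical raw-data location
OUT_DIR = Path("/data/processed")   # hypothetical output location

for sub in sorted(DATA_DIR.glob("sub-*")):
    func = sub / "func.nii.gz"      # raw 4D fMRI run (assumed filename)
    anat = sub / "anat.nii.gz"      # raw T1 anatomical volume (assumed filename)
    out = OUT_DIR / sub.name
    out.mkdir(parents=True, exist_ok=True)

    # Skull-strip the anatomical image with FSL's BET.
    subprocess.run(["bet", str(anat), str(out / "anat_brain.nii.gz")], check=True)

    # Motion-correct the functional run with FSL's MCFLIRT.
    subprocess.run(["mcflirt", "-in", str(func), "-out", str(out / "func_mc")],
                   check=True)

    print(f"finished {sub.name}")

Multiply a loop like that by hundreds of subjects and several processing stages and the "weeks of CPU time" figure stops sounding surprising.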
Twitter + Dow Jones = (Probably Bullshit) predictions.
posted by empath at 11:40 AM on October 27, 2010
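For anyone curious what that kind of claim boils down to, here is a toy sketch of the lagged-correlation idea: compare today's aggregate Twitter "mood" score with tomorrow's index return. The numbers below are invented for illustration; the actual study's method (and whether it holds up) is a separate question.

# Toy lagged-correlation check: does today's "mood" score line up with
# tomorrow's index return?  All numbers below are made up.
import numpy as np

# Daily aggregate sentiment scores (e.g., fraction of "calm" tweets) -- invented.
mood = np.array([0.61, 0.55, 0.70, 0.66, 0.52, 0.58, 0.73, 0.69, 0.50, 0.63])

# Daily index returns for the same period, in percent -- also invented.
returns = np.array([0.2, -0.4, 0.5, 0.1, -0.6, -0.1, 0.7, 0.3, -0.8, 0.2])

# Compare mood on day t with the return on day t+1.
lagged_corr = np.corrcoef(mood[:-1], returns[1:])[0, 1]
print(f"mood vs. next-day return correlation: {lagged_corr:.2f}")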
Best answer: A Canadian professor of English uses computers to develop concordances of the works of classic writers. He uploaded Agatha Christie's novels and made an interesting discovery about possible changes in her mental capacity later in life, evidenced by a distinct but previously unnoticed change in her writing style.
posted by fuse theorem at 12:06 PM on October 27, 2010 [1 favorite]
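The measurements behind that kind of finding are surprisingly simple: track vocabulary richness per book across the author's career. Here is a minimal sketch; the filenames are hypothetical and the plain type-token ratio used here is my own stand-in for the study's more refined metrics.

# Toy stylometry: vocabulary richness (type-token ratio) per novel,
# in publication order.  Filenames are hypothetical plain-text copies.
import re
from pathlib import Path

novels = [
    "1920_mysterious_affair_at_styles.txt",
    "1950_a_murder_is_announced.txt",
    "1972_elephants_can_remember.txt",
]

def type_token_ratio(text: str, sample_size: int = 50_000) -> float:
    """Distinct words divided by total words, on a fixed-size sample
    so longer books don't automatically look 'richer'."""
    words = re.findall(r"[a-z']+", text.lower())[:sample_size]
    return len(set(words)) / len(words)

for name in novels:
    text = Path(name).read_text(encoding="utf-8")
    print(f"{name}: TTR = {type_token_ratio(text):.3f}")

A drop in that ratio for the late novels is the sort of signal the researchers reported.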
Best answer: The experiments at the LHC at CERN are some of the most data-intensive experiments ever built. They are projected to produce on the order of 15 petabytes of data a year written to disk (and that's after cutting out orders of magnitude of background signal).
To handle this, they've built the LHC Computing Grid, an international network of data centers that uses distributed computing to handle the mounds of data.
The data processing aspects of the LHC are just as impressive as the physics, if you're into that kind of stuff.
Of course, no unexpected patterns have been found yet, but they should be in the next couple of years.
posted by auto-correct at 12:35 PM on October 27, 2010
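To illustrate what "cutting out orders of magnitude of background" means in practice, here is a completely un-physical toy version of a trigger-style cut: keep only events whose summed energy clears a threshold and discard the rest before anything is written to disk. The event generation and threshold are invented; the real trigger chain is custom hardware plus software and vastly more elaborate.

# Toy "trigger": keep only events above an energy threshold so that the
# vast majority of (simulated) background never gets written to disk.
# Thresholds and event generation are invented for illustration.
import random

THRESHOLD = 350.0  # arbitrary energy cut, in made-up units

def simulated_event():
    """Stand-in for a detector readout: a list of per-channel energies."""
    return [random.expovariate(1 / 10.0) for _ in range(20)]

kept = 0
total = 100_000
for _ in range(total):
    event = simulated_event()
    if sum(event) > THRESHOLD:
        kept += 1          # in a real system: serialize the event to storage

print(f"kept {kept} of {total} events ({100 * kept / total:.2f}%)")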
Eric Schmidt recently blurted out that people at Google realised they could use their data to predict the stock market, but it would be illegal, so they don't.
posted by AmbroseChapel at 1:43 PM on October 27, 2010 [1 favorite]
There's a Radiolab episode about machine learning that covers this. I think it's this one, about Eureqa.
FWIW, I used to work for a company that did data mining on web server logs to extract these kinds of patterns. It's pretty common.
posted by chairface at 2:32 PM on October 27, 2010 [1 favorite]
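As a small example of the log mining chairface describes, here is a sketch that pulls the most-requested paths out of an Apache/nginx combined-format access log. The log filename is assumed, and real pipelines add sessionization, bot filtering, and distribution across many machines.

# Minimal web-log mining: count the most-requested paths in a combined-format
# access log.  The filename is assumed; production jobs run this over terabytes
# of logs with map/reduce rather than one loop on one machine.
import re
from collections import Counter

# Matches the request part of a combined log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = REQUEST_RE.search(line)
        if m:
            hits[m.group(1)] += 1

for path, count in hits.most_common(10):
    print(f"{count:8d}  {path}")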
Best answer: Recently the distributed computing project Einstein@Home discovered a pulsar.
posted by Rhomboid at 10:34 PM on October 27, 2010
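Einstein@Home's actual search pipeline is far more sophisticated, but the core idea of hunting for a faint periodic signal buried in noise can be sketched with a Fourier transform. Every parameter below (signal frequency, noise level, sampling rate) is invented for illustration; a real pulsar search also deals with Doppler drifts, dedispersion, and much more.

# Toy periodicity search: bury a weak sinusoid in noise, then recover its
# frequency as the strongest peak in the power spectrum.
import numpy as np

rng = np.random.default_rng(0)

fs = 1000.0                       # sampling rate, Hz (assumed)
t = np.arange(0, 60.0, 1 / fs)    # 60 seconds of data
true_freq = 37.3                  # hidden "pulsar" spin frequency, Hz (made up)

signal = 0.05 * np.sin(2 * np.pi * true_freq * t)   # weak periodic signal
noise = rng.normal(0.0, 1.0, t.size)                # much louder noise
data = signal + noise

# Power spectrum; the hidden frequency shows up as the tallest peak.
power = np.abs(np.fft.rfft(data)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
peak = freqs[np.argmax(power[1:]) + 1]              # skip the DC bin

print(f"strongest periodicity found at {peak:.2f} Hz (true value {true_freq} Hz)")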
This thread is closed to new comments.