I'm a data scientist! What does that mean?
May 8, 2012 11:15 AM
Can you point me to the best resources to learn about these new-fangled things they call data science and big data? I just started a new job as a data scientist and need to get up to speed.
My PhD is in astrophysics, but I left academia to join the ranks of the corporate and well-paid, and have ended up as a data scientist at a large financial services company.
I have no experience in finance. I have experience working with large data sets, but not the massive databases they have here. The bulk of my experience is in modeling, programming and stats. I'm the first of my new group to be hired, and at the moment, I'm twiddling my thumbs and filling out benefits information. I've been given free rein to come and go as I please as they set up the new group. I'd like to spend this time learning as much as I can about my new role.
I've gotten some books on topics like data mining and machine learning, and have been searching the web for information, but it's difficult to find much. It doesn't seem like people even agree on a definition--many don't even think it's a real thing. (I'm certainly not convinced yet that it's a "science.") It's so new, its Wikipedia page was only created in the last month.
Does anyone have insight on resources or books I can get to help? Journals? Online courses? Professional associations? Conferences? Books designed with business and marketing applications would be most useful, but any guidance or opinions on the subject would be much appreciated.
Best answer: Conferences: O'Reilly's Strata Conference. There's one coming up in October. If your employer is willing to pay for it, you could also purchase the videos of previous conferences.
Also on the O'Reilly site, check out Edd Dumbill's entries.
You said you got some books - did you get any by Hilary Mason?
posted by research monkey at 11:33 AM on May 8, 2012 [1 favorite]
To orient yourself, check out some of the O'Reilly stuff.
http://search.oreilly.com/?q=big+data
Also read something in the industry like Infoworld.
http://www.infoworld.com/
posted by PickeringPete at 11:34 AM on May 8, 2012
Best answer: There was a very smart guy, Jim Gray, who came up with the idea of the Fourth Paradigm of science. The basic idea is that nowadays, scientific ideas are "powered" by data mining. Jim, sadly, was lost at sea, but his buddies got together, rounded up a bunch of essays on his topic, and bundled them all into a (free) book, The Fourth Paradigm: Data-Intensive Scientific Discovery, that is required reading for someone like you. I'd start there.
posted by Geckwoistmeinauto at 11:46 AM on May 8, 2012 [5 favorites]
At brass tacks, you'll still be a quant the money guys have hidden in the back - the difference is the tools are different compared to 10 years ago, and the data sets are (lots) bigger. Don't be surprised if you wind up building something that decides to make portfolio decisions based on social media data.
I'd skip journals and books, because things are moving so fast, and start tracking down people in your company and at competitors who are actively working on and living in this space.
posted by NoRelationToLea at 1:27 PM on May 8, 2012
You might also want to look at the ecosystems around the tools you'll be using.
Cognos, Pentaho and other data warehousing and business intelligence systems will have resources that help explain some key concepts. I've been looking at playing with Weka, and at a related book on data mining.
The big organization I know of that runs conferences is The Data Warehousing Institute.
posted by Sleddog_Afterburn at 1:54 PM on May 8, 2012 [1 favorite]
Best answer: You might want to take a look at Jure Leskovec's course: Mining Massive Data Sets - the slides and handouts are posted on the web. My fiancee took it, and it covers things like algorithms for machine learning when you can't fit all your training data in memory at once.
posted by pombe at 3:11 PM on May 8, 2012 [1 favorite]
Response by poster: Thanks everyone, these are some great places to start!
posted by Tooty McTootsalot at 1:21 PM on May 9, 2012
And since the Strata conference was mentioned: it is going on right now.
posted by willF at 12:22 PM on February 27, 2013
This thread is closed to new comments.
posted by rich at 11:22 AM on May 8, 2012 [3 favorites]