Be the "data guy" my current boss needs (and future bosses want)?
January 12, 2015 9:45 AM

I'm in academia. Two related questions: 1) Can you recommend readings, forums, programs, and tools under the broad heading of "data analysis"? 2) And what's the next job I should be looking for?

Apologies in advance for identity-obfuscating vagueness from the Sock.


My new boss's boss came in as a data-driven administrator. He's an outlier in our discipline: he's interested in data in a field that doesn't do much quantitative analysis. He has a PhD in a math- and statistics-heavy subject and is leading a unit bereft of anything resembling quantitative or visualization savvy (whether among the faculty, staff, or administration). Which, I think, is precisely why he was hired.

Happily, I've found some success under the new regime. My niche role rewards technical knowledge and data collection (but doesn't necessarily require them). I was among the first to walk into his office with well-formatted Excel charts drawn from a robust Access database I developed. But... I want to do more. My education is mostly non-technical (after a false start as an engineering student, which had 18-year-old me sitting in front of a Sun terminal plodding through Fortran 77 and sleeping through differential equations), but I've landed in my current role because I have a high tolerance for information overload and for searching out answers. Here are some things I'm taking stabs at:

- Web scraping: I've figured out the browser extensions and XPath syntax that make it easier to pull data off the web. I've learned enough Python to get BeautifulSoup (and, to a lesser extent, Scrapy) running some Hello World scripts.

- Visualization: Boss is sending me to a Tufte course and I've skimmed through a couple of his books.

- I get a kick out of r/dataisbeautiful

- R and Stata? I know enough to drop the names, but haven't explored them much.

- PDF text parsing: I get a lot of data as text-only PDFs. This is a terrible state of affairs (and it's not going to change any time soon).

Beyond the technical challenges, I'm suffering some "grass is greener" envy. For many of the tasks and projects I'm contemplating, there are analysis firms and software packages that could almost certainly do the job. But, as they've explained, neither we nor our academic niche has the budget to make a business case for them to take any interest. That makes me think I ought to develop these skills myself and eventually put them to use for the bigger fish those firms are working with. The most helpful career advice would come if I were willing to explain my niche, but broadly: what jobs relate to the issues and skills I've mentioned?

Thanks!
posted by Admiral Sock to Technology (13 answers total) 45 users marked this as a favorite
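The web-scraping bullet above mentions BeautifulSoup. As a minimal sketch of the same idea with no third-party install, the snippet below swaps in Python's built-in `html.parser` (this is not BeautifulSoup's API) to pull the cell text out of HTML table rows. `TableScraper` and `scrape_table` are hypothetical names, and the sketch assumes the page closes its `<tr>` tags and has no nested tables; BeautifulSoup is considerably more forgiving of messy real-world markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Assumes </tr> tags are present; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()   # convert_charrefs=True by default, so &amp; -> &
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built, or None
        self._cell = None    # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def _end_cell(self):
        self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._end_cell()
        elif tag == "tr" and self._row is not None:
            if self._cell is not None:   # tolerate a missing </td>
                self._end_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:       # ignore text outside table cells
            self._cell.append(data)


def scrape_table(html):
    """Hypothetical helper: return every table row in `html` as a list of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `scrape_table("<table><tr><td>2013</td><td>42</td></tr></table>")` returns `[['2013', '42']]`. Fetching the page itself (with `urllib.request` or a browser extension) is a separate step.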
 
Response by poster: (Oh, and I've consulted various academic departments in-house. They've been helpful up to a point; bureaucracy and logistics prevent them from taking me much further.)
posted by Admiral Sock at 9:50 AM on January 12, 2015


Hard to answer without really knowing what type of data you're working with, or what you're trying to do with it.

R is free, and with the R Commander (Rcmdr) package you can run it from a menu-driven interface rather than by writing code directly.

Sounds like you could use a basic statistics class. Most teach a statistical analysis package along the way. R is open source, Stata is user-friendly, and SAS is quite common, though it's more for experts than dabblers.

But don't forget: the first part of making decisions with data is having data. If you're talking about making work decisions, then the big job is collecting it. If you track information, often a simple analysis ("how much time and money are we spending on this task? how much benefit are we getting from it? is it worth it?") is all you need to make an informed decision.
posted by entropone at 9:53 AM on January 12, 2015 [1 favorite]
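entropone's point that a simple tally is often all you need can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical log of `(task, hours, expenses)` records and a single flat hourly rate; `tally_costs`, the task names, and the figures are all made up for the example.

```python
from collections import defaultdict


def tally_costs(log, hourly_rate):
    """Total hours and cost per task.

    `log` is an iterable of (task, hours, expenses) tuples. Cost is
    hours * hourly_rate plus direct expenses (a simplifying assumption).
    """
    totals = defaultdict(lambda: {"hours": 0.0, "cost": 0.0})
    for task, hours, expenses in log:
        totals[task]["hours"] += hours
        totals[task]["cost"] += hours * hourly_rate + expenses
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical records, purely for illustration.
    log = [
        ("grant reports", 3, 0),
        ("grant reports", 2, 50),
        ("newsletter", 4, 120),
    ]
    for task, t in sorted(tally_costs(log, hourly_rate=30).items()):
        print(f"{task}: {t['hours']:.1f} h, ${t['cost']:.2f}")
```

Run as a script, this prints `grant reports: 5.0 h, $200.00` and `newsletter: 4.0 h, $240.00`. Weighing those totals against what each task delivers is the "is it worth it?" step, which usually takes judgment rather than code.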


Various departments at my university offer "statistical computing" classes that cover different packages, ranging from entire semesters on R to a couple of days learning Stata (which is much more user-friendly). You might check math/stats, but also the various social science departments. In my experience, most TAs will let outside people sit in unofficially as long as they don't do annoying things like try to dominate the Q&A or take up a ton of office-hours time.

As for which to learn, I would start with Stata simply because you can probably learn it in a few days on your own, using online user manuals and/or a couple of books checked out of the library. I don't have much of a computing background, and I taught it to myself. I think R would be tougher to tackle on your own unless you're already pretty programming-savvy, so for that I would look for a workshop or class.
posted by rainbowbrite at 10:55 AM on January 12, 2015 [1 favorite]


Are you in a city? Check out a Tableau demo. You probably won't use this stuff in academia, but at least you could get a taste of how people make this easy for themselves.
posted by oceanjesse at 11:11 AM on January 12, 2015 [1 favorite]


Coursera is offering a paid Data Science specialization that I've been window-shopping for a while. I have no idea how good it is or whether it's worth the time/money investment, but it's there.

Code School has a free R intro course.

School of Data has a ton of free resources.
posted by ourobouros at 11:43 AM on January 12, 2015 [2 favorites]


In my mind, there's no reason to pay for JHU's Data Science specialization on Coursera unless your work is footing the bill and you're not burning through a small, finite pile of money. You can also take all of the courses for free except the capstone, and I haven't heard good things about the capstone. If you haven't done any substantive coding, the learning curve is harsh, and the level of explanation and forum discussion varies widely between courses.

That said, I'm in an analogous situation and am learning bits of R and Python all the same, using those and other resources. Try the Codecademy or DataCamp intros to get started and see if anything clicks for you. On the statistics front, if you need an intro (or a refresher), I highly recommend this course by Dr. Mine Çetinkaya-Rundel at Duke. It starts again in early March, and as an added bonus, the labs are all conducted in R (using DataCamp) and are very well put together.
posted by deludingmyself at 12:56 PM on January 12, 2015 [3 favorites]


If you've already learned enough Python to get BeautifulSoup working, I think you should look at NumPy, the plotting capabilities of Matplotlib, and if you need scientific data analysis, the analysis capabilities of SciPy.

NumPy is a professional-grade package that now provides the foundation for a wide range of scientific data analysis; it is fantastic. I don't have as glowing a report for Matplotlib, but let's say it is regularly improving, and things have gotten a lot more stable since it went 1.0.

Working through the NumPy and Matplotlib tutorials online should give you a solid footing in numerical data analysis and visualization.

(And if you need to install the whole stack, look at the academic licensing for the Enthought Python Distribution, or Canopy, as it is now styled.)
posted by RedOrGreen at 1:03 PM on January 12, 2015 [4 favorites]
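To give a flavor of RedOrGreen's suggestion, here is a minimal sketch using NumPy for descriptive statistics and a least-squares line (`np.polyfit` with `deg=1`), plus Matplotlib for a saved chart. The function names are hypothetical, and Matplotlib is imported inside `plot_trend` (an arbitrary choice) so the numeric parts run even where Matplotlib isn't installed.

```python
import numpy as np


def summarize(values):
    """Basic descriptive statistics for a 1-D sequence of numbers."""
    a = np.asarray(values, dtype=float)
    return {
        "n": int(a.size),
        "mean": float(a.mean()),
        "median": float(np.median(a)),
        "std": float(a.std(ddof=1)),   # sample standard deviation
    }


def linear_trend(x, y):
    """Least-squares slope and intercept of y against x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def plot_trend(x, y, path="trend.png"):
    """Scatter the data, overlay the fitted line, and save it as a PNG."""
    import matplotlib
    matplotlib.use("Agg")              # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    slope, intercept = linear_trend(x, y)
    fig, ax = plt.subplots()
    ax.plot(x, y, "o", label="data")
    xs = np.array([min(x), max(x)], dtype=float)
    ax.plot(xs, slope * xs + intercept, "-", label="least-squares fit")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

A call like `plot_trend(years, counts, "trend.png")` (with your own hypothetical `years` and `counts` lists) writes the chart to disk; the Matplotlib tutorials cover labeling and styling well beyond this.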


For PDF text parsing: if you have access to a Unix server, try pdftotext with the -layout option (that's what I use); it gives you output that is a bit easier to process automatically. Outside of Unix, you might find desktop apps simply by searching for pdftotext on Google, Yahoo, etc.

Seems like you're otherwise well on your way to building a toolbox. Just be ready to itemize these skills so you can maximize their value on your resume if and when the time comes. I don't know whether these departments have died out or are still alive in companies, but you'd be looking for work in Data Insights or Data Warehouse groups, at companies that do analytics, targeted advertising, or other data-intensive work.

Good luck! Sounds like you've been working in a rather important part of science: not the most interesting or fulfilling, sure, but important.
posted by JoeXIII007 at 4:58 PM on January 12, 2015 [1 favorite]
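JoeXIII007's pdftotext tip pairs naturally with a little post-processing in Python. The sketch below assumes poppler's `pdftotext` is installed and on your PATH (passing `-` as the output file sends the text to standard output), and it assumes that in `-layout` output, table columns are separated by runs of two or more spaces. That heuristic breaks if a cell itself contains double spaces, so spot-check the results against the PDF. `pdf_to_layout_text` and `split_columns` are hypothetical names.

```python
import re
import subprocess

# Assumption: in -layout output, columns are separated by 2+ whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}")


def pdf_to_layout_text(pdf_path):
    """Run pdftotext -layout on a PDF and return its text (needs pdftotext on PATH)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],   # "-" writes the text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def split_columns(layout_text, min_fields=2):
    """Split each line on wide gaps; keep only lines that look like table rows."""
    rows = []
    for line in layout_text.splitlines():
        fields = COLUMN_GAP.split(line.strip())
        if len(fields) >= min_fields:   # drops blank lines, titles, page numbers
            rows.append(fields)
    return rows
```

Something like `rows = split_columns(pdf_to_layout_text("report.pdf"))` gives a list of rows that Python's `csv` module can write out for Excel or Access; the `min_fields` threshold is a guess you'd tune to your documents.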


The job title you're looking for is 'Data Scientist'. It's a great field to get into right now if this is a subject you're interested in: demand for competent data scientists greatly exceeds supply. I'll add another recommendation for the free data science specialization track on Coursera. It'll give you a good, structured overview of the field, and after completing it you can decide which topics interest you most. As for software, R and Python are both widely used (as is SAS, but the other two are open source).
posted by btkuhn at 6:33 PM on January 12, 2015 [1 favorite]


Instead of going to academic departments, does your university have a data and information services division of its library/ies? They often offer workshops and access to public data sets.
posted by Madamina at 6:58 PM on January 12, 2015 [1 favorite]


Response by poster: Thanks all for the great advice. A statistics course would indeed be very helpful.

Tableau's video pitch is compelling.

>If you're talking about making work decisions, then the big job is collecting it.

This is really helpful for framing my issues. I have good ideas for simple analyses to run on mountains of messy data, some public and some expensive to access. Getting that data is shaping up to be a bigger issue than the analysis itself.

Lots of good stuff to keep me going!
posted by Admiral Sock at 7:04 AM on January 13, 2015


I strongly recommend Udacity's courses; they have quite a lot of data science material, all free to view.
posted by greytape at 2:08 PM on January 13, 2015 [1 favorite]


You might want to browse DataTau a bit. It aims to be a Hacker News for data scientists. Not much discussion, but lots of links.

Also, Tableau is free for student use. (Of course, this is intended to get you hooked for later in your career.) Me, I like Python, but at least for now it's less bleeding edge than R.
posted by Going To Maine at 7:50 PM on March 13, 2015

