Human Learning Machine Learning
September 3, 2013 4:13 PM

I'm mulling over the idea of working my way through a machine learning text and self-teaching. Help me choose a research project and a text!

As I said above the fold, I'm mulling over the idea of working my way through a machine learning text and self-teaching. I'd probably create a blog to document my efforts and to serve as notes to myself. I think I'd learn more and have more fun if I had a research project that I could work through as I learn.

I'm very much interested in finance and economics. Additionally, professionally I work in commercial real estate. However, I don't know how well these subjects would lend themselves to research projects. I'm wondering what unexplored, worthwhile areas of research might exist. I'm reaching out to the community to see if you guys have any interesting ideas.

Also, can someone please recommend a good machine learning text? Thanks!
posted by prunes to Computers & Internet (8 answers total) 15 users marked this as a favorite
not quite what you're asking for, but Coursera do a (free, online) machine learning course that could be a useful starting point.
posted by russm at 4:26 PM on September 3, 2013

I liked this one, but the book relies heavily on the Weka software, which may not be what the kids are using these days. Still, I found it highly accessible for a simple caveman like myself.
posted by RobotVoodooPower at 4:27 PM on September 3, 2013

Programming Collective Intelligence uses Python, a language that is easy to learn. I would first go through the examples given, then think about how to adapt them. For example, if you have access to a real estate database, depending on the amount of information in it (and the terms of reference), you could try to see if one can predict how long a house will take to sell based on the difference between the assessment and the asking price, or the size of the house, the geographic location, etc., or something similar for commercial space. Using the results of your research, you might be in a better position to make recommendations to clients (based on actual data that you know how to analyze and present).
posted by aroberge at 5:04 PM on September 3, 2013

The Coursera course is very good, if a bit basic. I have it on good authority (my grad-level machine learning prof) that the Bishop text is the best recent one. (The Mitchell that a lot of people like -- it's a lot less dense -- is showing its age.)

What's your programming background like? Fluency in Matlab, R, or Python (NumPy) is probably a prerequisite for any sort of project.

You can get a project out of any sort of large data set. In your field, something like a list of all building sales, with square footage, zoning, location, etc, as predictors and sale price (or sale price per sq ft) as the variable of interest. Cross-validate and bam.
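
A minimal sketch of what "cross-validate" means here, using only the Python standard library. The data is synthetic (a made-up square footage and sale price relationship, not real sales), and `fit_line` and `k_fold_rmse` are hypothetical helper names for this illustration. A real project would use more predictors and probably a library.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average RMSE on the held-out fold, over k train/test splits."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
        scores.append(statistics.fmean(sq_err) ** 0.5)
    return statistics.fmean(scores)

# Synthetic stand-in for a building-sales table (assumed numbers, NOT real data).
rng = random.Random(42)
sqft = [rng.uniform(2_000, 50_000) for _ in range(200)]
price = [150 * s + rng.gauss(0, 250_000) for s in sqft]

cv_rmse = k_fold_rmse(sqft, price, k=5)
print(f"5-fold CV RMSE: ${cv_rmse:,.0f}")
```

The number to watch is the held-out error, since each fold is scored by a model that never saw it.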

It's probably best not to start with something unexplored. There are many machine learning competitions-- some organization will publish a data set, hold back some data, give you the predictors for the held back portion, and score based on fit with the real data. That might be the way to go so that you can compare your approach to other people/teams. And as you're working through the text, you can try the different methods and apply each one to this data set, and see which one is most appropriate and why. (That's the most important part.)
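
Roughly how that scoring works, as a sketch under stated assumptions: the data below is synthetic, the 240/60 split is arbitrary, and the two methods (a predict-the-mean baseline and a simple nearest-neighbours average) are just stand-ins for whatever the text covers.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mean_baseline(train_x, train_y, test_x):
    """Ignore the predictor and always guess the training average."""
    avg = sum(train_y) / len(train_y)
    return [avg for _ in test_x]

def knn_regress(train_x, train_y, test_x, k=5):
    """Predict the average outcome of the k closest training points."""
    preds = []
    for x in test_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
        preds.append(sum(train_y[i] for i in nearest) / k)
    return preds

# Synthetic data set (assumed for illustration only).
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]

train_x, train_y = xs[:240], ys[:240]
test_x, test_y = xs[240:], ys[240:]  # in a competition, test_y stays with the organizer

scores = {
    "mean baseline": rmse(test_y, mean_baseline(train_x, train_y, test_x)),
    "5-nearest neighbours": rmse(test_y, knn_regress(train_x, train_y, test_x)),
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE {score:.3f}")
```

Running every method you learn through the same held-back split gives you one comparable number per method, and the "why" comes from looking at where each one misses.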
posted by supercres at 5:06 PM on September 3, 2013 [2 favorites]

Response by poster: I am fairly fluent in Python, although I haven't worked in numpy specifically.

What are the advantages of getting a book specifically on the theory of machine learning versus one that is more applied and integrates a library like scikit-learn? I'm wondering whether something like the Bishop text might be too academic for me.
posted by prunes at 6:00 PM on September 3, 2013

Yeah, the Bishop is pretty theoretical. Requires at least a little background in probability and linear algebra. It's aimed at advanced engineering undergrads or early grad students.

I'd try the Coursera course first and go from there. I've heard it's pretty applied. Or just go for a more cookbook-type book like this.
posted by supercres at 6:22 PM on September 3, 2013 [1 favorite]

Best answer: Barber's Bayesian Reasoning and Machine Learning is my favorite; it's light on the theory, but manages to go very deep. The end of the book gets into dynamical/time series models, which you might find particularly interesting (although, aside from HMMs, those are not as well supported by scikit-learn, I think). One caveat: the code that comes with the book is in Matlab/Octave, but if you're planning on working through the Coursera ML course (which you should!), you're going to learn that anyway.

For something with a broader take, a lot of people like and refer to The Elements of Statistical Learning but I'm honestly not much of a fan and I think it's beginning to show its age. I've heard that Murphy's Machine Learning: A Probabilistic Perspective is a worthy successor, though haven't had the chance to check it out yet.

For something more theoretical, check out Abu-Mostafa's Learning From Data (you can get it for $28 new -- the publisher sells it through Amazon under AMLBook). It is EXCELLENT. There's even an accompanying set of lectures!

Good luck and please don't hesitate to MeMail me if you have any questions!
posted by un petit cadeau at 7:48 PM on September 3, 2013

Best answer: Programming Collective Intelligence is a great start. It's a fairly easy, hands-on, low-theory tour through a number of techniques. If you make it through and want more, great. If you've had enough, still great, because you'll have a better appreciation for what's possible.

Assuming you want more, do you want a research project, or do you have a problem to grapple with? They're different beasts. I've found that having an immediate problem to wrestle with provides good focus while learning, and can provide quicker feedback.

I'm working my way through Abu-Mostafa's Learning From Data course (Caltech), which provides a good theoretical foundation without requiring a PhD in math. (Andrew Ng's Coursera course on Machine Learning is good, but it is daunting unless your math chops are good.)

I have an image recognition problem that I'm hoping to apply machine learning techniques to. On the surface, it's mundane: given a set of pictures of a bird feeder, select the pictures that have birds in them. Beneath the surface, it gets interesting. Can I apply unsupervised learning techniques (e.g., k-means clustering on some signal extracted from the images), or will a supervised technique (one that requires hand-labelling a training set) perform better?
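
To make the unsupervised option concrete, here is a minimal sketch of k-means with two clusters on a single number per image. Everything about the data is assumed: the "signal" stands in for something like the fraction of pixels that differ from an empty-feeder reference shot, the values are synthetic, and `kmeans_1d` is a hypothetical helper rather than a library call.

```python
import random

def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalar features; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Synthetic per-image signal (assumed values, not measurements).
rng = random.Random(1)
empty = [max(0.0, rng.gauss(0.02, 0.01)) for _ in range(80)]  # feeder only
bird = [max(0.0, rng.gauss(0.15, 0.04)) for _ in range(20)]   # something moved
signal = empty + bird

centers, labels = kmeans_1d(signal, k=2)
bird_cluster = max(range(2), key=lambda c: centers[c])  # higher-signal cluster
flagged = [i for i, lab in enumerate(labels) if lab == bird_cluster]
print(f"{len(flagged)} of {len(signal)} images flagged as likely containing a bird")
```

No labels are needed for this, which is the appeal. The supervised route would instead need a hand-labelled subset to train on, and the same labelled subset is what you would use to check whether the clusters actually line up with "bird" versus "no bird".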

The point here is to not get too hung up over the "research worthiness" of your problem. Pick one that you can gather data on, even if you don't yet know how you'll leverage that data. Then explore as you go. Get your hands into the clay.
posted by dws at 11:10 PM on September 3, 2013
