Help me learn about image recognition
December 22, 2008 6:54 PM
Image recognition on the PC, where do I start?
I've been playing around with some external sensors attached to my PC (phidgets) and I'd like to play around with image recognition.
At the most basic level I was thinking about taking a webcam, somehow taking image dumps every so often and analyzing light levels or even basic colors. (possibly graduating to shapes).
Does anyone know of any more information on this topic? Or know where I can find tools to do so? I have no idea where to start.
You should look at OpenCV. You'll need to be able to program to use it, probably in C++, but it has an incredible variety of very good analysis tools and is quite well supported. If you're less inclined to program, OpenCV has bindings to Matlab, which would be easier to pick up (though it's not free). Matlab itself probably wouldn't be bad for basic stuff -- it probably has a built-in toolkit with useful functions, though I haven't looked for one.
posted by devilsbrigade at 7:42 PM on December 22, 2008
Best answer: I'm going to assume you can program a computer and solve linear algebra problems competently. If you can't, you're not going to get very far at this before you have to go and learn those things.
I did a fair amount of work on this as an undergrad (and throughout my entire 8-week graduate career).
The general field you're looking for is "computer vision"--this is where you'll go for most of your basic algorithms. The subfield is (probably) "natural image recognition" or "natural scene recognition". The "natural" part is the key, as you're giving it real images taken with a camera against a noisy, cluttered background... not background-free scans, edited photos, or generated images.
For what you explicitly mention (luminance and chroma analysis), you don't need anything particularly sophisticated. You take a picture with your webcam (using whatever API is relevant). You'll get it in some sort of bitmap format... from there, you can get simple luminance by taking the norm of the RGB vector in each pixel (essentially how close it is to white). Simple chroma analysis is basically just a fuzzy comparison of the RGB vectors (the pixel [.7, 0, .3] is within your threshold of [.1, .1, .1] of your target [.65, .01, .2]). Anything more complex than "which pixels are pink?" generally requires transforming the color data into a different, more suitable, color space and doing similar things there (RGB is a shitty color space for many analyses).
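A minimal sketch of the two operations described above, in plain Python. The helper names (`luminance`, `matches_color`, `mean_luminance`) and the tiny hard-coded frame are my own assumptions for illustration; a real webcam API will hand you pixels in its own format (often 0-255 integers), which you would scale to the 0-1 range first.

```python
import math

# Hypothetical helpers; a pixel is an (r, g, b) tuple with channels in [0, 1].

def luminance(pixel):
    """Crude brightness: length of the RGB vector, scaled so white gives 1.0."""
    r, g, b = pixel
    return math.sqrt(r * r + g * g + b * b) / math.sqrt(3.0)

def matches_color(pixel, target, threshold):
    """Fuzzy match: every channel lies within its own threshold of the target."""
    return all(abs(p - t) <= tol for p, t, tol in zip(pixel, target, threshold))

def mean_luminance(image):
    """Average brightness of an image given as a list of rows of pixels."""
    values = [luminance(px) for row in image for px in row]
    return sum(values) / len(values)

# A tiny 2x2 "frame" standing in for a webcam capture (made-up values).
frame = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.7, 0.0, 0.3), (0.2, 0.9, 0.1)],
]

# The example from the text: target color and per-channel threshold.
target = (0.65, 0.01, 0.2)
threshold = (0.1, 0.1, 0.1)

# Coordinates (row, column) of every pixel that matches the target color.
hits = [
    (y, x)
    for y, row in enumerate(frame)
    for x, px in enumerate(row)
    if matches_color(px, target, threshold)
]

print("mean luminance:", mean_luminance(frame))
print("matching pixels:", hits)
```

Tracking `mean_luminance` across periodic captures would give a rough light-level reading, and the length of `hits` gives a rough "how much of this color is in view" measure.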
It should be noted that real natural image recognition is an unsolved problem with a huge amount of active research behind it. You will not find meaningful tutorials, as there is no straightforward, one-webpage, beginner-level way of solving these problems. Just determining whether an object is foreground or background is a massively difficult problem to which I've seen no satisfactory general solution. For any sort of assistance in this, you should look to the scientific literature. Google Scholar and CiteSeer are where I go for free papers. Here is a list of books; I recognize some of them as well-written and authoritative. All of this literature, be it books or papers, is going to be highly technical--the authors will almost certainly assume fluency in computer science and linalg (plus probably some probstat and calculus).
If you have money to drop on this, it might be worth buying a copy of Matlab and the Image Processing Toolbox. It doesn't have much in the way of built-in analysis, but it does natively support all the numeric linear algebra you'll need, as well as offering a number of operations specific to images. I bloody well hate the Matlab programming language (INDEX FROM ONE?!?), but the environment is very nice and much easier to use for linalg than any sort of scripting_language+BLAS solution.
I don't mean to be discouraging. But, your question is a little like "I've been playing around with model rockets; how would I go about building an ICBM?" It's entirely doable and probably within your faculties. But, you're going to be studying for a while before you can hope to achieve your aim.
posted by Netzapper at 8:10 PM on December 22, 2008 [3 favorites]
Also, in terms of things you should know, a lot of computer vision relies on decently difficult statistics problems, so if you really want to get into foreground/background, edges, that kind of stuff, you're probably going to need to do a lot of background reading before you even touch the vision problems. As Netzapper said, linear algebra is a must to even get started.
posted by devilsbrigade at 9:14 PM on December 22, 2008
To beat on my pd drum again, there is a Pure Data (pd) plugin called GEM with webcam inputs and objects for doing matrix math on the resulting images and displaying results on screen. This may be useful as a quick sketchpad for trying out different algorithms and settings (often the faster you can go from one failed attempt to the next, the faster development goes and the sooner you find something useful, and I don't know of any other open source environment where you can implement video processing ideas so fluidly).
The one problem will probably be the difference in CPU usage between using pd objects and coding things in a text language; it may be helpful to rewrite the abstractions you use most often in C (translating from pd to a traditional language is actually quite easy, much easier than the other way around).
It used to be that when I was trying to master some mathematical concept I would write a Scheme or Common Lisp function and try different inputs to observe the behaviors, but lately I find the click-and-drag environment of pd much quicker. It may be useful to sketch out concepts from any textbooks or papers you need to study in pd as a tool for grasping them.
It looks like people are doing interesting things with pd and reacTIVision, the open source library used with the Reactable computer vision music control interface.
posted by idiopath at 11:19 PM on December 22, 2008
Netzapper has it (I'm a post-doctoral researcher in this field, and it still throws up maths I can't understand on a regular basis). I use OpenCV (C++) and Matlab and some other libraries. OpenCV has Python hooks too, I think, if your C++ isn't much use. OpenCV is free but requires good programming skills; Matlab costs money but has a less steep learning curve and better documentation.
You could get this book: Image Processing, Analysis and Machine Vision out of the library and give yourself an overview.
posted by handee at 6:18 AM on December 23, 2008
There is a book about OpenCV that just came out: Learning OpenCV. I took a copy out from my library about a week ago and it looks good so far. The book does a good job of explaining how and why one would use the different facilities of OpenCV.
I'm an experienced programmer just starting to look at computer vision for some specific tasks, and this book has been great for getting up to speed and understanding the basic techniques.
posted by bdc34 at 8:11 AM on December 23, 2008
I'd say the easiest way to approach this is with the open-source, Java-based environment Processing. Processing was designed to make it easy for non-programmers to get a handle on these types of audio/visual tasks without writing much code at all. Specifically for computer vision, blob detection, etc., one might look closely at the libraries JMyron and BlobDetection. All are well-documented with plenty of example source code.
posted by martini at 1:06 PM on December 24, 2008 [1 favorite]
The computer vision field does have a considerable amount of depth to it, but I definitely would not discourage anyone from delving into the field, regardless of background. It's fascinating, challenging, and, while it does get quite mathematical, if you have a decent understanding of the main ideas behind the engineering maths (calculus, linear algebra, probability) or you have some time to learn them, I think you'll get a lot out of playing around with computer vision. I'm a researcher in the Machine Learning area and work in the computer vision field, and from my experience even some of the simplest algorithms out there can work quite well and are a great stepping stone to more complicated algorithms.
Definitely, the first place I would start out is a good image processing text book. There are probably lots of good ones out there, but one I can personally vouch for is Gonzalez and Woods' Digital Image Processing. This book does an excellent job of introducing all the building blocks of working with images. Some examples:
- how to represent color in your algorithms. RGB values are fine but there are many more representations out there (Hue-Saturation-Value for example) that more closely resemble how humans describe color.
- how to represent texture, perform edge detection and line segment analysis on images.
- how to perform global and local manipulation of images such as smoothing, how filters work, etc.
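To make a few of these building blocks concrete, here is a small sketch in plain Python, not taken from the book. It converts an RGB color to Hue-Saturation-Value with the standard library's `colorsys` module, smooths a grayscale image with a 3x3 mean filter, and computes a very crude edge cue using a central difference along x (a stand-in I chose for simplicity, not a textbook operator such as Sobel). The function names `box_blur` and `horizontal_gradient` and the test image are assumptions for illustration only.

```python
import colorsys

# Color representation: RGB to HSV. colorsys takes floats in [0, 1]
# and returns hue, saturation, value, each also in [0, 1].
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # pure red

def box_blur(gray):
    """Smooth a grayscale image (list of rows of floats) with a 3x3 mean filter.

    Border pixels average only the neighbors that actually exist.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += gray[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out

def horizontal_gradient(gray):
    """Crude edge cue: central difference along x, left at zero on the borders."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(1, width - 1):
            out[y][x] = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
    return out

# A 4x6 test image: dark left half, bright right half (one vertical edge).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]

smoothed = box_blur(image)
edges = horizontal_gradient(image)

print("HSV of pure red:", (h, s, v))
print("gradient along one row:", edges[0])
```

The gradient is nonzero only around the dark-to-bright transition, which is the basic idea behind edge detection; real operators add smoothing and handle both directions.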
Aside from giving you a good background in the basics, the tail end of the Gonzalez-Woods book gives a decent introduction to statistical decision theory, which is at the heart of many computer vision applications and a good springboard to computer vision topics proper.
I can't say that I have run into great computer vision textbooks. The field is so large that no one text can do every topic justice. At this point I would pick out an interesting problem and look for research papers on the topic. The research papers themselves may be pretty condensed and hard to follow if you are just starting out, but look for the general techniques and keywords that have been used (Gaussian Mixtures, SIFT descriptors, SVMs) and search for tutorials on the topics, or read up on the techniques in machine learning texts. Here are a few that I find useful: Pattern Classification by Duda, Hart and Stork, Pattern Recognition and Machine Learning by Bishop, and The Elements of Statistical Learning by Hastie, et al.
Finally, as for software tools, I do recommend getting hold of Matlab with the Image Processing, Statistics, and Bioinformatics toolboxes (bioinformatics has some high-level machine learning algorithms that you can play around with quickly). I think Matlab is the fastest way to play around with numerical algorithms in general. What takes me a week's worth of work in a language like C++ literally only takes a day's worth of work in Matlab (though you wouldn't want to deploy your code, as Matlab is a bit slow).
Hope that helps. Good luck and have fun.
posted by etfd at 1:30 PM on December 24, 2008
posted by hattifattener at 7:02 PM on December 22, 2008