What's the statistical technique for combining several test results into one?
October 17, 2009 8:39 PM
I'm a statistics n00b trying to learn how to combine the results of several tests into one.
Basically, I'd like to learn how to categorize entities in some experimental data by combining the scores from several domain-specific tests into one unified score. A practical but hypothetical example would be writing a computer program that, given the sound of a car engine, tries to identify what model of car it came from. Say that there are 10 possible cars each sound can be matched with, and three independent tests that are applied to each sound, with each test producing a number between zero and one for each car model indicating how likely it is that the sound came from that model. I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself.
There are established ways to do this that I've seen used in research before, but I don't know any of the math and haven't had any luck Googling for info. I'm not looking for a detailed explanation, just some pointers to what I should research to teach myself. Thanks!
posted by gsteff to science & nature (4 comments total)
Taking your example, you have ten vectors, each containing three features: the values between 0 and 1 produced by the three audio measurements.
{0, 0, 1}
{0.1, 0.2, 0.9}
...
{0.2, 0.9, 0.4}
The features don't all have to be on the same scale.
You know ahead of time the label for each vector, i.e. the particular class or "automobile" that a vector is associated with. You can use this labeled data to train a support vector machine (SVM). You can even use more features, if there are other measurements that you have data for.
Once trained, the SVM can classify new data: give it a new measurement vector and it tells you which class ("automobile") that measurement most likely belongs to.
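To make the train-then-classify workflow concrete, here is a minimal, dependency-free sketch in plain Python. It is not PyML's API: it's a hand-rolled linear SVM (hinge loss with a simple subgradient update, one classifier per class in a one-vs-rest arrangement), and the function names, the three car labels, the training vectors, and the hyperparameters are all made up for illustration. A real project would more likely use an existing SVM library.

```python
# Hypothetical sketch: linear SVMs trained one-vs-rest with subgradient
# descent on the hinge loss. All names and data below are invented.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_binary_svm(X, y, epochs=300, lr=0.1, lam=0.01):
    """Train a linear SVM for labels y in {-1, +1}; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # L2 regularization: shrink the weights slightly on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            # Hinge loss: only update when the sample is inside the margin.
            if yi * (dot(w, xi) + b) < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def train_one_vs_rest(X, labels):
    """Train one binary SVM per class: that class (+1) vs. all others (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if label == cls else -1 for label in labels]
        models[cls] = train_binary_svm(X, y)
    return models

def predict(models, x):
    """Pick the class whose SVM gives the highest decision score."""
    return max(models, key=lambda cls: dot(models[cls][0], x) + models[cls][1])

# Made-up training data: each vector holds the three test scores for one
# recording, and each label is the car the recording is known to come from.
X_train = [
    [0.90, 0.10, 0.20], [0.80, 0.20, 0.10], [0.85, 0.15, 0.10],
    [0.10, 0.90, 0.20], [0.20, 0.80, 0.10], [0.15, 0.85, 0.20],
    [0.10, 0.20, 0.90], [0.20, 0.10, 0.80], [0.10, 0.15, 0.85],
]
y_train = ["sedan"] * 3 + ["truck"] * 3 + ["coupe"] * 3

models = train_one_vs_rest(X_train, y_train)

# Classify a new, unlabeled measurement.
print(predict(models, [0.8, 0.1, 0.1]))
```

The point of the sketch is that the classifier learns from the labeled examples how much weight each test deserves, rather than you picking weights by hand or simply adding the scores.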
PyML is a Python-based tool for SVM generation and classification. The site has a great introduction to the concepts and mathematics.
posted by Blazecock Pileon at 10:03 PM on October 17, 2009 [1 favorite]