Comments on: What's the statistical technique for combining several test results into one?

gsteff — Sat, 17 Oct 2009 20:39:39 -0800

I'm a statistics n00b trying to learn how to combine the results of several tests into one.

Basically, I'd like to learn how to categorize entities in some experimental data by combining the scores from several domain-specific tests into one unified score. A practical but hypothetical example would be writing a computer program that, given the sound of a car engine, tries to identify what model of car it came from. Say that there are 10 possible cars each sound can be matched with, and three independent tests that are applied to each sound, each test producing a number between zero and one for each car model indicating how likely it is that the sound came from that model. I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself.

There are established ways to do this that I've seen used in research before, but I don't know any of the math and haven't had any luck Googling for info. I'm not looking for a detailed explanation, just some pointers to what I should research to teach myself. Thanks!

Blazecock Pileon — Sat, 17 Oct 2009 22:03:33 -0800

It sounds like you want to do classification? A support vector machine (SVM) is one such technique; it takes in vectors of "features", which are just any scores of interest.

Taking your example, you have ten vectors, each vector containing three features, which are values between 0 and 1, representing the three audio measurements.

{0, 0, 1} {0.1, 0.2, 0.9} ... {0.2, 0.9, 0.4}

The features don't need to be on the same scale as one another.

You know ahead of time the label for each vector, i.e. the particular class or "automobile" that a vector is associated with. You can use this data to train an SVM. You can even use more features, if there are other measurements that you have data for.

A trained SVM can be used to classify new data. If you then take new measurements to try to find out what classes ("automobiles") they are associated with, you can use the trained SVM to associate a measurement with a particular class or car.

PyML is a Python-based tool for SVM generation and classification. The site has a great introduction to the concepts and mathematics.
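To make the train-then-classify loop concrete, here is a minimal sketch in plain Python. It is not PyML's API: it is a toy linear SVM (hinge loss plus an L2 penalty, fitted by sub-gradient descent), simplified to a two-class question ("car A" versus "not car A"). The function names, the six training vectors, and the hyperparameters `lam`, `lr` and `epochs` are all made up for illustration.

```python
# Toy linear SVM (hinge loss + L2 penalty) trained by sub-gradient descent.
# Standard library only; the data and hyperparameters are invented.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=1000):
    """Return (w, b) for labels given as +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # The L2 penalty shrinks the weights slightly on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Inside the margin or misclassified: move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify one feature vector as +1 (car A) or -1 (not car A)."""
    return 1 if dot(w, x) + b >= 0 else -1

# Each vector holds the three tests' scores for "car A" on one recording.
# Label +1: the recording really was car A; -1: it was some other car.
training_vectors = [
    [0.9, 0.6, 0.8], [0.8, 0.4, 0.7], [0.7, 0.7, 0.9],
    [0.2, 0.5, 0.3], [0.1, 0.6, 0.2], [0.3, 0.4, 0.1],
]
training_labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(training_vectors, training_labels)

# Classify a recording the SVM has not seen before.
new_recording = [0.85, 0.5, 0.8]
print(predict(w, b, new_recording))
```

For the full ten-car problem you would need a multi-class scheme (for example, one such classifier per car), and a real library will be faster and better tested than this sketch.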

sophist — Sun, 18 Oct 2009 00:06:40 -0800

"I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself."

It seems like if you just weight each test according to its accuracy, you can easily overcome this problem. Create a test set with known answers, use it to gauge the accuracy of each test, then apply those accuracies as weights on the cases you are interested in. A machine learning solution, particularly one that relies on supervised learning techniques, would certainly suit this task, but it adds an entirely new level of complexity.
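The weighting idea can be sketched in a few lines of Python. Everything here is hypothetical: the car names, the four labelled recordings, the scores, and the choice to use raw accuracy as the weight (other weightings are possible).

```python
# Accuracy-weighted combination of test scores. All data are invented.

def accuracy(guesses, truths):
    """Fraction of labelled recordings a test identified correctly."""
    return sum(g == t for g, t in zip(guesses, truths)) / len(truths)

def combine(score_tables, weights):
    """Weighted sum of every test's score for each car."""
    return {
        car: sum(w * table[car] for w, table in zip(weights, score_tables))
        for car in score_tables[0]
    }

def best_car(combined):
    """Car with the highest combined score."""
    return max(combined, key=combined.get)

# Step 1: gauge each test on recordings whose true car is known.
truths = ["civic", "beetle", "mustang", "beetle"]
guesses_per_test = [
    ["civic", "beetle", "mustang", "civic"],   # test 1: 3 of 4 correct
    ["civic", "mustang", "mustang", "civic"],  # test 2: 2 of 4 correct
    ["beetle", "civic", "mustang", "civic"],   # test 3: 1 of 4 correct
]
weights = [accuracy(g, truths) for g in guesses_per_test]

# Step 2: apply those weights to the three tests' scores for a new recording.
new_scores = [
    {"civic": 0.7, "beetle": 0.2, "mustang": 0.1},
    {"civic": 0.2, "beetle": 0.6, "mustang": 0.2},
    {"civic": 0.3, "beetle": 0.5, "mustang": 0.2},
]
naive = combine(new_scores, [1, 1, 1])
weighted = combine(new_scores, weights)
print(best_car(naive), best_car(weighted))
```

With these made-up numbers the plain sum favours "beetle" because the two weaker tests outvote the stronger one, while the weighted sum favours "civic".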

You might have some luck crossposting this to Math Overflow, which was just recently posted to the blue.

a robot made out of meat — Sun, 18 Oct 2009 10:56:41 -0800

This depends a great deal on the kind of data and tests that you're trying to combine, and on whether you're always interested in categorization. The only unified answer comes if you're Bayesian and think that you can write down a likelihood for your data and prior probabilities for your parameters.
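As a rough illustration of the Bayesian route, the sketch below applies Bayes' rule under two strong assumptions that are not from the thread: each test's score for a car is treated as the likelihood of that test's output given the car, and the tests are treated as independent given the car (the "naive Bayes" simplification). The car names, priors and scores are invented.

```python
# Bayes' rule with independent tests: posterior is proportional to
# prior times the product of the per-test likelihoods. All data are invented.

def posterior(priors, likelihood_tables):
    """Return P(car | all tests), assuming the tests are independent."""
    unnormalised = {}
    for car, prior in priors.items():
        p = prior
        for table in likelihood_tables:
            p *= table[car]
        unnormalised[car] = p
    total = sum(unnormalised.values())
    return {car: p / total for car, p in unnormalised.items()}

# Uniform prior: no car is considered more likely before listening.
priors = {"civic": 1 / 3, "beetle": 1 / 3, "mustang": 1 / 3}

# ASSUMPTION: each score is read as P(this test's output | car).
likelihoods = [
    {"civic": 0.7, "beetle": 0.2, "mustang": 0.1},
    {"civic": 0.2, "beetle": 0.6, "mustang": 0.2},
    {"civic": 0.3, "beetle": 0.5, "mustang": 0.2},
]

post = posterior(priors, likelihoods)
print(max(post, key=post.get), post)
```

If some cars are known to be more common than others, changing `priors` shifts the result accordingly; if the tests are strongly correlated, the independence assumption will overstate the confidence of the posterior.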

acidic — Sun, 18 Oct 2009 11:40:37 -0800

If you don't know the accuracy of the tests and they are not reliable predictors of each other, then I think you know the answer: there's no categorization method more rational than averaging the numbers. Why do you need the data to be categorized? If you're trying to use the tests as independent variables for something, you'd be much better off using the raw quantitative data, separately. With that method you might be able to discard one or more of the tests (if their coefficients are near zero or not significant).
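One way to read "using the tests as independent variables" is an ordinary least-squares regression of some outcome on the raw test scores. The sketch below is only that reading, under stated assumptions: the outcome is synthetic and built to ignore test 2, so its fitted coefficient should come out near zero. The helper names and data are hypothetical, and no significance testing is done.

```python
# Ordinary least squares via the normal equations, standard library only.
# The scores and the outcome are synthetic.

def solve(a, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(rows, targets):
    """Fit targets ~ intercept + one coefficient per test score."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

# Raw scores from three tests on six recordings.
test_scores = [
    (0.1, 0.9, 0.3), (0.4, 0.2, 0.8), (0.7, 0.5, 0.1),
    (0.9, 0.3, 0.6), (0.2, 0.7, 0.5), (0.6, 0.1, 0.9),
]
# Synthetic outcome that ignores test 2 by construction.
outcome = [1.0 + 2.0 * t1 - 1.0 * t3 for t1, t2, t3 in test_scores]

intercept, c1, c2, c3 = least_squares(test_scores, outcome)
print(intercept, c1, c2, c3)
```

Here the coefficient for test 2 is essentially zero, which is the signal that the test could be dropped. With real, noisy data you would also want standard errors before discarding anything.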

Of course, in your example, you would have the correct answers and thus would be able to measure the accuracy of the tests not just generally but with regard to specific sounds and cars.