Comments on: What's the statistical technique for combining several test results into one?
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one/
Question: What's the statistical technique for combining several test results into one?
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one
I'm a statistics n00b trying to learn how to combine the results of several tests into one. <br /><br /> Basically, I'd like to learn how to categorize entities in some experimental data by combining the scores from several domain-specific tests into one unified score. A practical but hypothetical example would be writing a computer program that, given the sound of a car engine, tries to identify what model of car it came from. Say that there are 10 possible cars each sound can be matched with, and three independent tests that are applied to each sound, each test producing a number between zero and one for each car model indicating how likely it is that the sound came from that model. I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself. <br>
<br>
There are established ways to do this that I've seen used in research before, but I don't know any of the math and haven't had any luck Googling for info. I'm not looking for a detailed explanation, just some pointers to what I should research to teach myself. Thanks!
Posted by gsteff, Sat, 17 Oct 2009 20:39:39 -0800. Tags: statistics, research, experiments, datamining
By: Blazecock Pileon
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one#1939792
It sounds like you want to do <a href="http://en.wikipedia.org/wiki/Statistical_classification">classification</a>? An <a href="http://en.wikipedia.org/wiki/Support_vector_machine">SVM</a> is one such technique that takes in vectors of "features". Features are just any scores of interest. <br>
<br>
Taking your example, you have ten vectors, each vector containing three features, which are values between 0 and 1, representing the three audio measurements.<br>
<br>
<tt>{0, 0, 1}<br>
{0.1, 0.2, 0.9}<br>
...<br>
{0.2, 0.9, 0.4}</tt><br>
<br>
The features don't have to start out on the same scale as each other, though SVMs generally train better when you rescale them to comparable ranges first.<br>
<br>
You know ahead of time the label for each vector, i.e. the particular class or "automobile" that a vector is associated with. You can use this data to <em>train</em> an SVM. You can even use more features, if there are other measurements that you have data for.<br>
<br>
A trained SVM can be used to classify new data. If you then take new measurements to try to find out what classes ("automobiles") they are associated with, you can use the trained SVM to associate a measurement with a particular class or car.<br>
<br>
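As a rough illustration of the train-then-classify workflow described above, here is a minimal sketch in plain Python. It is not PyML's API: it is a tiny linear SVM fitted by subgradient descent on the hinge loss, cut down to two hypothetical car models (labeled +1 and -1) rather than ten. The feature values, the function names `train_linear_svm` and `classify`, and the learning-rate settings are all made up for the example.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The point is misclassified or inside the margin: move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented training data: three test scores per recorded sound.
# Label +1 stands for hypothetical "model A", -1 for hypothetical "model B".
train_x = [
    [0.9, 0.8, 0.9],
    [0.8, 0.9, 0.7],
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.3],
]
train_y = [1, 1, -1, -1]

w, b = train_linear_svm(train_x, train_y)

# Classify two new, unlabeled measurements (also invented).
new_sounds = [[0.95, 0.9, 0.95], [0.05, 0.1, 0.05]]
predictions = [classify(w, b, x) for x in new_sounds]
```

A real SVM library would also cover more than two classes (for instance by training one classifier per car model) and non-linear kernels, which this sketch leaves out.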
<a href="http://pyml.sourceforge.net/">PyML</a> is a Python-based tool for SVM generation and classification. The site has a <a href="http://pyml.sourceforge.net/doc/howto.pdf">great introduction</a> to the concepts and mathematics.
Posted by Blazecock Pileon, Sat, 17 Oct 2009 22:03:33 -0800
By: sophist
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one#1939836
<em>I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself.</em><br>
<br>
It seems like if you just weight each test according to its accuracy, you can easily overcome this problem. Create a test set, use it to gauge the accuracy of each test, then apply those weights to the cases you are interested in. A machine learning solution, particularly one that relies on supervised learning techniques, is certainly well suited to this task, but it adds an entirely new level of complexity.<br>
<br>
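The weighting idea above can be sketched in a few lines of Python. This is only one possible reading of it: each test's weight is taken to be the fraction of a labeled test set it got right, and the combined score is a weighted sum. The car names, scores, and the helpers `measure_accuracy` and `combine` are all hypothetical.

```python
def measure_accuracy(predicted, actual):
    """Fraction of labeled sounds where a test's top pick was the right car."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def combine(scores_per_test, weights):
    """Weighted sum of each test's per-car scores."""
    totals = {}
    for scores, weight in zip(scores_per_test, weights):
        for car, score in scores.items():
            totals[car] = totals.get(car, 0.0) + weight * score
    return totals


# Hypothetical labeled test set: the true car for four recorded sounds,
# followed by each test's top pick for those same sounds.
actual = ["model_a", "model_a", "model_b", "model_b"]
picks = [
    ["model_a", "model_a", "model_b", "model_b"],  # test 1: 4 of 4 right
    ["model_a", "model_b", "model_b", "model_a"],  # test 2: 2 of 4 right
    ["model_b", "model_a", "model_a", "model_a"],  # test 3: 1 of 4 right
]
weights = [measure_accuracy(p, actual) for p in picks]

# Scores from the three tests for a new, unlabeled sound (made-up numbers).
new_scores = [
    {"model_a": 0.8, "model_b": 0.2},
    {"model_a": 0.2, "model_b": 0.8},
    {"model_a": 0.1, "model_b": 0.9},
]
weighted = combine(new_scores, weights)
unweighted = combine(new_scores, [1.0, 1.0, 1.0])
best_weighted = max(weighted, key=weighted.get)
best_unweighted = max(unweighted, key=unweighted.get)
```

With these numbers the plain sum picks model_b while the accuracy-weighted sum picks model_a, because the one reliable test outvotes the two weak ones. Using raw accuracy as the weight is a heuristic, not the only defensible choice.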
You might have some luck crossposting this to <a href="http://mathoverflow.net/">Math Overflow</a>, which was just recently posted to the blue.
Posted by sophist, Sun, 18 Oct 2009 00:06:40 -0800
By: a robot made out of meat
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one#1940147
This depends a great deal on the kind of data and tests that you're trying to combine, and on whether you're always interested in categorization. The only unified answer comes if you're Bayesian and think that you can write down a likelihood for your data and prior probabilities for your parameters.
Posted by a robot made out of meat, Sun, 18 Oct 2009 10:56:41 -0800
By: acidic
http://ask.metafilter.com/135754/Whats-the-statistical-technique-for-combining-several-test-results-into-one#1940190
If you don't know the accuracy of the tests and they are not reliable predictors of each other, then I think you know the answer: there's no categorization method more rational than averaging the numbers. Why do you need the data to be categorized? If you're trying to use the tests as independent variables for something, you'd be much better off using the raw quantitative data, separately. With that method you might be able to discard one or more of the tests (if the coefficients are near zero or if they're not significant).<br>
<br>
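One way to read the coefficient check above is as an ordinary least-squares fit. The sketch below uses invented numbers and a hypothetical helper, `least_squares`, to regress an outcome on two test scores plus an intercept. The outcome was constructed to depend only on the first test, so the second coefficient comes out near zero and that test could be dropped. Significance testing is not shown.

```python
def least_squares(rows, targets):
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Each row is [1 (intercept), test 1 score, test 2 score]; all values are made up.
rows = [
    [1.0, 0.1, 0.9],
    [1.0, 0.4, 0.2],
    [1.0, 0.6, 0.7],
    [1.0, 0.9, 0.3],
    [1.0, 0.3, 0.5],
]
# Outcome built as 0.5 + 2 * (test 1 score), so test 2 carries no information.
targets = [0.7, 1.3, 1.7, 2.3, 1.1]

intercept, coef_test1, coef_test2 = least_squares(rows, targets)
```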
Of course, in your example, you would have the correct answers and thus would be able to measure the accuracy of the tests not just generally but with regard to specific sounds and cars.
Posted by acidic, Sun, 18 Oct 2009 11:40:37 -0800