Statistically comparing different search engine results
December 6, 2006 8:12 AM
Stats101Filter. I have (I think) a stats question, but little stats knowledge. Problem: I have one library and two different information retrieval (IR) systems, A and B, which use different metadata and search engines. I search the library with each IR system using the same query. Is there a way to measure the similarity or difference between the result sets I get from each system, and also to assign that difference some statistical significance?
For example, if I search for 'frogs,' I could get 'All about Frogs,' 'Lifecycle of the Frog,' 'Florida Frog Cam,' 'Cool Frog Pics,' etc., as results. I can see a range of scenarios for comparing A and B.
H0 - There is no difference
    Scenario 1 - A and B return the same results, in the same order
H1 - There is a difference
    Scenario 2 - A and B return the same results, but in a different order
    Scenario 3 - A and B return at least some different results
    Etc.
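A rough sketch of how the three scenarios could be told apart and scored for a single query. It assumes each system's results are available as a ranked list of unique document IDs, best match first; the function name `compare_rankings` and the cutoff `k` are made up for illustration. It reports the Jaccard overlap of the two top-k sets (how many documents differ, which is Scenario 3) and Kendall's tau computed only on the documents both systems returned (how much the order differs, which is Scenario 2).

```python
from itertools import combinations


def compare_rankings(a, b, k=10):
    """Hypothetical helper: classify two ranked result lists into the
    scenarios from the question and measure how far apart they are.
    a and b are lists of unique document IDs, best match first."""
    top_a, top_b = a[:k], b[:k]
    set_a, set_b = set(top_a), set(top_b)

    # Jaccard overlap: 1.0 means the same documents, 0.0 means none shared.
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 1.0

    # Kendall's tau on the documents both systems returned:
    # +1.0 means identical relative order, -1.0 means fully reversed.
    shared = [d for d in top_a if d in set_b]  # kept in A's order
    pos_b = {d: i for i, d in enumerate(top_b)}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # x precedes y in A; check whether B agrees.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    tau = (concordant - discordant) / pairs if pairs else 1.0

    if top_a == top_b:
        scenario = 1  # same results, same order
    elif set_a == set_b:
        scenario = 2  # same results, different order
    else:
        scenario = 3  # at least some different results
    return scenario, jaccard, tau


# Usage with the frog example; "Tadpole Facts" is an invented title.
a = ["All about Frogs", "Lifecycle of the Frog", "Florida Frog Cam", "Cool Frog Pics"]
b = ["Lifecycle of the Frog", "All about Frogs", "Florida Frog Cam", "Tadpole Facts"]
print(compare_rankings(a, b))
```

Here three of five distinct documents are shared (Jaccard 0.6), and of the three pairs among the shared documents, two keep their relative order and one is swapped (tau of about 0.33), so this query lands in Scenario 3. Tau ignores where the non-shared documents sit, which is why the sketch reports both numbers rather than folding them into one.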
For various reasons (basically, we are replacing one engine with another that we think is more efficient and scalable), we are hoping for Scenario 1. However, we are worried that we may encounter Scenario 3. So the question is: how can we measure any 'difference' we find in Scenario 3, and how can we decide whether that difference is significant (in real-world terms, whether it is likely to confuse users if we do switch IR systems)? Phew! (And thanks!)
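One way to put an uncertainty range on the difference, offered as an assumption rather than anything suggested in the thread: run a sample of real user queries through both systems, compute a per-query overlap score (for example the Jaccard overlap of the two top-10 lists, i.e. shared documents divided by all distinct documents returned), and take a percentile bootstrap confidence interval for the mean. The function name `bootstrap_mean_ci` and its defaults are hypothetical.

```python
import random


def bootstrap_mean_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for
    the mean of per-query scores (e.g. overlap between systems A and B).
    Returns (observed_mean, lower_bound, upper_bound)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(scores)
    # Resample the queries with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(scores) / n, lo, hi


# Usage with made-up per-query overlap scores, for illustration only.
overlaps = [1.0, 1.0, 0.8, 1.0, 0.6, 1.0, 0.9, 1.0]
mean, lo, hi = bootstrap_mean_ci(overlaps)
print(f"mean overlap {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Statistics alone cannot say what counts as "confusing to users", so the threshold has to be a judgment call: if the team decides, say, that a mean top-10 overlap below 0.9 is unacceptable, then an interval lying entirely below 0.9 suggests the change is real and not just an artifact of which queries were sampled.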
posted by carter to technology (10 comments total)
Otherwise, the classic way to test an IR system is through precision and recall. Recall asks whether the system returns all of the relevant documents; precision asks what fraction of the documents it returns are actually relevant. That will take care of scenario #3. If you have a relevance score, then you can also make sure that the most important matches are being returned.
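A minimal sketch of the precision/recall idea, assuming someone has judged which documents in the library are relevant to each test query (the post does not say where such judgments would come from). The second function swaps in an exact sign test, a technique not mentioned in the thread, to check whether one system wins on more queries than chance would explain. Both function names are hypothetical.

```python
from math import comb


def precision_recall(returned, relevant):
    """Precision: share of returned documents that are relevant.
    Recall: share of all relevant documents that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def sign_test(scores_a, scores_b):
    """Exact two-sided sign test on paired per-query scores
    (e.g. precision of A vs. precision of B for the same queries).
    Ties are dropped. Returns (wins_a, wins_b, p_value)."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return 0, 0, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Usage: one query, with invented document IDs and relevance judgments.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
```

In the usage line, two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Computing precision per query for both systems and passing the two lists to `sign_test` gives a p-value for "neither system is systematically better"; a large p-value does not prove the systems are equivalent, only that this sample shows no clear winner.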
posted by Alison at 9:59 AM on December 6, 2006