June 24, 2009 6:13 AM

Having run split tests on the e-commerce website at work, how do I extract meaningful conclusions from the raw data?

At work we occasionally run split tests on our website to try out new functionality, content and layouts. Visitors to the site see one of the available options and when orders are placed we record which option they saw.

This means we have order totals for each option, but how do we work out whether one is statistically better than the other? From past experience, different tests take different amounts of time to settle into the general trend (i.e. conversions increase by x%).

Is there any way of knowing whether a trend we're seeing at the moment is just a side effect of randomly assigning people to the different options, or whether it's actually a genuine improvement?
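For conversion rates, one standard way to ask "is this gap just noise from random assignment?" is a two-proportion z-test. The minimal Python sketch below assumes you also know how many visitors saw each option, not just how many orders each produced; the function name and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(orders_a, visitors_a, orders_b, visitors_b):
    """Two-sided z-test for a difference in conversion rate between options A and B.

    Returns (z, p_value). This is a large-sample approximation; it becomes
    unreliable when an option has only a handful of orders.
    """
    rate_a = orders_a / visitors_a
    rate_b = orders_b / visitors_b
    # Under the null hypothesis ("no real difference") both options share one
    # conversion rate, estimated by pooling all visitors together.
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a gap at least this large, in either direction,
    # arising from random assignment alone.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical figures: 200 orders from 10,000 visitors vs 250 from 10,000.
z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For those made-up figures z comes out at about 2.38 and p at about 0.017, so a 2.0% vs 2.5% split on 10,000 visitors each would be unlikely to arise from assignment noise alone.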

Or is there a way of calculating the minimum number of results needed before we can be confident that the trend is a real trend and not just noise?
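A common way to answer the minimum-sample question is a power calculation done before the test starts. The sketch below uses the usual normal-approximation formula for comparing two conversion rates, n = (z_alpha + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2 per option. The function name, and the defaults of a 5% significance level and 80% power, are illustrative assumptions rather than anything prescribed in this thread.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_option(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH option before starting the test.

    base_rate     -- current conversion rate, e.g. 0.02 for 2%
    relative_lift -- smallest improvement worth detecting, e.g. 0.10 for +10%
    alpha         -- two-sided significance level (chance of a false "win")
    power         -- chance of detecting the lift if it really exists
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    spread = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * spread / (p2 - p1) ** 2)


print(visitors_per_option(0.02, 0.10))  # 2% baseline, +10% relative lift
```

With a hypothetical 2% baseline and +10% as the smallest lift worth detecting, this asks for roughly 80,700 visitors per option; small lifts on low conversion rates need large samples, which may explain why some tests take much longer to settle than others.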
posted by gi_wrighty to Computers & Internet (2 answers total)

I should add a note on interpretation. The lower the p-value returned, the stronger the evidence that the difference between the two samples is real rather than a product of chance. Generally, you require a p-value below 0.05 before calling a result statistically significant.

As with all statistical tests, this one makes certain assumptions about the data (normality of the data, few or no outliers, etc.). From what you've described, though, I think a t-test will be fine.
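One way to see what the p < 0.05 threshold buys you is to simulate "A/A" tests, where both options have exactly the same true conversion rate. The Python sketch below does that; for brevity it scores each simulated test with a two-proportion z-test rather than the t-test suggested here, and the function names, the 5% true rate and the sample sizes are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(orders_a, visitors_a, orders_b, visitors_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # no orders at all (or all visitors ordered): no evidence either way
    z = (orders_b / visitors_b - orders_a / visitors_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(true_rate=0.05, visitors=1000, trials=1000,
                           alpha=0.05, seed=1):
    """Simulate A/A tests and return the fraction flagged as significant."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        # Both options share the SAME true conversion rate, so any
        # "significant" difference is pure noise from random assignment.
        orders_a = sum(rng.random() < true_rate for _ in range(visitors))
        orders_b = sum(rng.random() < true_rate for _ in range(visitors))
        if z_test_p_value(orders_a, visitors, orders_b, visitors) < alpha:
            flagged += 1
    return flagged / trials


print(aa_false_positive_rate())  # expect a value close to 0.05
```

Roughly one A/A test in twenty still "wins" at the 0.05 level, which is what the threshold means. Checking the running figures every day and stopping the first time p dips below 0.05 pushes that false-win rate higher, so it helps to fix the sample size in advance.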

posted by chrisamiller at 7:03 AM on June 24, 2009

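In Excel, it'll look something like this:

=TTEST(A1:A25,B1:B25,1,3)

Here A1:A25 and B1:B25 hold the figures for the two options, the third argument is the number of tails (1 for a one-tailed test, 2 for two-tailed) and the fourth, which Excel requires, is the test type (1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance). TTEST returns the p-value directly.

This page will get you started, as will searching for "excel ttest".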
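For the same calculation outside Excel, here is a minimal Python sketch of Welch's two-sample t-test (the unequal-variance test that Excel's TTEST selects with type 3), using only the standard library. The function names are hypothetical; the p-value comes from Student's t distribution via the regularized incomplete beta function, evaluated with the standard continued-fraction method, so the last digits may differ slightly from Excel's. Each sample needs at least two values.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance

_TINY = 1e-30  # guards against division by zero in the continued fraction


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster.
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def welch_t_test(sample_a, sample_b, tails=2):
    """Welch's unequal-variance t-test. Returns (t, df, p_value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of each mean
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Two-tailed p-value of Student's t: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    p_two = _regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, (p_two if tails == 2 else p_two / 2.0)


# Toy figures for two options, purely for illustration.
t, df, p = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.3f}, df = {df:.1f}, two-tailed p = {p:.4f}")
```

The `tails` argument mirrors the one in TTEST: `tails=1` returns half the two-tailed p-value, which only makes sense if you decided in advance which option should win.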

posted by chrisamiller at 6:59 AM on June 24, 2009