Comments on: How do I explain sample size in layman's terms?

Question: How do I explain sample size in layman's terms?

Avenger50 — Wed, 12 Nov 2014 08:54:14 -0800

How do I explain the principles of minimum detectable effect, statistical power, and statistical significance to a client?

My client's business has 100k monthly visitors and a 1% conversion rate on their home page. After using Optimizely to test a new home page, they're ready to call it quits after 2k visitors, saying they usually "spot problems in the funnel in less than 1k visits."

How do I explain the principles of minimum detectable effect, statistical power, sample size, and statistical significance (like Optimizely's A/B Test Sample Size Calculator) in layman's terms?

By: shivohum

shivohum — Wed, 12 Nov 2014 09:29:43 -0800

I wouldn't get too technical. I'd use an analogy. Maybe doctors are trying to figure out whether people become energetic when they take vitamin D. So they split people randomly into groups who take the vitamin and those who don't and ask participants how energetic they feel.

But many other things also affect energy -- like sleep, stress, hydration, etc. If the pill made a difference, but only a small one, wouldn't it be hard to tell given all these other factors that might swamp it? But the doctors might care even about a small difference, because over a large population, even a minor increase in energy levels would be a worthwhile discovery.

How many people would the doctors need to test to see if the vitamin made a difference? Well, if you have a large enough group of people, the other factors tend to wash out, and the difference the pill makes becomes clearer. The larger the group of people, the more powerful the magnifying glass that enables you to see the effect of the particular thing you're investigating. Of course, if the thing makes a huge difference, it will be apparent even with a relatively small group of people. But if you want to make sure you catch even very small differences, you need quite a powerful magnifying glass.

At a certain point, though, if the difference is absolutely minuscule, you stop caring. So you have to figure out how small a difference still matters.

So statisticians have calculated the exact numbers needed, depending on how small a difference you want to be able to detect and how confident you want to be that you won't miss a difference that really is there.

And in this case, if you care about a difference this small in your conversion rate, 2k people just isn't enough; you need, say, 10k.

By: entropone

entropone — Wed, 12 Nov 2014 09:30:16 -0800

"I can calculate the probability that the results we are seeing are due to CHANCE, not due to the effect of the new home page.

Since we're testing this, we want to be sure of the effect of the new home page - not just a random blip in the numbers. We need more visits to reliably do this. This isn't my opinion - this is the foundation of statistical analysis."

You can also show them Optimizely's calculator and walk them through the concepts.

By: klangklangston

klangklangston — Wed, 12 Nov 2014 09:33:37 -0800

I'm always a fan of the bag of marbles/deck of cards analogies. How many marbles would you have to pull out of a bag of N marbles, W of them white, to get an idea of the proportion W/N?

By: bleep

bleep — Wed, 12 Nov 2014 10:00:46 -0800

If you're talking about user experience issues leading to abandonment, it's true that user experience problems can be spotted with a smaller sample than would be statistically acceptable. That might be what they're referring to.

By: leopard

leopard — Wed, 12 Nov 2014 10:22:27 -0800

I don't think you should feel compelled to explain those particular topics, even if your knowledge of them underlies your understanding.

I imagine your client already understands that you can test things and draw conclusions from a relatively small sample. If you think they are drawing conclusions too quickly, then explain that particular point. There's no need to go back to Statistics 101 to produce an entire conceptual edifice.

What's the chance that there is a meaningful improvement in the "true" conversion rate (say up to 1.5%) but the observed conversion rate shows a decrease? Show them that number. Or give them a confidence interval for the new conversion rate. If it's really wide, they'll understand that the data hasn't narrowed things down yet.
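
A rough sketch of that first number, using a normal approximation to the difference between two observed rates. The 1,000-visitors-per-arm split and the function name are assumptions for illustration, not figures from the client's test, and the approximation is crude when conversions are this rare.

```python
from math import sqrt
from statistics import NormalDist

def prob_observed_decrease(p_old, p_new, n_per_arm):
    """Chance the new page's observed rate comes out below the old page's,
    even though its true rate is p_new (normal approximation)."""
    # Standard error of the difference between the two observed rates.
    se = sqrt(p_old * (1 - p_old) / n_per_arm + p_new * (1 - p_new) / n_per_arm)
    # P(observed_new - observed_old < 0) when the true difference is p_new - p_old.
    return NormalDist().cdf(-(p_new - p_old) / se)

# Assumed split: 2k visitors so far, 1,000 on each version.
print(round(prob_observed_decrease(0.010, 0.015, 1_000), 2))
# Same true rates, but with 15,000 visitors on each version.
print(round(prob_observed_decrease(0.010, 0.015, 15_000), 4))
```

Under these assumptions the first figure comes out at roughly 16%: about one time in six, a page that really is 50% better would look worse after 2k visitors. With 15,000 visitors per arm the chance is close to zero.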

And be wary of assuming that the client is dumb. The client is running a business, not doing a statistics problem set. If it takes until the heat death of the universe to figure out if something works or not, or if only unreasonably large improvements can be reasonably detected, then the client has to try things out and rely on a combination of noisy data and gut instinct to make decisions. There's nothing wrong with making a decision even if the p-value is bigger than 0.05.

By: dances_with_sneetches

dances_with_sneetches — Wed, 12 Nov 2014 12:04:39 -0800

Rather than the two-group comparison sample size, I like to focus on the simpler single group.

Let's say a new drug kills one out of a thousand test patients. How would you know this? If you looked at one thousand people, maybe by chance it would be the 1001st person who died. Or maybe you would have by chance two people die in the first thousand and then, by chance, none in the second thousand. Here you again have one in one thousand dying (or 2/2000).

How many patients would you have to survey to make sure that 1 in 1000 was the correct number? Let's say you want to be 95% sure (p = 0.05). That's the question of sample size.
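
One narrow version of that question can be sketched in a few lines: how likely is it to see no deaths at all, and how many patients are needed before at least one death is 95% likely? This assumes independent patients and a fixed 1-in-1000 rate. The function names are made up for illustration, and it answers "would I see the event at all", not "is 1 in 1000 the exact rate".

```python
from math import ceil, log

def prob_no_events(rate, n):
    """Chance of seeing zero events among n independent patients."""
    return (1 - rate) ** n

def patients_needed(rate, confidence=0.95):
    """Smallest n with at least `confidence` chance of seeing one or more events."""
    return ceil(log(1 - confidence) / log(1 - rate))

print(round(prob_no_events(0.001, 1_000), 3))  # chance of no deaths in 1,000 patients
print(patients_needed(0.001))                  # patients needed to be 95% sure of seeing one
```

With these inputs, a thousand patients show no deaths more than a third of the time, and it takes roughly three thousand patients to be 95% sure of seeing even one.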

By: Avenger50

Avenger50 — Wed, 12 Nov 2014 12:09:26 -0800

bleep: I don't understand that. Aren't all user experience issues related to "abandonment"? If people aren't signing up?

leopard: I'm not assuming the client is dumb. I'm assuming they don't know the importance of statistical power and sample size. The calculator says 31k visitors per test for their scenario at an MDE of 20%. The smaller I make the MDE, the larger the visitor set is.
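
The relationship between MDE and visitor count can be sketched with the textbook two-sided sample size formula for comparing two proportions. This is an illustration under assumed settings (alpha 0.05, power 0.80), not Optimizely's method, so it will not reproduce the calculator's 31k figure exactly. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Textbook two-sided sample size per arm for comparing two proportions."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)   # rate if the lift equals the MDE
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

for mde in (0.50, 0.20, 0.10):
    print(f"MDE {mde:.0%}: {visitors_per_arm(0.01, mde):,} visitors per arm")
```

Halving the MDE roughly quadruples the required visitors, which is why small expected lifts on a 1% baseline get expensive so quickly.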

By: acidic

acidic — Wed, 12 Nov 2014 12:51:29 -0800

How much money do they lose if the conversion rate the test is currently showing holds true through all 31k visitors you want to test? Plus whatever they'd have to pay you to continue the experiment, if that's an issue? That's what you're fighting against, so argue in those terms. How much money could they earn from a reasonable best-case scenario and what are the chances of that? They're not paying you to make analogies, they're paying you to apply your expertise to the particulars of their business, so do that.

By: Good Brain

Good Brain — Wed, 12 Nov 2014 17:20:39 -0800

Have you considered the possibility that it may be perfectly reasonable for them to want to cut bait on something that doesn't show a statistically significant impact in 1-2K visits, whether or not they fully understand the implications of sample size on predictive power?

They have a sunk cost in the experiment so far, and ongoing costs for continuing it, if only due to the opportunity cost of not being able to use available resources for other experiments that have a better chance of a larger payoff.

Given the current non-result, what are the chances that continuing this experiment will produce evidence of a significant result with a return that will exceed their historical average return on such experiments? Do you have reason to believe that they have reached the point where they are unlikely to find optimizations that produce a significant result in 1-2K trials? Do you have reason to believe that the results observed in past experiments with 1K sample sizes were likely the result of chance?

By: Captain Chesapeake

Captain Chesapeake — Wed, 12 Nov 2014 20:11:14 -0800

Perhaps the Cartoon Guide to Statistics can help.

By: ctmf

ctmf — Wed, 12 Nov 2014 21:41:31 -0800

Sometimes you don't need a number. You just need to see that you're not immediately deluged with hate mail, and that's enough info combined with other considerations to make the decision.

But Khan Academy has some good short, understandable statistics videos you might steal something from.

By: blub

blub — Thu, 13 Nov 2014 00:25:10 -0800

I could be your client and agree with those who advise you to tread carefully. I totally see how the analogy approach could work with some people, but it would be easy to make me feel like you think I'm an idiot.

If you wanted to convince me to keep going, you should show me periods in the past where 2000 visitors also did the thing they do now, with the old design. So, if the problem is that the new design has fewer people sign up for a mailing list, and you tested on a Monday, find me another Monday where few people signed up for the mailing list (do check for holidays or other special events, like a link from a high-profile website resulting in tons of visitors but no signups). Or give me a chart with the variation in mailing list signup conversions for the past year or so on the old design, so that I can see that it is possible that this new number is due to chance. What I'm saying is: keep your explanation relevant to my site.

By: gregglind

gregglind — Thu, 13 Nov 2014 12:41:07 -0800

Something unmentioned by the original poster -- what effect size are they expecting to be powered for?

At 2k (1-tailed, power=.8, alpha=.05, p(occur)=.01), they are powered to detect an ~80% increase. So, even at 2k people, they know pretty well that the new page isn't twice as good. If they are expecting an improvement of 10% (from 1% -> 1.1%, say), then they are underpowered.
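
A back-of-the-envelope check on that kind of figure, assuming 2,000 visitors per arm, a one-tailed test at 5%, and 80% power. The formula uses the baseline variance for both arms, which understates the noise somewhat, so treat the result as a ballpark. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def detectable_relative_lift(baseline, n_per_arm, alpha=0.05, power=0.80):
    """Rough one-tailed minimum detectable lift, as a fraction of baseline."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    # Simplification: use the baseline variance for both arms.
    se = sqrt(2 * baseline * (1 - baseline) / n_per_arm)
    return (z_alpha + z_power) * se / baseline

print(f"{detectable_relative_lift(0.01, 2_000):.0%}")
print(f"{detectable_relative_lift(0.01, 30_000):.0%}")
```

Under these assumptions the detectable lift is a little under 80% at 2,000 visitors per arm and about 20% at 30,000 per arm.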

For rare events like these (1% is rare!), I like to explain it in terms of heart attacks. Because heart attacks are rare, one has to watch a lot of people to make claims about what causes heart attacks. In this case, asking if the page increased conversion by 20% (for example) is like asking if 24/2000 is reliably different from 20/2000. OTOH, 360/30000 vs 300/30000 is reliably different! (That's the whole claim of the sample size calculator.)
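
To make the comparison concrete, here is a minimal pooled two-proportion z-test. It takes a 20% relative lift on a 1% baseline literally: 24 vs 20 conversions per 2,000 visitors, and 360 vs 300 per 30,000. The equal arm sizes and the one-sided test are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(conv_old, conv_new, n_per_arm):
    """Pooled two-proportion z-test: is the new rate higher than the old?"""
    p_old = conv_old / n_per_arm
    p_new = conv_new / n_per_arm
    # Pooled rate under the null hypothesis that both pages convert equally.
    pooled = (conv_old + conv_new) / (2 * n_per_arm)
    se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (p_new - p_old) / se
    return 1 - NormalDist().cdf(z)

print(round(one_sided_p_value(20, 24, 2_000), 2))     # same 20% lift, small sample
print(round(one_sided_p_value(300, 360, 30_000), 3))  # same 20% lift, large sample
```

The same 20% lift gives a p-value of roughly 0.27 at 2,000 visitors per arm (indistinguishable from noise) and below 0.01 at 30,000 per arm.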

As for them "spotting problems in the funnel", that suggests:

a. they have additional information (value per transaction) that should guide this, or
b. they are showing the human cognitive bias of seeing patterns where there aren't any, or over-weighting evidence.