# ANOVA Modelling

July 1, 2004 4:36 PM

What's a good rule of thumb for the maximum number of 'varieties' and 'factors' you can have in an ANOVA model?

That is, where does the computation get intractable? How much data is needed for a given layout? Stuff like this. I realize the question is somewhat ill-posed; I'm just curious whether anyone has practical experience from real-world applications they'd like to share.

A common rule of thumb is that you need 10 data points for each predictor in a model. You can get away with fewer, particularly if your effect size is huge. But more data points are always better.

posted by nixxon at 5:46 PM on July 1, 2004
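To make the 10-per-predictor rule concrete for a factorial layout, here is a minimal Python sketch. It assumes that "predictor" means one estimated parameter (one dummy-coded column, intercept excluded), which is one common reading of the rule and not something the comment above spells out. The function names `parameter_count` and `rough_sample_size` are made up for illustration.

```python
from math import prod


def parameter_count(levels, interactions=True):
    """Non-intercept parameters in a factorial ANOVA model.

    levels: number of levels of each factor, e.g. [3, 4] for a 3x4 layout.
    """
    if interactions:
        # A full factorial model estimates one mean per cell;
        # subtract one for the intercept.
        return prod(levels) - 1
    # Main effects only: each factor adds (levels - 1) dummy columns.
    return sum(n - 1 for n in levels)


def rough_sample_size(levels, per_parameter=10, interactions=True):
    """Rule-of-thumb sample size: per_parameter observations per parameter."""
    return per_parameter * parameter_count(levels, interactions)


print(rough_sample_size([3, 4]))                      # full 3x4 factorial
print(rough_sample_size([3, 4], interactions=False))  # main effects only
```

The point of the sketch is how fast the full factorial count grows: it scales with the product of the level counts, while the main-effects count scales with their sum.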

I think the question was actually how many variables you can have. The answer is: not too many. I think anything above 3 starts to get unruly. Think of all the interactions you would need to interpret.

I have two factors which leads to only one interaction factor (between the two factors) and I don't really know what it means. I'm faking it.

posted by ajpresto at 7:26 AM on July 2, 2004
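The growth in interactions mentioned above can be counted directly. With k factors there are C(k, m) interactions of order m, so 2^k - k - 1 interaction terms in total. A small Python sketch follows; the factor labels "A" through "D" are placeholders, not anything from the thread.

```python
from itertools import combinations


def interaction_terms(factors):
    """Map each interaction order (2, 3, ...) to its list of terms."""
    return {
        order: [":".join(combo) for combo in combinations(factors, order)]
        for order in range(2, len(factors) + 1)
    }


def count_interactions(k):
    """Total interaction terms among k factors: all subsets of size >= 2."""
    return 2 ** k - k - 1


for k in range(2, 6):
    print(k, "factors ->", count_interactions(k), "interaction terms")

print(interaction_terms(["A", "B", "C", "D"]))
```

Two factors give the single interaction described above; four factors already give eleven, including one 4-way term.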

With large numbers of variables, you're often better off looking at different methods of slicing your data, usually some sort of aggregate analysis: which set of measurements is most like another, and such. Principal component analysis and/or hierarchical clustering become more interesting than individual variables. Even with four or five variables, PCA or HC can be useful. Doesn't really help directly with the question, I know, but still might be fruitful to pursue.

posted by bonehead at 8:05 AM on July 2, 2004

I guess my initial response was unclear. What I was trying to say is that the number of variables (or factors, or whatever you want to call them) you can include depends on the number of data points you have. From a purely mathematical perspective, you can have lots of variables if you have lots of data.

But ajpresto is right -- interpreting the results of an ANOVA is a bitch if you have lots of factors. I've driven myself to the brink of madness trying to interpret 4-way interactions. For models with lots of independent variables (predictors), a linear regression would be easier to interpret -- and it does essentially the same thing as ANOVA.

posted by nixxon at 8:48 AM on July 2, 2004
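The claim that regression "does essentially the same thing as ANOVA" can be checked numerically for the one-way case. The pure-Python sketch below computes the F statistic twice: once from the classic between/within sums of squares, and once from an ordinary least-squares fit on an intercept plus dummy-coded group indicators. The data are made up for illustration, and the helper names are hypothetical.

```python
def one_way_anova_f(groups):
    """Classic one-way ANOVA F from between/within sums of squares."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_f(groups):
    """F statistic of an OLS fit on an intercept plus dummy-coded groups."""
    k = len(groups)
    # Design row: intercept, then one 0/1 column per group except the first.
    X = [[1.0] + [1.0 if j == i else 0.0 for j in range(1, k)]
         for i, g in enumerate(groups) for _ in g]
    y = [v for g in groups for v in g]
    n = len(y)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    ybar = sum(y) / n
    ss_model = sum((f - ybar) ** 2 for f in fitted)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return (ss_model / (k - 1)) / (ss_resid / (n - k))


# Made-up data: three groups of three observations each.
groups = [[4.1, 5.0, 4.6], [6.2, 5.8, 6.9], [5.1, 4.9, 5.6]]
print(one_way_anova_f(groups), regression_f(groups))
```

The two numbers agree up to floating-point error because the fitted values of the dummy-coded regression are exactly the group means. In practice you would use a statistics package instead of hand-rolled normal equations; the sketch only shows the equivalence.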

That was very helpful, thanks!

I've used PCA etc., but it's not quite what I need here. The (M)ANOVA stuff I've done has been with microarrays, so (if I keep the terminology right, it's been a while) I had a HUGE number of "varieties" (genes/spots) but only a few "factors": "Gene", "Dye", "Chip", "Sample".

posted by freebird at 12:01 PM on July 2, 2004

This thread is closed to new comments.

/had to be done

posted by Voivod at 5:23 PM on July 1, 2004