What are the odds that I can solve this problem?
June 19, 2007 7:46 AM
Probability/stat question: I'm looking for patterns in a protein sequence, and I've found a few that occur quite frequently. How do I know these are actual patterns and not just an artifact of random amino acid distribution?
I have a sequence of roughly 1000 amino acids, and I've used a sliding window to chop it up into overlapping 6-mers. Some of these 6-mers occur much more frequently than others, and I suspect they have some sort of biological significance. Unfortunately, I don't know how to test whether these are true patterns in the biological sense or whether they could just as easily be the result of random distribution.
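A minimal sketch of the sliding-window count, assuming the sequence is a plain string of one-letter amino acid codes. The function name `count_kmers` and the toy sequence are hypothetical, chosen only for illustration. A sequence of length N yields N - 5 overlapping 6-mers, so a 1000-residue sequence gives 995 windows.

```python
from collections import Counter


def count_kmers(seq: str, k: int = 6) -> Counter:
    """Count overlapping k-mers by sliding a window of width k one residue at a time."""
    seq = seq.upper()
    # A sequence of length N yields N - k + 1 overlapping windows (none if N < k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Hypothetical toy sequence in which "MKTAYIAKQ" occurs twice.
    example = "MKTAYIAKQRQISFVKSHFSRQMKTAYIAKQ"
    counts = count_kmers(example)
    for kmer, n in counts.most_common(3):
        print(kmer, n)
```

`Counter.most_common` then lists the most frequent 6-mers first, which is the ranking the question starts from.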
I've tried comparing the observed frequency of these 6-mers with the frequency expected from the overall amino acid composition, but the chance of getting any given 6-mer at random is so low that almost anything I observe (even 6-mers that show up only once) seems highly significant. I'll be happy to clarify things if this post is a bit messy.
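The expected-versus-observed comparison can be sketched as follows, assuming each position is drawn independently from the sequence's own amino acid composition. This is a simplification: real proteins are not independent draws, and overlapping windows are correlated. The names `expected_count`, `poisson_tail`, and `bonferroni_threshold` are hypothetical. The Poisson tail and the Bonferroni correction are standard tools swapped in for illustration, not something the question specifies. The sketch also shows why almost everything looks significant. With 20 amino acids there are 20**6 = 64,000,000 possible 6-mers but only 995 windows in a 1000-residue sequence, so under uniform composition the expected count of any single 6-mer is about 995 / 64,000,000 ≈ 1.6e-5.

```python
import math
from collections import Counter


def expected_count(kmer: str, seq: str) -> float:
    """Expected occurrences of `kmer` among the overlapping windows of `seq`,
    assuming each residue is drawn independently with the sequence's own
    amino acid frequencies (a simplifying assumption)."""
    n, k = len(seq), len(kmer)
    freqs = Counter(seq)
    p = 1.0
    for aa in kmer:
        p *= freqs[aa] / n  # probability of this residue at one position
    return (n - k + 1) * p  # number of windows times per-window probability


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected).

    Rough approximation only: overlapping windows are not independent."""
    if observed <= 0:
        return 1.0
    cdf = math.fsum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def bonferroni_threshold(alpha: float = 0.05, k: int = 6, alphabet: int = 20) -> float:
    """Per-k-mer cutoff if every possible k-mer counts as a separate test."""
    return alpha / alphabet**k


if __name__ == "__main__":
    # Hypothetical 1000-residue sequence; only its (uniform) composition matters here.
    seq = "ACDEFGHIKLMNPQRSTVWY" * 50
    lam = expected_count("WWWWWW", seq)
    for observed in (1, 2):  # hypothetical observed counts
        p = poisson_tail(observed, lam)
        print(observed, p, p < bonferroni_threshold())
```

Because every possible 6-mer is effectively being tested, a per-6-mer p-value of around 1e-5 is not remarkable by itself. Under this model, a 6-mer seen once falls well short of the Bonferroni cutoff of 0.05 / 64,000,000 ≈ 7.8e-10, whereas a 6-mer seen two or more times can clear it.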
posted by reformedjerk to science & nature (11 comments total)
posted by Wolfdog at 7:56 AM on June 19, 2007