Statistics Filter: Finding the top 10% from thousands of data points. Bonus question: inflection point value.
August 7, 2008 9:49 AM
Statistics Filter: I have thousands of data points (ranging in value from 0 to 1, to 8 decimal places) representing a score. I graphed the data set and it's a classic long-tail distribution--looks like a hockey stick. How do I find the value of the "inflection point" and also the value/position of the top 10%? In other words, which scores are in the top 10%? I've googled my little heart out and think I don't know the right questions to ask. Thanks, wise ones!
Best answer: In R, put your data points into a vector called x, then type quantile(x,probs=0.9).
Programmatically, you can find a point of inflection by taking the absolute value of the rate of change from one point to the next. Then take the minimum value across those values.
posted by Blazecock Pileon at 10:11 AM on August 7, 2008
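For anyone without R handy, here is a rough Python sketch of the same percentile cutoff. The function name top_decile and the sample scores are made up for illustration. It assumes Python 3.8 or later, and that the "inclusive" method of statistics.quantiles interpolates the same way as R's default quantile(), which I believe is the case.

```python
import statistics


def top_decile(scores):
    """Return (cutoff, top_scores), where cutoff is the 90th percentile."""
    # n=10 splits the data into deciles; the last cut point is the 90th percentile.
    # method="inclusive" interpolates linearly between the sorted data points.
    cutoff = statistics.quantiles(scores, n=10, method="inclusive")[-1]
    # Everything at or above the cutoff is in the top 10%.
    top = [s for s in scores if s >= cutoff]
    return cutoff, top


# Made-up data: 100 evenly spaced scores from 0.01 to 1.00.
scores = [i / 100 for i in range(1, 101)]
cutoff, top = top_decile(scores)
print(cutoff, len(top))
```

With real data, pass in your own list of scores. Ties at the cutoff are all kept, so the result can hold slightly more than 10% of the points.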
Even the most basic program (e.g. Excel) will tell you the highest value, which should be your "inflection point," unless I'm misunderstanding. So just sort your column Z>A and take the top value.
Then enter this formula into another cell (C2 in the next step) and it will give you the cutoff value for the top 10%: =PERCENTILE([enter range here],0.9)
Enter =IF(A2>=C$2,TRUE,FALSE) into B2 and fill it down column B. All your values with TRUE are in the top 10%.
posted by desjardins at 10:18 AM on August 7, 2008
An inflection point of a curve is where the second derivative changes sign, so look for where it crosses zero. You can modify BP's suggestion to find it.
posted by Horselover Fat at 12:58 PM on August 7, 2008 [1 favorite]
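A minimal sketch of that modification, in Python, assuming the values are evenly spaced (for example, scores sorted and plotted by rank). The helper names second_differences and inflection_indices are hypothetical, and the cubic test data is made up. Second differences stand in for the second derivative, and a sign change between neighbouring values marks a candidate inflection point.

```python
def second_differences(ys):
    """Discrete second derivative of evenly spaced values."""
    return [ys[i - 1] - 2 * ys[i] + ys[i + 1] for i in range(1, len(ys) - 1)]


def inflection_indices(ys):
    """Indices into ys just after each sign change of the second difference."""
    d2 = second_differences(ys)
    hits = []
    for k in range(1, len(d2)):
        # d2[k] is centred on ys[k + 1], so a flip between d2[k - 1] and d2[k]
        # lies between ys[k] and ys[k + 1]. Exact zeros are not flagged here.
        if d2[k - 1] * d2[k] < 0:
            hits.append(k + 1)
    return hits


# Made-up example: y = x**3 has its inflection point at x = 0.
xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys = [x ** 3 for x in xs]
print(second_differences(ys))
print([xs[i] for i in inflection_indices(ys)])
```

Real scores are noisy, so second differences tend to flip sign all over the place unless the data is smoothed first. A hockey-stick curve that bends the same way throughout has no sign change at all, in which case this returns an empty list.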
This thread is closed to new comments.
posted by desjardins at 9:59 AM on August 7, 2008