August 7, 2008 9:49 AM

Statistics Filter: Finding the top 10% from thousands of data points. Bonus question: inflection point value.

Statistics Filter: I have thousands of data points (ranging in value from 1 to 0, to 8 decimal places) representing a score. I graphed the data set and it's a classic long-tail distribution--looks like a hockey stick. How do I find the value of the "inflection point" and also the value/position of the top 10%, in other words, which scores are in the top 10%? I've googled my little heart out and think I don't know the right questions to ask. Thanks, wise ones!

In R, put your data points into a vector called `x`, then type `quantile(x,probs=0.9)`.

Programmatically, you can find a point of inflection by taking the absolute value of the rate of change from one point to the next. Then take the minimum value across those values.

posted by Blazecock Pileon at 10:11 AM on August 7, 2008

Even the most basic program (e.g. Excel) will tell you the highest value, which should be your "inflection point," unless I'm misunderstanding. So just sort your column Z>A and take the top value.

Then enter this formula into another column (C2 in the next step) and it will give you the cutoff value for the top 10%: =PERCENTILE([enter range here],0.9)

Enter =IF(A2>=C$2,TRUE) into B2 and fill it down column B. All your values with TRUE are in the top 10%.

posted by desjardins at 10:18 AM on August 7, 2008
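For anyone who would rather script the cutoff-and-flag steps above, here is a minimal Python sketch. It assumes the linear-interpolation percentile convention; the function names `percentile` and `top_fraction` and the sample scores are made up for illustration and are not from the thread.

```python
import math

def percentile(values, p):
    """Percentile by linear interpolation between sorted neighbours, 0 <= p <= 1."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p   # fractional position in the sorted list
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

def top_fraction(values, fraction=0.10):
    """Return (cutoff, scores at or above the cutoff) for the top `fraction`."""
    cutoff = percentile(values, 1.0 - fraction)
    return cutoff, [v for v in values if v >= cutoff]

# Hypothetical scores, only to show the call.
scores = [0.91, 0.02, 0.40, 0.05, 0.77, 0.01, 0.03, 0.12, 0.08, 0.33]
cutoff, top = top_fraction(scores)
print(cutoff, top)
```

The `>=` comparison is what puts the scores at or above the cutoff in the top group; flipping it would select the bottom 90% instead.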

An inflection point of a curve is where the *second* derivative is zero. You can modify BP's suggestion to find it.

posted by Horselover Fat at 12:58 PM on August 7, 2008 [1 favorite]
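A rough sketch of the second-derivative idea in Python, assuming the scores are already sorted and treated as evenly spaced by rank. The function names are hypothetical, and real scores are noisy enough that some smoothing would probably be needed first, which this sketch skips.

```python
def second_differences(ys):
    """Discrete second derivative, assuming unit spacing between points."""
    return [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]

def inflection_indices(ys):
    """Indices into ys where the second difference changes sign."""
    d2 = second_differences(ys)
    found = []
    prev_k = None  # position in d2 of the last non-zero second difference
    for k, value in enumerate(d2):
        if value == 0:
            continue  # zeros are skipped; only a real sign flip counts
        if prev_k is not None and (d2[prev_k] > 0) != (value > 0):
            # Sign flipped somewhere between prev_k and k: report the midpoint,
            # shifted by 1 because d2[k] is centred on ys[k + 1].
            found.append((prev_k + k) // 2 + 1)
        prev_k = k
    return found

# Hypothetical curve with a known inflection at x = 0.
curve = [x ** 3 for x in range(-3, 4)]
print(inflection_indices(curve))
```

If the curve bends in only one direction, the second difference never changes sign and the function returns an empty list.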

This thread is closed to new comments.

posted by desjardins at 9:59 AM on August 7, 2008