August 28, 2006 7:14 AM

FunctionFilter: I need a function that allows me to skew a distribution from a uniform distribution to an extremely biased distribution for a domain and range of (0,1).

I think I'm looking for a CDF, but I can work with the details. The problem is that both the domain and range have to be between 0 and 1. This rules out a natural choice of a simple exponential. The inverse CDF should have a single inflection point which will always be on the line y=x. I would like to be able to tune the function with a single parameter between a uniform distribution (CDF: y=-1+1) and something like an exponential function y=n* e^(-nx) except with the asymptotes approaching the axes at 0 and 1.

In other words, f(0) approaches 1 and f(1) approaches 0. I need a function that allows me to tune how steeply they approach, and the function must be symmetric around y=x. I feel like such a function must exist, but I can't find it in any of the standard stats references. It's for a simulation, so I can fudge behavior at 0 and 1 if need be. Any ideas?
posted by allan to Science & Nature (6 answers total)

1. You don't give enough information to determine a unique single-parameter family that will fit your criteria. So there are various possibilities here.

2. If you want your cdf to be smooth at the endpoints (i.e., so that you can extend smoothly by constant 1 on the right and constant 0 on the left) you'll almost certainly need to use special functions or leave your cdf expressed as an integral and work with a tractable pdf.

3. If you don't mind special functions, then an excellent candidate for your family is based on the incomplete beta function: Use a parameter r>=0 to control the shape, and simply let

F(x)=Β(x,1+r,1+r)

It is easier to work with the pdf (which is the easier place to start with this problem, too), and that would be

f(x)=x^{r}(1-x)^{r}

Realize that the incomplete beta function is just, by definition, the integral of that guy. This picture will always be symmetric about x=1/2 and of course that's where your inflection point (corresponding to the unique mode of the distribution) will be.

4. If you need a cdf in terms of elementary functions then we can do that but it might be hard to get the cdf truly smooth at 0 and 1.

posted by Wolfdog at 8:08 AM on August 28, 2006

Oops, sorry, I left out the scaling factor you need in those. The cdf should be

F(x) = ( Γ(2r+2) / Γ(1+r)^{2} ) * Β(x,1+r,1+r)

which has pdf

f(x)=( Γ(2r+2) / Γ(1+r)^{2} ) x^{r}(1-x)^{r}

If you use integer values of r then the gamma expressions can be replaced with factorials:

( Γ(2r+2) / Γ(1+r)^{2} ) = (2r+1)!/(r!)^{2}

but that will only allow you to tune the shape of your distribution in discrete steps.
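
A minimal Python sketch of the normalized pdf and cdf above, for anyone who wants to try the family numerically. The function names (beta_pdf, beta_cdf, integer_norm) are made up for illustration, and the cdf is obtained here by Simpson's-rule integration of the pdf rather than by calling a special-function library, which is an assumption of this sketch and not something the formulas require.

```python
import math

def beta_pdf(x, r):
    """pdf f(x) = Gamma(2r+2)/Gamma(1+r)^2 * x^r * (1-x)^r on [0, 1], r >= 0."""
    norm = math.gamma(2 * r + 2) / math.gamma(1 + r) ** 2
    return norm * x ** r * (1 - x) ** r

def beta_cdf(x, r, steps=2000):
    """cdf F(x): integrate the pdf from 0 to x with composite Simpson's rule.

    steps must be even. Values outside [0, 1] are clamped to 0 or 1.
    """
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    h = x / steps
    total = beta_pdf(0.0, r) + beta_pdf(x, r)
    for i in range(1, steps):
        # Simpson weights: 4 on odd interior nodes, 2 on even ones.
        total += (4 if i % 2 else 2) * beta_pdf(i * h, r)
    return total * h / 3

def integer_norm(r):
    """Factorial form of the scaling factor, valid for integer r only."""
    return math.factorial(2 * r + 1) // math.factorial(r) ** 2
```

With r=0 this reduces to the uniform cdf F(x)=x, and for any r the symmetry about x=1/2 gives F(1/2)=1/2. Non-integer r works too, so the shape can be tuned continuously if you use the gamma form.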

posted by Wolfdog at 8:15 AM on August 28, 2006

Stupid deceptive previewer.

Anyway, here's what that would look like for different values of r: cdfs and pdfs. Note that r=0 gives you uniform, and central tendency increases with r, with the pdf converging to unit mass at 1/2 and the cdf converging to a step function as r gets large.
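
One way to see the increasing central tendency without the plots is to watch the variance shrink as r grows. The sketch below is only an illustration: it draws samples with Python's standard random.betavariate using both shape parameters equal to 1+r, and compares the sample variance with the standard closed form for a symmetric beta distribution, 1/(4(2r+3)). The helper names are hypothetical.

```python
import random

def exact_variance(r):
    """Variance of the symmetric beta distribution with both shapes 1+r."""
    return 1 / (4 * (2 * r + 3))

def sample_variance(r, n=20000, seed=0):
    """Unbiased sample variance of n draws from the same distribution."""
    rng = random.Random(seed)
    xs = [rng.betavariate(1 + r, 1 + r) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

for r in (0, 2, 10, 50):
    print(r, exact_variance(r), sample_variance(r))
```

At r=0 the exact variance is 1/12, the uniform value, and it falls toward 0 as r gets large, which matches the pdf piling up at 1/2.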

Is that like what you're looking for? It would be unusual to have a cdf (or inverse cdf) that goes to 1 on the left and 0 on the right, which is what you described.

posted by Wolfdog at 8:37 AM on August 28, 2006

Wolfdog -

Thanks for investing so much time in this!

I looked briefly at the beta distribution, but what I'm looking for is a function where the first and second derivative are the same sign inside the range. I was looking for an inverse cdf to place it against the origin.

More about the problem: I want to assign a set of objects some property drawn from [0,1] based on some distribution. I would like to be able to tune that distribution: one extreme will be a uniform distribution, and at the other, most will be drawn from one end, but with a positive (decreasing) probability of drawing from the higher values. Obviously, which is the high or low value doesn't matter for my purposes.

What I don't want is any of the large family of functions with an s-curve CDF. If you have any insights into this, I would greatly appreciate it!

posted by allan at 10:17 AM on August 28, 2006

I'm really having a hard time understanding what you're asking for. Some points of confusion:

1. A cdf is always increasing. I do not understand why you think the cdf should be allowed to be large at the left and small on the right. Is there any possibility you are confused about which is the cdf and which is the pdf? For instance, f(x)=re^{-rx} which you mentioned, is a pdf on the right half-line.

2. "CDF: y=-1+1" I'm assuming you meant y=-x+1, but even that doesn't make much sense as a cdf (because it's decreasing) or a pdf (because the amount of area over the unit interval is wrong).

3. You want the first and second derivative of (this function) to be the same sign all through the interval? For a cdf, then, the first derivative would have to be positive. Then the second derivative being positive all the time would be incompatible with your request for an inflection point.

4. I don't understand your symmetry condition if you want the distribution to be biased toward one side.

5. "an exponential function y=n* e^(-nx) except with the asymptotes approaching the axes at 0 and 1." An exponential function like that only has one asymptote. But you're asking about functions on a closed interval, so the idea of asymptotes doesn't really make sense in this setting anyway.

Taking my best shot at interpreting your verbal description, you want a family that goes from uniform at one extreme to highly biased to ONE END (you'd prefer the low end?) at the other extreme. Again, there is more than one way to find a 1-parameter family that satisfies this, but here are two reasonable ones:

1. Using power functions: pdf f(x)=(r+1)(1-x)^{r} with cdf F(x)=1-(1-x)^{r+1}. This gives uniform distribution for r=0 and increasing bias to the left side as r gets large.

2. Using exponential functions: pdf f(x)= re^{-rx} / (1-e^{-r}) with cdf F(x)=(e^{r}(1-e^{-rx})) / (e^{r}-1). It approaches uniform distribution as r->0 (though it's not defined as written for r=0) and gives increasing bias to the left as r gets large.

These do not have symmetric cdfs (that's incompatible with bias toward one end) and they do not have inflection points so it still seems like I'm missing what you're asking for.
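
Since this is for a simulation, both families can be sampled by inverting the cdf and feeding it a uniform random number. The following is a sketch under that assumption, with hypothetical function names. For the power family, solving u = 1-(1-x)^{r+1} gives x = 1-(1-u)^{1/(r+1)}. For the exponential family, integrating the pdf re^{-rx}/(1-e^{-r}) from 0 to x gives a cdf of (1-e^{-rx})/(1-e^{-r}), and solving for x gives x = -ln(1-u(1-e^{-r}))/r. Treating r=0 as plain uniform in the exponential case is a choice made here to cover the limit.

```python
import math
import random

def sample_power(r, u):
    """Inverse cdf of F(x) = 1 - (1-x)^(r+1) for u in [0, 1], r >= 0."""
    return 1 - (1 - u) ** (1 / (r + 1))

def sample_exponential(r, u):
    """Inverse cdf of F(x) = (1 - e^(-rx)) / (1 - e^(-r)) for u in [0, 1].

    r == 0 is handled as the uniform limit (an assumption of this sketch).
    """
    if r == 0:
        return u
    one_minus_exp = -math.expm1(-r)            # 1 - e^(-r), computed stably
    return -math.log1p(-u * one_minus_exp) / r

# Example: assign a property in [0, 1] to five objects, biased toward 0.
rng = random.Random(42)
values = [sample_power(4, rng.random()) for _ in range(5)]
```

Larger r pushes more of the draws toward 0 in both families, while every value in [0, 1] keeps a positive (decreasing) density. To bias toward the high end instead, use 1 minus the returned value.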

posted by Wolfdog at 12:33 PM on August 28, 2006

This thread is closed to new comments.

y = (1 / (x+a)^{n}) - a ?

n would be your skew parameter, and a would offset the function along the y=x line to make it intersect the axes at (1,0) and (0,1). i'm thinking you could solve for an analytic form for a(n) subject to those conditions. (says the guy who's too lazy to try!)

posted by sergeant sandwich at 7:38 AM on August 28, 2006