Help me add and multiply random variables
January 13, 2011 3:50 PM
I need some stats help choosing distributions and multiplying and adding random variables. I'm rusty and short on time.
A paper I'm working on would benefit from a really rough statistical treatment. I would be able to read and figure this out on my own, but I am very tight for time and there is no-one around to ask for help. I will be working through this tonight but a bit of guidance will really help me understand what I'm doing.
I have something like this:
F = A + B * C
Where A, B, and C are all empirically measured properties with some unknown distributions. They're all independent. I've spent a lot of effort in this paper trying to better understand A B and C and I have enough information to roughly model them with probability distributions. What I'd like to do is propagate those and show what F looks like.
A is uniformly distributed on some interval I've figured out. Both B and C have something closer to a normal distribution. I have a rough guess at the mean and stdev that is good enough for my purposes.
My questions:
1. Both B and C should be greater than zero. I'm not sure if a normal distribution is appropriate here. If it helps, B is centered at 4.5 and has a stdev of about 2, while C is centered at 250 and has a stdev of about 100 (but skews high slightly). A normal distribution would go into the negatives and that's incorrect. If not for that, it's a good enough approximation to the data. What should I use? lognormal? What are the appropriate options here?
2. What happens when I multiply two independent normal (or lognormal, or whatever) distributions? And what happens when I add the result of that to a uniform distribution?
Thanks
My stats is rather rusty, but here's a first attempt:
The product of your Gaussians is again Gaussian (for quick reference on their means and stddevs: https://ccrma.stanford.edu/~jos/sasp/Product_Two_Gaussian_PDFs.html). Since your means are both > 2 stddevs above 0, I would guess you're okay here.
The sum of a normal and uniform variable, however, sounds harder. If you have a mathematical package (Mathematica, Matlab, Maple, ...?) you can calculate it as the convolution of the two functions. I'll think a bit more on it...
posted by Talisman at 6:30 PM on January 13, 2011
Actually I take that back: the convolution doesn't seem so hard to work out, but the limits are a little tricky. I have to head out right now, but will try to check in later.
posted by Talisman at 6:36 PM on January 13, 2011
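For the normal-plus-uniform convolution Talisman mentions, the tricky limits resolve neatly if you integrate the normal density over the window the uniform allows. What follows is a minimal sketch, assuming U ~ Uniform(lo, hi) and N ~ Normal(mu, sigma) are independent. The function name and example numbers are hypothetical. It only covers B * C if you accept a normal approximation for the product, which later comments in this thread question.

```python
from statistics import NormalDist


def uniform_plus_normal_pdf(x, lo, hi, mu, sigma):
    """Density of X = U + N, with U ~ Uniform(lo, hi) and N ~ Normal(mu, sigma) independent.

    Convolution: f(x) = (1 / (hi - lo)) * integral over u in [lo, hi] of phi(x - u) du,
    and substituting t = x - u turns the integral into P(x - hi <= N <= x - lo).
    """
    n = NormalDist(mu, sigma)
    return (n.cdf(x - lo) - n.cdf(x - hi)) / (hi - lo)


if __name__ == "__main__":
    # Hypothetical numbers: U on [0, 10], N with mean 5 and sd 1.
    lo, hi, mu, sigma = 0.0, 10.0, 5.0, 1.0
    # Crude Riemann-sum check that the density integrates to about 1.
    step = 0.01
    xs = [-10.0 + i * step for i in range(4001)]
    print(sum(uniform_plus_normal_pdf(x, lo, hi, mu, sigma) for x in xs) * step)
```

The normal CDF does the integration, so there are no piecewise cases to track: the density is flat near 1/(hi - lo) in the middle and falls off smoothly at both ends.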
If the point is to make a plot to show what you think the marginal distribution of F should look like, just simulate it. A uniform plus a Gaussian isn't going to be any named special distribution, and unless you're doing mathematical statistics on it (it sounds like you aren't), there's no benefit to working out the pdf. If your estimates of the marginal distributions of A, B, C are based on samples, then do 10000 draws with replacement to get a simulated F.
posted by a robot made out of meat at 7:12 PM on January 13, 2011 [1 favorite]
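The resampling approach a robot made out of meat describes takes only a few lines of Python. This is a sketch, assuming you have raw measurements for A, B and C in plain lists. The data in the example is made-up placeholder data, not from the question, and the function name is hypothetical. Resampling each variable separately is only justified because the question states that A, B and C are independent.

```python
import random


def bootstrap_f(a_obs, b_obs, c_obs, n_draws=10_000, seed=None):
    """Simulate F = A + B * C by resampling each set of measurements with replacement.

    Each variable is resampled on its own, which assumes A, B and C are independent.
    """
    rng = random.Random(seed)
    a = rng.choices(a_obs, k=n_draws)
    b = rng.choices(b_obs, k=n_draws)
    c = rng.choices(c_obs, k=n_draws)
    return [ai + bi * ci for ai, bi, ci in zip(a, b, c)]


if __name__ == "__main__":
    # Hypothetical placeholder measurements; substitute your real samples.
    a_obs = [1.2, 3.4, 7.9, 5.5, 9.1]
    b_obs = [2.8, 4.1, 4.6, 6.3, 5.0, 3.9]
    c_obs = [140.0, 210.0, 260.0, 300.0, 390.0]
    f = bootstrap_f(a_obs, b_obs, c_obs, seed=1)
    print(len(f), min(f), max(f))
```

With only a handful of measurements, the simulated F can only take combinations of observed values. This approach works best with reasonably large samples; the smooth kernel density option hAndrew mentions later in the thread is one way around that.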
If you have a distribution that looks like a Gaussian but physically cannot be negative, odds are it is a gamma distribution. The Gaussian distribution approximates the gamma distribution when the mean is many stdevs away from zero. It is common to use the Gaussian approximation to the gamma since Gaussians are much easier to manipulate algebraically. This is what Talisman was proposing.
But if you don't need the algebraic formula, for instance if you are going to simulate from modeled distributions, then you might as well use real gammas.
posted by gmarceau at 8:12 PM on January 13, 2011
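If you follow gmarceau's suggestion, the gamma parameters can be matched to the quoted means and standard deviations by the method of moments: shape k = mean^2 / sd^2 and scale theta = sd^2 / mean. Below is a minimal sketch using Python's standard library. A's range of [0, 10] is a placeholder, as in hAndrew's Mathematica example, and the function names are hypothetical.

```python
import random
import statistics


def gamma_params(mean, sd):
    """Method-of-moments gamma fit: shape k = mean^2 / sd^2, scale theta = sd^2 / mean."""
    return mean ** 2 / sd ** 2, sd ** 2 / mean


def simulate_f_gamma(a_lo, a_hi, b_mean, b_sd, c_mean, c_sd, n=100_000, seed=None):
    """Draw n samples of F = A + B * C with A uniform and B, C gamma (always positive)."""
    rng = random.Random(seed)
    kb, tb = gamma_params(b_mean, b_sd)
    kc, tc = gamma_params(c_mean, c_sd)
    # random.gammavariate(alpha, beta) takes shape alpha and scale beta.
    return [
        rng.uniform(a_lo, a_hi) + rng.gammavariate(kb, tb) * rng.gammavariate(kc, tc)
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(gamma_params(4.5, 2.0))      # shape and scale for B
    print(gamma_params(250.0, 100.0))  # shape and scale for C
    f = simulate_f_gamma(0.0, 10.0, 4.5, 2.0, 250.0, 100.0, seed=0)
    print(statistics.fmean(f), statistics.stdev(f))
```

For C this gives shape 6.25 and scale 40. A gamma with those parameters is somewhat right-skewed, which may also suit the question's remark that C "skews high slightly".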
2. What happens when I multiply two independent normal (or lognormal, or whatever) distributions? And what happens when I add the result of that to a uniform distribution?
I agree with a robot made out of meat... Your options are to just simulate F by writing a short script that samples from A,B,C, or to calculate the probability density function of F by doing some pencil and paper math with the probability density functions of A, B, and C. Unless you need an exact analytical solution, probably the simulation route is the way to go. If you must take the pencil and paper route, you should be able to find some good examples of the appropriate techniques by googling; manipulating PDFs 'by hand' is pretty central in grad-level stats courses.
Also, note that for both B and C less than 2% of the values will be below zero (based on the mean and sd you give), so, depending on your application, maybe you could get away with a normal approximation, which would make an analytic/algebraic solution much easier. It probably doesn't matter if you're going for the simulation solution.
posted by JumpW at 11:18 PM on January 13, 2011
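JumpW's "less than 2%" figure can be checked from the normal CDF. Here is a quick sketch using Python's statistics.NormalDist with the means and standard deviations quoted in the question:

```python
from statistics import NormalDist

# Means and standard deviations quoted in the question.
params = {"B": (4.5, 2.0), "C": (250.0, 100.0)}

# Probability mass a normal model puts below zero for each variable.
p_negative = {name: NormalDist(mu, sd).cdf(0.0) for name, (mu, sd) in params.items()}

for name, p in p_negative.items():
    print(f"{name}: P(X < 0) = {p:.4f}")
# B: P(X < 0) = 0.0122
# C: P(X < 0) = 0.0062
```

Zero sits 2.25 standard deviations below the mean for B and 2.5 for C, which gives roughly 1.2% and 0.6%.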
In Mathematica you would represent the distribution of F = A + B * C using TransformedDistribution:
    dist = TransformedDistribution[a + b c,
      {a \[Distributed] UniformDistribution[{0, 10}],
       b \[Distributed] NormalDistribution[4.5, 2],
       c \[Distributed] NormalDistribution[250, 100]}]
(I just made up the range {0, 10} in UniformDistribution[{0, 10}], but you could substitute your actual range.)
PDF[dist, x] tries to find a closed form for the probability density, but I don't think one exists for this case.
Nevertheless, without the PDF you can still automatically get a lot from dist. PDF histogram of random samples:
    Histogram[RandomVariate[dist, 10^5], 50, "PDF"]
Properties of the distribution:
    {Mean[dist], StandardDeviation[dist], Skewness[dist]}
Here's a screen capture of the results of those.
"The product of your Gaussians is again Gaussian"
Unfortunately not true. The product of two Gaussian functions is a Gaussian function, but the product of two Gaussian-distributed variables is not Gaussian-distributed. See the same screen capture above.
posted by hAndrew at 4:24 PM on January 14, 2011 [1 favorite]
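hAndrew's point can also be seen from the moments alone. For independent B and C, E[(BC)^k] = E[B^k] E[C^k]. Adding an independent A then adds its variance and its third central moment, which is zero for a uniform. A Gaussian has zero skewness, so a nonzero result shows the product is not Gaussian. Below is a sketch in Python, assuming normal B and C with the question's numbers and A on the placeholder range [0, 10]. The function names are hypothetical.

```python
import math


def normal_raw_moments(mu, sd):
    """E[X], E[X^2], E[X^3] for X ~ Normal(mu, sd)."""
    return mu, mu ** 2 + sd ** 2, mu ** 3 + 3 * mu * sd ** 2


def product_summary(b_moments, c_moments):
    """Mean, sd and skewness of B * C for independent B and C.

    Independence gives E[(BC)^k] = E[B^k] * E[C^k] for each k.
    """
    m1, m2, m3 = (bk * ck for bk, ck in zip(b_moments, c_moments))
    var = m2 - m1 ** 2
    third = m3 - 3 * m1 * m2 + 2 * m1 ** 3  # third central moment
    return m1, math.sqrt(var), third / var ** 1.5


def add_uniform(summary, lo, hi):
    """Add an independent Uniform(lo, hi): its variance adds, its third central moment is 0."""
    mean, sd, skew = summary
    third = skew * sd ** 3
    var = sd ** 2 + (hi - lo) ** 2 / 12
    return mean + (lo + hi) / 2, math.sqrt(var), third / var ** 1.5


if __name__ == "__main__":
    bc = product_summary(normal_raw_moments(4.5, 2.0), normal_raw_moments(250.0, 100.0))
    print("B*C:", bc)
    print("F:  ", add_uniform(bc, 0.0, 10.0))
```

For B * C this gives a mean of 1125, a standard deviation of about 702 and a skewness of about 0.78. The product is noticeably right-skewed even though both inputs are symmetric. Adding A barely changes any of this, because A's spread is tiny next to B * C's.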
As for distributions that fit your data better than a normal distribution, including the >= 0 constraint, some options:
* You could truncate an existing distribution. In Mathematica it would be TruncatedDistribution.
* If you have plenty of sample data, you could use an empirical distribution and work directly from that without ever choosing any particular closed form. In Mathematica it would be one of these distributions. Without knowing the details of your situation, I would first suggest a smooth kernel density distribution (SmoothKernelDistribution).
posted by hAndrew at 4:30 PM on January 14, 2011
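Outside Mathematica, the truncation option hAndrew lists can be done by plain rejection: draw from the normal and throw away values at or below zero. With the question's numbers, only about 1% of B draws and 0.6% of C draws are rejected, so this is cheap. Below is a minimal sketch; the function name is hypothetical. Truncation nudges the mean upward slightly (for B, from 4.5 to roughly 4.56), so the truncated distribution no longer has exactly the mean you measured.

```python
import random
import statistics


def truncated_normal(mu, sd, lower=0.0, rng=random):
    """Draw one value from Normal(mu, sd) conditioned on being greater than `lower`.

    Rejection sampling: resample until the draw lands above the cutoff.
    """
    while True:
        x = rng.gauss(mu, sd)
        if x > lower:
            return x


if __name__ == "__main__":
    rng = random.Random(0)
    b = [truncated_normal(4.5, 2.0, rng=rng) for _ in range(10_000)]
    c = [truncated_normal(250.0, 100.0, rng=rng) for _ in range(10_000)]
    print(min(b), statistics.fmean(b))
    print(min(c), statistics.fmean(c))
```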
Response by poster: I thought perhaps there would be a concise way to describe the result, but it seems pretty clear my best option is to simulate the data and show the result graphically. Thanks for your help.
posted by PercussivePaul at 7:29 PM on January 14, 2011
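Because simulation was the route chosen, swapping in other positive-only input distributions is a small change. Here is a sketch with lognormal B and C, the option raised in question 1. The parameters of the underlying normals are matched to the quoted means and standard deviations, and F is summarized by a few quantiles rather than a plot. A's range [0, 10] is a placeholder and the function names are hypothetical.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Return (mu, sigma) of the underlying normal so the lognormal has this mean and sd."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def simulate_f_lognormal(a_lo, a_hi, b_mean, b_sd, c_mean, c_sd, n=100_000, seed=None):
    """Draw n samples of F = A + B * C with A uniform and B, C lognormal."""
    rng = random.Random(seed)
    mb, sb = lognormal_params(b_mean, b_sd)
    mc, sc = lognormal_params(c_mean, c_sd)
    # random.lognormvariate(mu, sigma) takes the parameters of the underlying normal.
    return [
        rng.uniform(a_lo, a_hi) + rng.lognormvariate(mb, sb) * rng.lognormvariate(mc, sc)
        for _ in range(n)
    ]


if __name__ == "__main__":
    f = simulate_f_lognormal(0.0, 10.0, 4.5, 2.0, 250.0, 100.0, seed=0)
    cuts = statistics.quantiles(f, n=20)  # 5%, 10%, ..., 95% points
    print(f"mean {statistics.fmean(f):.0f}, median {statistics.median(f):.0f}")
    print(f"5th pct {cuts[0]:.0f}, 95th pct {cuts[-1]:.0f}")
```

The variance of B * C depends only on the means and variances of B and C. Lognormal and gamma inputs matched this way therefore give F the same mean and standard deviation. They differ in shape, mainly in the upper tail, and the quantiles make that visible.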
posted by timsteil at 4:35 PM on January 13, 2011