What is the expected value function of the Gumbel distribution?
January 30, 2013 6:22 AM
I believe that I need to use the expected value function of the Gumbel distribution to analyze some data. However, I can't find a guide to doing this that is not written for mathematicians, beyond answers like a(n) + gamma*b(n), which isn't exactly helpful when a(n) and b(n) are not clearly defined. Or do I even need to calculate this?
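For the case in this question, where M_n is the maximum of n independent draws from a normal distribution with mean μ and standard deviation σ, the classical textbook normalizing constants are given below. They are written with a(n) as the Gumbel location and b(n) as the Gumbel scale so that they plug into the a(n) + gamma*b(n) formula, where γ ≈ 0.5772 is the Euler-Mascheroni constant. This is an asymptotic result, not an exact expression: maxima of normal data converge to the Gumbel limit slowly, so for small n (around 10) it is only a rough approximation.

$$
\frac{M_n - a(n)}{b(n)} \xrightarrow{d} \mathrm{Gumbel}(0, 1),
\qquad
a(n) = \mu + \sigma\left(\sqrt{2\ln n} - \frac{\ln\ln n + \ln 4\pi}{2\sqrt{2\ln n}}\right),
\qquad
b(n) = \frac{\sigma}{\sqrt{2\ln n}}
$$

$$
E[M_n] \approx a(n) + \gamma\, b(n)
$$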
Here is my situation. I am good at mathematics, but I am not a mathematician. This means that I only half know what I am talking about.
I have 20 different populations of cells. For each population, I have surveyed 3 different numbers of cells (think 10 cells, 1000 cells, and 100000 cells) and recorded their maximum score for a trait. I only have this maximum score for the three different numbers of cells. Nothing else is known about these populations.
I want to see how these 20 populations differ from each other. My plan was to run a linear regression relating the number of cells to these maximum values for each population. I could then compare the slopes of the regressions to see how the populations differ.
However, this relationship isn't expected to be linear, so using a linear regression seems... unwise. As I am working with the maximum value of the trait, I believe that I need to use extreme value theory. I am working under the assumption that the trait is normally distributed, which points me towards the Gumbel distribution for modeling the maxima.
Therefore, by plotting the number of cells and the maximum trait values, I am really looking at the number of cells and the expected value of the Gumbel distribution.
What is the equation that links these two things?
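A minimal Python sketch of that link, assuming NumPy and SciPy are available. `expected_max_gumbel` uses the classical Gumbel constants for the maximum of n normal values (location a(n), scale b(n), Euler-Mascheroni constant γ), and `expected_max_exact` numerically integrates the exact expectation of the maximum so the two can be compared. The function names are illustrative, not from any library, and the Gumbel version needs n ≥ 2.

```python
import math

import numpy as np
from scipy import integrate
from scipy.stats import norm

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_constants(n, mu=0.0, sigma=1.0):
    """Classical Gumbel location a(n) and scale b(n) for the max of n iid N(mu, sigma^2)."""
    s = math.sqrt(2.0 * math.log(n))
    a = mu + sigma * (s - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * s))
    b = sigma / s
    return a, b


def expected_max_gumbel(n, mu=0.0, sigma=1.0):
    """Asymptotic approximation E[M_n] ~ a(n) + gamma * b(n)."""
    a, b = gumbel_constants(n, mu, sigma)
    return a + EULER_GAMMA * b


def expected_max_exact(n, mu=0.0, sigma=1.0):
    """Exact E[M_n]: integrate x * n * phi(x) * Phi(x)^(n-1) for a standard normal, then rescale."""
    def integrand(x):
        # Phi(x)^(n-1) computed in log space to stay stable for large n
        return x * n * norm.pdf(x) * np.exp((n - 1) * norm.logcdf(x))

    # Tell the integrator where the density of the maximum is concentrated
    peak, _ = gumbel_constants(max(n, 2))
    value, _ = integrate.quad(integrand, -12.0, 12.0, points=[peak], limit=200)
    return mu + sigma * value


for n in (10, 1000, 100000):
    print(n, round(expected_max_exact(n), 4), round(expected_max_gumbel(n), 4))
```

Both curves grow roughly like √(2 ln n) rather than linearly in n, which is why a straight-line fit against the number of cells is a poor model. The gap between the exact value and the Gumbel approximation is largest at small n.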
I realize that this equation should rely on the mean and variance of the trait distribution (which I am assuming is normally distributed). So, by fitting my actual data with this equation, I can estimate the mean and standard deviation of the trait distribution. Then, my plan is to use these estimates to compare the 20 different populations.
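One consequence worth noting: because the normal distribution is a location-scale family, E[M_n] = μ + σ·c(n), where c(n) is the expected maximum of n standard normal values. That is linear in μ and σ, so instead of regressing the maxima on the number of cells, one can regress them on c(n): the intercept estimates μ and the slope estimates σ. The sketch below assumes NumPy and takes c(n) from the Gumbel approximation via a hypothetical helper `standard_expected_max`. It treats each observed maximum as if it equalled its expectation, so with one maximum per cell count and only three counts the estimates will be noisy. The simulated data are illustrative only.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def standard_expected_max(n):
    """Gumbel approximation to E[max of n iid N(0, 1)]; requires n >= 2."""
    s = math.sqrt(2.0 * math.log(n))
    location = s - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * s)
    scale = 1.0 / s
    return location + EULER_GAMMA * scale


def fit_normal_from_maxima(cell_counts, maxima):
    """Least-squares (mu, sigma) from the model E[M_n] = mu + sigma * c(n)."""
    c = np.array([standard_expected_max(n) for n in cell_counts])
    y = np.asarray(maxima, dtype=float)
    sigma, mu = np.polyfit(c, y, 1)  # degree-1 fit: slope is sigma, intercept is mu
    return mu, sigma


# Simulated population (hypothetical values, not real data)
counts = (10, 1000, 100000)
rng = np.random.default_rng(1)
observed = [rng.normal(50.0, 5.0, size=n).max() for n in counts]
mu_hat, sigma_hat = fit_normal_from_maxima(counts, observed)
print(mu_hat, sigma_hat)
```

Repeating the fit for each of the 20 populations gives a (μ, σ) pair per population, which can then be compared directly.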
I realize that '3 different numbers of cells' is an insanely low number of data points to estimate parameters on. But, I am looking for incredibly large effects, so large errors should not be a problem.
Does my logic make sense, and which equation am I looking for?
Thank you!
posted by Peter Petridish to science & nature (6 answers total)
You can make this much easier by using the built-in "gevfit" function (in MATLAB's Statistics Toolbox) to fit the data to a generalized extreme value (GEV) distribution, which includes the Gumbel distribution as a special case. The built-in algorithm uses maximum likelihood estimation to fit the distribution's parameters to a dataset.
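For readers working in Python rather than MATLAB, a roughly analogous maximum likelihood fit is available in SciPy as `scipy.stats.genextreme.fit` (full GEV) and `scipy.stats.gumbel_r.fit` (Gumbel only). This kind of fit needs a sample of many maxima, each taken over blocks of the same size; one maximum per cell count, as in the question, would not support it directly. The sketch therefore uses simulated maxima, and the block size, number of blocks, and helper name `simulate_block_maxima` are arbitrary, hypothetical choices.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simulate_block_maxima(num_blocks, block_size, mu=0.0, sigma=1.0, seed=0):
    """Hypothetical data: the maximum of each block of iid normal draws."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(mu, sigma, size=(num_blocks, block_size))
    return draws.max(axis=1)


maxima = simulate_block_maxima(num_blocks=500, block_size=1000)

# Full GEV fit by maximum likelihood. SciPy's shape c has the opposite sign
# to the xi used in much of the extreme value literature; c = 0 is Gumbel.
c, gev_loc, gev_scale = genextreme.fit(maxima)

# Gumbel-only fit (shape fixed at zero), also by maximum likelihood.
loc, scale = gumbel_r.fit(maxima)
gumbel_mean = loc + EULER_GAMMA * scale  # mean of the fitted Gumbel

print(c, gev_loc, gev_scale)
print(loc, scale, gumbel_mean, maxima.mean())
```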
posted by oceanjesse at 6:57 AM on January 30