for the statisticians: what probability distribution do I want to fit to my data, and how?
I have a collection of widgets. they age, and as they age some of them get tweaked. it's been suggested to me that given enough time every widget will eventually be tweaked, but I'm not convinced. the data I have covers ~64000 widgets, and for each widget I have either the age at which it was tweaked, or its current age and the fact that it has never (yet) been tweaked.
as I understand it (I'm a database programmer with an interest in maths but very little working stats) I want to fit some probability distribution to my data and see whether the area under the probability mass function is < 1.
the 2 things that make this different from everything I've been able to find on the 'net are
1) I specifically don't expect the cumulative distribution to asymptote to 1, which appears to rule out obvious candidates like a Poisson distribution
2) my data contains negative examples - those widgets which have not yet been tweaked, and may never be. I'm assuming these are relevant to the shape of the final distribution, but I can't work out how to get tools like R to take them into account.
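to make those two points concrete, here's a minimal sketch of the kind of fit I think I'm after. it's in python rather than R, and nearly everything in it is my own assumption rather than anything I know to be right: I assume each widget has some probability p of ever being tweaked, and that for widgets which do get tweaked the age at tweaking is exponentially distributed (picked only because it's simple). a tweaked widget contributes p * rate * exp(-rate * age) to the likelihood, and an untweaked one contributes (1 - p) + p * exp(-rate * age), which is how the negative examples get counted. the names fit_cure_model and simulate are made up, and the data is simulated, not my real widgets.

```python
import math
import random


def fit_cure_model(ages, tweaked, iterations=200):
    """Fit p (fraction ever tweaked) and rate by expectation-maximisation.

    ages[i]    -- age at tweaking if tweaked[i], otherwise current age
    tweaked[i] -- True if widget i has been tweaked
    Assumes an exponential time-to-tweak for widgets that ever get tweaked.
    """
    n = len(ages)
    n_events = sum(tweaked)
    # Starting guesses: p somewhere above the observed fraction, naive rate.
    p = 0.5 * (1.0 + n_events / n)
    rate = n_events / sum(ages)
    for _ in range(iterations):
        # E-step: probability that each widget will ever be tweaked.
        weights = []
        for t, d in zip(ages, tweaked):
            if d:
                weights.append(1.0)  # already tweaked, so certainly susceptible
            else:
                still_waiting = p * math.exp(-rate * t)
                weights.append(still_waiting / (1.0 - p + still_waiting))
        # M-step: update p and rate using those probabilities as weights.
        p = sum(weights) / n
        rate = n_events / sum(w * t for w, t in zip(weights, ages))
    return p, rate


def simulate(n, p_true, rate_true, max_age, seed=0):
    """Generate fake widgets: some never tweak, the rest may not have yet."""
    rng = random.Random(seed)
    ages, tweaked = [], []
    for _ in range(n):
        current_age = rng.uniform(0.0, max_age)
        will_tweak = rng.random() < p_true
        tweak_age = rng.expovariate(rate_true) if will_tweak else math.inf
        if tweak_age <= current_age:
            ages.append(tweak_age)
            tweaked.append(True)
        else:
            ages.append(current_age)
            tweaked.append(False)
    return ages, tweaked


ages, tweaked = simulate(10000, p_true=0.6, rate_true=0.5, max_age=20.0)
p_hat, rate_hat = fit_cure_model(ages, tweaked)
print(f"estimated fraction ever tweaked: {p_hat:.3f}")
print(f"estimated tweak rate: {rate_hat:.3f}")
```

if the estimated p comes out clearly below 1, that would be the "not every widget gets tweaked" answer, but only to the extent that the exponential assumption is reasonable and the oldest widgets are old enough for the curve to have flattened out.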
the best advice I've had so far is from an engineer friend who said to chuck the data on only the tweaked widgets into R, fit a Poisson distribution and be done with it. unfortunately this fails on both of the above points. (then again, all he normally cares about is failure rates in aircraft parts, and you know that given enough time every lug will eventually break).
so, if there are any stats geeks who understand what I'm trying to do and can point me at tools/docs/info on how to do it, you'd rock my world.
posted by langedon at 7:52 PM on January 18, 2008