# Help me understand weighted averages.

July 11, 2004 1:38 AM

Remedial weighted averages for the mathematically illiterate – or, can the 10% of you who understand math help the remaining 95% of us who don't? [mi].

Consider:

A jockey who has 200 trips and finishes in the money 100 times has a Win/Place/Show rating of 50%. But so does a jockey who has only raced twice and happened to come in the money once. Although in theory their performance is equivalent, it hardly seems fair to treat them as equal odds until the neophyte jockey builds up more of a representative sample. In short, how would one weight this sort of statistic to make it more realistic in the real world?

(Disclaimer: The racing season where I live ends today, so you won't be contributing to my general delinquency. It's simply a math question that happened to occur to me whilst playing around with some pony figures.)

It's not a weighted-average problem.

The way I'd usually think about it is that any given jockey has a true long run probability of W/P/S'ing, or a long-run percentage of it, but you don't know it.

The actual W/P/S percentage you see is only an estimate of the true goodness of the jockey. If we make the heroic assumption that the races we see are a random sample of all possible races, we can apply the usual rules of confidence intervals. You might be able to say with 95% confidence that the 200-race jockey's true probability of W/P/S is 0.4--0.6... but all you can say about a jockey with 2 races under his belt, with 95% confidence, is that his true probability is between 0 and 1.
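To put rough numbers on that, here is a minimal sketch of the textbook normal-approximation interval, p ± 1.96·sqrt(p(1−p)/n), clipped to the range 0 to 1. The function name `approx_interval` and the `veteran`/`rookie` labels are made up for illustration, and the approximation is known to be crude for tiny samples, which is rather the point.

```python
import math

def approx_interval(successes, trips, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence. The result is clipped
    to [0, 1] because a probability cannot fall outside that range.
    """
    p = successes / trips
    half_width = z * math.sqrt(p * (1 - p) / trips)
    return max(0.0, p - half_width), min(1.0, p + half_width)

veteran = approx_interval(100, 200)  # about (0.43, 0.57)
rookie = approx_interval(1, 2)       # clips to (0.0, 1.0)
```

Same observed 50%, but the two-race interval covers the whole range, so it tells you essentially nothing.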

One way you could deal with this is to look at the worst-case probabilities, i.e. the low end of each jockey's interval. What's the worst you could reasonably expect the jockey to do? The 200-race jockey will come out with a probability of doing well of ~0.4 or whatever, and the 2-race jockey will end up with a big fat goose egg.
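One common way to turn "worst you could reasonably expect" into a single ranking score is the lower end of a Wilson score interval. That is a swapped-in technique, not something named in this thread; it is used here because it behaves more sensibly than the plain normal approximation when there are only a couple of races. The function name and jockey labels below are hypothetical.

```python
import math

def wilson_lower_bound(successes, trips, z=1.96):
    """Lower end of the Wilson score interval for a proportion.

    z=1.96 is roughly 95% confidence. Returns 0.0 when there is no data.
    """
    if trips == 0:
        return 0.0
    p = successes / trips
    z2 = z * z
    centre = p + z2 / (2 * trips)
    margin = z * math.sqrt(p * (1 - p) / trips + z2 / (4 * trips * trips))
    return (centre - margin) / (1 + z2 / trips)

# Hypothetical records: (times in the money, total trips)
jockeys = {"rookie": (1, 2), "veteran": (100, 200)}

# Rank by the pessimistic estimate rather than the raw percentage.
ranked = sorted(jockeys, key=lambda name: wilson_lower_bound(*jockeys[name]),
                reverse=True)
```

Under these assumptions the 200-race jockey scores about 0.43 and the 2-race jockey about 0.09, so the small sample is penalised heavily even though both raw percentages are 50%.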

You'd probably be better off imagining this as a multi-step Bayesian updating exercise, but this was shorter and I'm gonna go watch F1.
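For the curious, a minimal sketch of what that updating could look like, assuming a Beta prior over the jockey's in-the-money probability and updating it one race at a time. The prior of Beta(3, 7), i.e. a guess that a typical jockey finishes in the money about 30% of the time, is purely a made-up assumption for illustration, as are all the names.

```python
# Hypothetical prior: Beta(3, 7), equivalent to having already "seen"
# 3 in-the-money finishes in 10 imaginary races. Pick your own.
PRIOR = (3.0, 7.0)

def update(alpha, beta, in_the_money):
    """One Bayesian update step for a single race result."""
    if in_the_money:
        return alpha + 1, beta
    return alpha, beta + 1

def estimated_rate(results, prior=PRIOR):
    """Posterior mean after folding in each race result in turn."""
    alpha, beta = prior
    for result in results:
        alpha, beta = update(alpha, beta, result)
    return alpha / (alpha + beta)

veteran_results = [True] * 100 + [False] * 100  # 100 of 200 in the money
rookie_results = [True, False]                  # 1 of 2 in the money

veteran_rate = estimated_rate(veteran_results)  # 103/210, about 0.49
rookie_rate = estimated_rate(rookie_results)    # 4/12, about 0.33
```

With lots of races the data swamps the prior and the estimate stays near 50%; with two races the estimate barely moves from the prior guess. How strong to make the prior is a judgment call.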

posted by ROU_Xenophobe at 4:41 AM on July 11, 2004

Statistics tutorials that deal with this subject: Confidence Levels, P Values.

posted by sleslie at 7:25 AM on July 11, 2004

This thread is closed to new comments.

When people talk about weighting in polling, what they mean is that they are adjusting their data set to try and match the population. Like, if your polling group is 100 people and your population is the state of Texas, maybe your polling group has 4 Hispanics but the state is 35% Hispanic, so you might want to "weight" the 4 polled Hispanics higher. You would usually weight them based on the population numbers, so that the Hispanic group's mean counts for .35 of the overall result (mean*.35). This is why polls ask demographic questions.
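A minimal sketch of that arithmetic, using the 4-in-100 sample and the 35% population share from the example above. The group means (share of each group answering "yes" to some question) are made-up numbers purely for illustration, as are the function names.

```python
sample_counts = {"hispanic": 4, "other": 96}            # from the example
population_shares = {"hispanic": 0.35, "other": 0.65}   # from the example
group_means = {"hispanic": 0.75, "other": 0.40}         # hypothetical answers

def respondent_weights(counts, shares):
    """Weight per respondent = population share / sample share."""
    total = sum(counts.values())
    return {g: shares[g] / (counts[g] / total) for g in counts}

def weighted_estimate(means, shares):
    """Each group's mean counts in proportion to its population share."""
    return sum(shares[g] * means[g] for g in means)

def unweighted_estimate(means, counts):
    """Plain sample average, ignoring the population mix."""
    total = sum(counts.values())
    return sum(counts[g] * means[g] for g in means) / total

weights = respondent_weights(sample_counts, population_shares)
weighted = weighted_estimate(group_means, population_shares)
unweighted = unweighted_estimate(group_means, sample_counts)
```

Here each polled Hispanic respondent ends up counting 8.75 times, which is also why pollsters get nervous when a group is that underrepresented: four people are carrying a third of the result.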

What you're doing is descriptive though, not predictive, so maybe you could 1) give the n=x alongside the percentage, or 2) arbitrarily value a loss as greater than 0.

I don't know anything about horse racing though, so they very well may have a system for doing this that's a lot cooler :)

posted by rhyax at 2:41 AM on July 11, 2004