# I don't understand why election modeling has any validity

November 9, 2022 12:12 PM Subscribe

I don't mean this critically, I mean genuinely as a conceptual matter it seems to me incorrect to use statistics to invoke the "odds" and "chances" of specific electoral outcomes. Help me wrap my mind around it?

The caveat here is that I'm not a person who easily understands statistical concepts, honestly I can barely multiply let alone get into these other conceptual numerical frameworks. Here's what I'm struggling with:

Let's take 538 as the obvious example. They take polls and current events (party dynamics, inflation, trendlines, etc) and create a "model" through which they run some thousands of mock "elections" and then come out and say: "There is a 20% chance the Democrats will take the House. In 100 elections, Democrats would take it 20 times."

My mind tells me this is bunk. Take something that does make sense to me: the MLB batting average. Jose Altuve gets up to the plate, an MLB pitcher throws him a pitch, and there's a 30% chance he gets a base hit. That's measuring what's already happened, sure, but until Jose Altuve falls off, as a viewer, Jose Altuve is gonna get maybe 500 at-bats this season and I'm gonna see him get on base 150 times.

By contrast, an election model is predicting what's going to happen one time: the 2022 midterms, a *sui generis* event. It's gonna take a couple dozen examples in recent political history where the in-party lost seats in the midterm, and it's gonna look at gas prices and poll quality, but then it'll express its findings in terms of statistical odds rather than in the simpler sense of, "Looks like Dems are going to lose, unless we're missing something."

Put more simply: are there 20% "odds" that the Dems won the House yesterday? Or are the Republicans just set to win the House, and if Dems win it's not because they got the weird 2/10 coin flip, but rather it's because something was happening on the ground that journalists did not discover or accurately report out? That doesn't seem like a statistical matter, it seems like an epistemic matter.

Thank you in advance for explaining this to me. Feel free to explain it to me like I am a five year old.

Thank you so much for asking this question! Following with intense interest - my brain tends to ask the same questions that yours does.

posted by rrrrrrrrrt at 12:37 PM on November 9, 2022 [3 favorites]

Response by poster: I realize even my "put more simply" was kind of convoluted. I think what I'm saying is that my brain tells me if you could run yesterday's election 1,000 times, the results would be the same every time and there was never any chance that it would be different because all the same things would be at play.

posted by kensington314 at 12:45 PM on November 9, 2022 [1 favorite]

*I think what I'm saying is that my brain tells me if you could run yesterday's election 1,000 times, the results would be the same every time and there was never any chance that it would be different because all the same things would be at play*

Well, at least some of the model inputs (polls) come with confidence intervals, which I think necessarily means that there is some chance that the outcome would be different even if all the same things are at play. I don't know enough about the modeling to know how that really factors in, though.

posted by AndrewInDC at 12:49 PM on November 9, 2022

*if you somehow could run yesterday's election 1,000 times, the results would be the same every time and there was never any chance that it would be different because all the same things would be at play.*

Right, but going into yesterday, you can't know ahead of time which of those same things are going to be at play. If an archer shoots a dozen arrows at a target, the arrows will strike the target in some sort of a pattern that reflects the wind and air and the shape of the arrow itself and the archer's skill in accounting for all of those things. If you replayed any one of those shots exactly as it happened, it would -- in theory -- happen the same way every time you replayed it, but election modeling is more like shooting a dozen shots, all of them slightly different, and seeing what the results are.

Polling measures a small but real thing: "how did 500 people in district X answer this question?" and attempts to expand that small but real thing into a larger thing: "how are all 50,000 people in district X going to vote on this question?" By definition, expanding that small thing into the larger thing means that there is some margin of error; you might have just happened to poll 500 people more likely to say yes, or no, than the average voter, so you have to account for that.

Modeling runs different scenarios *within those margins of error*, perhaps including additional information that the modeler thinks is relevant such as demographic or socioeconomic factors. So when 538 says that this result happened in 80% of its random scenarios, what it means is that, if it shoots a bunch of arrows within the parameters of the polling margins of error, 80% of them land on one side of some line.
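To make the sampling part concrete, here is a minimal Python sketch. It runs the experiment in the forward direction: it assumes a made-up district where 52% of all voters would say yes, polls 500 random voters many times over, and looks at how the poll results scatter. The function name `simulate_polls`, the 52% figure, and the number of repeats are illustrative assumptions, and this is not a description of 538's actual method.

```python
import random

def simulate_polls(true_yes_share, sample_size, n_polls, seed=0):
    """Return the 'yes' share measured by each of n_polls independent polls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(n_polls):
        # Each respondent independently says yes with probability true_yes_share.
        yes = sum(rng.random() < true_yes_share for _ in range(sample_size))
        shares.append(yes / sample_size)
    return shares

# Hypothetical district: 52% of all voters would say yes; each poll asks 500 people.
shares = simulate_polls(true_yes_share=0.52, sample_size=500, n_polls=2000)
wrong_side = sum(s < 0.5 for s in shares) / len(shares)

print(f"lowest poll: {min(shares):.3f}, highest poll: {max(shares):.3f}")
print(f"share of polls showing 'no' ahead: {wrong_side:.1%}")
```

With these made-up numbers, a noticeable minority of the polls put "no" ahead even though "yes" is really winning, which is why a forecaster looking at a single poll has to leave room for the other outcome.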

posted by gauche at 12:55 PM on November 9, 2022 [6 favorites]

I don’t know if it’s what 538 uses, but a lot of statistical modelers use what’s known as a Monte Carlo simulation. They feed all of the polling data, with margins of error and other black magic data into the system, and the program does “hold” the election thousands of times.

So they’re reporting the results of what has happened from their multitude of simulated elections.

posted by hwyengr at 12:59 PM on November 9, 2022

Best answer: (my degrees are in stats/biostats, I do a lot of predictive modeling for things that probably won't happen)

The 20% is true IF you believe the model is built correctly, i.e., that it specifies the relationship between existing data (polls, betting markets, etc.) and expected votes reasonably accurately, and IF you are willing to overlook all of the things the model didn't consider (e.g., what if a political scandal occurs? what if a natural disaster occurs?). In practice, the exact output from the model is almost guaranteed to be wrong because you can never account for absolutely everything, but it's possible that you can be qualitatively close enough.

There's a famous saying in statistics that "all models are wrong, some are useful". So how do you know which ones are useful, especially when you can't validate the results? The simplest heuristic is to look at how well the models performed in the past, and trust the modelers accordingly. Otherwise, you can look at how the models are built, and decide whether or not they accurately reflect reality (in general, many models are poorly constructed due to lack of expertise). Standard modeling practice is to train your model on data prior to a past event (say, the 2020 election using pre-election data), see if it does a good job of predicting that, and then apply some adjustments to make it relevant for 2022. So to answer your question, we know that election models have some level of validity because they have worked well (with varying degrees of success) in the past.

The statistician Andrew Gelman does great work re: polling/election models, and I would strongly recommend reading his take on things if you are interested. Recommended reading: link

On the philosophical/epistemic end of it, you could also look into aleatory vs epistemic uncertainty.

posted by bongerino at 1:05 PM on November 9, 2022 [21 favorites]

Response by poster: hwyengr, I get that, but the thing I don't understand--and maybe I just don't understand the theoretical foundations of how and why statistical modeling has any validity--is what those "simulated elections" even mean. The 2022 election can be held once and never again. My brain, which is highly literalistic, tells me that this kind of exercise is a distraction that doesn't tell us anything meaningful.

posted by kensington314 at 1:06 PM on November 9, 2022

The way I think of it: imagine 100 Earths very much like ours. In all of them, the polls produced the exact same predicted vote counts with the same confidence interval. But because of the error in sampling, the actual vote counts were different in each world. We would expect 20 of these worlds to have the House go Dem. We will only find out which of those 100 hypothetical worlds we are in after the election.

posted by flimflam at 1:16 PM on November 9, 2022 [2 favorites]

Best answer: In probability theory, every event is sui generis. That's the key. If Jose Altuve is batting .300, or if a coin flips heads half the time, that doesn't have any bearing on the next at-bat or the next flip. We're just using past performance because we assume that things will be similar (which is not always a valid assumption, which is why you added the "unless Altuve falls off" part: baseball players generally don't hit as well as they get older).

If you watch football, you might have seen the Amazon Next Gen Stats commercial where it pretends to real-time calculate the "catch probability" of a particular pass, and then, when the receiver is celebrating in the end zone, it displays a catch probability of 9.82% or whatever. That's not how probability works. Once the event is complete, the probability is 1 or 0. He either caught it or he didn't. Altuve hit a single or he didn't. The coin came up heads, or it came up tails. But before that point, we don't know what's going to happen, and probability is what tells us what's *likely* to happen.

Continuing with your Altuve example: If Altuve has a batting average of .300, and you watch a game where he bats four times, you'd expect to see him get one hit. But that's not a guarantee. He could be in a slump and get none, or he could get really hot and hit four. But we can refine our model to take more inputs into account. Maybe Altuve bats .500 against right-handed pitchers and .100 against left-handed pitchers (assuming an equal distribution of righties and lefties, that comes out to .300). You'd probably be more excited to watch a game against a righty, because Altuve is likely to get two hits that night. But again, he could be in a slump, so maybe not. But if your seven-year-old kid is a huge Altuve fan and wants to see his idol get a hit, buying a ticket to the game against the righty would be a lot smarter than buying a ticket to the game against the lefty.
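For anyone who wants to check that arithmetic, here is a small Python sketch. It assumes every at-bat is an independent trial with the same hit probability (a simplification, since slumps and hot streaks are exactly the cases where that fails), and the helper name `hit_distribution` is made up for illustration.

```python
from math import comb

def hit_distribution(p_hit, at_bats):
    """Probability of exactly k hits, for k = 0..at_bats (binomial formula)."""
    return [
        comb(at_bats, k) * p_hit**k * (1 - p_hit) ** (at_bats - k)
        for k in range(at_bats + 1)
    ]

# Equal mix of righties (.500) and lefties (.100) blends to the .300 average.
blended = 0.5 * 0.500 + 0.5 * 0.100

dist = hit_distribution(0.300, at_bats=4)
expected_hits = sum(k * p for k, p in enumerate(dist))

# Chance of seeing at least one hit in four at-bats, by pitcher handedness.
p_hit_vs_righty = 1 - hit_distribution(0.500, 4)[0]
p_hit_vs_lefty = 1 - hit_distribution(0.100, 4)[0]

print(f"blended average: {blended:.3f}")
print(f"expected hits in 4 at-bats: {expected_hits:.1f}")
print(f"chance of a hitless night at .300: {dist[0]:.4f}")
print(f"at least one hit vs righty: {p_hit_vs_righty:.4f}, vs lefty: {p_hit_vs_lefty:.4f}")
```

Under those assumptions the expected number of hits is 1.2, a hitless night still happens about a quarter of the time, and the kid is far more likely to see a hit against the righty than against the lefty.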

Let's break it down even further by type of pitch. Say Altuve hits .750 against fastballs from righties, .250 against curveballs from righties, .150 against fastballs from lefties, and .050 against curveballs from lefties. You're in the stands with your kid, who has lost interest in the game, maybe they have to go to the bathroom. The opposing starter is a lefty who throws mostly curveballs. It's not looking good for our guy Altuve. But then the opposing manager pulls him and brings in a reliever, a right-hander with a 99mph fastball. At that point, you tell your kid to hold it, because Altuve just got a lot more likely to get a hit in his next at-bat. Make sense?

We can keep adding variables to our Altuve model: pitch speed, runners on base, opponent defense, park effects, etc. And the more variables you add, generally, the more realistic your model becomes. And because we have so many variables, we don't need to have seen the exact same scenario in the past to predict what might happen in the future. Altuve may never have come up to bat against a right-handed curveball pitcher throwing 93mph in Great American Ballpark (go Reds!) with runners on first and third, down by two runs in the 7th inning, but based on the model with all those inputs, we could say there's a .2987314 (totally made-up number, it just rounds up to .300 because that's your example) chance that Altuve gets a hit on any particular pitch he sees in that at-bat.

I'm gonna go back to your "unless Altuve falls off" comment one more time. To state the obvious, Jose Altuve has never been 37 years old before. (He's 32 at the time of this writing, for readers who aren't familiar.) So by your logic, we have no way of knowing what he'll do in the 2027 season, when he's 37. Maybe he'll break Barry Bonds's record of 73 home runs. But if you know who Jose Altuve is, you're probably chuckling, because he's never hit more than 31 in a season before. And we generally know that ballplayers get worse as they age: generally they start declining in their early 30s, and even the best are usually pretty much done by 37. (That's actually part of why so many ballplayers from the 90s and 00s are suspected of steroid use - 37 happens to be Bonds's age when he hit 73 homers in 2001. It raises some eyebrows when you do things that have literally never been done by anyone at an age when most players have retired.) But anyway, that's a statistical model. As I said, Altuve's never been 37 before, but we still have a pretty good idea of how he'll be playing (or even if he'll be playing) when he's 37.

In actual statistics rather than just talking-shit-with-your-buddies "there's a 95% chance I'll get this girl's number" statistical BS, there's a concept of the confidence interval, which is a range of outcomes that encompasses a given percentage (95% is common) of possible outcomes. The media, for Reasons, doesn't generally report on confidence intervals, but they probably should, because it helps statistical predictions make more sense. If you say "the Democrats are predicted to win a Senate majority", as you said, that doesn't mean a lot. But if you predict the Democrats will win 52 Senate seats with a confidence interval of 2 seats in either direction, that shows just how likely that Democratic majority actually is. A 95% confidence interval of 50 to 54 seats means that there's only a 2.5% chance of a Republican majority, because the 5% that falls outside the interval is split between the high end and the low end.
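Here is a rough Python sketch of where that 2.5% comes from. It assumes, purely for illustration, that the seat forecast follows a symmetric bell curve (a normal distribution) centered on 52, and it treats seat counts as a continuous number, which real seat counts are not. The variable names are made up.

```python
from statistics import NormalDist

# For a normal distribution, 95% of outcomes fall within about 1.96
# standard deviations of the center.
z_95 = NormalDist().inv_cdf(0.975)

# "52 seats, plus or minus 2" therefore implies this standard deviation.
seat_forecast = NormalDist(mu=52, sigma=2 / z_95)

p_below_50 = seat_forecast.cdf(50)      # the low tail
p_above_54 = 1 - seat_forecast.cdf(54)  # the high tail

print(f"chance of fewer than 50 seats: {p_below_50:.3f}")
print(f"chance of more than 54 seats: {p_above_54:.3f}")
```

The two tails are the same size, so each one gets half of the 5% that the interval leaves out.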

Hope this helps.

posted by kevinbelt at 1:16 PM on November 9, 2022 [13 favorites]

I mean, the 538 model did pretty well. It predicted that most likely the GOP would take back the house (with still a non-negligible chance it wouldn't happen - if the odds were 20% that I'd die doing activity [x], that's something I wouldn't want to gamble on!), and predicted the Senate would be a toss-up. And while some votes still need to be counted (and a run-off conducted in GA) it seems likely (based on the NYTimes needle) that the GOP will get a slim margin in the House, and the Dems will narrowly hold on to the Senate.

Models are not good at handling new factors - like, how much will Roe v. Wade falling impact things? How much are voters still pissed at Trump/MAGA Republicans? How much does the growing awareness of climate change impact things? etc.

posted by coffeecat at 1:17 PM on November 9, 2022

*My brain, which is highly literalistic, tells me that this kind of exercise is a distraction that doesn't tell us anything meaningful.*

You can read a brief explanation of what the simulations are:

*[The models] take lots of polls, perform various types of adjustments to them, and then blend them with other kinds of empirically useful indicators (what we sometimes call “the fundamentals”) to forecast each race. Then they account for the uncertainty in the forecast and simulate the election thousands of times.*

So, what the model does is take something they're unsure about--say, the amount of polling error--and then run the model with different values in there. What if the error is 5% in favor of Dems? 7%? 10%? What about 5% in favor of Republicans? It's a way to account for the uncertainty in the model.

They do this thousands of times, and look at what percentage of the time the Democrats win in the simulated races vs. the Republicans. The idea is to tell us which scenario is more likely, even given all of the uncertainty in the models.
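A toy version of that loop, in Python, might look like the sketch below. Everything specific in it is an assumption for illustration: a single race, a polling lead measured in points, and a polling error drawn from a bell curve with a chosen spread. Real forecast models are far more elaborate, and `win_probability` is a made-up name.

```python
import random

def win_probability(poll_margin, error_sd, n_sims=10_000, seed=1):
    """Share of simulated elections in which the poll leader still wins.

    poll_margin: the leader's polling lead, in points.
    error_sd: assumed typical size (standard deviation) of polling error, in points.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_sims):
        # One simulated election: the polls were off by some random amount.
        polling_error = rng.gauss(0.0, error_sd)
        if poll_margin + polling_error > 0:
            wins += 1
    return wins / n_sims

# Hypothetical race: leader is up by 1 point, polls are typically off by about 4.
print(f"leader wins in {win_probability(1.0, 4.0):.1%} of simulated elections")
```

With a 1-point lead and that much assumed polling error, the leader holds on in roughly 60% of the simulated elections, so the "odds" are a summary of how often the uncertain inputs would still produce the same winner.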

posted by damayanti at 1:19 PM on November 9, 2022

*The 2022 election can be held once and never again*

Jose Altuve will only be up to bat once with a 1-2 count in the top of the 3rd inning of game 5 of the 2022 World Series against the Phillies in Citizens Bank Park. But you yourself admitted that there's a statistical model that predicted he'd get a hit 30% of the time (.300 BA = 30%). Obviously you can't verify whether the 30% model was exactly correct, but you can evaluate it more generally. The batting average model predicted that an out was more than twice as likely as a hit (1.000 - .300 = .700 = 70% chance of an out) in that situation, and indeed, when this situation came up last Thursday, he grounded out to short. For the whole Game 5, he got two hits in four at-bats, which is consistent with the batting average prediction, which would expect 1.2 hits for four at-bats.

And again, he's never going to come up to bat in that exact situation again, so the probability is no longer 30% that he'll get a hit. It's 0%, because we know what happened and it's pointless to try to guess the outcome of well-documented past events.

posted by kevinbelt at 1:29 PM on November 9, 2022

There are two practical ways to think about this:

1) This is one of many projections they do. The goal is to have the distribution of actual outcomes mirror the prediction of uncertainty. So if you look at the universe of all projections after the fact, 10 percent of the actual outcomes should be in the 0-10th percentile of the estimated distribution, 10 percent in the 11th-20th percentiles, etc. And of course, half of the outcomes should be above the median and half below.

2) For any specific outcome, the modelers should be willing to put their money where their mouth is. That is, they should be willing to take (slightly better than) 10-1 odds that the outcome will fall in the first decile, second decile, etc.

This is the conceptual goal; how to operationalize it is much harder. I think that it's based on the historical distribution of error between modeled and actual outcomes.

posted by Mr.Know-it-some at 1:32 PM on November 9, 2022

There is a theoretical point, that if you knew the position and speed of every particle in the field, every bit of the ball, every movement of air, every neuron in his brain, every noise in the stadium, that you would be able to tell with 100% certainty whether or not Jose Altuve will hit a specific pitch. And if you could create the exact same conditions, you would get the exact same result every time.

But we can't because we can't measure to that level of detail. We know that 30% of the time Altuve will hit the ball, and that this is more likely than if you swapped him for a replacement-level player. But there are specific at-bats where Altuve will not hit the ball and in a subset of those the replacement-level player would have.

It's possible that we can come up with more information to help clarify the outcomes; perhaps if the pitcher is good (for some definition of good), there's a 20% chance he hits the ball and if the pitcher is bad there's a 40% chance. Perhaps if Altuve is rested, that adds 3% to his chances, or if the game is already a blowout it reduces his chances by 5%. Again, if we knew *everything* we would be able to tell exactly on every pitch what would happen, but we don't know everything.

And this is the exact same thing with election forecasts; if we could tell exactly if and how every voter would vote, then there is only one possible election result. But beforehand, we know a hell of a lot less. We know polls, but those are noisy -- not only are they samples of voters, but they're inconsistent. Five independent polls in Pennsylvania that came out in November had Oz +3, Oz +2, Oz +1, Fetterman +1 and Fetterman +6. Based on these and other recent polls, 538 thought that the most likely outcome was Oz +1.

But there's actually a range of possibilities; just as the most likely outcome every time Altuve steps to the plate is that he doesn't hit the ball, yet some of the time that's not what happens. Based on how polls have historically represented the vote, 538 has some idea that a +1 polling average translates to the leading candidate winning say 60% of the time -- less likely than Altuve not hitting the ball.

Things get more complex because outcomes are correlated; if the opposing pitcher sucks, not only is Altuve more likely to hit the ball, so are his teammates. In the same way, youth vote turnout was high, which meant more Democrats won (but this would have a larger effect in younger states, rather than Florida.) Eventually, it's easier to set up a system that represents the probabilities and the connections between them, and draw from it at random to see what is more likely, than it is to work everything out directly. That's a Monte Carlo model, like 538 uses.
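Here is a hedged Python sketch of why the correlation matters. It imagines ten hypothetical races in which one party trails by 1 point in each, and compares two worlds: one where every race's polling error is independent, and one where part of the error is shared across all races (think of a national turnout surprise). The total amount of error per race is the same in both. The function names, the ten races, and the error sizes are all illustrative assumptions, not anything taken from 538.

```python
import random

def simulate_seats(leads, shared_sd, local_sd, n_sims=5000, seed=2):
    """Seats won by the party in each simulated election.

    leads: that party's polling margin in each race (negative = trailing).
    shared_sd: spread of the error that hits every race at once.
    local_sd: spread of the error that is specific to each race.
    """
    rng = random.Random(seed)
    seat_counts = []
    for _ in range(n_sims):
        shared_error = rng.gauss(0.0, shared_sd)  # same draw for all races
        seats = sum(
            lead + shared_error + rng.gauss(0.0, local_sd) > 0 for lead in leads
        )
        seat_counts.append(seats)
    return seat_counts

def share_with_majority(seat_counts, needed):
    """Fraction of simulated elections in which the party wins at least `needed` seats."""
    return sum(s >= needed for s in seat_counts) / len(seat_counts)

leads = [-1.0] * 10  # ten hypothetical races, trailing by 1 point in each

# Same total error per race (variance 16) in both cases: 4^2 versus 3^2 + 7.
independent = simulate_seats(leads, shared_sd=0.0, local_sd=4.0)
correlated = simulate_seats(leads, shared_sd=3.0, local_sd=7 ** 0.5)

p_independent = share_with_majority(independent, needed=6)
p_correlated = share_with_majority(correlated, needed=6)
print(f"wins 6+ of 10 with independent errors: {p_independent:.1%}")
print(f"wins 6+ of 10 with shared errors:      {p_correlated:.1%}")
```

Each individual race is equally uncertain in both versions, but when the errors move together the trailing party sweeps a majority noticeably more often, because one shared surprise can flip many races at once.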

One of the other things that can be done to assess a model's reliability after the fact is to look at how well it performs relative to its confidence. The '2022 midterms' isn't one event, any more than the World Series is one event. There are hundreds of congressional elections and dozens of Senate and Governor races, each of which has a probability associated with it. You can look after the fact at the races where the model thought a candidate had a 70% chance of winning; you would expect about 70% of those to have gone the way the model thought was most likely. If only half the leading candidates won, the model would be too confident. If 90% of those won, then the model was underconfident. (This is checking the 'calibration' of the model.) In the 2016 election, all of the most talked about models predicted Hillary was most likely to win, but a model (like 538) that thought she had a 65% chance was a better model than one (like the Princeton one) that thought she had a 99.9% chance.
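A small Python sketch of both checks follows. The bucket of ten races is invented purely to show the calculation. The second part uses the Brier score, a standard scoring rule that is not mentioned above but is one common way to put a number on "a better model"; it is applied to the 65% and 99.9% figures as quoted in the paragraph above.

```python
def observed_win_rate(favourite_won):
    """Share of races in one confidence bucket where the favourite actually won."""
    return sum(favourite_won) / len(favourite_won)

def brier_score(forecast_prob, happened):
    """Squared gap between the forecast and what happened (1 or 0). Lower is better."""
    outcome = 1.0 if happened else 0.0
    return (forecast_prob - outcome) ** 2

# Hypothetical bucket: ten races where a model gave the favourite a 70% chance.
bucket_70 = [True, True, False, True, True, True, False, True, False, True]
print(f"favourites won {observed_win_rate(bucket_70):.0%} of the 70% races")

# 2016, using the probabilities quoted above; the favoured candidate lost.
cautious = brier_score(0.65, happened=False)
overconfident = brier_score(0.999, happened=False)
print(f"65% forecast:   {cautious:.4f}")
print(f"99.9% forecast: {overconfident:.4f}")
```

In the made-up bucket the favourites won 7 of 10, which is what a well-calibrated 70% should look like. On the 2016 outcome, the 99.9% forecast is penalized far more heavily than the 65% one, which matches the intuition that both were "wrong" but not equally so.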

posted by Superilla at 1:36 PM on November 9, 2022 [2 favorites]

But we can't because we can't measure to that level of detail. We know that 30% of the time Altuve will hit the ball, and that this is more likely than if you swapped him for a replacement-level player. But there are specific at-bats where Altuve will not hit the ball and in a subset of those the replacement-level player would have.

It's possible that we can come up with more information to help clarify the outcomes; perhaps if the pitcher is good (for some definition of good), there's a 20% chance he hits the ball and if the pitcher is bad there's a 40% chance. Perhaps if Altuve is rested, that adds 3% to his chances, or if the game is already a blowout it reduces his chances by 5%. Again, if we knew *everything* we would be able to tell exactly on every pitch what would happen, but we don't know everything.

And this is the exact same thing with election forecasts; if we could tell exactly if and how every voter would vote, then there is only one possible election result. But beforehand, we know a hell of a lot less. We know polls, but those are noisy -- not only are they samples of voters, but they're inconsistent. Five independent polls in Pennsylvania that came out in November had Oz +3, Oz +2, Oz +1, Fetterman +1 and Fetterman +6. Based on these and other recent polls, 538 thought that the most likely outcome was Oz +1.

But there's actually a range of possibilities. The most likely outcome every time Altuve steps to the plate is that he doesn't hit the ball, yet some of the time he does. Based on how polls have historically represented the vote, 538 has some idea that a +1 polling average translates to the leading candidate winning, say, 60% of the time -- less likely than Altuve not hitting the ball.
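As a rough sketch of how a polling lead can turn into a win percentage: assume the true margin is normally distributed around the polling average. The function name `win_probability` and the 4-point error spread are illustrative assumptions, not 538's actual numbers, but with them a +1 lead comes out near the "say, 60%" figure above.

```python
from statistics import NormalDist

def win_probability(poll_margin: float, error_sd: float) -> float:
    """Chance the polling leader really wins, assuming the true margin
    is Normal(poll_margin, error_sd)."""
    # P(true margin > 0) = 1 - CDF evaluated at zero
    return 1.0 - NormalDist(mu=poll_margin, sigma=error_sd).cdf(0.0)

# error_sd=4.0 is an assumed, illustrative polling error (in points).
for lead in (1.0, 3.0, 6.0):
    print(f"lead of +{lead:.0f}: {win_probability(lead, 4.0):.0%} to win")
```

The bigger the assumed polling error, the closer every lead gets pushed toward a coin flip.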

Things get more complex because outcomes are correlated; if the opposing pitcher sucks, not only is Altuve more likely to hit the ball, so are his teammates. In the same way, youth turnout was high, which meant more Democrats won (though this would have a larger effect in younger states than in Florida). Eventually, it's easier to set up a system that represents the probabilities and the connections between them, and randomly sample from it to see which outcomes are more likely, than it is to calculate the answer directly. That's a Monte Carlo model, like 538 uses.
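Here is a minimal Monte Carlo sketch of that idea. It is not 538's model: the function `simulate_races`, the three +1 leads, and the 3-point "shared" and "local" error sizes are all assumptions made up for illustration. The point is only that one shared error per simulated election moves every race together.

```python
import random

def simulate_races(leads, n_sims=20_000, shared_sd=3.0, local_sd=3.0, seed=0):
    """`leads` are one party's polling leads (in points) in each race.

    Returns (per-race win rates, distribution of the number of races won).
    """
    rng = random.Random(seed)
    race_wins = [0] * len(leads)
    seats_hist = [0] * (len(leads) + 1)
    for _ in range(n_sims):
        # One shared error per simulated election (e.g. a turnout surprise)
        # hits every race at once; this is what makes outcomes correlated.
        shared_error = rng.gauss(0.0, shared_sd)
        seats = 0
        for i, lead in enumerate(leads):
            margin = lead + shared_error + rng.gauss(0.0, local_sd)
            if margin > 0:
                race_wins[i] += 1
                seats += 1
        seats_hist[seats] += 1
    return ([w / n_sims for w in race_wins],
            [c / n_sims for c in seats_hist])

win_rates, seats_dist = simulate_races([1.0, 1.0, 1.0])
print("chance of winning each race:", win_rates)
print("chance of winning 0, 1, 2, 3 races:", seats_dist)
```

Because of the shared error, a clean sweep shows up more often than you would get by multiplying the three individual win rates together.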

One of the other things that can be done to assess a model's reliability after the fact is to look at how well it performs relative to its confidence. The '2022 midterms' isn't one event, any more than the World Series is one event. There are hundreds of congressional elections and dozens of Senate and Governor races, each of which has a probability associated with it. You can look after the fact at the races where the model thought a candidate had a 70% chance of winning; you would expect about 70% of those to have gone the way the model thought was most likely. If only half the leading candidates won, the model was overconfident. If 90% of those won, then the model was underconfident. (This is checking the 'calibration' of the model.) In the 2016 election, all of the most-talked-about models predicted Hillary was most likely to win, but a model (like 538) that thought she had a 65% chance was a better model than one (like the Princeton one) that thought she had a 99.9% chance.
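A calibration check like that can be sketched in a few lines. The function `calibration_table` and the sample forecasts are hypothetical; real data would be the model's published probability for each race's favorite, paired with whether the favorite won.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """forecasts: (predicted probability for the favorite, favorite_won) pairs.

    Groups forecasts into buckets rounded to the nearest 10% and returns
    {bucket: observed share of favorites that actually won}.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket -> [wins, total]
    for prob, favorite_won in forecasts:
        bucket = round(prob * 10) / 10
        tally[bucket][0] += int(favorite_won)
        tally[bucket][1] += 1
    return {b: wins / total for b, (wins, total) in sorted(tally.items())}

# Made-up example: ten 70% favorites (7 won) and ten 90% favorites (9 won).
sample = ([(0.7, True)] * 7 + [(0.7, False)] * 3
          + [(0.9, True)] * 9 + [(0.9, False)] * 1)
for bucket, observed in calibration_table(sample).items():
    print(f"forecast ~{bucket:.0%}: favorite won {observed:.0%} of the time")
```

A well-calibrated model has observed shares close to the bucket labels; shares well below the label mean overconfidence, well above mean underconfidence.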

posted by Superilla at 1:36 PM on November 9, 2022 [2 favorites]

*By contrast, an election model is predicting what's going to happen one time: the 2022 midterms, a sui generis event*

A coin toss is perhaps the simplest way to think about odds, I think. Either heads or tails comes up when you flip a coin.

Let's say you and a friend bet $100 on a coin flip.

Your friend provides the coin.

Another friend flips the coin.

You call the side while the coin is in midair.

Like an election, the outcome of that particular coin flip is *sui generis*: it is uniquely heads or tails, but it cannot be both.

However, your friend might have provided an unfair coin: One side is weighted and your friend has inside knowledge that one side is more likely to be face up.

Before you bet, what you could do first is have a model of coin flipping. $100 is a lot of money.

You could do this by assuming the simplest model, one of fairness: the coin is evenly weighted and either side is equally likely to land face up.

A different way to inform your model is that you know your friend is a cheat, so you expect an unfair coin: one or the other side of the coin is weighted.

You could have other models that consider the manner in which the third friend flips the coin. Do they do a proper flip, or are they sloppy and do they kind of just throw it in the air and let it hit the ground? The physical manner in which the coin is tossed could cause one or the other face to come up more often.

Given some model of fairness/unfairness/toss-mechanics, by having a computer simulate many thousands of coin tosses with that model, you get a sense of whether one side is a better bet than another — if you trust that model's assumptions. The computer provides a distribution of odds for the outcome of a toss, given that model.
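A bare-bones version of that simulation might look like the sketch below. The 60% weighting for the unfair coin is an assumed number for illustration; the function name `heads_rate` is likewise made up.

```python
import random

def heads_rate(p_heads, n_flips=100_000, seed=0):
    """Simulate n_flips tosses of a coin that lands heads with
    probability p_heads, and return the observed share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

# Each "model" is just an assumption about the coin.
models = {"fair coin": 0.5, "weighted coin (assumed 60% heads)": 0.6}
for name, p in models.items():
    print(f"{name}: heads in {heads_rate(p):.1%} of simulated flips")
```

The simulation can only tell you what follows from the model you feed it; if the assumed weighting is wrong, so are the odds.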

What Silver's method does for elections is somewhat more complicated, but it is basically along the same lines.

Election models can be informed by different polls, say, which are weighted differently based on their inherent ideological biases or by polling methodologies. Some polls come from FOX News, say, or they get results by calling old people on landlines, which are less prevalent and introduce the general biases that old people have (they like Social Security, hate young people/gays/abortion/etc.).

These and other details create a model that can be run through a computer repeatedly, to get the odds of one or another political outcome.

*This doesn't specify the precise outcome* on election day, only that certain outcomes are perhaps more likely than others. Silver's modeling can fail spectacularly (see: 2016).

posted by They sucked his brains out! at 1:55 PM on November 9, 2022

Another way to think about it is that your own statement is also a probability: "Looks like Dems are going to lose, unless we're missing something" is the equivalent of saying "Dems have a less than 50% chance of winning." If you know lots of information, you can be more precise -- maybe it's quite unlikely Dems will win (10%? 20%?) or maybe the parties are really neck-and-neck (close to 50%). The percent is a way of expressing, based on all the information you have -- gas prices, the President's approval rating, polling averages -- how good or bad it looks for Democrats.

The difference with baseball isn't that it's a different kind of statistics: each at-bat is its own thing and he will either get a hit or he won't, just like the election comes out however it comes out. The difference is there's a lot more data in baseball (each year there are a lot more at-bats than there are elections) and fewer variables people look at to calculate odds.

posted by alligatorpear at 3:56 PM on November 9, 2022

As a practical analogy, and cutting out all the math:

Imagine you are doing a big renovation, and the contractor says you won't be able to use your kitchen for a month. But you also talk to all your friends and neighbors, and everyone who's done a renovation that size, it's taken on average three months and never less than two. And that it's always been at least twice what the contractor estimated. Your remodel is different and unique, but you probably are going to assume at this point (unless you are a true and eternal optimist) that your renovation will take *at least* two months and probably three or more.

This sort of "I took an average" is a very simple model, but even that has value. All the unique events are happening, and you're not trying to predict them, but it's clear that the chance of finishing in one month is statistically, vanishingly small. If you had a really big database, rather than just talking to friends, you could make a fancier model. Looks like rain and supply issues are big delays? Then doing it in dry weather in July and using expensive suppliers who always deliver on time means you're lower risk!

The math just lets you put numbers on this--on what you know and more importantly don't know. Maybe you don't know the quality of the subcontractors doing the roofing, but even without that you might see that 95% of projects of your size, in July, with high supplier quality, take between 10 and 18 weeks, and the smart money is to bet accordingly.

The odds on elections are *basically* that. They don't know how the candidates will perform in a debate and don't try to model it, but they do know the maximum historical difference a debate will make (which is not much).

In general, the number crunching has found that unique events don't have that big an impact. If they did you could still model it, but the odds might hover around 50-50 until very close to election day, at which point they'd trust the polls more because the odds of a last-second change are pretty low.

posted by mark k at 6:07 PM on November 9, 2022

Best answer: May I be so bold as to suggest that you may be a Bayesian (and not a frequentist) statistician?

If you are unfamiliar with stats land, you may be unaware that there is a "controversy" between two different statistical approaches (the frequentists vs the Bayesians).

In short, the frequentist approach is how stats is taught in a "Stats 101" course. IMHO many people forget how unintuitive Stats 101 can be at first. Most(?) people find Bayesian statistics challenging as well, but IMHO the two approaches involve "confusing the new learner's brain" in different ways.

To cite Larry Wasserman:

*"In frequentist inference, probabilities are interpreted as long run frequencies. The goal is to create procedures with long run frequency guarantees."*

*"In Bayesian inference, probabilities are interpreted as subjective degrees of belief. The goal is to state and analyze your beliefs."*

So basically the Bayesians are like okay... it's one thing to flip a fair coin a bunch of times and expect that in the long run about 50% of the flips will be heads and 50% will be tails. Moreover, the frequentist approach may make intuitive sense for a car insurance company to know that over the course of 5 years, it has an x% 'chance' of paying out for a given policy member. However, the Bayesian approach would say that it doesn't make sense to equate predicting the outcome of flipping a coin with predicting the outcome of an election. A given election only happens once. And once we know the outcome of the election... we wouldn't need to predict it.

Instead the Bayesian approach uses a lot of math to try to answer the question "how certain are we about the outcome?" This Hannah Fry video explains this approach much better than I can.

So this is a long, convoluted way to say that the Nate Silver approach is from the frequentist school. This isn't to say that the frequentist approach doesn't (or can't) work. In fact, these two approaches often are different sides of the same coin. But rather you have a small contingent of people who agree with you that it requires a lot of "suspension of belief" to say that we can predict the outcome of one election based on the outcomes of elections that happen in hundreds of alternative universes.

posted by oceano at 11:54 PM on November 9, 2022 [2 favorites]

FiveThirtyEight has a page specifically for this showing how past prediction %s compared to actual results: How Good Are FiveThirtyEight Forecasts?

posted by Rhaomi at 12:06 AM on November 10, 2022

The probability part is created by the pre-election polling. Since a poll doesn't ask everybody, there is chance involved with who does get asked.

For a simple example, given a coin, you might flip it 100 times to see if it's a fair coin, i.e. comes up heads 50% of the time. Then you flip it the one time that counts.
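To put a number on that: after 100 test flips you get an estimate of the coin's heads rate plus some wiggle room, the same way a poll of a sample of voters does. The sketch below uses the textbook normal-approximation interval; the function name `heads_estimate` and the 55-heads example are made up for illustration.

```python
from math import sqrt

def heads_estimate(heads, flips):
    """Estimated heads rate with an approximate 95% interval
    (normal approximation to the binomial)."""
    p = heads / flips
    margin = 1.96 * sqrt(p * (1 - p) / flips)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical test run: 55 heads out of 100 flips.
p, low, high = heads_estimate(55, 100)
print(f"estimate {p:.0%}, plausible range roughly {low:.0%} to {high:.0%}")
```

With only 100 flips, 55 heads is still compatible with a fair coin, which is why the flip that counts can only be described in terms of chances.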

posted by SemiSalt at 5:26 AM on November 10, 2022

Best answer: Coming back for more because I think this is a really good question. I'm going to focus more on actual electoral projections this time, and I'll start by asking you a couple questions:

First, imagine an election where there's only one candidate on the ballot. Who is going to win that election? This might sound like a silly question, but your answer is based on a statistical model, to wit, "unopposed candidates will win 100% of votes cast".

Let's get a little more complicated. This cycle included two elections for US Senate seats in Oklahoma. Which party did you expect to win them? Republicans, right? Why? Because Oklahoma is one of the deepest red states in the country. Republicans make up an absolute majority of registered voters in the state. Democrats are fewer than 30%. The state went 65-28 for Trump in 2016 and 65-32 in 2020. But all of those things are statistics, just like batting average. Just like we expect Jose Altuve to hit somewhere around .300 next year because he's hit roughly .300 for a while now, we expect Oklahoma to vote Republican because they always vote Republican. You can say the same about Vermont for the Democrats.

So you can hopefully see that a simple model with inputs of voter registration, recent statewide elections, etc. has some validity. If we leave out numbers and go with a more qualitative output like "Republicans will win big in Oklahoma", well, our model is resoundingly correct. Even if we did try to get quantitative - let's say we average the last two presidential election results - we would predict a 65-30 Republican win in each seat, and that's... pretty close. One was 64-32 and one was 61-35. Both a little closer than our simple model predicted, but not enough to come anywhere close to mattering.

I hope I'm convincing you that statistical models can predict elections, but you still probably have a reservation about swing states, which you should, because that's the hard question. Wisconsin went 47-46 for Trump in 2016 but 50-49 for Biden in 2020. Our simple-average-of-the-last-two-presidential-elections model would predict a 48-48 split between Ron Johnson and Mandela Barnes. Johnson ended up winning 50-49. Our model was close, but not close enough, so why not?

Simply put, because it's not actually a great model. We're not actually modeling anything based on the race in question. Maybe Sconnies like Ron Johnson more than they like Donald Trump. In his last race, he beat Russ Feingold (a former Senator himself) 50-47 when Trump only won 47-46. So maybe account for a point or two more than what our average-model predicts, and then we get 49-48 or 50-48 and now we're getting close. But it's still kind of a dumb, basic model.

This is where I'm going to start talking about polls, because while polls have their flaws, they are useful for modeling. Let's make a new model, using the last three major polls for three swing states: Wisconsin, Pennsylvania, and Ohio. And to simplify, we'll just take the winner of each poll. In Wisconsin, Johnson won all three. In Ohio, JD Vance won all three. In Pennsylvania, it went 2-1 Oz. That gives us 27 possible results (3 WI * 3 OH * 3 PA). In 18 of them, Republicans win all three, and in nine, Republicans win two. We can then say Republicans have a 2 to 1 chance of winning all three.
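That count can be checked by brute force. This sketch just enumerates every way of picking one poll per state, treating each poll as equally believable and the states as independent (both simplifications of the toy model above); the `polls` dictionary encodes the poll winners described in the paragraph.

```python
from itertools import product

# Winner of each of the last three polls per state, as described above.
polls = {
    "WI": ["R", "R", "R"],
    "OH": ["R", "R", "R"],
    "PA": ["R", "R", "D"],
}

combos = list(product(*polls.values()))  # one poll picked per state
sweeps = sum(all(winner == "R" for winner in combo) for combo in combos)

print(len(combos))            # 27
print(sweeps)                 # 18
print(len(combos) - sweeps)   # 9
```

Eighteen sweeps against nine non-sweeps is the 2 to 1 figure, or a two-in-three chance.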

A poll is essentially just a practice election. Going back to our Jose Altuve analogy, polls are batting practice. A real at-bat is an opponent pitching to Altuve; BP is a coach or teammate pitching to him. But he's probably going to be a pretty similar hitter in both scenarios. I've never seen Altuve take BP, but I'd be surprised if he's either hitting like Aaron Judge, or swinging and missing 90% of the time. Likewise, an election is just a state-sponsored poll, or, on the flipside, a poll is a practice election where the pollster is a political scientist or media organization instead of the state. Polls can be wrong, but so can BP. Most players make contact more often in BP than in live games, right? But you can correct for that. If you see Altuve taking BP, and he hits every single ball into the gap between second and short, you don't expect him to hit 1.000. Likewise, just because 52% of Sconnies answered a poll that they'd be voting for Mandela Barnes (this happened, a Marquette University poll of 713 likely voters in August), that doesn't mean Barnes will get 52% of votes. Sophisticated models take multiple polls into account (and indeed, nearly all polls after that showed Johnson winning, by varying margins), along with other non-poll factors. And the sophistication of models can vary by race. As we discussed, you don't need a ton of inputs to model Oklahoma voting for a Republican. Save your resources there and invest them into more detailed models for Wisconsin or Pennsylvania.

The point I'm trying to make, though, is that these detailed models are not all that different than your gut feeling. It's a difference of degree, not of kind. You could say that the point of probability theory is to make your gut feeling more accurate.

posted by kevinbelt at 10:48 AM on November 10, 2022

Best answer: I think oceano's comment suggests a useful framing, but applies it exactly backwards. OP is comfortable with frequentist ideas (e.g. batting average). They are not comfortable with assigning probabilities to unique one-time events, which is necessarily a Bayesian project (and 538 is absolutely Bayesian -- don't get misled by the repeated simulation aspect of it; their model is all about updating priors based on every new poll that comes in, and that's how Bayesian statistics works).

posted by aws17576 at 11:56 PM on November 10, 2022 [1 favorite]

Frequentists are fine assigning probabilities to unique one time events, and you do not need a Bayesian approach to do that. The "frequentist" framing is the metaphor for understanding the inputs and the results, and does not need to be taken literally.

And FWIW a frequentist would also be updating their odds with every new poll that comes in. They just don't call that a "prior."

posted by mark k at 7:51 AM on November 11, 2022

posted by kensington314 at 12:13 PM on November 9, 2022