Are probabilistic theories scientific?
September 8, 2011 1:08 AM

I'm struggling to understand the empirical content of probability theory. I understand the mathematical theory, and I understand how we get from empirical observations to a mathematical model. I do not understand how we get from the mathematical model back to the real world, e.g., what is the "empirical content" of a statement like "event x will occur with probability p"?

Scientific theories should produce predictions that can be verified or falsified empirically. For example, I may construct a mathematical model of a real-world physical system, based on empirical measurements of the real-world system. Then the (mathematical) theories of physics allow me to derive predictions, in the form of statements about measurable quantities.

Now I can perform measurements, and they can either agree with the predictions or not. Of course, it is a problem in itself to decide what a negative result means, as confirmation holists will tell you. In any case I would claim that there exist physical models and corresponding experiments for which I can determine by measurements whether the experimental result is consistent with the prediction or not (think classical mechanics, falling apples etc).

Now take a probabilistic theory, which, after feeding it with real-world measurements, produces predictions of the form: "quantity x will have value y with probability p". What does that mean? No measurement (and no number of repeated measurements) can ever verify or falsify such a statement. Talking about "statistical significance" doesn't really help, since what is "significant" is an arbitrary convention (if I am not mistaken).

It seems clear that frequentist interpretations of probability cannot make sense in the real world, since there are no "infinite sequences" and "limiting relative frequencies" in the real world. So we are left with some kind of Bayesian "probability-as-belief" interpretation. But what we actually mean is "probability-as-justified-belief", and the justification for my beliefs must lie outside my mind. Saying that probability reflects "the beliefs of rational agents" hides the issue in the word "rational", without explaining it. In short, I don't see how talking about minds or agents clarifies the concept of probability.

Some people say that, because of quantum uncertainty, the latter situation (no prediction can be verified/falsified by measurements) is actually the rule, even for non-probabilistic theories. As far as my very limited understanding goes, the uncertainty principle does not contradict the possibility of empirical verification/falsification, since it tells us that what we can measure is an approximation of "reality" (whatever that is), and we know the "coarseness" of this approximation. Hence if a physical theory makes a prediction about a macroscopic quantity, and my measurement is off by a large enough value, I know that the prediction has been contradicted, never mind the uncertainty principle.

What am I missing?
posted by ochlophonic to Science & Nature (23 answers total) 20 users marked this as a favorite

what is the "empirical content" of a statement like "event x will occur with probability p"?

In the simplest example, if you flip a fair coin, you are equally likely to get heads or tails. If someone else somewhere in the world flips a fair coin, they, too, are equally likely to get either side.

The mathematics are a convenient tool for translating that sentence into a model, which can be tested empirically by other scientists.

Some physicists believe that mathematics are reality; some are critical of that view. I am not a physicist so I do not know if this question is resolved.
posted by Blazecock Pileon at 1:31 AM on September 8, 2011

Thank you for that link, looks very interesting. Concerning your answer, it's precisely statements like "you are equally likely to get heads or tails" that I have a problem with. Of course I understand what you want to say (in an everyday-language sense), but how do you "test empirically" whether the coin is fair? Flip it once, you get either heads or tails. Were both outcomes equally likely? You can't tell. A fair coin can come up heads a thousand times in a row; all you can say then is that the sequence was "very unlikely". The "fair coin" is a purely mathematical concept.
posted by ochlophonic at 2:03 AM on September 8, 2011

This is a very real problem, and you're not alone in finding it not entirely well-defined.

I find the most sensible interpretation is the Bayesian one. Where I disagree with you is in saying all of the work there is done by the word "rational." One common justification for probabilities within a Bayesian framework is the so-called Dutch book argument. In short, the idea is that a degree of belief X is coherent (and I guess you would say justified or valid in the external world) if it is such that nobody betting against someone who believed X would be able to make a "Dutch book" against them - i.e., nobody betting against the X-believer would be able to find a long-term winning strategy.
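
To make the Dutch book idea concrete, here is a minimal sketch. It assumes an agent who will pay a price equal to their stated degree of belief for a ticket that pays $1 if the event happens. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def bookie_profit(belief_a, belief_not_a):
    """Bookie's guaranteed profit from selling the agent two $1 tickets.

    The agent pays belief_a for a ticket that pays $1 if A happens and
    belief_not_a for a ticket that pays $1 if A does not happen.
    Exactly one ticket pays out, so the bookie always pays $1 in total.
    """
    collected = belief_a + belief_not_a
    paid_out = 1
    return collected - paid_out

# Incoherent beliefs (hypothetical): P(A) + P(not A) = 6/5, so the agent
# loses the difference whichever way A turns out.
incoherent = bookie_profit(Fraction(7, 10), Fraction(1, 2))

# Coherent beliefs: P(A) + P(not A) = 1, so this pair of bets gives the
# bookie no sure profit.
coherent = bookie_profit(Fraction(7, 10), Fraction(3, 10))
```

The point of the sketch is only that the sure loss shows up after a single round of bets; it does not settle whether "coherent" is the same thing as "justified".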

People find this more or less satisfying, as do I, depending on my mood. Still, most of the problems I have with it are problems with anything that seeks its validation in strings of future events in the external world - and these are basically Humean problems of induction which afflict a lot of science, not just probability theory.
posted by forza at 2:03 AM on September 8, 2011

Thank you forza. I guess "long-term winning strategies" have the same flavor as "limiting relative frequencies", i.e., you can't tell whether the strategy is winning just by looking at a finite prefix (of game moves)?
posted by ochlophonic at 2:11 AM on September 8, 2011

In a way, yes, in a way, no (from what I understand; I'm not a philosopher of science). People have used the Dutch Book idea to come up with axioms of probability theory (see, e.g., Kolmogorov axioms or Cox axioms). These axioms serve as necessary and sufficient conditions for ensuring that a Dutch Book couldn't be made against one. So it's not entirely the same flavour.

If you're interested in reading on this topic, I have found Jaynes to be fairly readable and thought-provoking.
posted by forza at 3:29 AM on September 8, 2011

I like Nick Taleb's story about the two kinds of intelligence and the coin flip. Given a coin that has flipped heads 95 times in a row, the PhD takes a 2-1 bet on tails, reasoning that the odds are 50% he'll win and past performance of the coin means nothing. The street-smart hustler, who does not understand classical statistics, refuses to touch the bet because he believes the coin is rigged. Taleb's point is thaherbs hustler is the one looking at reality more often.

In other words, the assumptions behind our models are off, but because of blind faith in the assumptions (not even thinking about them, really) mixed with correct application of probability theory to the false assumptions, we routinely make mistakes in areas like finance overconfidently instead of treading carefully in unknown territory. You're doing well to be skeptical of blindly applying probability theory just because you know how to calculate probability.
posted by michaelh at 5:35 AM on September 8, 2011

That the hustler*. Stupid phone.
posted by michaelh at 5:36 AM on September 8, 2011

In the simplest example, if you flip a fair coin, you are equally likely to get heads or tails. If someone else somewhere in the world flips a fair coin, they, too, are equally likely to get either side.

Not so fast there. (pdf)

... what is "significant" is an arbitrary convention (if I am not mistaken) .... and my measurement is off by a large enough value ...

You're not mistaken that it's an arbitrary convention, but I think these kinds of conventions are a necessary component of any scientific methodology. What counts as a 'large enough value' to determine that a physical measurement has contradicted a theory's prediction?
posted by noahpoah at 5:52 AM on September 8, 2011 [1 favorite]

When probability is actually used in empirical research, the researchers and others acting on the results of the research have to make judgment calls about what to do with that information. For the most part, it's a matter of how important it is to avoid making certain kinds of mistakes. These judgments fall outside of the sphere of mathematical theory. Statistical analysis can quantify uncertainty, but it doesn't tell us what to do with that uncertainty.

If someone is testing a batch of coins to make sure they're fair, they will set some level of deviation, given a certain number of trials, to eliminate coins suspected of being defective. We can say approximately what percentage of the coins rejected at a certain level will actually be defective and what percentage will be false positives, but we can't identify which individual coins are which. At the same level, a certain percentage of coins that actually are defective will be retained.
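
A rough sketch of that trade-off, using exact binomial probabilities. The number of flips, the cut-off, and the bias of the "defective" coin (60% heads) are all assumptions made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def reject_probability(n, max_dev, p):
    """Chance that the head count lands more than max_dev away from n/2."""
    return sum(binom_pmf(k, n, p)
               for k in range(n + 1)
               if abs(k - n / 2) > max_dev)

n = 100        # flips per coin (assumed)
max_dev = 10   # reject if heads < 40 or heads > 60 (assumed cut-off)

# Fraction of perfectly fair coins that get rejected anyway.
false_positive_rate = reject_probability(n, max_dev, 0.5)

# Fraction of biased coins (assumed P(heads) = 0.6) that are kept.
false_negative_rate = 1 - reject_probability(n, max_dev, 0.6)
```

Tightening `max_dev` lowers the false negative rate and raises the false positive rate, which is why the choice of cut-off ends up being a judgment about costs, not a mathematical fact.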

What level is actually used as a cut-off will depend on how important it is for the coins to be fair, how close to exactly fair they need to be, the expense of replacing coins rejected as possibly defective, the amount of time that can practically be spent testing them, and so forth. The actual level used as a cut-off will involve a judgment about what levels of what kinds of error are acceptable under the circumstances.
posted by nangar at 6:47 AM on September 8, 2011

Ultimately, I think you're just mistaking the role of confirmation and disconfirmation or falsification, especially in a larger community. Take your assertion about the coins -- your complaint is that you can't ever really *verify*, without the possibility of error, that a given coin is truly and eternally fair (or unfair), because we can't actually flip it a literally-infinite number of times. There are going to be some coins that we say are fair that are actually not, and some coins that we say are unfair that actually are fair.

This... just isn't a real problem for almost all circumstances. Science isn't about determining the eternal truth of things. More the opposite; it's acceptance that even cherished, highly-confirmed theories can still be wrong, that there is (almost) always a better theory out there somewhere.

Now take a probabilistic theory, which, after feeding it with real-world measurements, produces predictions of the form: "quantity x will have value y with probability p". What does that mean? No measurement (and no number of repeated measurements) can ever verify or falsify such a statement.

Sure you can. It's actually quite easy, if probably very tedious and expensive. You just sample a whole bunch of instances that create quantity X. Each of those instances, you iterate a whole bunch of times depending on what precision you require... the key is that you don't require that precision to be infinite. Then look at the distribution of your observed p's. Is it centered close to the predicted p's? Does the shape of the distribution of p's correspond closely to the distribution you predict should arise from repeated observation of empirical p's of a certain sample size?
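
The procedure can be sketched as a simulation. Here a pseudo-random generator stands in for the real-world process, which is an assumption: in an actual test the frequencies would come from measurements. The function name, run counts, and seed are hypothetical.

```python
import random
import statistics

def observed_frequencies(p_true, flips_per_run, runs, seed=0):
    """Return the observed frequency of 'heads' in each of several runs."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(runs):
        heads = sum(rng.random() < p_true for _ in range(flips_per_run))
        freqs.append(heads / flips_per_run)
    return freqs

p_predicted = 0.5      # what the theory claims
flips_per_run = 1000   # finite precision, chosen in advance
freqs = observed_frequencies(p_true=0.5, flips_per_run=flips_per_run, runs=500)

# Is the distribution of observed p's centered near the predicted p?
mean_freq = statistics.mean(freqs)

# Does its spread match what the theory predicts for this sample size?
spread = statistics.stdev(freqs)
predicted_spread = (p_predicted * (1 - p_predicted) / flips_per_run) ** 0.5
```

How close `mean_freq` must be to `p_predicted`, and `spread` to `predicted_spread`, is exactly the part that the field has to argue about.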

Congratulations! You have confirmed this prediction of your theory, for now. Which is all you'll ever get from science. Confirmation... for now. Rejection... for now.

Does this mean that all people in your field will automatically now agree that your theory is correct? Of course not. Some may disagree that your observed distribution of p's is close enough to the shape predicted by your theory. Others may think that your observed overall p of 0.53 is not close enough to your predicted p of 0.5 exactly, where you assert that it is. Those people now have a strong incentive to (1) perform their own empirical assessments in the hopes that they show a result that differs strongly enough from yours and (2) to derive other predictions from your theory to attempt to disconfirm.
posted by ROU_Xenophobe at 7:07 AM on September 8, 2011 [1 favorite]

Flip it once, you get either heads or tails. Were both outcomes equally likely? You can't tell.

In a sample of one, you can't tell if the coin is fair, definitely. However, science is done with more than one sample. Empirical repeatability is how the mathematics are said to either "be" reality, or align with reality enough so that the maths are useful.
posted by Blazecock Pileon at 7:28 AM on September 8, 2011

Engineer with stats background here. The way I understand statistics is you are trying to predict a problem/situation with a single equation. Normally that problem is complex enough that a simple quadratic equation will not suffice.

Each has its factors, visible and non-visible, and interactions between the factors can produce any number of results. Most experiments with statistics are done under lab conditions: Factor 1 - constant, Factor 2 - (+5, -5). You may be able to replicate many results under lab conditions with a high probability, but as soon as you move to "real world conditions", it all changes. Factor 1 and Factor 2 are not in the same conditions, and additional factors (temperature, humidity, etc.) and the interactions between these and your primary factors can change the results drastically. As ROU_Xenophobe said, you can keep sampling across multiple factors and find their interactions one by one, but it's too costly, time consuming and frustrating.

The above reason is why Design of Experiments, and most of statistics, is an educated guessing technique that allows you to have the best possible equation within a certain margin of error (lower margin of error = higher sample size = more time, money, etc.) given the experiments that have been run.
posted by radsqd at 7:32 AM on September 8, 2011

Part of the problem is that you're insisting on perfection in the real world. There's no such thing as a perfectly fair coin, and there's no way to determine if a coin is perfectly fair. Similarly, there's no such thing as a perfect circle or perfect triangle in the real world (they would both have to be infinitely thin), there's no such thing as a table that's exactly 6 feet long (every table is a tiny fraction* off, no matter how small), etc.

Mathematical concepts do not exist in the real world. They are imposed on the real world as models and approximations. I've flipped this coin a million times and it's close to fair, so I will call it fair. This looks like a circle, so I will call it a circle. This table is close to 6 feet long, so I will call it 6 feet long.

*strictly speaking, a tiny transcendental number off. True rational numbers are also far more rare than common usage.
posted by yeolcoatl at 8:10 AM on September 8, 2011

What does it mean to say "quantity x will have the value y" with certainty? That is, forget the probability part of it (or give it probability 1.) Because no finite set of measurements can confirm that either. All science is about infinite generalizations.
posted by Obscure Reference at 9:10 AM on September 8, 2011

All theories are probabilistic, and there is no certainty about anything. There are only theories and axioms.

When we say "A fair coin has an equal probability of coming up heads or tails" that isn't really a theory, it's an axiom. When we flip a coin and track the results, we aren't trying to determine whether the statement above is true, rather we are trying to determine whether this particular coin is "fair."

So say we flip a coin a thousand times and it comes up heads every time. We still can't say for sure whether or not the coin is fair. After all, even a sequence this unlikely will eventually happen if you flip enough fair coins. But what we can do is calculate the probability that this coin is fair, which in turn allows us to assign a number to our certainty about its unfairness.

In this way, a probabilistic theory is no different from any other. Consider, as an example, an alternate theory that posits "This coin has heads on both sides", and suppose that the only evidence we have available to us is the ability to flip the coin and then look at the top face. This theory makes no probabilistic claim, but upon seeing heads come up a thousand times, we're still in the same boat as we are with the other theory. We can calculate a degree of certainty, but we can't know for sure.
posted by 256 at 9:23 AM on September 8, 2011

But what we can do is calculate the probability that this coin is fair

Almost... we can calculate the probability that we would see such a string if the coin were fair. But the probability that the coin is fair is either exactly zero or exactly one.
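
The distinction can be made concrete with a small sketch. The first quantity needs no extra assumptions. The second is only available on a Bayesian reading, and only after assuming a prior; the prior used here (one coin in a million is two-headed) is hypothetical, as is the restriction to just two candidate coins.

```python
from fractions import Fraction

flips = 1000

# Probability of seeing 1000 heads in a row IF the coin is fair.
likelihood_fair = Fraction(1, 2) ** flips

# Probability of the same data IF the coin has heads on both sides.
likelihood_two_headed = Fraction(1)

# Assumed prior (hypothetical): one coin in a million is two-headed.
prior_two_headed = Fraction(1, 10**6)
prior_fair = 1 - prior_two_headed

# Bayes' rule: degree of belief that the coin is fair, given the data.
posterior_fair = (likelihood_fair * prior_fair) / (
    likelihood_fair * prior_fair + likelihood_two_headed * prior_two_headed
)
```

Changing the prior changes `posterior_fair` but leaves `likelihood_fair` untouched, which is one way of seeing why the two should not be confused.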
posted by ROU_Xenophobe at 10:17 AM on September 8, 2011 [1 favorite]

I think I have to disagree a bit with the premise of your question. I recently spent a few years working in particle physics, where probability theory is paramount.

I found that while Bayesian principles were used in some cases, the Frequentist paradigm dominated.

In quantum physics, you get situations where, with identical initial conditions, the same process will give result A 80% of the time and result B 20% of the time. This is in line with the frequentist model where "80% probability" implies only that you will get the result 8 out of 10 times.

Bayesian models imply that predicting something has an 80% probability means that we lack some fundamental knowledge of the system we're studying. Bell's Theorem tells us that this isn't the case in quantum theory.
posted by auto-correct at 10:34 AM on September 8, 2011

Often "likelihood" and "probability" are treated as distinct concepts, which helps unmuddy statements like ROUX's (or it's supposed to).
posted by hattifattener at 10:36 AM on September 8, 2011

For example, I may construct a mathematical model of a real-world physical system, based on empirical measurements of the real-world system. Then the (mathematical) theories of physics allow me to derive predictions, in the form of statements about measurable quantities.

(Emphasis mine)

The measurements you take, and the statements you make about those measurable quantities are, at their very heart, still probabilistic. For example, air pressure. You have Boyle's law: pV=K. You conduct an experiment where you vary the volume, and the pressure behaves inversely, voila you've gone from mathematical model to real world.

But wait, the air pressure you measured was probabilistic in nature ... it's just that there were so many molecules that the variance between the *average* air pressure and the *actual* air pressure is extremely small. However, if you were to make a tiny container that contained a few hundred molecules or less, you'd find that you have the exact same problem with measuring air pressure as you have with coin flips. Specifically, you'd find a huge variance in your pressure readings as the random motion of the molecules becomes measurably noticeable.
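
A toy sketch of that scaling, not a physical simulation: it assumes each molecule independently hits the gauge during one reading with a fixed probability, and treats the number of hits as the "pressure reading". The function name, the hit probability, and the molecule counts are all hypothetical.

```python
import random
import statistics

def relative_fluctuation(n_molecules, trials=200, hit_prob=0.5, seed=1):
    """Standard deviation of the readings divided by their mean."""
    rng = random.Random(seed)
    readings = []
    for _ in range(trials):
        # One reading: count how many molecules hit the gauge this time.
        hits = sum(rng.random() < hit_prob for _ in range(n_molecules))
        readings.append(hits)
    return statistics.stdev(readings) / statistics.mean(readings)

small_container = relative_fluctuation(100)      # a few hundred molecules
large_container = relative_fluctuation(10_000)   # still tiny, but 100x more
```

Under this model the relative fluctuation shrinks roughly like one over the square root of the number of molecules, so a macroscopic sample, with vastly more molecules, looks deterministic on any ordinary gauge.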

However, at its core, this is not any different than a macro version of the experiment. It's just that you have to compensate for the lack of molecules with some other *independent* attribute. If you require macro-scale certainty, then one option would be to conduct the experiment over a long period of time, another would be to have many experiments running in parallel.

The root of the problem is what determines "good enough", i.e., the boundary between the macro-scale measurement and the micro-scale probabilities. The answer is that there is no boundary and that "deterministic measurement" is really just hand-waving away the probability that the measurement will be off.

tl;dr - probability isn't the illusion, determinism is.
posted by forforf at 10:53 AM on September 8, 2011

Thank you all for your input, there's a lot of stuff for me to think about.

@ROU_Xenophobe: I agree that in practice, theories are accepted or rejected by a community through a process that doesn't follow strict logical rules (I guess that's what Lakatos' research programmes are about). What I'm trying to understand is why somebody would be justified in rejecting a probabilistic theory, for example.

@256: I should have emphasized that I am mainly concerned with falsifiability (therefore the thread title): the theory that the coin has heads on both sides is falsifiable (just check both sides of the coin). On the other hand, the theory that the coin is fair is not falsifiable (at least I don't know how).

@auto-correct: thank you for bringing Bell's theorem to my attention, I will definitely have to look at that.

@forforf: I agree with you about statistically defined units... but I don't agree with that last statement of yours:)

Concerning "non-probabilistic theories" and "measurable quantities" I actually had the following thought experiment in mind: imagine a big ("macroscopic") ball which is held at some distance above the ground in some location on earth. Given some values for the mass, the distance, the gravitational field etc., a physical theory will tell me that if I let go of the ball at some instant t, then the ball will hit the ground at some later instant t'.

Now I agree that measurements are inherently imprecise, but we know the maximum error: if I measure the ball's position, mass etc., I know that the "real" value (if such a thing exists) will be within a known interval around the measured value. Now imagine I have a device for dropping the ball, and I know that the drop will occur at a time instant that is close to t (again within a known interval). Propagating all maximum measurement errors, I can derive a time interval around t' during which the ball must hit the ground. Now assume that my computed time interval around t' passes by and I don't see the ball hit the ground. Surely now I know (with certainty) that something is wrong: either my experimental setup does not satisfy the assumptions of the theory, or the theory is wrong?
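
As a sketch of that error propagation, assume free fall without air resistance, so that the fall time is sqrt(2h/g) and the mass drops out. Each input is given as a (low, high) interval; the function names and the numbers are hypothetical.

```python
from math import sqrt

def impact_window(height, release_time, gravity):
    """Each argument is a (low, high) interval; returns (earliest, latest).

    The fall time sqrt(2*h/g) grows with h and shrinks with g, so the
    extremes are reached at opposite ends of the two intervals.
    """
    h_lo, h_hi = height
    t_lo, t_hi = release_time
    g_lo, g_hi = gravity
    earliest = t_lo + sqrt(2 * h_lo / g_hi)
    latest = t_hi + sqrt(2 * h_hi / g_lo)
    return earliest, latest

def contradicts_prediction(observed_time, window):
    """True if the observed impact time falls outside the predicted window."""
    earliest, latest = window
    return not (earliest <= observed_time <= latest)

# Assumed measurements: 10 m drop, release at t = 0 s, g about 9.81 m/s^2.
window = impact_window(height=(9.99, 10.01),
                       release_time=(-0.01, 0.01),
                       gravity=(9.80, 9.82))
```

Whether the stated maximum errors really are maximum errors is, of course, the assumption the whole argument rests on.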
posted by ochlophonic at 12:11 PM on September 8, 2011

Excellent question. You are right that many physical theories are trivially falsified (modulo ROU_Xenophobe's observations about community acceptance). For example, my physical theory that implied a prediction that the sun wouldn't show up in the sky today has been falsified by my single direct observation of the sun in the sky today. And you are correct that probabilistic predictions frequently [ahem] aren't as easily falsified. And you're right that the frequentist interpretation is essentially useless in the real world (or anywhere).

But I'm not sure that your objections to the "probability as belief" idea foreclose it. In my example above, I am subjectively certain that the sun is out today, and that my theory has been conclusively falsified. [Of course, as Wittgenstein observed: "For 'I know' seems to describe a state of affairs which guarantees what is known, guarantees it as a fact. One always forgets the expression 'I thought I knew'."] Often, I will never have such subjective certainty about the falsification of probabilistic predictions. If a coin comes up heads in 94% of 1000 consecutive flips, I will seriously doubt a "prediction" that it is a fair coin, but I won't be "certain" that it is wrong. Nevertheless, I may have enough comfort in my rejection of the theory that I can act on it without much subjective concern. And that's all that matters to me.
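
For a sense of scale, the chance of a fair coin producing at least 940 heads in 1000 flips can be computed exactly. This is a minimal sketch; the function name is hypothetical, and reading "94%" as "at least 940 heads" is an assumption.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, k_min):
    """Exact probability that a fair coin gives at least k_min heads in n flips."""
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return Fraction(favourable, 2 ** n)

p_tail = upper_tail(1000, 940)
```

The result is tiny but not zero, which is the gap between "seriously doubt" and "certain" described above.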

"But what we actually mean is 'probability-as-justified-belief', and the justification for my beliefs must lie outside my mind."

In discussing subjective probability, I always start with Savage's Foundations of Statistics. He is remarkably frank about some of the issues you have raised, and attempts (tentatively) to address some of them. Worth a quick skim if you're seriously interested in theories of personal or subjective probability. There has been significant work since Savage's (his book was published in 1954, I think), but I don't know of anything as comprehensive and as eminently readable.

"what is the 'empirical content' of a statement like 'event x will occur with probability p'?"

First, for many such statements, for it to be anything other than a subjective statement of belief or confidence would be nonsensical - as ROU_Xenophobe observed, the coin is either fair or it isn't; sometimes, event x will either happen certainly or not at all; only our belief about it is uncertain, or our ability to observe or calculate or predict is limited.

I read your question as asking what a particular statement means, and I am unable to answer without clarifying - what it means to whom? To the person making the statement? If we're perfectly honest, often persons making such statements aren't entirely clear on what they mean by them. Or they mean something like "my mathematical model spit out this number p, and I have a vague, functional-but-not-terribly-deep understanding of the basic tools I used to generate that model, but I haven't thought beyond reporting the result my model gave me." Or they may mean "In the past, we have observed x roughly p percent of the time (possibly after adjusting for certain conditions) and nothing important has changed to make the future different from the past." Or they might mean "the trend line (with some precise mathematical definition) suggests that x may occur, but measurement error and model uncertainty and various non-empirical assumptions about error distributions combine to limit our trust in the empirical validity of that trend projection to no more than p." Or maybe "we're missing data we think relevant to predicting x, but given assumptions about that missing data, we think x will occur, but we're only confident to degree p in our assumptions about the missing data." Or they might mean "We observed z, and the chance of observing z if x were true, based on several non-empirical assumptions, is a certain Bayesian function of p." Or they mean "I am confident, to a degree p, that x will occur." Occasionally they may even mean "I would be willing to bet small amounts of real money that x will occur if (and possibly only if) given odds as good as p/(1-p)." Or something else entirely. But it's hard to know what people mean by such statements without asking them. And what it might mean to anyone else... well, I'm completely out of my speculative depths now.

I should also say that if you "understand how we get from empirical observations to a mathematical model" you are ahead of many practicing scientists, who view that more as an art than a science itself.
posted by dilettanti at 2:18 PM on September 8, 2011 [1 favorite]

What I'm trying to understand is why somebody would be justified in rejecting a probabilistic theory, for example.

You'd be justified in rejecting a probabilistic theory to the extent that the available data did not support that theory. If the theory for whatever reason asserts that p=0.8 or even just roughly that p is high, and people keep finding that p seems to be 0.1-0.3, that doesn't support the theory.
posted by ROU_Xenophobe at 3:00 PM on September 8, 2011
