Using probit regression coefficients to derive probabilities?
August 18, 2018 7:29 PM

I'm reading a paper (here) that uses a probit regression. I'm not entirely familiar with how this works, but I'm wondering if there's a way to use the coefficients from Table 3 to derive the probabilities indicated in Tables 4, 5 and 6. And, if so, how would I do that?
posted by Proginoskes to Education (1 answer total) 1 user marked this as a favorite
 
Best answer: Big note: the paper uses ordered probit, not vanilla probit, which will make everything messier. The gist of what's going on is the same either way, so I'll describe it for vanilla probit first and then what changes with ordered probit.

Probit and logit both work by fitting a linear model whose output you then feed into a nonlinear curve to get probabilities. Probit uses the cumulative normal and logit uses the logistic curve ( exp(x)/(1+exp(x)) ). If an independent variable has a coefficient of 1, then increasing that variable by one moves you one unit to the right along that curve -- but that makes little difference if you're moving from -5 to -4, a *lot* of difference if you're moving from -0.5 to 0.5, and not much difference again if you're moving from 4 to 5, because both curves are flat in the tails and steep in the middle.
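
To make the shape argument concrete, here is a minimal Python sketch (standard library only) that evaluates both curves and prints how much a one-unit move changes the probability at the three starting points mentioned above. The function names are mine, chosen just for illustration.

```python
import math
from statistics import NormalDist

def probit_prob(y):
    """Cumulative standard normal: the curve probit uses."""
    return NormalDist().cdf(y)

def logit_prob(y):
    """Logistic curve, algebraically equal to exp(y) / (1 + exp(y))."""
    return 1.0 / (1.0 + math.exp(-y))

# The same one-unit move changes the probability by very different
# amounts depending on where along the curve you start.
for start in (-5.0, -0.5, 4.0):
    end = start + 1.0
    d_probit = probit_prob(end) - probit_prob(start)
    d_logit = logit_prob(end) - logit_prob(start)
    print(f"{start:+.1f} -> {end:+.1f}: probit +{d_probit:.4f}, logit +{d_logit:.4f}")
```

The middle row should come out far larger than the other two for both curves, which is the whole point of the paragraph above.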

Anyway, you run your probit and get an equation y=b0+b1x1+b2x2+b3x3, where y is the number you'd plug into the cumulative normal to get back a probability. If y=1.96, your probability is 0.975.

What this means, first, is that you can compute the predicted probability of a "hit" for any combination of x's you want to. You wanna know what Pr(hit) is when x1=3, x2=-17.3, x3=2041? Just find y=b0+b1(3)+b2(-17.3)+b3(2041) (don't forget the intercept) and plug it into the cumulative normal.
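
As a rough sketch of that calculation in Python: the coefficient values below are made up for illustration and are NOT taken from the paper's Table 3, and predicted_probability is just a name I picked. Only the x values come from the example above.

```python
from statistics import NormalDist

def predicted_probability(intercept, coefs, xs):
    """Pr(hit) for a binary probit: Phi(b0 + b1*x1 + b2*x2 + ...)."""
    y = intercept + sum(b * x for b, x in zip(coefs, xs))
    return NormalDist().cdf(y)

# Hypothetical coefficients (assumed for illustration, not from the paper).
b0 = -0.5
b = [0.2, 0.05, 0.0004]

# The combination of x's from the example above.
x = [3, -17.3, 2041]

print(f"Pr(hit) = {predicted_probability(b0, b, x):.3f}")
```

Swapping in real estimates is just a matter of replacing b0 and b; the sanity check is that a linear predictor of 1.96 should give back roughly 0.975.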

What *this* means, in turn, is that you can compute how the probability of a hit changes between some reference observation (typically all variables held at their mean, median, or mode depending on whether they're interval/ratio, ordinal, or categorical) and that reference observation with some specified difference. Tables 4a and 4b are doing basically this (or would be if they were doing vanilla probit) but they're showing variation in two variables rather than one.
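
Here is a hedged sketch of that comparison for the vanilla (binary) probit case. The coefficients and the reference observation are invented for illustration; the second half builds a small grid over two variables, which is roughly the layout described for Tables 4a and 4b, though it makes no attempt to reproduce the paper's numbers.

```python
from statistics import NormalDist

def pr_hit(intercept, coefs, xs):
    """Binary probit probability Phi(b0 + sum of b_k * x_k)."""
    return NormalDist().cdf(intercept + sum(b * x for b, x in zip(coefs, xs)))

# Hypothetical coefficients and reference observation (assumptions).
b0, b = -1.0, [0.8, -0.3, 0.5]
reference = [1.2, 0.0, 1.0]  # e.g. mean of x1, median of x2, mode of x3

def first_difference(index, new_value):
    """Change in Pr(hit) when one variable moves away from its reference value."""
    changed = list(reference)
    changed[index] = new_value
    return pr_hit(b0, b, changed) - pr_hit(b0, b, reference)

print(f"baseline Pr(hit) = {pr_hit(b0, b, reference):.3f}")
print(f"x1: 1.2 -> 2.2 changes Pr(hit) by {first_difference(0, 2.2):+.3f}")

# Varying two variables at once, holding x1 at its reference value.
for x2 in (0, 1, 2):
    cells = [f"{pr_hit(b0, b, [reference[0], x2, x3]):.3f}" for x3 in (0, 1)]
    print(f"x2={x2}: " + "  ".join(cells))
```

Note that the size of each change depends on the reference point you pick, for the curve-shape reason given earlier.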

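Ordered probit, for when your dependent variable is a multilevel ordinal variable instead of just binary, makes it more complicated because instead of one probability, you have i different probabilities for i different levels. The model estimates a set of cutpoints along with the coefficients, and the probability of each level is the area under the normal curve between adjacent cutpoints once they've been shifted by your linear predictor. What tables 4a and 4b seem to be actually doing is taking those probabilities and computing the expected value of the dependent variable for each category, holding the other variables at some reference level.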
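
A minimal sketch of the ordered probit arithmetic, assuming the common parameterization Pr(Y <= j) = Phi(c_j - xb), with increasing cutpoints c_j and no separate intercept. Check how the paper and your software define the cutpoints, since conventions differ. The cutpoints, levels, and linear predictor values are hypothetical, and treating the level codes as numbers to get an expected value is my reading of what the tables appear to do.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cutpoints):
    """Probability of each ordered level given the linear predictor xb.

    Assumes Pr(Y <= j) = Phi(c_j - xb) with increasing cutpoints and
    no separate intercept (an assumption; conventions vary).
    """
    phi = NormalDist().cdf
    cumulative = [phi(c - xb) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # area between adjacent cutpoints
        previous = cum
    return probs

def expected_level(probs, levels):
    """Expected value of the outcome, treating level codes as numbers."""
    return sum(p * v for p, v in zip(probs, levels))

# Hypothetical cutpoints and level codes (not from the paper).
cutpoints = [-1.0, 0.0, 1.0]
levels = [1, 2, 3, 4]

for xb in (-0.5, 0.0, 0.5):
    probs = ordered_probit_probs(xb, cutpoints)
    shown = ", ".join(f"{p:.3f}" for p in probs)
    print(f"xb={xb:+.1f}: probs=[{shown}]  E[level]={expected_level(probs, levels):.3f}")
```

Three cutpoints give four levels, the probabilities always sum to one, and a larger linear predictor shifts mass toward the higher levels.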

The way you would typically do this in real life would be to generate predicted probabilities within your statistical software; most of them will have some easy, canned way to do postestimation predictions.
posted by GCU Sweet and Full of Grace at 7:54 PM on August 18, 2018 [6 favorites]

