christian fundamentalism and its determinants
April 7, 2005 2:43 PM

Regression Filter: For my undergraduate thesis, I am working on determining economic and social determinants of christian fundamentalism. I am having some trouble analyzing the orbit probit regressions that I am running.

I am using the time-panel General Social Survey, which has good information on denominations and charts individual participation and beliefs (i.e. how fundamentalist, etc.).

My hypothesis is that economic factors push certain individuals into church support networks. My advisor is an econometrician and is very good with her econometrics but is not a sociologist: she suggested using an orbit probit (which we haven't gotten to in class). I'm running the regressions and I'm finding some statistically significant correlations between certain economic and social factors and fundamentalist beliefs; I just have no idea how to analyze the coefficients. I am using Stata and its internal analysis software.
posted by stratastar to Education (7 answers total)
i know nothing about this, but this document seems like a pretty good resource for probit models (i now understand what they are, and it addresses various practical problems). you need to scroll down a bit to get to the meat, as the first bit is kind of a summary of the contents, which is a bit misleading. sorry if i'm completely out of my league here.

and did you mean orbit or obit?
posted by andrew cooke at 3:08 PM on April 7, 2005

posted by andrew cooke at 3:09 PM on April 7, 2005

What's your question?
posted by dness2 at 3:10 PM on April 7, 2005

Or did you mean ordered probit?

If so (hell, for any method), get the little green book from Sage on the topic. Those little green books from Sage are amazing... The one for ordered probit is here.

If you want to interpret your coefficients before Amazon can get you the book, search inside the book. The formula for finding the predicted change in the DV based on a change in the IV* is on page 13. Start reading on page 12 by searching for "a natural question to ask is how the probabilities". The last formula on page 13 is used for a continuous IV. Keep reading a few more pages for interpreting the coefficient on a dummy variable.

* Remember that for logit and probit models, you can't say "each $1000 of income increases the probability of being a fundamentalist by 5%" or any statement like that. The change in probability of going from 20K to 21K will be different from the change when you go from 30 to 31K.
posted by duck at 3:32 PM on April 7, 2005

right logit probit (it's that time of year)

duck: thank you that's what i actually meant to ask... how you analyze changes in the RHS variables with the logit probit model...
posted by stratastar at 3:37 PM on April 7, 2005

("my" link has an example of interpretation if you search for "Interpretation of MNL Model Results". see also "How are beta coefficients interpreted?". but even so, somehow i suspect duck's book will be better... :o)
posted by andrew cooke at 3:43 PM on April 7, 2005

OK...well then nevermind the book I mentioned which is for ordered probit or logit. Probit models and logit models are two different kinds of models based on two different distributions, though they do pretty much the same thing and the two distributions look very alike.

I've worked with logit models and taught them (regular old logit, not ordered), so I can actually tell you how to interpret those coefficients. Forgive me if the explanation is too basic and explains things you already know...

So the first thing you need to understand is that a logit model is essentially a linear model where the DV is the *logged odds* of the event, rather than the event or probability of the event or whatever.

But of course you're interested in the probability of the event (we'll get to that), and the effect on the probability of the event depends on the value of the IV (so as said above, the effect of going from 30K to 31K would be different than the effect of going from 40K to 41K -- there is no uniform "effect of an increase of $1000"). See note 1 below for an explanation of why. To interpret the co-efficient, pick two values of an IV. So let's say the IV is income (cause it's always an easy one to work with) and the two values are 30K and 40K (see note 2). Now start with 30K and plug that into your model:

constant + 30,000(co-efficient for income) + [rest of model -- use mean/modal values of control variables].

Now what you get when you do that is the *logged odds* of the event. So to turn that into the probabilities...first un-log it. Use the EXP button on your calculator. Now you have the odds. (See note 3). To turn the odds into the probability, take it and divide it by (itself + 1). (It's 1 because the denominator of the odds will be 1, since the number won't be expressed as a fraction.)

What you have is the probability for a person with income 30K... now do the same with 40K. The difference between the two probabilities is the effect of going from 30-40K.
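
In symbols, the steps above look roughly like this. It's only a sketch: the names b_0, b_inc, L, O and p are labels I'm making up for the constant, the income co-efficient, the logged odds, the odds and the probability; they aren't anything Stata prints.

```latex
\begin{align*}
L &= b_0 + b_{\text{inc}} \cdot 30000 + (\text{rest of model at mean/modal values})
  && \text{logged odds} \\
O &= e^{L}
  && \text{odds} \\
p &= \frac{O}{O + 1} = \frac{1}{1 + e^{-L}}
  && \text{probability}
\end{align*}
```

Do the same with 40000 in place of 30000, and the effect is the second p minus the first.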

Now if that all sounds daunting, here's an example (note 4)...Using the cancer.dta set that comes with stata, I entered the command:

.logit died age

So this is the effect of age on the probability of dying. It spits out the model:

died  |     Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
age   |  .0893535    .0585925   1.525    0.127   -.0254857   .2041928
_cons | -4.353928    3.238757  -1.344    0.179   -10.70177   1.993919

[paste the table into a word processor and change the font to courier, or just run the model yourself].

Ok, so the mean age is 55 and the standard deviation is roughly 5, so we'll use 55 and 60.

The logged odds of dying if you're 55: -4.354 + 55 (.089) = .541

To get the odds, I enter that number and hit the EXP key...1.718 . So the odds are 1.718 / 1 . The probability is 1.718 / (1.718+1) = .632

So that's the probability of dying if you're 55 (presumably given that you have cancer...still pretty scary!).

Now calculate the same thing with 60: -4.354 + 60 (.089) = .986

So the logged odds of dying if you're 60 are .986 ... the odds are EXP (.986) = 2.68 (2.68/1)

And the probability is 2.68 / (2.68+1) = .728 (So the probability of dying if you're 60 is .728)

So the effect of the 5 year increase is .728 - .632 = .096 . So going from 55 years to 60 years increases your probability of dying by 9.6 percentage points (not 9.6%).

Remember that you can't say "Ageing 5 years increases the probability of dying by 9.6 percentage points" as a general statement. The effect will be different if you look at different ages (try it!).
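
If you'd rather not punch all this into a calculator, the arithmetic can be sketched in a few lines of Python. This is just an illustration: the function names are made up, and the constant and co-efficient are the rounded values from the hand calculation (-4.354 and .089), so the results match the numbers worked out by hand rather than what Stata would give at full precision.

```python
import math

# Rounded values from the logit output (assumption: same rounding as the
# hand calculation, not Stata's full-precision estimates).
CONSTANT = -4.354
COEF_AGE = 0.089


def probability_at(age, constant=CONSTANT, coef=COEF_AGE):
    """Predicted probability from a one-variable logit model."""
    logged_odds = constant + coef * age   # linear part of the model
    odds = math.exp(logged_odds)          # un-log to get the odds
    return odds / (odds + 1)              # odds -> probability


def effect_of_change(age_from, age_to):
    """Change in predicted probability when age moves between two values."""
    return probability_at(age_to) - probability_at(age_from)


if __name__ == "__main__":
    # The same 5-year step has a different effect at different starting ages.
    for start in (40, 55, 70):
        change = effect_of_change(start, start + 5)
        print(f"{start} -> {start + 5}: {change * 100:.1f} percentage points")
```

With these rounded numbers, the step from 70 to 75 comes out noticeably smaller than the step from 55 to 60, because by 70 the predicted probability is already high and the curve has started to flatten.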

Lastly...this is for the *logit* command in stata, not the logistic command, which fits the same model but spits out the co-efficients in a different form (as odds ratios).

This answer brought to you by your friendly neighbourhood quantitative methods TA. E-mail me if you have more questions and think I can help.

Note 1: If no one has told you why you're using a logit or probit model... A regular (linear, OLS) regression would have two problems.

First, if you ran a regular regression model with a dichotomous variable as your DV you would get something like "The probability of being a fundamentalist decreases by 5 percentage points with each $10K increase in income." So what's the problem with that? Well let's say your starting point is that a person with an income of $40K has a 15% chance of being a fundamentalist. Now what's the probability of a person with a $100K income being a fundamentalist? According to this example, that person would have a -15% chance of being a fundamentalist. Obviously you can't have a negative probability. That's the problem with a linear model -- it will allow the probability to be less than 0 or greater than 1 which is impossible. Logit or probit models will keep predicted probabilities between 0 and 1.
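
To make that first problem concrete, here is a small Python sketch using the made-up numbers from the paragraph above (15% at $40K, minus 5 percentage points per $10K). The function names are hypothetical, and the slope in the logit version is an arbitrary illustrative value, not an estimate from any data; the point is only that the logit version can't leave the 0 to 1 range.

```python
import math


def linear_probability(income):
    """Linear model from the example: 15% at $40K, -5 points per $10K."""
    return 0.15 - 0.05 * (income - 40_000) / 10_000


def logit_probability(income, slope_per_10k=-0.4):
    """Logit model anchored at the same 15% for $40K.

    The slope (on the logged-odds scale) is a made-up illustrative value.
    """
    start_logged_odds = math.log(0.15 / 0.85)
    logged_odds = start_logged_odds + slope_per_10k * (income - 40_000) / 10_000
    return 1 / (1 + math.exp(-logged_odds))


if __name__ == "__main__":
    for income in (40_000, 70_000, 100_000):
        print(income,
              round(linear_probability(income), 3),
              round(logit_probability(income), 3))
```

At $100K the linear version gives -0.15, which is not a probability, while the logit version just keeps getting closer to 0 without ever crossing it.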

Second, the linear model assumes a uniform effect. So imagine you're modelling the probability of going to college based on parents' income. A linear model would suggest that a $5000 increase in income would always have the same effect on that probability. But if you give it some thought, it seems like going from $10K parental income to $15K probably won't increase your chances of going to college much (probability at both levels will be pretty low). Similarly, going from $1 million income to $1 million+5000 won't make much of a difference. But going from $40K to $45K we can imagine that that $5000 might actually have an effect. So logit and probit models allow the IV to have greater effects in the middle and smaller effects at the higher and lower ends of the scale. This is how you get the S-shaped effect on probability.


Note 2: When picking values I would go with the mean and then up or down one standard deviation. Just cause.

Note 3: Remember odds are the frequency of an event happening over the frequency of the event not happening. Probability is the frequency of an event happening over the total number of trials (event happening + not happening). So the numerators are the same, but the denominator of the probability is the sum of the numerator and denominator of the odds.
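
A tiny Python sketch of the same relationship, with made-up counts (the function names are mine, purely for illustration):

```python
def odds(happened, did_not_happen):
    """Frequency of the event over frequency of the non-event."""
    return happened / did_not_happen


def probability(happened, did_not_happen):
    """Frequency of the event over the total number of trials."""
    return happened / (happened + did_not_happen)


def odds_to_probability(o):
    """Odds written as a single number are implicitly 'o over 1'."""
    return o / (o + 1)


if __name__ == "__main__":
    # Hypothetical counts: the event happens 3 times and fails to happen once.
    print(odds(3, 1))                         # odds of 3 to 1
    print(probability(3, 1))                  # 3 out of 4 trials
    print(odds_to_probability(odds(3, 1)))    # same probability, via the odds
```

So odds of 3 (that is, 3/1) and a probability of 3/4 describe the same thing, which is why dividing the odds by (itself + 1) works.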

Note 4: I don't actually know what this data set is.
posted by duck at 6:52 PM on April 7, 2005
