# What are some decent alternatives to SAS for statistical software?

November 10, 2006 1:15 PM

I have a need for statistical software at work, and I understand that SAS is the best option out there. But when I started digging around for pricing, it seems WAY too expensive. With the options that would make the software easiest for me to use, we're talking over $35,000!

Here's what I need to do: analyze "flat" files to determine pricing/volume relationships, which I think logistic or linear regression will be best suited for. I may also have a future need for clustering and/or factor analysis.

I understand that some software out there handles this nicely but requires in-depth programming. This is not for me - I am not a programmer. I simply want to be able to take my data sets and easily get the desired output for some standard statistical types of analysis.

Thanks for any help!

Excel might be all you need. That can do most simple statistics.

If you need more power, go with R, which is an open-source statistics program.

posted by cschneid at 1:23 PM on November 10, 2006

Any straightforward statistical package will do ya.

SAS isn't really a statistical package any more, and a lot of the cost comes from the extra functionality it has for businesses.

If you just need to do OLS, logit, and principal components or similar, you can use any statistical package. I've used several:

R

Upsides: free, open-source, can do anything. Generates fabulous graphics as well.

Downsides: command-line oriented, and might require an extra step to get data into it. Command-line orientation isn't really a downside, since that's how you should be working, but you'll likely see it as one. Its object-oriented nature can get annoying when you just want to get work done.

Cost: $0.00

Stata

Upsides: Well-supported and extensible. Useful compromises between proper command-line orientation and sissified point-n-click. Command line allows any unambiguous abbreviation. Decent graphics.

Downsides: Might have to take an extra step to get data in (whatever --> CSV --> Stata). Expensive documentation.

Cost: ca. $1K

GRETL

Upsides: free

Downsides: numerous. Effective use requires pre-cleaning data before dumping to CSV for gretl.

Cost: $0.00

SPSS

Upsides: seems to be point-and-click. I can't stand it.

Downsides: I can't be rational about this one and find it loathsome for reasons I find difficult to express.

Cost: ca. $1,600.

In your shoes, I would look real hard at Stata or R.
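For the "principal components or similar" side of things (and the clustering the question mentions), here is a minimal base-R sketch. It assumes R's built-in USArrests data set as a stand-in for your own flat file, and the choice of three clusters is arbitrary, purely for illustration.

```r
# Minimal sketch using base R only. ASSUMPTION: the built-in USArrests data
# set (50 states, 4 numeric columns) stands in for your own flat file.
x <- scale(USArrests)   # standardize so no column dominates because of its units

# Principal components.
pca <- prcomp(x)
print(summary(pca))     # proportion of variance explained per component

# k-means clustering into 3 groups (an arbitrary choice for illustration);
# nstart = 25 tries several random starting points and keeps the best.
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)
print(table(km$cluster))  # how many states fall in each cluster
```

Factor analysis is also available in base R through `factanal()`, so none of this requires add-on packages.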

posted by ROU_Xenophobe at 1:31 PM on November 10, 2006

I'm also not sure that SAS is the best; it's just used a lot. Everyone I know who uses it finds that it is a very frustrating program. I second checking out Excel, and trying R.

posted by dpx.mfx at 1:31 PM on November 10, 2006

You do not want to use Excel for even simple regressions. It is prone to weird fuckups, and it will be much harder to get stuff done.

posted by ROU_Xenophobe at 1:32 PM on November 10, 2006

I should note that none of the options I listed require any particular programming. Some use the command line. Even the worst is not programming by any stretch.

In R, the one you probably think you need to do in-depth programming for, a simple OLS looks like this:

    output <- lm(dv ~ iv1 + iv2 + iv3)
    summary(output)

(though getting the data in will take another line or two)

In Stata, it would look like this:

    regress DV IV1 IV2 IV3

Neither of which should strike fear into you. In general, I'd expect you might find Stata easier to deal with than R, but I'd look at both.
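To fill in the "another line or two" for getting data in, here is a minimal R sketch, assuming your flat file is a comma-separated file with a header row. The column names (price, volume, bought) are hypothetical placeholders, and the sketch writes a small synthetic file to a temporary location first so it runs on its own. `lm()` fits the linear regression and `glm()` with `family = binomial` fits the logit.

```r
# Minimal sketch, not a definitive workflow. ASSUMPTION: price, volume and
# bought are hypothetical column names; replace them with your own.

# Create a small synthetic flat file so the example is self-contained.
set.seed(1)
n <- 200
price <- runif(n, min = 5, max = 15)
volume <- 100 - 4 * price + rnorm(n, sd = 5)                    # linear price/volume link
bought <- rbinom(n, size = 1, prob = plogis(3 - 0.3 * price))  # 0/1 outcome
path <- tempfile(fileext = ".csv")
write.csv(data.frame(price, volume, bought), path, row.names = FALSE)

# Getting the data in: one line for a CSV file with a header row.
sales <- read.csv(path)

# Linear regression (OLS): volume as a function of price.
ols_fit <- lm(volume ~ price, data = sales)
print(summary(ols_fit))

# Logistic regression (logit) for the 0/1 outcome.
logit_fit <- glm(bought ~ price, family = binomial, data = sales)
print(summary(logit_fit))
```

With real data you would skip the synthetic-data block and point `read.csv()` at your own file.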

posted by ROU_Xenophobe at 1:38 PM on November 10, 2006

Response by poster: Thanks for all the comments already! Excel definitely won't work for me. There is a plug-in I have tried (XLStat), and it works well, but my data sets are often way too big for Excel to handle.

Anything that can hold my hand through the statistical process gets a plus in my book - I will take a look at Stata and R and some of the others that have been recommended.

I found a list of statistical software here: Wikipedia's list of statistical packages. Any thoughts on this?

Thanks everyone!

posted by scottso17 at 1:50 PM on November 10, 2006

Nthing R. It really is as easy as ROU_Xenophobe says. If you're not up for the command line, consider S-PLUS. It's the commercial cousin of R (both implement the S language), in a more point-and-clicky form. You can obtain standard statistics without dropping into a command window, and the extra power is there if you need it.

posted by Well that's a lie at 2:24 PM on November 10, 2006 [1 favorite]

SPSS is sufficient for regression, factor analysis, cluster analysis and some other basic stuff. It requires practically no programming, though you have the option of using simple syntax. Outputs can be easily pasted into Excel (if that’s relevant), and data input is also fairly easy. This is my backup package (when I forget to renew my SAS license), so I haven’t used it enough to develop a loathing for it yet.

posted by of strange foe at 2:40 PM on November 10, 2006

My brother, who does quantitative political science, has recommended Stata. I think he's also mentioned that R has a growing user base, especially in Bayesian analysis circles.

posted by weston at 3:41 PM on November 10, 2006

Stata is the shiznit, but honestly SPSS might be the best option for you. SPSS is point/lick, and there are a million decent books out there that will show you how to use it. Or, on preview, what of strange foe said.

posted by jtfowl0 at 4:52 PM on November 10, 2006

err...point/click *sigh* (insert comment about fantasy regarding a hot chick who is good at analyzing quantitative datasets here).

posted by jtfowl0 at 4:54 PM on November 10, 2006

R is pretty cool, though not user-friendly. However, there is R Commander, which is a point-and-click front end for R.

R Commander

Personally, I like Stata the best. It's not as expensive or bloated as SAS, and it's relatively inexpensive if you're a student.

Does anyone know where you can get a copy of SAS 6.12 for Mac? I hate having to keep an old Windows laptop around just to do SAS. My university stopped selling it last year.

posted by timetoevolve at 5:42 PM on November 10, 2006

I was going to vote for SPlus, but "Well that's a lie" already did it for me.

posted by inigo2 at 10:48 PM on November 10, 2006

Having spent a lot of time with Minitab, I can say that it will do what you need done. The current cost is US$1,195, but more importantly, they have direct phone support available.

I am not a Minitab shareholder or employee.

posted by ptm at 12:31 AM on November 11, 2006

I would not suggest any of the smaller packages like Minitab or Statgraphics. They will do what you want, but *any* statistical package will run OLS and logit.

Part of what you want with a statistical package is a large and active user base, so that you can bet that other users will be writing new routines that you can use and so that you can usefully google for solutions to problems. Unless the people on the phone line are statisticians, direct phone support isn't going to be much help. However, users helping each other deal with a problem three years ago is a fabulous resource.

That means Stata, R, or SPSS, and SPSS trails in this regard by a long shot.

posted by ROU_Xenophobe at 6:31 AM on November 11, 2006

I used to use R and SPSS, and found them both annoying for different reasons. I now use JMP, which I freaking love...it's visually oriented, like SPSS, but much smarter, better, and easier to use in every way. I think they have a free trial available, even.

posted by myeviltwin at 8:41 AM on November 11, 2006

Most of what I have to say is covered well above (I'm a happy R user and agree that SPSS is off-putting; it's all the idiot dialogue boxes that I hate), but I'd like to amplify the caution about using MS programs for statistical analysis.

As ROU_Xenophobe says, never use Excel for stats. Microsoft took speed shortcuts in their math libraries, which result in very unpredictable behaviour when calculating small differences between large numbers (among other problems; they cheated on the trig functions too). This has been an issue with all MS products since the earliest versions of Excel, right through Office XP/2002. Apparently Office 2003 fixes some of the problems, but not all (which is almost worse). Everything that uses the Microsoft C runtime math libraries is inaccurate.

MS is well aware of the problem (search the MS knowledge base for Excel and IEEE 754); they just don't think it's worth the hassle to fix properly. As a result, most quantitative organizations (NIST, the EPA, ISO) recommend against using Excel for any statistics you care about.
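To make the cancellation problem concrete, here is a small R illustration. It is a sketch under stated assumptions, not Excel's actual code: it compares the one-pass "calculator" variance formula, which subtracts two huge, nearly equal quantities, with the two-pass formula that subtracts the mean first. The data (4, 7, 13, 16 shifted up by one billion) are made up so that the true sample variance is exactly 30.

```r
# Illustration only (not Excel's code). ASSUMPTION: made-up data chosen so the
# true sample variance is exactly 30 (deviations from the mean: -6, -3, 3, 6).
x <- c(1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16)
n <- length(x)

# One-pass formula: sum(x^2) and n * mean^2 are both about 4e18, so their
# difference (90) is smaller than the rounding error in each term.
naive_var <- (sum(x^2) - n * mean(x)^2) / (n - 1)

# Two-pass formula: subtract the mean first, then square small deviations.
two_pass_var <- sum((x - mean(x))^2) / (n - 1)

cat("one-pass:", naive_var, "\n")      # badly wrong in double precision
cat("two-pass:", two_pass_var, "\n")   # 30
cat("var():   ", var(x), "\n")         # R's built-in also gives 30
```

The two-pass version costs one extra pass over the data, which is the kind of speed-versus-accuracy trade-off described above.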

posted by bonehead at 10:59 AM on November 11, 2006

Ploticus - A free, GPL, non-interactive software package for producing plots, charts, and graphics from data. It was developed in a Unix/C environment and runs on various Unix, Linux, and win32 systems. ploticus is good for automated or just-in-time graph generation, handles date and time data nicely, and has basic statistical capabilities. It allows significant user control over colors, styles, options and details.

posted by ptm at 11:26 PM on November 11, 2006

*I'm not affiliated with Stata, just a happy user*

posted by langedon at 1:21 PM on November 10, 2006