

Recommended Stata alternatives?
November 30, 2007 4:46 AM

What is the best free substitute for Stata?

I use Stata at university, but only have access to it during term-time. During vacation, I'd like a free program that can open large Stata (or SPSS or SAS) datasets, manipulate the data (pooling datasets, generating variables etc), and run regressions. I don't expect to use any particularly complex statistical methods. Ease of use is a priority: I have only a few weeks and a fair amount of work to do, so although I could probably get by with a command-line, a GUI solution would be great. Graphs would be nice, but are not essential. Windows support would be ideal, but Linux-only is acceptable.

I can use Google, so I'm not just looking for a list of open source statistical software; rather, I'd like recommendations of good Stata substitutes. Thanks!
posted by matthewr to science & nature (15 comments total) 2 users marked this as a favorite
R will import datasets in these formats. I can't say that it is especially easy to use at first, but there are numerous GUIs for it on both Linux and Windows.
posted by grouse at 5:09 AM on November 30, 2007


R (aka GNU S) can read Stata datasets, and there's a Windows version. Do you need to run Stata scripts too?
posted by mkb at 5:10 AM on November 30, 2007


No, I don't think I need Stata scripts. Some kind of logging would be nice though.

One thing I should have emphasised is that the datasets I'm using are very large (650 MB or more). I need to be able to open these files and do things like generating new variables based on existing data, but I'm only going to be running regressions on about a dozen variables (out of well over a thousand), so I'll probably want to extract the desired variables into a new, smaller dataset.
posted by matthewr at 5:33 AM on November 30, 2007
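For reference, here is a minimal R sketch of that open/generate/extract workflow. It assumes the `foreign` package (bundled with standard R installations) and a .dta file old enough for `read.dta` to read; the newer formats introduced in Stata 13 are not supported. The "big" file here is a temporary stand-in built from the `mtcars` data, and `hp_per_cyl` is a hypothetical derived variable. Note that `read.dta` loads the whole file into memory, which may be a problem at 650 MB.

```r
library(foreign)  # provides read.dta / write.dta

# Stand-in for the real dataset: write mtcars out as a Stata file
dta_path <- tempfile(fileext = ".dta")
write.dta(mtcars, dta_path)

# Open the Stata dataset (the entire file is read into memory)
full <- read.dta(dta_path)

# Generate a new variable from existing ones (Stata: gen hp_per_cyl = hp/cyl)
full$hp_per_cyl <- full$hp / full$cyl

# Keep only the handful of variables needed for the regressions
small <- full[, c("mpg", "cyl", "hp", "hp_per_cyl")]

# Save the reduced dataset so later sessions can skip the big file
small_path <- tempfile(fileext = ".csv")
write.csv(small, small_path, row.names = FALSE)

# Drop the full dataset to free the memory it was holding
rm(full)
```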


I'm not sure what kind of logging Stata does, but R automatically saves all the commands you run to a history file. This is invaluable to me when I come back several months later and can't figure out just how I created that particular plot.
posted by grouse at 5:52 AM on November 30, 2007
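On the logging question: the history file records commands, while something closer to Stata's `log using` for output is base R's `sink()`. This is a minimal sketch, with a temporary file standing in for a real log path; `split = TRUE` sends output to both the console and the file. In an interactive session, `savehistory()` can additionally write the command history to a file of your choosing.

```r
# Temporary stand-in for a log file location of your choosing
log_path <- tempfile(fileext = ".txt")

# Start logging: output goes to the console AND to the file
sink(log_path, split = TRUE)

fit <- lm(mpg ~ hp, data = mtcars)
print(summary(fit))

# Stop logging (restores normal console-only output)
sink()
```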


Not free, but if your university is a participant in Stata's GradPlan, a permanent license for Intercooled ("normal") Stata 10 is only $155. And then you have Stata 10.x forever and can throw it on your laptop, etc.

If it's got to be free, R. R is not user friendly. At best, it vaguely tolerates users with a grumpy sniff.

In your shoes, I would load all of the datasets I think I'm going to use and dump them to csv from their home software rather than screw around with beating R into importing proprietary datasets. Learning the other parts of R will be enough of a pain in the ass, and it should be easy to make Stata/SPSS/SAS vomit a csv.
posted by ROU_Xenophobe at 6:02 AM on November 30, 2007
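Once Stata has produced a CSV (for example with `outsheet using filename, comma`), R can read just the needed columns so that the other variables never occupy memory. A minimal sketch follows. The CSV here is a temporary stand-in written from `mtcars`, and the `keep` list is a hypothetical choice; in `colClasses`, `"NULL"` skips a column and `NA` lets R guess its type.

```r
# Stand-in for a CSV exported from Stata
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read only the header row to learn the variable names
header <- names(read.csv(csv_path, nrows = 1))

# Hypothetical list of variables to keep
keep <- c("mpg", "cyl", "hp")

# NA = read this column with an automatically guessed type;
# "NULL" = skip this column entirely
col_classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(csv_path, colClasses = col_classes)
str(small)
```

Columns come back in file order, not in the order of `keep`.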


Here's an example of how to subset data with the mtcars dataset that comes with R:

> mtcars[1:5,]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
> mtcars.mpg = subset(mtcars, select=c(mpg, cyl, hp))
> mtcars.mpg[1:5,]
                   mpg cyl  hp
Mazda RX4         21.0   6 110
Mazda RX4 Wag     21.0   6 110
Datsun 710        22.8   4  93
Hornet 4 Drive    21.4   6 110
Hornet Sportabout 18.7   8 175
> ?lm
> summary(lm(mpg ~ hp, mtcars.mpg))

Call:
lm(formula = mpg ~ hp, data = mtcars.mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-Squared: 0.6024, Adjusted R-squared: 0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07 

> library(lattice)
> xyplot(mpg ~ hp, mtcars.mpg, type=c("p", "r"))


Listen to ROU_Xenophobe.
posted by grouse at 6:05 AM on November 30, 2007


The ?lm above gets the help for lm.

If you want an R book I highly recommend An Introduction to S and S-Plus by Phil Spector, which works just fine for R. Unfortunately the online introductions I have seen all suck, as do several of the books with R in their name.
posted by grouse at 6:07 AM on November 30, 2007


ROU_Xenophobe: Unfortunately, it has to be free. No GradPlan, so the cheapest Stata is £320. I'll definitely follow the CSV suggestion, thanks.

Many thanks for the R code, grouse.

As you and ROU say, R is not particularly easy to use. I can see that if the choice was between Stata and R, perhaps the effort of learning R would pay off in the long run. But I've already chosen Stata, and I'm looking for a temporary substitute for it rather than a full-blown long-term replacement. With that in mind, I suppose my question is: in the short term, what does R do for me that makes up for the extra learning effort?
posted by matthewr at 6:45 AM on November 30, 2007


Doesn't Stata have outsheet for vomiting TSV? Easy-peasy.
posted by mkb at 7:10 AM on November 30, 2007


I suppose my question is: in the short term, what does R do for me that makes up for the extra learning effort?

Nothing. Well, nothing + epsilon -- R will do some things that Stata won't, and R is good for simulation, if you care about that.

But if you can't afford Stata, then the choice isn't between Stata and R, it's between R and other free stuff.

I would be utterly astonished if you found a freeware workalike or work-similar to Stata. No doubt there is other free stuff that will manipulate data and run regressions, but my sense is that it's generally either crippled in some way, user-hostile, or doesn't have a useful community around it. R isn't crippled -- if you can do it, you can do it with R -- and the community built up around it is very useful and keeps expanding its capabilities.

If you have to learn anything about any other software other than Stata, it should be R, because what you learn won't be useless to you in a year.
posted by ROU_Xenophobe at 7:21 AM on November 30, 2007 [1 favorite]
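As a small illustration of the simulation point, here is a minimal Monte Carlo sketch in base R. The setup is hypothetical: data are drawn from y = 2 + 0.5x + noise with n = 100, OLS is refit 1000 times, and the spread of the estimated slopes is examined.

```r
set.seed(42)

# Hypothetical data-generating process: y = 2 + 0.5 * x + standard normal noise
true_slope <- 0.5
n <- 100

# Draw 1000 samples and record the OLS slope estimate from each
slopes <- replicate(1000, {
  x <- rnorm(n)
  y <- 2 + true_slope * x + rnorm(n)
  coef(lm(y ~ x))[["x"]]
})

mean(slopes)  # should sit close to the true slope of 0.5
sd(slopes)    # simulated sampling variability of the slope estimate
```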


And yeah, getting a csv out of stata is as easy as "outsheet using filename, comma". I just have no idea how to do that in SPSS or SAS, because I've developed a real loathing for them.

SPSS I just never liked. And SAS pissed me off real bad in like 1995 when I found that while every version of SAS uses the same godawful syntax from 1787 or whenever it first came out, you couldn't transparently move a SAS-for-unix dataset to SAS-for-PCs, even with the same version number. Fuck you, SAS.
posted by ROU_Xenophobe at 7:29 AM on November 30, 2007 [1 favorite]


Are you sure you can't use Stata off campus? My university has a similar arrangement (where Stata communicates with a keyserver on the campus network), but it's easily gotten around by using a VPN to connect to the campus network. Is it possible that your school offers VPN access for occasions just like this?
posted by awesomebrad at 7:53 AM on November 30, 2007


That's what I initially tried doing, awesomebrad, but my college and the faculty both said they didn't offer that from outside the university network.
posted by matthewr at 8:49 AM on November 30, 2007


alternatively, is there a unix box with stata that you can get a login for? you'd lose graphics without x, but still.
posted by ROU_Xenophobe at 9:12 AM on November 30, 2007


I have no idea what your opinion is on the ease of use of SPSS, but I found it very straightforward. It's possible to download a free (expiring*) demo version of SPSS on their website, after registering.

It's how I managed to complete the statistical analysis for my dissertation. Like every student, I had a lot of work to do in a short amount of time, so I wanted the program itself to be as easy as possible. It might be exactly what you are looking for.

Download a full, working copy of SPSS for Windows® and try it for yourself. SPSS is a modular, tightly integrated, full-featured software comprised of SPSS Base and a range of add-on modules. Each module—SPSS Advanced Models™, SPSS Categories™, SPSS Complex Samples™, SPSS Conjoint™, SPSS Data Preparation™, SPSS Exact Tests™, SPSS Neural Networks™, SPSS Missing Value Analysis™, SPSS Regression Models™, SPSS Tables™, SPSS Trends™, and SPSS Classification Trees™—adds extra functionality to your system. This evaluation copy will install SPSS Base and all add-on modules. If you want to install the SPSS Programmability Extension, visit www.spss.com/devcentral and click on the “Download” link to download the SPSS Python Integration Plug-In. If you purchase SPSS after the evaluation period, please consult your local sales office to ensure you order the correct modules for the features you require. Please note that this software trial will expire in approximately 14 days and is for evaluation purposes only.

*Expiring isn't going to stop you from using the program, of course. *wink*
posted by lioness at 4:58 PM on November 30, 2007

