Cheap data-fitting software?
April 17, 2008 5:58 PM
Looking for free or low cost (under $50) multiple linear regression software which ideally works with Microsoft Excel (but not critical). Any recommendations?
Best answer: I would try using R; it doesn't interface with Excel but is really easy. If you save your data into a csv file, and you want to regress the column labeled 'weight' based on 'height' and 'age', you would just write
data = read.csv('filename')
lm(weight ~ height + age, data)
and that would run the regression you need.
posted by bsdfish at 6:15 PM on April 17, 2008
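For anyone curious what the lm() call above computes, here is a minimal Python sketch of the same kind of fit, assuming a CSV with columns named weight, height and age. The sample data and the fit_ols helper are made up for illustration. It solves the normal equations directly, which is fine for a small, well-behaved example but is not how R does it internally (R uses a QR decomposition).

```python
import csv
import io

def fit_ols(rows, target, predictors):
    """Fit target ~ intercept + predictors by solving the normal equations."""
    # Design matrix with a leading 1.0 in each row for the intercept term.
    X = [[1.0] + [float(r[p]) for p in predictors] for r in rows]
    y = [float(r[target]) for r in rows]
    k = len(predictors) + 1
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    return dict(zip(["(Intercept)"] + predictors, coefs))

# Hypothetical data, generated exactly from weight = 10 + 0.5*height + 0.2*age.
SAMPLE_CSV = """weight,height,age
91,150,30
95,160,25
103,170,40
107,180,35
102.5,165,50
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
coefs = fit_ols(rows, "weight", ["height", "age"])
print(coefs)
```

Because the sample data has no noise, the fitted coefficients should come back very close to the values used to generate it. With real data you would also want standard errors and diagnostics, which is where a proper stats package earns its keep.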
R is quite good. If you're comfortable with linear algebra then you might like Scilab or one of the other (free) clones of MATLAB.
posted by thrako at 6:44 PM on April 17, 2008
bsdfish is correct: R is the best free solution.
A spreadsheet is a terrible place to do statistical analysis for reasons that I won't go into here.
However, if you must, search the Excel help for "options in the regression dialog box".
posted by singingfish at 6:58 PM on April 17, 2008
Stata has a cheap temporary version, though it may only be available to students.
posted by lunasol at 7:00 PM on April 17, 2008
R works. I've also done it directly in Excel, using Solver.
posted by Eringatang at 7:41 PM on April 17, 2008
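The Solver approach amounts to putting the coefficients in cells and asking a numerical optimiser to minimise the sum of squared residuals. A rough Python sketch of that idea follows. It uses plain gradient descent, which is an assumption chosen for brevity and not the algorithm Excel's Solver actually uses; the data, learning rate and step count are all made up for illustration.

```python
def fit_by_minimising_squared_error(xs, ys, lr=0.1, steps=5000):
    """Find intercept and slopes by gradient descent on the mean squared residual."""
    n = len(ys)
    k = len(xs[0])
    beta = [0.0] * (k + 1)  # intercept followed by one slope per predictor
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)
            resid = sum(b * v for b, v in zip(beta, row)) - y
            for j, v in enumerate(row):
                grad[j] += 2.0 * resid * v / n
        # Step downhill along the gradient.
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [1, 3, 4, 6, 8, 9]

beta = fit_by_minimising_squared_error(xs, ys)
print(beta)
```

An iterative search like this depends on step size and on how the predictors are scaled, so for ordinary linear regression a direct least-squares solve is usually the safer route.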
Best answer: You don't really need extra software.
In Excel, the LINEST function can do ordinary least-squares multiple linear regression.
posted by mikeand1 at 8:23 PM on April 17, 2008
What do you mean by "works with Excel"? For close integration, there is a stats package lying around for Excel somewhere. I wouldn't want to try to use it for anything serious, where you really need to be able to get down and dirty in the model, but for toy stuff, why not?
For real work, the best free solution is unquestionably R. Serious quant jocks pretty much only use R or Stata, and Stata ain't free... (though I absolutely deny bsdfish's claim that R is easy... there's a learning curve, but it's worth it for the power.)
posted by paultopia at 8:56 PM on April 17, 2008
My claim was that it was easy to do regression with R, not that using the full power of R is easy :)
posted by bsdfish at 9:20 PM on April 17, 2008
SigmaPlot will do curve fitting of all sorts and plays well with Excel. They have a free 30-day trial.
posted by euphorb at 10:47 PM on April 17, 2008
Response by poster: singingfish: "A spreadsheet is a terrible place to do statistical analysis for reasons that I won't go into here."
Umm, could you go into them just a little? Thanks.
posted by vizsla at 3:48 AM on April 18, 2008
Here's a good page on the problems with spreadsheets; quotation below:
"A key feature of spreadsheets is that a cell can be both a formula and a value. This is the great strength of spreadsheets. When something is made simple enough, it often becomes very powerful -- this is such a case.
"While this double meaning of cells gives spreadsheets their appeal, it also has negative qualities. Primarily the problem is that some cells have hidden meaning. When you see a number in a cell, you don't know if that is a pure number or a number that is derived from a formula in the cell. While this distinction is usually immaterial, it can be critical."
posted by singingfish at 4:11 AM on April 18, 2008
Actually, this is fun. Here's another good article, pitting various stats packages against Excel.
The upshot is that, due to poor data management and typing, inadequate provisions for missing data, poor choices of algorithms for statistical calculations, and inadequate and wrong documentation, you've a very good chance of coming up with a badly wrong statistical calculation. That error is then very hard to debug, because functions and data are mixed together and there is inadequate provision for extending existing data sets.
posted by singingfish at 4:31 AM on April 18, 2008
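To make "poor choices of algorithms" concrete, here is a small Python sketch of one commonly cited kind of problem: computing a sample variance with the one-pass "sum of squares minus square of the sum" shortcut, which loses precision badly when the values are large relative to their spread. The data is made up, and this illustrates the general issue only; it is not a claim about what any particular spreadsheet version does internally.

```python
def variance_one_pass(values):
    """Textbook shortcut: numerically fragile when the mean dwarfs the spread."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / (n - 1)

def variance_two_pass(values):
    """Subtract the mean first, then square the deviations: much more stable."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

# Hypothetical measurements: a large offset with small spread.
# The true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive = variance_one_pass(data)
stable = variance_two_pass(data)
print("one-pass:", naive)
print("two-pass:", stable)
```

The two functions are algebraically identical, so any gap between them comes purely from floating-point rounding. In a spreadsheet the choice of formula is hidden inside a built-in function, which is part of why such errors are hard to spot.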
"I would try using R; it doesn't interface with Excel"
Not in a pointy-clicky way, but it will input and output csv files.
gretl is also free, and pointy-clicky. But it's very limited.
posted by ROU_Xenophobe at 5:36 AM on April 18, 2008
SPSS has a free trial version. While I haven't done this myself, I've been told by others that when the free trial version runs out, they simply get a new free trial version. . .
posted by jujube at 8:27 AM on April 18, 2008
Best answer: Oh, SPSS is also bad software, for different reasons:
* Poor enforcement of data typing, which makes it too easy for you to do stuff that you should only be doing if you understand why you're doing it.
* A really horrible interface for doing data analysis when what you want is different from SPSS's conception of what you want to do (it makes the easy things very easy and the less easy things far too hard).
* Incredibly expensive for what it offers, and the company abuse their customers with changing licence conditions and other lock-in tactics.
For simple point-click stuff I use JMP (free trial available), and for more repetitive or weird stuff I use R. I use SPSS only when the customer requires it, provides me with a licence for it, and has resisted using the alternatives.
posted by singingfish at 1:55 AM on April 19, 2008 [1 favorite]
This thread is closed to new comments.
posted by thisjax at 6:15 PM on April 17, 2008