Elegant weapons, for a more civilized age
October 25, 2008 2:35 AM
Do lisp and dialects make sense for general purpose scientific computing?
I'm thinking of learning some variant of lisp as my next language, mostly out of masochism. My real-world computing needs are scientific/numerical, i.e. data manipulation, some statistics, lots of curve fitting and the like, with some data acquisition thrown in. So far I have been using C for the heavy stuff, perl for the quick and dirty and FORTRAN when I have to (I hang around with engineers).
Is lisp any use for such things? All the functional recursiveness seems pretty nifty, but I'd like to pick up a tool I can actually use.
Best answer: Matlab, Mathematica and R are the mainstream go-to languages for general purpose scientific computing, at least amongst the crowd that has realized that C and Perl suck at math, and that Fortran sucks at everything.
R has the advantages of being 1) open-source and 2) based on Scheme, which is itself a dialect of Lisp, which is what you set out to learn in the first place.
Lisp's strengths are 1) a beautiful, life-changing conceptualization of what programming is really about, 2) problem solving via the invention of small domain-specific languages (with syntactic macros), and 3) speed. So, while Lisp does make sense for scientific computing, and coding in Lisp would be a huge improvement over C, you'll fare even better with one of the three specialized languages I mentioned.
posted by gmarceau at 3:35 AM on October 25, 2008 [3 favorites]
If you're looking for an excuse to learn Lisp this isn't it.
Learn it to attain a new way of looking at and solving problems -- not for its suitability (or lack thereof) for scientific computing.
(If you're doing a lot of processor-intensive stuff the way it sounds you are, you may consider Assembler somewhere down the line since high performance software is a must for any serious statistics / numerical work.)
posted by MaxK at 3:35 AM on October 25, 2008
you may consider Assembler somewhere down the line since high performance software is a must for any serious statistics / numerical work.
oh god no.
Writing numerical routines that are accurate is hard. Really, really hard. In fact, numerical computing is a specialized field of programming. Unless you have studied the subject for a few years, you are bound to do it wrong.
Writing anything in assembly that produces a correct answer is already up there in terms of difficulty. Writing something that goes fast in assembly, in addition to being correct, is a specialty of its own. If you ever manage to master that skill, you can turn it into a lucrative career.
Combining numerical programming with programming in assembly is PhD-thesis-level work. When you write one such procedure, you graduate, then your code gets incorporated into Mathematica and Matlab and R. Reuse other people's hard work. Code in one of these languages.
posted by gmarceau at 4:01 AM on October 25, 2008 [5 favorites]
If you want the nifty functional features and some nice libraries for numerical work, perhaps OCaml should be considered? I've not done that kind of work myself in the language, but I know and have met people who have, and their experience has been very positive.
posted by Iosephus at 5:29 AM on October 25, 2008
OCaml is *excellent* for math, with the caveat that a few things about it (okay, one: when working with floating point numbers, you need to use +. -. /. *. instead of + - / *, for reasons that will make sense eventually) may turn you off. But it's excellent for number crunching, and very fast, and you won't be blowing time on low-level details the way you would with C or (ack!) assembler.
I have not read this book, but it comes highly recommended: OCaml for Scientists
The _Practical OCaml_ book by Joshua Smith is embarrassingly bad, but this translated French O'Reilly book is a fantastic introduction to the language, and available free online.
Python is also recommended for your purposes. While it isn't a fast language in general, it *is* quite fast when almost all your time is spent in some specific libraries (which are coded in fanatically optimized C). The string libs are one, NumPy (a scientific computing library) is another. Python is quite flexible and easy to use, and would also probably be a good fit. If you know C and Perl, you will pick it up quite quickly; it's rather like a cleaned-up Perl.
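To make that concrete, here is a minimal sketch of the difference between looping in the interpreter and letting NumPy's compiled code do the looping (the array size and helper functions are made up for illustration; the exact timings will vary on your machine):

```python
# Rough sketch: the same computation as a pure-Python loop versus vectorized
# NumPy. The NumPy version is fast because the loop runs in optimized C,
# not in the interpreter.
import timeit
import numpy as np

xs = np.random.rand(1000000)

def mean_square_loop(values):
    # Every add and multiply goes through the Python interpreter.
    total = 0.0
    for v in values:
        total += v * v
    return total / len(values)

def mean_square_numpy(values):
    # The loop happens inside NumPy's compiled code.
    return float(np.mean(values * values))

print(timeit.timeit(lambda: mean_square_loop(xs), number=3))
print(timeit.timeit(lambda: mean_square_numpy(xs), number=3))
```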
I haven't personally used them much, but I second that mathematica, matlab, or R are also worth a look, since they're focused more closely on your needs.
While Lisp is quite a remarkable language, to my knowledge it doesn't have any advantages specific to scientific computing. I think those other languages would probably be a better fit. If you decide to experiment with Lisp, use either SBCL for Common Lisp or a fast Scheme compiler, such as Chicken.
posted by trouserbat at 6:42 AM on October 25, 2008
Aha, yes. The two statistics folks I know well -- one is a baseball writer, the other an economist -- both keep telling me that R is the bee's knees. I'd forgotten about that until gmarceau mentioned it.
As for Python, I've just started trying it myself, since it seems quite sleek and sexy. To this noob, it does seem like a cleaner, more modern Perl replacement.
So forget what I said above: I'm switching my vote to R and Python! :)
posted by rokusan at 7:05 AM on October 25, 2008
Ah, yes, OCaml... great language, one of my favorites [1] [2] [3, page 12]. It is Scheme + a syntax - macros + an awesome type system. But when I tried to do numerical stuff with it, I found I was starving for good libraries. Your mileage may vary.
Here are some more links in support of what I was saying above.
To know exactly how hard numerical computing is, read the words of Mark Sofroniou, a developer on the Mathematica Kernel Team. It is so hard, it will challenge you even when dealing with simple arithmetic, and even when you have been a specialist at it for 12 years.
Arithmetic Is Hard—To Get Right by Mark Sofroniou
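Even the simple end of it bites. A quick illustration (Python syntax here, but any language using ordinary IEEE-754 doubles behaves the same way):

```python
# Hardware doubles cannot represent 0.1 exactly, so naive equality tests
# and naive summation both misbehave. This is the easy end of the problems
# Sofroniou is talking about.
print(0.1 + 0.2 == 0.3)             # False
print(sum(0.1 for _ in range(10)))  # 0.9999999999999999, not 1.0

# Order of operations matters too: adding a small number to a huge one
# simply loses it.
print((1e16 + 1.0) - 1e16)          # 0.0, the 1.0 has vanished
```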
To get a feel for how Lisp can open your mind, with an example taken from numerical computing, read Section 4 of the following paper.
Why Functional Programming Matters by John Hughes
To understand why I said C sucks at math, see the "Encouraging Disaster" section of the following essay on numbers in different programming languages.
The Numbers Mini-Language, by Steve Yegge
posted by gmarceau at 7:43 AM on October 25, 2008 [2 favorites]
Speaking as a bioinformaticist, you don't want Lisp. Pick it up as a hobby, but not for your career. For your needs, you want one of:
1 {Perl, Ruby, Python} - for text parsing, data munging. All of them have bio-related libraries available, with Perl's being the most complete (Bio::Perl). That said, Perl is also the ugliest of the three.
2 {R, Matlab} - for number crunching and stats. My recommendation is R, since it has fantastic libraries available, both for doing stats work and for dealing with all sorts of biological data.
posted by chrisamiller at 7:45 AM on October 25, 2008
Response by poster: R seems like the way to go here, and since it's based on scheme it looks like I have an excuse to check out all the lispy goodness. Thanks for your insight folks.
posted by ghost of a past number at 7:57 AM on October 25, 2008
Best answer: I love R and have used it almost daily in my work for four years or so. It sounds like the best fit for "data manipulation, some statistics, lots of curve fitting and the like." I should just warn you of a few things:
- It has a steep learning curve. The learning curve is even steeper when you use the crappy online resources to learn it. Get a book. I like Introduction to S & S-PLUS by Phil Spector (R is an implementation and superset of the S language).
- It's not always efficient. If you're dealing with data sets of less than 200,000 rows or so, this won't be a big problem though. When I deal with larger datasets, I usually do preprocessing with Python and NumPy, roughly along the lines of the sketch after this list.
- It'll be some time before you encounter any of the LISP-like features of R. While the implementation is similar to Scheme, the interface is not.
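For example, a rough sketch of the kind of preprocessing I mean; the file name and column layout are invented for the example, so adjust them to your data:

```python
# Boil a huge raw file down to a summary small enough for R to handle
# comfortably. (Hypothetical file name and columns, purely for illustration.)
import numpy as np

# Column 0: timestamp, column 1: sensor reading.
raw = np.loadtxt("big_measurements.csv", delimiter=",")

# Drop obviously bad readings, then average in blocks of 100 rows,
# keeping the timestamp of the first row in each block.
good = raw[raw[:, 1] > 0.0]
n = (len(good) // 100) * 100
blocks = good[:n, 1].reshape(-1, 100)
summary = np.column_stack([good[:n:100, 0], blocks.mean(axis=1)])

# Hand the much smaller table to R (read it there with read.csv).
np.savetxt("summary_for_R.csv", summary, delimiter=",")
```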
posted by grouse at 8:31 AM on October 25, 2008 [1 favorite]
R is efficient if you're using it as a direct wrapper for the underlying C/Fortran routines. If you have a task that is heavy on iteration or isn't vectorizable, my understanding is that plain R code fares badly compared to a compiled language.
posted by a robot made out of meat at 10:13 AM on October 25, 2008
Python is great for the ease of integration it provides with C, C++, fortran and R.
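As a small taste of how cheap the C side of that integration is, here is a toy sketch that calls the C library's cos() through ctypes (it assumes a Unix-like system; real scientific code would wrap a proper numerical library instead):

```python
# Call a C function from Python with no wrapper code or compilation step.
# (Assumes a Unix-like system where the C math library can be located.)
import ctypes
import ctypes.util
import math

libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))   # 1.0, straight from the C library
print(math.cos(0.0))   # same answer via Python's built-in wrapper
```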
posted by PueExMachina at 10:17 AM on October 25, 2008
Interesting links, PueExMachina, thanks. It seems that people on the bleeding edge are transitioning away from {Matlab, Maple, Mathematica, R} to Python+libraries, as well as to Python+R's libraries. Very interesting.
posted by gmarceau at 12:01 PM on October 25, 2008
gmarceau,
Have you tried Sage? I've been meaning to do so myself.
Nthing R.
I like gnuplot for quick and dirty plots.
posted by lukemeister at 12:43 PM on October 25, 2008
Best answer: Lisp can actually be very fast for numerical code. Common Lisp (one of the two dialects, the other is Scheme, which I know less about) has some very good compilers which produce fast numerical code *if* you declare the types of your variables. The usual way of doing things is to write your code, then add a few declarations in the inner loops to make them fast, and continue optimizing until you get performance you are happy with. Typically, very few declarations are needed since the inner loops tend to be compact.
I have used lisp for statistics / machine learning, and have had good luck with it; it's currently my language of choice for many tasks. Lisp macros are great, and they make a lot of things very easy. However, the library situation isn't as good as with some other languages, and while it's reasonably easy to write wrappers for c/fortran/etc libraries, other languages (like R) tend to have more pre-existing facilities.
R is a weird language -- it's good if you work with data which contains non-numeric columns; otherwise, I would suggest matlab (if you have access). R has nice libraries but is slow, has some strange function names, and is overall a pain for tasks requiring lots of computation that can't be phrased in terms of pre-existing libraries.
posted by bsdfish at 2:59 PM on October 25, 2008
The R Book by Crawley is pretty good, and it's a little cheaper than Spector's S book.
R is free, whereas Mathematica is expensive, and Matlab even more so (at least for regular licenses).
posted by lukemeister at 4:35 PM on October 25, 2008
@lukemeister no I have not tried Sage, but now I want to. It looks really good. I hadn't realized that Sage is Python, which is good news. Their website seems to be trying to hush that fact up for some reason.
posted by gmarceau at 7:34 PM on October 25, 2008
C, Perl and FORTRAN are all much more numbers-oriented, in their ways, than Lisp.
Lisp is very cool and very fun to learn and use, though, and often it can make you think about programming problems in new ways, instead of following the textbook C approaches... that can be helpful in a deep sense.
But statistics? Curve fitting? No, you're going to want to stick to C, and all the libraries out there for almost any stats-oriented task.
But did I mention Lisp was tremendously fun?
posted by rokusan at 2:42 AM on October 25, 2008