Linux or Windows 7 for scientific computing?
October 27, 2010 9:50 AM   Subscribe


I need to set up a lab of half a dozen or so computers, for scientific research. Some of the key programs I need run only on *nix. I also rely heavily on MATLAB, MS Office, and Adobe products.

Most of the serious labs I have seen are *nix based. But science is changing... do we really need Linux anymore? Now that Windows 7 is so stable, and virtualization is so clean (e.g. VirtualBox), could I just set up a Windows 7 lab, with virtual Linux machines for the data analysis, and all the advantages that Windows brings (MS Office is necessary for collaborations with colleagues, web sites work better, etc.)?

Or if I do go the Linux route, should it be RedHat/CentOS, or Debian/Ubuntu?

I am basically just looking for the thoughts of practicing scientists on where we are heading with regard to operating systems in research labs. Thanks for any input.
posted by tabulem to Computers & Internet (20 answers total) 4 users marked this as a favorite
 
Do you need GUI programs in Linux? Why not have windows workstations and a shared Linux server?
posted by demiurge at 9:54 AM on October 27, 2010


I admin a bunch of geophysics PhDs, and they tend to have one workstation of each, or dual-boot.

I don't know what apps you are going to run, but I have found less-than-stellar performance in virtualized environments when the apps are resource-hungry.

I don't have any preference between RH and Debian flavors, except Debian is my well-broken-in baseball glove, as it were. We have a bunch of both, and both work well.
posted by everichon at 10:05 AM on October 27, 2010


What kind of science? What programs? Are you a bioinformatics lab, doing development work, or a wet lab that just needs to munge stuff in a spreadsheet from time to time? These things matter.

web sites work better

Maybe 10 years ago, but that's not true today.


MS Office is necessary for collaborations with colleagues

Office is horrible for collaborative writing. It's not always simple to buck the trend, but try setting up shared google docs instead.
posted by chrisamiller at 10:10 AM on October 27, 2010


Virtualization tends to kill performance on anything with heavy I/O, especially disk writes, so it's not a good option for number crunching, if that's the main purpose of these computers. So virtualized Windows to run Office will work better than virtualized Linux to run MATLAB. If you do decide to go with Windows, make sure the version of Windows and the software you need to run can address all the RAM that the machine has; 32-bit versions top out at 4GB or less.

If you choose Linux, I'd ask around to see which distros are in wider use around your institution, as you'll be able to get more help troubleshooting. Also, some places have an "official" distro, with yum or apt-get repository that has proprietary scientific programs already packaged and licensed. That makes installation and updates a lot easier.

My own group is tied to Linux for some things (stats) and Windows for others (GIS), so I run both, with remote access. Like demiurge suggests, I think it's easiest to set up machines with a specific purpose, as a server, and grant access to those who need it, rather than come up with a one-sized-fits-all solution. Our researchers do writing, lighter number crunching, and email on their own laptops, which are whatever OS they prefer.
posted by bendybendy at 10:34 AM on October 27, 2010


In my lab (physics grad students) the general setup is that everyone gets a Linux workstation for everyday work, and then there's a few shared Windows boxes for whenever they're needed. This works for us because pretty much everything we need to do can be done in Linux, so it may not be right for you.
posted by auto-correct at 10:44 AM on October 27, 2010


This is a non-answer. But it sounds like you want to run Windows. "But science is changing... do we really need Linux anymore?"

You've suggested that either way you'll be missing some application support, and forced to run some kind of compatibility layer or emulation.

You also allude to two things that I don't fully understand:
1) MS Office is necessary for collaborations with colleagues.

Is this a format issue? If so, I'd think that Open Office should fill that gap.

2) Web sites work better.
There are very few sites that require internet explorer. Again, I'm not sure why this would be an issue.

It sounds like you'd prefer Windows, although from your arguments I'm not entirely sure what it offers you. Just a personal taste or comfort thing? If so, that's fine. To my ear, it sounds like either Linux or Windows would suit your needs, with some tweaking. If you prefer Windows and are willing to swallow the cost, go that route. If you want to save a buck, stick with Linux.
posted by Stagger Lee at 10:44 AM on October 27, 2010


In the lab I work in, I take the reverse approach, running a Linux box and virtualizing Windows when I really need some MS Office-specific feature. So, you might consider taking that approach instead, especially since MATLAB works just fine natively in Linux (maybe you already know that, sorry -- couldn't tell from the post). I also like this virtualization scheme because generally editing documents doesn't require as many resources as, e.g., analyzing data: you probably want to avoid adding overhead to tasks that are already computationally intensive. [On preview, what bendybendy said.]

I used CentOS briefly around 2008ish, then switched to Ubuntu after a few months. YMMV, but I think the packages I used were more up to date in Ubuntu, and there was better informal support (e.g. online forums). Also, there's the magic of apt-get with a Debian-like distro; there's yum on CentOS/RHEL but I don't remember it being nearly as useful. Maybe it's better now.

In this day and age I've never really seen a website that didn't work properly under Linux -- even Flash seems to work just fine -- with the sole exception being Netflix streaming, and then only because it relies on a particular Microsoft-specific DRM system. (This could actually be a fringe benefit for going with Linux at work, now that I think about it.)
posted by en forme de poire at 10:48 AM on October 27, 2010


Maybe it's just the scientists I know, but have you considered the Mac? It runs Office and Adobe products natively, while being very Unixy under the hood when you need it.
posted by advicepig at 10:50 AM on October 27, 2010


But I should say that as other people have also mentioned, in my lab most "serious" computation tends to get handed off to Linux servers and clusters, so people are free to use whatever OS they feel the most comfortable with. Fortunately, most of the tools we use are either open-source or cross-platform.

There's one other tiny benefit to using Linux on both ends, and that is copying binaries: if your servers/cluster nodes have the same architecture as your workstations, and your binaries are statically linked, you can often get away with copying them rather than recompiling. (This can be a time-saver if, for example, one of your programs requires a specific version of gcc, or a specific Fortran compiler that isn't installed globally on the cluster.)
posted by en forme de poire at 11:00 AM on October 27, 2010


it really really depends on the field.

I work in the biochemistry/structural biology field, where virtually all the major software is written for Linux. I actually think the improved usability of Linux (through distros such as Ubuntu/Debian) makes it a better choice now than it was several years ago.

MATLAB has a reasonably well supported Linux version.

Office - I have never had a problem with OpenOffice in writing papers and the like. Bibus is a very underrated reference manager for *nix systems, if that is what your issue is.

Adobe - well.. again. what programs are you using specifically?
posted by TheOtherGuy at 11:02 AM on October 27, 2010


This is what my lab does. It works pretty well.

We have a huge awesome cluster that runs some flavor of Linux. That's where Matlab and other Unix programs live. Everyone in the lab has a Mac (which I'm surprised to see hasn't been mentioned), and many people have a local copy of Matlab as well. We can either SSH in to use Matlab from the command line, or get a GUI with X11. Either way, we can put through multiprocessor jobs on the 100-node cluster, or we can use the Parallel Computing Toolbox locally on our 4- to 12-core machines.

Our SVN repositories also live on the cluster, and everyone has a local checkout of things they're working on. We have repositories for everything-- in-house Matlab toolboxes, experiment code, hand-entered data, as well as grants and papers, written in LaTeX. (SVN+LaTeX is a much better collaborative writing tool than Office or even Google Docs. Steep learning curve, though.)

And of course, we also have local copies of Office and/or iWork, as well as Adobe Suite and the like.

On preview: okay, one other person mentioned Mac. But I'm surprised it wasn't mentioned earlier/more frequently.
posted by supercres at 11:07 AM on October 27, 2010


For ease of management, I recommend using Windows 7 with Linux running in VirtualBox within it. You can go the other way around with VirtualBox as well, which performs fairly nicely. For optimal performance, pick the main OS based on which one will be doing the heaviest processing and need the most memory. Dual boot is another viable option, especially if these PCs are being used primarily as instruments/tools and do not need to be secured to enterprise standards.
posted by samsara at 11:30 AM on October 27, 2010


A coworker filled in my details: our cluster is 26 machines attached to a RAID. Each machine has 2 GB of RAM and 8 cores, so 200+ processors. The whole thing runs Red Hat Linux.
posted by supercres at 11:41 AM on October 27, 2010


I'd use Linux as the host OS and Windows as the Virtual OS.
posted by delmoi at 12:34 PM on October 27, 2010 [2 favorites]


GIS and Matlab and R (and other custom things) on Windows XP. Windows isn't a problem for us.
posted by bonehead at 12:39 PM on October 27, 2010


supercres describes almost the exact situation in my lab. Lots of linux desktops that form a big cluster (with only a few dedicated high-spec machines and RAID data storage systems that are in a separate server area). All data and CPU power is accessible via infiniband or gigabit connections. Most people work locally on their desktop, while large jobs are distributed over all the nodes, running in the background. These jobs can be week-long jobs distributed over hundreds of cores, requiring lots of memory (CFD). Usually you don't even notice the number-crunching that is going on in the background when you're writing up a paper.

We have a single Windows client server that we can all access from linux to do "true" Windows Office stuff, otherwise people use Open Office.

We also have a standardized distribution (Scientific Linux) and installation image, with most of the larger applications provided by a single server to keep maintenance easy.

Our lab has always been very "DIY" and the drawback for our present routine is that we cannot rely on university IT support (I think they recently hired one linux expert...).
posted by swordfishtrombones at 12:44 PM on October 27, 2010


swordfishtrombones' setup is similar to mine. A small Windows client server with the Windows apps is adequate. We spend a fairly small amount of time wanting to do Adobe Anything or needing real MS Office. I (and others) have laptops which dual-boot Win/OSX or Win/Lin.
posted by a robot made out of meat at 1:40 PM on October 27, 2010


But science is changing... do we really need Linux anymore?

"Science" may be changing, but that doesn't mean that people are racing to rewrite all their software for Windows 7. (I also have yet to run into any scientific website that "works better" only on Windows or that requires Internet Explorer.)

What kind of science? And are we talking about a cluster, or personal computers?

Most big, complicated bioinformatics and structural bio software that I know of runs on *nix, usually in devoted clusters with good hardware (lots of RAM, lots of powerful processors, RAID arrays for data, etc.). Running computation-hungry programs via virtualization seems like a great way to kill performance. Usually it's possible to dial into these clusters and do at least some of the work remotely. For most of the less computer-intensive stuff - Matlab or ChemDraw, or MacVector or VectorNTI, not to mention MS Office & Adobe stuff - people work on their own laptops or desktops, since there are options available on Windows and Mac (and sometimes non-Mac *nix). Most labs I've worked for also have a common machine or two to make sure that everyone can access common programs that they may not have on their desktop (or to access programs that there is no site license for).

(Oh, additionally, individual pieces of equipment may or may not have specific OS requirements due to their custom-made software - I've run into DOS, every version of Windows from 3.1 on, various iterations of the Mac OS, and plenty of *nix installations. However, pretty much everything will export data in some sort of number, text, or image form, so this generally does not dictate what operating system lab members use.)
posted by ubersturm at 2:09 PM on October 27, 2010


Our neuroimaging lab has an ostensibly badass (but poorly administered) cluster running Linux with KDE, and then 5 pretty iMacs. Macs are nice in the lab because you can get down and dirty with 'em if you know how, but the people who come into the lab less frequently can still use them without having to bash a command line or use a particularly esoteric GUI. The lab Macs are also really easy to set up for remote access/management, so people who don't necessarily know a ton about computers can still get files they need from afar.

This may be a unique setup, in that the number of people who occasionally use the lab boxen is large, while the number of people who are familiar enough with the computational aspects of our work (and hence do the bulk of the analyses) is fairly limited. However, having Macs is definitely ideal for this sort of situation, since those who know how can do anything they need to, and those who don't can still use the computers for more routine stuff without a hassle. Plus, they're pretty.
posted by solipsophistocracy at 6:48 PM on October 27, 2010


Response by poster: Thanks for the great responses everyone. I really appreciate this range of perspectives.

I am hearing several times that virtual machines are not good for serious number crunching. Based on that, Linux as the base OS with virtual machines to run Windows when needed sounds like the way to go.

Yes, Macs are nice and there are lots of people in my field who use them. So that would be another alternative, but I've already made investments in the Lin/Win direction.

Thanks again for your input.
posted by tabulem at 6:25 PM on October 29, 2010

