ECC RAM for my personal server?
January 1, 2009 7:53 PM

ECC memory on a personal Linux server?

My friends and I are upgrading our server that hosts a few VMs that run under VMware Server 2.0 on an Ubuntu 8.10 box. We are buying this small ASUS 1U server (http://tinyurl.com/9dr3ee) from NewEgg and it happens to support ECC or non-ECC memory...

When I build my custom whitebox servers I almost never use ECC memory. When I do things in a corporate environment I usually have no choice and always get ECC memory for servers.

Here are my questions:

1. Is it worth getting ECC memory for a personal server? I haven't had any issues in the past (that I know of) with non-ECC based servers but now I am kinda freaking out over not getting it.

2. It's not clear what RAM I would get for it if I do go the ECC route. NewEgg has this RAM (http://tinyurl.com/9s82dk) but then NewEgg says it is for HP/Compaq systems.

Lazyweb..plz help!
posted by cowmix to Computers & Internet (12 answers total)
 
If you're doing something "important" with it, where data integrity is paramount, then go ECC. For what would pass as personal or recreational use, it's an expensive extravagance. If you're planning to run websites off it, and do some screwing around, I wouldn't bother with the ECC. If this is going to be some huge database or application server, get thee to the ECC-Mart immediately.

But it sounds like your use falls into the non-ECC side of things quite firmly. Do ensure you have a good backup routine regardless... but that is of course good advice no matter what.
posted by barc0001 at 8:10 PM on January 1, 2009


I understand the purpose of ECC RAM, but not the point. I mean, I've never noticed any issue resulting from cosmic ray bit flipping. Even on personal compute/compile servers with multiple year uptimes. Not to say that bits didn't flip, but they certainly didn't matter.

I've always assumed that ECC existed for space missions, hardened military machines, and for soaking the PHB for an extra 50%.
posted by Netzapper at 8:16 PM on January 1, 2009



I wouldn't invest the extra time in researching it or the money, if it in fact costs more. But then I run stuff I find in dumpsters. I think all the kernel does with it is send the codes to syslog. This is more insurance with regard to HA than a way of preventing data loss or increasing performance. Run memtest on the box and be happy.
posted by ezekieldas at 8:16 PM on January 1, 2009
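
On the syslog point: on Linux, when an EDAC driver for the memory controller is loaded, the kernel also keeps running counts of corrected and uncorrected errors in sysfs. Here is a minimal Python sketch for reading them. The function name `read_edac_counts` is made up for illustration, and the sketch assumes the usual `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count` layout, which will be absent on a box without ECC or without an EDAC driver.

```python
from pathlib import Path

# Assumed default location of the Linux EDAC memory-controller counters.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mc* directory.

    "ce" is the corrected-error count and "ue" the uncorrected-error count.
    Returns an empty dict when the EDAC directory does not exist.
    """
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("no EDAC memory controllers reported")
    for name, c in found.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected count that keeps climbing usually points at a DIMM worth replacing, which is the kind of signal a non-ECC box cannot give you.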


I think all the kernel does with it is send the codes to syslog.

Single-bit errors should be automatically (and transparently) detected and corrected. Since single-bit errors are theoretically what cosmic rays and background EMR produce, it's useful in this situation. Except that I've never actually seen that situation.
posted by Netzapper at 8:32 PM on January 1, 2009
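
To make the "detected and corrected" part concrete, here is a scaled-down sketch of the single-error-correct, double-error-detect (SECDED) idea in Python. It is an illustration, not what any particular memory controller implements: real ECC DIMMs typically protect 64 data bits with 8 check bits, while this toy protects 4 data bits with 4 check bits using an extended Hamming code. The names `encode` and `decode` are invented for the example.

```python
from functools import reduce
from operator import xor


def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Positions 1, 2 and 4 hold Hamming parity bits, positions 3, 5, 6 and 7
    hold the data bits, and position 0 holds an overall parity bit.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = reduce(xor, code[1:])  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = reduce(xor, code)  # 0 when total parity is still even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, located at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two bits flipped, cannot fix.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1  # simulate one flipped bit
    print(decode(word))
    word[6] ^= 1  # a second flip in the same word
    print(decode(word))
```

The design point is that one flipped bit per word is repaired silently, while two flipped bits in the same word are only reported.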


In my experience running farms of a few thousand machines here and there, you're more likely to have Ext3 silently puke all over you than to have an ECC-correctable problem. That said, it does have its place in situations where 100% uptime is critical (telco, banking, etc.) but you're not it.
posted by kcm at 8:34 PM on January 1, 2009


Compare the cost of a fully loaded board with ECC vs non-ECC RAM, then consider the opportunity cost of the difference. I'd put the money into a better power supply or better drives (your most likely points of failure) before I'd put it in ECC RAM.
posted by dws at 8:51 PM on January 1, 2009


As memory becomes more dense and has smaller features, it becomes more susceptible to cosmic ray upset. If data integrity is important, then you should use ECC RAM. However, ECC RAM slightly reduces performance because each time you modify data that is not aligned on a 64-bit boundary, the memory unit must read out the entire 64-bit line, modify the necessary bytes and recalculate the new ECC. This may occur on the beginning and end of a data burst.
posted by JackFlash at 10:48 PM on January 1, 2009
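
The read-modify-write cost described in the comment above can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the class `EccWordStore` is hypothetical, and `check_bits` is a stand-in code (a plain XOR of the eight data bytes), not the SECDED code real hardware uses. The only point is that the check bits depend on the whole 64-bit word, so changing one byte forces a read of the other seven.

```python
WORD_BYTES = 8  # 64 data bits per ECC word


def check_bits(word):
    """Stand-in check code: XOR of the eight data bytes (NOT real SECDED)."""
    out = 0
    for b in word.to_bytes(WORD_BYTES, "little"):
        out ^= b
    return out


class EccWordStore:
    """Toy memory that keeps (data, check) per 64-bit word and counts accesses."""

    def __init__(self, n_words):
        self.words = [(0, check_bits(0))] * n_words
        self.reads = 0
        self.writes = 0

    def _read_word(self, index):
        self.reads += 1
        data, check = self.words[index]
        if check_bits(data) != check:
            raise ValueError(f"check mismatch in word {index}")
        return data

    def _write_word(self, index, data):
        self.writes += 1
        self.words[index] = (data, check_bits(data))

    def read(self, index):
        return self._read_word(index)

    def write_full(self, index, data):
        """Full-word write: check bits are computed from the new data alone."""
        self._write_word(index, data & (2**64 - 1))

    def write_byte(self, index, offset, value):
        """Partial write: read the old word, merge one byte, re-encode, write."""
        old = self._read_word(index)
        shift = 8 * offset
        merged = (old & ~(0xFF << shift)) | ((value & 0xFF) << shift)
        self._write_word(index, merged)


if __name__ == "__main__":
    mem = EccWordStore(4)
    mem.write_full(0, 0x1122334455667788)
    mem.write_byte(0, 0, 0xAA)
    print(hex(mem.read(0)), "reads:", mem.reads, "writes:", mem.writes)
```

In this model a full-word write costs one access and a single-byte write costs two. How much that matters on a real machine depends on how often partial writes actually reach the memory.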


What dws said. It's probably a "nice to have," but it's expensive and there are almost certainly better things you could be doing with the money. (A big UPS, dual redundant power supplies, and a heavily redundant disk system with lots of hotspares will all probably save your butt much more quickly than money spent on ECC memory will.)

I think ECC is cool stuff, but I've had servers both with and without it, and I've never had its presence or absence do anything, either way.
posted by Kadin2048 at 12:19 AM on January 2, 2009


Isn't putting a tinfoil hat on your server cheaper than ECC? Or a lead vest? To be honest -- of the 50 servers I have managed, the only 'memory' problems were from ECC machines. Odd.
posted by SirStan at 6:11 AM on January 2, 2009


Ever had a server freak out for what apparently is no reason at all? ECC does a lot to stop that from happening.

My feeling though, is that if you're going to pay for high end shit to try and save you in the "once a year" implausible worst case scenario event, you might as well go for serious levels of redundancy instead. Usually you get better overall performance, and as a result of designing redundancy in, tend to get more scalable service architectures.
posted by Freen at 9:35 AM on January 2, 2009


Netzapper, how do you know you've never seen a problem caused by a random single-bit error (or the accumulation thereof)? Are you saying that every problem you've had with your systems has been explained by some other cause (failing CPU or motherboard, identified software bug, bad HDD, etc.)? If so, then you are either an incredibly rigorous sysadmin, or have a lot more luck with computers than anyone I know. SirStan, how do you know that your non-ECC machines haven't been having transient memory errors that you've never detected as such? Kcm, do you know that EXT3 isn't puking all over itself because data structures were corrupted by accumulated undetected single-bit errors?

The point of ECC memory is that without it, you don't know if you've had data corrupted because of a soft memory error, something that grows more likely as system memories grow. In addition, the events that cause a single-bit error will often flip bits in adjacent memory cells, so any single-bit soft error is probably just one of many.

That's not to say that cowmix should drop money on ECC for his application, but it should be understood that 1) ECC isn't so much about uptime as data integrity, 2) Without ECC there is no real way of measuring the rate of single bit errors.

I built some cheap servers at work recently. I made sure the CPUs and MBs I got supported ECC memory, but I only used it in the machine that I stuffed full of storage and hosted long-running VMs on.

NewEgg sells lots of ECC-capable memory; you can use the power search to narrow it down to what your system requires.
posted by Good Brain at 10:30 AM on January 2, 2009


Good Brain: You have a point, I don't know that I've never had a soft error. But, if you'll look at what I said, it's that I've never noticed one. I've never had a machine crash under unrepeatable circumstances (regardless of whether or not I knew the underlying cause), and no data that I've cared about has appeared corrupted. I'm certainly not rigorous in my admin, so I'll have to assume it's luck.
posted by Netzapper at 12:07 PM on January 2, 2009

