What's the best available CPU/RAM right now for my needs?
November 1, 2011 5:28 PM
I need to build a server that runs a CPU-bound, memory-intensive task that fully utilizes 3 cores and 24GB of RAM, but never needs to read/write from the disk other than loading the OS. What's the fastest CPU/RAM combination available for this task? I don't need server-level data integrity, but I don't mind buying a server-class CPU if it's the fastest option.
I know that "CPU-bound, memory-intensive" might not be precise enough to answer the question perfectly, so if there's not a clear winner, then I'd be interested in several options. I'd also like to spend under $2,500 on the CPU and RAM combined.
Response by poster: My budget is approx. $3,000 - but I probably won't need to buy a chassis/drive/PS, just a motherboard.
posted by helios at 5:42 PM on November 1, 2011
Can you tell us a little more about the task, specifically the timeframe and frequency: for example, is this a one-time thing, or something you anticipate being an ongoing task running a maxed out CPU for months/years at a time, or some recurring task that will take up a pantload of CPU for a few hours a week/month? The reason I ask is that Amazon EC2 is very well-suited for stuff like this, especially when using the GPU option. $2500 would get you a hell of a lot of processor time there.
posted by deadmessenger at 7:23 PM on November 1, 2011
Response by poster: It's an ongoing task running a maxed out CPU and RAM indefinitely.
Cloud computing is not an option, nor is using GPUs.
I just need the fastest per-core computing power that's available, keeping in mind that memory access is important too. I should mention that it's actually 3 single-threaded processes, one on each core. So raw integer, floating-point, and memory performance are all that matter.
Right now I'm using an 18-month-old X3460 with DDR3-1333 ECC RAM. I'm hoping to increase performance by 50%.
odinsdream, I'm not convinced that's what I'm looking for. For raw CPUmark scores, it looks like an i7 980X is 66% faster than the 4130, and the 4130 is faster than the 4122.
My current plan was to use an i7-990X with DDR3-1600 RAM. It seems like the fastest possible option for my needs, but it's been a long time since I've done this, so I wanted to make sure that I'm not missing some important details, like multi-channel RAM capability or some features of the new Intel/AMD chipsets that would be relevant.
posted by helios at 8:42 PM on November 1, 2011
If you've got a specific task that it's already doing, have you profiled the execution? Because when you say, "CPU bound" but then follow it up with "memory intensive," those are two entirely separate things, as are integer vs. floating point. Is it bound by integer instructions, floating point instructions or by memory bandwidth? Typically the biggest performance gain you'll get is not from installing faster CPUs but rather by finding your specific program's bottlenecks.
If it were me, I'd run the GNU Profiler, gprof (I'm sure there's a Windows equivalent) and see what functions are taking the most time. If it's floating point stuff, try optimizing it for the SSE/etc. vector operations. If it's integer performance, try caching expensive results, removing branches, unrolling your loops, etc. Modern CPUs have deep pipelines of a dozen stages or more, so mispredicted if statements tend to throw them for a loop (pun intended). If memory bandwidth is the issue, try to access data sequentially to take advantage of cache locality. All this stuff takes less time to try than it will take the new CPU to arrive.
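To make the profiling step concrete: gprof targets compiled code, and the thread never says what language the application is written in, so this is only a sketch of the same "find the hot function" idea in Python, using the standard library's cProfile. The functions `workload`, `slow_sum_of_squares`, and `cheap_setup` are hypothetical stand-ins for the real task, not anything from the poster's application.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Hypothetical hot loop standing in for the real workload.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup(n):
    # Hypothetical cheap step, so the profile has something to contrast.
    return list(range(n))


def workload():
    cheap_setup(10_000)
    return slow_sum_of_squares(200_000)


def profile_top_functions(func, limit=5):
    """Run func under cProfile; return (func's result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_top_functions(workload)
    print(report)
```

The report lists functions by cumulative time; whichever one dominates is the place to start looking before spending money on hardware.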
Also, if you have three threads already, are you sure you can't run on six? Twelve? More parallelization of the slow parts could help a lot.
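How much extra cores would help depends on how much of the run time can actually be spread across them. A rough way to reason about it is Amdahl's law, sketched below in Python. The 90% parallel fraction is a made-up number for illustration only; the real figure would have to come from profiling.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical: assume 90% of the run time parallelizes cleanly.
    for cores in (3, 6, 12):
        print(cores, "cores:", round(amdahl_speedup(0.9, cores), 2), "x")
```

Under that assumption, going from 3 to 12 cores a bit more than doubles throughput rather than quadrupling it, because the serial 10% sets a ceiling of 10x no matter how many cores are added.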
The CPUmark benchmark, as described on their site, takes a lot of factors into account. It's not likely that your application, which is quite unusual compared to most application software, would have much correlation with the CPUmark score. Sure, GHz will help anything.
posted by wnissen at 9:46 PM on November 1, 2011 [1 favorite]
Intel's VTune will also tell you a lot about your code.
posted by pharm at 3:50 AM on November 2, 2011
Response by poster: Thanks for the profiling advice, but that's not really what I was asking. I definitely want to spend time profiling, optimizing, and even potentially rewriting the application - but in the short term, I have to buy a new server anyway (the old one is being re-purposed), so I wanted to make the smartest educated guess in terms of hardware without actually benchmarking the app on every possible combination.
Typically making such a decision involves hours of searching various websites for benchmarks, but I was hoping that someone on the site might already be up-to-date on this info and have advice (for example, I was planning on buying an i7-990X, but if there's another processor that easily outperforms it in almost every benchmark, I'd like to know about that, etc.)
posted by helios at 12:34 AM on November 3, 2011
The trouble is, without profiling it's impossible to give accurate advice. You describe the code as "CPU" bound, but that might really be memory bandwidth bound, in which case buying a 12 way CPU won't help if it has the same memory interface as a 4-way one.
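As a back-of-the-envelope illustration of why the memory interface matters, theoretical peak DRAM bandwidth is roughly transfer rate times 8 bytes per transfer times the number of channels. The sketch below (Python) plugs in the two setups mentioned in this thread. The channel counts are assumptions not stated in the thread: dual-channel for the X3460 platform and triple-channel for the i7-990X platform. These are upper bounds, not what any one process will actually see.

```python
def peak_bandwidth_gb_s(transfers_mt_s, channels, bus_bytes=8):
    """Theoretical peak = transfers per second x bytes per transfer x channels."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    # Assumed configurations (channel counts are not given in the thread).
    current = peak_bandwidth_gb_s(1333, channels=2)  # DDR3-1333, dual-channel
    planned = peak_bandwidth_gb_s(1600, channels=3)  # DDR3-1600, triple-channel
    print(f"current: {current:.1f} GB/s")
    print(f"planned: {planned:.1f} GB/s")
    print(f"ratio:   {planned / current:.2f}x")
```

If the workload really is bandwidth bound, that ratio matters more than the core count; if it is compute bound, it barely matters at all. Only profiling tells you which.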
posted by pharm at 2:10 PM on November 3, 2011
What everyone has been trying to tell you is there's no such thing as a generic Fast Computer any more. Fifteen years ago, you could point to the one with the highest megahertz (and the biggest price tag) and say, "That's the fastest PC chip in the whole world." That was because memory latency and bandwidth were nearly keeping up with CPU speed, and hard disks weren't nearly the bottleneck they are now.
Nowadays, the performance of a low-end quad core system with a well-threaded application can blow away even the fastest nitrogen-cooled single core out there. And vice versa! It all depends on what you're trying to do. If you don't know what you're limited by (and even the most experienced programmers will tell you that they are constantly surprised by bottlenecks) then yeah, find a broad CPU benchmark and pick the fastest one that fits your budget. Since the app is threaded already, my instinct would be for more, slower cores, as long as you wouldn't give up too much memory bandwidth.
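For the memory bandwidth side, a crude check on any machine you can get your hands on is to time a large in-memory copy. The sketch below is Python with only the standard library; the function name and buffer size are arbitrary choices for illustration, and it measures a single-threaded copy, which is not the same thing as a proper benchmark like STREAM. Treat the number as a rough comparison between machines, nothing more.

```python
import time


def copy_bandwidth_gb_s(size_bytes=256 * 1024 * 1024, repeats=5):
    """Best-of-N bandwidth of one large buffer copy, in GB/s (rough estimate)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy performed in C, not byte by byte in Python
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Count both the read of src and the write of dst as traffic.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gb_s():.1f} GB/s")
```

The buffers are deliberately much larger than any CPU cache so the copy has to go out to RAM; taking the best of several runs filters out the first pass, which also pays for page faults.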
posted by wnissen at 1:05 PM on November 14, 2011
This thread is closed to new comments.
What's your budget for the whole thing, chassis and all?
posted by mhoye at 5:31 PM on November 1, 2011