Load balancing versus robustness
September 24, 2007 8:47 AM

How important is load balancing versus server power for high volume page serving? That is, is it better to have a really robust server, or two load balanced semi-robust servers? Or is load balancing just a precautionary measure in case of hardware failure, one that doesn't significantly affect server response otherwise?
posted by destro to Technology (13 answers total) 3 users marked this as a favorite
 
Best answer: two load balanced servers.

It handles failure better, upgrades more easily (just add one more server), and makes better use of system resources.
posted by bitdamaged at 8:50 AM on September 24, 2007


oh and generally more cost effective.
posted by bitdamaged at 8:51 AM on September 24, 2007


I work at a major hospital that has about 300 servers, and we use load balanced servers for high volume page serving, because page requests get distributed evenly amongst the hosting servers. It's also good because if one server in the group goes down, you have another one (or more) ready to go.

So in answer to your question, load balancing is good for both load distribution and server response, as well as for redundancy.
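For illustration, the simplest form of that even distribution is plain round robin. A minimal sketch in Python (the addresses are made up):

    import itertools

    # Hypothetical backend pool; addresses invented for illustration.
    BACKENDS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]

    _pool = itertools.cycle(BACKENDS)

    def pick_backend():
        """Return the next backend in round-robin order."""
        return next(_pool)

    # Six requests land on servers 1, 2, 3, 1, 2, 3.
    for _ in range(6):
        print(pick_backend())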
posted by omnipotentq at 8:56 AM on September 24, 2007


Best answer: Load balanced servers have higher I/O throughput, better redundancy and are generally more cost effective. Serving web pages is trivially distributable and so is an ideal application. Generally you only need larger servers for tasks that are less easily divisible, like large databases and applications like email, which are really just big databases.

I used to work for a web-based SaaS company and it was horizontal scaling all the way - many cheap Dell 1U servers on the web layer and fewer, larger servers on the db layer.
posted by GuyZero at 9:06 AM on September 24, 2007


Ask Google:
To deal with the more than 10 billion Web pages and tens of terabytes of information on Google's servers, the company combines cheap machines with plenty of redundancy, Hoelzle said. Its commodity servers cost around $1,000 apiece, and Google's architecture places them into interconnected nodes.
posted by jacobian at 9:30 AM on September 24, 2007


Load balancing also makes patching much easier, since you don't have to take down your site when you're updating. Ditto for backups.
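The usual pattern is a rolling update. A rough sketch, where drain, update and restore are placeholders for whatever your balancer and deploy tooling actually provide:

    # Hypothetical hostnames; the three callables stand in for your
    # balancer's pool controls and your patch/deploy process.
    BACKENDS = ["web1", "web2"]

    def rolling_update(backends, drain, update, restore):
        """Patch one server at a time so the site as a whole never goes down."""
        for host in backends:
            drain(host)    # pull it out of the balancer pool
            update(host)   # patch or reboot while traffic flows to the others
            restore(host)  # put it back before touching the next one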
posted by Eddie Mars at 9:49 AM on September 24, 2007


Most large websites you visit are load balancers front-ending server farms.

A modern load balancer is not just a switch that sends requests in multiple directions; it also has all sorts of built-in features: health monitoring, fast failover, virtualization layers, caching, session management, connection handling and so on. That is, a load balancer can be useful even with only one server behind it.
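The health-monitoring piece, roughly - a sketch, not any vendor's implementation, and the addresses are made up:

    import urllib.request
    import urllib.error

    # Hypothetical pool; a real balancer probes these on a timer.
    BACKENDS = ["http://10.0.0.1/", "http://10.0.0.2/", "http://10.0.0.3/"]

    def healthy(url, timeout=1.0):
        """Probe a backend; an HTTP answer means alive, silence means down."""
        try:
            urllib.request.urlopen(url, timeout=timeout)
            return True
        except urllib.error.HTTPError as e:
            return e.code < 500  # it answered; a 4xx still means the box is up
        except OSError:
            return False

    def live_pool():
        """Only hand out backends that passed the probe."""
        return [b for b in BACKENDS if healthy(b)]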

The major disadvantage of multiple servers is synchronization. If each server hosts the same website, then any update to that site has to be made on all the servers.

There is also a class of applications where you want as big a server as possible. Databases are a good example. Although there are ways to split a database among multiple machines, in most cases the data is so interdependent that you don't want the overhead of machines synchronizing with each other across the network.

Here too, a load balancer can be useful. If you can send different clients to different servers, then you can split your servers based on some criterion, such as geographic region or whether the request is a read-only request.
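A sketch of that read/write split (the hostnames are hypothetical):

    import random

    # Hypothetical hostnames for illustration.
    READ_REPLICAS = ["db-read-1", "db-read-2"]
    WRITE_MASTER = "db-master"

    def route(http_method):
        """Send read-only methods to any replica, everything else to the master."""
        if http_method in ("GET", "HEAD"):
            return random.choice(READ_REPLICAS)
        return WRITE_MASTER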

In summary: load balancers=good
posted by vacapinta at 9:52 AM on September 24, 2007


Load balancers are good AND don't have to break the bank. Check out Coyote Point Systems' offerings. We're using them and they're *great*.

You have to give a lot of thought to your architecture, as Vacapinta says -- in my case at the day job, it took a *complete* rewrite.
posted by SpecialK at 9:59 AM on September 24, 2007


You have to give a lot of thought to your architecture

The basic issue is where user state gets stored. You have three choices:

1) in the browser. Technically possible, but cookies only let you store a few K, so there's not much space. In reality, every web app out there simply puts a magic value into the cookie and looks up user state in one of the next two places...

2) the web server.

3) the database (or somewhere that all web servers can access)

#2 is fast, but if a machine falls out of the farm, the user loses his state. Also, you have to use "sticky" load balancing to ensure that all of a user's requests go to the same web machine. #3 allows you to use true stateless load balancing, where every request is load balanced, but you have to know this in advance and put the user data into the db or some separate shared location.
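A sketch of option #3, where the dict stands in for a store every web server can reach (in practice a db table, memcached or similar):

    import uuid

    # Stand-in for a shared store; a plain dict only lives in one process,
    # which is exactly why real deployments use a db or memcached here.
    shared_sessions = {}

    def new_session():
        """Give the browser only a magic value; keep the real state server-side."""
        sid = uuid.uuid4().hex
        shared_sessions[sid] = {}  # state lives here, not in the cookie
        return sid                 # this value goes out in a Set-Cookie header

    def load_session(sid):
        """Any web server in the farm can look the user back up by that value."""
        return shared_sessions.get(sid)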

Having said that, if it's just straight pages then there is no user state and everything is nice and simple.
posted by GuyZero at 10:43 AM on September 24, 2007


GuyZero: There's also layer 7 load balancing (layer 7 of the OSI model, i.e. the application layer) that will let you direct the same user back to the same machine for content that needs to be tied to the user's session.
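The usual trick is to hash something stable in the request so the same session always lands on the same backend. A sketch (the pool is hypothetical):

    import hashlib

    BACKENDS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]  # hypothetical pool

    def sticky_backend(session_cookie):
        """Same cookie value -> same backend, on every request."""
        digest = hashlib.md5(session_cookie.encode()).digest()
        return BACKENDS[digest[0] % len(BACKENDS)]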
posted by SpecialK at 11:27 AM on September 24, 2007


Consistency: Data served to all requesters is always in sync; two requests can never contradict each other. State is maintained perfectly.

Availability: When one node fails, the cluster as a whole is not affected. Single points of failure are reduced or eliminated.

Partition tolerance: The cluster can be split into nearly arbitrary subgroups that are individually self-contained and cannot necessarily see all other partitions/hosts.

Pick two - this trade-off is the CAP theorem. Actually, they're more properly a sliding scale, but in any case you can't have all three at once. And the closer to perfection each factor gets, the higher the price goes. Exponentially.

In general, I prefer more servers to fewer servers, but I would say that. How Amazon Does It.
posted by Skorgu at 12:12 PM on September 24, 2007


Skorgu: That applies to transactional systems. If I have a read-only application, I can get all three.
posted by vacapinta at 12:27 PM on September 24, 2007


Well, yes, but truly read-only apps are incredibly rare. Even with a write-once, read-many model like a blog, there's still room to serve outdated content if the scope of the cluster is large enough.

True static content serving is a solved problem anyway, just throw servers + bandwidth/Akamai/S3 at it until you run out of money.
posted by Skorgu at 1:21 PM on September 24, 2007

