how is proper load-balancing done these days?
April 8, 2008 1:54 PM   Subscribe

What are large websites using these days to load-balance their traffic?

I have been using round-robin DNS on my site which has five front-end web servers. I am beginning to realize why this is not a good idea - first of all, the first IP address in the round-robin chain seems to get a disproportionately high amount of traffic, and secondly, if one of the servers goes down, then 1/5 of my users can't access the site until I either fix the problem or change the DNS and allow it to re-propagate.

I have realized that many high-volume sites that I visit have a DNS entry that only resolves to a single IP address. So what are they doing differently?

I'm assuming that they are using a "load balancing" device, but is this just a reverse proxy server running Squid or Apache, which would itself be as vulnerable to hardware faults as any other server?

Or are there specific high-availability "load balancing" devices (like something Cisco would make) which automatically act as the front-end for all servers, taking a server out of the loop if it stops responding and/or notifying me by mail? If so, where can I find these devices?
posted by helios to Computers & Internet (16 answers total) 9 users marked this as a favorite
We use LVS. It works by rewriting IP headers on the fly, and will drop a server out of the rotation almost instantly when there is a failure. The LVS server has a backup that will take over in case the main server fails, and it can pretty easily handle all the load we throw at it.
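(For the curious: a minimal LVS setup along these lines is configured with the `ipvsadm` tool. The addresses below are hypothetical; this is just a sketch of the shape of it, not our actual config.)

```shell
# Create a virtual service on the public VIP, round-robin scheduling
ipvsadm -A -t 203.0.113.10:80 -s rr

# Add the real web servers behind it, in direct-routing (gateway) mode
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.1:80 -g
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.2:80 -g
```

A health-check daemon (e.g. ldirectord or keepalived) is what actually removes a dead real server from the table.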
posted by aspo at 1:59 PM on April 8, 2008

Everybody's got a different technical solution, but the basic theory goes something like this:

#1: have a common IP address that everyone hits from the domain name;

#2: have a pile of servers that can be queried internally for status;

#3: have a mechanism at the common IP address that does nothing except route traffic (either round-robin or based on current/average load) to a box that has recently been verified as active via the internal status query.

How many levels/machines and the specific implementation you use isn't really the important part; the important part is that part of the system is dedicated to routing users to boxes known to be live.
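A minimal sketch of that three-step scheme in Python (the server names and the health-check callable are purely illustrative - real balancers check status out-of-band rather than on every request):

```python
import itertools

class HealthCheckedBalancer:
    """Round-robin over a pool of backends, skipping any that fail
    their health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend -> bool
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        # Try each backend at most once per pick; fail if none are live.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends")

# Usage: pretend 10.0.0.2 is down; it never gets picked.
down = {"10.0.0.2"}
lb = HealthCheckedBalancer(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    is_healthy=lambda b: b not in down,
)
print([lb.pick() for _ in range(4)])
```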
posted by davejay at 2:00 PM on April 8, 2008

Perlbal is a free option. Cisco and Citrix (Netscaler) also make devices, among many other vendors, that let you do balancing based on just about any scheme you can think up - as well as content switching (directing traffic for certain paths or types to certain backend pools, etc.), edge caching, SSL acceleration, etc. The balancing schemes can go from round robin to lowest average response time to least connection count, and so on.

Low five figures, to start, for the decent stuff. Try Perlbal (and memcached and MogileFS and ..). :)
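(A bare-bones Perlbal reverse-proxy config looks roughly like this - backend addresses are hypothetical, and `verify_backend` is what makes it check that a backend is actually alive before sending it a request:)

```
CREATE POOL web_pool
  POOL web_pool ADD 10.0.0.1:80
  POOL web_pool ADD 10.0.0.2:80

CREATE SERVICE balancer
  SET listen         = 0.0.0.0:80
  SET role           = reverse_proxy
  SET pool           = web_pool
  SET verify_backend = on
ENABLE balancer
```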
posted by kcm at 2:01 PM on April 8, 2008 [1 favorite]

(The other problem you haven't discussed is session stickiness - if you use sessions in your code, the same backend server needs to be handling the user for the duration of the session in the naive case. You can offload that handling to another layer, though, or float memcached/another virtualized solution on another content-switched pool to do that stuff.)
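(One simple way a balancer can do stickiness without shared state is to hash the session ID to a backend, so the same session always maps to the same server. A sketch, with made-up backend names - real appliances more often use a cookie or the source IP:)

```python
import hashlib

BACKENDS = ["app1", "app2", "app3"]  # hypothetical backend pool

def sticky_backend(session_id, backends=BACKENDS):
    """Map a session ID to the same backend every time by hashing it.

    Deterministic: no table of session -> server needs to be stored,
    but adding/removing a backend remaps most sessions (the downside).
    """
    digest = hashlib.sha256(session_id.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

# The same session always lands on the same server:
assert sticky_backend("user-42") == sticky_backend("user-42")
```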
posted by kcm at 2:02 PM on April 8, 2008

Foundry makes an appliance called a ServerIron which does load-balancing & failover, using the basic premise that davejay outlines. You have a Virtual IP which is mapped to x number of real IPs on real boxes.

Session preservation is something that your code must do; I'm not sure if any load-balancing appliance will do it for you (maybe there's one out there I haven't heard of). Foundry's appliance does SSL acceleration as well, which is useful but not strictly necessary.
posted by synaesthetichaze at 2:11 PM on April 8, 2008

Seconding LVS - it's not as feature-rich as some commercial appliances, but in the mode mentioned by aspo it works quite well, and since it only handles incoming packets, it can handle very large loads with little hardware.

As others have said - your application needs to be designed around the idea as well, depending on how you want to handle requests, session-data, etc.
posted by TravellingDen at 2:20 PM on April 8, 2008

We use BigIP F5, and it seems a common choice among our counterparts (big higher ed institutions). To give an idea of scale, we just went over 1 billion hits for this year a few weeks ago on our main application, which is load-balanced using cookie-injection and load monitoring on the backend, and it preserves session affinity. It offers a ton of features and a variety of load balancing options, and has a thriving online user community.
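(The cookie-injection idea mentioned above can be sketched like this: on a client's first request the balancer picks a backend and sets a cookie naming it, and later requests carrying that cookie go straight back to the same server. All names here are illustrative, not F5's actual mechanism:)

```python
import random

BACKENDS = ["web1", "web2", "web3"]  # hypothetical pool

def route(cookies):
    """Return (backend, cookie_to_set) for one request.

    cookies: dict of cookies sent by the client.
    """
    stuck = cookies.get("BACKEND")
    if stuck in BACKENDS:              # honor an existing, valid cookie
        return stuck, None
    choice = random.choice(BACKENDS)   # stand-in for a real load metric
    return choice, ("BACKEND", choice) # inject the cookie in the response

# First request gets a cookie; the follow-up sticks to the same server.
first, set_cookie = route({})
again, _ = route({"BACKEND": first})
assert again == first
```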
posted by idb at 2:30 PM on April 8, 2008

I used LVS successfully at a small site for a number of years.

At my current job, we are using a pair of F5 Big IP Local Traffic Managers. They are hideously expensive (I think they start at $15k each and go up), but actually work extremely well.
posted by alienzero at 2:35 PM on April 8, 2008

I'm not 100% sure how exactly it all works, but we have 2 load-balancers running something called 'pound' and then a number of web servers that it balances across. I *think* one of the load-balancers is a 'master' and it fails over to the other in the event of failure (both have the same IP?), but I just write the code that goes on the web servers...
posted by gregjones at 3:03 PM on April 8, 2008

We've used both Foundry and F5, prefer Foundry. Both work well.
posted by togdon at 3:12 PM on April 8, 2008

We use Akamai to load-balance between a few Citrix NetScalers, which in turn load-balance to our server clusters. This is not cheap, but it works amazingly well.
posted by atomly at 3:23 PM on April 8, 2008

Oh, also, both Akamai and the NetScaler are smart enough to take non-functioning devices out of rotation, though you should look at something like Nagios to mail you about any server downtime. Also, if your site is big enough, KeyNote is a good option to monitor site availability and speed worldwide.
posted by atomly at 3:25 PM on April 8, 2008

Try Perlbal (and memcached and MogileFS and ..). :)

Seconded. I work with the teams that maintained (and invented) a lot of this tech, and it just performs like crazy. Plus it's free, not counting the (not insignificant) investment in learning it. On the plus side, that puts you in a position to share infrastructure with a lot of the biggest sites on the web. You can read more about it in Linux Magazine, although a stupid registration is required.
posted by anildash at 5:40 PM on April 8, 2008

At my work, we use Coyote Point Systems load balancers in one rack, and some BSD servers with routing and hot failover set up for load balancing. The Coyote Point devices are pretty slick, and they do sticky sessions via a variety of magic. The BSD load balancers do sticky sessions based on the traffic source.

Behind the Coyote Point devices, we have two Xen virtual machines running Squid, and then three virtual machines running Apache behind *them*. All three Apache servers feed from the same content area, which is an OCFS2 partition on our Fibre Channel SAN. OCFS2 allows multiple agents to read and write without a lot of lock lag and other issues.

The Coyote Points are actually just BSD boxes on commodity hardware inside a 1U case, but they have a lot of tacked-on web access and SNMP stuff that is pretty cool... I especially like the SSL offloading, because then I don't have to deal with SSL at the server level.
posted by SpecialK at 6:29 PM on April 8, 2008

Two firms about which I have firsthand knowledge are both using F5 BigIP devices. As stated earlier, they are indeed very expensive but do work well.

There are also open source alternatives (such as LVS, the Linux High Availability Project or UltraMonkey) which can accomplish the same thing.
posted by tomwheeler at 10:11 PM on April 8, 2008

Holy crap how did I miss this thread? This is what I do professionally.

Anyway, the top application load balancer out there is made by F5. Their higher-end platforms have the ability to cache data, they are extremely flexible from layers 4-7, and they have a solid HA setup.

The alternative vendor is Cisco with their CSS and ACE lines, although the ACE is new and I personally don't believe it is a great product yet, given its configuration difficulties and short time on the market.

Citrix NetScalers are a decent product; they have the most intuitive GUI and tend to be the easiest to manage via the CLI as well, although they have stability issues in high-volume, high-availability environments that I haven't been pleased with.

Figure on entry-level pricing from any of those vendors to be in the neighborhood of $15k per appliance, with feature and capacity increases taking you beyond $100k per device for the highest-end units. All offer advanced L4-7 load balancing and application control feature sets, and all can run in an HA configuration.

The way this works is you configure your DNS entry to point at a virtual IP hosted on the load balancer. The load balancer has a server pool or application pool associated with the virtual IP and, based on any number of metrics, decides which server should get the traffic or which server it should talk to on behalf of the client to retrieve the requested data.
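One of the most common such metrics is "least connections" - send the next request to the pool member currently serving the fewest clients. A hedged sketch (server names and counts are illustrative):

```python
def least_connections(pool):
    """Pick the server with the fewest active connections.

    pool: dict mapping server address -> current connection count.
    """
    return min(pool, key=pool.get)

pool = {"10.0.0.1": 12, "10.0.0.2": 3, "10.0.0.3": 7}
print(least_connections(pool))  # → 10.0.0.2
```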

Send mefimail if you have any further questions.
posted by iamabot at 5:01 PM on May 15, 2008
