how is proper load-balancing done these days?
April 8, 2008 1:54 PM Subscribe
What are large websites using these days to load-balance their traffic?
I have been using round-robin DNS on my site, which has five front-end web servers. I am beginning to realize why this is not a good idea - first of all, the first IP address in the round-robin chain seems to get a disproportionately high amount of traffic, and secondly, if one of the servers goes down, then 1/5 of my users can't access the site until I either fix the problem or change the DNS and wait for the change to propagate.
I have noticed that many high-volume sites I visit have a DNS entry that resolves to only a single IP address. So what are they doing differently?
I'm assuming that they are using a "load balancing" device, but is this just a reverse proxy server running Squid or Apache, which would itself be as vulnerable to hardware faults as any other server?
Or are there specific high-availability "load balancing" devices (like something Cisco would make) which automatically act as the front-end for all servers, taking a server out of the loop if it stops responding and/or notifying me by mail? If so, where can I find these devices?
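For a rough sense of how bad that round-robin DNS failure mode gets, here is a quick Python simulation (the IP addresses are made up): each client is stuck with whichever single address its resolver handed back, so when one of five servers dies, roughly a fifth of clients lose the site until the DNS change propagates.

    import random

    # Hypothetical pool of five front ends behind one round-robin DNS name.
    SERVERS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4", "192.0.2.5"]
    DEAD = {"192.0.2.3"}  # pretend this one just crashed

    clients = 100000
    failures = 0
    for _ in range(clients):
        ip = random.choice(SERVERS)  # resolver hands the client one cached A record
        if ip in DEAD:
            failures += 1            # that client gets errors until DNS is changed

    print("%.0f%% of clients cannot reach the site" % (100.0 * failures / clients))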
Everybody's got a different technical solution, but the basic theory goes something like this:
#1: have a common IP address that everyone hits from the domain name;
#2: have a pile of servers that can be queried internally for status;
#3: have a mechanism at the common IP address that does nothing except route traffic (either round-robin or based on current/average load) to a box that has recently been verified as active via the internal status query.
How many levels/machines you use and the specific implementation aren't really the important part; the important part is that some piece of the system is dedicated to routing users to boxes known to be live (rough sketch below).
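A minimal Python sketch of that mechanism, assuming made-up back-end addresses and a made-up /health URL (a real balancer would run the status checks on a timer rather than per request):

    import itertools
    import urllib.request

    # Assumed back-end web servers sitting behind the single public IP.
    BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    _rotation = itertools.cycle(BACKENDS)

    def alive(backend):
        """The internal status query: does the box answer a cheap request quickly?"""
        try:
            urllib.request.urlopen("http://%s/health" % backend, timeout=2)
            return True
        except OSError:
            return False

    def pick_backend():
        """Round-robin over the pool, skipping anything that fails its check."""
        for _ in range(len(BACKENDS)):
            candidate = next(_rotation)
            if alive(candidate):
                return candidate
        raise RuntimeError("no live back-ends")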
posted by davejay at 2:00 PM on April 8, 2008
Perlbal is a free option. Cisco and Citrix (NetScaler), among many other vendors, also make devices that let you do balancing based on just about any scheme you can think up - as well as content switching (directing traffic for certain paths or content types to certain backend pools), edge caching, SSL acceleration, and so on. The balancing schemes range from round robin to lowest average response time to least connection count.
Low five figures, to start, for the decent stuff. Try Perlbal (and memcached and MogileFS and ..). :)
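For a feel of what those schemes actually compute, a toy Python sketch (the per-server counters are invented; in reality the balancer keeps them up to date from live traffic):

    # Invented bookkeeping a balancer might keep per back-end.
    STATS = {
        "web1": {"open_conns": 42, "avg_response_ms": 120.0},
        "web2": {"open_conns": 17, "avg_response_ms": 310.0},
        "web3": {"open_conns": 25, "avg_response_ms": 95.0},
    }

    def least_connections():
        return min(STATS, key=lambda b: STATS[b]["open_conns"])       # -> "web2"

    def lowest_average_response_time():
        return min(STATS, key=lambda b: STATS[b]["avg_response_ms"])  # -> "web3"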
posted by kcm at 2:01 PM on April 8, 2008 [1 favorite]
(The other problem you've not discussed is session stickiness - if you use sessions in your code, then in the naive case the same backend server needs to handle the user for the duration of the session. You can offload that handling to another layer, though, or float memcached/another virtualized solution on another content-switched pool to do that stuff.)
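A small Python sketch of that offloading idea, assuming the python-memcached client and an invented key scheme: if every web server reads and writes session state from a shared cache instead of its own memory, any box in the pool can take any request and stickiness stops mattering.

    import memcache  # python-memcached client, assumed to be installed

    # Shared cache that every web server in the pool points at.
    mc = memcache.Client(["10.0.0.50:11211"])

    def save_session(session_id, data):
        # Any back-end can write the session; the one-hour expiry is arbitrary.
        mc.set("session:" + session_id, data, time=3600)

    def load_session(session_id):
        # ...and any other back-end can read it on the next request, so the
        # balancer no longer has to pin the user to one particular box.
        return mc.get("session:" + session_id) or {}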
posted by kcm at 2:02 PM on April 8, 2008
Foundry makes an appliance called a ServerIron which does load-balancing & failover, using the basic premise that davejay outlines. You have a Virtual IP which is mapped to x number of real IPs on real boxes.
Session preservation is something that your code must do; I'm not sure whether any load balancing appliance will do it for you (maybe there's one out there I haven't heard of). Foundry's appliance does the SSL acceleration as well, which is useful but not really necessary.
posted by synaesthetichaze at 2:11 PM on April 8, 2008
Seconding LVS - it's not as feature-rich as some commercial alternatives, but in the mode mentioned by aspo it works quite well, and since it only handles incoming packets, it can handle very large loads with little hardware.
As others have said - your application needs to be designed around the idea as well, depending on how you want to handle requests, session-data, etc.
posted by TravellingDen at 2:20 PM on April 8, 2008
We use BigIP F5, and it seems a common choice among our counterparts (big higher ed institutions). To give an idea of scale, we just went over 1 billion hits for this year a few weeks ago on our main application, which is load-balanced using cookie-injection and load monitoring on the backend, and it preserves session affinity. It offers a ton of features and a variety of load balancing options, and has a thriving online user community.
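Roughly what cookie-injection affinity does, in a hedged Python sketch (the cookie name and pool member names are invented; the real appliance does this transparently at the proxy layer): the balancer stamps the first response with a cookie naming the chosen box, then routes later requests wherever the cookie points, as long as that box is still healthy.

    import random

    BACKENDS = ["web1", "web2", "web3"]  # invented pool member names
    COOKIE = "BACKEND"                   # invented affinity cookie name

    def route(request_cookies, healthy=None):
        """Return (backend, cookies_to_set_on_the_response)."""
        healthy = healthy if healthy is not None else BACKENDS
        pinned = request_cookies.get(COOKIE)
        if pinned in healthy:
            return pinned, {}                # keep the user on "their" server
        chosen = random.choice(healthy)      # first visit, or the pinned box died
        return chosen, {COOKIE: chosen}      # inject the cookie into the response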
posted by idb at 2:30 PM on April 8, 2008
I used LVS successfully at a small site for a number of years.
At my current job, we are using a pair of F5 Big IP Local Traffic Managers. They are hideously expensive (I think they start at $15k each and go up), but actually work extremely well.
posted by alienzero at 2:35 PM on April 8, 2008
I'm not 100% sure how exactly it all works, but we have two load balancers running something called 'pound' and then a number of web servers that it balances across. I *think* one of the load balancers is a 'master' and it fails over to the other in the event of failure (both have the same IP?), but I just write the code that goes on the web servers...
posted by gregjones at 3:03 PM on April 8, 2008
We've used both Foundry and F5, prefer Foundry. Both work well.
posted by togdon at 3:12 PM on April 8, 2008
We use Akamai to load-balance between a few Citrix NetScalers, which in turn load-balance to our server clusters. This is not cheap, but it works amazingly well.
posted by atomly at 3:23 PM on April 8, 2008
Oh, also, both Akamai and the NetScaler are smart enough to take non-functioning devices out of rotation, though you should look at something like Nagios to mail you about any server downtime. Also, if your site is big enough, KeyNote is a good option to monitor site availability and speed worldwide.
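Nagios is the heavyweight answer there, but the core of the "mail me" part is small. A cron-able Python sketch, with placeholder addresses and URLs, assuming a local mail server listening on port 25:

    import smtplib
    import urllib.request

    SERVERS = ["http://10.0.0.11/", "http://10.0.0.12/"]  # placeholder health URLs
    ADMIN = "you@example.com"                             # placeholder address

    def check_and_alert():
        for url in SERVERS:
            try:
                urllib.request.urlopen(url, timeout=5)
            except OSError as err:
                msg = ("From: balancer@example.com\r\nTo: %s\r\n"
                       "Subject: %s looks down\r\n\r\n%s\r\n" % (ADMIN, url, err))
                # Assumes a local MTA is running on port 25.
                smtplib.SMTP("localhost").sendmail("balancer@example.com", [ADMIN], msg)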
posted by atomly at 3:25 PM on April 8, 2008
Try Perlbal (and memcached and MogileFS and ..). :)
Seconded. I work with the teams that maintained (and invented) a lot of this tech, and it just performs like crazy. Plus it's free, not counting the (not insignificant) investment in learning it. On the plus side, that puts you in a position to share infrastructure with a lot of the biggest sites on the web. You can read more about it in Linux magazine, although a stupid registration is required.
posted by anildash at 5:40 PM on April 8, 2008
At my work, we use Coyote Point Systems load balancers in one rack, and some BSD servers with routing and hot failover set up for load balancing. The Coyote Point devices are pretty slick, and they do sticky sessions via a variety of magic. The BSD load balancers do sticky sessions based on the traffic source.
Behind the Coyote Point devices, we have two Xen virtual machines running Squid, and then three virtual machines running Apache behind *them*. All three Apache servers feed from the same content area, which is an OCFS2 partition on our fibre channel SAN. OCFS2 enables read/write by multiple agents without lots of lock lag & other issues.
The Coyote Points are actually just BSD boxes on commodity hardware inside a 1U case, but they have a lot of tacked-on web access and SNMP and stuff that is pretty cool... I especially like the SSL offloading, because then I don't have to deal with SSL at the server level.
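Source-based stickiness like that is about the simplest scheme going; roughly, in Python (the back-end list is made up): hash the client's source IP so the same address always lands on the same pool member, with no cookies or application changes, at the cost of uneven spread behind large NATs.

    import hashlib

    BACKENDS = ["10.0.0.21", "10.0.0.22", "10.0.0.23"]  # made-up pool members

    def backend_for(client_ip):
        # Same source IP -> same bucket -> same back-end, every time.
        digest = hashlib.md5(client_ip.encode("ascii")).hexdigest()
        return BACKENDS[int(digest, 16) % len(BACKENDS)]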
posted by SpecialK at 6:29 PM on April 8, 2008
Two firms about which I have firsthand knowledge are both using F5 BigIP devices. As stated earlier, they are indeed very expensive but do work well.
There are also open source alternatives (such as LVS, the Linux High Availability Project or UltraMonkey) which can accomplish the same thing.
posted by tomwheeler at 10:11 PM on April 8, 2008
Holy crap how did I miss this thread? This is what I do professionally.
Anyway, the top application load balancer out there is made by F5. Their higher-end platforms have the ability to cache data, they are extremely flexible from layers 4-7, and they have a solid HA setup.
The alternate vendors are Cisco, with their CSS and ACE lines, although the ACE is new and I personally do not believe it is a great product yet, given its ease of configuration and its short time on the market.
Citrix NetScalers are a decent product: they have the most intuitive GUI and tend to be the easiest to manage via the CLI as well, although I haven't been pleased with their stability in high-volume, high-availability environments.
Figure on entry-level pricing from any of those vendors to be in the neighborhood of 15k per appliance, with feature and capacity increases taking you up over 100k per device for the highest-end units. All offer advanced L4-7 load balancing and application control feature sets and can run in an HA configuration.
The way this works is that you configure your DNS entry to point at a virtual IP hosted on the load balancer. The load balancer has a server pool or application pool associated with that virtual IP and, based on any number of metrics, makes a decision about which server should get the traffic, or which server it should talk to on behalf of the client to retrieve the requested data.
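To make the "talks to the server on behalf of the client" part concrete, a stripped-down Python sketch of a full proxy listening on the virtual IP (pool addresses are invented, GET-only, no error handling); real appliances do this far faster and with far more smarts, but the shape is the same.

    import itertools
    import http.client
    from http.server import BaseHTTPRequestHandler, HTTPServer

    POOL = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # invented real-server IPs
    rotation = itertools.cycle(POOL)

    class VirtualIP(BaseHTTPRequestHandler):
        def do_GET(self):
            # Decision point: plain round robin here, but this is where connection
            # counts, response times and health metrics would be consulted.
            backend = next(rotation)
            upstream = http.client.HTTPConnection(backend, 80, timeout=10)
            upstream.request("GET", self.path,
                             headers={"Host": self.headers.get("Host", "")})
            reply = upstream.getresponse()
            body = reply.read()
            self.send_response(reply.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("0.0.0.0", 8080), VirtualIP).serve_forever()  # the VIP listener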
Send mefimail if you have any further questions.
posted by iamabot at 5:01 PM on May 15, 2008
This thread is closed to new comments.