Help me find and resolve a traffic bottleneck at my co-location. We launched a new web-app (it registers users and delivers flash content) at my work and the server fell over at about 17 conn/sec. Help me prevent this from happening again.
The network is at a colo on a T1. As best as I can tell, our bandwidth WAS completely saturated when the server started to drop connections*. However, I'm concerned that there may be other issues in play: it seemed possible to hit our other servers, and even other sites on that same server, while connections to this application site were getting dropped.
I had top running on the machine the entire time and CPU usage never seemed to go above 10%. Running netstat, when I was first called about dropped connections, showed tons of connections in the TIME_WAIT state. Our application is pretty simple and does not have a lot of crazy queries; it basically handles logins and does registrations. We deliver about 4 megs of flash content (in 100k swf blocks) after the login. MySQL connections are not explicitly closed after queries in the script.
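To put a number on those TIME_WAIT sockets next time, here is a minimal sketch that tallies TCP states from `netstat -ant` style output. The sample text and addresses are made up for illustration (they are not from the server in question), and the function name `count_tcp_states` is hypothetical. It assumes the Linux net-tools column layout, where the state is the sixth column.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally the State column of `netstat -ant` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample, NOT real output from the server described above.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.8:51301      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.9:40022      ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

TIME_WAIT is a normal state for a socket after a close, so a big count on its own may just reflect a lot of short-lived connections rather than a fault. Comparing the tally before, during, and after a spike is probably more telling than a single snapshot.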
The app seemed to still run pretty quickly *after* one was able to establish a connection (if you couldn't, you'd get a server timeout -- and a message saying that the server could not be found). After about a half hour the server settled down (conns dropped below 10-12/sec) and started running nicely again.
As far as I can tell, either apache is hobbling itself for some stupid reason or there's a bandwidth bottleneck. We've got the machine firewalled by a PIX 501, on which the CPU and mem usage were steady throughout the traffic spike (supposedly it should be able to do 60 Mbps throughput, anyhow). There is a linksys prosumer switch behind the PIX.
Here's the specs on the server hardware:
1 GIG RAM, CPU: Intel(R) Xeon(TM) CPU 2.40GHz
df returns:
Filesystem  1K-blocks     Used  Available  Use%  Mounted on
/dev/md1      9614052  1741272    7384412   20%  /
tmpfs          518068        0     518068    0%  /dev/shm
tmpfs          518068    12588     505480    3%  /lib/modules/2.6.12-10-386/volatile
/dev/md0        45037    19160      23474   45%  /boot
/dev/md2     66295352   355780   62571952    1%  /var
and software:
LAMP - Breezy Badger Ubuntu/Apache 2/MySQL 4.0/PHP 4.0.
We're running ISPConfig on this machine. The directory that serves the site in question is set up via a vhost with ISPConfig.
We have root on this machine.
Here are the relevant (?) MaxClients entries from apache2.conf:
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 20
MaxRequestsPerChild 0
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
What can I do? What should I look at? I've already checked out the other server load question, but my situation is slightly different (I am planning on trying out the benchmarking utils after the current traffic dies down). Again, I want to think that more bandwidth will solve the problem, but we'll be in a hell of a spot if it doesn't, so I need to check all the potential problems.
Oh yeah, we've got about a week to get this right.
(CUE CRAZY MONTAGE MUSIC GO GO GO)
* We are definitely going to try to increase the bandwidth to the server -- we're on percentage now, but I think our speeds are capped at 1.5 Mbps or something, because we never do much above that, even during spikes.
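A back-of-the-envelope check on the bandwidth theory, as a rough sketch only. It assumes the nominal T1 line rate of 1.544 Mbps, decimal units (100k = 100,000 bytes, 4 megs = 4,000,000 bytes), no protocol overhead, and, purely for the last figure, that every one of the 17 connections per second is fetching a 100k swf block. The function names are hypothetical.

```python
T1_BITS_PER_SEC = 1_544_000  # nominal T1 line rate; usable payload is a bit lower

SWF_BLOCK_BYTES = 100 * 1000          # one "100k" swf block (decimal units assumed)
FULL_CONTENT_BYTES = 4 * 1000 * 1000  # the "4 megs" delivered after login


def seconds_to_send(num_bytes, link_bps=T1_BITS_PER_SEC):
    """Time to push num_bytes through the link if nothing else is using it."""
    return num_bytes * 8 / link_bps


def max_requests_per_sec(bytes_per_request, link_bps=T1_BITS_PER_SEC):
    """How many requests of a given size the link can carry each second."""
    return link_bps / (bytes_per_request * 8)


def required_mbps(requests_per_sec, bytes_per_request):
    """Bandwidth needed to sustain a request rate, in megabits per second."""
    return requests_per_sec * bytes_per_request * 8 / 1_000_000


if __name__ == "__main__":
    print(f"one user's 4 MB over an idle T1: {seconds_to_send(FULL_CONTENT_BYTES):.1f} s")
    print(f"100k blocks per second on a T1:  {max_requests_per_sec(SWF_BLOCK_BYTES):.2f}")
    # Assumption: all 17 conn/sec are 100k block downloads.
    print(f"17 blocks/sec would need:        {required_mbps(17, SWF_BLOCK_BYTES):.1f} Mbps")
```

Under those assumptions a single user's 4 megs ties up the whole T1 for roughly 20 seconds, and the line carries fewer than two 100k blocks per second, so the pipe would fill long before the CPU does. That fits the saturation we saw, though it does not rule out the other suspects.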
Those two apache2.conf fragments should read as:
<IfModule prefork.c>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 20
MaxRequestsPerChild 0
</IfModule>
<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
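A rough sketch of why MaxClients might matter here, assuming the prefork block is the one actually in effect (which MPM is loaded is not confirmed above). With prefork, each connection occupies one child process for as long as it is being served, so the number of busy workers is roughly arrival rate times average time per connection. The 2-second service time below is a made-up illustrative figure, not a measurement, and the function names are hypothetical.

```python
import math


def workers_needed(conns_per_sec, avg_seconds_per_conn):
    """Concurrent workers needed to keep up (arrival rate x time in service)."""
    return math.ceil(conns_per_sec * avg_seconds_per_conn)


def max_sustainable_rate(max_clients, avg_seconds_per_conn):
    """Highest connection rate a fixed worker pool can absorb without queueing."""
    return max_clients / avg_seconds_per_conn


MAX_CLIENTS = 20          # from the prefork block above
PEAK_CONNS_PER_SEC = 17   # rate at which the server fell over
ASSUMED_SECONDS = 2.0     # hypothetical average time a worker is held per connection

if __name__ == "__main__":
    print("workers needed at peak:", workers_needed(PEAK_CONNS_PER_SEC, ASSUMED_SECONDS))
    print("rate 20 workers can absorb:", max_sustainable_rate(MAX_CLIENTS, ASSUMED_SECONDS))
```

If the real average hold time is anywhere near that guess, 20 workers run out well before 17 conn/sec, and a slow, saturated link makes each download (and so each hold time) longer still. Measuring the actual time per request would be needed before trusting any of these numbers.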
posted by fishfucker at 4:21 PM on February 14, 2006