My server just won't ... let ... go
August 13, 2008 7:00 PM
Why won't my shiny new Windows 2008 server let its TCP connections go?
I have a new Windows Server 2008 box (64-bit, running on a big Dell rack) that acts as a file server for about 70 people. It's standard Windows SMB; people reach it at \\servername from their XP desktops via login scripts.
Twice now, the thing has randomly 'frozen' and caused mass panic: every connected user machine freezes up when trying to access its shared drives on \\servername, and every application that relies on files on \\servername\mount also dies/freezes/crashes etc.
The thing is:
- All event logs on this box are clean. Zero. Nothing bad, no errors, application, system or otherwise.
- The network itself is fine. No high-bandwidth intruders, no heavy loads, no disconnects, no bad pings, nothing like that. Bandwidth is normal.
- Server processes are normal; nothing is eating up the system's network, CPU or disk. No antivirus has kicked in, no scheduled task is running, the processor shows 99% idle, network usage is 1%, and so forth.
The *only* symptom:
.. a netstat shows many, many, many CLOSE_WAIT connections being held by all connected users. So desktop 'UserGuy' might have the following entries when the thing is 'frozen':
(a dozen similar CLOSE_WAITs ...)
    TCP 192.168.1.201:445 UserGuy:1300 CLOSE_WAIT 4
    TCP 192.168.1.201:445 UserGuy:1302 CLOSE_WAIT 4
    TCP 192.168.1.201:445 UserGuy:1304 CLOSE_WAIT 4
    TCP 192.168.1.201:445 UserGuy:1306 CLOSE_WAIT 4
    TCP 192.168.1.201:445 UserGuy:1308 CLOSE_WAIT 4
    TCP 192.168.1.201:445 UserGuy:1310 ESTABLISHED 4
...and that sort of CLOSE_WAIT repeat occurs for all 70+ connected users. We reboot, log back in, and when it's fine and behaving we don't have this long litany of CLOSE_WAITs.
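To check whether the CLOSE_WAIT pile-up really is spread across every client rather than concentrated on a few machines, one could tally the netstat rows per remote host. This is a minimal Python sketch, assuming `netstat -ano`-style rows in the column order shown above (Proto, Local Address, Foreign Address, State, PID); the function name `count_close_wait` is hypothetical.

```python
from collections import Counter


def count_close_wait(netstat_text: str) -> Counter:
    """Count CLOSE_WAIT TCP rows per remote host.

    Assumes `netstat -ano`-style rows: Proto, Local, Foreign, State, PID.
    """
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only TCP rows carry a state column; skip headers, blanks and UDP rows.
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        foreign, state = fields[2], fields[3]
        if state == "CLOSE_WAIT":
            # Strip the client port: "UserGuy:1300" -> "UserGuy".
            host = foreign.rsplit(":", 1)[0]
            counts[host] += 1
    return counts


sample = """\
TCP 192.168.1.201:445 UserGuy:1300 CLOSE_WAIT 4
TCP 192.168.1.201:445 UserGuy:1302 CLOSE_WAIT 4
TCP 192.168.1.201:445 UserGuy:1310 ESTABLISHED 4
"""
print(count_close_wait(sample))  # Counter({'UserGuy': 2})
```

Running it against a capture taken during a 'freeze' and again after a reboot would give a per-client before/after comparison instead of an eyeballed one.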
My initial hunch is that some kind of open-TCP-connection limit is being hit, which then locks out further TCP connections, similar to the internal Windows limit XP users used to raise when torrenting.
My hours of Googling show that you can raise this number with registry hacking, but to me that's treating the symptom, not the cause.
So what's holding the connections open in the first place? Why won't they die? What am I missing here?
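For what it's worth, CLOSE_WAIT is the state a socket sits in after the other end (here, the XP client) has sent its FIN and the local TCP stack has acknowledged it, but the local owner of the socket has not yet closed its own side. As far as I know, TCP (RFC 793) defines no timer that moves a socket out of CLOSE_WAIT, so these entries stay until whatever owns them calls close. The PID 4 in the netstat output is normally the System process, which hints that kernel-mode code (the SMB server or some filter driver) owns the stuck sockets. A minimal localhost reproduction in Python, using a hypothetical helper `demo_close_wait`:

```python
import socket
import time


def demo_close_wait(linger_seconds: float = 0.0) -> bytes:
    """Put a server-side socket into CLOSE_WAIT, then release it.

    While sleeping for `linger_seconds`, `netstat` shows the accepted
    socket in CLOSE_WAIT. Returns what the server side read (b"" = FIN).
    """
    # Listening socket on an ephemeral localhost port (stands in for the server).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The client hangs up: its stack sends a FIN to the server side.
    client.close()

    # The server side sees end-of-stream as an empty read...
    data = server_side.recv(1024)
    # ...but until it calls close() itself, the OS keeps this socket in
    # CLOSE_WAIT. No TCP timer clears it for us.
    time.sleep(linger_seconds)

    # This close() is the step the owner of the stuck sockets never takes.
    server_side.close()
    listener.close()
    return data


print(demo_close_wait())  # b''
```

Calling `demo_close_wait(30)` and running `netstat -an` in another window during the sleep should show the server-side entry in CLOSE_WAIT, which then disappears once `close()` runs.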
Best answer: Do you have Symantec Anti-Virus on this server, and/or, more specifically, a component called Symantec Endpoint Protection? If so, remove one or both. I've seen several cases where the AV holds the ports open for reasons known only to SAV.
Otherwise, you can bump HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort to 65534, which gives plenty of headroom for the connections to be released (eventually they will drop) before the server fills up.
posted by fireoyster at 7:54 PM on August 13, 2008
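To put numbers on the MaxUserPort suggestion, here is a small arithmetic sketch in Python. It assumes the pre-Vista default ephemeral range of 1025 up to MaxUserPort (default 5000) and the Vista/Server 2008 default dynamic range of 49152-65535. My understanding is that on Server 2008 the dynamic range is viewed and set with `netsh int ipv4 show dynamicport tcp` / `set dynamicport` rather than MaxUserPort, so it is worth checking which one the box actually honours. Also note that these ranges bound outgoing connections. Inbound SMB connections all share local port 445 and are told apart by the client's address and port, so on the server side this limit may not be the one being hit. The helper `ephemeral_port_count` is hypothetical.

```python
def ephemeral_port_count(first: int, last: int) -> int:
    """Number of ports in the inclusive range [first, last]."""
    if not (1 <= first <= last <= 65535):
        raise ValueError("ports must satisfy 1 <= first <= last <= 65535")
    return last - first + 1


# Pre-Vista default: ephemeral ports 1025 up to MaxUserPort (default 5000).
legacy_default = ephemeral_port_count(1025, 5000)   # 3976
# MaxUserPort raised to 65534, as suggested in the answer above.
legacy_raised = ephemeral_port_count(1025, 65534)   # 64510
# Vista / Server 2008 default dynamic range (assumed configured via netsh).
vista_default = ephemeral_port_count(49152, 65535)  # 16384

print(legacy_default, legacy_raised, vista_default)
```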
Oh god yes. If you have any variant of Norton Antivirus anywhere near any of your computers, it's the first thing you should remove if you're trying to diagnose odd network behaviour.
posted by flabdablet at 8:12 PM on August 13, 2008 [1 favorite]
Seconding fireoyster: I've had exactly the same symptoms on Server 2003 with Symantec Endpoint Protection 11 as well. We traced it to the "teefer2" driver that SEP installs.
posted by tracert at 10:40 PM on August 13, 2008
Response by poster: Just as a follow-up, we removed Symantec AV (10.1-something or 11) and the machine started working fine again. Thanks for the help!
posted by bhance at 8:54 AM on August 23, 2008
Only reason I ask is that I recently became aware that you can indefinitely postpone a forced shutdown on a Windows box by pushing the system date back into the past, and I wonder whether the Windows TCP stack also uses some dodgy time-of-day clock calculation for connection timeouts.
Don't waste time on my jumped-to conclusions if somebody else comes up with something more plausible.
posted by flabdablet at 7:50 PM on August 13, 2008
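The clock hypothesis above turns on a real distinction: a timeout computed from wall-clock time can be stretched by setting the date back, while one measured on a monotonic clock cannot. The following Python sketch shows the monotonic approach in general terms. It is not a claim about how the Windows TCP stack actually implements its timers, and the `Deadline` class is hypothetical.

```python
import time


class Deadline:
    """A timeout measured on the monotonic clock.

    time.monotonic() never goes backwards and ignores changes to the
    system date, so setting the clock back cannot postpone expiry.
    A deadline built on time.time() (wall clock) would not have that
    guarantee.
    """

    def __init__(self, seconds: float) -> None:
        self._expires_at = time.monotonic() + seconds

    def remaining(self) -> float:
        # Never report a negative remaining time.
        return max(0.0, self._expires_at - time.monotonic())

    def expired(self) -> bool:
        return time.monotonic() >= self._expires_at


d = Deadline(60.0)
print(d.expired())  # False
```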