Fixing an intermittent network fault
September 26, 2006 6:51 AM
This intermittent fault on our small company network makes me bang my head on the wall. Metafilter: help me troubleshoot this annoyance.
Roughly once a day, at no particular time, the classic Windows 'A network cable is unplugged' balloon appears on my desktop. About 50% of the time, this also affects the rest of the 20-odd staff in the office, and at least one person will lose some work due to application failure or somesuch.
The office network infrastructure is pretty basic: firewall, 2 HP Procurve 2324 unmanaged switches, one windows server.
Is it possible that some cheeky application is flooding the switches, or something?
Response by poster: The switches are plugged into floor sockets like everything else - there's no specific surge protection but we've no problems with anything else on the circuit.
Affected machines can be on either switch, though the switches do this independently of each other.
posted by whoojemaflip at 7:13 AM on September 26, 2006
When is the DHCP lease renewed? Is it possible that when your DHCP leases renew, you're dropping the connection for a few seconds?
posted by cosmicbandito at 7:22 AM on September 26, 2006
Response by poster: I thought DHCP initially too and changed the lease time to 20 days. No bones - we're still losing connectivity ~daily.
posted by whoojemaflip at 7:31 AM on September 26, 2006
Troubleshooting 101: eliminate one possible factor at a time.
My first suggestion: Take the switch out of the picture. Get a cheap 4-port switch. Plug your workstation into one port, the server into another, and a switchport from the HP switches into a third. See if the problem persists - if it does, it's not the HP switches. If it doesn't, you have a very likely candidate for your issue.
posted by deadmessenger at 7:39 AM on September 26, 2006
Divide and conquer your problem space.
Check event logs to see exactly what is happening and to whom. Map it out. Does it affect people on both switches? Does the server lose connectivity as well? Does it happen randomly, at specific times or in conjunction with certain network events?
Record the time and date and anything that happened around the time of the failure, e.g. somebody making coffee in the room next to the server & switches.
If no pattern jumps out, start systematically varying things one at a time.
posted by srboisvert at 8:43 AM on September 26, 2006
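The record-keeping suggested above can be as simple as a text file plus a few lines of Python to tally it. A minimal sketch, assuming a hand-kept log with one `date time, switch, note` line per incident; the file format, the function names, and the switch labels are all made up for illustration:

```python
from collections import Counter
from datetime import datetime


def parse_log(lines):
    """Parse 'YYYY-MM-DD HH:MM, switch, note' lines into (datetime, switch, note).

    Blank lines and lines starting with '#' are skipped. A line with fewer
    than three comma-separated fields raises ValueError.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two commas only, so the note may contain commas.
        stamp, switch, note = (field.strip() for field in line.split(",", 2))
        events.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M"), switch, note))
    return events


def tally(events):
    """Count incidents per hour of day and per switch."""
    by_hour = Counter(when.hour for when, _, _ in events)
    by_switch = Counter(switch for _, switch, _ in events)
    return by_hour, by_switch
```

Feeding it a week or two of entries and looking at `by_hour.most_common(3)` should show fairly quickly whether the drops cluster around a particular time of day, and `by_switch` whether one switch is taking most of the hits.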
Oh and check if your switches are hot when you lose network connectivity.
posted by srboisvert at 8:50 AM on September 26, 2006
When one person fails on a switch, does everyone else on that switch fail, or just some people?
It definitely sounds related to the switches somehow, but as others are pointing out, the cause could be external. The microwave idea is a good one. We had one lady with a space heater in her office that kept blowing our fuses. It took us days to figure it out, because she could run it for a half-hour or 45 minutes before the circuits popped, and we only got there after she said, "hey, I wonder if...."
If your budget isn't zero, Dell has pretty decent 24-port unmanaged Fast Ethernet switches for about $80. You could swap one in for a while and see what happens to the symptom. The 4-porter would also work, and would be cheaper, but the Dell might be more useful as a spare part (or as a true replacement) once you figure out where the problem is.
DHCP wouldn't give you a 'network cable unplugged' error; it would give you a 'no connectivity' error. It's 99.9% certain to be some kind of hardware problem, but determining precisely where the problem is may take a while. As others are saying, divide and conquer. A cheapo switch for troubleshooting would be a good first step.
posted by Malor at 9:19 AM on September 26, 2006
I second the notion, step by step is the way to do it.
That Windows message is almost always a link-level error, so I'd be immediately suspicious of a wiring or switch fault. You didn't mention a patch panel, but if you have one, check it. Otherwise, keep an eye on the switches and see if they're power-cycling or resetting themselves. I have a Netgear 5-port switch that is so picky about its power connection it'll reset if a gnat farts nearby.
posted by Skorgu at 10:04 AM on September 26, 2006
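Keeping an eye on the switches doesn't have to mean sitting next to them. Unmanaged switches have no address of their own to ping, so a rough sketch is to poll something that is only reachable through them and log every up/down transition with a timestamp. The function names are invented here and the host and port mentioned afterwards are placeholders; treat it as a starting point rather than a finished tool:

```python
import socket
import time
from datetime import datetime


def reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage intervals.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end gets end=None.
    """
    result = []
    start = None
    for when, ok in samples:
        if not ok and start is None:
            start = when
        elif ok and start is not None:
            result.append((start, when))
            start = None
    if start is not None:
        result.append((start, None))
    return result


def watch(host, port, interval=2.0):
    """Poll forever, printing a timestamped line on every state change."""
    last = None
    while True:
        ok = reachable(host, port)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(stamp, "UP" if ok else "DOWN")
            last = ok
        time.sleep(interval)
```

Something like `watch("192.168.1.10", 445)` left running on a couple of workstations would do (the address is hypothetical; 445 is the usual Windows file-sharing port). If every machine logs DOWN in the same few seconds, that points at a switch or its power; if only one does, suspect that machine's cable or NIC.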
Best answer: "The switches are plugged into floor sockets like everything else - there's no specific surge protection but we've no problems with anything else on the circuit. ..."
posted by whoojemaflip at 10:13 AM EST on September 26
Your firewall, switches, and server should be on a quality "on line" UPS (uninterruptible power supply). Power line drop outs of 1 or 2 cycles are pretty common, and since they last only 1/60 or 1/30 (or 1/50 or 1/25, if you're on 50 cycle power) of a second, they can be unnoticeable. You will spend a little more for "on line" type UPS units, as opposed to "back ups" types, because the on line types have their inverters working from their batteries at all times, and do a better job of protecting from brown outs and line drops, so they are worth it for central infrastructure boxes.
Perhaps you actually have a UPS for the server, already, and its batteries need replacement? The batteries in a UPS age over time, and usually need to be replaced every 2 or 3 years, else the unit ceases to provide reliable protection.
posted by paulsc at 10:18 AM on September 26, 2006 [1 favorite]
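To put numbers on those dropouts: one cycle lasts 1/frequency seconds, so the arithmetic above works out as follows. A small sketch (the function name is just illustrative):

```python
def dropout_ms(cycles, mains_hz):
    """Duration in milliseconds of a dropout lasting `cycles` AC cycles."""
    return cycles * 1000.0 / mains_hz


for hz in (60, 50):
    durations = ", ".join(f"{dropout_ms(n, hz):.1f} ms" for n in (1, 2))
    print(hz, "Hz:", durations)
# 60 Hz: 16.7 ms, 33.3 ms
# 50 Hz: 20.0 ms, 40.0 ms
```

Either way it is a few hundredths of a second, which is easy to miss by eye but may still be enough to upset a sensitive piece of kit.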
You will spend a little more for "on line" type UPS units, as opposed to "back ups" types, because the on line types have their inverters working from their batteries at all times, and do a better job of protecting from brown outs and line drops, so they are worth it for central infrastructure boxes.
Needs to be repeated. The On-Line UPS units have saved my keester more times than I can count in this flakey power world.
posted by unixrat at 10:55 AM on September 26, 2006
On-line UPSs are also called "line-interactive" UPSs and they're fantastic. Even here in the heart of NYC the power flickers occasionally and the only way we know is the beeping UPSs.
posted by Skorgu at 11:03 AM on September 26, 2006
Had a similar problem before in an old building. They had two issues:
All the computers in the front office would lose their network connection or get really flaky at the end of the day. I had a tech troubleshoot it for a little bit, and he was poking and prodding a lot of things. Finally I had him go step by step and found out it happened at 4:30 every day. So what else happens at 4:30? Well, the janitor comes in. What does the janitor do? He plugs his vacuum into the UPS that the switch was running on.
posted by bleucube at 1:54 PM on September 26, 2006
Other issue -
Every time someone used the Xerox machine, all the UPSes in the building would beep. That was fun.
posted by bleucube at 1:55 PM on September 26, 2006
If everyone has a problem at the same time as your "network is disconnected", then it is a switch / cabling issue ... either way check the switch logs ... they should say either "switch just powered on" if it power-cycled OR "[your physical switch port] just went down and then up again".
Other options include (for your colleagues) the server's connection being flaky (NIC, cabling, or switch port ... or a flaky server) ... or ... perhaps some cabling in the wall is run alongside some high-voltage power cabling, which is causing interference.
J
posted by jannw at 2:30 PM on September 26, 2006
Elevators, perhaps? I never had a server problem, but a paralegal I worked with had her office next to the elevator shaft; every time a car passed, the image on her monitor would wobble visibly.
posted by lhauser at 7:58 PM on September 26, 2006
Response by poster: For sure, I twigged it may be something to do with dirty powerlines but at no point did I think of plugging the switches into the UPS. I'll be doing that just as soon as everyone leaves tonight and flashing the firmware just to be thorough.
Thanks to all y'all for helping out.
posted by whoojemaflip at 5:54 AM on September 27, 2006
Are the machines that are affected connected to any particular switch?
posted by unixrat at 7:00 AM on September 26, 2006