Screwed network, Help.
September 17, 2007 7:18 PM

We have a bunch of PCs that all connect to the rest of the network through a single switch and fiber link. They're all having problems with slow networking and database links breaking. This is what Task Manager's Networking tab shows when we copy a single 80MB file down from the server. Does anyone recognise the problem?

We'd love to just start swapping hardware to test stuff, but it would disrupt too many people and we don't necessarily have any spares. Any clues so as to minimise the disruption would be most appreciated.
posted by krisjohn to Computers & Internet (12 answers total)
 
Is it the computers on your side of the switch that are bogging down or on the other side? Either way, you might be able to log into switches on either side of the link and monitor the ports to see if one of them is going awol.

One thought is that when copying large files on the network server, our virus scanner sometimes bogs things down to a standstill. Do any services spike when things bog down?
posted by jmd82 at 7:46 PM on September 17, 2007


Is it a managed switch? If so, it might be able to tell you more about the problem. Any recent updates to anti-virus software, or other patches? What operating system are you using? Is it just a group of PCs that are acting up, or only PCs at a specific location?

More than willing to help, just need more info :)
posted by bleucube at 7:46 PM on September 17, 2007


Response by poster: Basically, we have slightly fewer than 24 PCs connected to a 24 port Cisco switch. The switch in turn is connected to our fiber ring. Copying to or from any PC on that switch to any PC on the other side of the fiber link displays, at best, the pattern in the image. At worst there are huge pauses during transfer.

But you've just made me realise we haven't tried copying from PC to PC within the group of 20.
posted by krisjohn at 7:54 PM on September 17, 2007


Is your fiber connection in full-duplex or half-duplex mode?
posted by RichardP at 9:04 PM on September 17, 2007


You could also be having duplex issues with your switch. Ciscos are notoriously unreliable, particularly with Linux, at autonegotiating speed. This causes big packet loss past a certain amount of traffic, which can cause something like the symptom you're seeing: spiky traffic.

The fix would be to force the switch ports for the server and one client to 100Mbit/full duplex, and then set the server and client themselves to forced 100Mbit/full as well. (Don't set autonegotiate on one side and fixed on the other; that will make the problem even worse.) Then test between those two machines.

If that fixes it, just set everyone to forced 100/full, and you should be fine. If it doesn't fix it, just set things back the way they were and try something else.

The fiber link is probably not your problem; I don't think those have the same negotiation issues.
posted by Malor at 9:20 PM on September 17, 2007
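
Why mixing forced and autonegotiated settings makes things worse can be sketched in a few lines. This is a simplified model, assuming standard Fast Ethernet behaviour: a port left on auto that hears no negotiation from a forced peer can still detect the speed, but falls back to half duplex. The function names and the "auto"/"full"/"half" setting strings are made up for illustration; they are not Cisco or operating system commands.

```python
# Simplified model of how each end of a 100 Mbit link picks its duplex mode.
# Settings are "auto", "full" or "half" (hypothetical names, illustration only).

def effective_duplex(local: str, peer: str) -> str:
    """Return the duplex mode the local port ends up using."""
    if local != "auto":
        return local   # a forced port simply uses what it was told
    if peer == "auto":
        return "full"  # both ends negotiate; assumes both support full duplex
    return "half"      # forced peer sends no negotiation: auto side falls back to half


def link_state(a: str, b: str) -> str:
    """Describe the link between two ports configured as a and b."""
    duplex_a = effective_duplex(a, b)
    duplex_b = effective_duplex(b, a)
    return "ok" if duplex_a == duplex_b else "duplex mismatch"
```

Under this model, link_state("auto", "full") gives a duplex mismatch, while forcing both ends or leaving both on auto gives a consistent link.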


As you've realized, the first test you need to perform is a copy from one machine on the switch to another, to figure out what side of the switch the problem is on. If it doesn't happen within the workgroup, then you know the problem is with the fiber ring.

I also suspect a duplex/autonegotiation problem, because it sounds similar to issues I've had that were solved by manually setting modes.
posted by Kadin2048 at 10:10 PM on September 17, 2007
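
To compare a copy within the switch against a copy across the fiber link, it helps to have a number rather than a graph. The following is a minimal sketch that times a single file copy and reports average throughput; the function name and any paths passed to it are hypothetical, and caching by the operating system can inflate the figure for small files or repeated runs.

```python
import os
import shutil
import time


def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst and return the average throughput in MB/s."""
    size_mb = os.path.getsize(src) / 1_000_000  # decimal megabytes
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small files.
    return size_mb / elapsed if elapsed > 0 else float("inf")
```

Running it once with the source on another PC on the same switch and once with the source on the far side of the fiber link, using the same test file, gives two figures that can be compared directly.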


Response by poster: More info: Transfers from one PC to another amongst the affected group peg the network utilisation to 50% for the duration of the transfer.
posted by krisjohn at 10:52 PM on September 17, 2007


I think switch information is critical. Cisco switches can generally give you useful information in troubleshooting this, if you know where to look. If transferring your files between the locally switched computers is ok, time to look at the fiber.

It could be something as simple as the fiber connectors needing to be polished, or needing different connectors with less dB loss.
posted by Industrial PhD at 11:01 PM on September 17, 2007


You probably need to take a multi-step approach to determining the source of your problem, if, in fact, it is a problem at all.

First, you need to be sure that file transfers from one PC on the subnet to another are actually limited by network/switch factors, and not local issues like disk transfer rates. So, take a couple of the PCs, disconnect them from the switch, connect them directly with an Ethernet cable (a crossover cable, if the network cards can't autosense), and see if you can peg the network speed. A surprising number of machines can't, due to disk transfer problems (many systems will still only do sustained transfer rates of 8 or 9 MB/s off disk). You need to see 15+ MB/s disk transfer rates to avoid this kind of issue, as Fast Ethernet (100BaseT, etc.) is limited to about 12.5 MB/s.

Obviously, if your workstations can't peg the network ports involved with only a straight wire between them and quality network cards set for full duplex, you'll need higher performance in your servers and workstations to see network layer improvements. One thing to be aware of, if you're using onboard Ethernet connections in your workstations, is that not all motherboard chipsets are created equal when it comes to Ethernet (you may need to install quality aftermarket network cards to get the throughput you expect). Loaded servers may have fast disks, but if they are servicing lots of concurrent users they will not show high individual transfer rates, because they are splitting available disk throughput among many users, unless the server has enough memory to cache entire files being transferred.

Second, once you're sure you've got available throughput on end points in your LAN, you can reconnect those end points to the switch and start your further investigation. Malor's advice regarding manually setting your link type and speed is a good starting point. But you can do a lot beyond this to collect data and analyze your traffic, by downloading and installing a network analyser like Ethereal. This can capture session information to a log file for analysis, and you'll quickly see if you've got basic problems like excessive packet fragmentation, excessive ACK packet traffic, or noise/dropped packets, any of which could be cutting your available network bandwidth.

But the 50% thing sure sounds like you're set, intentionally or inadvertently, for half duplex operation. Wouldn't hurt to verify port parameters on your managed switch, at an early stage of your investigation.
posted by paulsc at 11:56 PM on September 17, 2007
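
The figures in the comment above can be checked with a little arithmetic. This is a rough sketch that ignores protocol overhead; the function names are made up, and the 80 MB file size comes from the original question.

```python
def line_rate_mb_per_s(link_mbit_per_s: float) -> float:
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return link_mbit_per_s / 8


def transfer_seconds(file_mb: float, link_mbit_per_s: float,
                     disk_mb_per_s: float, utilisation: float = 1.0) -> float:
    """Estimate copy time: the slower of the network and the disk sets the pace."""
    network_rate = line_rate_mb_per_s(link_mbit_per_s) * utilisation
    return file_mb / min(network_rate, disk_mb_per_s)


# 80 MB file over Fast Ethernet with a 15 MB/s disk: network-bound at 12.5 MB/s.
best_case = transfer_seconds(80, 100, 15)
# Same link pegged at 50% utilisation: 6.25 MB/s.
half_rate = transfer_seconds(80, 100, 15, utilisation=0.5)
# Full link but an 8 MB/s disk: the disk is the bottleneck.
slow_disk = transfer_seconds(80, 100, 8)
```

These work out to about 6.4, 12.8 and 10 seconds respectively, so an 80 MB copy that takes much longer than that points at something other than raw link speed or disk throughput.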


Here's what I would do:

1. Log in to the Cisco switch and do a show int. I would look at each interface for errors. If it all looks good, I would clear the counters.

2. I would look for change; even the simplest change can have a harmful effect on computers and the network. I would ask all staff whether they performed any configuration changes, any updates, anything, period. If this just started happening, something caused it to happen.

3. Definitely look at duplex mode on the switch, and see what happens when you force it.

4. To eliminate the switch as the problem, take a hub onsite, hook up two computers to it and try your transfer again. If it still happens, then you probably need to look at the workstations, OR you just happened to pick a workstation that is causing the whole issue, so try two more just in case. Murphy's law, you know.

5. If the problem still occurs using a hub, boot into safe mode with networking. Make sure all anti-virus software, anti-malware and other TSR programs are turned off and try it again. Any better?

6. If not, I would update the NIC drivers, or try a different network card altogether.

That should keep you busy :)
posted by bleucube at 5:40 AM on September 18, 2007
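
When reading the interface error counters from step 1, a common rule of thumb is that late collisions on a half-duplex port, or CRC errors on a full-duplex port, suggest the two ends disagree about duplex. The sketch below encodes that heuristic only; the function name and the counter keys are hypothetical, the numbers must be read off the switch by hand, and CRC errors can equally come from bad cabling or optics.

```python
def duplex_mismatch_hint(duplex: str, counters: dict) -> str:
    """Rough heuristic over error counters copied by hand from a switch port.

    duplex is "half" or "full"; counters uses made-up keys
    "late_collisions" and "crc_errors".
    """
    late = counters.get("late_collisions", 0)
    crc = counters.get("crc_errors", 0)
    if duplex == "half" and late > 0:
        return "late collisions on a half-duplex port: peer may be running full duplex"
    if duplex == "full" and crc > 0:
        return "CRC errors on a full-duplex port: peer may be half duplex, or cable/optics may be faulty"
    return "no duplex mismatch signature in these counters"
```

A clean result here does not clear the link; it only means these two counters show nothing unusual.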


paulsc: Ethereal is now Wireshark.
posted by philomathoholic at 11:32 AM on September 18, 2007


Response by poster: For anyone keeping score, the problem was a damaged media converter. Thanks everyone for your help.
posted by krisjohn at 6:10 PM on October 11, 2007

