A long TCP tuning question
May 4, 2007 5:24 PM

I recently bought a new gigabit router and NIC cards and they're not running as fast as I hoped.

I recently bought new gigabit Ethernet cards (TRENDnet TEG-PCITXRs) and a router (a D-Link GameLounge).

The cards were installed in my file server running slackware 10.1.0 (2.4.29) and a Windows 2000 computer. There is also an old iMac on the network (the kind w/ the monitor coming out of the half-sphere), which I assume has a 100Mb card.

I’ve been using iperf to see what kind of transfer rates I can get. The results: it’s fast-ish when the server (iperf -s) is Win and Linux is the client (~252Mb/sec), but pretty slow when the server is Linux and the client is Win (~67Mb/sec).

The iMac is the best when it’s the client (~93Mb/sec, which I assume is its max) but slower when it’s serving (~67Mb/sec) to either the Win or Linux computers.

From what I could gather through online reading I need to tune my tcp settings.

On the Linux computer I tried adding each of these sets of settings to /etc/sysctl.conf and then running sysctl -p.

(from http://www.acc.umu.se/~maswan/linux-netperf.txt)

net/core/rmem_max = 8738000
net/core/wmem_max = 6553600

net/ipv4/tcp_rmem = 8192 873800 8738000
net/ipv4/tcp_wmem = 4096 655360 6553600

(from http://dsd.lbl.gov/TCP-tuning/linux.html)

# increase TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

On the Windows computer I downloaded DrTCP, set the Tcp Receive Window to 65535, and tried turning Window Scaling on. The network card had an advanced setting for JumboFrames. I set that to the max, 7K, and then tried entering 7000 as the MTU in DrTCP. There was a lot of rebooting as I fiddled with all this.
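For a rough sense of whether the receive window matters on a LAN like this one, the usual rule of thumb is that a single TCP connection cannot move more than one window of data per round trip. The sketch below is illustrative only: the function names are made up here, and the 0.14 ms round-trip time is an assumption in the range that ping typically reports on a small wired LAN.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def bdp_bytes(link_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bits_per_sec * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.00014   # ASSUMPTION: ~0.14 ms LAN round-trip time
    window = 65535  # the Tcp Receive Window value entered in DrTCP
    print(f"window-limited ceiling: {max_throughput_mbps(window, rtt):.0f} Mbit/s")
    print(f"window needed for 1 Gbit/s: {bdp_bytes(1e9, rtt):.0f} bytes")
```

With these assumed numbers a 65535-byte window allows well over a gigabit per second, which would suggest that the window size alone is not what holds a link like this below 100Mb/sec.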

In the end, nothing much changed, and I feel that particular all-drained-out-after-futzing-with-computers malaise.

Is there something I’m missing? I realize this is kind of an open-ended question because the more I read about this the more I realize how complicated it all is, but I’m hoping someone who actually knows about TCP tuning might be able to shed some light on the situation for me.
posted by JulianDay to computers & internet (23 comments total)
The obvious thing to check is speed and duplex settings. In the linux systems, you can use mii-tool for this. Or I mean, that's always the first thing I look at when I think, "hmmm, this network link should be faster." Make sure everything is set to 1000, full duplex.
posted by autojack at 5:35 PM on May 4, 2007


There is an upper limit imposed by the respective hard disks and file systems on each machine, and tuning TCP won't change that. That's why transferring big files is not a good way to test network speed.

It's not too surprising that the iMac is the slowest. It probably has the slowest hard drive. (Apple cut every corner on hardware costs that they could for that model, buying bottom-of-the-line performance components for everything.)
posted by Steven C. Den Beste at 5:38 PM on May 4, 2007


iPerf also does UDP tests, right?

How do those stack up to the TCP tests?
posted by Good Brain at 5:42 PM on May 4, 2007


Steven, that was my first impulse as well, but I googled it, and iperf seems to be solely about IP networking performance; file sharing protocols and hard disks aren't involved.
posted by Good Brain at 5:44 PM on May 4, 2007


A couple of things.

Just because interfaces negotiate at gigabit full duplex does not mean you will get gigabit throughput. Likewise, just because a switch says it's gigabit doesn't mean you will achieve non-blocking gigabit performance.

Tuning your TCP stack will only do so much. If you want to tell if the switch is the bottleneck, buy or make a crossover cable and put the machines back to back and test there. You won't beat that performance.

Changing the MTU, enabling jumbo frames, killing a chicken will only bring you a benefit if your devices all support the dead chicken protocol.

I wouldn't change the MTU, enable jumbo frames, or do anything else until you run a back to back test and a baseline.
posted by iamabot at 5:45 PM on May 4, 2007


Oh follow up, don't set your systems to 1000 full, autonegotiate is preferable for gigabit. It used to be true that the rule of thumb in the 10/100 days was to hard code. The protocol for auto-negotiation is far more robust in the gigabit spec.
posted by iamabot at 5:47 PM on May 4, 2007


Thanks so far.

mii-tool didn't work:

#mii-tool
eth0: no link
SIOCGMIIPHY on 'eth1' failed: Operation not supported
(eth0 is the built in NIC, eth1 the new card)

Dmesg seems to say auto-negotiation is on, is there something I need to do w/ ifconfig?

#dmesg | grep eth1
eth1: Identified chip type is 'RTL8169s/8110s'.
eth1: RTL8169 at 0xc8c18c00, 00:18:e7:07:cf:cf, IRQ 10
eth1: Auto-negotiation Enabled.
eth1: 1000Mbps Full-duplex operation.

I hadn't thought about the crossover cable. I have some cable laying around so I'll try making one tomorrow.

The iMac wasn't slowest, what I was surprised about was that it was most saturated, assuming it's a 100Mb card, 93Mb seemed good.
posted by JulianDay at 6:38 PM on May 4, 2007


As mentioned, use a tool that tests UDP, otherwise you'll end up dealing with TCP slowstart/etc.
posted by iamabot at 7:10 PM on May 4, 2007


What category of cables are you using? I'm not sure you'd get full speed on a CAT5 cable; CAT5e or CAT6 would be better.
posted by chndrcks at 7:17 PM on May 4, 2007


I hadn't thought about the crossover cable. I have some cable laying around so I'll try making one tomorrow.

Some cards I've dealt with recently can talk directly to each other over plain old ethernet cable. I'm assuming they can auto-sense the type of cable and switch tx/rx automatically. Might as well try that before you bother making a crossover cable.
posted by knave at 7:20 PM on May 4, 2007


If your network cards are PCI, you will never even get close to gigabit; your Windows performance is probably close to as good as you'll get. Why? Because PCI isn't fast enough.

In theory, PCI can do about a gigabit, but that's total bandwidth per bus, and includes no signaling overhead. The hard drives are usually on the same bus that the network card is. So when a packet comes in, it has to travel over the PCI bus into the system memory, be massaged by the CPU, and then travel again across the PCI bus to the hard drive, competing with the inbound traffic from the network card. In practical terms, unless you have a good server-class motherboard with multiple PCI buses, usually you're not going to see much past 250-300mbit. A really good PCI-based motherboard can sometimes get around 600mbit if you use jumbo frames and take advantage of network card checksum offloading. But a motherboard that good is very expensive. Basically, gigabit is so fast that a PCI-based computer will always bottleneck somewhere before maxing out the network link.... it's just a matter of where the system chokes first.
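As a back-of-the-envelope check of the "about a gigabit" figure, here is the arithmetic, assuming the common 32-bit, 33 MHz flavor of PCI (the thread does not say which variant these machines have). The function names are illustrative, and the "two crossings" case models the path described above, where data crosses the shared bus once from the network card and once more on its way to the disk.

```python
def pci_peak_mbit(bus_width_bits: int = 32, clock_mhz: float = 33.33) -> float:
    """Theoretical peak of a shared PCI bus in Mbit/s, ignoring all signaling overhead."""
    return bus_width_bits * clock_mhz


def per_stream_ceiling_mbit(peak_mbit: float, bus_crossings: int) -> float:
    """Ceiling for one data stream when every byte crosses the same bus several times."""
    return peak_mbit / bus_crossings


if __name__ == "__main__":
    peak = pci_peak_mbit()  # ASSUMPTION: 32-bit / 33 MHz PCI
    print(f"peak: {peak:.0f} Mbit/s ({peak / 8:.0f} MB/s)")
    print(f"NIC -> memory -> disk (2 crossings): {per_stream_ceiling_mbit(peak, 2):.0f} Mbit/s")
```

Real buses fall short of these ceilings because of arbitration and protocol overhead, so the numbers are upper bounds rather than predictions.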

PCIe is much faster. The motherboard has so much more bandwidth that the system can easily saturate gigabit, if the storage medium is fast enough. Even at that, it's still a good idea to do jumbo frames, as the CPU load checksumming the vast quantities of 1500-byte packets is very high. If you don't do either jumbo frames or network-card checksum offload, your throughput will be impaired, although you may still be able to max out a gigabit link with a sufficiently beefy system.

Ok, all that said, you shouldn't be seeing such a big variance in speed between Linux and Windows; they should get pretty close to one another, and if you spend careful time tuning, you should be able to do at least a little better with Linux.

What I'd suggest first is to start by ruling out as much as you can... start with FTP transfers from a memory filesystem. Then do FTP from a disk filesystem. Then do FTP the other way. You're bottlenecking in one of these areas:

1. CPU;
2. Memory bandwidth;
3. PCI bandwidth;
4. Disk bandwidth;
5. Switch bandwidth;
6. I/O errors on the network card.

So you start as simple as you can. Use the simplest possible protocol and gradually add tests to find out what system is breaking.

If FTP transfer from a memory filesystem is slow, then playing with the TCP stack is definitely in order... try jumbo frames and see what changes. (Note that your switch has to support jumbo frames; not all do.) And check your errors on the network card (with ifconfig)... if you see more than a small handful of send (Tx) or receive (Rx) errors, that might be the problem. It's entirely possible that switch is choking; DLink is not exactly a quality name in network gear. Tx/Rx errors could also indicate a flaky cable connection, an outright bad cable, or just a failure to correctly autonegotiate.
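On Linux, the same per-interface error counters that ifconfig prints are also exposed in /proc/net/dev, so the check can be scripted. This is a minimal sketch that assumes the standard column layout of that file (8 receive columns followed by 8 transmit columns); the function name and dictionary keys are made up for illustration.

```python
def parse_net_dev(text: str) -> dict:
    """Extract packet, error, and drop counters per interface from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout; skip rather than guess
        stats[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_packets": int(fields[9]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


if __name__ == "__main__":
    with open("/proc/net/dev") as f:  # Linux only
        for iface, counters in parse_net_dev(f.read()).items():
            print(iface, counters)
```

Running it before and after a transfer and comparing the error counts shows whether errors are accumulating during the test, rather than being left over from some earlier event.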

As others are saying, you can rule out both the switch and the cables with a crossover cable directly between computers.

The suggestions of testing UDP are also good ones.
posted by Malor at 7:33 PM on May 4, 2007 [1 favorite]


autonegotiate is preferable for gigabit

Just FYI, you don't really get to set duplex for gigabit ethernet. Over copper (which this is) it's transmitted via four full-duplex pairs. So don't go worrying about that aspect. Another consequence of this is that you don't need a crossover cable to connect two machines directly; a straight-through cable will work just fine.

As for your problem... sadly, many gigabit ethernet cards won't come close to actual gigabit speeds due to hardware limitations (on-board hardware buffers etc.) I've tested throughput of lots and lots of different cards. Many of them just plain suck. So while you may improve performance (and I realise I'm not making any suggestions on how to do that) don't be too surprised if you don't.
posted by buxtonbluecat at 7:45 PM on May 4, 2007


Hmmm, you've really done a decent job as far as the check-list type stuff is concerned. I'm seconding the crossover cable. Even untuned you should be doing far better than the sub-100mbps speeds you're seeing. Please let us know how it goes; I'm in the middle of trying to find a nice cheap, low-powered gig switch and D-Link was up there near the top of my list.
posted by datacenter refugee at 8:54 PM on May 4, 2007


Malor: Thanks for the answer, some parts are a bit over my head, but it's fodder for future Googling.

Maybe the PCI bus on the old Linux machine is slow, but why would the rates be different depending on which computer runs iperf -s and which runs iperf -c? Win to Linux is ~254Mb/sec, Linux to Win is ~67Mb/sec.

Isn't iperf pretty simple as far as protocols go? I was under the impression it was made expressly to test this sort of thing: how much traffic can one computer send to another over the network (and that that's why it has to run server/client between two computers over the network, I thought it would skip the disks, etc.) If anything, I thought it would be the most artificial test of throughput you could do and at worst give artificially high numbers, but I'm getting sub 100Mb/sec.
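To make concrete what a memory-to-memory test of this kind does, here is a minimal sketch in the same spirit: one side sends a buffer of zeros over TCP, the other side reads and discards it, and no disk is touched. This is not how iperf itself is implemented; the names are made up, and as written both ends run on the loopback interface, so it exercises only the local TCP stack and not the network card.

```python
import socket
import threading
import time


def _drain(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_tcp(total_bytes=8 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes from memory over TCP; return (bytes received, Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_drain, args=(server, result))
    receiver.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has seen end-of-stream
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    received, mbps = measure_tcp()
    print(f"received {received} bytes at {mbps:.0f} Mbit/s")
```

To test an actual link, the receiving half would have to run on the other machine and bind to its LAN address, which is the role iperf -s plays.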

The Linux computer is old and slow, but the Win one is a 2-year-old AMD 64 with a decent motherboard. I wasn't hoping for 900Mb/sec, but getting 67Mb/sec in one direction made me think something was wrong. And I wish I'd run iperf before I switched the cards.

And the iMac only confused me more. Here's a 100Mb card getting 93Mb/sec when the 1000Mb card in the Win computer is getting 67Mb/sec.

Tomorrow I'll try a crossover cable and see what that does, but till then thanks for all the answers.
posted by JulianDay at 10:16 PM on May 4, 2007


Malor nails it above. You're going to be CPU or bus bound. Real world 100mbps speeds are usually 40-60mbps of pure file transfer. Gigabit PCI (for all the reasons mentioned above) tends to top out at 100-350. I'd say you're getting excellent speeds considering these are plain-jane PCI cards.
posted by damn dirty ape at 10:21 PM on May 4, 2007


Your error running mii-tool is, I think, due to the fact that you are not root when you run it. In spite of others' responses that are better and more specific than mine, I still recommend that you verify you're running the card at gigabit in Linux.

Oh, but now I see, you got dmesg to tell you that it's running 1000/full duplex, so you're good in that regard.

Yeah, listen to these other suggestions : )
posted by autojack at 10:23 PM on May 4, 2007


I'm just n'thng the hardware limitations keeping you from seeing "real" gigabit speeds.

I tested NIC drivers for a NIC manufacturer for your various Open Source operating systems. I'd only ever see close to real gigabit speeds on the 2 processor hyper-threaded machines with speedy BUS architectures.

A lot of the tinkering needed to get close to gigabit speeds involved jumbo frames (max frame size). You'd need a switch that you could configure for larger frame sizes as well (unless you're cabling the machines directly to each other).

Also, depending on the size of the packets, an interesting trend I found was that smaller packets would allow oodles of packets to make it out the sending end, but get "bunched up" at the receiving end while the bus was busy brokering the packets to the processors, resulting in horrible (comparatively) throughput. Bigger packets meant fewer packets to be brokered and more and more throughput until it levelled out at around 980mbps.
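The packet-count effect can be put in rough numbers. This sketch assumes standard Ethernet framing overhead of 38 bytes per frame on the wire (8 preamble, 14 header, 4 frame check sequence, 12 inter-frame gap) and a fully loaded 1 Gbit/s link; the function name is illustrative.

```python
ETHERNET_OVERHEAD_BYTES = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12


def frames_per_second(mtu_bytes: int, link_bits_per_sec: float = 1e9) -> float:
    """Frames per second the receiver must handle when the link is saturated."""
    return link_bits_per_sec / ((mtu_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for mtu in (1500, 7000, 9000):
        print(f"MTU {mtu}: {frames_per_second(mtu):,.0f} frames/s")
```

Going from a 1500-byte to a 9000-byte MTU cuts the number of frames the receiving bus and CPU have to handle by roughly a factor of six for the same amount of data.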
posted by mnology at 10:24 PM on May 4, 2007


Oh..most of the testing was using nttcp.
posted by mnology at 10:26 PM on May 4, 2007


Interestingly, I'm getting similar results on my gigabit network with PCIe machines; I get 911 megabits pulling from Win2k3, but only 320 pulling from Linux. I think Linux must be misconfigured for gigabit somehow. I'll look into it and see if I can find anything.
posted by Malor at 11:52 PM on May 4, 2007


I think something is wrong with iperf in Linux. No matter what I do, I'm not able to get more than 320 megabits pulling from Linux, while I can pull a bit more than 900 from Windows. But if I use nuttcp (another variant of ttcp, in the same family as iperf), I get 950 megabits pulling from Linux, but only 650 or so pulling from Windows.

I actually believe the second numbers more, because the Windows client is running through the Cygwin libraries, which are likely to slow things down some.

I get very weird results with UDP: I cannot get Linux to send more than 1 megabit of UDP period. I imagine it must be a rate limit setting I don't know about somewhere; the limit seems absolute with both iperf and nuttcp. And I can't get nuttcp in server mode on Linux to send UDP at all, it complains about invalid buffer length when I connect from the Windows client.
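One possibility worth ruling out for the 1 megabit ceiling: in iperf's UDP mode the sender paces itself to a target rate, which defaults to 1 Mbit/sec unless -b is given. The helper below only builds a command line with a higher target so it can be inspected or passed to subprocess; the function name, the example address, and the 900M target are placeholders, not values from this thread.

```python
def iperf_udp_client_args(server_ip: str, target_rate: str = "900M", seconds: int = 10) -> list:
    """Build an iperf client command line for a UDP test at an explicit target rate."""
    return [
        "iperf",
        "-c", server_ip,    # client mode, connect to this server
        "-u",               # use UDP instead of TCP
        "-b", target_rate,  # target bandwidth; the UDP default is 1 Mbit/sec
        "-t", str(seconds), # test duration in seconds
    ]


if __name__ == "__main__":
    # HYPOTHETICAL address; substitute the machine running "iperf -s -u".
    print(" ".join(iperf_udp_client_args("192.168.1.3")))
```

If the reported rate then rises with the -b value, the 1 megabit figure was the default pacing rather than a limit in the Linux network stack.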

My overall impression of these tools is that they're not terribly reliable, and I wouldn't trust their results overmuch. I'm getting numbers all over the damn map. :)

I may tinker with this some more tomorrow; if I do, I'll post back.
posted by Malor at 12:58 AM on May 5, 2007


One of the other things I remember about jumbo frames with this manufacturer is that the drivers / silicon would handle jumbo frame sizes up to 16KB. Competitor cards / switches we tested with for compatibility were only configurable up to a 9KB jumbo frame size.

Depending on the manufacturer, you still may only be able to get so close to gigabit speeds even with a smoking system. This all depends on whether the gear can handle it, and whether the driver has support.

You would need both ends in a crossover configuration to be at the same jumbo frame size (obvious?). If one Gb card from mfg. "A" is configurable up to 16KB but for some reason you've got a mfg. "B" card configurable to a 9KB jumbo frame size, that's what you're stuck at.
posted by mnology at 3:03 AM on May 5, 2007


Knave: a regular 5e cable worked as a crossover. Nice to know.

More numbers for the curious. The router doesn't seem to be the problem: it's a wee bit slower than the direct connection, but either way it's still much slower than I was hoping for, at sub 100Mb/s.

I think it's the fact that the Linux computer is ~7 years old, and the kernel is older and probably a stable/slower Slack configuration. Some things I read online suggest that 2.6 has better auto configuration for window size, etc.

All around it was interesting, and I'm glad I did it because I've been wanting to build a new server and now I'm gonna research the built-in NICs on server motherboards and PCIe Gb NICs.

Thanks all.

The numbers:

Crossover connection between computers:

PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=128 time=0.121 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=128 time=0.140 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=128 time=0.140 ms
64 bytes from 192.168.1.3: icmp_seq=4 ttl=128 time=0.139 ms

--- 192.168.1.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.121/0.135/0.140/0.008 ms
------------------------------------------------------------


iperf -c (Win > Linux)

Client connecting to 192.168.1.3, TCP port 5001
TCP window size: 640 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.1.2 port 1132 connected with 192.168.1.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 305 MBytes 256 Mbits/sec
------------------------------------------------------------


iperf -s (Linux > Win)

Server listening on TCP port 5001
TCP window size: 853 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.1.2 port 5001 connected with 192.168.1.3 port 1763
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 97.2 MBytes 81.5 Mbits/sec


-----------------------------------------------------------------------

Connection through router:

PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=128 time=0.124 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=128 time=0.150 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=128 time=0.147 ms
64 bytes from 192.168.1.3: icmp_seq=4 ttl=128 time=0.144 ms

--- 192.168.1.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.124/0.141/0.150/0.013 ms


iperf -c (Win > Linux)

------------------------------------------------------------
Client connecting to 192.168.1.3, TCP port 5001
TCP window size: 640 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.1.2 port 1133 connected with 192.168.1.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 302 MBytes 253 Mbits/sec
------------------------------------------------------------


iperf -s (Linux > Win)

Server listening on TCP port 5001
TCP window size: 853 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.1.2 port 5001 connected with 192.168.1.3 port 1765
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 80.5 MBytes 67.4 Mbits/sec
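As a sanity check on how the Transfer and Bandwidth columns relate: the figures above are consistent with iperf counting MBytes in units of 1024 x 1024 bytes and Mbits in units of 1,000,000 bits. A small sketch of that conversion (the function name is made up, and the interval is taken as exactly 10.0 seconds, although the real interval may be slightly longer):

```python
def iperf_mbits_per_sec(transfer_mbytes: float, seconds: float) -> float:
    """Convert a Transfer figure (binary megabytes) to Mbit/s (decimal megabits)."""
    return transfer_mbytes * 1024 * 1024 * 8 / seconds / 1e6


if __name__ == "__main__":
    # (Transfer in MBytes, Bandwidth in Mbits/sec) pairs from the four runs above
    runs = [(305, 256), (97.2, 81.5), (302, 253), (80.5, 67.4)]
    for mbytes, reported in runs:
        computed = iperf_mbits_per_sec(mbytes, 10.0)
        print(f"{mbytes} MBytes in 10 s -> {computed:.1f} Mbit/s (iperf reported {reported})")
```

The small differences from the reported values come from rounding in the printed Transfer and Interval columns.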
posted by JulianDay at 9:29 AM on May 5, 2007


You might want to run a set of trials with nuttcp. iperf gives me odd results sending from Linux. It's much lower than I think it should be, where nuttcp is not. I haven't taken any more time to play with it yet.

As long as you get a reasonably good PCIe motherboard, it should have at least one NIC connected with the fast bus. My ASUS Core2 board has two gigabit interfaces, but only 1 is on PCIe... the other is just regular PCI. Kind of stupid.
posted by Malor at 3:15 AM on May 6, 2007


This thread is closed to new comments.