Why is sftp killing our Ubuntu servers?
May 6, 2009 12:19 PM

Whenever we transfer large numbers of files (500+) over sftp, our Ubuntu servers freak out. This includes everything from disconnecting the current sftp sessions to nearly taking down the Apache sites running on them. Are there any server or client settings we should be taking a look at?

The servers in question are running Ubuntu 8.04.2 on our local network.

In terms of sftp clients, we are using FileZilla (WinXP), BeyondCompare3 (WinXP), and the sftp client that resides on the Ubuntu servers.

FTP does not have this problem; it's actually ~10x faster than sftp but we need the security of sftp.
posted by hitopshelf to Computers & Internet (16 answers total)

This sounds like it might be the out of memory (OOM) killer. When the kernel sees that it's running out of memory, it starts killing processes to keep some headroom for critical system tasks. If this is happening, you should see some entries in syslog or messages.

But I don't know why sftp would stress your system that much, unless maybe all 500 transfers are running in parallel? That's probably a client-side option.
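
A minimal sketch of that log check, in Python. The exact wording of OOM-killer messages varies by kernel version, so the two patterns below are assumptions, and so is the /var/log/syslog path in the usage comment.

```python
import re

# Assumed patterns: kernels typically log "... invoked oom-killer" and
# "Out of memory: ..." lines, but the exact text differs between versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Usage (path is an assumption; adjust for your syslog setup):
#   with open("/var/log/syslog", errors="replace") as log:
#       for hit in find_oom_lines(log):
#           print(hit)
```

If this prints nothing for the time window of a failed transfer, the OOM killer is probably not the cause.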
posted by lalas at 12:33 PM on May 6, 2009

What is your re-key limit set at? What is the order of Ciphers and MACs on your server? By setting the rekey limit higher, you can reduce the CPU load on your server. Similarly, try prioritizing ciphers with smaller bit key sizes.
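
For anyone unsure where these live: on Ubuntu the OpenSSH server config is normally /etc/ssh/sshd_config and the client config is /etc/ssh/ssh_config. Below is a rough Python sketch that pulls the three settings out of config text. It ignores Match blocks and the `keyword=value` form, and the function name is made up for illustration. As far as I know, RekeyLimit is a client-side option in many OpenSSH releases, so it may be worth checking both files.

```python
def read_ssh_settings(config_text, keywords=("Ciphers", "MACs", "RekeyLimit")):
    """Return {keyword: value} for the keywords that are explicitly set.

    OpenSSH keywords are case-insensitive and the first value found wins.
    A keyword missing from the result means the built-in default applies.
    """
    wanted = {k.lower(): k for k in keywords}
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key in wanted and wanted[key] not in found:
            found[wanted[key]] = value
    return found


# Usage (path is an assumption):
#   with open("/etc/ssh/sshd_config") as f:
#       print(read_ssh_settings(f.read()))
```

An empty result just means nothing was overridden and the server is using its compiled-in cipher and MAC order.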
posted by nomisxid at 12:36 PM on May 6, 2009

Does scp have the same symptoms?
posted by benzenedream at 12:55 PM on May 6, 2009

Are you using ftps (port 21, with a secure channel) or ftp over ssh (port 22, using the ssh protocol)?
posted by bensherman at 12:56 PM on May 6, 2009

Response by poster: nomisxid -- I'm guessing it's the default, but I'm not sure where to check these settings.

benzenedream -- haven't tried scp since it's not available in the programs we use (FileZilla client, BC3, DW CS3, etc.).

bensherman -- SFTP; SSH File Transfer Protocol on port 22.

Looking at my Beyond Compare 3 profiles (what I use the most and where these problems typically occur):
- Protocol: SFTP (SSH2)
- Port: 22
- Encoding: Detect
- Time zone: Automatically Detect / Use Server Time

- Simultaneous connections: 1
- Read timeout (seconds): 40

- Preserve timestamps on upload: Enabled
- Compress transfers (MODE Z): Disabled
- Force faster uploads to older OpenSSH servers: Disabled
posted by hitopshelf at 1:20 PM on May 6, 2009

If it works with FTP but breaks with SFTP, my first guess would be that you are maxing out CPU and/or memory. And yes, if you are maxing out memory, you will run into the dreaded OOM killer.
One thing you could try is temporarily installing an alternative sftp server, like VShell. This way, you could eliminate OpenSSH as the culprit. You can download a free 30-day trial of VShell here: http://www.vandyke.com/download/vshell/download.html
posted by alienzero at 2:06 PM on May 6, 2009

What are you monitoring the boxes with? If you haven't got something like nagios, at least look into getting sar running with reasonable (1 minute) granularity. That will give you the CPU, io, and memory trends leading up to Bad Things happening.
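
If sar isn't set up yet, here is a rough stand-in (not sar itself) that samples memory at a fixed interval by reading /proc/meminfo. The function names and the choice of MemFree/SwapFree are my own assumptions; it only covers the memory side, not CPU or I/O.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: number}.

    Most fields are reported in kB; a few (e.g. HugePages_*) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def sample_memory(interval_seconds=60, count=5, path="/proc/meminfo"):
    """Yield (timestamp, MemFree, SwapFree) once per interval."""
    for i in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        yield time.time(), info.get("MemFree"), info.get("SwapFree")
        if i < count - 1:
            time.sleep(interval_seconds)


# Usage: start this, then kick off a big sftp transfer and watch the trend.
#   for ts, mem_free, swap_free in sample_memory(interval_seconds=60):
#       print(ts, mem_free, swap_free)
```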
posted by rodgerd at 3:09 PM on May 6, 2009

Are you transferring to or from the servers? If to, are you possibly writing the files to /tmp? That might cause problems.
posted by zippy at 3:44 PM on May 6, 2009

rsync -auvz --stats --progress --no-blocking-io
posted by Mach5 at 4:52 PM on May 6, 2009

To expand a bit: rsync is much more tolerant of crappy TCP connections, lets you pick up where you left off, is secure, works great, transfers files faster, and does delta transfers. If it's anything more than a directory with a few files, I use rsync. It backs up my website with 180 gigs of data daily (~100 megs transferred daily), and hasn't broken ONCE!
posted by Mach5 at 4:55 PM on May 6, 2009

and --no-blocking-io keeps rsync from stealing cycles from your other Important Processes
posted by Mach5 at 4:57 PM on May 6, 2009

What do the logs say?
posted by dreadpiratesully at 5:39 PM on May 6, 2009

FTP does not have this problem; it's actually ~10x faster than sftp but we need the security of sftp.

This is to be expected. FTP does no computation on the data itself, so it's literally: read this block, forward it to this socket. SFTP has to encrypt and authenticate every block on top of that.
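
To make the per-block cost concrete, here is a rough Python sketch. It only models the integrity-check half of what SSH does (an HMAC-SHA1 tag per block); real SSH also encrypts each packet, which the standard library can't show. The function names are made up for illustration.

```python
import hashlib
import hmac


def forward_plain(blocks):
    """FTP-style: pass each block through untouched."""
    return list(blocks)


def forward_with_mac(blocks, key):
    """SSH-style, MAC half only: compute an HMAC-SHA1 tag for every block."""
    return [(block, hmac.new(key, block, hashlib.sha1).digest())
            for block in blocks]
```

The second function touches every byte of every block, and that is before any encryption, so some slowdown relative to FTP is unavoidable.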

Rsync is nice, but insufficient if you need encrypted traffic. Maybe rsync over stunnel?
posted by pwnguin at 6:25 PM on May 6, 2009

Yes, check the logs and tell us what they say. The system log, dmesg, and maybe also the actual console. Without knowing what the error messages are we're just shooting in the dark.

pwnguin, rsync is often (usually?) run over an ssh connection (set the RSYNC_RSH variable).
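
A minimal sketch of driving rsync over ssh from Python, using `-e ssh` (the command-line equivalent of setting RSYNC_RSH) and most of the flags from the command posted above. The host and paths in the usage comment are hypothetical placeholders.

```python
def build_rsync_command(source_dir, remote_host, remote_dir):
    """Build an rsync argv that tunnels the transfer through ssh."""
    return [
        "rsync",
        "-auvz",        # archive, update-only, verbose, compress
        "--stats",
        "--progress",
        "-e", "ssh",    # run over ssh so the traffic is encrypted
        source_dir,
        f"{remote_host}:{remote_dir}",
    ]


# Usage (hypothetical host and paths):
#   import subprocess
#   cmd = build_rsync_command("/var/www/", "backup.example.com", "/srv/backup/www/")
#   subprocess.run(cmd, check=True)
```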
posted by hattifattener at 7:39 PM on May 6, 2009

Could it be that you're running out of space for the logs?
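
A quick way to rule that out, sketched in Python. The /var/log default is an assumption about where the logs live on these boxes.

```python
import shutil


def percent_free(path="/var/log"):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


# Usage:
#   print(f"{percent_free('/var/log'):.1f}% free")
```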
posted by I-baLL at 9:49 AM on May 7, 2009

Best answer: If the logs don't show anything you're down to detective work. Install 'atop' from aptitude and it'll start gathering statistics automatically. Now open a terminal window on the server, maximize it, run 'atop' and watch to get an idea of what your machine looks like when it's not working real hard.

Now beat the sucker. Try to reproduce the behavior and watch what changes. Atop is nice enough to highlight particular issues in color so if it's thrashing your disk or CPU or swap it should be obvious.

If it's not apparent after that you have to get real clever.

(Aside: If you're going to be dependent on these machines it may be worth your time to set up some variety of monitoring/baselining. Ganglia does some nice auto-configuration and Munin is older than dirt and far more reliable, but does need some mojo to set up. Being able to look back at months of data can be quite helpful when Weird Stuff happens.)
posted by Skorgu at 4:38 PM on May 9, 2009

This thread is closed to new comments.