Copying a massive directory over the internet
January 24, 2010 11:21 PM

How can I reliably transfer a massive directory with lots of files over the internet? It's a one-time thing.

I have a directory on a computer on the east coast containing over 40,000 files (mostly photos), and taking up approximately 200 GB. I want to transfer the entire directory over the internet to a computer on the west coast. Since the destination computer is a laptop and the connection is unreliable, the transfer needs to be resumable.

This is only a one-time transfer, so I don't want to set up a huge system for just one use.

What's the best way to do this?
posted by aatreya to Computers & Internet (21 answers total) 2 users marked this as a favorite
posted by b1tr0t at 11:29 PM on January 24, 2010

Mail a hard drive. It's gonna take forever over the internet.
posted by rhizome at 11:30 PM on January 24, 2010 [5 favorites]

I would also suggest rsync. That said, it will take forever. My cable modem upload rate is 1 Mb/s. To transfer 200 GB would take... 200 GB * 1024 MB/GB * 8 bits/byte * 1 sec/Mb * 1 hr/3600 sec * 1 day/24 hr = about 19 days. That's assuming you saturate your connection, and if you do, your internet will be unusable. So at a suggested rate of .33 Mb/s, that's about 57 days.
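
If you want to plug in your own numbers, here is a minimal sketch of that estimate. The function name and parameters are made up for illustration, and it assumes the link stays saturated the whole time with no protocol overhead, so treat the result as a lower bound.

```python
def transfer_days(size_gb, link_mbps):
    """Estimate days needed to move size_gb over a link sustaining link_mbps.

    Uses the same unit chain as the estimate above:
    GB -> MB (x1024) -> megabits (x8) -> seconds -> days.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    megabits = size_gb * 1024 * 8
    seconds = megabits / link_mbps
    return seconds / (3600 * 24)


# Try a slow cable uplink, a throttled one, and a fast campus link.
for rate in (1.0, 0.33, 100.0):
    print(f"{rate:>6} Mb/s: {transfer_days(200, rate):.1f} days")
```

Real transfers will come in slower than this, since the per-file overhead on 40,000 files adds up.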
posted by sbutler at 11:36 PM on January 24, 2010

Buy a USB drive and write it onto that. Then ship the drive to the destination user by UPS or Fedex. Unless you have a blisteringly fast upload rate, that will be faster than trying to transfer it over the internet, and a lot less of a headache.

If you have 1.5 megabits of uplink, then 200 GB would take 16 and a half days to transfer. (Assuming your ISP doesn't pull the plug on you before that.)
posted by Chocolate Pickle at 11:36 PM on January 24, 2010

Either of the first two answers will do you, but if I were you, I'd modify rhizome's solution and FedEx a drive in a Pelican case. It works for the movies.
posted by fairytale of los angeles at 11:36 PM on January 24, 2010

Mr. F, as a man who moves a ton of data every day for his living, further suggests you get a FireWire 800 drive, assuming your two machines will handle FW800, as that's an hour and a half of copy time, and then it's just transit to FedEx, shipping paperwork, and not having to deal with it until it gets to the other side.
posted by fairytale of los angeles at 11:41 PM on January 24, 2010

Thanks for the help so far. I'll look into rsync. I thought about mailing a drive, but at the moment I only have remote access to the computer. Both computers are on university connections, though, so bandwidth isn't too much of an issue. I don't really care how long it takes as long as I get it eventually.
posted by aatreya at 11:43 PM on January 24, 2010

You could create a torrent file for the directory and seed from the host computer, and then open up the torrent file on the receiving computer (just email it; torrent files are small). That way, it's a resumable download, but, like any other internet-based method, this will take a while.

I'd just mail a hard drive.
posted by reductiondesign at 11:44 PM on January 24, 2010

Well, in that case, over a university connection you should easily be able to get this done in a couple of days. Much faster if you can get a 100 Mb/s connection and the universities are connected via Internet2.
posted by sbutler at 11:46 PM on January 24, 2010 [1 favorite]

"You could create a torrent file for the directory and seed from the host computer, and then open up the torrent file on the receiving computer (just email it; torrent files are small). That way, it's a resumable download, but, like any other internet-based method, this will take a while."

I would note that, with respect to torrents, my Big Ten university severely limits the rate for bittorrent transfers at egress. You might notice a big speed difference between rsync and bittorrent for this reason.
posted by sbutler at 11:48 PM on January 24, 2010

Yeah, rsync or mail a drive. I don't think bittorrent gets you anything in this situation that rsync doesn't do better.

If the laptop's connection is unreliable, it might also be worthwhile to transfer to a more-reliable computer that's near the laptop, then transfer to the laptop over local ethernet or something.
posted by hattifattener at 12:21 AM on January 25, 2010

Assuming fast university connections, I second using rsync.

If bandwidth is limited, I would suggest uploading the files to an external storage site, like Amazon S3. The upload and download(s) will be limited by your bandwidth, not by Amazon's. I use it all the time to share too-large-for-email files, especially when sharing with multiple people.
posted by willem at 12:54 AM on January 25, 2010

Are all of the photos jpegs, or are some of them PSD/raw files? The latter will compress well and you should consider rar-ing/7z-ing the whole tree if that's the case.
posted by Rhomboid at 1:27 AM on January 25, 2010

Amazon S3 is a great way to store data online, and S3Fox is a great front end (it plugs into Firefox, but basically works like an FTP program).

Amazon also allows you to mail them a hard drive and they'll load it onto S3 for you, but the process is somewhat technical (you have to digitally sign a manifest file).

But in this case, it would probably be better to mail a drive directly.

USB is probably better than FireWire, unless you know both machines have FireWire.
posted by delmoi at 1:55 AM on January 25, 2010

I don't know if you have any Unix experience, but I'd suggest doing "rsync -rvzhP --size-only source-path remote-machine:remote-path". The -r makes it recursive (so it descends through the directory), the -vh makes it tell you what's going on, the -z does some compression, the -P means you can resume in the middle of a file (nice if you have any large files, though it sounds like you might be dealing just with a bunch of smallish files), and the --size-only means that once a file has transferred completely, you don't have to hash it again when you resume; it'll just check that the sizes are the same. If you're feeling anal (and with that many files, you should be), I'd suggest doing another "cleanup" run once it's all transferred, without the --size-only flag. Just to be safe.
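
Since the laptop's connection drops, one option is to wrap that command in a retry loop so it picks up where it left off without babysitting. This is only a sketch: the flags are the ones from the command above, but the function names, the attempt limit, and the delay are my own assumptions, and it assumes rsync is installed and ssh keys are set up so no password prompt interrupts the loop.

```python
import subprocess
import sys
import time


def build_rsync_cmd(source, dest, size_only=True):
    """Build the rsync invocation described above as an argument list."""
    cmd = ["rsync", "-rvzhP"]
    if size_only:
        # Skip files whose size already matches on the destination.
        cmd.append("--size-only")
    return cmd + [source, dest]


def run_until_done(cmd, max_attempts=50, delay_s=60):
    """Re-run cmd until it exits with status 0; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt
        print(f"attempt {attempt} failed", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_s)  # give the connection time to come back
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Hypothetical usage (the paths and host are placeholders):
# run_until_done(build_rsync_cmd("source-path", "remote-machine:remote-path"))
# Then one final pass without --size-only as the "cleanup" run:
# run_until_done(build_rsync_cmd("source-path", "remote-machine:remote-path",
#                                size_only=False))
```

Because -P keeps partially transferred files, each retry continues rather than starting over.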

There are also rsync clients for Windows; I believe installing Cygwin will get you one. But it's better to do this with Unix if you can. However, the remote machine will need to be able to deal with rsync; this is easier if it's Unix, or at least can handle incoming ssh.
posted by spaceman_spiff at 3:08 AM on January 25, 2010 [1 favorite]

Another option is robocopy.
posted by fluffycreature at 4:58 AM on January 25, 2010

If the connection's intermittent, using S3 as an always-on intermediary might be faster. Jungle Disk might be helpful, since it preserves datestamps; S3Fox also works, but it doesn't preserve them.

If you're in a hurry, try S3 and then rsync to fix the dates. Otherwise, just rsync.

Note that S3 isn't free: if it took you the full month, a straightforward transfer could cost something like $50 (~$18 for 100 GB of average storage and ~$35 for 200 GB of outgoing bandwidth).

I'd consider sending a hard drive or, um, a stack of DVD-Rs, but I'm admittedly fond of offline backup anyway.
posted by Pronoiac at 7:18 AM on January 25, 2010

There's a joke about a station wagon full of tapes that applies here. Dupe the source drive to a naked hard drive (or one in an external case) and mail it to the recipient. Bonus: they have a backup in case they screw up the import!
posted by wenestvedt at 8:58 AM on January 25, 2010

Nthing rsync.

Also consider using the --bwlimit=# option in rsync. I'd use a speedtest website to estimate what your max bandwidth is, then set the transfer limit to some percentage of it so that you're still able to get decent speed for internet browsing, gaming, or whatever you may want to use your connection for.
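
Speed tests usually report megabits per second while --bwlimit wants a number in kilobytes per second, so a quick conversion helps. A rough sketch follows; the function names and the 50% default are just my assumptions, and I'm treating a kilobyte as 1024 bytes, which may not match every rsync version, so check your man page.

```python
def bwlimit_kib_per_s(measured_mbps, fraction=0.5):
    """Convert a speed-test result (megabits/s) into a capped rate in KiB/s."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    bytes_per_s = measured_mbps * 1_000_000 / 8
    # Never return 0, because rsync treats a bwlimit of 0 as "no limit".
    return max(1, int(bytes_per_s * fraction / 1024))


def bwlimit_flag(measured_mbps, fraction=0.5):
    """Format the value as an rsync command-line option."""
    return f"--bwlimit={bwlimit_kib_per_s(measured_mbps, fraction)}"


# Example: cap rsync at half of a measured 8 Mb/s uplink.
print(bwlimit_flag(8, 0.5))
```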
posted by chrisamiller at 10:44 AM on January 25, 2010

Nthing rsync.

I immediately thought of wenestvedt's quote when I saw the title:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. — Andrew S. Tanenbaum
But if you're on uni bandwidth, rsync is what you'd like.
posted by Brian Puccio at 3:29 PM on January 25, 2010 [1 favorite]

I'll second "limit your bandwidth." You'll likely show up on the uni's IT radar anyway, but that might help.
posted by Pronoiac at 2:40 PM on January 26, 2010
