Help me move a huge file across the net as fast as possible.
July 15, 2009 12:23 PM   Subscribe

I'd like to move a big file (110 gig) to a business associate across the internet. The file must arrive as fast as possible in order to update a database with as little downtime as possible.

FTP is an option, but the other party feels it might chew into their user bandwidth for too long. Are there file moving/bandwidth leasing options available for this sort of one time thing?
posted by BrodieShadeTree to Technology (46 answers total) 3 users marked this as a favorite
 
Unless you both have a 100/100 internet connection or faster, I think the fastest way would be to FedEx a backup tape with the info on it.
posted by majortom1981 at 12:25 PM on July 15, 2009 [3 favorites]


I think the fastest way would be to FedEx a backup tape with the info on it.

or a hard drive.
posted by @troy at 12:28 PM on July 15, 2009


I suggested a tape because of the possibility of the HDD getting damaged in transit, but if he's comfortable shipping an external HDD and trusting the data to arrive intact, then go ahead.
posted by majortom1981 at 12:29 PM on July 15, 2009


Would it be possible to transmit it in compressed parts during off-peak hours via FTP? You could throttle the transfer rate during peak hours.

If you're not going to ship it physically, that seems ideal.
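A rough sketch of what that throttling could look like, assuming you control the sending script (hypothetical chunked sender, not any particular FTP client's feature):

```python
import time

def send_throttled(data: bytes, send, max_bytes_per_sec: int,
                   chunk_size: int = 64 * 1024) -> int:
    """Send data in chunks, sleeping as needed to stay under a bandwidth cap."""
    start = time.monotonic()
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        send(chunk)
        sent += len(chunk)
        # If we're ahead of the allowed pace, sleep until we're back on schedule.
        expected_elapsed = sent / max_bytes_per_sec
        actual_elapsed = time.monotonic() - start
        if expected_elapsed > actual_elapsed:
            time.sleep(expected_elapsed - actual_elapsed)
    return sent
```

You'd swap in a real socket or FTP `send` callable and raise the cap during off-peak hours.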
posted by JauntyFedora at 12:29 PM on July 15, 2009


Buy two external hard drives, copy the file on to each of them, encrypt it if you must.

Overnight the external drives using different carriers.
posted by iamabot at 12:30 PM on July 15, 2009 [2 favorites]


More details about your database and how this file is going to update it may be helpful, as there may be a different way to do what you want to do.
posted by trevyn at 12:31 PM on July 15, 2009


Response by poster: Ok. The reasons we would like to rule out shipping:

1. Shipping means 3 days of downtime - assume it's not possible for the sake of this question. (The database can be down on Sunday, possibly Saturday. But close of business is Friday at 6, and the last FedEx out for Saturday arrival is 4:30 Friday, meaning a Monday arrival!)

2. These are the .ldf and .mdf files for a patient management database in MS SQL.
posted by BrodieShadeTree at 12:38 PM on July 15, 2009


Best answer: 110 gig is a lot of bandwidth for an internet transmission.

Let's assume for a moment that you both have a 100Mbit symmetric connection (i.e. 100Mbit upstream & downstream). Mind you, that's a serious business-class connection, with transmission rates I usually don't see over the internet.

Rough math, assuming perfect transmission, zero errors, zero overhead, etc.

100Mbit = 12.5 Megabytes/sec (again, assuming theoretical max).

110 GB = 110 x 1024 = 112,640 MB

112,640 / 12.5 = 9,011 seconds

9,011 / 3600 = 2.5 hours
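That back-of-the-envelope math as a quick script, same idealized zero-overhead assumptions:

```python
def transfer_hours(size_gb: float, link_mbit: float) -> float:
    """Idealized transfer time: no protocol overhead, no errors, full line rate."""
    size_mb = size_gb * 1024            # GB -> MB (binary, as in the math above)
    mb_per_sec = link_mbit / 8          # Mbit/s -> MB/s
    return size_mb / mb_per_sec / 3600  # seconds -> hours

# 110 GB over a symmetric 100 Mbit link:
print(round(transfer_hours(110, 100), 1))  # about 2.5 hours
```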


The problem is that you haven't described either of your connections - what is your peak upload speed, and what is their peak download speed?

The reason being that unless you're in a serious datacenter environment with serious network connections, you're not going to see anywhere close to that speed.

Your limits are your upload and their download bandwidth. If we knew those, we could offer other suggestions. FTP, BitTorrent, etc. - they're all limited by how fat your pipes are.

In the words of Andrew Tanenbaum, "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."

Do you really need to send them all 110G? Can you send them only the data that's changed since the last interval?
posted by swngnmonk at 12:41 PM on July 15, 2009 [1 favorite]


right

FedEx the big file on Thursday, your other location gets it Friday, and then you run rsync or something similar to sync up with your records on Saturday.
posted by Oktober at 12:41 PM on July 15, 2009


One person could hand deliver it in a few hours. Couple of hundred dollars max cost.
posted by blue_beetle at 12:42 PM on July 15, 2009


As I was constructing my post right as you responded:

What about live synchronization/replication to the remote location? This is how gigantic DBs are usually kept up to date in a remote situation like this.
posted by swngnmonk at 12:43 PM on July 15, 2009


Best answer: When we have to do this we put someone on a plane with the hard drives...

Your options are as follows:

1) Get the bulk of the data over there beforehand and then just move the delta or the replay log.
2) Make sure you have enough bandwidth and FTP/SCP the data or the delta of the data.
3) Put the data on portable drives and use a shipping carrier.
4) Put the data on portable drives and put someone on a plane.

You don't mention how much bandwidth you have. You probably do not want to go through an intermediary for this - although you certainly could, it just means you have to complete two file transfers in the window instead of one.
posted by iamabot at 12:43 PM on July 15, 2009


Other question - what's the geographic distance between the two locations?
posted by swngnmonk at 12:43 PM on July 15, 2009


Even at average hard drive write speeds (20 MB/s) it would take about 94 minutes just to write the file. Unless you've got some serious cash or serious connections, there's no way you're going to do it faster than physical delivery.

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway"
- Andrew S. Tanenbaum
posted by blue_beetle at 12:45 PM on July 15, 2009 [2 favorites]


Is the data compressible? Can you export the db into some kind of flat-file and bzip2 it? You'll lose the import/export time, but I imagine that you'll more than make it up when bandwidth is factored in.

Nth-ing Tanenbaum's law. Ship the current database on a hard drive now - don't wait for the weekend. Then figure out an intelligent way to just transmit the diffs over the wire.
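For instance, a chunked bzip2 pass (so you never need the whole 110GB in memory) might look something like this - a sketch, with the paths and chunk size as assumptions:

```python
import bz2

def bzip2_file(src_path: str, dst_path: str,
               chunk_size: int = 16 * 1024 * 1024) -> None:
    """Compress src_path to dst_path incrementally, chunk by chunk."""
    compressor = bz2.BZ2Compressor(9)  # 9 = best compression
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(compressor.compress(chunk))
        dst.write(compressor.flush())  # emit any buffered compressed data
```

How much it helps depends entirely on how compressible the export is; database dumps with lots of repeated text tend to shrink a lot.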
posted by chrisamiller at 12:49 PM on July 15, 2009


Response by poster: So, let's say I have late Friday, Saturday and Sunday to have the database migration done by Monday's open of business.

Using swngnmonk's numbers above, it would likely take 37 hours to upload. Jeez.

Other alternatives could be like Oktober lists.

- Upload the DB now, ahead of time. Isolate changes using rsync or similar, and then send those changes across the net.

Other ideas?
posted by BrodieShadeTree at 12:51 PM on July 15, 2009


Response by poster: Distance is Arizona to Illinois
posted by BrodieShadeTree at 12:52 PM on July 15, 2009


You can get same day shipping between cities in the US. You will pay a good chunk of money for it.

Even if you find someone with a fat pipe that's willing to rent you some bandwidth on the spur of the moment you and your client are going to be carrying drives to the pipe endpoints. You might as well save them the hassle and ship it to them.
posted by rdr at 12:52 PM on July 15, 2009


Just a question: You mention this is a patient database... do you have to worry about HIPAA with respect to PHI?
posted by jangie at 12:53 PM on July 15, 2009


Response by poster: Jangie, of course we do. That's already addressed with encryption.
posted by BrodieShadeTree at 12:54 PM on July 15, 2009


Response by poster: rdr, again, for the sake of this question, assume FedEx, UPS, etc. are out. Potentially a courier could be available at a HIGH cost.
posted by BrodieShadeTree at 12:55 PM on July 15, 2009


ROAD TRIP!

It really does look as though the fastest way is a plane trip.

Is this a one-time thing, or a weekly thing?
posted by notsnot at 12:55 PM on July 15, 2009


Response by poster: It's a two-time thing: once to test the time it takes, and a second time to actually do it.
posted by BrodieShadeTree at 12:55 PM on July 15, 2009


No really, the way to do this is to put someone on a plane, or multiple people on different carriers, but have TWO distinct paths to getting that data there.

This is (in ridiculous, make-it-sound-easy, short anecdotal form) how we handle Big Ass Datacenter Migrations.

Gotta move 400 servers, associated databases, SAN gear, etc. from Atlanta to Los Angeles starting Friday 3pm, and have everything back up and running by Monday 8am?

Take backups of everything, copy all the critical data onto multiple sets of portable hard drives or another enterprise storage system (a large RAID array you can shove into a human-sized Pelican case).

Send the physical gear on the truck (or a chartered plane) in a driving (flying) across-the-country marathon, and stand up the gear when it arrives (this is so much more complex than that statement makes it sound that I think I will start drinking).

Panic a whole bunch, figure out a couple answers, move on to more panic.

Meanwhile, the dude who was escorting the backups arrives at the destination with his set of external drives or his big-ass RAID in a Pelican case, and we have the data there via two separate paths, after sipping cocktails in first class.

What you have to ask yourself is how damaging it is to the business if you can't have that DB spun up Monday morning. If it's TOTALLY CRITICAL, you're going to want to move that data a number of ways to make sure it gets there, so that if one option fails in a spectacular and unanticipated fashion you have another option.

That means whatever you select as your transport mechanism, test it beforehand! The whole thing, from step A of the project plan to step Z. Then conduct an after-action exercise to figure out where you can close the gaps.
posted by iamabot at 1:05 PM on July 15, 2009 [3 favorites]


Best answer: If there is a service for doing one time transfers over the net, I haven't heard of it.

How big is each file? Unless you need the transaction log, you should probably do a full backup & truncate so you don't have to move all that data. How big are the files if you compress them with something modern like bzip2, 7-Zip or RAR?

Often the way this would be handled is to take a backup to disk or tape and ship that via fed ex. Once the data is on a new server, you'd do an incremental backup and just copy the changed data over the net. Unless the whole database is changing quickly, the delta is probably going be pretty small and easy to transfer.

I think most competent DBAs should be capable of doing this.
posted by Good Brain at 1:06 PM on July 15, 2009


Look at the database replication features of whatever RDBMS you're using. I'm pretty sure shipping a 110GB dump file is doing it wrong...
posted by word_virus at 1:16 PM on July 15, 2009


Arizona -> IL: fly it. The flight should be like $200 round trip. You could overnight it, but you'll drop half that cost on shipping with no guarantee it'll be coddled. Actually, there's a guarantee it'll be treated like a football for most of its journey.
posted by GilloD at 1:17 PM on July 15, 2009


Replication will be an option in the future, but for this one giant update I think a human on an airplane with two portable drives is the fastest and most reliable (and most secure) means of getting this done. Any other sort of file transfer could fail, without enough time left to start over.
posted by Lyn Never at 1:38 PM on July 15, 2009


SQL Server backups are usually compressible. Often they are compressible by 9 or 10 times, i.e. 100 GB files can often be compressed to 100 megabytes or so (depending on what kind of data the database is holding). You may be able to backup, compress, SFTP, and restore much faster than moving the whole mdf/ldf files.
posted by tayknight at 1:40 PM on July 15, 2009


Response by poster: The file is MS SQL.
It is 110 gig in backup format and grows about 2 gig a week.
Could I ship the backup, and then use MS SQL to sync the changes once it is up on the other side? The actual migration is not happening until a month from now. If I shipped now and set up a sync, I'd be ready for the migration with an up-to-date DB.

Does anyone know if MS SQL offers a sync feature that would accommodate this?
posted by BrodieShadeTree at 1:46 PM on July 15, 2009


What if you flew it out there by hand? Less downtime than fedex.
posted by Lord_Pall at 1:48 PM on July 15, 2009


Why not look into transmitting an equivalent of a diff file? 110 Gb is just overkill for network transmission — in other words, perhaps you're trying to solve the wrong problem. Look into overnight delivery of tape or disk if you need to send that much data.
posted by Blazecock Pileon at 2:22 PM on July 15, 2009


This is such a timely question. As I type, I'm copying a 157 GB MSSQL backup file from one LAN center to another. So far it's been running for almost 25 hours and it looks like I have about another hour to go. In any event, I'd suggest that you use a backup compression product like the one from BMC or Quest LiteSpeed. My 157 GB backup is from a 950 GB database, so the compression rate is 6-7x. The one thing I would NOT do is try to ship the MDF and LDFs; those are always going to be bigger than your backup.
posted by Maisie at 2:53 PM on July 15, 2009


Oh, I should have mentioned that I'm copying the backup across my company's highly saturated WAN.

Anyway...after you backup the database at the source, you can just keep dumping out your transaction log on your source and then ship those separately. They'll be much, much smaller than the initial backup (which you're going to compress, right?) and they would be suitable for shipping over the internet.
posted by Maisie at 2:57 PM on July 15, 2009


Best answer: Assuming you want to ship tapes/drives, Southwest Airlines NFG (Next Flight Guaranteed) service from Phoenix to Chicago Midway is $90.73 this Friday evening/Saturday morning for a 10 lb., 1 cu. ft. package. You need to have your package at the Phoenix air cargo facility at least 1 hour in advance of flight time, and if you have it at Phoenix by 6:00 a.m. Saturday morning, it will spend 4:20 on a direct flight leaving Phoenix at 7:00 a.m. Saturday, and be available for pickup in Chicago a little after 12:00 noon Saturday. Presuming you can have someone in Chicago to receive and transport your package to your Illinois facility, that's probably your best compromise on shipping time/expense/availability in Illinois. Assuming your Illinois contact isn't having to transport to the farthest reaches of the state, you'd still have something in excess of 36 hours to do the restore and bring up the database in Illinois.

You should specify direct flights only, to avoid issues with cargo transfers en route.
posted by paulsc at 2:57 PM on July 15, 2009 [3 favorites]


BrodieShadeTree,

Replication is a pretty stock-standard part of any DB environment where reliability is a requirement. Even in an environment where your backup server is in the same place, it's generally Not Acceptable(tm) to be down for the time it takes to restore the latest backup. MS SQL supports replication as well.

Caveat - MSSQL is not one of the DBs I've worked with, so YMMV.
posted by swngnmonk at 3:00 PM on July 15, 2009


If you have a friend with access to a telco hotel, call him up and buy him a case of single malt scotch. Then run your transfer from there. They definitely have fat pipes, so your upload will be very, very fast. If your downstream is good, I think you could accomplish the transfer in under 10-12 hours.
posted by zpousman at 3:00 PM on July 15, 2009


Ok, 1 more thing. You said that your MDF and LDF together are the same size as your database backup? And your database grows by 2 GB a week? That's not good. You need to have your DBA resize your database so that it doesn't have to autogrow all the live-long day. I'll bet your database is fragmented all to hell at this point if you've relied on autogrow this whole time.
posted by Maisie at 3:05 PM on July 15, 2009


Have each end sneak into a local university's computer lab?

Split the file into pieces, and use many internet connections?

Send Monday-Thursday on a disc, and send Friday's data separately?

Or just learn how to use rsync?
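The split-into-pieces idea is simple enough to script, and adding a checksum per piece means each one can be verified and re-sent independently. A sketch (hypothetical helper; assumes you reassemble with cat or copy on the far end):

```python
import hashlib

def split_with_checksums(path: str, part_size: int = 2 * 1024**3):
    """Split a file into numbered parts; return (part_name, sha256) pairs."""
    manifest = []
    with open(path, "rb") as f:
        idx = 0
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            part_name = f"{path}.part{idx:04d}"
            with open(part_name, "wb") as part:
                part.write(chunk)
            manifest.append((part_name, hashlib.sha256(chunk).hexdigest()))
            idx += 1
    return manifest
```

Ship the manifest along with the parts; the receiver re-hashes each piece and asks only for the ones that don't match.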
posted by miyabo at 3:26 PM on July 15, 2009


Often they are compressible by 9 or 10 times, i.e. 100 GB files can often be compressed to 100 megabytes or so

Typo/thinko or a rather unusual definition of GB?
posted by effbot at 3:31 PM on July 15, 2009 [1 favorite]


paulsc, if the OP is shipping the data by air, he should send the MDF and LDF files so that all they have to do on the receiving end is attach them. OP, whichever way you choose to do this, make sure you also send along a script to recreate the logins, dump devices and any SQLAgent jobs you need.

And now I really will butt out!
posted by Maisie at 3:42 PM on July 15, 2009


Have you optimized your DB lately? I recently had a customer have a DBA come in and adjust their SQL database. The file size shrank to less than 20% of its original size. 20-30GB, or a few DVDs' worth, is a lot more manageable than 110GB.
posted by JuiceBoxHero at 4:30 PM on July 15, 2009


Not to mention, you can go to O'Hare airport before 10 pm and FedEx a package to almost anywhere and have it there the next morning. Presumably, it works the other way too. It probably doesn't cost any more or less than the Southwest NFG service. (Which is awesome, I never thought that could be that cheap...!)

Definitely, though, there are more nimble ways to handle this, as others suggest. Even if the actual data size is 110gb and not able to be optimized down, shipping a couple of HDDs on Monday via, say, Priority Mail and FedEx 2nd day probably wouldn't cost more than $12 apiece. They arrive Wednesday, you move the data onto the new machine, and let it synchronize the delta until the following Monday. I'm no database expert, but there has to be a way to sync that kind of data fairly easily if you have the bulk of it ahead of time.

Oh, and if it grows at 2 GB a week, you are in for some trouble in a year or two. That's gonna be one big-ass file. Probably ought to think of another alternative.
posted by gjc at 5:27 PM on July 15, 2009


Just to answer your initial question in the most direct way possible: yes, there are ways to bandwidth-limit both FTP and rsync. With FTP it's usually done in the server configuration; rsync has its own --bwlimit option, and scp takes a -l option to limit bandwidth (in Kbit/s).
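As a concrete illustration (a hypothetical helper, just to show how the two flags relate - rsync's --bwlimit is in KB/s while scp's -l is in Kbit/s, so 8x the number):

```python
def bwlimited_commands(src: str, dest: str, kbytes_per_sec: int):
    """Build rsync and scp command lines capped at roughly the same rate.

    rsync --bwlimit takes KB/s; scp -l takes Kbit/s (8x the KB/s figure).
    """
    rsync_cmd = ["rsync", "-av", f"--bwlimit={kbytes_per_sec}", src, dest]
    scp_cmd = ["scp", "-l", str(kbytes_per_sec * 8), src, dest]
    return rsync_cmd, scp_cmd
```

You'd pass either list to something like subprocess.run; the point is just that both tools can be told to leave headroom for the other party's users.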

But as others have pointed out, I very much doubt that you'll beat physical media transport for 110GB of data. Direct air freight/cargo is how I'd probably go as well, although if cost is a factor you can always go Greyhound. In your case it seems like they might not have a direct Phoenix - Chicago route though, so you may prefer to stick with Southwest as previously suggested. (But if you ever have to do it again, you might want to keep it in mind; I didn't know until someone pointed it out to me that they even do shipping. Amtrak offers a similar service which can be very fast if you ever need to move stuff up and down the east or west coasts.)

I recently needed to find a way to move more than a TB of data, and after investigating all the options it turned out that shipping a large-capacity NAS back and forth was the best, fastest option.
posted by Kadin2048 at 6:00 PM on July 15, 2009


Thanks effbot. I noticed after I typed it. I dunno if it was a typo or a thinko. I love the term 'thinko'. Gonna have to start using that.

MSSQL DB backups are usually very compressible (especially if autogrow is turned on and autoshrink is turned off). 100GB may compress to 10GB.
posted by tayknight at 8:01 AM on July 16, 2009


FTP is an option, but the other party feels it might chew into their user bandwidth for too long. Are there file moving/bandwidth leasing options available for this sort of one time thing?

This statement doesn't really make sense to me. 110gig of data is going to take approximately 110gig of bandwidth plus a little overhead to transfer, whatever the method. Yes, the protocol may make a very small difference in the amount of data transferred, but basically recipients of the file are going to have to consume the requisite bandwidth no matter what you do. Of course you could avoid this by taking the physical delivery route that others have suggested.
posted by Vorteks at 1:12 PM on July 16, 2009


This thread is closed to new comments.