Can I write a backup script for OSX that will automate to S3?
March 2, 2011 8:13 AM

Is there a way to build an automated cloud backup job blending OS X and Amazon S3?

I found this article and tried following the instructions to build a little tool to back up to S3, but was unable to get it working on my Mac.

Am I crazy for even trying to do this? I don't care about *all* of my data, but my wife takes a large number of photos and would like a cloud backup solution for them. I've played around with S3 at work, but never using a command-line tool. In a perfect world, I'd have a job scheduled that simply backed up her photo library once a day, transferring new bits to S3.

Am I missing something? Am I trying to go too far for a command-line novice? All advice welcomed.
posted by rocketman to Computers & Internet (16 answers total) 3 users marked this as a favorite
 
A little googling found this guide, which may help you. I haven't tried it, but it looks interesting.

Alternately, there's software called Arq that will do a lot of the heavier lifting for you, but costs $29.
posted by tjenks at 8:19 AM on March 2, 2011


I use Jungle Disk. It costs $2/month or so on top of the S3 fees, but it works flawlessly: you just use a GUI to tell it what to back up and when. It is SO SO SO worth it.
posted by drjimmy11 at 8:22 AM on March 2, 2011


I don't use Arq, but I've heard nothing but good things about it.

A bit of caution about this:

rocketman: "In a perfect world, I'd have a job scheduled that simply backed up her photo library once a day, transferring new bits to S3."

iPhoto (still, I believe) stores your entire library as one file, so you'd be sending and storing the entire library each time. There wouldn't be escalating storage charges beyond the extra bits (unless you use versioning, which could get out of control quickly), but you'd incur S3 transfer fees each time.
posted by mkultra at 8:32 AM on March 2, 2011


You're looking for Jungledisk, which I've had great experience with.
posted by iamabot at 8:36 AM on March 2, 2011


You want s3cmd, which is a command-line tool with a built-in sync command. You simply run s3cmd sync <local directory> s3://<bucket>/<remote directory> and it only uploads the files that have changed.
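To make "only updates the files that have changed" concrete, here is a rough Python sketch of the decision a sync tool has to make. It is not s3cmd's actual implementation: `remote_index` is a hypothetical stand-in for a bucket listing (relative path mapped to the MD5 recorded at the last upload), and both function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_root: Path, remote_index: dict[str, str]) -> list[str]:
    """List relative paths that are new or changed compared with remote_index.

    remote_index is an assumed stand-in for a bucket listing: it maps each
    object key (relative path) to the MD5 stored when it was last uploaded.
    """
    pending = []
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue  # directories have no content of their own to upload
        key = path.relative_to(local_root).as_posix()
        if remote_index.get(key) != file_md5(path):
            pending.append(key)  # missing remotely, or contents differ
    return pending
```

A photo folder with 20 new pictures would yield just those 20 paths, which is why per-file syncing is cheap as long as the library really is many separate files.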
posted by COD at 8:42 AM on March 2, 2011


Another vote for Jungledisk. I've been using it for over a year with Amazon S3. Really easy to set up, and has some additional capabilities like synchronized folders. You can also use it with Rackspace's Cloud Files service, which I think is a bit cheaper.
posted by txsebastien at 8:43 AM on March 2, 2011


Response by poster: So if I'm only using one of these tools to back up an iPhoto library, how would I avoid transferring the (>100GB) file daily? Especially if there's only an additional 20 photos or so?

I suppose we could run the job less than once a day - I have Time Machine enabled and run hourly/daily backups, so syncing to the cloud could theoretically happen once a month. But I'd prefer to do it daily or weekly.
posted by rocketman at 8:46 AM on March 2, 2011


Rather than use S3 you could start an EBS-backed micro EC2 instance once a month and rsync to it. This would copy only the changed parts of the Photo Library file, and would compress the data in transit, unlike any of the free S3 solutions I know of.
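As a rough sketch of what that rsync run could look like when driven from a script, the Python below only builds the command line. The host name, remote directory, and key path are hypothetical placeholders; the rsync flags themselves (-a, -z, --partial, -e) are standard, and rsync's delta transfer is on by default for remote copies.

```python
import shlex


def rsync_command(source: str, remote_host: str, remote_dir: str, ssh_key: str) -> list[str]:
    """Build an rsync invocation that pushes source to remote_host over SSH."""
    return [
        "rsync",
        "-a",         # archive mode: recurse, preserve times and permissions
        "-z",         # compress data in transit
        "--partial",  # keep partial files so an interrupted run can resume
        "-e", f"ssh -i {shlex.quote(ssh_key)}",  # use SSH with the given key
        source,
        f"{remote_host}:{remote_dir}",
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the instance is up.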
posted by nicwolff at 8:49 AM on March 2, 2011 [1 favorite]


(And for new AWS users, it's free for a year!)
posted by nicwolff at 8:58 AM on March 2, 2011


rocketman: "So if I'm only using one of these tools to backup an iPhoto library, how would I avoid transferring the (>100GB) file daily? Especially if there's only an additional 20 photos or so?"

I don't know about the underlying technology of Jungle Disk, but the standard rsync approach involves both ends calculating checksums for each block of the file and comparing. The checksum is only a couple bytes to summarize 4 kilobytes or more. So you'll pay a small price to check, and the usual price to upload changes.
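A simplified Python sketch of that block-checksum idea follows. It is an illustration under assumptions, not rsync's algorithm: it compares blocks at fixed offsets only, uses MD5 as the per-block checksum, and the function names are invented. Real rsync adds a rolling checksum so it can also match blocks that have shifted position.

```python
import hashlib

BLOCK_SIZE = 4096  # the 4 KB figure from above; real tools tune this


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one short digest per fixed-size block of data."""
    return [
        hashlib.md5(data[start:start + block_size]).digest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indexes of blocks in new that differ from, or extend past, old."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = []
    for index, digest in enumerate(new_digests):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)  # only these blocks need to be sent
    return changed
```

Because this version works at fixed offsets, inserting a single byte near the start of a file would mark every later block as changed.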
posted by pwnguin at 9:00 AM on March 2, 2011


mkultra: "iPhoto (still, I believe) stores your entire library as one file, so you'd be sending and storing the entire library each time"

Well, that's a downer if true. Rsync is great, but it's a very general algorithm that you either have to adapt to or replace with something smarter. In particular, compressed and merged files tend to distribute changes across the entire file. If they didn't use an rsync-tuned compression, that'll be a painful cost.
posted by pwnguin at 9:05 AM on March 2, 2011


JungleDisk FAQ

You actually want JungleDisk Plus, which does differential block level backups. You pay for what you store and how much you upload.
posted by iamabot at 9:24 AM on March 2, 2011


nicwolff: "you could start an EBS-backed micro EC2 instance once a month and rsync to it"

An EBS-backed instance will lose all of its data when you turn it off, and you don't want to keep that machine on 24/7.
posted by mkultra at 9:44 AM on March 2, 2011


Response by poster: It's my understanding that transfers between EC2 and S3 incur no charge, so perhaps running an instance and moving everything to S3 is the answer.
posted by rocketman at 9:51 AM on March 2, 2011


I'm a little Mac stupid, so this may not be at all worthwhile as an answer. If you're just trying to add incremental backups as you add new files and they're not all 1 giant file, try GoodSync.

I use it to sync to a local NAS and to my S3, mirroring only left to right, and propagating changes. (Ergo only upping new files.) Allway Sync does S3 as well, and is closer to free, but you may hit file size limits with it in its free form.

JungleDisk, as people have mentioned, has a great tool to help you do this. FWIW, if you choose Rackspace as your file repository instead of S3, you pay only for storage and nothing for bandwidth. S3 is currently 14c/GB plus bandwidth (like a cent a gig or something), while Rackspace is 15c/GB with no bandwidth charge. First 10 GB free with either host.
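For a back-of-the-envelope comparison, here is a small Python sketch using the rough figures quoted above. The rates are this thread's 2011 numbers (the bandwidth rate especially is a guess), the function name is made up, and the free first 10 GB is ignored to keep the arithmetic simple.

```python
def monthly_cost_cents(stored_gb: int, uploaded_gb: int,
                       storage_cents_per_gb: int,
                       bandwidth_cents_per_gb: int = 0) -> int:
    """Estimate one month's bill in cents: storage plus upload bandwidth."""
    storage = stored_gb * storage_cents_per_gb
    bandwidth = uploaded_gb * bandwidth_cents_per_gb
    return storage + bandwidth


# Assumed scenario: a 100 GB library re-uploaded in full every day for 30 days.
s3_full_daily = monthly_cost_cents(100, 30 * 100, 14, 1)
rackspace_full_daily = monthly_cost_cents(100, 30 * 100, 15)

# Assumed scenario: the same library, but only 2 GB of new photos uploaded.
s3_incremental = monthly_cost_cents(100, 2, 14, 1)
```

With those assumptions the full daily re-upload comes to 4400 cents on S3 versus 1500 on Rackspace, while the incremental case on S3 is 1402 cents, so the upload pattern matters more than the one-cent difference in storage price.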
posted by TomMelee at 10:15 AM on March 2, 2011


mkultra: "An EBS-backed instance will lose all of its data when you turn it off"

Only if you terminate it, not if you just stop and start it. Or, attach a second EBS volume, separate from the boot volume, for the backups.
posted by nicwolff at 10:36 AM on March 2, 2011

