Making a stand-alone rsync appliance.
October 11, 2005 11:54 AM

Is there a distribution of a linux-like OS which turns hardware into an easy-to-configure rsync appliance?

I have a client (web dev agency) whose president wants 'on site backups of all of the websites we host'. They had issues with less-than-clueful providers, I guess. They're not interested in switching their hosting provider.

She essentially wants to be able to point to a box in her office and say 'our backups of all of the sites we're responsible for live here'. I don't work at this office, so I'd like to be able to control this 'remote backup appliance' via a web-admin or VNC or some such. If I have no other choice but to be on-site, that's okay, but not optimal.

I have a Shuttle-size PC available to me for this project (p3, 256MB, 80GB mirrored HDDs.) My original plan was to install a flavor of linux, then make some shell scripts to do simple wget requests on a regular (perhaps daily) basis. Some research revealed that the aggregate file size to transfer was ~1.5 TB range. This office has only business-class cable modem service (5Mb down).

Now I'm thinking about implementing rsync, which I need to learn more about. The client claims to have root on their server at the ISP, so if I need to install sw to support rsync, that shouldn't be a problem.

My previous experience is with Mac OS X Server, where turning on services is as easy as clicking a button. I have enough unix experience to get around a shell, but not enough to know where all of the config files, etc. live for various services.

I need to be able to ssh into this box and set up cron scripts to fire rsync events, but if there were a web interface for this kind of thing, all the better.

Basically, I don't want to go through the effort of manually writing the scripts if there's a smarter way to achieve my goal.
posted by Wild_Eep to computers & internet (13 comments total)
There really isn't much in the way of scripts that you're going to actually need. rsync probably does everything you need already. You don't need to set it up as a service, rsync just needs to be installed on the client and the server. I've used rsync with windows under cygwin. There may be a native port also. It almost certainly exists for OSX also.

Anyway, I back up all my clients' websites this way. I have a script that I wrote, but the only reason I have one is that certain parts need to get backed up more often than others. It runs daily, checks when each one was last updated, and only runs rsync for the ones that have gone past their expiration window.
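A minimal sketch of that kind of "only sync what's past its window" check, not the actual script described above. The stamp-file layout and the `is_due` / `sync_if_due` names are assumptions made up for illustration, and `date +%s` is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Sketch only: function names and the stamp-file layout are hypothetical.
RSYNC=${RSYNC:-rsync}

# is_due LAST NOW WINDOW
# Succeeds if at least WINDOW seconds separate the last successful
# sync (LAST) from the current time (NOW), both in epoch seconds.
is_due() {
    [ $(( $2 - $1 )) -ge "$3" ]
}

# sync_if_due SRC DEST STAMPFILE WINDOW_SECONDS
# Runs rsync only when the stamp is missing or older than the window,
# and records the time only after rsync exits successfully.
sync_if_due() {
    src=$1 dest=$2 stamp=$3 window=$4
    now=$(date +%s)
    last=0
    [ -f "$stamp" ] && last=$(cat "$stamp")
    if is_due "$last" "$now" "$window"; then
        "$RSYNC" -a "$src" "$dest" && echo "$now" > "$stamp"
    fi
}
```

Something like `sync_if_due remotehost:/path/to/site/ /backups/site/ /var/tmp/site.stamp 86400` would then sync that site at most once a day; the paths here are placeholders.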

The basic syntax you want is something like
rsync -av remotehost:/path/to/remote/directory /path/to/local/directory
that's it. Put that in cron.
You might want to tweak it a little bit, like maybe add --delete to the options so that if a file is removed from the server it'll be removed from the client also. I don't do this so that I can recover from accidental deletions, generally.
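For reference, a sketch of what "put that in cron" might look like in practice. The script path, schedule, and log location are made up; the rsync options used (`-a`, `-z`, `--partial`, `-e ssh`, `--delete`) are standard ones. It assumes key-based ssh login is already set up so cron can run unattended. Note that a trailing slash on the source copies the directory's contents rather than the directory itself.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/site-backup.sh and
# scheduled with a crontab line such as (02:30 every night):
#   30 2 * * * /usr/local/bin/site-backup.sh remotehost:/path/to/remote/directory/ /path/to/local/directory/ >> /var/log/site-backup.log 2>&1
RSYNC=${RSYNC:-rsync}

# backup_site SRC DEST [extra rsync options...]
backup_site() {
    src=$1 dest=$2
    shift 2
    # -a archive mode, -z compress in transit, --partial keep partly
    # transferred files so an interrupted run can pick up where it
    # stopped, -e ssh to use ssh as the transport.
    "$RSYNC" -az --partial -e ssh "$@" "$src" "$dest"
}

# Only run when invoked with arguments, so sourcing the file is harmless.
if [ "$#" -ge 2 ]; then
    backup_site "$@"
fi
```

Passing `--delete` as a third argument mirrors deletions as well; leaving it off keeps removed files around on the backup box, as described above.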
posted by RustyBrooks at 12:09 PM on October 11, 2005


Also, expect the initial sync to be quite time consuming. After that, it should be short if there are not a lot of changes to the customer's sites.
posted by RustyBrooks at 12:11 PM on October 11, 2005


I'm a little confused about one thing:

Some research revealed that the aggregate file size to transfer was ~1.5 TB range. This office has only business-class cable modem service (5Mb down).

Huh? Do you mean KB? MB?
posted by RustyBrooks at 12:12 PM on October 11, 2005


Since you mentioned VNC, it works on linux also. So you could use VNC to connect to the server. Personally for what you're talking about, ssh would probably be easier.
posted by RustyBrooks at 12:19 PM on October 11, 2005


Also, I am a POSTING MACHINE.
posted by RustyBrooks at 12:19 PM on October 11, 2005


I meant TB as in Terabyte and Mb as in Megabit.
posted by Wild_Eep at 12:26 PM on October 11, 2005


OK. You're going to be storing a terabyte on an 80 gigabyte hard drive then?
posted by RustyBrooks at 12:28 PM on October 11, 2005


Whoops, I should have written GB, not TB.
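The difference matters a lot on a 5 Mb/s link. A rough estimate of the initial transfer for both figures, ignoring protocol overhead, compression, and whatever upstream limit the hosting provider has. `transfer_seconds` is a made-up helper, and sizes are in decimal megabytes (1 GB = 1000 MB).

```shell
#!/bin/sh
# transfer_seconds SIZE_MB RATE_MBIT
# Seconds needed to move SIZE_MB megabytes over a link that sustains
# RATE_MBIT megabits per second (8 bits per byte, integer arithmetic).
transfer_seconds() {
    size_mb=$1
    rate_mbit=$2
    echo $(( size_mb * 8 / rate_mbit ))
}

echo "1.5 GB: $(transfer_seconds 1500 5) seconds"                 # 2400 seconds, i.e. 40 minutes
echo "1.5 TB: $(( $(transfer_seconds 1500000 5) / 86400 )) days"  # 27 days, rounded down
```

So 1.5 GB is an initial sync you can finish over lunch, while 1.5 TB would tie up the line for most of a month.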
posted by Wild_Eep at 12:42 PM on October 11, 2005


OK, that's more like it.

Does what I wrote above make sense? I don't think you need anything super-fancy here, or a special version of linux, or anything like that. Pretty much any OS would do, windows, osx, or linux. Connection to the machine via ssh or vnc would be fairly straightforward for all of these (except ssh under windows is not all that obvious).

You'll probably want to experiment with the command line params for rsync a little bit to get exactly what you want, but I think we're talking about 10-20 minutes of labor there, tops. In comparison, finding and installing some kind of web-thang that does exactly what you want is probably kind of a waste of time.
posted by RustyBrooks at 1:13 PM on October 11, 2005


Ex-coworker of mine wrote rs as an rsync wrapper to do what you describe. It's designed to be supervised under daemontools but runs fine from cron in one-shot mode, too.

Big advantages of that over just running rsync from cron are TTL management (so transfers don't overlap), configuration files instead of a command-line, better scheduling than cron, and clear success/fail summary notification messages.
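If you stay with plain cron, the overlap problem at least can be handled with a lock. This is a different, simpler technique than whatever rs does internally, and `run_exclusive` and the exit code 75 are arbitrary choices for this sketch. It relies on `mkdir` being atomic.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir is atomic, so two
# overlapping cron runs cannot both succeed in creating it.
# Caveat: if the script is killed mid-run, the lock directory stays
# behind and has to be removed by hand.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    result=$?
    rmdir "$lockdir"
    return "$result"
}
```

A cron job would then call something like `run_exclusive /var/tmp/site-backup.lock rsync -a remotehost:/path/ /backups/site/`, with the lock path being a placeholder.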
posted by mendel at 1:24 PM on October 11, 2005


I don't know of any "easy install" distributions, but I use Fedora Core 4, the latest rsync, and rsnapshot to have NetApp-style "snapshots in time" backups of my colocated machines on a box with large hard drives at home.

Once the initial (huge) transfer is done, the only things that get transferred nightly are the files that change or are added/deleted.
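Roughly the same "snapshots in time" idea can be sketched with plain rsync and its `--link-dest` option, which hard-links unchanged files to the previous snapshot so each night only costs the space of what changed. This is an illustration, not rsnapshot's own code; the `snapshot` function and the date-named directory layout are assumptions.

```shell
#!/bin/sh
RSYNC=${RSYNC:-rsync}

# snapshot SRC ROOT TODAY PREVIOUS
# Copies SRC into ROOT/TODAY. Files unchanged since ROOT/PREVIOUS are
# hard-linked to the old copy instead of being transferred and stored
# again. ROOT should be an absolute path, because a relative
# --link-dest is interpreted relative to the destination directory.
snapshot() {
    src=$1 root=$2 today=$3 previous=$4
    "$RSYNC" -a --delete --link-dest="$root/$previous" "$src" "$root/$today/"
}
```

For example, `snapshot remotehost:/path/to/site/ /backups/site 2005-10-12 2005-10-11` leaves two browsable directories, each looking like a full copy of the site on that day.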
posted by mrbill at 1:35 PM on October 11, 2005


If your web sites connect to database backends, neither rsync nor any other file-copying program can guarantee that the database files will be copied in a consistent state, which can leave the backup corrupt or missing data. The database files may be in the middle of a write operation at the time of the copy, for example. Only the database knows for sure. I would recommend doing a periodic archive/dump/hot backup of your database using a tool supplied with your database and including the archive files in your nightly rsync backup. Then if your database files are corrupt you can restore from your archive.
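As a sketch of the dump step, assuming the database happens to be MySQL (the thread doesn't say) and that credentials come from the usual client config file. `dump_db` and the file naming are made up; substitute your own database's dump tool.

```shell
#!/bin/sh
# Assumption: MySQL with mysqldump on the PATH. Run this on the web
# server shortly before the nightly rsync pulls the files.
DUMP=${DUMP:-mysqldump}

# dump_db DBNAME OUTDIR
# Writes to a temporary name first and renames on success, so the file
# called DBNAME.sql is always a complete dump, never a half-written one.
dump_db() {
    db=$1 outdir=$2
    tmp="$outdir/.$db.sql.tmp"
    # --single-transaction gives a consistent snapshot for InnoDB
    # tables; other storage engines need a different approach.
    if "$DUMP" --single-transaction "$db" > "$tmp"; then
        mv "$tmp" "$outdir/$db.sql"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The dump is left uncompressed on purpose here: plain-text dumps that change a little each night tend to suit rsync's delta transfer better than compressed ones.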
posted by ldenneau at 7:45 PM on October 11, 2005


rsnapshot has an option to call out to database-dump scripts. I use it to save nightly snapshots from four servers in New York to a backup server at Easyspeedy in Denmark - it's cheap peace-of-mind and easy to set up. E-mail me questions if you want.
posted by nicwolff at 12:04 AM on October 12, 2005

