Backup for Dummies?
April 30, 2008 3:53 PM Subscribe
We're a small research lab that uses a series of servers. A smart person would have these machines backed up regularly. But we're not that smart ...
There are three servers so far, soon to be four, chiefly running Red Hat Enterprise. Previous laboratories I worked in have mostly (yikes) just not worried about backup. In the best of possible worlds, we would call upon the institute's IT team. However, they're under-resourced, overstretched and (as unfortunately demonstrated recently) we can't rely on them backing anything up. So it's ended up on my plate, and after a week of reading man pages, I'm keenly aware that (1) this is stretching my meagre sysadmin skills, (2) I don't enjoy doing this and (3) every day I spend on this is a day less for an already tight research schedule. So what's a transparent, easy-to-set-up system for backing up several servers?
Offsite backups are totally unacceptable due to data privacy. We have some cash for a RAID or NAS drive, and there is a spare PC running Fedora that can be used as a backup server. However, all the methods I've investigated so far have quickly run up against the limits of my Unix knowledge. RESTORE needed to be run as a virtual machine. BackupPC looked okay, but I failed to set up the CGI program that's needed to administer it. Amanda baffled me.
To summarize: Is there a good, turnkey backup solution for multiple servers? If there's not a good one, which is the most painless? What are some good non-technical resources?
I use Bacula for both personal machines (running a variety of versions of linux), and for corporate stuff (Win2k3 including VSS, CentOS, RHEL, others). It's robust enough to handle about 54TB worth of backups using only three front-ends (and we could probably whittle that down to two if we wanted to buy beefier servers), and at the same time simple enough to back one hard drive up to another in my workstation (screw RAID1, I want 90 days of being able to say "oh shit").
The tutorials and manuals were good enough for me, and the price was nice.
posted by togdon at 4:05 PM on April 30, 2008 [1 favorite]
rsync + cron
Quick, easy, and better than nothing.
posted by qxntpqbbbqxl at 4:10 PM on April 30, 2008
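A minimal sketch of the rsync + cron approach, with hypothetical host and path names, and assuming passwordless SSH keys are set up between each server and the backup box:

# crontab entry on each server: mirror /data to the backup machine at 2am
0 2 * * * rsync -a --delete /data/ backupbox:/backups/server1/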
> Offsite backups are totally unacceptable due to data privacy.
Are you sure about this? If the data is encrypted, as it is with most services, you may not run afoul of HIPAA.
That said, rsync and cron is probably the way to go. Set up your fileserver, then use rsync to do a nightly backup of what's changed. If you want to get fancy, you can do rotating backups, so that you've got versions of your data from a week or month ago. There are many good tutorials on setting this up with a quick google search.
posted by chrisamiller at 4:37 PM on April 30, 2008
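One common way to get those rotating versions is rsync's --link-dest option, which hard-links unchanged files against the previous snapshot, so each extra day of history costs only the changed files' worth of disk. A sketch, with hypothetical paths:

#!/bin/sh
# nightly-snapshot.sh: keep dated, hard-linked snapshots of /data
SRC=/data/
DEST=/backups/server1
TODAY=$(date +%F)
# link unchanged files against the previous snapshot, copy only the changes
rsync -a --delete --link-dest="$DEST/latest" "$SRC" "$DEST/$TODAY/"
# repoint the "latest" symlink at today's snapshot
ln -snf "$DEST/$TODAY" "$DEST/latest"

Pruning old snapshots is then an ordinary rm -rf of a dated directory; the hard links mean no other snapshot is affected.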
My old company used Arkeia as a commercial solution. It works fine, and you can back up to USB hard drives instead of having to bother with a tape library.
I've had reasonable success with rsync + cron, supplemented with some perl.
Note that you really should have offsite backup. Data privacy should be sufficiently safe with proper encryption.
posted by chengjih at 4:39 PM on April 30, 2008
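If encrypted copies ever become acceptable, the encryption itself is one line of shell; a sketch, with a hypothetical recipient key and paths:

# encrypt a tarball to the lab's public key before it leaves the building
tar -czf - /data | gpg --encrypt --recipient backup@lab.example > /backups/offsite/data-$(date +%F).tar.gz.gpg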
Response by poster:
> Offsite backups are totally unacceptable due to data privacy. Are you sure about this?
> Note that you really should have offsite backup. Data privacy should be sufficiently safe with proper encryption.
I know it's a good idea, but for the moment offsite is completely out of the question and completely out of my hands. Institute policy. I might be able to get something put into another building about 250 meters away, but that's about as good as I can do.
Thanks for the suggestions so far. (I confess to being dubious about rsync + cron: I'm already trying to debug someone else's cron scripts. Adding my own home-rolled backup script to the mix fills me with dread ...)
posted by outlier at 4:56 PM on April 30, 2008
One other important consideration is whether any of your applications use database products. If so, you need to factor that into selecting backup solutions. Most database products provide APIs for backup solutions, or the vendors provide backup products themselves. Unless you use compatible backup products for your databases, you can't be sure of getting uncorrupted backups of active databases, or of restoring databases from those backups.
rsync has many things to recommend it, but database backup and bare-metal restore considerations are the things that usually keep it out of consideration as a comprehensive backup solution. You might want to look into disk imaging as a means of rapidly recovering your system partitions, perhaps combined with a standard backup solution for your data partitions. Coming back up quickly and smoothly from a complete system failure on new hardware is the measure of greatness in a system administrator, and that is not the moment when you want to be hunting for your original installation media just so you can get to some point where your backups are meaningful.
posted by paulsc at 5:46 PM on April 30, 2008
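For example, if one of the servers runs MySQL, a dump taken shortly before the file-level backup gives you a consistent copy to back up; a sketch, assuming credentials are configured in ~/.my.cnf and hypothetical paths (note that % must be escaped in crontabs):

# crontab entry: dump all databases at 1:30am, ahead of the 2am file backup
30 1 * * * mysqldump --all-databases | gzip > /backups/dumps/mysql-$(date +\%F).sql.gz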
Seconding rsync + cron.
Linux systems (and modern UNIXes generally) are almost embarrassingly capable in this department.
Another classic approach is to attach a tape drive (e.g. DLT) to one of the machines and back the data up to it each night. That makes it pretty easy to keep several days' backups (by rotating the tapes) or transport them off site. All of the machines can share one tape drive for this purpose, using only the basic programs that come with Linux (in this case, "tar").
As paulsc notes, the database thing can be a problem for any backup mechanism that works at the whole-file level (including tape backups and rsync). But given the situation you're in, I suspect that if the database automatically shut down for a while at 3am to allow the backup to run, nobody would notice. This is a rather old-fashioned approach but it's cheap, and it definitely still works!
posted by standbythree at 6:26 PM on April 30, 2008
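The tape approach really is just tar pointed at the device; a sketch, assuming a SCSI tape drive at /dev/st0:

# back up /data to tape, rewind, then check the archive is readable
tar -cvf /dev/st0 /data
mt -f /dev/st0 rewind
tar -tvf /dev/st0 > /dev/null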
> I know it's a good idea, but for the moment offsite is completely out of the question and completely out of my hands. Institute policy. I might be able to get something put into another building about 250 meters away, but that's about as good as I can do.
If you have a VPN in place then it never really becomes 'offsite'. If encrypted (via a VPN or simply encrypted files) backups were not HIPAA-compliant then there would be a lot of companies that would be fucked when the shit hits the fan.
posted by arnold at 6:31 PM on April 30, 2008
> I confess to being dubious about rsync + cron: I'm already trying to debug someone else's cron scripts.
Then by all means ask us for help right here! For a start, there's no such thing as a cron "script": cron is incredibly simple; it's really just "at a certain time, do this".
Your "do this" will be an rsync command, and yes, it will be a bit obscure and incomprehensible, but look at it this way: you pretty much only have to get it right once.
Imagine I come to your lab and I say "OK outlier, you want to back up server X? Type this at the command line: '<rsync command with a bunch of flags>'". Does it work? Then you're nearly home. Now you add a cron job to, as it were, type that command for you at midnight every night.
That's all you're really talking about.
posted by AmbroseChapel at 7:05 PM on April 30, 2008
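Concretely, those two steps might look like this (host and paths hypothetical):

# 1. run it once by hand and check the files arrive
rsync -av /data/ backupbox:/backups/serverX/
# 2. once that works, have cron "type" it for you at midnight (crontab -e)
0 0 * * * rsync -a /data/ backupbox:/backups/serverX/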
If you're thinking about rsync, check out rsnapshot, which builds on that & provides complete snapshots of files going back as far as you want.
posted by Pronoiac at 7:28 PM on April 30, 2008
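rsnapshot is driven by a single config file plus cron entries; a minimal sketch, with hypothetical paths (note that rsnapshot.conf fields must be separated by tabs):

# /etc/rsnapshot.conf
snapshot_root	/backups/snapshots/
interval	daily	7
interval	weekly	4
backup	root@server1:/data/	server1/

# crontab entry to drive it
30 2 * * * /usr/bin/rsnapshot daily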
I know you'd previously dismissed it, but Amanda has (or had) a very active support community. I once worked in a press shop that had a very elegant Amanda backup setup that only required a daily tape change, and then a weekly change of the rack in the autochanger. There may have been considerable pain in the setup (our sysadmin was a longtime contributor to the Amanda project), but the daily use was so simple a user could do it.
posted by scruss at 7:37 PM on April 30, 2008
Response by poster:
> ... HIPAA ...
We're not in the US. In any event, institutional policy, not law, is the block here.
> (Cron jobs) Then by all means ask us for help right here!
You asked for it: a cron job that apparently fires off (as seen in the cron logs), but does not produce any discernible effect. That is, no output or error is thrown; the script named in cron apparently just doesn't get called. When run manually on the command line, it works fine.
posted by outlier at 8:24 PM on April 30, 2008
When a cron job produces output, that output is typically mailed to the user the job runs as. Check /var/spool/mail/yourusername and see if cron has been sending you mail.
posted by reishus at 8:59 PM on April 30, 2008
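If mail isn't set up, redirecting the job's output to a log file does the same job (script and log paths hypothetical):

# crontab entry: capture everything the job prints, including errors
0 2 * * * /usr/local/bin/run_backup.sh >> /tmp/run_backup.log 2>&1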
One thing to keep in mind with cron jobs is that the user's environment variables are ignored for the most part. If the command in the crontab doesn't include the full path, it could be failing because cron cannot find the command.
So for example, this would be bad:
0 1 * * * run_backup.sh
But this would be good:
0 1 * * * /home/cayla/bin/run_backup.sh
(Very simplified example).
posted by cayla at 9:02 PM on April 30, 2008
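You can also set PATH explicitly at the top of the crontab, or have cron show you exactly what environment it runs with:

# variables set in a crontab apply to every job below them
PATH=/usr/local/bin:/usr/bin:/bin:/home/cayla/bin
0 1 * * * run_backup.sh
# or dump cron's environment to a file, to compare against your shell's
* * * * * env > /tmp/cron-env.txt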
rsync is not a backup tool.
rsync is not a backup tool.
rsync is not a backup tool.
rsync + cron == "oh hey, this file is all corrupted... oh, and it's corrupted on our backup as well!"
You can probably rig up some sort of iSCSI-based D2D backup for $10-15k. A tape autoloader could probably do it for $5k. If you must use rsync, make damned sure that you are keeping staged backups, not just one copy (e.g. a month ago, a week ago, and then every day...). One copy of the most recent version is not a backup; it's a pending disaster.
I'd also suggest a solution that involves keeping one recent copy offsite, even though you said that's not possible. There are plenty of data centers that are FASB/whatever certified, and it's not a bad idea to implement a remote DR site. In fact, it's a good idea.
But whatever you do, remember that rsync is not a backup tool, despite the fact that it is often mistaken for one.
posted by Project F at 10:22 PM on April 30, 2008
Project F: It sure is, so long as you make copies of what you've synced once it's on the other server.
This way you get the reduction in network traffic along with the benefit of an actual backup (or something akin to a backup).
posted by wierdo at 10:42 PM on April 30, 2008
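The cheap way to make those copies on the backup server is hard links, so unchanged files cost no extra disk; a sketch, with hypothetical paths:

# after the nightly rsync into "current", freeze it as a dated snapshot
cp -al /backups/server1/current /backups/server1/$(date +%F)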
If cron is blowing your mind, check out one of the many cron front-ends (kcron, gnome-cron; I think Webmin has a module for this) to get the cron settings right.
Personally, I'd use Bacula. The other major backup suite that hasn't been mentioned is Mondo Rescue. Summary, and some guides.
posted by a robot made out of meat at 4:39 AM on May 1, 2008
If you're using Red Hat Enterprise, you should use the packages supplied with the distribution, and I assume you have access to Red Hat support, so call them and ask. They actually have a page about backup.
If you want something simpler, rdiff-backup is the tool to use instead of rsync, and BackupNinja (site is dead right now) should help you to manage backups while being easy enough for a small lab like yours.
posted by anto1ne at 7:04 AM on May 1, 2008
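rdiff-backup keeps a current mirror plus reverse increments, so old versions stay restorable; a sketch, with hypothetical host and paths:

# mirror /data to the backup box, keeping reversible increments
rdiff-backup /data backupbox::/backups/server1
# restore a file as it was one day ago
rdiff-backup -r 1D backupbox::/backups/server1/somefile /tmp/somefile
# prune increments older than 90 days
rdiff-backup --remove-older-than 90D backupbox::/backups/server1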
Rsnapshot and Dirvish are both wrappers around the rsync hard-link technique for generating automatic rotating "snapshot"-style backups of files. It's a good technique. Stay away from Bacula and Amanda; they're quite complex and probably overkill unless you're using a tape drive.
posted by PueExMachina at 1:25 AM on May 2, 2008
Response by poster: Apologies for disappearing during the process of this thread - urgent work called me away for a few days, which then segued into a conference trip. Thanks for all the answers - they've uncovered a host of solutions I hadn't heard of or considered. Hopefully they'll also retard my progression from scientist to "I hear you know about computers ..." guy ...
posted by outlier at 3:32 AM on May 15, 2008
Just be sure to check that the backup is actually on the media and good every once in a while. More than a few people have been brought down by bad media that was never detected.
posted by wierdo at 4:05 PM on April 30, 2008
This thread is closed to new comments.