What's the best means to mirror a webserver directory on a local OS X machine?
October 17, 2005 5:41 PM

I maintain a complex website that I update by hand. (Unfortunately, this is unavoidable.) I want my local directory to be an exact mirror of the site, so that when I alter thirty HTML files I can simply run a script to copy them instead of FTPing them all by hand, while also removing deleted files and downloading any server-side changes made by scripts or other users. What's the best means to do this on OS X?

I've tried using Transmit's "Mirror" feature, but in my experience its datestamp calculations are unreliable. They also have the side effect that any time I need to re-upload or re-download the entire 2 GB site from scratch (if I believe my backup is bad, or if I change servers, for example), I need to synchronize the entire site in the other direction for the timestamps to work out.

I've tried to figure out RsyncX, but I'm not sure it's designed to do what I want, and it's a pretty hardcore piece of software. If it WILL do what I want, I'm all for learning how to make it do it, but I haven't found a good tutorial to that effect yet.

Any suggestions? I'm sure that there are a million people who need and have systems like this, but for the life of me I can't figure it out myself.

Also, any pointers to a good "newbie's guide to Rsync/RsyncX" would be appreciated, since I'd like to start running incremental archived backups as well.
posted by tweebiscuit to Computers & Internet (17 answers total)
 
Don't use sitecopy. It works well for this in general but will kill server-side changes. Just a warning.
posted by smackfu at 5:50 PM on October 17, 2005


Ant?
posted by furtive at 5:53 PM on October 17, 2005


Best answer: rsync is a very good tool for this. This tutorial covers the basics, and of course you will need the man page.

1: Check to make certain you have rsync installed on the remote server.

2: Download your files with something like:
rsync --recursive --verbose --archive user@remotehost:www .
This copies the www directory from the remote server into your local working directory (creating ./www).

3: When you are ready to upload, use:
rsync --recursive --verbose --archive ./www user@remotehost:
This copies changed files back to the remote server.


The --archive flag tells rsync to preserve just about everything (modification times, permissions, etc.)

You can use --dry-run to get a preview of what rsync will do.

posted by KirkJobSluder at 6:06 PM on October 17, 2005


Yes, rsync is the right tool for this. Also, the best way to run rsync for incremental backups is with rsnapshot.
posted by nicwolff at 6:11 PM on October 17, 2005


Oh, you might also want to check out the -E (--extended-attributes) flag on Tiger for Macintosh file transfers. It handles resource forks and metadata.
posted by KirkJobSluder at 6:21 PM on October 17, 2005


Best answer: Rsync is definitely what you want. It was designed for exactly what you're trying to do. The key to using it has already been mentioned: always use --recursive and --archive. The latter is actually a shortcut that turns on a bunch of other switches for preserving dates, times, owners, permissions, and links.

Another good one to always have is -P (shortcut for --partial --progress). This means that if a large-file transfer is interrupted, it will pick up where it left off when restarted, rather than starting that file over from the beginning.

To handle file deletions, use --delete, which causes any files on the destination that don't exist on the source to be deleted. If you want this to happen in both directions, use it on both invocations of the command.

Note that you don't have to set up an "rsync server", so ignore any tutorials on that. You can just do rsync over ssh: as long as you can ssh to the remote, and the remote has the rsync command available, that's all you need.

rsync documentation

If you just aren't getting anywhere with rsync you can try Unison, which is similar.
posted by Rhomboid at 6:42 PM on October 17, 2005


Take the time now to implement and learn CVS. For a large, complex web site (2GB!) you will feel a lot better and probably save yourself a lot of grief using a real version control system. This will also give you the huge benefit of letting others make changes to the site, if necessary. There'll be a bit to learn in the beginning (you can pick it up in a week easily) but once you do you're home free.
posted by nixerman at 6:51 PM on October 17, 2005


Run an AFP (Apple Filing Protocol) file server, e.g. on Mac OS X Server. Then keep your web documents on a file share. Connect your workstations to the file share. Set up your Dreamweaver profiles to use the AFP file share as a "Local" directory, which for your purposes is exactly what this is.
posted by Rothko at 7:14 PM on October 17, 2005


Response by poster: Thanks guys! I don't think I'll use CVS (most of the 2GB are video files -- there aren't nearly that many HTML pages, and no other editors). And thanks, Rothko, but I don't use Dreamweaver. Now that I know it's the tool for the job, I'll start plugging through with Rsync. Thanks everybody!
posted by tweebiscuit at 7:31 PM on October 17, 2005


I actually think Dreamweaver is quite good at this task.

Some versions of it tend to crash on OS X, but that little problem aside, its "Synchronize" feature is exactly what you want. You keep a local mirror and you update what needs to be updated. If you have DW already, I'd suggest getting to know this feature, though of course I wouldn't suggest buying it just for its syncing.

Also, if you find Transmit's datestamp functions are unreliable, I'm sure they'd love to hear about it; they're a very responsive company in that way.
posted by AmbroseChapel at 7:31 PM on October 17, 2005


This already has a best answer marked, but I just want to throw in another vote for rsync. I have cron jobs set up to back up my e-mail, iCal, etc., folders. It works basically transparently.
posted by oaf at 8:25 PM on October 17, 2005


Response by poster: Hmm. I keep getting an "stdin: is not a tty" error when trying to use an RsyncX-generated script in the terminal. Any ideas what's going on there?
posted by tweebiscuit at 9:30 PM on October 17, 2005


I'm very fond of unison, which I tend to use interactively but apparently is very easy to use noninteractively too.
posted by Aknaton at 11:14 PM on October 17, 2005


tweebiscuit: Hmm. I keep getting an "stdin: is not a tty" error when trying to use an RsyncX-generated script in the terminal. Any ideas what's going on there?

When I had this happen to me, it was due to a mixup with my login script on the remote server calling an interactive command. (In my case it was ssh-agent.) If you have a custom .login/.bashrc on the remote server, try disabling it.
posted by KirkJobSluder at 7:01 AM on October 18, 2005


Interarchy.
posted by joeclark at 7:26 AM on October 18, 2005


Response by poster: Hm. Looks like I'm just having trouble using RsyncX -- I'll figure out how to do it via the command line instead -- it'd be good practice anyway. Thanks KJS!
posted by tweebiscuit at 12:03 PM on October 18, 2005


When you do rsync over ssh (or anything over ssh) it requires that the ssh session be "clean". That means that if you have anything in your .login / .bashrc / .profile that outputs stuff to the terminal or expects to interact, it will not work when you try to use ssh as a transport for rsync. You can test this by doing something like "ssh host echo hello". You should see just "hello", and no other output. If you see any stray warnings or errors, check all your startup files on the host to see if there's anything there that might be interfering.
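One common fix, assuming bash on the remote side: guard the interactive-only parts of the startup file so that non-interactive sessions (like the one rsync opens) stay silent. A sketch of what the top of a remote ~/.bashrc might look like:

```shell
# Top of ~/.bashrc on the remote host (sketch).
# $- contains "i" only in interactive shells; bail out early for
# non-interactive sessions (rsync, scp, cron) so nothing prints.
case $- in
    *i*) ;;        # interactive: continue with the rest of the file
    *)   return ;; # non-interactive: stop here, stay silent
esac

# ...interactive-only setup (ssh-agent, banners, etc.) goes below...
echo "Welcome back!"   # example of output that would break rsync
```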
posted by Rhomboid at 3:37 PM on October 19, 2005


This thread is closed to new comments.