Sync two servers?
March 19, 2008 9:38 AM

IT Geek Filter: A better way to sync millions of files between very remote servers?

Currently I am syncing two file servers, one on the west coast and one on the east coast, consisting of several million files, using Double-Take. The syncing performance basically sucks, as does the operation of Double-Take. Does anyone know of a better solution or better software to accomplish this task?
posted by Cosine to Computers & Internet (24 answers total)
 
Are you talking tens of GBs? I would mail media to the remote location, have them copy it to the server, and then start the sync, just to avoid the initial overhead.
posted by Cat Pie Hurts at 9:40 AM on March 19, 2008


we do something like this using plain old rsync

yeah, the initial sync takes forever (we let it run over the weekend), but the nightly deltas usually only take a few minutes, and we run them at midnight just to be safe
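
a nightly delta can be as simple as a one-liner along these lines (host name and paths here are just placeholders, and this assumes SSH access between the two boxes):

# push only changed files over SSH; -a preserves attributes, -z compresses, --delete mirrors deletions
rsync -az --delete /data/ eastcoast:/data/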
posted by Oktober at 9:45 AM on March 19, 2008


rsync.
posted by unixrat at 9:46 AM on March 19, 2008


Response by poster: Thanks, but I'm looking for something a little more "Enterprise" than mailing files; also, there isn't anyone at this remote site.
posted by Cosine at 9:47 AM on March 19, 2008


thirding rsync.
posted by rmd1023 at 9:51 AM on March 19, 2008


Response by poster: I use rsync for some smaller servers, but for this situation rsync has been tried and couldn't handle it, perhaps because these are Windows servers and running rsync on Windows isn't great.

The rate of change on these files is high enough that a nightly delta-style sync does not work; files need to be flying 24/7 to keep up. A lot of the overhead is the actual file comparison: it takes Double-Take as much as two weeks to fully compare all the files, and if either server has any kind of glitch the comparison task has to start over from scratch.
posted by Cosine at 9:52 AM on March 19, 2008


Response by poster: odinsdream: That is fine for the initial sync AND that is how I initially did it, but that's not the problem; KEEPING them in sync is the problem.
posted by Cosine at 9:53 AM on March 19, 2008


rsync, rsync, rsync. It is very fast.
posted by AaRdVarK at 10:03 AM on March 19, 2008


I'm not sure if it's been mentioned, but rsync =) Also, if you're using Windows machines, deltacopy.
posted by bertrandom at 10:05 AM on March 19, 2008


What is your connection speed? File comparison should not take that long.
posted by mphuie at 10:06 AM on March 19, 2008


Response by poster: Bandwidth isn't the problem; latency is. The task of comparing 10,000,000 files doesn't even max out the connection speed now.
posted by Cosine at 10:09 AM on March 19, 2008


Unison, which leverages the rsync algorithm
posted by iamabot at 10:09 AM on March 19, 2008


Also, you should be looking into WAAS (Cisco's WAN optimization gear); it's ideal for this type of thing.
posted by iamabot at 10:10 AM on March 19, 2008


Whatever rsync did was worse than two-weeks-plus-errors? How?

rsync is pretty much the gold standard for this sort of thing. You'll probably be best off figuring out on your own what went wrong (since you don't say here); there's probably something trivial in the way.
posted by cmiller at 10:12 AM on March 19, 2008


Response by poster: I'm guessing that everyone here suggesting Rsync on Windows hasn't had the pleasure of having Rsync decide after each DST change that ALL files are out of sync and must be recopied in their entirety.
posted by Cosine at 10:14 AM on March 19, 2008


rsync for sure.. if you really do have that many files and the initial scan takes an hour plus, you can break it up into multiple parallel streams:
instead of rsyncing
/file_system

in one go, do:
/file_system/branch_a
/file_system/branch_b
...
and so on
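
something like this, roughly (remote host name and paths are just placeholders):

# one rsync per subtree, run in parallel, then wait for them all to finish
rsync -a /file_system/branch_a/ remotehost:/file_system/branch_a/ &
rsync -a /file_system/branch_b/ remotehost:/file_system/branch_b/ &
wait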
posted by joshgray at 10:22 AM on March 19, 2008


Here's an article on the rsync Windows DST problem with a couple of fixes:
http://www.samba.org/rsync/daylight-savings.html
posted by pocams at 10:22 AM on March 19, 2008


If you're gunshy about rsync:

Assuming you have immediate or remote access to server A or B, WinSCP (free) and SecureFX (paid) each have directory sync features, with incremental/diff-only options.

The only gotcha I see is that they both operate with the "local to remote" metaphor, as opposed to "remote to remote", hence the part about immediate access to one of the machines.
posted by bhance at 10:27 AM on March 19, 2008


seconding Unison
posted by qxntpqbbbqxl at 10:30 AM on March 19, 2008


Unison has proved useful for me in the past (rsync shy).
posted by holgate at 10:30 AM on March 19, 2008


Thirding or fourthing Unison.
posted by pmbuko at 11:16 AM on March 19, 2008


I'm guessing that everyone here suggesting Rsync on Windows hasn't had the pleasure of having Rsync decide after each DST change that ALL files are out of sync and must be recopied in their entirety.

By default rsync uses modification time and size to decide if a file has changed. It can use a checksum with the -c option (though computing checksums is obviously more processor-intensive than looking at file attributes). Or you could use --size-only to turn off the mtime check. You can also use --modify-window if your timestamps are off (in fact, the documentation says you're supposed to use this on FAT partitions).
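
For example, something along these lines should survive a DST flip (host and paths are placeholders; 3601 seconds is just one common value, covering a one-hour offset plus a second of slop):

# treat mtimes within an hour of each other as unchanged
rsync -a --modify-window=3601 /data/ eastcoast:/data/
# or ignore timestamps entirely and go by size alone
rsync -a --size-only /data/ eastcoast:/data/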
posted by paulus andronicus at 12:08 PM on March 19, 2008


Nthing rsync. Are you using -z (--compress), --partial(-dir), and possibly --checksum?
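e.g. something like this (host and paths are placeholders):

# compress on the wire, keep partial transfers on interruption, compare by checksum instead of mtime
rsync -az --partial --checksum /data/ eastcoast:/data/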
posted by hattifattener at 6:03 PM on March 19, 2008


Do some research on WAN optimization. There are tons of products in that market space, some cheaper than others.
posted by JintsFan at 1:35 PM on March 26, 2008


This thread is closed to new comments.