peer to peer folder sync
February 3, 2008 9:12 PM

Can anyone recommend software to synchronise files between multiple servers?

I have staff creating multiple large files (>1 GB) daily, and I need to share them with multiple organisations (medical research data, medical research institutes, universities, etc.).

I'm considering using BitTorrent and hand-rolling a solution:
scan a folder for new files, auto-seed new files, and check other servers for their new auto-seeded files.
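
The scan step above could be sketched roughly as follows. This is a minimal sketch, not a finished tool: the function names `find_new_files` and `scan_once`, the in-memory `seen` set, and the `publish` callback (the place where a torrent would be created and seeded) are all assumptions for illustration, and nothing here talks to a real BitTorrent client.

```python
from pathlib import Path


def find_new_files(folder, seen):
    """Return relative paths of files under `folder` that are not in `seen`."""
    root = Path(folder)
    new_files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if rel not in seen:
                new_files.append(rel)
    return new_files


def scan_once(folder, seen, publish):
    """Publish each new file once and remember it in `seen`.

    `publish` is a hypothetical callback; a real version might create a
    .torrent file and hand it to a seeding client.
    """
    new_files = find_new_files(folder, seen)
    for rel in new_files:
        publish(rel)
        seen.add(rel)
    return new_files
```

Because the files are static once created, tracking them by path is probably enough. A nightly scheduled job could call `scan_once`, with `seen` saved to disk between runs.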

When staff want to commit a data file for syncing, they would just move it to a specific folder and wait for the servers to sync overnight.
An enhancement would be a webpage for staff to tick boxes, to publish data, queue for syncing, etc.

As far as I know the data will be static: once it's created it won't change, there will just be more of it.

It would be nice to have some reporting functions too.
Platform would be Windows or Linux.
Encryption during transit would be nice.
Estimated new data is 15 GB per day.
Initial seed would be 1,000 GB+ (may post hard disks to start off).
Server locations are global.
Bandwidth is not a huge problem; academic networks will be used (keen to be efficient, though).

If there is an existing solution, I'd like to hear about it.

I considered PowerFolder, but the techs there say it doesn't support multiple sources. I'm really looking for something that supports multiple sources at a block level.

If there's an obvious way to do this, I'm missing it.
I'll admit I like the idea of using BitTorrent for legitimate purposes, but that should not be the driver.
posted by matholio to Computers & Internet (14 answers total) 5 users marked this as a favorite
Unison, uses RSYNCH
posted by iamabot at 9:14 PM on February 3, 2008

Rsync is best for bandwidth efficiency. It's a snap to set up on Linux. That's just the transport program, though; you'd have to set up cron jobs etc. to get things to the right place.
posted by chundo at 9:18 PM on February 3, 2008

Seconding Unison. It can be a bit picky to set up -- make sure you use the exact same version on all machines. But it sounds like it's exactly the sort of thing you want.
posted by Kadin2048 at 9:24 PM on February 3, 2008

Uh yeah, not sure why I typed that all in caps with an extra h.
posted by iamabot at 9:28 PM on February 3, 2008

Response by poster:
Hmm, I'm not sure rsync really helps.
These data files are not going to change, so the delta will always be whole files, which negates the cleverness of the rsync algorithm.

Also, isn't rsync a one-to-one connection? If I have multiple sites, don't I end up sending the whole file to each site from the original? The source uploads n times.
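
To put rough numbers on that concern, here is a small sketch comparing how much the originating site uploads under one-to-one copies versus an idealised swarm in which it sends each block only once. The 15 GB/day figure is from the question; the count of 5 receiving sites and the assumption that peers trade blocks perfectly among themselves are illustrative only.

```python
def source_upload_one_to_one(daily_gb, other_sites):
    """Origin sends the full data set separately to every other site."""
    return daily_gb * other_sites


def source_upload_ideal_swarm(daily_gb, other_sites):
    """Idealised lower bound: origin sends each block once, peers trade the rest."""
    return daily_gb if other_sites > 0 else 0


daily_gb = 15      # from the question
other_sites = 5    # hypothetical number of receiving sites

print(source_upload_one_to_one(daily_gb, other_sites))   # 75
print(source_upload_ideal_swarm(daily_gb, other_sites))  # 15
```

A real swarm would land somewhere between the two figures, depending on how well the peers can reach each other.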

I'll read some more, but I'm not sold.

Thanks for answering though.
posted by matholio at 9:34 PM on February 3, 2008

Shouldn't be hard to hand-roll with a BT tracker and a few Perl scripts. rsync is terrific, but not for this. If you have HIPAA data, you'll need to think seriously, seriously about how you're distributing it, though, and BT will trigger all of the alarm bells with your IRB, justified or not. If every lab shares data at a roughly equal rate, another option is to have 'owner' servers for files, with every server pulling the data that each other server owns. You can just use SCP for this. This has the advantage that the owning lab knows exactly where the data is going, which may or may not be necessary.
posted by devilsbrigade at 10:47 PM on February 3, 2008

Just to clarify, Unison fixes the problems you've identified with rsync. (Well, the fact that the data changes 100% each time isn't really a 'problem' with rsync per se, it just means rsync won't do its neato delta-calculation stuff.) It does bidirectional sync, for starters.

If you set up Unison in a hub-and-spoke configuration (all clients syncing to a server), putting a file into the shared folder on any client will result in it being copied to all clients in two sync cycles. It doesn't matter really whether you initiate the sync from the client or the server, although it's typically easier to initiate from the client side.
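
The two-cycle behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: replicas are modelled as Python sets of file names, `sync` and `run_cycle` are hypothetical helpers rather than anything from Unison itself, and a "cycle" is taken to mean every client syncing once with the hub in a fixed order.

```python
def sync(a, b):
    """Bidirectional sync: afterwards both replicas hold the union of files."""
    a |= b
    b |= a


def run_cycle(hub, clients):
    """One sync cycle: every client syncs with the hub once, in order."""
    for client in clients:
        sync(hub, client)


hub = set()
clients = [set(), set(), {"scan_001.dat"}]  # file dropped on the last client

run_cycle(hub, clients)
after_one = [("scan_001.dat" in c) for c in clients]

run_cycle(hub, clients)
after_two = [("scan_001.dat" in c) for c in clients]
```

In this worst case the file reaches the hub at the end of the first cycle, so the clients that synced earlier only pick it up during the second.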

Unison uses rsync to actually move the data around, but it is to rsync what any backup problem is to 'cp'. There's a lot of logic sitting on top of it.
posted by Kadin2048 at 10:56 PM on February 3, 2008

What any backup program is to cp. Ugh, time to go to bed; I'm the one having the 'problems.'
posted by Kadin2048 at 10:59 PM on February 3, 2008

Response by poster:
I'm wondering if outsourcing the problem is the way forward.
Amazon Simple Storage Service (S3)
posted by matholio at 11:44 PM on February 3, 2008

Use FolderShare.
www. is pretty much a set up & forget. I use it for something similar to maintain files on computers around the world.
posted by dripped at 12:22 AM on February 4, 2008

Super Flexible File Synchronizer has been good to us. A few points that might address your needs (quoted directly from website):
  • Includes a scheduler. Schedule the synchronization of your data on a backup hard disk at a convenient time each day, or as frequently as you wish. You can also schedule profiles to run upon shutdown or log-off. On Windows NT/2000 or higher, the scheduler can run as a service - without users having to log on.
  • Internet Support. Supports various Internet protocols, including FTP, FTPS, SFTP/SSH, WebDAV, SSL, HTTP, and Amazon S3 web storage.
  • Compression and Encryption Support. Easily zip your files and encrypt them with strong 256-bit encryption. To unzip and decrypt, simply make a copy of your profile and copy in the opposite direction.
  • Partial File Updating (or delta copying): this feature copies only the changed portions of files in order to speed up the synchronization.
Disclaimer: not affiliated with SFFS, YMMV, but it's worked well with many of our multi-TB archives (tapes weren't cutting it, so we made a few RAID5 boxes and use SFFS to keep a few TBs of actively accessed/altered/added-to data mirrored with file versioning; why not RAID 10, you ask? We kept hitting the RAID volume size ceiling and ended up splitting things up).
posted by FrotzOzmoo at 1:11 AM on February 4, 2008 [1 favorite]

Seconding Super Flexible File Synchronizer; we used it at my old work to back up VMware images and it worked like a charm.
posted by bertrandom at 10:32 AM on February 4, 2008

I use FolderShare, as mentioned above, but I'll add:

- It has a 10k file limit per 'share'. I hit this limit when I was using it to sync iTunes libraries as it had image files for all of the songs in my collection.

- It won't sync files over 2GB. It took me a while to find that one out, but I finally figured out why I couldn't sync my sister's raw wedding footage.

- I use this solution for a side business to keep important business files 'distributed' among all of the owners, which gives us some off-site disaster proof safety.

If you have any other questions about its use, let me know.

posted by crturboguy at 5:17 PM on February 4, 2008

Response by poster: I'm pretty sure I'll need to support large files, >2 GB.
I appreciate all the answers, but I don't think I have a gold solution.

I have a meeting later in the week with stakeholders, I'll find out more about the who, where, when, why of the data.

It's not clear if I'm expected to provide backup services for all the data or if each producer will do that locally for their own data.
I'm not entirely sure if they do actually need all the data everywhere. It sounds like one of those asks that researchers request before they know the cost/complexity.
'any file made local within 24 hours' might be acceptable and would change my approach.

I'll get back to this thread when I have some more info.

Thanks again.
posted by matholio at 7:53 PM on February 4, 2008
