Open-source or trial de-duplication tools anyone can recommend?
January 9, 2009 12:46 PM
I have several old data backup devices, with some overlap on what's on them. I'd like to consolidate them all onto a new device, but without having 3 copies of the overlapping redundant data. I think the term I'm looking for is de-duplication. Any open source or free trial tools that MeFites can recommend?
I'm looking at several old SNAP! Server devices that were used for data backups. The information on all three is valuable, but there is some overlap of what's contained on them. And what's on there is so jumbled and disorganized that there's no way to sort it all out manually without an unreasonable amount of effort.
The hard drives on these things are basically at full capacity, so a new backup device is needed. Rather than keep 4 devices networked to access all this information (and continue having to search 3 devices to find a file), I'd like to consolidate all that data onto a new device as one big set of files.
So I'm looking for some tool that will let me get a new device (with a larger capacity), point some software at the three old devices, and tell it to move the contents of the three old devices onto the one new device, BUT NOT to copy multiples/duplicates. This should empty out the older devices so they can be wiped and disposed of, and give me one new device with all of their contents, but eliminating the need to store dozens of GB of overlapping files they all used to contain.
Anything anyone can recommend?
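You'll be able to do this with rsync, a command-line tool that ships with Linux and most Unix-like systems. On Windows, you can either install Cygwin to run the command or follow another howto. Note that rsync skips a file only when a matching file already exists at the same relative path on the destination, so identical files stored under different names or folders will still be copied.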
Basically, it will look something like this:
# The trailing slash on the source copies the directory's contents, not the directory itself
rsync -a driveA/backup/ bigDrive/backup/
rsync -a driveB/backup/ bigDrive/backup/
rsync -a driveC/backup/ bigDrive/backup/
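
If the duplicates sit under different names or folders on each device, a content-based copy may fit better than a path-based one. The following is a minimal Python sketch of that idea, not something rsync does itself: the function names `file_digest` and `consolidate`, the drive labels, and the choice of SHA-256 are all assumptions made for illustration. It only copies, so the old devices are left untouched.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def consolidate(sources, dest):
    """Copy files from several source trees into dest, skipping repeated content.

    sources maps a label (hypothetical, e.g. "driveA") to that tree's root.
    Returns a list of (skipped_file, copy_already_in_dest) pairs.
    """
    dest = Path(dest)
    seen = {}      # digest -> path of the first copy written to dest
    skipped = []
    for label, src in sources.items():
        src = Path(src)
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            digest = file_digest(path)
            if digest in seen:
                # Same bytes were already copied from an earlier location.
                skipped.append((path, seen[digest]))
                continue
            # Each source gets its own subfolder, so equal names never collide.
            target = dest / label / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            seen[digest] = target
    return skipped
```

Usage would be something like `consolidate({"driveA": "driveA/backup", "driveB": "driveB/backup", "driveC": "driveC/backup"}, "bigDrive/backup")`. The returned list is worth reviewing before wiping anything, since it records which files were left out and where their surviving copy lives. Note that all empty files hash the same, so only the first one is copied.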
posted by yellowbkpk at 1:52 PM on January 9, 2009 [1 favorite]
Given that
a) the information on all 3 disks is valuable
b) what's on there is jumbled and disorganized
c) duplicates are in the dozens of GB
and that "disks are cheap, but data may be priceless", you would be best served by making a full backup (hell, make several) of all 3 devices and only then doing de-duplication on another copy of the data. Measure twice, rm once.
For de-duping, use a tool that computes checksums over all the files, and then either deletes dupes or hard-links / junctions them (depending on your specific situation -- OS, filesystem, and whether you care about preserving structure or simply want to save space).
Check out trimtrees.pl or Google for "duplicate file finder".
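
To make the checksum approach concrete, here is a rough Python sketch. It is not how trimtrees.pl or any particular duplicate finder works; the function names `find_duplicates` and `hardlink_duplicates` and the use of SHA-256 are assumptions for illustration. It groups files by size first (a unique size cannot have a duplicate), hashes only the candidates, and can then replace duplicates with hard links. Hard links only work within a single filesystem, so run it on a copy of the consolidated data.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size has no duplicate
        for p in paths:
            by_digest[sha256_of(p)].append(p)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def hardlink_duplicates(groups):
    """Replace each duplicate with a hard link to the first file in its group."""
    for keep, *dupes in groups:
        for dupe in dupes:
            if os.path.samefile(keep, dupe):
                continue  # already linked to the same data
            tmp = dupe.with_name(dupe.name + ".dedup-tmp")
            os.link(keep, tmp)     # create the new link first
            os.replace(tmp, dupe)  # then swap it over the duplicate
```

Calling `find_duplicates("bigDrive/backup")` on its own changes nothing, so the groups can be inspected before deciding whether to delete the extras or pass them to `hardlink_duplicates`. Hard-linking keeps the folder structure intact while storing the bytes once.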
posted by lascimmia at 1:35 PM on January 9, 2009