Looking for an anti-duplicate file finder.
August 23, 2008 5:14 PM
Can someone suggest a Windows utility to search for non-duplicated files on different hard drives? I've got multiple hard drives with tens of thousands of files on them. Many are duplicates, but not necessarily in the same directories. I need to identify the files that are on one drive, but not on the other, regardless of their location in the directory tree.
I've found plenty of comparison utilities, but they all seem to take the file path into account. There are also duplicate finders that work well and ignore the path, but I need a "non-duplicate" finder. I've done this before by running a duplicate file finder (Clonespy) and deleting the duplicates; the files left over are the ones I want. However, I would really rather not modify the original disks. Thanks!
Best answer: It's probably easier to write a small script to do this. So I did. You still run it from the command line, but it should give you the full paths as plain text.
The Python script is available here and a Windows executable here. Both have had only the lightest of testing (it seemed to work for me), so I offer them with no guarantees as to utility or safety. You might want to wait and see whether anything more user-friendly and less hacked-together turns up.
If you have any problems let me know.
posted by xchmp at 9:14 PM on August 23, 2008
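For anyone who wants to see the idea without downloading anything, here is a minimal Python sketch of the same kind of comparison. It is not xchmp's script, which isn't reproduced in this thread. It assumes "duplicate" means identical contents, checked by file size first and then by a SHA-256 hash, and the names `non_duplicates`, `walk_files` and `file_digest` are made up for illustration. It only opens files for reading, so the disks are left untouched.

```python
import hashlib
import os
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield the full path of every regular file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield path


def non_duplicates(dir_a, dir_b):
    """Return paths under dir_b whose contents match no file under dir_a."""
    # Index A by size first: a B file with a size A never has can't be a copy,
    # so it is reported without hashing anything.
    sizes_a = {}
    for path in walk_files(dir_a):
        sizes_a.setdefault(os.path.getsize(path), []).append(path)

    hashes_a = {}  # size -> set of digests for A's files of that size, filled lazily
    unique = []
    for path in walk_files(dir_b):
        size = os.path.getsize(path)
        if size not in sizes_a:
            unique.append(path)
            continue
        if size not in hashes_a:
            hashes_a[size] = {file_digest(p) for p in sizes_a[size]}
        if file_digest(path) not in hashes_a[size]:
            unique.append(path)
    return unique


# Hypothetical usage: python nondup.py DIR_A DIR_B
if __name__ == "__main__" and len(sys.argv) == 3:
    for p in non_duplicates(sys.argv[1], sys.argv[2]):
        print(p)
```

Checking sizes before hashing means most files on a large drive are never read in full, which matters with tens of thousands of files.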
xchmp: will this also work to find multiple copies on the same drive? Sorry to piggyback, but I have a problem very similar to cosmac's, with many nested directories on the same drive as well as copies on external drives.
Anyone?
posted by Meatbomb at 9:13 AM on August 24, 2008
Meatbomb: The script goes through two directories (A and B) and all their subdirectories and prints out all the files under directory B that are not duplicates of any file under directory A. So it won't work as expected if A and B overlap. In cosmac's case A and B will be the root directories of different drives, so all should be well. If you have one set of directories where all your master copies are and all the duplicates are in a completely different set of directories, then this might work for you. But it sounds like you want something a little more complicated.
posted by xchmp at 9:54 AM on August 24, 2008
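Meatbomb's case (copies scattered within one drive as well as across drives) calls for grouping rather than a one-way A-versus-B comparison. A minimal sketch, again assuming identical contents is what counts as a duplicate: it pools every file under any number of roots, resolves each path with `os.path.realpath` so overlapping roots don't list the same file twice, and reports groups of two or more identical files. `duplicate_groups` is a hypothetical name, not part of xchmp's script.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(*roots):
    """Group files under all roots by content; return only groups with 2+ copies."""
    by_size = defaultdict(list)
    seen = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # realpath means a file reached through two overlapping roots
                # (say C:\ and C:\Photos) is only counted once.
                path = os.path.realpath(os.path.join(dirpath, name))
                if path in seen or not os.path.isfile(path):
                    continue
                seen.add(path)
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


# Hypothetical usage: python dupgroups.py ROOT [ROOT ...]
if __name__ == "__main__":
    for group in duplicate_groups(*sys.argv[1:]):
        print("\n".join(group), end="\n\n")
```

Each printed group is one set of identical files; which copy to keep is left to you, and nothing is deleted.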
Just for archival purposes, I went out and found this freeware: Easy Duplicate File Finder. Haven't tried it out yet though.
posted by Meatbomb at 10:43 AM on August 24, 2008
Response by poster: xchmp: Works great, thanks! It also prompted me to learn a little python. I modified it to add a regular expression filter.
posted by cosmac at 1:50 PM on August 24, 2008
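cosmac didn't post the modified code, so how that filter works is unknown. As a hypothetical illustration of the idea, a filter like this could sit in front of the directory walk so that only files whose names match a pattern get compared. The function name `filtered_files` and the case-insensitive matching are assumptions.

```python
import os
import re


def filtered_files(root, pattern):
    """Yield paths under root whose file name matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)  # Windows names are case-insensitive
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if regex.search(name):
                yield os.path.join(dirpath, name)


# Hypothetical usage: list only JPEG and MP3 files on a drive.
# for path in filtered_files("E:/", r"\.(jpe?g|mp3)$"):
#     print(path)
```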
I assume that you just need the filenames, not the paths. Open up a command prompt. Do a recursive dir command ( http://www.google.com/search?q=dir+command+syntax ) and redirect that output to a text file. Google anything you don't understand so far. Then import that into Excel and figure out how to get rid of the dupes. Again, Google should help you with that.
posted by intermod at 8:30 PM on August 23, 2008 [1 favorite]
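The same name-only comparison can be done without the spreadsheet step. This sketch assumes, as intermod does, that a file counts as present on both drives if the same file name appears anywhere on each. It ignores contents, so a renamed copy shows up as unique and two different files that share a name are treated as duplicates. Folding names to lower case is an assumption meant to match Windows' case-insensitive file names, and the function names are hypothetical.

```python
import os


def names_under(root):
    """Return the set of file names (not directories) anywhere under root."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        # Lower-case so "Photo.jpg" and "photo.JPG" count as the same name.
        names.update(name.lower() for name in filenames)
    return names


def names_only_on_b(dir_a, dir_b):
    """File names that occur somewhere under dir_b but nowhere under dir_a."""
    return sorted(names_under(dir_b) - names_under(dir_a))
```

Because only names are compared, this is much faster than hashing, but it answers a weaker question than the content-based sketches.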