Drowning in Files
February 12, 2020 5:50 PM

Moving to a new computer. This involves moving a library of files that have been churning for some 15 years. Many are obsolete and a lot of them are duplicates. The duplicates need to be weeded out, and the others need to be sorted by age and evaluated. I bought a copy of the latest 64-bit version of Xplorer2, which claims to find and remove duplicate files. It does not do the job. What do you use for dealing with duplicate files?
posted by Raybun to Computers & Internet (16 answers total) 20 users marked this as a favorite
 
Beyond Compare is what I use, though I don't use it often. Works a treat.
posted by Sunburnt at 6:13 PM on February 12, 2020 [1 favorite]


CCleaner will do it.
posted by jmfitch at 6:24 PM on February 12, 2020


Did you "flatten" (File > Browse flat) your drive first in Xplorer2? Here's the section of the help file dealing with duplicate files.
posted by davcoo at 6:29 PM on February 12, 2020


What do you use for dealing with duplicate files?

I rely on the fact that every new computer I acquire comes with exponentially more storage than the one it replaced, and on using an excellent de-duplicating archiver to back them all up.

Long and bitter experience has taught me that "cleaning up" my computers has always caused me far more grief than not doing that. But if you're determined to go ahead with some such effort anyway, I strongly recommend making sure you have at least one complete offline archive of all of the pre-existing mess on any computer you propose to subject to the tidying process.

Many, many people come to grief by trying to tidy their machines up preparatory to making their backups, on the apparent basis that they don't want their backups to be messy. But backups just are messy, especially if they're done as regularly as they really ought to be, and tidying up is peak data loss risk.
posted by flabdablet at 6:47 PM on February 12, 2020 [12 favorites]


I do a search for files of that type, sort by name, then delete the duplicates from the search window. Then again, I have a high tolerance for repetitive tasks.
posted by The Underpants Monster at 7:03 PM on February 12, 2020 [1 favorite]
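
For anyone with less tolerance for repetition, the search-and-sort-by-name routine above can be roughed out in a few lines. This is only a sketch in Python, assuming "duplicate" means "same file name, ignoring case" for one file type; the helper name `same_name_groups` is made up for illustration, and it only reports candidates, it never deletes anything.

```python
from collections import defaultdict
from pathlib import Path


def same_name_groups(root, extension):
    """Group files under root that share a name (case-insensitive).

    Hypothetical helper: it only reports candidates, it never deletes.
    Same name does not guarantee same content, so review before acting.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == extension.lower():
            groups[path.name.lower()].append(path)
    # Keep only names that occur more than once, with paths sorted for review.
    return {name: sorted(paths) for name, paths in groups.items() if len(paths) > 1}
```

Calling `same_name_groups("some/folder", ".doc")` returns a dict of name to list of paths, which can be printed and checked by eye before anything is removed.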


I've used CloneSpy before. I'm not saying it got all of the duplicates, but it helped me clean up a lot of them before I did an OS upgrade.
posted by sardonyx at 7:23 PM on February 12, 2020


I use Duplicate Cleaner Pro. It is fantastic with photos, but works on everything. Worth every penny of $29.95.
posted by lhauser at 8:46 PM on February 12, 2020


Seconding flabdablet's advice on making a complete "messy" backup before you start deleting anything. My opinion is that the best way to approach this is to dump all your old, unsorted stuff into an "archive" folder, then pull stuff that you need and want organized out of it as you need it. If you really need to free up space, external hard drives are pretty cheap.
posted by Aleyn at 8:51 PM on February 12, 2020 [3 favorites]


the best way to approach this is to dump all your old, unsorted stuff into an "archive" folder, then pull stuff that you need and want organized out of it as you need it

Yep. And if you always do that by moving entire existing folder trees into a date-named subfolder inside your Archive folder, like Archive\2020-02-13, and then moving individual files out of Archive subfolders and into their shiny new well-organized locations when you first need them, then the archiving and dearchiving steps run conveniently fast, and your Archive folder can be incorporated into the 15-year evolving mess in a reasonably sustainable fashion.

If you're consistent about moving stuff in and out of Archive subfolders rather than trying to use Archive as a backup by making copies, then Archive will cost a completely negligible amount of disk space. Also, anything remaining inside an Archive/$date subfolder more than, say, five years old has clearly not been touched in at least five years and is therefore fairly unlikely to be needed again in a hurry. That lets you simply hive your oldest Archive subfolders off to external storage without needing to trigger a big tidy-up whenever you feel like reclaiming a bit of primary drive space. I like to think of this process as subduction.

It bears repeating that Archive is an organizing tool, not a backup strategy. Make backups, and make sure that the contents of Archive get backed up along with everything else.
posted by flabdablet at 10:40 PM on February 12, 2020 [1 favorite]
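
The date-named Archive scheme described above could be scripted roughly as follows. This is a sketch in Python under stated assumptions, not anyone's actual tooling: the helper names `archive_tree` and `stale_subfolders` are hypothetical, subfolders are assumed to be named YYYY-MM-DD, and "five years" is approximated as 5 x 365 days. Take a backup before trying anything like it on real data.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_tree(folder, archive_root, today=None):
    """Move a whole folder tree into archive_root/YYYY-MM-DD/ and return its new path.

    Hypothetical helper. shutil.move renames in place when source and
    destination are on the same filesystem, which is what keeps this fast.
    """
    today = today or date.today()
    dated = Path(archive_root) / today.isoformat()
    dated.mkdir(parents=True, exist_ok=True)
    target = dated / Path(folder).name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    return Path(shutil.move(str(folder), str(target)))


def stale_subfolders(archive_root, years=5, today=None):
    """List date-named Archive subfolders older than roughly the given number of years."""
    today = today or date.today()
    stale = []
    for sub in sorted(Path(archive_root).iterdir()):
        try:
            stamp = date.fromisoformat(sub.name)
        except ValueError:
            continue  # not a date-named subfolder, leave it alone
        # Approximation: one year is counted as 365 days.
        if sub.is_dir() and (today - stamp).days > years * 365:
            stale.append(sub)
    return stale
```

`stale_subfolders` only lists candidates for hiving off to external storage; the actual move is left as a deliberate manual step.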


I use the Everything filename search tool. Finding dupes is easy. Enter this in the search bar -
dupe:

If the number of matches is too large to deal with, you can narrow it down using the search syntax. Although the syntax is powerful and extensive, a subset will meet your needs. These are the ones I use the most.

Find duplicates of a specified file type, e.g.
dupe: doc:
(or - audio: zip: exe: pic: video:)

Find duplicates using a size range constant defined in the search syntax, e.g.
dupe: size:large
(or - :tiny :small :medium :huge :gigantic)

You can combine search syntax, e.g.
dupe: video: size:gigantic

Find duplicates by file suffix, e.g.
dupe: endwith:pdf
(or any other file suffix)
posted by Homer42 at 3:12 AM on February 13, 2020


DoubleKiller
posted by urbanwhaleshark at 4:02 AM on February 13, 2020


Although your question focuses on duplicates, consider the following for the "obsolete" files: Use xplorer2 to find all files that are more than 3 years (or any other number of your choice) old and move them to a reliable external hard drive as an archive. That way the new computer looks less crowded but you haven't deleted anything that you might need to find next October.
posted by megatherium at 4:13 AM on February 13, 2020
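
If you'd rather script the "older than 3 years" search than do it in a file manager, something like the following might serve. It's a sketch in Python, assuming "old" means the last-modified timestamp is older than the cutoff and approximating a year as 365 days; the helper name `files_older_than` is invented for illustration. It only lists files so the result can be reviewed before anything is moved.

```python
import time
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # approximation, ignores leap days


def files_older_than(root, years=3, now=None):
    """Return files under root whose last-modified time is older than the cutoff.

    Hypothetical helper: report only, so the list can be reviewed before
    anything is moved to external storage.
    """
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Note that modification time says nothing about when a file was last opened, so a list like this is a starting point for evaluation rather than a verdict.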


find all files that are more than 3 years (or any other number of your choice) old and move them to a reliable external hard drive

It depends on what files the OP uses regularly. For example, it's possible that they have a music library where most of the files are >3 years old yet they could be needed at any moment.

If you can afford the initial disk space penalty, I second flabdablet's and Aleyn's answers and emphasise that you should place the Archive/$DATE folders within the same physical storage device and filesystem: whenever some old file is needed a simple CTRL-X CTRL-V will fix it instantly vs. having to plug in your USB drive and wait until things are copied back.
The external storage is still welcome as a full backup for when you start pruning things from the Archive.
posted by Bangaioh at 5:33 AM on February 13, 2020 [1 favorite]
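
One way to check the "same filesystem" condition before setting up an Archive folder is to compare device IDs. A tiny sketch in Python, with `on_same_filesystem` being a made-up helper name; it assumes both paths already exist and that the operating system reports a meaningful device ID for them.

```python
import os


def on_same_filesystem(path_a, path_b):
    """True when both existing paths report the same device ID.

    Hypothetical helper: a move between such paths can be a rename
    rather than a copy followed by a delete.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```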


find all files that are more than 3 years (or any other number of your choice) old and move them to a reliable external hard drive

This seems like false economy.

Unless you're big into video processing or something similar, it's HIGHLY unlikely that your garden-variety files represent a material portion of your disk usage. The incremental cost of keeping them around (with or without dupes, even) is pretty low, and the possible upside is large.

I've never done ANY such purge, and have just migrated my data from old Mac to new Mac for nearly 20 years. It's a non-issue.

When I have disk space issues, it's because of enormous client-party databases, or virtual machines, or because I need to do some catalog maintenance in Lightroom (you really CAN run out of space with higher-end photography, but LR makes it easy to push prior years to a network volume).

Generally speaking, though, don't put anything you actually want to keep onto an external drive. Stuff on outboard disks tends to get neglected for backups and other things, and the next thing you know you've lost data.
posted by uberchet at 7:41 AM on February 13, 2020 [1 favorite]


Agreed, but the question doesn't seem to hinge on disk space usage but rather on organisation, which is why the "move (but not delete yet) everything out of the way and start anew" approach seems worthwhile.

It's dangerous to make duplicate finder recommendations because the OP doesn't specify what exactly counts as a dupe for their purposes. Are 2 MP3 files with identical audio streams but different tags dupes? Or an MP3 and a FLAC of the same song? Or do only byte-for-byte identical files count?
posted by Bangaioh at 11:11 AM on February 13, 2020
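
For the strictest of those definitions, byte-for-byte identical files, a common approach is to group files by size and then compare content hashes. The following is a sketch in Python of that idea only: the helper names `file_digest` and `identical_files` are hypothetical, SHA-256 is an arbitrary choice of hash, and it will not catch the looser cases (same audio with different tags, or MP3 vs. FLAC). It reports groups and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_files(root):
    """Return lists of files under root with identical contents.

    Hypothetical helper: report only. Files are grouped by size first so
    that only files with a size collision get hashed.
    """
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned list is one set of identical files; which copy to keep is still a judgment call.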


I second the recommendations for Everything and CloneSpy. Everything is better when you want to view the dupes and decide for yourself which one (if either) to delete. CloneSpy has an option to go through them one by one, but then you have to decide for each pair as it presents them to you rather than being able to scroll through as you please. CloneSpy is great for batch jobs where you trust the criteria.
posted by soelo at 2:26 PM on February 13, 2020

