
Duplicate file names... in the same folder
August 12, 2012 9:55 AM

In Windows 7, I have several examples of identically-named files in the same folder. As far as I know, this shouldn't be possible. Any ideas about why this would have happened and how to fix it?

Here's an example. The only differences between these files are the capitalization of letters and the modification dates:
dir *.mobi

 Directory of C:\Books\John Kennedy Toole\A Confederacy of Dunces (1209)

07/31/2012  10:10 PM           512,603 A Confederacy of Dunces - John Kennedy
07/21/2012  08:22 PM           512,603 A confederacy of dunces - John Kennedy

               2 File(s)      1,025,206 bytes
This seems to have happened after converting a number of books using Calibre. I have maybe 50+ examples of this. I've noticed that if I select the older file and delete it, the remaining file still keeps the older date. That is, if I were to delete the file from 21-Jul, the remaining file (with a date of 31-July) would start showing a 21-Jul last mod date.

Looking for explanations and suggestions about how to clean this up, preferably without risking the loss of the newer file.
posted by FreelanceBureaucrat to Technology (13 answers total)
Copy the two file names and paste each into a program with a spell checker to see if there is an extra space or hidden character. Renaming one by adding an underscore would protect it, of course, and retain the data. Or something obscure is going on and Google is your friend.
posted by Freedomboy at 10:25 AM on August 12, 2012

As far as I know, this shouldn't be possible. Any ideas about why this would have happened and how to fix it?

However you ended up here, this is an expected result. File names are case sensitive; that is, "Dunces" and "dunces" aren't the same at all, as far as the computer's concerned.

I strongly suspect your observation about file access dates is incorrect, but the easiest thing to do is to make a duplicate of the entire directory, then delete the files you want to delete from the original folder. If that gets you what you want, great! If not, you have a fallback.
posted by mhoye at 10:32 AM on August 12, 2012

Case sensitivity. Notice that C and D are capitalized in the first file but not in the second, so they are not identically named.

The next time you try deleting the older file, note the capitalization of the file that remains... Did it adopt the older file's capitalization, too, or just the date? This might help you troubleshoot.

Also: what happens if you try to rename the newer file?
posted by davejay at 10:34 AM on August 12, 2012

Traditionally, Windows filenames are what Microsoft calls "case-retentive" rather than case-sensitive. That is, they remember the capitalization that the filename was created under, but in all other respects treat filenames that differ only in capitalization as the same. I'm guessing that Calibre works with the filesystem at a lower level and bypasses checks for identical filenames. But at the level where you interact with it, Windows is still assuming that case isn't important, so when you tell it to delete a file, it just deletes the first file with that name that it finds.

As for fixing the problem, I'm with Freedomboy. Rename one of each duplicated file in a trivial and reversible way. Then check to see whether the one you want to keep is the renamed version or not. Then you can delete the one you don't want to keep without worrying about Windows deleting the wrong one.
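Before renaming anything, it helps to know which names actually collide. Here is a minimal Python sketch for listing them. The function names are made up for illustration, and `str.casefold()` only approximates the case folding that Windows itself applies, so treat the output as a list of candidates to check by hand.

```python
import os
from collections import defaultdict


def group_case_collisions(names):
    """Return sorted groups of names that differ only in letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() approximates case-insensitive comparison; it is not
        # guaranteed to match the table NTFS uses for every character.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_case_collisions(root):
    """Walk root and return groups of full paths whose names collide."""
    collisions = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for group in group_case_collisions(filenames):
            collisions.append([os.path.join(dirpath, name) for name in group])
    return collisions
```

Calling `find_case_collisions(r"C:\Books")` would return one list per set of colliding files. It only reads directory listings and never opens, renames, or deletes anything.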
posted by baf at 10:46 AM on August 12, 2012 [1 favorite]

Ok, so it looks like the case-sensitive file name scenario is valid. I read here that NTFS can support case-sensitive file names, but that most applications, including Windows Explorer, can't deal with them properly. Of particular relevance: Other inconsistencies also exist. The Windows NT Command Prompt and File Manager correctly display the names of the files. However, normal commands, such as COPY, fail when you attempt to access one or more filenames that differ only in case.

This matches my result. To mhoye's suggestion above, I can't make a duplicate of the folder using Explorer because it will throw a duplicate filename error when it encounters the second file.

Any good tools that can interpret the case-sensitive names correctly in Windows? Or maybe I'd have better luck cleaning this up using a Linux Live CD?
posted by FreelanceBureaucrat at 10:49 AM on August 12, 2012

It's not super-convenient, but you can use the 8.3 version of the file names to distinguish them: open a command window, do dir /x to display the file names (or just know that the 8.3 name is the first six characters, a tilde, and a number, and then the first three letters of the extension), and then use that to rename -- probably something like RENAME ACONFE~1.MOB "A Confederacy"
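To make that rule of thumb concrete, here is a rough Python sketch of it. It follows only the simplified description above (drop spaces, first six characters, a tilde and a number, first three letters of the extension). The function name is hypothetical, and real short-name generation on Windows has further rules, so dir /x remains the authority on what the short names actually are.

```python
def approx_short_name(long_name, index=1):
    """Approximate an 8.3 name from a long name (simplified rule only)."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:
        # No extension: the whole name is the base.
        base, ext = long_name, ""
    # Assumption: spaces and extra dots are dropped and letters uppercased.
    stem = base.replace(" ", "").replace(".", "").upper()[:6]
    short = f"{stem}~{index}"
    return f"{short}.{ext.upper()[:3]}" if ext else short


print(approx_short_name("A Confederacy of Dunces.mobi"))     # ACONFE~1.MOB
print(approx_short_name("A Confederacy of Dunces.mobi", 2))  # ACONFE~2.MOB
```

Note that an unrelated file such as "A Confession.mobi" maps to the same ACONFE stem, so a ~2 suffix on its own does not prove that two files are duplicates of each other.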

If there are a lot of files and you're comfortable using Linux, that's probably going to be easier, but if there are a relatively small number, then this works OK.
posted by inkyz at 11:13 AM on August 12, 2012 [2 favorites]

What exactly are you trying to accomplish? I.e., what do you mean by "cleaning this up"?
posted by jeffamaphone at 12:17 PM on August 12, 2012

Cygwin's a port of the standard unix environment to Windows, and its file utilities (rm, cp, mv, etc.) should be able to handle it.
posted by junco at 12:18 PM on August 12, 2012

My thinking is the same as junco that Cygwin will help you. The problem with cygwin is that it's difficult to uninstall (or at least used to be). However, there is a portable version you could put on USB that might be easier than the linux live CD!
posted by kg at 1:02 PM on August 12, 2012

Expanding on inkyz's suggestion, you could also reference these files by their shortname if you're looking for an easy way to clean up the duplicates (assuming these files are identical).

For example, if the shortnames are FILENA~1 and FILENA~2 when looking at the directory via dir /x, you could type something like del *~2.* to remove all of the second instances of the file. I'd recommend working on this after making a backup of what's currently there, of course... just in case. For example, if multiple files in the same directory share the same first 6 characters, you'll have ~2's even though they would likely not be duplicates. You'd have to be more selective in that case with your identification and possible scripting.

Scripting aside, plenty of programs out there can tackle this with ease too. You could also try a duplicate file cleanup utility.
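Since the shortname approach assumes the files are identical, one way to test that assumption before deleting is to compare content hashes. This is a minimal Python sketch with illustrative function names. It is probably only meaningful where each path really opens a distinct file, for example from Cygwin or a Linux live CD, or on Windows by passing the 8.3 short names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """Return True if every file in paths has the same content hash."""
    return len({sha256_of(path) for path in paths}) <= 1
```

If `all_identical([...])` returns False for a pair, the two files differ in content even though they are the same size, and deleting by ~2 suffix would lose data.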
posted by samsara at 1:36 PM on August 12, 2012

On Unix systems when confronted with similar issues, one trick was to use "rm"s interactive mode. So you could do:

rm -i *

This prompts you for the deletion of each file. So you just say no until you get to the offending file, then hit yes.

DOS has a similar feature in "del"s /p option. It might help, but you should proceed with caution.
posted by chairface at 2:33 PM on August 12, 2012

BTW, if you ever get a file you can't delete, download a copy of a *NIX-based boot CD or USB drive, boot from it, navigate to the directory in question, and delete it from there.

This comes in useful if you have:
Filenames that exceed the system length limit (as when a long filename is moved to a deep directory).
Filenames that include illegal characters (which can be deposited there by another operating system using the disk).
Filenames that simply won't listen to reason, even after rebooting, under Windows.

Knoppix, Ubuntu, and UltimateBootCD are free systems that will do the trick easily, even if you are new to UNIX systems. You will have a short learning curve, but if you're clever enough to boot from CD/USB, you can probably manage it.
posted by IAmBroom at 11:48 AM on August 13, 2012

I used an Ubuntu LiveCD to delete the older of the two files. It seems that simple commands in Windows Explorer (such as Copy, Delete, and Rename) didn't consistently pick up the right file because of their case-insensitivity.
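For anyone with a lot of these, the "keep the newer one" decision can be sketched in Python. This assumes it runs somewhere the two names resolve to two distinct files, such as the live CD. The function name is hypothetical, and it only reports which paths it would discard rather than deleting them.

```python
import os


def split_newest(paths):
    """Return (keep, discard): the most recently modified path, then the rest.

    Expects a non-empty list of existing paths.
    """
    if not paths:
        raise ValueError("paths must not be empty")
    # Newest first, judged by last-modification time.
    ordered = sorted(paths, key=os.path.getmtime, reverse=True)
    return ordered[0], ordered[1:]
```

Printing the discard list and checking it by eye before removing anything keeps the newer file safe.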
posted by FreelanceBureaucrat at 6:34 PM on August 13, 2012

This thread is closed to new comments.