recovering from file overwrite
February 1, 2006 12:11 PM

Most PC mistakes are undoable -- I can undelete deleted files pretty reliably, and many applications have some kind of undo function. The one thing it seems I can't undo is accidentally saving one file on top of another.

I know you get a warning when you're asked for the filename, but I've blown past that and overwritten a valuable file often enough to make me think I'm not the only one. The question is, am I missing an obvious way to recover from this, or is there some kind of third party utility that can "undo" a file overwrite? Is the new Microsoft OS supposed to address this at all?
posted by stupidsexyFlanders to Computers & Internet (24 answers total)
 
I'm not aware of any operating or file system that has this sort of protection built-in; usually, keeping copies of multiple revisions (which is effectively what this would entail) has been the domain of source-control systems like CVS or Visual SourceSafe, which are primarily used for keeping copies of source code, but work just as well for any type of file. I don't know of any source control solutions that are transparent to the user; you'd have to actively add new files to the repository, and then check in new versions down the line.
posted by Godbert at 12:23 PM on February 1, 2006


This is exactly the type of thing that CVS was built for.

In your case, daily backups would probably be more appropriate. Depending on how much you're willing to spend, this could be done really seamlessly (if you're an always-on kinda guy, you could probably easily set up a solution for < $100 for a backup to an ext HD).
posted by fishfucker at 12:25 PM on February 1, 2006
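
For what it's worth, the daily-backup half of that can be a very small script run from a scheduled task. A minimal sketch, assuming Python is installed; the source folder and the E:\ external drive below are made-up placeholders you would swap for your own:

# backup_daily.py -- copy a set of folders to a date-stamped directory
# on an external drive. Run it from a daily scheduled task.
# The source list and the E:\ drive letter are just placeholders.
import shutil
from datetime import date
from pathlib import Path

SOURCES = [Path(r"C:\Documents\Important")]   # folders worth keeping
DEST_ROOT = Path(r"E:\Backups")               # external hard drive

def run_backup():
    target = DEST_ROOT / date.today().isoformat()
    for src in SOURCES:
        # copytree refuses to overwrite an existing destination,
        # so each day gets its own separate copy
        shutil.copytree(src, target / src.name)

if __name__ == "__main__":
    run_backup()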


Probably not. When you delete a file, the software just deletes an entry in a database that says, "foo.doc is located at address 1492." Usually when you overwrite a file, the software says, "go to 1492, and start writing." I believe that VMS at one time had implicit versioning that saved X number of versions.

Your best bet for avoiding this kind of thing is regular backups and/or a version control system.
posted by KirkJobSluder at 12:27 PM on February 1, 2006


The reason you can undelete files is that when Windows deletes a file, it simply flags the file as deleted-- but the data is still there until something overwrites it. By saving another file over it, you've basically destroyed the underlying data. A disk forensic expert might be able to recover it, but that's not a simple or inexpensive task.

If the overwriting file is shorter than the original, some of the data might still be there. Also, there may be older versions of the file that was deleted. Use a disk recovery tool like R-Studio to see what's available on your disk.
posted by justkevin at 12:31 PM on February 1, 2006


This sort of functionality has been present on NetApp filers for years.

In addition, Windows Server 2003 has a feature called something like "Shadow Copy Service" that keeps past versions of files around. There is a client-side piece of software that makes these copies available by right-clicking the file on a Windows PC client.

Samba apparently can imitate a Win2k3 server from the point of view of the client-side software, and LVM will let you take filesystem snapshots to provide the "shadow copies." So, you could put something like this together.

You could also do something like this with TortoiseSVN, an implementation of the Subversion version control system that integrates with the Explorer shell. You'd want a scheduled task to do a check-in at regular intervals.

As for whether Vista will make "shadow copies" available as purely client-side tech, I think the answer is maybe. If it is offered, it probably won't be offered on all 256 million versions Microsoft plans on releasing.
posted by Good Brain at 12:48 PM on February 1, 2006
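
The scheduled check-in part of that TortoiseSVN suggestion can be a short script wrapping the svn command-line client. A rough sketch, assuming the folder is already a Subversion working copy and svn is on the PATH; the path below is a placeholder:

# autocommit.py -- add any new files and commit the working copy.
# Meant to be run from a scheduled task every hour or so.
# WORKING_COPY is a placeholder path for your own documents folder.
import subprocess
from datetime import datetime

WORKING_COPY = r"C:\Documents\Important"

def check_in():
    # pick up any files created since the last run
    subprocess.run(["svn", "add", "--force", "."],
                   cwd=WORKING_COPY, check=True)
    # commit everything that changed
    msg = "automatic check-in %s" % datetime.now().isoformat()
    subprocess.run(["svn", "commit", "-m", msg],
                   cwd=WORKING_COPY, check=True)

if __name__ == "__main__":
    check_in()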


Well, there are file systems that automatically retain old versions of files. The VMS file system does this, for example, and no doubt there are others. However, none of the file systems that I'm aware of for Windows or Linux do this - it can be a big waste of space, and it can be pretty confusing to users, who might accidentally end up working from an old version of a file.

Suggestions: make backups, set important files to be read-only, use a version control system like Subversion, and pay attention to warning messages!
posted by jellicle at 12:50 PM on February 1, 2006
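
The read-only tip is easy to script as well. A tiny sketch; the file path is just an example:

# Mark a finished file read-only so applications complain before
# overwriting it, rather than silently clobbering it.
# Works on Windows and Unix; the path below is a placeholder.
import os
import stat

def make_read_only(path):
    os.chmod(path, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)

make_read_only(r"C:\Documents\Important\thesis.doc")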


As Good Brain says, this functionality is available on heavy duty file servers. We have a BlueArc filer at work that does the same thing. I'm a little surprised that something like this isn't more readily available. It's saved my ass on a couple of occasions.

As others have said, VMS had it too. It was one of the nice features of VMS.
posted by pombe at 1:09 PM on February 1, 2006


I would also like my status as a grognard acknowledged by saying that yes, this was a cool feature of VMS. But otherwise I have nothing to add to the fairly comprehensive answers given in this thread.
posted by GuyZero at 1:13 PM on February 1, 2006


WinFS was supposed to do this but it's not being released with Vista.

You can do it cheaply with Subversion and some drive mapping tool (Novell NetDrive, WebDrive, etc.). Now that Subversion's flat-file repository backend is out, use that, and set up Apache to serve your filesystem via WebDAV and then map that. Transparent versioning!
posted by holloway at 1:22 PM on February 1, 2006
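
For reference, the Apache side of that suggestion is mod_dav_svn's autoversioning feature. A rough httpd.conf sketch, in which the /files location and repository path are placeholders; with autoversioning turned on, every save through the mapped WebDAV drive becomes a new Subversion revision:

# httpd.conf sketch -- serve a Subversion repository over WebDAV so it
# can be mapped as a network drive. Paths below are placeholders.
LoadModule dav_module modules/mod_dav.so
LoadModule dav_svn_module modules/mod_dav_svn.so

<Location /files>
    DAV svn
    SVNPath /var/svn/files
    SVNAutoversioning on
</Location>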


Ah, good memories of VMS saving my arse many times years ago.

Bad memories of the performance of that particular machine.

Other than that, some sort of versioning system or simple backup is essential.
posted by unixrat at 1:31 PM on February 1, 2006


"I'm not aware of any operating or file system..."

There are lots of OSs and file systems that do this. There are a good number of choices in the UNIX world, including Linux, BSD, and I assume OS X. Windows these days supports installable file systems too, so I wouldn't be surprised if there were versioning file systems available in the server market.
posted by Ethereal Bligh at 1:53 PM on February 1, 2006


I tried a continuous backup tool that was intended to do this kind of thing once - every single write was stored as a different file. It didn't work very well, and I didn't keep playing long enough to find out if the problem was fundamental. Here is one I just googled up - NTI Shadow.

It was my impression that many apps (Word) don't always do complete re-writes, only incremental updates, which would make even file system solutions problematic. Am I out to lunch on that?
posted by Chuckles at 2:08 PM on February 1, 2006


pombe writes "VMS had it too. It was one of the nice features of VMS."

This is one of the places I wish the NT group had done a better job of copying VMS.
posted by Mitheral at 2:50 PM on February 1, 2006


There is a fundamental assumption here that I believe is wrong. When justkevin says:

The reason you can undelete files is that when Windows deletes a file, it simply flags the file as deleted-- but the data is still there until something overwrites it. By saving another file over it, you've basically destroyed the underlying data. A disk forensic expert might be able to recover it, but that's not a simple or inexpensive task.

I don't believe this is true. There is no reason for Windows to be rewriting to the same space where the old file was. Windows is actually just deleting the old file and saving a new one; it doesn't care where it goes (in terms of sectors on the hard disk).

This suggests that recovering the deleted file should not be that much harder than recovering any undeleted file. Not that I know how to do either.
posted by Dunwitty at 3:03 PM on February 1, 2006


I'm not sure if Windows/NTFS works this way, but a possible issue with overwriting a file is that it blows away the info about where the pieces of the old file live on the disk and replaces it with info about the location of the pieces of the new file. As a result, it's harder to piece together the old file.
posted by Good Brain at 3:19 PM on February 1, 2006


I don't believe this is true. There is no reason for Windows to be rewriting to the same space where the old file was. Windows is actually just deleting the old file and saving a new one; it doesn't care where it goes (in terms of sectors on the hard disk).
Dunwitty: You're talking about something unrelated (and ambiguous). justkevin said something completely correct: in Windows, data (where data is defined as the 0s and 1s in clusters and sectors, not the FAT entries) exists until it's overwritten; see SHSC.info/DataRecovery

By the way, there is no evidence that Data Forensic experts have ever been able to recover data (same definition) that's been overwritten even once. I don't care what anyone thinks the spooks are capable of -- there's no evidence to think that.
This suggests that recovering the deleted file should not be that much harder than recovering any undeleted file. Not that I know how to do either.
What? Clarify your terms.
posted by holloway at 3:47 PM on February 1, 2006


VMS!! Oh how it brings me back;2
posted by adzm at 8:56 PM on February 1, 2006


Best answer: While it won't RTM until this summer, Vista will have this.
posted by stupidcomputernickname at 9:42 PM on February 1, 2006


holloway - actually, my terms are all there.

What I'm saying is this: when you "Save" from an application, and a file with that name already exists, Windows doesn't actually "overwrite" the old file.

It does this:

1. deletes the old file
2. saves a new file

Step number two here is just a file save, as if the first file never existed. Windows simply puts it wherever in the file table and on disk that it wants. It has no more of a chance of "overwriting" the actual data of the old file (which Windows has already "deleted") than any other disk write.

So, there's nothing special about what stupidsexyFlanders is doing. If he can recover other deleted files with his tool (and I'm not talking about files sitting in the Recycle Bin, I'm talking about files you have shift-deleted or otherwise actually deleted), then he can recover these files.

Does that make sense?
posted by Dunwitty at 1:49 AM on February 2, 2006


Ah VMS. (x 3)
anal/rms/out=t.t foo.exe;-3
posted by Pericles at 5:07 AM on February 2, 2006


A datapoint (Mac OS 10.4.3):
524 ~$ echo foo > file
525 ~$ ls -li file
1307363 -rw-r--r-- 1 ryan ryan 4 Feb 2 21:47 file
526 ~$ echo bar > file
527 ~$ ls -li file
1307363 -rw-r--r-- 1 ryan ryan 4 Feb 2 21:47 file
528 ~$ rm file
529 ~$ echo baz > file
530 ~$ ls -li file
1307364 -rw-r--r-- 1 ryan ryan 4 Feb 2 21:48 file

This behavior matches KirkJobSluder's description.
posted by ryanrs at 9:51 PM on February 2, 2006


Dunwitty, do you have actual knowledge of the Windows filesystem allocation policies? NTFS, FAT, FAT32? Most OSes behave contrary to your description, and for good reason: reusing blocks reduces fragmentation and helps caching. Of course, there is a possibility Windows may be sub-optimal.
posted by ryanrs at 10:04 PM on February 2, 2006


The NT group didn't "copy VMS", they built a new robust OS under the direction of the chief architect of VMS, Dave Cutler. That's the only connection to VMS and, by the way, don't you think DEC would have been a little annoyed if Cutler had "copied" VMS when he went to MS?

As to the Dunwitty/ryanrs argument, I don't know who's correct, but I'm sure some Googling would turn up a clue. NTFS was originally claimed to "not need defragmenting," so I suppose the way NTFS updates existing files may involve various fragmentation-reducing strategies. I'd bet money, though, that FAT and FAT32 act as Dunwitty described.

Note, however, that the way FAT(32) and NTFS delete files cannot be said to be similar, particularly the "change first letter to '?' in the table" action. This is why undelete tools for NTFS return full filenames or unknown filenames. NTFS is not fully documented publicly by MS, which is infuriating, so there are a number of its behaviors that are unknown or ambiguous.
posted by Ethereal Bligh at 10:33 AM on February 3, 2006


I think I'm being misread. I'm not saying that Windows doesn't re-use blocks in a general sense, I'm saying that Windows doesn't re-use those specific blocks, just because an application is saving a file with the same name as an existing file.

I'm not talking about low level stuff here at all. I'm saying this (in a very simplified way):

When Word opens a file "foo.doc", Windows basically opens that file stream, and any subsequent "Saves" get written to that file stream.

But when you open a new doc, type some text, and then choose "Save As..." and specify "foo.doc" (where foo.doc exists already), Windows does NOT seek out "foo.doc", open its file stream, and overwrite those blocks.

Instead, it simply performs a "file delete" on foo.doc, opens a new filestream for the new "foo.doc" and saves the data.

So when you "Save As" over an old doc, what you are really doing is "delete old file, open a new file, save that".

So, there's no more chance of the new bits that make up the new "foo.doc" being written over the old bits that make up the old "foo.doc" than any other random disk write.

Does this make sense?
posted by Dunwitty at 5:54 AM on February 4, 2006
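
For anyone who wants to check the behavior Dunwitty and ryanrs are describing on their own machine, here is a small sketch along the lines of ryanrs's test. The file name is a throwaway, and it assumes a platform (Unix, or a reasonably recent Python on Windows) where os.stat reports real inode or file-index numbers:

# save_strategies.py -- show the difference between overwriting a file
# in place and deleting it before writing a new one with the same name.
# The inode (or file index) number reveals which one happened.
import os

def inode(path):
    return os.stat(path).st_ino

with open("demo.txt", "w") as f:          # create the original file
    f.write("first version\n")
before = inode("demo.txt")

with open("demo.txt", "w") as f:          # overwrite in place (truncate)
    f.write("second version\n")
same = inode("demo.txt")                  # same inode number as before

os.remove("demo.txt")                     # delete, then recreate
with open("demo.txt", "w") as f:
    f.write("third version\n")
different = inode("demo.txt")             # usually a different inode

print(before, same, different)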

