Help me separate the wheat from the chaff.
April 25, 2009 8:37 PM

I use Scalpel and PhotoRec to recover deleted and corrupt files for clients. Because of the way these two tools search for lost files, they produce a lot of false positives. Is there a way to automatically test the recovered files to separate the real files from the false positives?

Microsoft Office files seem to generate the most false positives (and they are also the files my clients most often want back). I can sort the Microsoft Office files by metadata, and the files that have no metadata usually aren't real Microsoft Office files. Are there other tools or tricks to figure out which Microsoft Office files are real?

What about methods for other file types? Are there other pieces of open-source/free software that do as good a job as Scalpel and PhotoRec at recovering files?
posted by gregr to Computers & Internet (3 answers total)

There is enough redundancy (headers, internal pointers, etc) in the OLE Structured Storage format to do a consistency check, but I don't know offhand of a tool to do so.
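
As a rough illustration of the kind of consistency check described above, here is a minimal Python sketch that looks only at the fixed 512-byte header of an OLE compound file (the container used by legacy .doc, .xls and .ppt files). The function name `ole_header_problems` is made up for this example, and the sketch does not walk the FAT or directory chains, so a file that passes can still be damaged further in. A carved file cut off mid-sector may also be flagged even though it started life as a real document.

```python
import struct

# First 8 bytes of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def ole_header_problems(data: bytes) -> list:
    """Return a list of inconsistencies found in an OLE compound-file header.

    An empty list means the header fields agree with each other and with
    the file size. Hypothetical helper, header-only check.
    """
    if len(data) < 512:
        return ["shorter than the 512-byte header"]
    if data[:8] != OLE_MAGIC:
        return ["missing OLE signature"]

    problems = []
    # Offsets 24..33: minor version, major version, byte order,
    # sector shift, mini sector shift (all little-endian 16-bit).
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    # Offsets 44 and 48: number of FAT sectors, first directory sector.
    num_fat, first_dir = struct.unpack_from("<2I", data, 44)
    # Offset 56: mini stream cutoff size.
    (mini_cutoff,) = struct.unpack_from("<I", data, 56)

    if byte_order != 0xFFFE:
        problems.append("unexpected byte-order mark")
    expected_shift = {3: 9, 4: 12}.get(major)
    if expected_shift is None:
        problems.append("unexpected major version")
    elif sector_shift != expected_shift:
        problems.append("sector shift does not match major version")
    if mini_shift != 6:
        problems.append("unexpected mini sector shift")
    if mini_cutoff != 4096:
        problems.append("unexpected mini stream cutoff")

    if not problems:
        sector_size = 1 << sector_shift
        # Sector n starts at byte offset (n + 1) * sector_size.
        if num_fat == 0 or num_fat * sector_size > len(data):
            problems.append("FAT sector count inconsistent with file size")
        if (first_dir + 2) * sector_size > len(data):
            problems.append("first directory sector lies beyond end of file")
    return problems
```

To triage a folder of carved files, one could read each candidate with `open(path, "rb").read()` and set aside anything for which the function returns a non-empty list.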
posted by hattifattener at 9:11 PM on April 25, 2009

Regarding the MS Office ones, might one approach be to get a program to open each file and see whether it complains? I write programs (in C#, although there are plenty of other approaches) that drive Office files, and although I haven't tried it, it occurs to me that if one of your false positives were presented to my program as an Office file, the program would complain about it.

Another thing that comes to mind is that Python has a number of nice Office interface libraries which would provide a lightweight 'script-centric' approach to the same end.
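
A minimal sketch of the "open it and see if it complains" idea in Python. Note the swap: rather than an Office library, this uses only the standard-library `zipfile` module, so it applies only to the newer ZIP-based Office formats (.docx, .xlsx, .pptx) and not to the older binary ones. The function name `looks_like_ooxml` is hypothetical, and passing this test only shows the container is intact, not that Office itself would open the document.

```python
import zipfile


def looks_like_ooxml(source) -> bool:
    """Return True if `source` opens as an intact ZIP-based Office package.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for triage, not a full document validator.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            # testzip() reads every member and returns the name of the
            # first one with a bad CRC or header, or None if all are fine.
            if zf.testzip() is not None:
                return False
            # Every Office Open XML package carries this part at its root.
            return "[Content_Types].xml" in zf.namelist()
    except Exception:
        # Any failure to open or read counts as the file "complaining".
        return False
```

Candidates that return False can be moved to a separate folder for manual review rather than deleted outright.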
posted by southof40 at 2:51 AM on April 26, 2009

Here is a list of file carving tools from the ForensicsWiki. On that wiki you can also find a bibliography of research papers that study carving.

One answer to your question is that the underlying file system greatly affects how often a file carving tool is needed. For example, FAT32, which is common on small USB and external devices, makes recovering deleted files difficult, whereas NTFS keeps a lot of information around after the user marks a file as deleted. But it sounds like you don't have control over what you are given.
posted by about_time at 5:11 AM on April 26, 2009
