Automated corrupt-pr0n detection tool?
January 16, 2010 10:33 PM

Does a tool exist for Mac OS X--either GUI or command-line--that can churn through a folder's (or an entire drive's) worth of video files and detect which files are corrupted and unplayable?

I've been dumping tons of "old" DVD-R and CD-R archives (and by "old" I mean only 5 years old) to my Drobo, and--no big surprise--I'm finding random files to be corrupt (proving my assertion that optical media is worthless for critical backup purposes). I don't want to waste space on the Drobo for files that are unplayable, but I also don't want to have to load all the files into Quicktime or VLC and manually hunt the baddies down.

I have no checksums for any of these files, so hash-checking tools aren't really going to work for my purposes.

Ideally, I'd like this tool to be able to detect corruption in not only Quicktime files, but non-Apple-centric, Perian-supported video formats like AVI, WMV, MPG, DivX, etc.
posted by melorama to Computers & Internet (18 answers total) 3 users marked this as a favorite
 
you could probably script this with ffmpeg (just do a null conversion).
posted by rr at 10:41 PM on January 16, 2010


One approach to doing what rr suggests:

set root="/path/to/drobo/root/folder"
find "$root" -type f | while read -r pathname
do
    ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null 2>/dev/null || echo rm "'$pathname'" >>deletions
done

Paste all the above into a text editor, make sure the ffmpeg command all ends up on one line (it should be the only line between "do" and "done"), change /path/to/drobo/root/folder to the pathname of your Drobo's root folder, then copy the whole mess out of the text editor and paste it into a Terminal.

The "find" command enumerates the pathnames of all files in all folders inside the nominated root folder. These are read one at a time into $pathname, then handed to ffmpeg to try to convert to an AVI and then discard (output file is /dev/null). All of ffmpeg's diagnostic output is also discarded. Any file that ffmpeg can't convert will cause an rm (delete) command for that file to be appended to a file called "deletions". You can review the deletions file to make sure it doesn't contain anything unexpected, or edit it to get rid of deletions you don't want to happen. Once you're happy with the deletions list, copy the whole thing and paste it back into Terminal to do the actual deletions.
posted by flabdablet at 1:09 AM on January 17, 2010


Response by poster: I'm getting a "find: ftsopen: No such file or directory" error when running that script, flabdablet...

(I'm running 10.6.2)
posted by melorama at 2:32 AM on January 17, 2010


Ah, bugger it - been writing far too many Windows scripts lately. Get rid of "set" in the first line and try again:

root="/path/to/drobo/root/folder"
find "$root" -type f | while read -r pathname
do
    ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null 2>/dev/null || echo rm "'$pathname'" >>deletions
done

posted by flabdablet at 2:40 AM on January 17, 2010
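Whichever version of the loop gets used, one thing to watch: ffmpeg reads keyboard commands from standard input (q to quit, for example), and inside `find ... | while read` its standard input is the very pipe carrying the list of filenames. It can swallow part of that list, so some files are silently never checked. Giving the inner command its own empty input with `</dev/null` avoids this (newer ffmpeg releases also accept `-nostdin`). The sketch below uses `cat` inside a hypothetical `greedy` function as a stand-in for any command that reads stdin, so it runs without ffmpeg installed.

```shell
#!/usr/bin/env bash
# greedy: hypothetical stand-in for a command that, like ffmpeg, reads stdin.
greedy() { cat > /dev/null; }

# The first greedy call drains the same pipe that `read` is reading from,
# so only the first name is ever visited.
visit_unprotected() {
    printf '%s\n' one two three | while IFS= read -r name; do
        greedy
        printf '%s\n' "$name"
    done
}

# With </dev/null, greedy gets an empty stdin and the list is left alone.
visit_protected() {
    printf '%s\n' one two three | while IFS= read -r name; do
        greedy </dev/null
        printf '%s\n' "$name"
    done
}
```

`visit_unprotected` prints only `one`; `visit_protected` prints all three names.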


Response by poster: Cool, thanks for the update. The script is now churning away.

One question, though. Is ffmpeg literally encoding every single file to /dev/null in order to determine whether it's corrupted?

I'm worried that this script will take four thousand years to complete, considering that I have over 800 GB of movie files to churn through.
posted by melorama at 3:16 AM on January 17, 2010


Best answer: 800GB is indeed going to take a hell of a long time to process, whatever you do; but this method is probably about as quick as you'll get. Yes, ffmpeg is indeed reading its way through every single file, but because both -acodec and -vcodec are set to "copy", it's not actually re-encoding the audio and video streams, merely pulling them out of the container they're in, rewrapping them as AVI and writing that straight out to /dev/null. Writes to /dev/null are very fast, because they don't actually do anything - so, in effect, all ffmpeg is doing for you is scanning through your files and making sure their audio and video streams can be read all the way through.
posted by flabdablet at 3:23 AM on January 17, 2010
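With stream copy, ffmpeg never decodes the frames themselves, so damage inside a compressed frame can get past it. A stricter, much slower check is to decode everything and throw the result away. This is a minimal sketch, assuming an ffmpeg build that accepts `-v error` and the `null` output format (current releases do; a 2010-era build may differ). `check_decode` is a hypothetical helper name, and it treats any logged error as a failure, because ffmpeg often reports decoding errors while still exiting with status 0.

```shell
#!/usr/bin/env bash
# check_decode FILE: succeed only if ffmpeg decodes every audio and video
# frame of FILE without printing an error. Each frame is really decoded
# (then discarded by the null muxer), so this is far slower than stream copy.
check_decode() {
    local errors
    # -v error: print nothing but errors; -f null -: decode fully, write nothing.
    # </dev/null keeps ffmpeg from reading the caller's stdin (e.g. a file list).
    errors=$(ffmpeg -v error -i "$1" -f null - </dev/null 2>&1) || return 1
    # A non-zero exit is a failure, but so is any error text on a zero exit.
    [ -z "$errors" ]
}
```

Inside the loop, `if ! check_decode "$pathname"` would take the place of the ffmpeg line; expect it to run at roughly playback-decode speed rather than disk speed.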


If you want to get an idea of how long the thing is going to take, you could add a progress indicator:

root="/path/to/drobo/root/folder"
filecount=$(find "$root" -type f | wc -l)
processed=0
find "$root" -type f | while read -r pathname
do
    ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null 2>/dev/null || echo rm "'$pathname'" >>deletions
    echo -n ' ' $((++processed)) / $filecount $'\r'
done

posted by flabdablet at 3:31 AM on January 17, 2010


Err... sorry, that won't work; the "filecount" and "processed" variables belong to the shell, but the subshell that's actually processing the output from "find" won't see them. Here's a better version - it exports the filecount to the environment, so subshells will inherit it, and puts the counter variables inside the subshell:

root="/path/to/drobo/root/folder"
export filecount=$(find "$root" -type f | wc -l)
find "$root" -type f | {
    processed=0
    corrupt=0
    while read -r pathname
    do
        ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null 2>/dev/null || { echo rm "'$pathname'" >>deletions; let ++corrupt }
        echo -n ' ' $((++processed)) / $filecount, $corrupt $'bad\r'
    done
}

posted by flabdablet at 3:39 AM on January 17, 2010


Bollocks. Forgot a semicolon, and I don't actually need that "export"; subshells inherit shell variables as well as environment variables. Let me reformat it a little, for clarity:

root="/path/to/drobo/root/folder"
filecount=$(find "$root" -type f | wc -l)
find "$root" -type f | {
    processed=0
    corrupt=0
    while read -r pathname
    do
        if ! ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null 2>/dev/null
        then
            let ++corrupt
            echo rm "'$pathname'" >>deletions
        fi
        echo -n ' ' $((++processed)) / $filecount, $corrupt $'bad\r'
    done
}

I believe that should work. I'll stop fiddling with it now.
posted by flabdablet at 3:50 AM on January 17, 2010
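The scoping rule these last few posts turn on can be checked in isolation. In bash, each part of a pipeline normally runs in its own subshell (bash 4.2 and later can run the last part in the current shell of a script with `shopt -s lastpipe`, but that is off by default), so a `while` loop on the right of `|` can read variables set before the pipe, but its own changes vanish when the pipeline ends. Grouping the loop and the code that reports on it inside `{ }`, as the script above does, keeps them in the same subshell. `lost_count` and `kept_count` are illustrative names.

```shell
#!/usr/bin/env bash
# Changes made inside a piped while loop are lost after the pipeline.
lost_count() {
    seen=0
    printf 'a\nb\nc\n' | while IFS= read -r line; do
        seen=$((seen + 1))      # increments the subshell's copy only
    done
    echo "$seen"                # the parent's copy is still 0
}

# Grouping the loop and the report in { } keeps them in one subshell.
kept_count() {
    total=3                     # set in the parent shell, before the pipe
    printf 'a\nb\nc\n' | {
        seen=0
        while IFS= read -r line; do
            seen=$((seen + 1))
        done
        echo "$seen of $total"  # sees the parent's $total and its own $seen
    }
}
```

`lost_count` prints `0`; `kept_count` prints `3 of 3`.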


Fiddlesticks! The deletions file will contain a broken command for each corrupt file whose pathname contains an apostrophe. One... last... try...

root="/path/to/drobo/root/folder"
filecount=$(find "$root" -type f | wc -l)
find "$root" -type f | {
    processed=0
    corrupt=0
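    while IFS= read -r pathname    # IFS= keeps leading/trailing spaces in names
    do
        # </dev/null stops ffmpeg reading keystrokes from (and eating) find's list
        if ! ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null </dev/null 2>/dev/null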
        then
            let ++corrupt
            printf 'rm %q\n' "$pathname" >>deletions
        fi
        echo -n ' ' $((++processed)) / $filecount, $corrupt $'bad\r'
    done
}

Sorry about the rust. I really am confident that it's all OK now.
posted by flabdablet at 3:58 AM on January 17, 2010


... but you might want to rm deletions before you start, as this code will only append to it, never empty it.
posted by flabdablet at 4:01 AM on January 17, 2010


Response by poster: This looks like it's gonna work, flabdablet. Thanks 100000x for your scripting genius. I'm learning a lot of great ideas from your script that I never would have thought of, like making the deletions logfile able to serve as its own mass-deletion script. Simple but brilliant.
posted by melorama at 4:35 AM on January 17, 2010


Response by poster: I'm tail -f'ing the deletions log, and noticing that JPEG files and other non-video files get passed to ffmpeg, which obviously chokes on them and flags them as corrupt.

Is there a way to feed the script a list of file extensions to filter on? I've never done this sort of thing before, but would it be possible to poll the OS X LaunchServices database and get a list of all video extensions that VLC or Quicktime Player will open, then use it as a filter for the script?

Also, do you accept Paypal payments? Such quick awesomeness deserves to be compensated.
posted by melorama at 4:44 AM on January 17, 2010


Response by poster: Hmm...now I'm finding that many of the files getting piped to the deletions log appear to play fine when I check them manually in Quicktime Player.
posted by melorama at 5:38 AM on January 17, 2010


Best answer: Is there a way to feed the script a list of file extensions to filter on?

Sure. I'm a Linux guy, not a Mac guy, so I know nothing of LaunchServices databases; but if you have some way to generate a simple space-separated list of suitable extensions, you can use it like this:

root="/path/to/drobo/root/folder"
extensions="mov avi mpg mpeg flv etc"

i=0
unset filter
for extension in $extensions
do
    filter[i++]=-o
    filter[i++]=-iname    # change -iname to -name if you want case-sensitive matching
    filter[i++]="*.$extension"
done
filter[0]='('
filter[i]=')'

filecount=$(find "$root" -type f "${filter[@]}" | wc -l)
find "$root" -type f "${filter[@]}" | {
    processed=0
    corrupt=0
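    while IFS= read -r pathname    # IFS= keeps leading/trailing spaces in names
    do
        # </dev/null stops ffmpeg reading keystrokes from (and eating) find's list
        if ! ffmpeg -i "$pathname" -acodec copy -vcodec copy -f avi -y /dev/null </dev/null 2>/dev/null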
        then
            let ++corrupt
            printf 'rm %q\n' "$pathname" >>deletions
        fi
        echo -n ' ' $((++processed)) / $filecount, $corrupt $'bad\r'
    done
}

Paypal, schmaypal. I'm sure there's some kind of quick awesomeness that you could pass on to someone who needs your expertise. Paying it forward is the Ask Metafilter way.

On preview: change /path/to/drobo/root/folder to the pathname of a folder containing one of the files that looks like it's being erroneously selected for deletion, so that the script will process just those few instead of the whole lot; change $'bad\r' to $'bad\n', so you get a proper line break after each progress indicator; and remove 2>/dev/null from the ffmpeg line. That way, you'll see a bunch of ffmpeg spew that should help work out what its problem is.

This is precisely why I structured the script so it doesn't do immediate deletions. There are always, always teething problems.
posted by flabdablet at 6:18 AM on January 17, 2010 [1 favorite]
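The filter[] loop in that script can be packaged as a function and tested on a scratch folder. The parentheses it adds matter: find's implied AND binds tighter than -o, so without them `-type f -iname '*.mov' -o -iname '*.avi'` would mean "(regular file and *.mov) or *.avi", letting a directory named something.avi through. `list_videos` is a hypothetical name; `-iname` is supported by both GNU find and the BSD find on Mac OS X.

```shell
#!/usr/bin/env bash
# list_videos ROOT EXT...: print every regular file under ROOT whose name ends
# in .EXT for one of the given extensions, ignoring case (.MOV matches mov).
list_videos() {
    local root=$1 ext
    shift
    (( $# )) || return 2            # no extensions given: nothing to match
    local -a filter=()
    for ext in "$@"; do
        # Separate alternatives with -o; the first one needs no leading -o.
        if (( ${#filter[@]} )); then
            filter+=(-o)
        fi
        filter+=(-iname "*.$ext")
    done
    # \( \) group the alternatives so -type f applies to every one of them.
    find "$root" -type f \( "${filter[@]}" \)
}
```

For example, `list_videos "$root" mov avi mpg mpeg flv | wc -l` gives the count the progress indicator needs.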


Also keep in mind that because ffmpeg is indeed reading entire files, it may well catch corruption that doesn't immediately make itself apparent in a quick play check. Quicktime Player might have more robust error recovery than ffmpeg. So before we go deleting great slabs of stuff off the Drobo, we want to make sure that the corruption detection tool isn't too overzealous.

It might be worth trying mplayer or mencoder or vlc in place of ffmpeg. All of them have comprehensive command line interfaces, and all of them can be persuaded to do high-speed null output. I don't know whether the same is true of Quicktime Player.

We may end up with a script that tries several playback tools in succession, marking the file for deletion only if it fails in all of them.
posted by flabdablet at 6:44 AM on January 17, 2010
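The try-several-tools idea can be sketched as a function that takes a file plus a list of checker commands and reports the file as bad only if every checker rejects it. `check_with_ffmpeg` wraps the same stream-copy test used above; `fails_everywhere` and the convention that a checker returns 0 for "playable" are assumptions for illustration. Wrappers for mplayer or VLC would need their own flags, plus a check that those tools' exit status really reflects corruption.

```shell
#!/usr/bin/env bash
# check_with_ffmpeg FILE: succeed if ffmpeg can stream-copy FILE to nowhere.
check_with_ffmpeg() {
    ffmpeg -i "$1" -acodec copy -vcodec copy -f avi -y /dev/null \
        </dev/null >/dev/null 2>&1
}

# fails_everywhere FILE CHECKER...: succeed only if every CHECKER rejects FILE.
# Each CHECKER is a command or function that takes a file name and returns 0
# when it considers the file playable.
fails_everywhere() {
    local file=$1 checker
    shift
    (( $# )) || return 2        # no checkers: refuse to call anything bad
    for checker in "$@"; do
        if "$checker" "$file"; then
            return 1            # some tool could read it: keep the file
        fi
    done
    return 0                    # every tool failed: candidate for deletion
}
```

In the loop this might read `fails_everywhere "$pathname" check_with_ffmpeg check_with_mplayer && printf 'rm %q\n' "$pathname" >>deletions`, where `check_with_mplayer` is a hypothetical wrapper still to be written.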


I totally forgot about this thread and meant to come back with a script around my suggestion, so I decided to come back and look at the requirements before doing it. Awesome job, flabdablet.
posted by rr at 3:33 PM on January 18, 2010


Let's find out why ffmpeg and QuickTime Player disagree about what's playable before patting ourselves on the back too much :)
posted by flabdablet at 3:56 PM on January 18, 2010
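One way to start on that: keep ffmpeg's complaint next to each proposed deletion, so the deletions file shows why each file was flagged. This sketch (hypothetical name `log_if_unreadable`) runs the same stream-copy test as the scripts above, but captures ffmpeg's messages instead of discarding them, and writes the last line, which often names the problem, as a `#` comment above the quoted `rm` line. bash ignores the comment if the file is later run.

```shell
#!/usr/bin/env bash
# log_if_unreadable FILE LOG: run the stream-copy test on FILE; on failure,
# append ffmpeg's last message line as a comment, then a quoted rm command.
log_if_unreadable() {
    local file=$1 log=$2 output
    if output=$(ffmpeg -i "$file" -acodec copy -vcodec copy -f avi -y /dev/null \
                    </dev/null 2>&1); then
        return 0
    fi
    # Keep only ffmpeg's final line of output as the reason.
    printf '# %s\n' "$(printf '%s\n' "$output" | tail -n 1)" >> "$log"
    printf 'rm -- %q\n' "$file" >> "$log"
    return 1
}
```

Running `grep '^#' deletions | sort | uniq -c` afterwards would then give a rough tally of which complaints are most common.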

