How to detect RARs/EXEs hidden in JPGs?
April 19, 2009 5:23 PM

I recently noticed a few poor-quality image files with suspiciously large file sizes that were uploaded to my public FTP server quite some time ago. They ended up being readable as WinRAR archives and contained some highly illegal content as well as what I suspect are viruses. The instructions for how to hide archives within JPGs are pretty well documented, but I would like to know if there is some utility I can run that would scan all the files on my server and detect whether any of them are more than they appear to be. Linux or Windows utilities would both work. Thanks
posted by anonymous to Computers & Internet (11 answers total)
This isn't a simple problem and there's no simple answer. Google for steganography detection for some more info/links. There's a tool called Stegdetect that can detect some hidden info in files, but keep in mind that anyone can write their own program to hide data in a JPG or MP3 or whatever and chances are it won't be detectable.
posted by reptile at 5:29 PM on April 19, 2009

Sorry I fail at links: steganography detection
posted by reptile at 5:30 PM on April 19, 2009

What happens when you run the `file` command on it? Does it show as a JPG or a RAR?

You may be able to set up a scanning script to run `file` on the public directories and delete the archives from there. Even limit it to .jpg files.

I imagine the `find` command would probably be your friend.
posted by puddpunk at 5:37 PM on April 19, 2009
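
A minimal sketch of the scan puddpunk describes, written in Python rather than with `find` and `file`. The function name `find_suspect_jpgs` is made up for illustration, and the three-byte JPEG signature is a rough stand-in for what `file` checks. It only catches files that are not JPEGs at all (for example a RAR renamed to .jpg); a real JPEG with an archive appended will pass.

```python
import os

# First three bytes of a JPEG stream (SOI marker plus the start of the next marker).
JPEG_SOI = b"\xff\xd8\xff"


def find_suspect_jpgs(root):
    """Yield paths of .jpg/.jpeg files under `root` that do not start like a JPEG."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg")):
                continue  # only look at files that claim to be JPEGs
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(len(JPEG_SOI))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if head != JPEG_SOI:
                yield path
```

Something like `for path in find_suspect_jpgs("/srv/ftp/pub"): print(path)` (the directory is an assumption) would list candidates for manual review; deleting automatically is probably best left until the output has been checked by hand.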

reptile: I'm not sure we're talking about steganography. from the question, I understood that these were actual .rar files, renamed with .jpg.
posted by ArgentCorvid at 6:10 PM on April 19, 2009

They're probably functioning jpgs with other files embedded within using this technique.
posted by Tenuki at 6:25 PM on April 19, 2009

reptile: I'm not sure we're talking about steganography. from the question, I understood that these were actual .rar files, renamed with .jpg.

They're both.
posted by unixrat at 6:43 PM on April 19, 2009

Try scanning each .JPG for the presence of the RAR marker block.

Here's the byte sequence: 0x52 0x61 0x72 0x21 0x1a 0x07 0x00
posted by shinybeast at 7:00 PM on April 19, 2009
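
A rough sketch of the marker scan shinybeast suggests, using the byte sequence quoted above. The function name `rar_marker_offset` and the chunk size are assumptions for illustration. It reads the file in chunks and keeps a short tail between reads so a marker that straddles two chunks is still found.

```python
# The RAR marker block quoted in the thread: "Rar!" followed by 0x1a 0x07 0x00.
RAR_MARKER = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def rar_marker_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first RAR marker in the file, or -1 if absent."""
    overlap = len(RAR_MARKER) - 1  # bytes to carry over between chunks
    offset = 0                     # file offset of the first byte held in `buf`
    buf = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return -1
            buf += chunk
            hit = buf.find(RAR_MARKER)
            if hit != -1:
                return offset + hit
            # Drop everything except the tail that could begin a split marker.
            drop = max(len(buf) - overlap, 0)
            offset += drop
            buf = buf[drop:]
```

An offset of 0 would mean the file is simply a renamed archive, while a positive offset in a file that still displays as an image suggests an appended one. Note that the newer RAR 5 format uses a slightly different marker, so this exact sequence would not match those archives.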

This isn't a problem you're going to be able to solve perfectly, but if the people abusing your FTP server are following a set pattern (i.e. always uploading RAR archives and naming the extension .jpg), you can detect particular file types pretty easily and block them.

Most file formats have some "magic" in them, static header information that allows easy identification that doesn't rely on extensions, which is mostly what the unix file(1) command relies on. If you grep for JFIF and it does not show up, it's probably not an actual jpeg (although technically it could be; many camera files, for example, carry an Exif header instead). RAR archives should start with the string "Rar!", zipfiles will have "PK\x03\x04", and so on. You can see a comprehensive set of rules in /usr/share/misc/magic on pretty much any UNIX/Linux machine that has file(1) installed. Also, believe it or not, Wikipedia pages on most file formats describe the header byte sequences in detail.

That said, the real problem here is running an open FTP server that allows anonymous uploads. I can't think of any good reasons for doing this when you consider all the potential legal problems you could get mired in. So, even if you figure out how to block the payload they are sneaking in using a blacklist on the magic header, it will be trivial for them to work around it if they know what they are doing.

As a side-note, there are more technical ways to detect if something is an image or not. On a webapp I wrote recently that allows (authenticated!) users to upload images, I use the PIL (Python Imaging Library) to attempt a decode and also verify the dimensions are within the allowed range. You can mail me if you want the code for that. You could probably leverage ImageMagick or GD in this way too for a shell-scripted solution.
posted by cj_ at 7:19 PM on April 19, 2009
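
To illustrate the "magic" header idea from cj_'s post, here is a small sketch, not cj_'s code and far less complete than file(1). The RAR and ZIP signatures are the ones given in the post; the JPEG signature (FF D8 FF) is added here, and the names `MAGIC` and `sniff_type` are hypothetical.

```python
# (leading bytes, label) pairs; a tiny subset of what /usr/share/misc/magic covers.
MAGIC = [
    (b"\xff\xd8\xff", "jpeg"),  # JPEG start-of-image (added for this sketch)
    (b"Rar!", "rar"),           # RAR archive, as described in the post
    (b"PK\x03\x04", "zip"),     # ZIP local file header, as described in the post
]


def sniff_type(path):
    """Return a label based on the file's first bytes, or None if nothing matches."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return None
```

An upload filter could compare `sniff_type(path)` against the type implied by the extension and reject mismatches. As the post says, this is a blacklist-style check on the header only, so it is easy to work around.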

Hrm, re-reading your question, I think I misunderstood. You are able to view these images fine? That suggests they just appended a RAR archive onto a real JPEG. What you could do is re-encode the image, which will strip out the appended junk that isn't really part of the JPEG. For example:
$ cat orig.jpg archive.rar > stacked.jpg  # archive.rar is a stand-in name for the appended payload
$ file stacked.jpg 
stacked.jpg: JPEG image data, JFIF standard 1.01
$ convert stacked.jpg stripped.jpg  # this is an ImageMagick command
$ ls -l
 11483 orig.jpg
322399 stacked.jpg
 11484 stripped.jpg
(I still recommend locking down open FTP servers completely)
posted by cj_ at 7:39 PM on April 19, 2009
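
As a complement to re-encoding, the appended-archive case can also be detected without touching the image. The sketch below is an illustration, not cj_'s method: it walks the JPEG marker segments up to the end-of-image marker (FF D9) and reports how many bytes follow it. The function name `jpeg_trailing_bytes` is made up, and the parser is deliberately simple, so treat a non-zero result as a flag for review rather than proof, since some legitimate files may carry trailing bytes.

```python
def jpeg_trailing_bytes(data):
    """Return the number of bytes after the JPEG end-of-image marker.

    Returns -1 if `data` does not parse as a JPEG marker stream.
    """
    if data[:2] != b"\xff\xd8":           # must begin with SOI
        return -1
    i, n = 2, len(data)
    while i + 1 < n:
        if data[i] != 0xFF:
            return -1                      # expected a marker here
        marker = data[i + 1]
        if marker == 0xFF:                 # fill byte before a marker
            i += 1
            continue
        if marker == 0xD9:                 # EOI: everything after is trailing data
            return n - (i + 2)
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            i += 2                         # standalone markers carry no length
            continue
        if i + 4 > n:
            return -1
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        if seg_len < 2:
            return -1
        i += 2 + seg_len                   # skip marker plus segment body
        if marker == 0xDA:                 # SOS: entropy-coded data follows
            while i + 1 < n:
                nxt = data[i + 1]
                is_real_marker = (
                    data[i] == 0xFF
                    and nxt not in (0x00, 0xFF)
                    and not 0xD0 <= nxt <= 0xD7
                )
                if is_real_marker:
                    break                  # let the outer loop handle this marker
                i += 1
    return -1                              # ran off the end without seeing EOI
```

For a file like the stacked.jpg above, reading it with `open(path, "rb").read()` and passing the bytes in should report roughly the size of the appended archive, while a clean JPEG should report 0.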

I think hafinder, a Windows command-line program, will do what you want.
posted by exphysicist345 at 7:58 PM on April 19, 2009

cj_ has the best answer: run a cron job that 'convert's every jpg to a new jpg, which will strip all extraneous data. This could be extended to mp3s with lame, etc.
posted by idiopath at 9:02 PM on April 19, 2009
