How to detect RARs/EXEs hidden in JPGs?
April 19, 2009 5:23 PM
I recently noticed a few poor-quality image files with suspiciously large file sizes that were uploaded to my public FTP server quite some time ago. They ended up being readable as WinRAR archives and contained some highly illegal content as well as what I suspect are viruses.
How to hide archives within JPGs is pretty well documented, but I would like to know if there is some utility I can run that would scan all the files on my server and detect whether any of them are more than they appear to be. Linux or Windows utilities would both work. Thanks.
What happens when you run the `file` command on it? Does it show as a JPG or a RAR?
You may be able to set up a scanning script to run `file` on the public directories and delete the archives from there. Even limit it to .jpg files.
I imagine the `find` command would probably be your friend.
posted by puddpunk at 5:37 PM on April 19, 2009
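A minimal sketch of the kind of scanning script suggested above, written in Python rather than with `find` and `file` so that it has no external dependencies. It only checks whether a file named like a JPEG actually starts with the JPEG start-of-image bytes, so it catches renamed archives but not archives appended to a real image. The function name `find_fake_jpegs` is made up for illustration.

```python
import os

# Every JPEG stream begins with the SOI marker FF D8, followed by FF
# (the first byte of the next marker).
JPEG_SOI = b"\xff\xd8\xff"


def find_fake_jpegs(root):
    """Yield paths under root that are named like JPEGs but lack the JPEG header."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg")):
                continue  # limit the scan to files claiming to be JPEGs
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(len(JPEG_SOI))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if head != JPEG_SOI:
                yield path
```

Calling `list(find_fake_jpegs("/path/to/public"))` would give a list of suspects to review by hand before deleting anything.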
reptile: I'm not sure we're talking about steganography. from the question, I understood that these were actual .rar files, renamed with .jpg.
posted by ArgentCorvid at 6:10 PM on April 19, 2009
They're probably functioning jpgs with other files embedded within using this technique.
posted by Tenuki at 6:25 PM on April 19, 2009
reptile: I'm not sure we're talking about steganography. from the question, I understood that these were actual .rar files, renamed with .jpg.
They're both.
posted by unixrat at 6:43 PM on April 19, 2009
Try scanning each .JPG for the presence of the RAR marker block.
Here's the byte sequence: 0x52 0x61 0x72 0x21 0x1a 0x07 0x00
posted by shinybeast at 7:00 PM on April 19, 2009
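The marker scan suggested above could look roughly like this in Python. It is a sketch, not a tested tool: the function names are invented for illustration, and it searches only for the seven-byte sequence quoted in the comment (newer RAR 5 archives end the marker differently, so they would need a second pattern).

```python
# The marker block quoted above: "Rar!" followed by 0x1a 0x07 0x00.
RAR_MARKER = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def rar_marker_offset(data: bytes) -> int:
    """Return the offset of the first RAR marker in data, or -1 if absent."""
    return data.find(RAR_MARKER)


def scan_file(path: str) -> int:
    """Read a whole file and report where (if anywhere) a RAR marker starts."""
    # Reading the full file is fine for images; very large files would be
    # better handled in chunks.
    with open(path, "rb") as fh:
        return rar_marker_offset(fh.read())
```

An offset of 0 means the file is simply a renamed archive; a positive offset means the marker sits somewhere after other data, as it would in an image with an archive appended. A match is a reason to look closer, not proof, since the same seven bytes can occur by chance in compressed image data.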
This isn't a problem you're going to be able to solve perfectly, but if the people abusing your FTP server are following a set pattern (i.e. always uploading RAR archives and naming the extension .jpg), you can detect particular file types pretty easily and block them.
Most file formats have some "magic" in them: static header information that allows easy identification without relying on extensions, which is mostly what the unix file(1) command relies on. If you grep for JFIF and it does not show up, it's probably not an actual jpeg (although technically it could be). RAR archives should start with the string "Rar!", zipfiles will have "PK\x03\x04", and so on. You can see a comprehensive set of rules in /usr/share/misc/magic on pretty much any UNIX/Linux machine that has file(1) installed. Also, believe it or not, the Wikipedia pages on most file formats describe the header byte sequences in detail.
That said, the real problem here is running an open FTP server that allows anonymous uploads. I can't think of any good reasons for doing this when you consider all the potential legal problems you could get mired in. So, even if you figure out how to block the payload they are sneaking in using a blacklist on the magic header, it will be trivial for them to work around it if they know what they are doing.
As a side-note, there are more technical ways to detect if something is an image or not. On a webapp I wrote recently that allows (authenticated!) users to upload images, I use the PIL (Python Imaging Library) to attempt a decode and also verify the dimensions are within the allowed range. You can mail me if you want the code for that. You could probably leverage ImageMagick or GD in this way too for a shell-scripted solution.
posted by cj_ at 7:19 PM on April 19, 2009
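To illustrate the "magic" idea from the comment above, here is a small Python sketch that classifies a file by its leading bytes. The table holds only the three signatures mentioned in this thread, and the names `MAGIC` and `sniff_type` are hypothetical; file(1) uses a far larger rule set.

```python
# (leading bytes, label) pairs for the formats discussed in the thread.
MAGIC = [
    (b"\xff\xd8\xff", "jpeg"),  # JPEG start-of-image plus next marker prefix
    (b"Rar!", "rar"),           # RAR archives start with this string
    (b"PK\x03\x04", "zip"),     # ZIP local file header signature
]


def sniff_type(head: bytes) -> str:
    """Guess a file type from its first few bytes; 'unknown' if nothing matches."""
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return "unknown"
```

Because only the start of the file is inspected, an archive appended to a genuine JPEG still comes back as "jpeg", which is consistent with the caveat that this kind of check is easy to work around.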
Hrm, re-reading your question, I think I misunderstood. You are able to view these images fine? That suggests they just appended a RAR archive onto a real JPEG. What you could do is re-encode the image, which will strip out the appended junk that isn't really part of the JPEG. For example:
```
$ cat orig.jpg test.zip > stacked.jpg
$ file stacked.jpg
stacked.jpg: JPEG image data, JFIF standard 1.01
$ convert stacked.jpg stripped.jpg   # this is an ImageMagick command
$ ls -l
 11483 orig.jpg
322399 stacked.jpg
 11484 stripped.jpg
310916 test.zip
```
(I still recommend locking down open FTP servers completely)
posted by cj_ at 7:39 PM on April 19, 2009
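Since the appended archive is not part of the JPEG itself, another way to spot a stacked file without re-encoding it is to find where the JPEG stream ends and count what follows. The Python sketch below walks the JPEG marker segments up to the end-of-image marker. It is a simplified parser written for illustration (the function names are invented, and unusual or damaged files may not parse), not a replacement for a real decoder.

```python
import struct


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the JPEG EOI marker, or -1 if parsing fails."""
    if data[:2] != b"\xff\xd8":        # must start with SOI
        return -1
    pos, n = 2, len(data)
    while pos + 1 < n:
        if data[pos] != 0xFF:          # every segment starts with 0xFF
            return -1
        marker = data[pos + 1]
        if marker == 0xFF:             # fill byte before a marker: skip it
            pos += 1
            continue
        pos += 2
        if marker == 0xD9:             # EOI: the image ends here
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue                   # standalone markers carry no length
        if pos + 2 > n:
            return -1
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2:                 # length field counts its own two bytes
            return -1
        pos += length
        if marker == 0xDA:             # SOS: entropy-coded data follows
            while pos + 1 < n:
                nxt = data[pos + 1]
                # 0xFF 0x00 is a stuffed byte and 0xFF D0..D7 are restart
                # markers; anything else after 0xFF ends the scan data.
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
    return -1


def trailing_bytes(data: bytes):
    """Number of bytes after the JPEG ends, or None if it is not a parseable JPEG."""
    end = jpeg_end_offset(data)
    return None if end < 0 else len(data) - end
```

For a file like `stacked.jpg` in the example above, `trailing_bytes` would be expected to report roughly the size of the appended archive, while a clean image should report 0 or only a handful of bytes, since some cameras and editors do append small amounts of metadata.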
I think hafinder, a Windows command-line program, will do what you want.
posted by exphysicist345 at 7:58 PM on April 19, 2009
cj_ has the best answer: run a cron job that 'convert's every jpg to a new jpg, which will strip all extraneous data. This could be extended to mp3s with lame, etc.
posted by idiopath at 9:02 PM on April 19, 2009
This thread is closed to new comments.
posted by reptile at 5:29 PM on April 19, 2009