Help me hack open a proprietary binary image.
June 17, 2008 5:07 AM

I'm trying to disassemble an undocumented, proprietary binary image format. I'm looking for the best tools and tutorials available.

I can see that the first thing to do is to open it up with a hex editor and poke around (I don't see any ASCII strings in there, but I do see a header, data blocks, etc)... However, beyond that I have next to no experience with this kind of thing. If you have any pointers to tutorials, techniques, software, or advice of your own, it would be most welcome.

I wish I could say more about the source of the file, but I can't.
posted by fake to Computers & Internet (14 answers total) 5 users marked this as a favorite
What I would do is gather the specifications of image and compression formats that predate the one you have. It's possible that whoever developed it borrowed or copied some ideas, which might give you a foothold.
posted by jedicus at 5:33 AM on June 17, 2008

I have looked at other specs, but this is data from an image sensor on a proprietary system with totally custom hardware and software at both ends. The closest thing to an extant spec, in that regard, is the code for dcraw, which I've looked at. But dcraw wasn't designed to work with this type of data, which doesn't need interpolation.

I suspect that you're right - once I get some purchase, I'll see some TIFF-like or bitmapped structures. I just need a bit more bootstrapping to get that far.
posted by fake at 5:40 AM on June 17, 2008

The first rule of thumb is that a proprietary file format often isn't; developers are likely to copy from existing formats and approaches, even if they roll their own implementation.

O'Reilly has a good book (now out of print, but available from Amazon's used book dealers) called the "Encyclopedia of Graphics File Formats". It covers a ton of different formats and introduces a fair amount of the theory behind them.

Ultimately, there are two general types of image formats: vector-based graphics that contain a mathematical description of how to draw the image (like PostScript or SVG), and raster graphics that map the image file to an area of memory for display (like JPEG or GIF). The first step is figuring out which type of image format you have. The size of the file will likely be a good hint here; raster files tend to be much bigger.

Once you have the type, you'll need to figure out the encoding: the image data is almost certainly compressed somehow, if it's a raster image. You may be able to get a sense for this by looking at the relative file size and the image complexity: if the file size is about the same as the amount of memory used to display it, it may even be a "raw" format. Color depth and tonal variations may also help you here; JPEG and similar lossy compression schemes leave artifacts that you can pick up on close examination of the image.
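
The file-size check above can be turned into a quick calculation. This is only a rough sketch: the function names, the header size, and the list of candidate depths are assumptions made up for illustration, not anything known about the actual format.

```python
def implied_bits_per_pixel(file_size, width, height, header_bytes=0):
    """Bits per pixel the file would have if its payload were uncompressed."""
    payload_bits = (file_size - header_bytes) * 8
    return payload_bits / (width * height)


def plausible_depths(file_size, width, height, header_bytes=0,
                     candidates=(8, 10, 12, 16, 24, 32), tolerance=0.02):
    """Return the candidate bit depths that roughly match the file size.

    An empty list hints that the data is compressed, padded, or that the
    guessed dimensions or header size (all assumptions here) are wrong.
    """
    bpp = implied_bits_per_pixel(file_size, width, height, header_bytes)
    return [c for c in candidates if abs(bpp - c) <= tolerance * c]
```

For a real file you would pass os.path.getsize(path) as file_size. A result such as [16] suggests an uncompressed 16-bit payload; an implied depth well under 8 bits suggests compression.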

FWIW, I'd try using the ImageMagick "identify" command and see if it can make sense of it. Expectations would be low if it is truly proprietary, but if it's a common format that's been gussied up a bit, it might be able to get in.

If you can post any more details about the file, what you can presently do with it and what you are trying to do (without getting into trouble), it would help out a bit in understanding the issue.
posted by jenkinsEar at 5:42 AM on June 17, 2008

The strings command might turn up ASCII that you missed browsing in a text editor. Do you at least know the dimensions of the image? The color depth?
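
If the strings command isn't handy, a rough stand-in is easy to write. This is a minimal sketch, not a replacement for the real tool: the function name and the minimum length of 4 are choices made here for illustration. It also reports the byte offset of each run, which helps when lining things up against a hex dump.

```python
import re


def ascii_strings(data, min_len=4):
    """Return (offset, text) for each run of printable ASCII in data."""
    # Printable ASCII is 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]
```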

I've never done anything like this before, but I'd probably try just mapping all the bits onto an image first (i.e. 0=black, 1=white). It should at least give you a feel for the layout of the data.
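
Something like the following would do that mapping. It is only a sketch under assumptions: the function name is made up, the width is whatever you guess, and the output is a binary PGM (P5) simply because most viewers open it.

```python
def bits_to_pgm(data, width, msb_first=True):
    """Render every bit of data as one pixel (0 = black, 1 = white).

    Returns the bytes of a binary PGM (P5) image that is `width` pixels
    wide; bits that do not fill a complete final row are dropped.
    """
    shifts = range(7, -1, -1) if msb_first else range(8)
    pixels = bytearray()
    for byte in data:
        for shift in shifts:
            pixels.append(255 if (byte >> shift) & 1 else 0)
    height = len(pixels) // width
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + bytes(pixels[: width * height])
```

Write the result to a file opened in "wb" mode and try a few widths; a wrong width usually shows up as diagonal streaks rather than noise.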
posted by meta_eli at 5:46 AM on June 17, 2008

Thanks, jenkinsEar.

This is definitely a raster file. It is "raw" data coming from a linear sensor array. It could be 10-, 12-, 16-, 24-, or 32-bit; I can't tell. Likely one of the lower numbers. I put raw in quotes because so far, I don't just see neat matrices of numbers like one might see in PGM or some other nice format.

I know that the header must be at the beginning of the file because the incoming data can be stopped at any moment or continued indefinitely.

Your book+software recommendations are great. I have to run, I'll be back this evening.
posted by fake at 5:54 AM on June 17, 2008

"Do you at least know the dimensions of the image?"

Yes. Let's say 1k pixels by whatever. How do I put that knowledge to use? (Sorry for these bozo questions.)
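
One possible way to use a known width, sketched under assumptions: if the data is uncompressed, bytes exactly one row apart should usually be similar, so each candidate bit depth implies a row length in bytes that can be scored. The function names and the candidate depths are made up for illustration.

```python
def row_difference(data, stride, start=0, max_samples=200_000):
    """Mean absolute difference between each byte and the byte one stride on."""
    end = min(len(data) - stride, start + max_samples)
    if end <= start:
        raise ValueError("not enough data for this stride")
    total = sum(abs(data[i] - data[i + stride]) for i in range(start, end))
    return total / (end - start)


def rank_strides(data, width, bit_depths=(8, 10, 12, 16, 24, 32), start=0):
    """Score the row length implied by each candidate bit depth.

    Returns (score, bits, stride_in_bytes) tuples, lowest score first.
    `start` is a guessed header size to skip (an assumption).
    """
    scores = []
    for bits in bit_depths:
        if (width * bits) % 8:
            continue  # a row would not end on a byte boundary
        stride = width * bits // 8
        if stride >= len(data) - start:
            continue  # file too short to compare two rows
        scores.append((row_difference(data, stride, start), bits, stride))
    return sorted(scores)
```

Multiples of the true row length also score well, so among near-ties the smallest stride is the better guess. Noisy or compressed data will not produce a clear winner.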
posted by fake at 5:55 AM on June 17, 2008

If it's not compressed, then my strategy would be to map bits onto black and white, as meta_eli suggested. I would start with every bit, then every 2nd bit, then every 4th bit, etc. At some point you'll discover what the most significant bit is and you should see the low frequency data show up as a black and white image. That should give you a start.
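
A rough sketch of that bit-stepping idea, with made-up function names: pull out every Nth bit at a given starting offset, then measure how often neighbouring bits agree. A most-significant-bit plane of a real image should show long runs, so it scores close to 1, while low-order bits look more like noise.

```python
def bit_plane(data, step, phase=0):
    """Return every `step`-th bit of data as a list of 0/1 values.

    Bits are numbered MSB-first within each byte, and `phase` is the
    index of the first bit taken.
    """
    total_bits = len(data) * 8
    return [(data[i >> 3] >> (7 - (i & 7))) & 1
            for i in range(phase, total_bits, step)]


def smoothness(bits):
    """Fraction of adjacent bit pairs that are equal (1.0 = perfectly flat)."""
    if len(bits) < 2:
        return 1.0
    same = sum(a == b for a, b in zip(bits, bits[1:]))
    return same / (len(bits) - 1)
```

For a guessed sample size of 16 bits you would compare smoothness(bit_plane(data, 16, phase)) for phase 0 through 15; the phase with the highest score is a candidate for the most significant bit.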
posted by jedicus at 5:58 AM on June 17, 2008

My first attempt would be to run it past the Unix file command and see what that chucks up. I've gotten pretty lucky with it in the past. Can't hurt.
posted by Orb2069 at 7:55 AM on June 17, 2008

Your #1 question is whether the images are compressed. That should be easy to determine: are all the files the same size?

Can you make new images with the device? The first thing I'd do is take pictures of a completely white scene and a completely black scene and see if the resulting files make any sense. The second thing I'd do is take two pictures of an identical scene and compare them, see if that helps you identify the inevitable metadata at the start of the file.
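
The two-file comparison could be sketched like this. The function name is invented here, and the approach assumes both files are laid out the same way; only the overlapping length is compared.

```python
def differing_ranges(a, b):
    """Return half-open (start, end) byte ranges where a and b differ."""
    ranges = []
    start = None
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges
```

Small isolated ranges near the start of the file are candidates for timestamps or counters in the header. Sensor noise will likely make the pixel data differ as well, so expect many ranges later in the file even for an identical scene.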
posted by Nelson at 8:01 AM on June 17, 2008

Have you tried opening the file in a universal image viewer like ACDSee and seeing if anything shows up?
posted by wongcorgi at 11:04 AM on June 17, 2008

Another option is ParseRat, an application which examines a file with unknown structure and attempts to make sense of it.
posted by exphysicist345 at 12:03 AM on June 18, 2008

If the images are compressed, they are not compressed like JPEGs or something. They could be compressed with RLE or similar, but I doubt it.

wongcorgi, I tried IrfanView, which decided that the image was a TIFF but could go no further.

I'll try ParseRat, file, and all the other great advice in this thread today and tomorrow. I'll report back here. Thanks everyone so far. More ideas always welcome.
posted by fake at 4:43 AM on June 18, 2008

Also, I found that if you rename a file with a ".RAW" extension, you can open it with Photoshop and specify many useful things (bit depth, columns/rows).
posted by fake at 4:53 AM on June 18, 2008

Resolution: the format turned out to be substantially TIFF-like; I figured that out by using ImageMagick and IrfanView (which both reported that it was a munged TIFF). Turned out that there were several image data "channels" containing non-image information. After I discarded those, it was easily readable. Thanks everyone.
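
For anyone in a similar spot, here is a minimal sketch of reading a standard TIFF header and its first directory of tags. It assumes the file follows the ordinary TIFF layout, which a "munged" variant may not, and the function name is made up for illustration.

```python
import struct


def read_tiff_ifd(data):
    """Parse a TIFF header and first IFD; return (endian, entries).

    Each entry is (tag, field_type, value_count, raw_value_bytes), where
    the 4 raw bytes hold either the value itself or an offset to it.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("no TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("unexpected TIFF magic number: %d" % magic)
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    entries = []
    for n in range(count):
        entries.append(struct.unpack_from(
            endian + "HHI4s", data, ifd_offset + 2 + 12 * n))
    return endian, entries
```

In baseline TIFF, tag 256 is ImageWidth, 258 is BitsPerSample, and 277 is SamplesPerPixel. An unexpectedly high SamplesPerPixel might be one sign of extra channels like the ones described above, though that is a guess about this particular format.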
posted by fake at 9:49 AM on June 26, 2008
