Cracking the (almost certainly simple) code
August 5, 2017 2:24 PM

I have a programmable bike-spoke light system. Amazon calls it something else, but all its internal branding calls it the "FTL FH-801". It's programmed by putting a file on a microSD card (unlike most, which use either a cable or Bluetooth to push the data onto the lights themselves); it comes with a Windows utility to write these files. Due to some combination of (a) typically using Linux, (b) wanting a precise command-line interface, and (c) willful perversity, I'd like to write an open-source utility to program this thing. Naturally, this involves reverse-engineering the file format, and I am pretty certain I'm not approaching that as well as I could.

The obvious solution, contacting the manufacturer and asking for documentation on the file format, ran into a brick wall. I wrote to the only e-mail address I could find linked to this company at all and never got a reply. That could mean any or all of the following: it's the wrong address, it goes to a division that's not at all technical, it goes to someone not fluent in English (the manufacturers, and presumably also their support team, are Chinese), or they just see my request as all downside and no upside for them. Either way, it looks like I'm on my own. But since I can generate as many typical examples as I want (a single all-red pattern, a single all-green pattern except for a single blue line, three patterns that are all monochrome with different timings, etc.), and since the lights themselves probably want to do as little computation as possible, it should be easy to puzzle out the format, right?

Well, maybe not, which is leading me to wonder if I'm not very good at this. A sample one-image solid-red file had a 2560-byte header I figure I'll puzzle out later (some parts are obvious, like the first three characters being "POV" and the 512th byte being timing data), and what looks like 718848 bytes of image data. Those divide nicely by the 52 LEDs in the display to give me 13824 bytes per LED, which seems like too many (assuming 24-bit color, that's still 4608 pixels per LED, which would be an insane resolution level). Those bytes are exclusively FF, AA, 55, and 00, which is a bit suggestive on the bit level (11, 10, 01, and 00 quadrupled), and the bytes seem to be divided into triplets of which the first two bytes are always FF (lending credence to my 24-bit color theory, or perhaps a reversed 24-bit color theory, since the non-FF value is last, which presumably corresponds to red being intense, which is the exact reverse of the standard correspondence between value and intensity and the standard order of RGB color triples). A full-black image, incidentally, is all FF in the data (again reversing the standard correspondence between value and intensity).
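
The arithmetic above is easy to double-check in a few lines of Python. This is only a sketch: the sizes are the ones quoted in the post, the function names are made up for illustration, and the sample bytes are a stand-in rather than data from a real file.

```python
from collections import Counter

HEADER_SIZE = 2560   # header length quoted in the post
DATA_SIZE = 718848   # image-data length quoted in the post
NUM_LEDS = 52        # LEDs in the display


def layout_numbers(data_size=DATA_SIZE, num_leds=NUM_LEDS):
    """Return (bytes per LED, 3-byte triplets per LED)."""
    per_led, remainder = divmod(data_size, num_leds)
    if remainder:
        raise ValueError("data size is not a multiple of the LED count")
    return per_led, per_led // 3


def byte_histogram(data):
    """Count how often each byte value occurs in a bytes object."""
    return Counter(data)


if __name__ == "__main__":
    print(layout_numbers())
    sample = bytes([0xFF, 0xFF, 0x00] * 4)  # made-up stand-in for real data
    for value, count in sorted(byte_histogram(sample).items()):
        print(f"{value:02X}: {count}")
```

Running the histogram over the real data region (everything after the first 2560 bytes) would show at a glance whether FF, AA, 55, and 00 really are the only values present.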

I include all this not to ask anyone to solve the puzzle on the basis of these clues, so much as to suggest I have dug into this at least a bit, and before I get deeper into the weeds, I have to ask: how is this usually done? Are there intelligent tools or techniques to use to explore this sort of thing? I was halfway hoping for something which was a trivial header kludged onto what was unmistakably 24-bit pixel data, maybe with a checksum or three thrown in. What's there is more complicated than that, but not by much. Are there good resources for figuring this sort of thing out further? I'm not averse to generating a huge number of very slightly different data files and looking at them all with a hex dumper, but I want to make sure I'm doing it in a remotely intelligent way.
posted by jackbishop to Computers & Internet (5 answers total) 8 users marked this as a favorite
 
Best answer: It's usually done exactly the way you are doing it. I like the Python struct module for doing this stuff.
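
For what it's worth, a first pass with struct might look something like the sketch below. It only reads the two header fields mentioned in the question. The offset 511 assumes "the 512th byte" was counted from 1, and the header built in the demo is synthetic, not taken from a real file.

```python
import struct

HEADER_SIZE = 2560  # header length quoted in the question


def parse_header(header):
    """Pull out the two header fields the question identifies."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header too short")
    (magic,) = struct.unpack_from("3s", header, 0)
    # Assumption: "the 512th byte" is 1-based, so it lives at offset 511.
    (timing,) = struct.unpack_from("B", header, 511)
    return {"magic": magic, "timing": timing}


if __name__ == "__main__":
    fake = bytearray(HEADER_SIZE)  # synthetic header for demonstration only
    fake[0:3] = b"POV"
    fake[511] = 0x20
    print(parse_header(bytes(fake)))
```

As more fields get identified, they can be added as further `unpack_from` calls, or folded into one format string once the layout is contiguous.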

It's probably using 24-bit addressable LEDs like WS2812, which are pretty common. But these LEDs don't have any extra header information, and they just take a raw stream of 24-bit values. My guess is that either due to some engineering reason or sloppiness, they're taking the 24-bit stream and encoding it as those four byte values with the colors reversed.
posted by miyabo at 2:39 PM on August 5, 2017


Best answer: It looks like they support some kind of video too? That may explain the "too much" data part if, for example, "solid red" is just a short video looped indefinitely, say. Also they are probably using the dumbest/cheapest approach possible (i.e., not reinventing all wheels) so factor that into your thinking as well. For example, maybe rgba rather than just rgb (ignoring the alpha component), which would give 3456 32-bit values per LED (13824 / 4)... still too much but getting warmer?
posted by axiom at 2:48 PM on August 5, 2017


Best answer: You're mostly on the right path. Usually one builds up custom parsing tools to speed the generate-evaluate loop. Have a tool to just print out the range of data you care about; once you have enough understanding it should print out a description rather than just raw data (at first it might print hex codes; later printing color names; later it might detect patterns even). You make the assumption a certain range is only FF, AA, 55, or 00 -- automate that test to confirm your belief.
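
A minimal sketch of that kind of check, assuming the data has already been read into a bytes object. The function names are made up for illustration.

```python
ALLOWED = frozenset({0xFF, 0xAA, 0x55, 0x00})  # values reported in the question


def unexpected_bytes(data, start=0, end=None):
    """Return (offset, value) pairs in data[start:end] that are not in ALLOWED."""
    end = len(data) if end is None else end
    return [(i, data[i]) for i in range(start, end) if data[i] not in ALLOWED]


def describe_triplets(data, start=0, count=4):
    """Render the first few 3-byte groups as hex strings for eyeballing."""
    out = []
    for n in range(count):
        group = data[start + 3 * n : start + 3 * n + 3]
        if len(group) == 3:
            out.append(" ".join(f"{b:02X}" for b in group))
    return out


if __name__ == "__main__":
    sample = bytes([0xFF, 0xFF, 0x00, 0xFF, 0x12, 0xAA])  # made-up bytes
    print(unexpected_bytes(sample))
    print(describe_triplets(sample, count=2))
```

An empty list from `unexpected_bytes` over the whole data region confirms the assumption; anything else points straight at the offsets worth looking at.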

Also, write a creation tool right away. For example, the first step is just to output a single all-red pattern. Then add an option to choose the color. If it doesn't do what you expect, figure that out before moving on, and repeat ad nauseam.
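
One cheap way to start, sketched below, is to borrow the header from a file the Windows utility produced and fill the data region with a single repeated triplet. The sizes are the ones from the question; the file names in the usage comment are hypothetical. If the header turns out to contain a checksum over the data, this will not be accepted as-is, which is itself useful to learn.

```python
HEADER_SIZE = 2560   # header length quoted in the question
DATA_SIZE = 718848   # image-data length quoted in the question


def build_file(reference, triplet):
    """Reuse the header of a known-good file and fill the body with one triplet."""
    if len(triplet) != 3:
        raise ValueError("triplet must be exactly 3 bytes")
    if len(reference) < HEADER_SIZE:
        raise ValueError("reference file too short")
    header = reference[:HEADER_SIZE]
    body = bytes(triplet) * (DATA_SIZE // 3)
    return header + body


# Usage (hypothetical file names):
#   with open("red_from_windows_tool.bin", "rb") as f:
#       reference = f.read()
#   with open("test_output.bin", "wb") as f:
#       f.write(build_file(reference, b"\xff\xff\x00"))
```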

Finally, IIRC there are wifi or BT microSD cards so you don't need to physically remove the microSD card to update the data in it.
posted by flimflam at 3:14 PM on August 5, 2017 [3 favorites]


Best answer: If there are two bytes of seeming colour data, another format used by LED displays is RGB-565: 0brrrrrggggggbbbbb. It's also fairly common to require mask data between frames to erase the LEDs you want to change before writing the next frame. This would appear as a field of the same length as the image, except with all the bits inverted.
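
To test either idea against the real data, something along these lines would do. It is only a sketch: whether the file uses RGB-565 or mask frames at all is a guess, and the byte order of the 16-bit value is left to the caller.

```python
def unpack_rgb565(value):
    """Split a 16-bit RGB-565 value into (r, g, b) fields of 5, 6, and 5 bits."""
    r = (value >> 11) & 0x1F
    g = (value >> 5) & 0x3F
    b = value & 0x1F
    return r, g, b


def pack_rgb565(r, g, b):
    """Combine 5-bit red, 6-bit green, and 5-bit blue into one 16-bit value."""
    return ((r & 0x1F) << 11) | ((g & 0x3F) << 5) | (b & 0x1F)


def is_inverted_copy(frame, mask):
    """True if mask has the same length as frame with every bit flipped."""
    return len(frame) == len(mask) and all(
        f ^ m == 0xFF for f, m in zip(frame, mask)
    )


if __name__ == "__main__":
    print(unpack_rgb565(0xF800))                        # pure red
    print(is_inverted_copy(b"\x00\xaa", b"\xff\x55"))   # made-up bytes
```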

I'm surprised that Allen from Sub-Etha Software or one of his commentators hasn't had a shot at reverse-engineering this. Even though his interest in this was a couple of years ago, maybe ask by commenting there?
posted by scruss at 8:14 AM on August 6, 2017


Best answer: Very good suggestions above. I would write/script two tools as early as possible: one for parsing a POV file generated by the software, dumping the header information and the data as well as flagging stuff that does not look as expected; the other for taking an image (PNG via some library, raw pixel data) and generating a POV file that can be tested. Is it possible to load a POV file in the Windows software? This would let you quickly check that you have managed to produce the expected output, without having to load it onto the real display. Try to make testing your assumptions as frictionless as possible.

If you are comfortable with a debugger and disassembler, another option would be to have a look at the binary of the Windows software. Perhaps the functionality you are interested in is in a DLL, i.e., you might be lucky and there is an exported function generatePOV(img) that takes image data and dumps the file? Then you could write a thin wrapper that would allow generating POV files from some image without having to interact with the GUI. If there are checksums in the files, looking at the disassembly is probably necessary. Keep in mind that such reverse engineering might have legal pitfalls depending on your jurisdiction.

There are approaches to automatically infer file structure from fuzzing / observing the code paths used in loading files, e.g., using american fuzzy lop (afl). And I think I saw somewhere that there are ways to use afl with Windows DLLs using WINE. But that's probably overkill.

Translating a few test patterns using the Windows software should give you some very good initial ideas about the data part. In particular, just changing one pixel and looking at the hex diff (e.g., using vbindiff) should answer quite a lot of questions: Are there checksums, is the data actually a video stream (single pixel change replicated multiple times), is the data stored in a rectangular format or already transformed into some angular format to allow streaming the data to the LEDs without having to do transformations, ...
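
If you would rather script the comparison than eyeball it in vbindiff, a small sketch like the following lists the byte ranges where two files differ. The function name is made up, and it only compares up to the length of the shorter input.

```python
def diff_runs(a, b):
    """Return half-open (start, end) ranges where bytes objects a and b differ."""
    limit = min(len(a), len(b))
    runs = []
    start = None
    for i in range(limit):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, limit))
    return runs


if __name__ == "__main__":
    print(diff_runs(b"aaaa", b"abba"))  # made-up inputs
```

The shape of the result is what matters: one short run suggests a plain pixel array, many evenly spaced runs suggest the pixel is replicated across frames or angles, and a stray run inside the first 2560 bytes would hint at a checksum or length field in the header.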

This sounds like a fun project!
posted by ltl at 4:02 AM on August 7, 2017

