Help me decode what looks like base64
June 21, 2012 9:04 AM

I have a program that stores information in a database. Because this program doesn't have an API, any automation has to be done by writing directly to the database. Some of the database columns are encoded, so in order to write into them I need to know the encoding method. Can you help me figure it out? Examples inside.

It looks like Base64, but decoding it just comes up with random unicode characters, so I think there's more going on (such as some character padding). Here are some examples, and I can generate additional arbitrary examples through the program if that would be helpful.

plaintext:metafilter encoded:s9/L/Eh32x8V1gKnpqTGug==
plaintext:0123456789 encoded:dQA8RA9MA8EVj95j6cr7fg==
plaintext:Base64?? encoded:rLBE10f44pM5Y+jQzo6+/Q==
plaintext:base64?? encoded:nNxyuE3+t5gDuhEoCBN60w==
plaintext:a encoded:g11eOdF3AzkRE7/MTBfVPQ==
plaintext:b encoded:Fh1hlqvtQB+cYx2+FI0x0Q==
posted by Nonsteroidal Anti-Inflammatory Drug to Computers & Internet (26 answers total) 5 users marked this as a favorite
Response by poster: And upon some additional testing, it's padding with spaces up to a maximum of 10 characters.
posted by Nonsteroidal Anti-Inflammatory Drug at 9:07 AM on June 21, 2012

And upon some additional testing, it's padding with spaces up to a maximum of 10 characters.

At the front or the back? What happens if you save a string longer than 10 characters?
posted by tylerkaraszewski at 9:11 AM on June 21, 2012

Looks like some kind of HMAC to me: the differences between the outputs for two inputs that are only a few bits apart (b vs B, a vs b) are huge and spread all over the string. I'd go and reverse engineer the code.
posted by themel at 9:14 AM on June 21, 2012
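
themel's observation can be checked numerically. What follows is only a minimal sketch, assuming Python 3 and its standard `base64` module; `bit_difference` is a hypothetical helper name, and the two encoded values are the ones given in the question for "a" and "b". A hash or a block cipher would be expected to flip roughly half of the 128 output bits for a small input change.

```python
import base64

def bit_difference(encoded_a: str, encoded_b: str) -> int:
    """Count differing bits between two base64-encoded values of equal length."""
    raw_a = base64.b64decode(encoded_a)
    raw_b = base64.b64decode(encoded_b)
    if len(raw_a) != len(raw_b):
        raise ValueError("decoded values must have the same length")
    # XOR each byte pair and count the set bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(raw_a, raw_b))

# Stored values for the plaintexts "a" and "b", copied from the question.
ENC_A = "g11eOdF3AzkRE7/MTBfVPQ=="
ENC_B = "Fh1hlqvtQB+cYx2+FI0x0Q=="

diff = bit_difference(ENC_A, ENC_B)
total = len(base64.b64decode(ENC_A)) * 8
print(f"{diff} of {total} bits differ")
```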

Response by poster: At the front or the back?

Great question. ' a ' generates the same result as 'a', so maybe it's stripping spaces internally? I assumed it was padding since any input gets an identical length output, but maybe it's doing something else.

This data would never be longer than 10 characters, so the program won't let me give an input longer than that.
posted by Nonsteroidal Anti-Inflammatory Drug at 9:18 AM on June 21, 2012

Googling "hash trailing equals sign" shows a link to what looks like a similar question. In that link, they want to understand encodings that have 22 characters followed by 2 equals signs as well. It doesn't look like good news for you in that thread, but perhaps some of the other links from the original search will help?
posted by bessel functions seem unnecessarily complicated at 9:27 AM on June 21, 2012

So the two equals signs at the end tell me that this is likely a base 64 encoding scheme that's being used. Base 64 takes every three bytes and converts them to four ASCII characters, and then pads the resultant string with between 0 and 2 equals signs so that the length of the encoded string is a multiple of four. So you might start there.
posted by delfuego at 9:28 AM on June 21, 2012
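
The length arithmetic delfuego describes can be confirmed on the samples. This is a small sketch, assuming Python 3 and the standard `base64` module; the encoded values are copied from the question, and `decoded_length` is a hypothetical helper name.

```python
import base64

# Plaintext -> stored value, copied from the question.
SAMPLES = {
    "metafilter": "s9/L/Eh32x8V1gKnpqTGug==",
    "0123456789": "dQA8RA9MA8EVj95j6cr7fg==",
    "Base64??": "rLBE10f44pM5Y+jQzo6+/Q==",
    "base64??": "nNxyuE3+t5gDuhEoCBN60w==",
    "a": "g11eOdF3AzkRE7/MTBfVPQ==",
    "b": "Fh1hlqvtQB+cYx2+FI0x0Q==",
}

def decoded_length(encoded: str) -> int:
    """Number of raw bytes represented by a base64 string."""
    return len(base64.b64decode(encoded))

for plaintext, encoded in SAMPLES.items():
    print(f"{plaintext!r}: {len(encoded)} chars -> {decoded_length(encoded)} bytes")
```

Each sample is 24 characters ending in `==`: six groups of four characters would carry 18 bytes, and the two padding characters remove 2, leaving 16 bytes of underlying data regardless of the plaintext length.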

You're able to get the data back out, which means it's encryption and not an HMAC. The output is 16 bytes, so my first guess would be AES with a fixed encryption key, as it processes in 16-byte blocks and is readily available in OS-provided crypto libraries.

I'd trace the DB and see if they're doing the encryption in a stored procedure. If so, the problem is easy. If it's happening in application code, you'll (presumably) need to reverse to get the key. Or brute force it.
posted by bfranklin at 9:31 AM on June 21, 2012

Of course, you said you've already started there (base 64). Duh.
posted by delfuego at 9:32 AM on June 21, 2012

It looks like base64 encoding of some binary data, possibly md5. Is this a hash function or a reversible algorithm? If it's something like passwords, it's probably a hash, possibly md5 or sha1.

What's the program? If it's custom, is it PHP, compiled exe or what? That might have some clues. Also, does the program have a password?
posted by justkevin at 9:33 AM on June 21, 2012

What's the encoding of an empty string? Try putting the output of that into google to see if this same encoding has been used elsewhere.
posted by justkevin at 9:38 AM on June 21, 2012
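
For justkevin's empty-string test, it may help to have the base64 forms of a few common digests of the empty string on hand for comparison. This is only a sketch, assuming Python 3's standard `hashlib` and `base64`, and assuming an unsalted hash; whether the program accepts empty input at all is not known from the thread. `empty_string_digests` is a hypothetical helper name.

```python
import base64
import hashlib

def empty_string_digests() -> dict[str, str]:
    """Base64-encoded digests of the empty byte string for a few common hashes."""
    return {
        name: base64.b64encode(hashlib.new(name, b"").digest()).decode("ascii")
        for name in ("md5", "sha1", "sha256")
    }

for name, encoded in empty_string_digests().items():
    print(f"{name}: {encoded}")
```

If the program's stored value for an empty field equals one of these, the column is an unsalted hash; a mismatch does not rule out a salted hash or encryption.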

Response by poster: I'd trace the DB and see if they're doing the encryption in a stored procedure.
There are hundreds of procedures in the db, but as best as I can tell none of them are for encoding data...but that's just from reading the names. I'll have to actually try tracing it.

Is this a hash function or a reversible algorithm?
Some of the data is definitely reversible, but I guess some columns wouldn't necessarily need to be. I'm assuming that the encoding method is the same in both cases given that the results take the same form.

This is a compiled exe, no password. It's been forever since I've read up on the cracking scene, but if anyone has any suggestions for some sort of debugger to attach, I can give that a try.
posted by Nonsteroidal Anti-Inflammatory Drug at 9:44 AM on June 21, 2012

OllyDbg is pretty much it these days for free-to-use. In your case, though, using the trial period of IDA Pro would likely be much, much easier since it features a decompiler.

If you suspect certain fields might be hashes, check their byte count when decoded from base64, and google "n byte hash". If the above were a hash, it would be MD5 based on the byte count.
posted by bfranklin at 10:01 AM on June 21, 2012
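
The "n byte hash" lookup bfranklin suggests can also be done locally. A minimal sketch, assuming Python 3's standard `hashlib`; `hashes_with_digest_size` is a hypothetical helper, and the candidate list is just a handful of common algorithms rather than an exhaustive one.

```python
import hashlib

def hashes_with_digest_size(n_bytes: int) -> list[str]:
    """Names of common hashlib algorithms whose digest is exactly n_bytes long."""
    candidates = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
    return [name for name in candidates if hashlib.new(name).digest_size == n_bytes]

# The decoded column values in this thread are 16 bytes long.
print(hashes_with_digest_size(16))  # -> ['md5']
```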

So what you/we know so far:

* The output is almost certainly base64-encoded; the two equals signs at the end give that away.
* Given that, the input into the base64-encoding process is 16 bytes long; a 16-byte input generates a base64-encoded string of the length that you're seeing with the two equals signs padding the end of it.
* Given that your inputs are of variable length, there's another process, likely a hash function, that's turning what you put in into the 16-byte input to the base64 encoding process.
* Given that you see the same output for "a" and " a ", you know that the input has spaces stripped from it before being hashed, or the hash function itself strips spaces.
* Given that "Base64??" and "base64??" produce different output, you know that the hash function is case-sensitive and that the input is NOT case-normalized before being hashed.
* Given that you see the same output for "a" and " a ", you know that if there is a salt involved in the hash function, it's not a random salt but rather a fixed one.

I just did a simple test which took "metafilter", created a straight MD5 hash of it (which outputs a 16-byte digest), and then base64-encoded that digest; the output ("TPc3tK3+eXqXTeyOGZ0Tcg==") does not match your known value, so that's not it. There aren't many more common hash functions that output 16-byte digests -- SHA-1 is 20 bytes, SHA-256 is 32 bytes, SHA-512 is 64 bytes. So either this is a rare or novel hash function, or the app is salting the input string with a static value, hashing it with MD5, and then encoding it to base64. That's my bet.
posted by delfuego at 10:09 AM on June 21, 2012
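
The test delfuego describes, plus the salted variant, might look roughly like this. It is a sketch under several assumptions that the thread does not confirm: Python 3 with standard `hashlib` and `base64`, UTF-8 input, spaces stripped before hashing, and the salt prepended to the text. `md5_b64` is a hypothetical helper and `SALT_GUESS` is a placeholder, not a real value.

```python
import base64
import hashlib

def md5_b64(text: str, salt: str = "", strip: bool = True) -> str:
    """Base64 of MD5(salt + text), optionally stripping surrounding spaces first.

    The salt position (prefix) and the UTF-8 encoding are guesses, not known facts.
    """
    if strip:
        text = text.strip(" ")
    digest = hashlib.md5((salt + text).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

KNOWN = "s9/L/Eh32x8V1gKnpqTGug=="  # stored value for "metafilter", from the question
SALT_GUESS = "example-salt"          # hypothetical placeholder, not a real value

print("plain MD5 matches: ", md5_b64("metafilter") == KNOWN)
print("salted MD5 matches:", md5_b64("metafilter", SALT_GUESS) == KNOWN)
```

The helper reproduces the properties listed above (same result for "a" and " a ", different results for "Base64??" and "base64??"), so a matching salt, if one exists, would make it agree with all six samples.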

odinsdream raises an interesting point. Encrypting a 10-byte string with DES and encoding the output in base64 would also look exactly like this; since DES uses a fixed 8-byte block size, a 10-byte string would be padded to 16 bytes prior to being encrypted. And DES produces output the same size as the (post-padding) input, so you'd have 16 bytes of output, which would then be base64-encoded into what you see.

So DES is also a possibility. If that's what's happening, then the sequence would be: your input is having spaces stripped, then it's being padded out to 10 characters (or, I should say between 10 and 16 characters) with something, then it's being DES-encrypted with a secret key, and then it's being base64-encoded for storage in the DB.

All in all, I don't know if you'll get any further without a little source diving. What app is this?
posted by delfuego at 10:23 AM on June 21, 2012
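
The block-size reasoning for DES (and for AES, mentioned earlier in the thread) can be sketched as simple arithmetic. This assumes a PKCS#7-style padding scheme that always adds between 1 and one full block of padding; the program's actual padding scheme is unknown, and `ciphertext_length` is a hypothetical helper.

```python
def ciphertext_length(plaintext_len: int, block_size: int) -> int:
    """Length after padding up to the next full block.

    Assumes PKCS#7-style padding (always 1..block_size padding bytes);
    the real program may pad differently.
    """
    return (plaintext_len // block_size + 1) * block_size

for name, block in (("DES", 8), ("AES", 16)):
    sizes = {n: ciphertext_length(n, block) for n in (1, 8, 10)}
    print(name, sizes)
```

Under these assumptions a 1-byte input encrypted with DES would give only 8 bytes, so DES fits the 16-byte output observed for "a" only if the input is first padded to a longer fixed width, as suggested above. AES would give 16 bytes for any input of up to 15 bytes.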

Check out signsrch, which claims to be able to look at executables and give you a clue what encryption algorithms are being used.
posted by delfuego at 10:26 AM on June 21, 2012

What database is being used? Some databases have built in encryption functions (e.g. Oracle 8 provided DES and 3DES, and probably provides more modern options now).

If the encryption is done in the database, you may be able to enable some sort of query log to see the encryption happening.
posted by danielparks at 11:09 AM on June 21, 2012

Response by poster: Tracing the database indicates the data is already encoded when it is written. I'm trying out some of these debuggers, but right now it's just a lot of assembly. I'll have to look up a tutorial on how to use them.
posted by Nonsteroidal Anti-Inflammatory Drug at 11:36 AM on June 21, 2012

Is it case sensitive? If the operation is reversible and it can store 10 characters then it must be using 5.5 bits per character. If that's true then there must be character set reduction, most likely including case folding and then some since 2^5 = 32.

But given themel's observation and the fixed width output I'm inclined to believe it's a hash function and thus not reversible.
posted by chairface at 11:48 AM on June 21, 2012

Disregard my previous post. I didn't read clearly enough and forgot how base64 encoding worked. delfuego's analysis is most likely correct.
posted by chairface at 11:52 AM on June 21, 2012

(At some point, we might learn what DB this is, what app it is, etc...)
posted by delfuego at 11:53 AM on June 21, 2012

Response by poster: (At some point, we might learn what DB this is, what app it is, etc...)

It's MS SQL Server, but like I said, tracing the DB indicates that this is all being done within the application. This is a legacy piece of inventory/accounting software written by a now-defunct company called iCode.
posted by Nonsteroidal Anti-Inflammatory Drug at 11:58 AM on June 21, 2012

Response by poster: And for those playing along at home, here's the output from signsrch:

- start signatures scanning:

offset num description [bits.endian.size]
01b279f4 165 AES Rijndael S / ARIA S1 [..256]
01b27af4 166 AES Rijndael Si / ARIA X1 [..256]
01b27c2c 168 Rijndael Te0 (0xc66363a5U) []
01b2802c 170 Rijndael Te1 (0xa5c66363U) []
01b2842c 172 Rijndael Te2 (0x63a5c663U) []
01b2882c 174 Rijndael Te3 (0x6363a5c6U) []
01b27bf4 186 Rijndael rcon []
01b278e4 191 Blowfish bfp table []
01b3235c 289 MD5 digest [32.le.272&]
01b2a58b 309 padding used in hashing algorithms (0x80 0 ... 0) [..64]
01b27904 326 Haval hash pass2 []
01b214d7 357 Zlib dist_code [..512]
01b216d7 358 Zlib length_code [..256]
01b2139c 360 Zlib base_length [32.le.116]
01b21410 362 Zlib base_dist [32.le.120]
01b2d398 371 LZ Huffman (lzhuf/lha) decoding table [..256]
01b2d358 372 LZ Huffman (lzhuf/lha) encoding table [..64]
01b266f0 384 Jpeg dct 14 bit aanscales [16.le.128]
01b26770 388 Jpeg dct AA&N scale factor [double.le.64]
00042366 564 TEA encryption/decryption (0xc6ef3720 0x9e3779b9) [32.le.8&]
008b27e0 1228 rfc3548 Base 64 Encoding with URL and Filename Safe Alphabet [..62]
008c31b8 1234 UUEncodeTable [..64]
008b27e0 1237 B64EncodeTable [..64]
008c2f24 1240 XXEncodeTable [..64]
008c1590 1243 BHEncodeTable [..64]
01b278e4 1300 Haval init []
000422e1 1483 TEA1_DS [32.le.4]
01b20d6c 1525 zinflate_lengthExtraBits [32.le.116]
01b20de0 1529 zinflate_distanceExtraBits [32.le.120]
01b20ddd 1530 zinflate_distanceExtraBits []
01b3235c 1626 Lucifer (outerbridge) DFLTKY [..16]
01b3236c 1639 Misty md5const [32.le.256]
01b2d318 1646 Huffman LZH p_len [..64]
01b2d498 1647 Huffman LZH d_len [..256]
0004be14 1767 anti-debug: IsDebuggerPresent [..17]
0248712e 1887 libavformat gif_clut [..648]
01b26b3b 2094 libavcodec ff_mjpeg_val_ac_luminance [..162]
01b26bee 2095 libavcodec ff_mjpeg_val_ac_chrominance [..162]
01ce54ad 2272 compression algorithm seen in the game DreamKiller []

- 39 signatures found in the file

posted by Nonsteroidal Anti-Inflammatory Drug at 12:01 PM on June 21, 2012

OK, so I bet it's either doing a salted MD5 hash that's then getting base64-encoded, or it's doing AES encryption. (AES encryption uses a block size of 128 bits, or 16 bytes, so encrypting input that's less than or equal to 16 bytes would produce a 16-byte output.) If it's the former, then you'd have to find the salt in the source somewhere in order to replicate what it's doing. If it's the latter, you'd have to find the key.
posted by delfuego at 12:08 PM on June 21, 2012
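
If the salted-MD5 theory is right, candidate salts pulled from the executable could be checked against the known pairs automatically. The sketch below assumes Python 3, a salt prepended to the plaintext, and UTF-8 encoding, none of which is confirmed; `find_salt` and the candidate list are hypothetical. It does not apply if the scheme is AES.

```python
import base64
import hashlib

def find_salt(pairs, candidates):
    """Return the first salt for which base64(MD5(salt + text)) matches every pair."""
    for salt in candidates:
        if all(
            base64.b64encode(
                hashlib.md5((salt + text).encode("utf-8")).digest()
            ).decode("ascii") == encoded
            for text, encoded in pairs
        ):
            return salt
    return None  # no candidate reproduced all known values

# Known plaintext/stored-value pairs from the question.
KNOWN_PAIRS = [
    ("metafilter", "s9/L/Eh32x8V1gKnpqTGug=="),
    ("a", "g11eOdF3AzkRE7/MTBfVPQ=="),
]
# Hypothetical candidates; in practice these would come from the binary.
CANDIDATES = ["", "salt", "iCode"]

print(find_salt(KNOWN_PAIRS, CANDIDATES))
```

Requiring every pair to match keeps a chance hit on a single value from being mistaken for the real salt.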

Have you considered, instead of solving this problem by talking to the database, automating the program through its interface with something like Sikuli? It's not as robust a solution, but if you don't need it to run without supervision and can work within Sikuli's limitations, you might be able to avoid the whole reverse-engineering project.
posted by Honorable John at 12:20 PM on June 21, 2012 [1 favorite]

Well - I suggest going another way...

If the company is truly defunct (and the software un-supported), then it wouldn't hurt to try contacting some people who used to work there, no?

Was this "iCode Systems Ltd." in the UK?

Within my extended network, it turns out I can possibly contact up to 7 different people who have purported to work there in the past (one even as "founder/owner").

It may cost a couple $$$, but it is worth it, rather than trying to go down "rabbit-hole" after "rabbit-hole" - especially if as above, it appears to have quite a few encryption algorithms in-built.

Could try using a "strings" utility on the executable to see if they stupidly left the key in a resource string somewhere... ;-) (doubtful)
posted by jkaczor at 3:32 PM on June 22, 2012
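
Where a "strings" utility is not at hand, the same idea can be approximated in a few lines. This is a rough sketch, assuming Python 3 and the standard `re` module; `extract_strings` is a hypothetical helper, and the sample bytes stand in for the contents of the executable.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Stand-in for the bytes of an executable; a real run would read the file in "rb" mode.
sample = b"\x00\x01hello\x00ab\x00secretkey\xff"
print(extract_strings(sample))  # -> ['hello', 'secretkey']
```

This only finds single-byte ASCII runs; text stored as UTF-16 in a Windows executable would need a separate pass.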

Response by poster: Thanks for the offer, jkaczor, but it looks like that's a different company.

I'm still working on using some disassemblers, but it's slow going right now. I'm hoping that as I continue to pick apart other aspects of this program, I might luck into finding something to point me in the right direction.
posted by Nonsteroidal Anti-Inflammatory Drug at 6:53 AM on June 25, 2012
