Best way to manage decryption/decompression of iPhone app bundle resources?
March 4, 2009 2:24 PM

What's the best way to encrypt and compress a folder of HTML files in the iPhone app bundle, and unpack them on first launch? (DRM)

My client wants to encrypt and compress the HTML for their medical books inside the iPhone bundle, to protect their IP.

What is a good way to prepare this file for the app bundle, and what complementary libraries (C, Obj-C) should I use to do the decryption and decompressing on the first launch of the app?

Copying the file to ~/Documents and then working on it there seems like the best solution. Thoughts?
posted by avocade to Computers & Internet (14 answers total) 1 user marked this as a favorite
 
If you want to use C, consider zlib. If you want ObjC, consider zip-framework.

Unless you store the HTML in memory, you'll need to work within the application's sandbox. Apple recommends you use the application's Documents folder. If you use another folder inside the sandbox for user data, I'm uncertain if Apple would reject your application, but they like standard behaviors so that's a possibility.
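
For illustration, a minimal first-launch sketch along those lines, using zlib's gzFile API to inflate a bundled resource into Documents. The resource name "books.html.gz" is just a placeholder, and error handling (and the decryption step) is left out:

#import <Foundation/Foundation.h>
#include <stdio.h>
#include <zlib.h>

// Inflate a gzip-compressed bundle resource into ~/Documents on first launch.
static void unpackOnFirstLaunch(void)
{
    NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                         NSUserDomainMask, YES);
    NSString *docs = [paths objectAtIndex:0];
    NSString *dest = [docs stringByAppendingPathComponent:@"books.html"];

    // Skip the work on subsequent launches.
    if ([[NSFileManager defaultManager] fileExistsAtPath:dest])
        return;

    NSString *src = [[NSBundle mainBundle] pathForResource:@"books.html"
                                                    ofType:@"gz"];
    gzFile in = gzopen([src fileSystemRepresentation], "rb");
    FILE *out = fopen([dest fileSystemRepresentation], "wb");
    if (in == NULL || out == NULL)
        return;

    char buf[16384];
    int n;
    while ((n = gzread(in, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, out);

    gzclose(in);
    fclose(out);
}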
posted by Blazecock Pileon at 2:47 PM on March 4, 2009


Oh, I missed encryption. You could compile OpenSSL libraries or use Apple's pre-built CommonCrypto framework, which you can include directly into your project. If you prefer Blowfish, you'd probably want OpenSSL. If you want to use 3DES or AES, you could use CommonCrypto.

With both, you'll be working with C types, so you'll need to be familiar with pointers and some of the NSData and NSString conversion methods. You can mix C and ObjC in the same file.
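
To make the C-types point concrete, here's a bare-bones one-shot decryption sketch using CommonCrypto's CCCrypt(). The key and IV here are placeholders; how you derive and store them is the real problem, and isn't addressed here:

#import <Foundation/Foundation.h>
#include <CommonCrypto/CommonCryptor.h>

// One-shot AES decryption; a 32-byte key selects AES-256, the IV is 16 bytes.
NSData *decryptedData(NSData *cipherText, NSData *key, NSData *iv)
{
    size_t outLength = [cipherText length] + kCCBlockSizeAES128;
    NSMutableData *plainText = [NSMutableData dataWithLength:outLength];

    size_t moved = 0;
    CCCryptorStatus status = CCCrypt(kCCDecrypt,
                                     kCCAlgorithmAES128,       // AES block cipher
                                     kCCOptionPKCS7Padding,
                                     [key bytes], [key length],
                                     [iv bytes],
                                     [cipherText bytes], [cipherText length],
                                     [plainText mutableBytes], outLength,
                                     &moved);
    if (status != kCCSuccess)
        return nil;

    [plainText setLength:moved];   // trim to the actual plaintext size
    return plainText;
}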
posted by Blazecock Pileon at 2:58 PM on March 4, 2009


One last comment and then I'll shut up. Depending on the size of your resources, you may want to split your document into smaller pieces, then compress each piece before encrypting it. Encrypting before you compress will likely be less efficient, if it's a good encryption algorithm.
posted by Blazecock Pileon at 3:03 PM on March 4, 2009


Who are they trying to protect it from? Apple? Or users? Cause if it's saved decrypted, that's not gonna help on my jailbroken iPhone.
posted by rbs at 3:17 PM on March 4, 2009 [1 favorite]


One last comment and then I'll shut up. Depending on the size of your resources, you may want to split your document into smaller pieces, then compress each piece before encrypting it. Encrypting before you compress will likely be less efficient, if it's a good encryption algorithm.

Actually, you'll get better compression if you pool all documents together into one container, compress them as a group, and then encrypt the compressed archive. This is why we tar and then zip, instead of the other way around.
posted by Netzapper at 4:05 PM on March 4, 2009


Netzapper's comment is true, but it's also true that such an approach makes random access impossible. As far as document compression goes, you should get a copy of Witten and Moffat's Managing Gigabytes, which will answer those questions and give you some great ideas.

As for the encryption part... I dunno, seems kinda crazy to me. Apps are already under Apple's own DRM protection. I don't know what the piracy rate is for iPhone apps but it can't be very high given the low cost of apps and that app distribution is largely controlled by Apple.
posted by chairface at 5:26 PM on March 4, 2009


Actually, you'll get better compression if you pool all documents together into one container, compress them as a group, and then encrypt the compressed archive. This is why we tar and then zip, instead of the other way around.

As far as I know, tar is neither a compression nor an encryption tool; it just packages files into a bundle or archive. Still, you're saying the same thing I'm saying: compress first, then encrypt.

Here's a test 33 MB text file, which should lend itself well to compression:

$ ls -al elements.list
-rw-r--r-- 1 pileon staff 33500188 Mar 14 2008 elements.list


First, I encrypt the file with 256-bit AES and then compress it:

$ time openssl enc -aes-256-cbc -salt -in elements.list | gzip > elements.list.enc.gz
enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:

real 0m4.971s
user 0m3.175s
sys 0m0.238s


Let's look at its size:

$ ls -al elements.list.enc.gz
-rw-r--r-- 1 pileon staff 33505336 Mar 4 17:30 elements.list.enc.gz


The resulting file size is just a little bigger than the original, uncompressed text file!

This makes sense, because the compression algorithm looks for patterns of redundancy, and a strong encryption algorithm will aim to eliminate patterns that make it vulnerable to analysis.

For our second test, we compress the text file first, before encrypting it:

$ time gzip -c elements.list | openssl enc -aes-256-cbc -salt -out elements.list.gz.enc
enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:

real 0m7.995s
user 0m2.814s
sys 0m0.098s


Let's look at the size of this file:

$ ls -al elements.list.gz.enc
-rw-r--r-- 1 pileon staff 6954496 Mar 4 17:32 elements.list.gz.enc


It might take longer to compress before encrypting, but the result is much smaller, about one-fifth of the size of the original uncompressed, unencrypted file! This is a huge savings.

Depending on speed versus storage needs for the iPhone application the asker is writing, the choice of order of operations can be an important design consideration, but I think I have a good argument that compressing text before encrypting yields more efficient results.
posted by Blazecock Pileon at 5:47 PM on March 4, 2009 [2 favorites]


In fact, here's a quick demo that shows that tar does not do any compression:

$ ls -al elements.list mapoftheuniverse.pdf
-rw-r--r-- 1 pileon staff 33500188 Mar 14 2008 elements.list
-rw-r--r-- 1 pileon staff 3807909 Oct 22 2007 mapoftheuniverse.pdf


Those two files use a total of 37308097 bytes.

Let's tar them up:

$ tar cvf test.tar elements.list mapoftheuniverse.pdf
elements.list
mapoftheuniverse.pdf


Now let's look at the file size:

$ ls -al test.tar
-rw-r--r-- 1 alexreynolds staff 37314560 Mar 4 18:01 test.tar


The tar archive actually uses another 6463 bytes to contain the two files. There's no compression going on here.
posted by Blazecock Pileon at 6:03 PM on March 4, 2009


How are you going to handle the decryption? If you're going to require your user to have a net connection in order to get a decryption key each time they fire up the application, that'll suck (esp if your client works in an environment that discourages the use of mobile technologies).

If you're going to embed the key in the application, then what's the point? Yes, you'll make things a little bit trickier for the hacker, but not incredibly so. You could probably add more annoyances to finding the key, but that's the kind of stuff that cracker nerds love.
posted by lowlife at 7:07 PM on March 4, 2009


The tar archive actually uses another 6463 bytes to contain the two files. There's no compression going on here.

No, of course tar isn't a compression tool.

However, what tar does is put all of the files into a single continuous file. This means that when zip (or gzip, or lzma, or whatever) goes through and looks for identical bitstrings to compress, it can search across files instead of being forced to find identical bitstrings only inside of each individual file.

So, if you have a number of files with identical headers and you place them in a tar file before compression, the headers will all get the same compressed token. If you don't, most compression programs are going to build a separate symbol table for each file, meaning that you gain zero compression from the headers--they're not duplicated within any single file, and so no compression can take place.

The more similarity there is between files, the more you gain from packaging them together before compression.

Here's a compressed python source tree from my dev folder: the zip file was created with "zip test.zip -r src"; the tar.zip file was created with "tar cf test.tar src; zip test.tar.zip test.tar".
-rw-r--r-- 1 netzapper netzapper 162K 2009-03-04 20:04 test.tar.zip
-rw-r--r-- 1 netzapper netzapper 176K 2009-03-04 20:03 test.zip

Ignoring all other shared bitstrings between files, all of those files have an identical header at the top with the copyright and licensing information. The plain .zip sees each of those headers, finds that they occur nowhere else in each file, and doesn't compress them. The .tar, however, encodes that header about fifty times in the same file; when zip compresses the tar, it sees multiple repetitions of the same bitstring, writes it out in full only once, and replaces the rest of the occurrences with a very short bitstring.

If you start out with only one file, and not a tree of them, then you don't gain anything by tarring it. However, you do lose compression by breaking it up into separate pieces--the compression algorithm cannot fully exploit repeated bitstrings, since it must encode the full string for each chunk.
posted by Netzapper at 8:14 PM on March 4, 2009


However, you do lose compression by breaking it up into separate pieces--the compression algorithm cannot fully exploit repeated bitstrings, since it must encode the full string for each chunk.

I don't disagree, but I think I would still want to profile how long it takes to unarchive a book that is split into chapters on an iPhone. The application as a whole would probably be more responsive for the user if the iPhone were only unpacking one chapter at a time, instead of an entire book. The storage overhead might be slightly more (it might be a lot), but from my own experience, I'd still suggest compressing individual files before doing any encrypting.

Another option is to write a custom archive format that strings together bz2-, zip- or gzip-compressed components with a catalog. We do this for per-chromosome genomic data, but it's not something we have publicly available, and I don't know of any similar tar-based equivalent out there in the public domain (though that doesn't mean it doesn't exist).
posted by Blazecock Pileon at 8:37 PM on March 4, 2009


Best answer: Blazecock Pileon:

I usually agree with you, but I think you're dead wrong here.

If your goal is to reduce space, then concatenating all the files together and then compressing them is clearly the optimal solution. This can be shown trivially. If you have 10 identical copies of a random bitstring and compress them individually, you will get ~0% compression--random data doesn't compress. On the other hand, if you concatenate those 10 copies together into a single stream and then compress the stream as a whole, you'll actually get compression (how much depends on the algorithm; zip gets 1% in my test, bzip2 gets 75%). One copy of the random data will be written out, and the rest of the stream will consist of a single, short symbol repeated 9 times.

Even in the time analysis, you have to include the constant setup cost necessary to initiate decompression/decryption of each and every file. If you have a thousand 1KB files, all of which need to be decrypted/decompressed in order for the program to operate, and there's a 0.1-second constant initialization cost for a single decom/decrypt operation, then you're looking at 100 cumulative seconds just to initialize the operations. It may easily be that you can decom/decrypt at a rate of 1000KB/s, so putting everything in one file results in only 1.1 seconds of decom/decrypt instead of 101 seconds.

While those figures are obviously bogus, I'm absolutely certain that if the thing we're analyzing is the time necessary to decompress and decrypt the entirety of the data to mass storage, a single compressed file will yield lower operation times than a bunch of smaller files--the overhead of opening a file handle, initializing the Huffman tree, and initializing the cipher is incurred only once instead of a thousand times.

And you're right that profiling is the best way to answer whether this question actually matters in the real world. It may be that the difference is a fraction of a second in the real world and that the losses in terms of time and in terms of space are negligible either way.

The OP says "should I use to do the decryption and decompressing on the first launch of the app?" So, it is my understanding that the compression and encryption are for the distribution process, and that the files are to live on the phone as uncompressed cleartext. In this situation, we're not talking about how responsive the application is during execution, we're talking about the best tradeoff for first launch. The decompression and decryption happen only once, and the results are stored to disk.

You'd be right if we were talking about random access into a perpetually compressed and encrypted file; obviously smaller files would be superior there. But for a one-time operation, a single file is superior. Random access here happens against the cached, decompressed/decrypted files, not into the compressed/encrypted BLOB.

So, OP: my professional recommendation is the following.

Concatenate your whole directory of HTML files into a single file of some sort. There are a zillion schemes to do this, and most are pretty obvious--a header table of file names plus run lengths or start positions is all that's necessary to demux simple concatenated data. Then use a well-known compression algorithm on that single file. Then encrypt the result using a well-known encryption algorithm. On the other end, decrypt the file to a temporary location. Then decompress it. Then un-concatenate it using the table information that you've supplied.
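
A sketch of that demux step, using a container layout made up purely for illustration (a uint32 file count, then per entry a uint32 name length, the UTF-8 name and a uint32 payload length, followed by the payloads concatenated in the same order; no endianness or error handling):

#import <Foundation/Foundation.h>
#include <string.h>

// Split a naive concatenated container back into individual files in destDir.
void unpackContainer(NSData *container, NSString *destDir)
{
    const uint8_t *p = [container bytes];
    uint32_t count;
    memcpy(&count, p, 4); p += 4;

    // First pass: read the header table of names and payload lengths.
    NSMutableArray *names = [NSMutableArray array];
    NSMutableArray *lengths = [NSMutableArray array];
    for (uint32_t i = 0; i < count; i++) {
        uint32_t nameLen, payloadLen;
        memcpy(&nameLen, p, 4); p += 4;
        NSString *name = [[[NSString alloc] initWithBytes:p
                                                   length:nameLen
                                                 encoding:NSUTF8StringEncoding] autorelease];
        p += nameLen;
        memcpy(&payloadLen, p, 4); p += 4;
        [names addObject:name];
        [lengths addObject:[NSNumber numberWithUnsignedInt:payloadLen]];
    }

    // Second pass: write each payload out as its own file.
    for (uint32_t i = 0; i < count; i++) {
        uint32_t len = [[lengths objectAtIndex:i] unsignedIntValue];
        NSData *payload = [NSData dataWithBytes:p length:len];
        p += len;
        NSString *path = [destDir stringByAppendingPathComponent:[names objectAtIndex:i]];
        [payload writeToFile:path atomically:YES];
    }
}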

Really, the algorithm for tar is the least of your worries in doing this. The biggest issue, I'd imagine, would be finding compression and encryption algorithms that are implemented in a compatible manner both on your development machine and on the iPhone.
posted by Netzapper at 9:29 PM on March 4, 2009 [1 favorite]


Response by poster: Thanks for all the tips! I decided to go with AES encryption, implemented in C, in a separate class that does on-the-fly decryption and passes the resulting HTML as an NSString to the WebView. Works pretty well.
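
Roughly, the display path looks something like the following (simplified; decryptedData() is just a stand-in for the decryption class, and the key/IV handling is omitted):

#import <UIKit/UIKit.h>

// Hypothetical helper along the lines of the CommonCrypto sketch above.
NSData *decryptedData(NSData *cipherText, NSData *key, NSData *iv);

// Decrypt a stored chapter on the fly and hand the HTML to a UIWebView.
void loadChapter(UIWebView *webView, NSString *chapterPath, NSData *key, NSData *iv)
{
    NSData *cipherText = [NSData dataWithContentsOfFile:chapterPath];
    NSData *plainText = decryptedData(cipherText, key, iv);
    NSString *html = [[[NSString alloc] initWithData:plainText
                                            encoding:NSUTF8StringEncoding] autorelease];
    [webView loadHTMLString:html
                    baseURL:[NSURL fileURLWithPath:[chapterPath stringByDeletingLastPathComponent]]];
}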
posted by avocade at 1:55 PM on April 4, 2009


Perian developer Graham Booker has an interesting discussion of compression approaches on the iPhone that somewhat agrees with my advice above.
posted by Blazecock Pileon at 2:35 PM on August 8, 2009

