How to serve HTML out of a compressed archive?
November 17, 2007 9:41 AM

I want to serve lots of different static HTML files out of a single compressed archive, to save disk space on the server. Is there a quick and easy solution for this?

So I got an iPod Touch recently, and could not restrain myself from running the jailbreak on it. There are some nifty third-party apps out there, but I'm pretty sure the killer app for me is being able to browse a Wikipedia snapshot without requiring Wi-Fi.

Due to the disk space limitations, I'd like to serve the snapshot out of a compressed archive. Compressing each HTML file individually would work, but it's relatively storage-inefficient, so I'd rather have everything wrapped up in a big gzipped tarball or similar.

Is there an Apache or lighttpd module that would make this project easy? Or is embedded OS X capable of mounting an archive through a VFS/loopback device? The fallback solution is that I write a CGI script to do the decompression, but I'm looking for something better.
posted by Galvatron to Computers & Internet (8 answers total) 2 users marked this as a favorite
 
I don't know about the module, but the archive format you want is Zip or something similar. A Zip file keeps an index (the central directory) of where each member starts, so any single file can be read without touching the rest; a gzipped tarball requires you to scan the entire archive to find where the file you care about starts.
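
To illustrate the random-access point, here is a minimal Python sketch. It assumes a small throwaway archive built in memory, and the page names are made up for the example:

```python
import io
import zipfile

# Build a tiny throwaway archive in memory; the page names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    for name in ("wiki/Apple.html", "wiki/Banana.html", "wiki/Cherry.html"):
        z.writestr(name, "<html><body>%s</body></html>" % name)

# Opening for reading loads only the index stored at the end of the file.
with zipfile.ZipFile(buf, "r") as z:
    # getinfo() is a lookup in that index; header_offset records where the
    # member starts, so read() can seek straight to it instead of scanning.
    info = z.getinfo("wiki/Cherry.html")
    page = z.read("wiki/Cherry.html")
```

The trade-off is that each member is compressed on its own, so the archive will likely come out larger than one solid gzipped tarball of the same files.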

Douwe Osinga did something similar in 2005 with a smartphone; he has some notes online. Sounds like he used TomeRaider, an ebook reader. Wikipedia came to just over a gig in that format.
posted by Nelson at 9:45 AM on November 17, 2007


If the iPod touch can mount .dmg image files, those can be created with compression. The files will just show up as normal, but will be stored compressed inside the .dmg.
posted by zsazsa at 10:16 AM on November 17, 2007


Best answer: You could do it in not-a-lot of Python.

import zipfile
import BaseHTTPServer
z = zipfile.ZipFile("wikipedia.zip", "r")
...
# listen for requests, for each,
page_source = z.read("path/PageName")
output.write(page_source)
...
z.close()

... or something very close to that.
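
For anyone reading this later, a fuller version of that outline might look like the sketch below. It is one possible filling-in, not a definitive implementation: it assumes Python 3 (where `BaseHTTPServer` became `http.server`), and the names `make_handler`, `ZipHandler` and `serve`, the port, and the URL-to-member mapping are all assumptions made for illustration.

```python
import zipfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlparse


def make_handler(archive):
    """Build a request handler class that serves pages from an open ZipFile."""

    class ZipHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Map "/wiki/Apple.html" to the member name "wiki/Apple.html".
            name = unquote(urlparse(self.path).path).lstrip("/")
            try:
                body = archive.read(name)
            except KeyError:
                # ZipFile.read raises KeyError for a name not in the archive.
                self.send_error(404, "No such page")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ZipHandler


def serve(zip_path="wikipedia.zip", port=8000):
    """Serve the archive on localhost until interrupted with Ctrl-C."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        server = HTTPServer(("127.0.0.1", port), make_handler(archive))
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            pass
        finally:
            server.server_close()
```

Calling `serve()` and pointing the browser at `http://127.0.0.1:8000/` plus a member path would then fetch that page. `HTTPServer` handles one request at a time, which keeps sharing a single open `ZipFile` simple.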
posted by cmiller at 10:17 AM on November 17, 2007 [1 favorite]


Response by poster: Interesting thought, cmiller; I hadn't considered using Python's built-in server. That really would be not-a-lot of Python.
posted by Galvatron at 10:25 AM on November 17, 2007


It'd be pretty cool if you tossed this up on Projects when you're done. I'll have one of those gizmos ~Jan.
posted by a_green_man at 12:14 PM on November 17, 2007


Response by poster: If the project turns out OK then I'll be sure to post a write-up, a_green_man. I wrote a short Python script based on cmiller's suggestion, which pretty well resolves my original question... but now I have to deal with the logistics of decompressing and rearchiving Very Large Quantities of data, and possibly figuring out a strategy to prune it down to something more manageable.
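
One way the rearchiving step could go, as a rough sketch only (this is not the script mentioned above): stream a gzipped tarball straight into a zip so nothing gets unpacked to disk. The function name `repack` and both paths are hypothetical, and it assumes the snapshot arrives as a single .tar.gz.

```python
import tarfile
import zipfile


def repack(tar_path, zip_path):
    """Copy regular files from a gzipped tarball into a zip, one at a time."""
    count = 0
    # Mode "r|gz" reads the tarball as a forward-only stream, so only the
    # member currently being copied is held in memory.
    with tarfile.open(tar_path, "r|gz") as tar, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as out:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            data = tar.extractfile(member).read()
            out.writestr(member.name, data)
            count += 1
    return count
```

A pruning strategy could slot into the loop as an extra `continue` condition on `member.name`.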
posted by Galvatron at 1:45 PM on November 17, 2007


You could save space by only storing the Wikipedia CD collection.
posted by Mwongozi at 2:08 PM on November 17, 2007


Response by poster: Mwongozi, I did see the CD collection, but I don't think such a limited set of articles would satisfy me. I want to carry the Hitchhiker's Guide in my pocket. Possibly bearing a wallpaper image that says "Don't Panic" in large, friendly letters...
posted by Galvatron at 2:30 PM on November 17, 2007

