Optimizing large HTML file?
February 15, 2009 6:14 PM

Alternative to large HTML table?

I have a relatively large dataset that I turn into a sort of histogram/graph using an HTML table. It produces what I want, but the downside is the HTML files are rather large (up to several megabytes).

I'm wondering if there's some way to represent the information (perhaps using CSS) that will minimize the file size. I know little about HTML and even less about CSS, so I'm not quite sure where to begin.

Here are more details: I have written a small program which examines large fixed-length text files and categorizes each character by type (letter, number, space, etc.). Each column's categories are then averaged, each category is assigned a color, and an HTML table is populated with a placeholder character and the relevant background color. The table has 10 rows and an arbitrary number of columns (equal to the row length of the text file).

The end result then is basically a histogram of the character type data. So if column 1 of my text file is all letters, and I use black to represent letters, my table will have a black column 10 rows high at the first position. If the second column is, say, 30% numbers and 70% letters, the table will have 7 black rows and 3 green (or whatever color) rows.

I use this to compare two text files and look for major changes in the character patterns - a sort of low-level data analysis tool.

Anyway, for the sake of efficiency I'd like to make these files smaller; any help would be appreciated. I can send (or possibly post somewhere) a representative graph + the HTML. Or I can post just a single row of HTML; all the rows are more or less identical except for the background color and label.
posted by jjsonp to Computers & Internet (18 answers total)
Can you post a link to the data? If you are defining each cell's visual attributes on a cell-by-cell basis, then I can see why the file is so large. If you aren't, I'm at a loss. Let's have a look at the data.
posted by aloneinvietnam at 6:20 PM on February 15, 2009

Could you use something like Gnuplot to render the data as a graphic?
posted by ijoshua at 6:24 PM on February 15, 2009

(See the 5th chart on this demo page.)
posted by ijoshua at 6:31 PM on February 15, 2009

You could do this with CSS like this:

<style type="text/css">
.c {background-color: black;}
.n {background-color: green;}
.s {background-color: blue;}
</style>

<td class="c">&nbsp;</td>
<td class="n">&nbsp;</td>
<td class="n">&nbsp;</td>
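
A minimal sketch of how a generating program might emit cells like these, written in JavaScript only for illustration (the thread does not say what language the generator uses). The function name `rowHtml` and the one-letter-per-column input format are assumptions; the class names are the ones from the stylesheet above.

```javascript
// Sketch only: the one-letter codes match the class names in the stylesheet
// above (c, n, s); the input format is an assumption for illustration.
function rowHtml(categories) {
  // categories: one letter per column, e.g. "cnn"
  let html = '<tr>';
  for (const code of categories) {
    html += '<td class="' + code + '">&nbsp;</td>';
  }
  return html + '</tr>';
}

console.log(rowHtml('cnn'));
```

Each cell then carries only a short class name instead of its own copy of the formatting.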

However, I expect you will run into the same file-size and rendering-time problems in the browser that you have already been having with pure HTML. Presumably you are generating your HTML file from a program, so you might have more luck using an image generation library like GD to build a PNG image instead.

See this page for an example of using GD with PHP.
posted by astro38 at 6:33 PM on February 15, 2009

HTML tables suck for very large sets of data because they generate a lot of overhead and because they need to be completely rendered by the browser before being displayed to the user.

Without more details about the data and your project, it's difficult to give exact suggestions. However, some general solutions would be:
0. Turn your data into an actual diagram (gif, png, jpg).

1. Skip tables altogether and use lists and divs. You'll notice that your data loads MUCH faster because lists and divs are rendered much faster by browsers.

2. Continue to use tables but split up the data in smaller chunks and use Ajax to load the chunks on the user's command.

3. Make sure the web server has compression enabled. If you're using Apache as a web server, enable mod_deflate (Apache 2.x) or mod_gzip (Apache 1.3).

4. Use an external CSS file to handle all of the formatting. The CSS file is cacheable, so the formatting only needs to be downloaded once.
posted by Foci for Analysis at 6:36 PM on February 15, 2009

What are you currently using to generate the HTML? On the face of it, the simplest thing would be to split it into pages, but the ease of that depends on what you're using to generate.
posted by rhizome at 7:07 PM on February 15, 2009

Response by poster: Thanks everyone. Here is a sample line of HTML:

{td width=1 bgcolor="#5B8772"}{font face="Arial" size=1 color="#FFFFFF" title="0"}{sup}&#46;{/sup}{/font}{/td}
(note that I replaced '<>' with '{}', since the HTML was being interpreted in my comment).

Other than the header & footer, the entire HTML file is identical except for bgcolor & title (the title reflects the column or byte position in the text file and appears when I hover the cursor over a given column, which is somewhat important but not absolutely necessary - not sure if I could reproduce that with an image file).

I view these on my local machine and am not serving them up. So unfortunately I don't have a place I can post the full file for viewing at the moment.
posted by jjsonp at 7:20 PM on February 15, 2009

If you want VERY compact file size, use a monospaced font and just use letterspace for rows/cols.


And so on. You can css the fonts/colors to get some fine effects that will seem like graphics.

Unicode characters could even be faux graphics.
posted by rokusan at 7:37 PM on February 15, 2009

Response by poster: Hm so if I use Courier or something similar as opposed to Arial that should reduce the file size?

I am currently using the '.' character as a placeholder as that seemed to provide the best appearance. I seem to remember that using a space didn't work for some reason (I first wrote the program which outputs this several years ago).

If anyone wants to see a full sample I can email it to you; it's only 55kb zipped up, and it unzips to 1.9mb.
posted by jjsonp at 7:51 PM on February 15, 2009

Do you have to use pure HTML? Would Javascript be ok for your purposes?

If you stuck your data in a Javascript Array, a small JS function could then create a lovely graph (perhaps with coloured DIVs), and the total size would be much smaller.
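
A rough sketch of that idea, under some assumptions: the data is stored as one string per table row with one letter per column, the letters `c`, `n`, `s` and their colours are hypothetical category codes, and `buildGraphHtml` is an illustrative name. It builds table cells rather than DIVs, though DIVs would work as well, and it keeps the hover title showing the column position.

```javascript
// Assumed data format: one string per table row (top to bottom), one letter
// per column. Letters are hypothetical category codes: c = letter,
// n = number, s = space.
const COLORS = { c: 'black', n: 'green', s: 'blue' };

// Sample data: column 0 is all letters, column 1 is 30% numbers and
// 70% letters, column 2 is all spaces.
const rows = [
  'cns', 'cns', 'cns',
  'ccs', 'ccs', 'ccs', 'ccs', 'ccs', 'ccs', 'ccs'
];

function buildGraphHtml(rows) {
  let html = '<table cellspacing="0" cellpadding="0">';
  for (const row of rows) {
    html += '<tr>';
    for (let col = 0; col < row.length; col++) {
      // title carries the column (byte) position so hovering still shows it
      html += '<td title="' + col + '" style="width:4px;height:10px;background:' +
              COLORS[row[col]] + '"></td>';
    }
    html += '</tr>';
  }
  return html + '</table>';
}

// In a browser, put the result into the page (run this from a script at the
// end of <body> so that document.body exists); elsewhere, do nothing.
if (typeof document !== 'undefined') {
  document.body.innerHTML = buildGraphHtml(rows);
}
```

The file on disk then holds about one character per cell plus this short script, and the browser expands it into the full table when the page loads.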
posted by i_am_joe's_spleen at 7:58 PM on February 15, 2009

Response by poster: Yes, a Javascript array displaying a graph might work. I don't know much Javascript nor how to display graphs, but it might be worth looking into.
posted by jjsonp at 8:03 PM on February 15, 2009

Oh, and here's an idea.

I'm assuming you have the skills to write a script that generates text, but binary image files are beyond you.

Well, make your script generate XPM, PPM, or PGM, which are all text-based formats, and then run them through imagemagick or other converter, and you can have spiffy GIFs or JPEGs, which will be much smaller. In fact your script can write some HTML that embeds those JPEGs.
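
A minimal sketch of the text-based route using plain PPM (the "P3" variant), in JavaScript only for illustration. The input format (one ten-letter string per column), the category letters, the colours, and the name `toPpm` are all assumptions.

```javascript
// Hypothetical category colours as RGB triples (0-255).
const PPM_COLORS = { c: [0, 0, 0], n: [0, 128, 0], s: [0, 0, 255] };

// columns: one string per column, one letter per row (top to bottom).
// Returns the text of a plain PPM image, one pixel per line.
function toPpm(columns, height) {
  // Header: magic number, then "width height", then the maximum colour value.
  const lines = ['P3', columns.length + ' ' + height, '255'];
  for (let y = 0; y < height; y++) {
    // Pixels are listed row by row, left to right.
    for (const column of columns) {
      lines.push(PPM_COLORS[column[y]].join(' '));
    }
  }
  return lines.join('\n') + '\n';
}

console.log(toPpm(['cc', 'nc'], 2));
```

Saved to a file such as graph.ppm, the output can be converted with something like `convert graph.ppm graph.png` in ImageMagick. An image on its own would not show the column position on hover.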
posted by i_am_joe's_spleen at 8:05 PM on February 15, 2009

If you want to keep it simple HTML, then I vote Javascript. The browser can generate the necessary elements based on a fairly small Javascript.
posted by sycophant at 8:38 PM on February 15, 2009

You can use a non-breaking space with the code &nbsp; instead of the '.' character for a placeholder.

You could reduce the file size considerably using the method that astro38 describes. This will allow you to eliminate all the repetition of the formatting information that you currently have in each cell.

Alternatively, instead of using 10 cells of varying colors to represent a value, you could use a div with the width/height specified. So if it is 100% black it would be a div with a height of 100px, if you want the 70% black and 30% green you have two divs, one with a black background and height of 70px and another green one below it with a height of 30px. You'll have to play around a bit with divs and their positioning to get the results you want but I'm sure the final size will be a lot smaller.
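
A sketch of the stacked-div approach, assuming each column is described by a list of [category, percent] pairs that add up to 100. The function name `columnHtml`, the category letters, the colours, and the 1px column width are illustrative choices, not anything the poster's program is known to use.

```javascript
// Hypothetical category colours.
const DIV_COLORS = { c: 'black', n: 'green', s: 'blue' };

// position: column number, shown on hover via the title attribute.
// parts: [category, percent] pairs, listed top to bottom.
function columnHtml(position, parts) {
  // The outer div is one narrow bar floated left; inner divs stack vertically.
  let html = '<div title="' + position + '" style="float:left;width:1px">';
  for (const [category, percent] of parts) {
    // 1px of height per percentage point, so a full column is 100px tall.
    html += '<div style="height:' + percent + 'px;background:' +
            DIV_COLORS[category] + '"></div>';
  }
  return html + '</div>';
}

console.log(columnHtml(1, [['c', 70], ['n', 30]]));
```

A column needs only as many inner divs as it has categories, rather than ten cells.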

Using javascript to generate the final HTML, as suggested, will definitely help even further.
posted by Gomez_in_the_South at 10:32 PM on February 15, 2009

"HTML tables suck (...) because they need to be completely rendered by the browser before being displayed to the user."

Only if you don't specify table-layout: fixed in your CSS. Remember to give explicit column widths for best results.
posted by ghost of a past number at 3:03 AM on February 16, 2009

Now that I've come back here and seen it, I like the Javascript idea better, so I'll second that. Ignore that earlier rokusan, he's an idiot.

(There are 100 ways to write it, but the idea of using an algorithm and recursion to display very repetitive information is the golden egg in the middle. If you can write instructions (in javascript) on HOW TO DISPLAY what you need, it's bound to be much more compact, the same way that saying "write six thousand zeros" takes up a lot less space than doing it.)
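
One simple way to make the "six thousand zeros" point concrete is run-length encoding. The comment above does not name that technique; it is swapped in here purely as an illustration, and the function names are made up for the sketch.

```javascript
// Run-length encode a row of category letters: "cccccnn" -> [["c",5],["n",2]]
function encodeRuns(row) {
  const runs = [];
  for (const ch of row) {
    const last = runs[runs.length - 1];
    if (last && last[0] === ch) {
      last[1]++;            // same letter as the previous one: extend the run
    } else {
      runs.push([ch, 1]);   // different letter: start a new run
    }
  }
  return runs;
}

// Expand the runs back into the original row.
function decodeRuns(runs) {
  return runs.map(([ch, count]) => ch.repeat(count)).join('');
}

// "Write six thousand zeros" as an instruction rather than as data.
const zeros = decodeRuns([['0', 6000]]);
console.log(zeros.length);
```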
posted by rokusan at 3:35 AM on February 16, 2009

If you really want to do this right, you skip using a web browser to do something that other programs are designed from the ground up to do. Like Excel, or Gnumeric... a dedicated program that runs on the user's computer that will handle vast data sets without choking. As mentioned above, large tables take enormous amounts of calculations for a browser to display properly. But if it really is tabular data, it should be in a table. So, let your users decide for themselves how they want to manipulate the data by simply sending CSV (comma-delimited files).
posted by Civil_Disobedient at 5:19 AM on February 16, 2009

Two options:

1. Google charts api: http://code.google.com/apis/chart/

2. divs with fixed-widths and background colors. Something like this:

<div style="width:1024px; height:100%">
<div style="width: xxxpx; background-color:red; height:10px"></div>
</div>
posted by ragaskar at 9:14 AM on February 16, 2009
