Setting up a huge database
December 6, 2005 10:05 AM

I'm about to set up a database (pet science project) that has about 200 million records consisting of about 50 mostly-numeric fields. Is mysql up to this task? If not, what is? And since many fields are null, how can it be optimized to consume the least possible disk space? I'm worried about bloat eating up my 100 GB drive.
posted by shannymara to Computers & Internet (8 answers total)
 
Well, if you have 200 million records, then 100 gigs means you can have an average record size of 500 bytes, or about 10 bytes per field. That's not all that much, I guess.

Anyway I'm not sure how MySQL's performance would be, but PostgreSQL is definitely up to the task.
posted by delmoi at 10:08 AM on December 6, 2005


According to this, an empty varchar field in PostgreSQL is 4 bytes.
posted by delmoi at 10:11 AM on December 6, 2005


Oh, actually I misread your question: you have 50 text fields, so that's going to be 200m * 50 = 10 billion strings, many of which could be null.

If X is the average length of a non-null string and P(null) is the probability that a field is null, the expected storage per field will be about (4 + X) * (1 - P(null)) + 4 * P(null) bytes; multiply that by the 10 billion fields to get the total.

At least using PGSql.
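
Here's a quick back-of-the-envelope version of that in Python. The average length and null fraction are just made-up placeholders (plug in your own numbers), and it assumes the 4-byte varchar overhead mentioned above:

```python
# Rough storage estimate using the per-field formula above.
# avg_len and p_null are guesses -- substitute your real averages.
records = 200_000_000
fields_per_record = 50
avg_len = 8        # X: average length of a non-null value, in bytes
p_null = 0.6       # P(null): fraction of fields that are null
overhead = 4       # assumed per-value overhead (the 4-byte figure above)

bytes_per_field = (overhead + avg_len) * (1 - p_null) + overhead * p_null
total_bytes = records * fields_per_record * bytes_per_field
print(f"roughly {total_bytes / 1e9:.0f} GB")   # ~72 GB with these guesses, before indexes
```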
posted by delmoi at 10:16 AM on December 6, 2005


yes, mysql can certainly handle a database this small.

the key to keeping the database file size down is picking the types for your columns to be only as large as necessary for the data you're storing. int vs. tinyint, that sort of thing.
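
to make that concrete, here's a rough comparison in python. the column mix and byte sizes are illustrative guesses, not your actual schema:

```python
# illustrative only: fixed-length sizes of some mysql numeric types
# (tinyint=1, smallint=2, int=4, float=4, double=8 bytes),
# applied to a made-up mix of 50 columns.
ROWS = 200_000_000

# lazy choice: everything stored as INT or DOUBLE
naive_row = 25 * 4 + 25 * 8            # 300 bytes per row

# tightened choice: small codes as TINYINT/SMALLINT, measurements as FLOAT
tight_row = 15 * 1 + 10 * 2 + 25 * 4   # 135 bytes per row

for label, row_bytes in [("naive", naive_row), ("tightened", tight_row)]:
    print(f"{label}: {ROWS * row_bytes / 1e9:.0f} GB")
# naive: 60 GB, tightened: 27 GB (before indexes and per-row overhead)
```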

depending on how you are going to be accessing the data, the new archive storage engine may be something to consider. no indexes, but compression of the data. (this article explains more.)

or if this is static data, you can use myisam and compress the table once you have loaded it.
posted by jimw at 10:18 AM on December 6, 2005


I'd suggest postgres. Even if MySQL can hold it and work on it, restoring a backup (which seems to be needed much more frequently with MySQL than with Postgres, just my experience) will be a bitch. Postgres may be a bit slower overall, but it takes care of your data.
posted by devilsbrigade at 11:53 AM on December 6, 2005


Most of the comments above probably refer to previous versions. The recently released MySQL 5.0 and PostgreSQL 8.1 both have dramatically improved their performance. I don't think MySQL should have any problems handling it.

You could enable NTFS compression on the database to save space. In my experience NTFS compression doesn't adversely affect performance or reliability (although it does increase fragmentation).

You might also want to reconsider if your database design is optimal if there's a very high percentage of NULL values.
posted by Sharcho at 3:47 PM on December 6, 2005


It's interesting that most of your fields are null. You could possibly restructure your table (make it two or three tables) to drop the size down even more.
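
As a toy sketch of that idea (field names invented, just to illustrate): instead of one wide row with mostly-empty columns, you'd keep a narrow table holding only the values that actually exist.

```python
# Toy sketch of the "narrow table" idea: store only non-null values as
# (record_id, field_name, value) rows instead of one wide, mostly-empty row.
# Field names here are invented for illustration.
wide_row = {
    "record_id": 12345,
    "temp_c": 21.4,
    "ph": None,
    "salinity": None,
    "depth_m": 3.2,
    # ... plus ~45 more mostly-null fields
}

narrow_rows = [
    (wide_row["record_id"], field, value)
    for field, value in wide_row.items()
    if field != "record_id" and value is not None
]
print(narrow_rows)   # [(12345, 'temp_c', 21.4), (12345, 'depth_m', 3.2)]
```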

Plus, what do you mean by "mostly numeric data"?
posted by seanyboy at 3:49 PM on December 6, 2005


What seanyboy said. If most of your fields are empty, that sets off some warning bells. Sure you can't reconsider your relational schema? Apologies in advance if I'm way off base (obviously, I know nothing about the problem you're trying to solve), but I've met enough folks who seem to think a database is like a giant spreadsheet for it to be a concern.

Nor is there information about how the database is to be populated, typical size and data format of the records that aren't null, what kinds of queries are to be run on it and how frequently, how many users there might be and what other hardware/software/performance conditions might pertain. It may be that MySql is just the ticket, but I can't make a recommendation based on the information here.
posted by normy at 6:46 PM on December 6, 2005

