MySQLFilter: Help me with my 56 million record table! Lots of cumbersome background inside...
I currently have a single transactions table which is just a data dump of every single transaction (for a major retailer) for the last several years. It includes their internal customer_id, the transaction date and a few other choice bits of information.
Right now, I am attempting to generate a week-by-week breakdown of "total unique customers," "existing" customers and "new" customers. (They're "existing" if we can find their customer_id within the same table, somewhere in the 3.5 years preceding the week in question.
I have this built into one carefully haphazard query:
SELECT SQL_CALC_FOUND_ROWS t1.customer_number cn, t1.transaction_date, t2.transaction_date
from data_import t1
INNER JOIN data_import t2
USING ( customer_number )
WHERE
t1.transaction_date
BETWEEN '2006-01-01'
AND '2006-01-07'
AND t2.transaction_date
BETWEEN '2002-07-03'
AND '2005-12-31'
GROUP BY t1.customer_number
And so on, with the dates incremented for each week in 2006.
Surprisingly enough, this works. It just takes absolutely forever. (6 solid days of query munching for Jan 1-Jun-10.)
In a given week, there will be approximately 240,000 unique customers. The 3.5 year window before it consists of approximately 47 million records. (The database ranks in at 56,233,924 total records.)
Now, I have each field indexed, naturally. I've run an EXPLAIN, and MySQL is using the index. (Explain output visible
here.)
Most of the time for a given week is being dedicated to Copying Tmp Table. The IO usage on this drive is skyrocketing, and it's relatively clear to me that it's writing this as a DISK table. (My tmp table max size is 32 MB.)
My questions:
1) Will increasing my temp table size help? Running this in memory would be ideal, but the machine only has 2 GB, a P4 2.8 GHz and an IDE drive, which is absolutely killing me.)
2) Where the hell is it writing the temp table to? MySQL System Variables shows blank for the tmp_dir option. I can't find it in /var/lib/mysql/mysql, but I don't even know what I'm looking for. I'd like to see how large the temp tables are becoming so that I can see if using them in memory is feasible.
3) Is there any way to restructure this query such that it won't tear my disk up? It's thrashing like crazy because it's having to copy entire possible record sets to this damn disk-based temp table. It's an IDE, which is a major no-no for non-sequential massive RW like this, but it's what I'm stuck with for now, which leads me into...
4) This database consists of 1.8 GB data, 3.3 GB index. It's a MyISAM table running on MySQL 4.1.15.
Is it time to consider a new database platform? Will that see me any performance gains when dealing with this much raw data?
5) We're looking to move this to a new machine.
Will loading up with 8 GB of RAM allow us to run the entire thing in memory efficiently? Will smart indexing and query writing allow me to run it in memory with 2 GB?
Is there a faster way to do this, or is the harsh reality of 56 million records simply that things will take much longer?
I know there's a lot to mull over. Your help and advice as crazy-talented DBA-folk is much, much appreciated by me. Thanks!
But that's useless. MySQL can only use one index at a time, and you're searching on two columns. Try adding multicolumn indexes like this INDEX(date, customer) and INDEX(customer,date). I'm not sure which will help most.
I'd also suggest creating a temporary table containing the customer ids (ie INSERT INTO temp SELECT customer ... WHERE week=blah GROUP BY customer) for the week you're searching and join that against the main table.
posted by cillit bang at 6:28 AM on July 19, 2006