DTMFA
November 9, 2007 3:33 PM

How do I dump a heavy-loaded 800MB MySQL table without affecting anything? I have a MySQL database serving data for a business-critical popular website. I need to grab the data from one of the tables, but cannot affect the website's performance in any way. I don't mind if it takes days to do it, as long as it affects absolutely nothing.
posted by lpctstr; to Computers & Internet (17 answers total) 1 user marked this as a favorite
 
InnoDB or MyISAM storage engine?
posted by jdgdotnet at 3:34 PM on November 9, 2007


Is there an auto-incremented id field? You could write a script that loops through a counter to pull rows one at a time and dump them to another server. Put a time delay in the loop and adjust it to keep the load reasonable.
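Roughly, each pass of the script would issue something like this (table, column, and the id values are placeholders; the counter, the loop, and the delay live in the script itself):

SELECT * FROM foo WHERE id = 12345;                 -- one row per iteration; the script bumps the id and sleeps
-- or grab a modest batch per iteration to cut down the number of round trips:
SELECT * FROM foo WHERE id BETWEEN 12000 AND 12999;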
posted by waxboy at 3:40 PM on November 9, 2007


Response by poster: jdgdotnet: MyISAM
posted by lpctstr; at 3:41 PM on November 9, 2007


Best answer: Is it just the one table?

I think you could do it from "userspace". Select from the table, iterate over the rows, and write them to another table in an idle database of yours. Sleep briefly between rows, to make sure that you're not being too greedy with I/O.

"SHOW CREATE TABLE foo;"
"CREATE TABLE foo...;"

Write a few lines of script to "SELECT *" from the source and "INSERT" those values to the destination. Easy peasy.
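For instance, something like this (untested; "backupdb" and the id ranges are made up, and it assumes the idle database lives on the same server — if it's a separate server, the script SELECTs on one connection and INSERTs on the other):

SHOW CREATE TABLE foo;          -- run the resulting CREATE TABLE statement on the destination
INSERT INTO backupdb.foo SELECT * FROM foo WHERE id BETWEEN 1 AND 10000;
-- sleep a bit, then the next chunk:
INSERT INTO backupdb.foo SELECT * FROM foo WHERE id BETWEEN 10001 AND 20000;
-- ...and so on until you've passed MAX(id)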
posted by cmiller at 3:45 PM on November 9, 2007


Oh, and if it's MyISAM, then you could "LOCK TABLES; FLUSH TABLES;", copy the table's files to a new location, and then "UNLOCK TABLES". That shouldn't take very long.
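For a single MyISAM table, that works out to roughly this (table name and datadir path are placeholders; note the lock statements want an explicit table list):

LOCK TABLES foo READ;
FLUSH TABLES foo;
-- while the lock is held, copy the MyISAM files at the shell, e.g.:
--   cp /var/lib/mysql/mydb/foo.frm /var/lib/mysql/mydb/foo.MYD /var/lib/mysql/mydb/foo.MYI /backup/
UNLOCK TABLES;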

FWIW, IANADBA, but I *ahem* know a little of MySQL source code.
posted by cmiller at 3:48 PM on November 9, 2007


Best answer: What cmiller suggests is basically what mysqlhotcopy does. You should probably use it instead of trying to do this yourself :) But locking the tables will definitely introduce a pause to your clients.

Are there any data integrity issues? That is, do you care if your snapshot happens in the middle of any multi-step operations that might be occurring? If the answer is yes, then you're out of luck. You'll have to take the app down in order to get a consistent snapshot. This wouldn't be a problem if you used a real backend like InnoDB.

If the answer is no, there might be a couple tricks you can do. Hopefully you're on a 5.x version of the database. It's hackish, but I would try this:
-- copy test.t1 to test.t2 one row per second
CREATE PROCEDURE slowbackup()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE a CHAR(16);
  DECLARE b INT;
  DECLARE cur CURSOR FOR SELECT id, data FROM test.t1;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;

  OPEN cur;
  REPEAT
    FETCH cur INTO a, b;
    IF NOT done THEN
      INSERT INTO test.t2 VALUES (a, b);
      DO SLEEP(1);
    END IF;
  UNTIL done END REPEAT;
  CLOSE cur;
END
So you slowly copy all the rows from t1 to t2. Then you can use mysqlhotcopy on t2 w/o stopping clients.

The best way, IMHO, is to set up master-slave replication. Then you can take the slave down and back up all you want w/o anyone noticing. But that's going to require quite a bit of planning and downtime to accomplish. Something to think about next time.
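In outline, getting a slave going involves something like this (hostnames, the replication account, and the binlog coordinates are all placeholders):

-- on the master, in my.cnf:  server-id=1  and  log-bin=mysql-bin
-- on the slave, in my.cnf:   server-id=2
-- on the master:
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave.example.com' IDENTIFIED BY 'secret';
-- copy a consistent snapshot of the data to the slave, note the output of SHOW MASTER STATUS, then on the slave:
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',   -- file and position from SHOW MASTER STATUS
  MASTER_LOG_POS=98;
START SLAVE;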
posted by sbutler at 4:02 PM on November 9, 2007 [1 favorite]


You know what, never mind. My hack is unworkable. If you had 3 million rows in your table, it would take over a month to run.
posted by sbutler at 4:17 PM on November 9, 2007


Just copy the db file to a thumb drive. You might not want to copy over the network (that could have a performance impact if the db is under stress or heavy use at the time).

You might want to decide beforehand what time your snapshot should represent, and after your copy, delete rows written after that time (if one or more tables have timestamps and you can prove that you're catching ALL data written after that cutoff). Such a hack though, haha.
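The cleanup after the copy would be a one-liner, assuming the table really does have a usable timestamp column (table, column, and cutoff are made up):

DELETE FROM backupdb.foo WHERE created_at > '2007-11-09 15:00:00';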
posted by albatross5000 at 4:44 PM on November 9, 2007


Okay, help me out here, people. Am I dense or something? Is mysqldump out of the question? Surely it must have been designed with semi-mission critical db servers in mind, right?

If this is mission-critical data, wouldn't you want the copy you pulled to be up-to-date as well?
posted by Deathalicious at 4:48 PM on November 9, 2007


If you have a snapshot-capable filesystem and can stand to have a pause for however long they take to create, you can do FLUSH TABLES WITH READ LOCK, make a snapshot, then UNLOCK TABLES; you can then copy the table, or indeed the entire database from that point in time, though obviously this will compete with the database for IO.
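Roughly (the snapshot command depends entirely on your filesystem or volume manager; the LVM line is only an illustration):

FLUSH TABLES WITH READ LOCK;
-- while the lock is held, take the snapshot at the shell, e.g. with LVM:
--   lvcreate --snapshot --size 1G --name mysql-snap /dev/vg0/mysql
UNLOCK TABLES;
-- then mount the snapshot and copy the table (or the whole datadir) from it at leisure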

But yes, you should have a replicated server for this. It's a business critical website that can't afford so much as a drop in performance, but it has a massive single point of failure? I'd strongly suggest you plan for some minimal downtime now to set up some replication, to defend against more serious downtime in the future.

Deathalicious: No, mysqldump isn't really made with online use in mind, especially on a MyISAM table; it'll lock out all writers while it makes the dump.

(Our MySQL database just passed 1.5 billion rows, woo)
posted by Freaky at 5:32 PM on November 9, 2007


The way you handle this is by not getting yourself into this situation in the first place: enable replication. Then your backup procedure becomes: take the slave offline, back up the slave, bring the slave back online, and let it catch up on any updates that were made while it was offline.
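On the slave, the whole backup window is just this (assuming replication is already running):

STOP SLAVE;
-- take the backup here: copy the files, mysqldump, whatever you like
START SLAVE;
SHOW SLAVE STATUS;   -- watch Seconds_Behind_Master drop back to 0 as it catches up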
posted by Rhomboid at 5:43 PM on November 9, 2007 [1 favorite]


Why don't you restore from a recent backup to another server? Overkill if you only need the data from one table, sure, but it's no extra work.
If you're not backing up business-critical data, make that a priority and get this done as a bonus.
posted by Horselover Fat at 6:53 PM on November 9, 2007


Stop using MyISAM.

Because MyISAM requires a full-table lock for any write, your dump will block all writes to a given table until the dump of that table finishes; your app effectively can't write to it at all in the meantime.

InnoDB, by contrast, uses row-level locking, so your app can most likely continue with minimal interruption during the dump; the likelihood of a direct locking conflict is pretty low.

MyISAM is the default and it's very quick for reads, but when it comes to actually being a real database, it sucks out loud. As you're discovering right now.

Change the tables to InnoDB if you want to do stuff like this. You can't take advantage of the replication suggestions above without ditching MyISAM from the master DB either, if memory serves.
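The conversion itself is a one-liner per table, but be warned that it rewrites the whole table and locks it while it does, so it needs a maintenance window of its own ("foo" is a placeholder):

ALTER TABLE foo ENGINE=InnoDB;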
posted by genghis at 9:34 PM on November 9, 2007


And add another hundred times more NOOOOOO! on the MyISAM if, as you say, this DB is business-critical. Business-critical apps use transactions to prevent referential integrity from getting fucked. MyISAM doesn't allow transactions.
posted by genghis at 9:37 PM on November 9, 2007


InnoDB is not suitable for every situation. If you need super-fast performance (especially if you're using a lot of GROUP BY and ORDER BY statements) then switching to InnoDB will significantly hamper query speed.
Yes, InnoDB is better at dealing with fuck-ups, but that's your trade-off: do you want better performance or better 'oh shit' recovery?
Unless your application is doing a lot of simultaneous inserts/selects/deletes, you'll likely get better performance out of MyISAM (fwiw, InnoDB is MyISAM at its core).

What I find most alarming about your question is that you don't appear to be doing backups of this business-critical DB! If you were, this would be a non-issue: the backup would have the data you need.
posted by missmagenta at 1:08 AM on November 10, 2007


Just dump the database. If the site doesn't have a backup, you're doing them a favor. missmagenta is right about this, wrong about MyISAM vs. InnoDB.
posted by orthogonality at 1:28 AM on November 10, 2007


800MB? A dump of that size will take a few minutes on modest hardware. Can they really not tolerate a few minutes of downtime? What sort of traffic do they get? Is there not a quiet period (late at night, early in the morning) during which they could tolerate a few minutes of downtime? Could you not put up a "we're down for a few minutes of maintenance" screen during the process?
posted by gaby at 4:16 AM on November 10, 2007

