Fixing an unreasonably slow MySQL query
May 24, 2013 10:36 PM   Subscribe

Query and Explain output here. Query must be executed 4022 times, each with a different Strip_ID, while the script processes, adding 4+ minutes to what would otherwise be a ~90-second task. I have (separate) indices defined @ Parent_Table and Parent_ID, in addition to the table's primary key on an auto-incrementing ID. Adding combined indices for those two fields (in each order, just to be safe) did not change MySQL's analysis of the query or improve execution speed.

The full (relevant) context of the query is available here. SHOW CREATE TABLE for Strip_Text available here.
posted by The Confessor to Computers & Internet (9 answers total) 5 users marked this as a favorite
 
Best answer: It's the dependent subquery probably, the MySQL optimiser is notoriously bad at these.
Try to rewrite the subquery as a join, it's always faster.

If your two different subqueries can't be converted together, use a UNION.

End result should look something like this:


(SELECT Text FROM Strip_text
INNER JOIN Strip_Panels ON Scene_ID=Parent_ID AND Strip_ID=1
WHERE Parent_Table= "Scenes")
UNION ALL
(SELECT Text FROM Strip_text
INNER JOIN Strip_Panels ON Scene_ID=ID AND Strip_ID=1
WHERE Paren_Table="Panels");


Two notes:

a) I havent' actually run this and
b) With your approach, you get one Text no matter how many entries match in Strip_Panels due to the IN operator. With the join, you'll get multiple result rows if you have many (though from the field names I'm guessing this is not possible) - you may need to add the DISTINCT keyword or otherwise handle multiples.
posted by Dr Dracator at 10:53 PM on May 24, 2013 [1 favorite]


Initial drive-by thoughts:

1. Correlated subqueries are slow. Dr. Dracator has covered that above.
2. Another big problem is that ALL in the primary query. It's selecting all of the rows of the Strip_Text table. Converting the subqueries to joins should fix that, but if it doesn't, you need to add a clause that will filter out rows so that MySQL is not looking at all of them every time.

If you want more extensive help, MeMail me; i'm happy to go over it with you and look at it in more detail tomorrow or Sunday. Can't help right now because i'm trying to do about 5 other things.
posted by adrienneleigh at 11:33 PM on May 24, 2013


Response by poster: I ended up with:

(SELECT Text FROM Strip_Text
LEFT JOIN Strip_Panels ON Strip_Text.Parent_Table = "Panels" AND Strip_Text.Parent_ID = Strip_Panels.ID
WHERE Strip_Panels.Strip_ID = ?)
UNION ALL
(SELECT DISTINCT Text FROM Strip_Text
LEFT JOIN Strip_Panels ON Strip_Text.Parent_Table = "Scenes" AND Strip_Text.Parent_ID = Strip_Panels.Scene_ID
WHERE Strip_Panels.Strip_ID = ?)


Which is identical to your UNION ALL solution, moved around a bit so as not to confuse my impression of the table's structure and purpose. DISTINCT was warranted for the Scene_ID portion because scenes can be (and usually are) shared between multiple Panel_IDs associated with a given Strip_ID.

Using this query, which returns substantively identical results to my own "solution" (since the field is intended only for searching, the ordering doesn't matter), the speed penalty for strip text processing was reduced from greater than four minutes to about two seconds.

I am not unfamiliar with the UNION statement - I use one myself in the "relevant context" - and I might have hit on something like your solution myself had I not wrongly assumed (a) the fault was some flaw in my index structure, rather than in the query, and (b) the penalty from using UNION and DISTINCT would be worse than the subqueries. Suffice to say I've learned my lesson. When something goes wrong with a MySQL query in the future, the subqueries will be the first place that I look.

Thank you.
posted by The Confessor at 11:59 PM on May 24, 2013


On reading it again I think I understand your search better, try this one as well:


SELECT DISTINCT Text FROM Strip_Text INNER JOIN Strip_Panels ON
(Strip_Text.Parent_Table = "Panels" AND Strip_Text.Parent_ID = Strip_Panels.ID)
OR (Strip_Text.Parent_Table = "Scenes" AND Strip_Text.Parent_ID = Strip_Panels.Scene_ID)
WHERE Strip_Panels.Strip_ID = ?


UNIONs also have some kinks (e.g. you can't optimize GROUP BY out of a nested UNION query) that might bite you in the ass later on.

(BTW, Why are you using LEFT JOIN? You are relying on Strip_ID=NULL to evaluate to NULL to select your rows, it's kind of counterintuitive and might trip you up if the search condition becomes more complicated later on)
posted by Dr Dracator at 2:29 AM on May 25, 2013


FYI, I find this guy has a lot of interesting things to say about writing efficient queries.

Could you share the explain output for your final query?
posted by epo at 2:49 AM on May 25, 2013 [1 favorite]


Response by poster: Explain output for JOIN/UNION solution (the final query) & Combined Conditional JOIN here.

I'd already tried your combined conditional JOIN before, Dr Dracator. As noted by the commented documentation at the top of the "relevant context" linked in my question, I had to remove that logic from the initial INSERT INTO [...] SELECT query, which I use to populate most of the fields, because it killed the execution time. I think it was that failure which (wrongly) put me off the use of joins altogether for selecting/assembling text.

I use LEFT JOINs wherever possible because I know *exactly* what I'm getting when I use them. I've been coding in this environment for years, but have no formal training in any of the technologies that I use, so the consistency of relying on a single JOIN type is helpful. And from my knowledge of the table structure of my project and the code that populates that structure with data, I know that there is no possibility of a Panel.ID existing where Strip_ID IS NULL, now or in the future.

Thank you, and sorry for waiting 'til now to respond; I was sleeping.
posted by The Confessor at 8:39 AM on May 25, 2013


Best answer: I hate to sound like "that guy" in a forum requesting tech help, but you've asked the wrong question if you want to get the best performance from your code.

You have effectively written your query as procedural, rather than set-based code. Instead of doing this in an RBAR style, you need to consider "foreach" and "while" as nothing short of blasphemies as far as performance goes. Yes, you will find places you need to use them; I can count on one hand the number of places I've needed to resort to explicitly looping in my entire SQL career.

Instead of thinking in terms of what goes on each row, think about what you want to end up in each column. Even if you pull that together one column at a time using painfully convoluted logic in your query, it will virtually always run much, much faster than treating your data as a 2d matrix.
posted by pla at 2:07 PM on May 25, 2013 [3 favorites]


I can count on one hand the number of places I've needed to resort to explicitly looping in my entire SQL career.

Probably a derail, but there's one case where I keep resorting to cursors: I often use a stored procedure to do something with one row, and then need to call it repeatedly for a dynamically generated set. Merging the row selection logic into the stored procedure makes for really convoluted code, and gets in the way of code reuse and separation of atomic operations.
posted by Dr Dracator at 2:56 PM on May 25, 2013


Response by poster: Thank you, pla.

I'm aware that iterating over a single query ~4278 times (at last count) is not ideal. In fact, if you look at the relevant context I linked in my extended explanation1, you'll find that I currently assemble and populate 13 of the 15 fields in the table I'm constructing using a single ~2KB INSERT INTO [...] SELECT query. I resort to iteration only when I cannot think of a good, graceful way to do the required logic solely in SQL.

Inspired by Dr Dracator and everyone else's help with the per-record query, I think I have already constructed a workable separate-JOIN-based syntax to append to the main assembly query:

SELECT CONCAT_WS("\n",GROUP_CONCAT(DISTINCT Strip_Panel_Text.Text SEPARATOR "\n"),GROUP_CONCAT(DISTINCT Strip_Scene_Text.Text SEPARATOR "\n")) AS Strip_Text FROM Strip_Dates LEFT JOIN Strip_Panels ON Strip_Dates.Strip_ID = Strip_Panels.Strip_ID LEFT JOIN Strip_Text AS Strip_Panel_Text ON Strip_Panel_Text.Parent_Table = "Panels" AND Strip_Panels.ID = Strip_Panel_Text.Parent_ID LEFT JOIN Strip_Text AS Strip_Scene_Text ON Strip_Scene_Text.Parent_Table = "Scenes" AND Strip_Panels.Scene_ID = Strip_Scene_Text.Parent_ID GROUP BY Strip_Dates.Strip_Date ORDER BY Strip_Dates.Strip_Date ASC;

All that's lacking is the (trivial) logic needed to intercept empty ('') values and replace them with NULL.

I already use a JOIN AS [Alias] in the big query with another table, although I don't concatenate the information from those two joins. I truly have no idea why why this solution had not already occurred to me.

---
  1. I don't mean to convey exasperation by repeatedly referring to the context in my replies, since (fortunately) it ended up being pretty irrelevant to my question, and so much of it relies on an understanding of other things like table structure that is not exposed.

posted by The Confessor at 4:01 PM on May 25, 2013


« Older What tools should I use to build this website?   |   Get me up to speed on HTML5, Responsive Design... Newer »
This thread is closed to new comments.