Is it OK if I pretend SQL is just a funny dialect of Haskell?
April 22, 2010 10:29 AM

I want to use nested SELECT statements to make my SQL procedure more readable. Is this a bad idea from a performance point of view?

I'm writing a stored procedure to retrieve some data from a couple of tables in a MySQL database, along with some calculated values. The calculation logic is pretty involved, with lots of IF statements, and can naturally be broken down in a series of steps and made both more readable and (somewhat) more efficient by calculating some intermediate values. Since field aliases can't be referenced in the same SELECT statement, I am thinking of using a chain of nested SELECT statements to explicitly structure the computation.

As an example, suppose I want to retrieve values a,b,c and calculate functions f(a,b,c),g(a,b,c),h(a,b,c). The straightforward way would be to do


SELECT a,b,c, f(a,b,c), g(a,b,c), h(a,b,c) FROM mytable;


The functions f,g,h can be made much simpler by first calculating some intermediate value x=s(b,c) like this:


SELECT a,b,c, x, f1(a,x), g1(a,x), h1(a,x) FROM
(SELECT a,b,c,s(b,c) as x FROM mytable);


In reality, I might have more than one step, like so:


SELECT a,b,c, x,y, f2(a,x), g2(a,y), h2(x,y) FROM
(SELECT a,b,c, x, t(a,x) as y FROM
(SELECT a,b,c, s(b,c) as x FROM mytable)
);


Is this a bad idea from a performance point of view? Am I wasting memory creating intermediate tables to avoid repeating some IF statements, or is MySQL smart enough to generate rows lazily as needed?
posted by Dr Dracator to Computers & Internet (6 answers total) 3 users marked this as a favorite
IANADBA but I believe that if you use subqueries you are putting the database in a situation where optimization is impossible and results in full table scans (for instance). Using your first single statement would allow the optimizer to do its work. If there is a way to further combine these transaction that might work but I don't believe that simply breaking them down into small pieces will result in any performance improvements. If you haven't done it, I would recommend running an 'explain plan' or the equivalent on the queries.
posted by mcarthey at 10:42 AM on April 22, 2010


Hrm, MySQL is better than it used to be about subselects, but it may be better to break it into a few steps, and make temporary tables.

CREATE TABLE tmp_uuid_foo AS SELECT ...

SELECT ...

DROP TABLE tmp_uuid_foo;
posted by cmiller at 10:52 AM on April 22, 2010


Use EXPLAIN, or EXPLAIN EXTENDED, in front of your select queries to find out the differences in how they're executed.
posted by joost de vries at 11:28 AM on April 22, 2010


It's bad for performance in that you are doing (worst case) 3 full table scans instead of one, depending how large the table is that may not matter too much.

The best way to analyse database performance is to just try it both ways with some data and measure the time but I think its unlikely to be 100x slower or anything.

MySQL does support VIEWS now so you could simplify this to

SELECT a,b,c, x,y, f2(a,x), g2(a,y), h2(x,y) FROM my_awesome_view;
posted by Lanark at 12:31 PM on April 22, 2010


I like sub-queries because they often produce code that's significantly easier to read than the corresponding joins .

I also think that concerns about performance are often overdone. I know some databases have tables with billions of rows but out here in the real world I often see useful databases where the 80% of the tables have so few rows that a full table scan would be faster than a key access anyway. If you're combining BIG_TABLE with TINY_TABLE1, TINY_TABLE2 etc then I would suggest that the performance hit might easily be no concern.

Regarding temp tables - yes it also produces good looking code and has a nice 'modular' feel to it (slicing up the query into smaller bits) but I'm not sure what the performance impact of them within MySQL is so you'd need to check that if performance was likely to be problem.
posted by southof40 at 3:11 PM on April 22, 2010


If performance is not an issue, I like the subqueries form best. To me it's just cleaner. Although I am not averse to using temp tables. Also, if you are getting complicated with the subqueries I would pick up a SQL formatting program.. very handy.
posted by jockc at 3:50 PM on April 22, 2010


« Older What would you do with 30k in cash savings at 27?   |   Changing demographics of France: fave book? Newer »
This thread is closed to new comments.