Is it OK if I pretend SQL is just a funny dialect of Haskell?
April 22, 2010 10:29 AM Subscribe
I want to use nested SELECT statements to make my SQL procedure more readable. Is this a bad idea from a performance point of view?
I'm writing a stored procedure to retrieve some data from a couple of tables in a MySQL database, along with some calculated values. The calculation logic is pretty involved, with lots of IF statements, and can naturally be broken down in a series of steps and made both more readable and (somewhat) more efficient by calculating some intermediate values. Since field aliases can't be referenced in the same SELECT statement, I am thinking of using a chain of nested SELECT statements to explicitly structure the computation.
As an example, suppose I want to retrieve values a,b,c and calculate functions f(a,b,c),g(a,b,c),h(a,b,c). The straightforward way would be to do
The functions f,g,h can be made much simpler by first calculating some intermediate value x=s(b,c) like this:
In reality, I might have more than one step, like so:
Is this a bad idea from a performance point of view? Am I wasting memory creating intermediate tables to avoid repeating some IF statements, or is MySQL smart enough to generate rows lazily as needed?
I'm writing a stored procedure to retrieve some data from a couple of tables in a MySQL database, along with some calculated values. The calculation logic is pretty involved, with lots of IF statements, and can naturally be broken down in a series of steps and made both more readable and (somewhat) more efficient by calculating some intermediate values. Since field aliases can't be referenced in the same SELECT statement, I am thinking of using a chain of nested SELECT statements to explicitly structure the computation.
As an example, suppose I want to retrieve values a,b,c and calculate functions f(a,b,c),g(a,b,c),h(a,b,c). The straightforward way would be to do
SELECT a,b,c, f(a,b,c), g(a,b,c), h(a,b,c) FROM mytable;
The functions f,g,h can be made much simpler by first calculating some intermediate value x=s(b,c) like this:
SELECT a,b,c, x, f1(a,x), g1(a,x), h1(a,x) FROM
(SELECT a,b,c,s(b,c) as x FROM mytable);
In reality, I might have more than one step, like so:
SELECT a,b,c, x,y, f2(a,x), g2(a,y), h2(x,y) FROM
(SELECT a,b,c, x, t(a,x) as y FROM
(SELECT a,b,c, s(b,c) as x FROM mytable)
);
Is this a bad idea from a performance point of view? Am I wasting memory creating intermediate tables to avoid repeating some IF statements, or is MySQL smart enough to generate rows lazily as needed?
Hrm, MySQL is better than it used to be about subselects, but it may be better to break it into a few steps, and make temporary tables.
CREATE TABLE tmp_uuid_foo AS SELECT ...
SELECT ...
DROP TABLE tmp_uuid_foo;
posted by cmiller at 10:52 AM on April 22, 2010
CREATE TABLE tmp_uuid_foo AS SELECT ...
SELECT ...
DROP TABLE tmp_uuid_foo;
posted by cmiller at 10:52 AM on April 22, 2010
Use EXPLAIN, or EXPLAIN EXTENDED, in front of your select queries to find out the differences in how they're executed.
posted by joost de vries at 11:28 AM on April 22, 2010 [1 favorite]
posted by joost de vries at 11:28 AM on April 22, 2010 [1 favorite]
It's bad for performance in that you are doing (worst case) 3 full table scans instead of one, depending how large the table is that may not matter too much.
The best way to analyse database performance is to just try it both ways with some data and measure the time but I think its unlikely to be 100x slower or anything.
MySQL does support VIEWS now so you could simplify this to
SELECT a,b,c, x,y, f2(a,x), g2(a,y), h2(x,y) FROM my_awesome_view;
posted by Lanark at 12:31 PM on April 22, 2010
The best way to analyse database performance is to just try it both ways with some data and measure the time but I think its unlikely to be 100x slower or anything.
MySQL does support VIEWS now so you could simplify this to
SELECT a,b,c, x,y, f2(a,x), g2(a,y), h2(x,y) FROM my_awesome_view;
posted by Lanark at 12:31 PM on April 22, 2010
I like sub-queries because they often produce code that's significantly easier to read than the corresponding joins .
I also think that concerns about performance are often overdone. I know some databases have tables with billions of rows but out here in the real world I often see useful databases where the 80% of the tables have so few rows that a full table scan would be faster than a key access anyway. If you're combining BIG_TABLE with TINY_TABLE1, TINY_TABLE2 etc then I would suggest that the performance hit might easily be no concern.
Regarding temp tables - yes it also produces good looking code and has a nice 'modular' feel to it (slicing up the query into smaller bits) but I'm not sure what the performance impact of them within MySQL is so you'd need to check that if performance was likely to be problem.
posted by southof40 at 3:11 PM on April 22, 2010
I also think that concerns about performance are often overdone. I know some databases have tables with billions of rows but out here in the real world I often see useful databases where the 80% of the tables have so few rows that a full table scan would be faster than a key access anyway. If you're combining BIG_TABLE with TINY_TABLE1, TINY_TABLE2 etc then I would suggest that the performance hit might easily be no concern.
Regarding temp tables - yes it also produces good looking code and has a nice 'modular' feel to it (slicing up the query into smaller bits) but I'm not sure what the performance impact of them within MySQL is so you'd need to check that if performance was likely to be problem.
posted by southof40 at 3:11 PM on April 22, 2010
If performance is not an issue, I like the subqueries form best. To me it's just cleaner. Although I am not averse to using temp tables. Also, if you are getting complicated with the subqueries I would pick up a SQL formatting program.. very handy.
posted by jockc at 3:50 PM on April 22, 2010
posted by jockc at 3:50 PM on April 22, 2010
« Older What would you do with 30k in cash savings at 27? | Changing demographics of France: fave book? Newer »
This thread is closed to new comments.
posted by mcarthey at 10:42 AM on April 22, 2010