

CALL ask_metafilter();
October 17, 2010 2:00 AM   Subscribe

I think MySQL procedures and temporary tables are the bee's knees - am I Doing It Wrong?

I'm writing a bunch of reports off a MySQL database, and have settled on the following pattern: I create a procedure that makes a temporary table, fills it with data from several sources, mostly using INSERT ... SELECT and UPDATE statements, and finally selects the entire output table. The end table will usually have a few (5-10) rows, corresponding to a high-level view of the data.

I could probably achieve the same effect with liberal application of views and aggregation functions, but I find the procedural approach much more readable and convenient.

Is this a Bad Thing from a performance point of view? Am I putting undue stress on the server in any way by dropping and re-creating temporary tables with each procedure invocation? Should I be using any specific options when creating the tables for best performance?
posted by Dr Dracator to Computers & Internet (12 answers total) 2 users marked this as a favorite
 
I could probably achieve the same effect with liberal application of views and aggregation functions, but I find the procedural approach much more readable and convenient.

That's the important part. Ignore the rest, that's for the database (or your DBA) to handle.
posted by devnull at 2:16 AM on October 17, 2010


How often are you showing these reports?

If you're doing it (for example) only once a day, and your data set isn't too gigantic, then it probably doesn't matter. What matters a lot more is the ease of maintaining/understanding the code.
posted by beerbajay at 2:18 AM on October 17, 2010


I can't really address the MySQL aspect of this, but in my professional life this is how it's done in MSSQL, Sybase and Oracle at the (multi-national, household-name) investment banking clients I work for.

It's important to understand that as data volumes rise you may have to rework the underlying SELECT INTOs etc. for performance reasons (or at least recompile the procedures on a regular basis), but any half-decent database should handle the scenario you're describing with ease.
posted by hardcode at 4:23 AM on October 17, 2010


With MySQL it's the same as with general-purpose programming languages like C or C++. The general advice for optimization is simple, regardless of environment, and consists of two steps.

1. Don't do it,
2. (for experts only) Don't do it yet.

Stop worrying about it until you run into an actual performance bottleneck, and only then look at what is going wrong. With most run-of-the-mill database performance problems, adding an index in the right place or changing a join or select will be quite enough, and procedures instead of views are not likely to make much of a difference.
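To make the "add an index in the right place" step concrete, here is a minimal sketch. The `orders` table and its columns are hypothetical, not from the question:

```sql
-- Before: a typical report query that, without an index on
-- order_date, forces MySQL into a full table scan.
EXPLAIN SELECT customer_id, SUM(total)
FROM orders
WHERE order_date >= '2010-01-01'
GROUP BY customer_id;

-- Adding an index on the filtered column usually turns the full
-- scan into a range scan; re-run EXPLAIN to confirm.
CREATE INDEX idx_orders_date ON orders (order_date);
```

The point is that this one-line fix is where most real-world database performance problems end, which is why optimizing before you've seen an EXPLAIN is usually wasted effort.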

For all other things YOUR performance, in the form of ease of doing things, is far more valuable than the database performance.
posted by DreamerFi at 5:45 AM on October 17, 2010


Is this a Bad Thing from a performance point of view? Am I putting undue stress on the server in any way by dropping and re-creating temporary tables with each procedure invocation? Should I be using any specific options when creating the tables for best performance?

Your strategy would be bad if you were an application programmer trying to build a scalable system. As a person working with a small data warehouse, this seems fine.

As a rough guideline, do whatever seems convenient until you are spending more than about $30k on hardware. Below $30k, the hardware is generally going to be cheaper than reimplementing your system with more code outside the RDBMS and less inside it. Above $30k it might be worth using the RDBMS as a plain data store and refactoring your stored procedures into Java, Python, or whatever you want running on separate systems.


Views are extremely, horribly, terribly evil because you can't index them. Anyone who uses a view with more than 30 rows should go to their room and think about what they've done :)
posted by b1tr0t at 6:15 AM on October 17, 2010


Anyone who uses a view with more than 30 rows should go to their room and think about what they've done :)

True, and that's mostly from a maintainability point of view, which nicely intersects with Dr Dracator's problem: stuff that's easy to maintain is far more valuable than stuff that has been prematurely optimized.
posted by DreamerFi at 6:21 AM on October 17, 2010


Views are extremely, horribly, terribly evil because you can't index them. Anyone who uses a view with more than 30 rows should go to their room and think about what they've done :)

That's complete rubbish. Views are a very powerful complex-query-decomposition tool, ideally suited to this application. In the end, if the optimizer is any good, it will see through the layers and find the best approach using the native indexes.

I have a lot of experience working with databases and, despite opinions to the contrary, they are *not* all created equal. Nested queries that get MySQL and Sybase into awful n^3 query plans will run perfectly and quickly in Oracle.

I think this is why the #temptable business really came to a head in Sybase/SQL Server, and to some degree MySQL - it's not so much for the user as it is for the optimizer, which simply can't do the algebra properly with too much nesting.

In terms of performance, it's much, much worse to go creating temp tables, assuming the optimizer can actually see through the fog and implement the query properly. If the optimizer can't handle the nesting, then you have no real choice.

I'm not trying to troll "Oracle is better" or whatever, just saying that how you go about these problems really depends on what tools you are using.
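A sketch of what "seeing through the layers" means in MySQL, using made-up table and view names. A view built from simple SELECT/WHERE clauses gets the MERGE algorithm, so stacked views collapse into one plan against the base table:

```sql
-- Two stacked views, each a simple filter.
CREATE VIEW recent_orders AS
  SELECT * FROM orders WHERE order_date >= '2010-01-01';

CREATE VIEW big_recent_orders AS
  SELECT * FROM recent_orders WHERE total > 100;

-- With the MERGE algorithm, EXPLAIN should show a single plan
-- against `orders`, using its native indexes - no intermediate
-- result is materialized.
EXPLAIN SELECT customer_id, total FROM big_recent_orders;
```

The catch is that views containing GROUP BY, DISTINCT, or UNION fall back to the TEMPTABLE algorithm, at which point MySQL is building a temp table behind your back anyway - which is exactly the situation where hand-rolled temp tables stop being a pessimization.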
posted by H. Roark at 7:30 AM on October 17, 2010 [1 favorite]


As a former sysadmin I would advise you just keep doing what you're doing until someone bitches about it.

At some point, maybe the performance will get bad enough, in an important enough way, that someone will figure out who's doing that thing, and come storming into your office all het up about your reports hogging all the resources. When that happens, obviously you would apologize profusely and figure out a different way to do things.

Until that point, it's obviously not impacting anything in a negative fashion. Don't worry about overworking the servers, like a string of 18th century coal-mining ponies. They're machines; that's what they do!

Think of all the sad, lonely servers that just sit around idle all day because no one knows or cares how to use them. THAT'S the real shame.
posted by ErikaB at 7:51 AM on October 17, 2010


I agree with the above; you seem to be doing fine. However, I'm pretty sure that if you compiled your procedural approach into one large query with joins and such, MySQL would create the temp table for itself anyway. I'm a little rusty on MySQL's behind-the-scenes operations, but by my reading you're in the same neighborhood as what MySQL would be doing anyway.
posted by rhizome at 9:45 AM on October 17, 2010


The problems I've seen with temp tables usually relate to slowness that leads to locking. But I wouldn't start trying to build some FrankenQuery substitute until you know for sure that there's a problem (premature optimization and all that).

Views aren't nearly as bad as some people suggest. You can even insert into views (YES YOU CAN!)
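For anyone surprised by that: in MySQL a view is insertable as long as it maps one-to-one onto base table rows (no aggregates, DISTINCT, GROUP BY, etc.). A minimal sketch with a hypothetical `users` table:

```sql
-- A simple single-table view is automatically updatable/insertable.
CREATE VIEW active_users AS
  SELECT id, name, status FROM users WHERE status = 'active';

-- This INSERT writes through to the underlying `users` table.
INSERT INTO active_users (id, name, status)
VALUES (42, 'alice', 'active');

-- Adding WITH CHECK OPTION makes MySQL reject inserts or updates
-- that would produce rows the view itself cannot see.
CREATE OR REPLACE VIEW active_users AS
  SELECT id, name, status FROM users WHERE status = 'active'
  WITH CHECK OPTION;
```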
posted by Civil_Disobedient at 9:48 AM on October 17, 2010


Temp tables are ugly, as they are disconnected from the source data, and can thus contain arbitrary data. A view maintains the connection, can't contain arbitrary data, and is always as up to date as the underlying data.

Decomposed, layered views let me "see" the transformation of my data, as each layer is only a single transform. That also lets me insert shims between layers to deal with new requirements. And as each layer does only one thing, I can easily test each layer.

Basically, with a temp table, you have a handheld calculator that only shows you the result of the current operation. With views, I have an adding machine tape that shows me all the operations I've performed to get to the current operation. And if there's a mistake, rather than re-doing all operations, I just change the one that's incorrect.
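A sketch of the "one transform per layer" idea, with hypothetical table and view names - each layer does a single thing and can be SELECTed and tested in isolation:

```sql
-- Layer 1: cleaning - drop rows the report should never see.
CREATE VIEW sales_clean AS
  SELECT sale_date, amount
  FROM sales
  WHERE amount IS NOT NULL;

-- Layer 2: aggregation - one transform, built on layer 1.
CREATE VIEW sales_monthly AS
  SELECT DATE_FORMAT(sale_date, '%Y-%m') AS month,
         SUM(amount) AS total
  FROM sales_clean
  GROUP BY DATE_FORMAT(sale_date, '%Y-%m');
```

If a new requirement arrives (say, currency conversion), it becomes a shim view between the two layers, and only `sales_monthly` needs to be repointed at it; with a monolithic temp-table procedure you would be editing the middle of one long script instead.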
posted by orthogonality at 5:14 PM on October 17, 2010


How often are you showing these reports?

Infrequently under normal operation - a few times per month, at most. It is, however, possible the user might request the report several times in a short time span to tweak parameters, or just because they can. My tables are now in the hundreds-of-rows range, but will (hopefully) grow by several orders of magnitude as the system is used.

Temp tables are ugly, as they are disconnected from the source data, and can thus contain arbitrary data.

Just to clarify, most of my procedures look like this:
BEGIN

   DROP TABLE IF EXISTS temp_results;

   CREATE TEMPORARY TABLE temp_results (...);

   (... do stuff to temp_results ...)

   SELECT * FROM temp_results;

END
I'm running the procedure every time the report is requested, precisely because I don't want to be serving stale data. This also gives me a nice abstraction layer, as I can later change the procedure to select from a view without having to change anything in the front-end.
posted by Dr Dracator at 11:25 PM on October 17, 2010


This thread is closed to new comments.