Increasing database performance for SELECTs
May 29, 2011 12:34 AM   Subscribe

Databases: is it better to spend money on faster disk I/O or partitioned tables? Which has more influence on the speed of a SELECT query?

The company I'm contracted to uses SQL Server 2008 R2 on a virtual server hosted by a data center. It's an OLTP database (although a decidedly impure one), with the largest tables holding about 1.5Tb comprising three years of history. Performance has been steadily dropping off in terms of running queries on the data.

They will be scaling up in the future, and are trying to decide where the money is best spent. If they upgrade the server software to SQL Server Enterprise for around $50K they can partition the largest tables and indexes and performance will probably improve. Or they can upgrade their SAN storage to (expensive) SAS or Fibre Channel and goose disk I/O. I need to understand which is the more cost-effective option.

Is there any research out there that tackles this problem? I'm not a DBA (just a dumb BI guy), but I've been asked to get involved in the planning and I'd prefer to get into it having looked at the evidence and understanding the variables a bit better. I'd also like to understand (being a BI guy) if and how the equation changes if the scenario involved a data warehouse.
posted by Ritchie to Computers & Internet (8 answers total) 1 user marked this as a favorite
Not knowing how the virtual server is structured, I would first figure out how the current storage is structured. SQL Server, perhaps more than other databases, is very sensitive to disk I/O variations. You need to consider where the tempdb is located (it's used for GROUP BY or ORDER BY queries, for example, or for anything actually using a temporary table), as well as what disks actually host the database itself.

My guess is that just normal disk storage tuning -- without going all the way to a SAN solution -- will yield significant improvements. Beyond that, you could also take a hard look at queries being executed, with particular attention to query plans of the worst performing queries, and try to get them fixed up.

In my experience, partitioning only helps if you are going to only ever really want to query one or two partitions at a time. Once you are crossing more than a few partitions in a query, you're in effect paying the price of executing your query multiple times.

I would also spend some time with a book like Professional SQL Server 2008 Internals and Troubleshooting to learn more about how SQL Server works and how to go about looking at optimizing a working system.

Finally, buying better boxes is only going to take you so far; at some point, you're better off thinking of a different way to store data altogether. Could you, for example, have more than one SQL Server machine to store a portion of the data? It may be cheaper to have 3 machines with Standard Edition rather than purchasing Enterprise Edition and trying to get partitioning to work.
posted by haykinson at 12:58 AM on May 29, 2011

If you don't have a DBA currently, I would suggest hiring an expert with the brief of wringing maximum performance out of your current hardware. In the process of doing that he will gain intimate knowledge of your setup and queries, and be able to advise you what the next step should be. Unfortunately, people who have that kind of knowledge are outnumbered by people who think they do. But to give you an idea of what the benefits of tweaking your current setup might be, this month I saw badly-set-up database server reconfigured to double disc I/O.
posted by Leon at 4:01 AM on May 29, 2011 [1 favorite]

Benchmarking your I/O subsystem is always advisable to get you a baseline for performance tweeks that you make. MS ships a little tool called SQLIO that can do that although it is rough around the edges to say the least. However, here is great SQLIO tutorial.

I think this is especially important since you are running virtualized SQL Server and while it is possible to get good performance out of virtualized SQL Server, it is much more difficult to achieve.
posted by mmascolino at 6:53 AM on May 29, 2011 [2 favorites]

Definitely hire a DBA first. What you're actually looking for is your bottleneck; there's one somewhere, and it's probably not the table size. The bottleneck may be that you're running in a virtual server instance and need a dedicated server. Or it could be in your disk speed and I/O ability to your current disks. Or it could be in some other configuration aspect.

I definitely wouldn't spend any money on hardware or software until you know exactly where the bottleneck is.
posted by SpecialK at 9:42 AM on May 29, 2011 [1 favorite]

I too would recommend monitoring and benchmarking the current setup. Look at what is happening when utilization is high (or performance starts to slow). Without solid information about the actual current state of the system, everyone is just guessing.

But since we are just guessing, I would think the first thing to look at is how scalable the virtual server is, and what the costs and benefits of the virtualization are. Virtualization always adds overhead, but if that's mitigated by being able to "borrow" higher powered hardware during heavy usage, maybe it is good.

(Another issue to consider is that virtualization is going to mess up your benchmarking results. One, performance will be lower if other machines hosted on the same hardware are also under high usage. Two, utilization/performance will be relatively flat as the virtualization software throws more resources at your machine, until you hit a wall when it runs out of headroom.)
posted by gjc at 9:44 AM on May 29, 2011

Many (most?) performance tweaks are data-specific. Solutions that work wonders in one setting make not help or even can cause slowdowns in others. Science is the only way forward. Try a bunch of things, see which help.

Either that, or hire an expert who has already seen lots and lots of setups and has lots of experience and can give you advice customized to your situation. It sounds like you are in a situation where you are looking to blow tens of thousands of dollars on this decision, so hire an expert for 10kish and get them to tell you the right thing after they have studied your needs and setup.
posted by pmb at 10:07 AM on May 29, 2011

We go through this all the time in a bunch of different situations, and each one is always different. Luckily, the higher ups budget for test systems for this sort of thing. A DBA won't magically solve your problems, but they should know how to set up some benchmarks to figure out the best setup.
posted by devilsbrigade at 1:30 PM on May 29, 2011

Also, at least for Oracle, the default query plans suck. We've had to hint almost all of our queries to avoid Oracle just being utterly retarded at very inopportune times (why it ever thinks doing a full table scan over an index lookup is beyond me, yet this has happened several times). If SQL Server has similar functionality, it might be worth trying to fix up some of the queries to make sure you're not needlessly wasting cycles.
posted by devilsbrigade at 1:32 PM on May 29, 2011

« Older Does this blinker have a rate?   |   I'm peeing a LOT and drinking a ton of water. All... Newer »
This thread is closed to new comments.