Gurus only
July 31, 2006 10:33 AM   Subscribe

Perhaps advanced MSSQL question..

Ok, this may be a bit hard to explain, so please bear with me.

The problem is that a query I am doing is performing a table scan when it should not have to.

I have a unique key (clustered index) defined as follows:

hea_scheme
hea_subscheme
hea_pol
hea_rec
hea_incident
hea_ctl

(all ascending)

(Yes, I know that is a horrible key, but it is what it is.)

I need to get the first 10 results in that key order, starting with values like this:

hea_scheme=3000
hea_subscheme=2084
hea_pol='208400838 '
hea_rec=2
hea_incident=5743
hea_ctl='2006STI014027'

My progamming tool generates something like this:

SELECT TOP 10 * FROM tpahea WHERE (hea_scheme>=3000) AND ((hea_scheme>3000) OR ((hea_subscheme>=2084) AND ((hea_subscheme>2084) OR ((hea_pol>='208400838 ') AND ((hea_pol>'208400838 ') OR ((hea_rec>=2) AND ((hea_rec>2) OR ((hea_incident>=5743) AND ((hea_incident>5743) OR ((hea_ctl>='2006STI014027') ))))))))))

Problem is, this does a full table scan, reading 100,000s of records before returning my result. It should be instantaneous.

What am I doing wrong?

Shortening the query to this:
SELECT TOP 10 * FROM tpahea WHERE (hea_scheme>=3000) AND ((hea_scheme>3000) OR ((hea_subscheme>=2084) AND ((hea_subscheme>2084) OR ((hea_pol>='208400838 ') AND ((hea_pol>'208400838 ') OR ((hea_rec>=2) AND ((hea_rec>2) )))))))

Does not produce a table scan, so I figure it must be something in the last two elements.

Anyway, I know this is a tough one, but if anyone has any ideas, I will much appreciate them.

Thanks!
posted by eas98 to Computers & Internet (7 answers total)
 
what version of SQL Server are you running?

create table tpahea
(
hea_scheme integer,
hea_subscheme integer,
hea_pol varchar(10),
hea_rec integer,
hea_incident integer,
hea_ctl varchar(13)
)

create UNIQUE CLUSTERED index IDX_tpaheaCL on tpahea(hea_scheme, hea_subscheme, hea_pol, hea_rec, hea_incident,hea_ctl )

SELECT TOP 10 * FROM tpahea WHERE (hea_scheme>=3000) AND ((hea_scheme>3000) OR ((hea_subscheme>=2084) AND ((hea_subscheme>2084) OR ((hea_pol>='208400838 ') AND ((hea_pol>'208400838 ') OR ((hea_rec>=2) AND ((hea_rec>2) OR ((hea_incident>=5743) AND ((hea_incident>5743) OR ((hea_ctl>='2006STI014027') ))))))))))


If I run this script in SQL2005 (SP1) the execution plan indicates a clustered index seek only.
posted by blue_beetle at 10:45 AM on July 31, 2006


Response by poster: Thanks for the reply. I am running 2000, SP3.

I cheated a bit in my explanation (to keep it simpler). I am actually running cursors, and the cursor is prepared like this:

Execution Tree
--------------
Clustered Index Seek(OBJECT:([ICS_Ecu].[dbo].[tpahea].[hea_PrimKey] AS [A]), SEEK:([A].[hea_scheme] >= [@Param1001]), WHERE:([A].[hea_scheme]>[@Param1002] OR ([A].[hea_subscheme]>=[@Param1003] AND ([A].[hea_subscheme]>[@Param1004] OR ([A].[hea_pol]>=[@Param1005] AND ([A].[hea_pol]>[@Param1006] OR ([A].[hea_rec]>=[@Param1007] AND ([A].[hea_rec]>[@Param1008] OR ([A].[hea_incident]>=[@Param1009] AND ([A].[hea_incident]>[@Param1010] OR [A].[hea_ctl]>=[@Param1011]))))))))) ORDERED FORWARD)


Here are the declations:
-------------------------------
declare @P1 int
set @P1=244
declare @P2 int
set @P2=180150263
declare @P3 int
set @P3=4
declare @P4 int
set @P4=4
declare @P5 int
set @P5=-1
exec sp_cursorprepexec @P1 output, @P2 output, N'@P1 smallint,@P2 smallint,@P3 smallint,@P4 smallint,@P5 char(15),@P6 char(15),@P7 smallint,@P8 smallint,@P9 int,@P10 int,@P11 char(13)', N'SELECT A.hea_scheme, A.hea_subscheme, A.hea_pol, A.hea_rec, A.hea_incident, A.hea_ctl, A.hea_ClaimNo, A.hea_mark, A.hea_name, A.hea_undyear, A.hea_plan, A.hea_suffix, A.hea_chargeToDed, A.hea_ldt, A.hea_dia, A.hea_ccd, A.hea_aby, A.hea_DateAndTimeadt, A.hea_CNNoProv, A.hea_CNNoEnr, A.hea_TransTypeProv, A.hea_TransTypeEnr, A.hea_sts, A.hea_deact, A.hea_cohLocale, A.hea_cohBilled, A.hea_EOBNo, A.hea_depseq, A.hea_MainCedula FROM dbo.tpahea A WHERE ( A.hea_scheme >= @P1 ) AND ( ( A.hea_scheme > @P2 ) OR ( ( A.hea_subscheme >= @P3 ) AND ( ( A.hea_subscheme > @P4 ) OR ( ( A.hea_pol >= @P5 ) AND ( ( A.hea_pol > @P6 ) OR ( ( A.hea_rec >= @P7 ) AND ( ( A.hea_rec > @P8 ) OR ( ( A.hea_incident >= @P9 ) AND ( ( A.hea_incident > @P10 ) OR ( ( A.hea_ctl >= @P11 ) )))))))))) ORDER BY A.hea_scheme ASC, A.hea_subscheme ASC, A.hea_pol ASC, A.hea_rec ASC, A.hea_incident ASC, A.hea_ctl ASC', @P3 output, @P4 output, @P5 output, 3000, 3000, 2084, 2084, '208400216 ', '208400216 ', 1, 1, 4129, 4129, '2006STI013744'
select @P1, @P2, @P3, @P4, @P5

And then the execution:
exec sp_cursorfetch 180150263, 2, 1, 1

Problem is, that cursorfetch is peforming 106,000 reads in this case (the parameters are different in this last example).

Sorry for the confusion. Does this help?
posted by eas98 at 10:58 AM on July 31, 2006


Response by poster: My first example sucked because I forgot a part of it.

SELECT TOP 10 * FROM tpahea WHERE (hea_scheme>=3000) AND ((hea_scheme>3000) OR ((hea_subscheme>=2084) AND ((hea_subscheme>2084) OR ((hea_pol>='208400838 ') AND ((hea_pol>'208400838 ') OR ((hea_rec>=2) AND ((hea_rec>2) OR ((hea_incident>=5743) AND ((hea_incident>5743) OR ((hea_ctl>='2006STI014027') )))))))))) ORDER BY hea_scheme ASC, hea_subscheme ASC, hea_pol ASC, hea_rec ASC, hea_incident ASC, hea_ctl ASC

The ORDER BY clause is what's causing the heavy read load. I guess my question at this point is: why is the ORDER BY clause screwing with things if the ORDER BY is in the clustered key order?
posted by eas98 at 11:03 AM on July 31, 2006


Response by poster: Here is a little more info, in case it helps..

SELECT TOP 10 * FROM tpahea WITH(INDEX=hea_primkey) WHERE (hea_scheme>=3000) AND ((hea_scheme>3000) OR ((hea_subscheme>=2084) AND ((hea_subscheme>2084) OR ((hea_pol>='208400838 ') AND ((hea_pol>'208400838 ') OR ((hea_rec>=2) AND ((hea_rec>2) OR ((hea_incident>=5743) AND ((hea_incident>5743) OR ((hea_ctl>='2006STI014027') ))))))))))

This query is forcing the use of INDEX hea_primkey, which consists of all the fields in the query. No order by, but it is still doing 100000+ reads. Very curious..
posted by eas98 at 12:32 PM on July 31, 2006


I'm going to take a stab in the dark and just mention that the "_" is a wildcard in mysql. I don't pretend to understand the rest :)
posted by roue at 1:59 PM on July 31, 2006


Whitespace, my friend, is your friend.

I'm not at home, so I haven't run this through an actual SQL server, but I think what's killing you is the "OR"s; since the (top-level) ORs are on different attributes, the query can't really use the clustered index. (It can use it some, "some" in this case meaning, "until 106,000 records are left".)

But if it's really the ORDER BY killing you, and you reallydoes.)

If it's not the order by that's really killing you, you need to do the following: First of, all, get rid of the "WITH(INDEX=hea_primkey)" unless a show plan/ explain has actuallty indicated it's necessary; otherwise, it's just constraining teh optimizer, right?

Second, add indices on the attributes (fields/columns) that participate in each top-level OR.

Third, reconsider using a cursor; cursors are almost always the least efficient way to do things, and generally make the intent of the code far less obvious. (Long anecdote about a colleague who only ever did cursors when he should have done updates elided.)



eas98 writes "This query is forcing the use of INDEX hea_primkey, which consists of all the fields in the query. No order by, but it is still doing 100000+ reads. Very curious.."

No kidding. Yes, because the query is explictly requiring use of that index, as I indicated above. Get rid of that.


roue writes "I'm going to take a stab in the dark and just mention that the '_' is a wildcard in mysql."

Yes, but only within the second operand of the LIKE operator, when that operand is a literal string and not an identifier; anywhere else, as in eas98's example, it's just a legal part of an identifier.


Eas98, if you want more help, you can either a) "...get a consultant. It looks like you need a consultant. My hourly rate is..." (MetaTalk joke), or b) tell us what you're actually trying to do, not just how you're trying to do it, along with posting the exact DDL of the table in question, any indices on it, and the cursor. The cursor and the TOP 10 business strongly suggest that we don't actually need a cursor here. And please, please, please, with vertical whitespace. As you have it now, without the whitespace, it's pretty much unreadable.
posted by orthogonality at 2:44 PM on July 31, 2006


Er, part of my commnet got munged (I'm find this Apple unfriendly, I really am):
But if it's really the ORDER BY killing you, and you really just need the top ten, select 'em into a temp table then select * order by from the temp table. (The order by increases the query time by requiring a sort, and often but not neccesarily the records are written to a temp table in order to do the sort; but it cannot account for the table scanof 106,000 records, that's just not what an order by does.)
posted by orthogonality at 2:47 PM on July 31, 2006


« Older Name for a magic potion shop?   |   Doing/Eating/Seeing something in BarbaDOS Newer »
This thread is closed to new comments.