How to determine if a record has a certain value at any given time.
August 4, 2006 1:13 PM   Subscribe

I need some SQL Server/database design help. I have a custom built admin for an ecommerce site where we manage our orders. During the life of an order, it can have many statuses ...

For example: In Process, Transmitted, Confirmed, Backordered, Shipped, Closed.

In my database I keep track of status changes in a table called Tracking. It's basically a log of status changes an order goes through and when. A sample data set for Order # 12111 might be something like:

Tracking
-------------
uid orderID statusID dateChanged
1 12111 5 7/1/2006
22 12111 21 7/1/2006
43 12111 7 7/3/2006
54 12111 8 7/5/2006
102 12111 7 7/8/2006
155 12111 15 7/15/2006

uid is the unique identifier for the record. StatusID is just a numeric unique identifier for the statusnames. The statusIDs don't progress sequentially in any way, meaning a statusID of 15 doesn't mean the order is further along in the process than a statusID of 7. DateChanged is the date the order was changed to that status.

Our accounting department wants to be able to do some kind of query to find out which orders were in a certain status on any given date or within a date range. For example, they might want to find all orders on Backorder (statusID=8) on 7/7/2006. What is the best way to do that?

Order 12111 would be one of those orders returned in the result set as it's status changed to backorder on 7/5/2006 and it didn't move to another status until 7/8/2006. How can I query for these records?

I'm hoping it can be done easily, but let's say it can't. What changes can I make to my Tracking table to be able to perform queries like that easily? Do I need to start keeping track of what the previous status is in each record? Something like:

Tracking
-------------
orderID oldstatusID statusID dateChanged
12111 1 5 7/1/2006

I really, really don't want to change the Tracking table because I know it would be a nightmare trying to back populate fields, etc. BUT if I have to, I will.

Help, please!
posted by SoulOnIce to Computers & Internet (12 answers total)
 

select T1.orderid
from tracking T1
where T1.uid = (select max(T2.UID) from tracking T2 where T2.date <= @DateCriteria and T1.OrderID = T2.OrderID)
and T1.StatusID = @StatusID


or something along those lines, you'll need a correlated subquery in anycase.
posted by blue_beetle at 2:09 PM on August 4, 2006


That might be really slow, given that it has to do a separate query for every single row in the table which hits the first time.

It will run a lot faster if you change it to have an extra date field for when the orderID stopped having the status in question. Then you can make the query with a single scan of the table (even less if you've indexed it by one of the date fields)

Creating a new table with the additional field from the old table is a single SQL statement, so its really not a lot of work.
posted by pharm at 2:23 PM on August 4, 2006


But do note that your order may have *many statuses* on any given calrndar day.
posted by baylink at 2:57 PM on August 4, 2006


blue_beetle has it.

Note that his query relies on uid being monotonic increasing, which is probably true for most auto_increment/serial types; he's using uid rather than the date column because max( uid ) is gaurenteed to be unique, while the max( date ) isn't. Note also that if an order enters two or more statuses on the same day (or period), this query will only show the last status for the day (or period), which is probably what you want.

If in fact you want to find all orders that were in a particular status at any point during a period, you'd want to do this query:

select * from tracking a where a.statusid = @statusid and a.date >= @period_start and a.date <= @period_end
posted by orthogonality at 3:08 PM on August 4, 2006


orthogonality: that query for a range isn't correct. The difference is between the words "all orders that were in a particular status" and "all orders that entered a particular status". Your query captures the latter, but not the former. What if an order entered the status before the date range, but is still in that status once the date range starts? I.e. the order was shipped before @period_start, and still hasn't changed status until after @period_end. Based on this schema, it wouldn't show up in your results at all, because the status change dates fall outside your range.

soulonice: For the "on a given date" part ... because of your table structure, you can infer an order's status at a given point in time based on the status change record directly prior to that point. Therefore, you can select the one "most recent" record for each order before that date (which gives you a snapshot of where every order was at that point) and then further restrict it to only rows with a particular status. Something like this might work:
SELECT T.orderid
  FROM tracking T
  WHERE 
    T.date < @datecriteria br>
    AND NOT EXISTS (
        SELECT * FROM Tracking 
          WHERE orderID = T.orderID 
          AND uid > T.uid
          AND date < @datecriteriabr>
    )
    AND T.StatusID = @StatusID
That should give you all the direct predecessor status changes to your point in time (and thus tell you where every order was at that time, with exactly one row per order). Then the statusID criterion narrows the list to just the status you're looking for.

The second question (in a range) is a little harder from this point of view. I'd try unioning the query above with orthogonality's query (which yields all orders that transitioned into that status during the date range). That'll give the full range.

The bigger question is that this is a difficult schema to work with for this data; like a log file, easy to write to but hard to report on. If there are a finite number of statuses and each order only goes into each status once, you could transform those key points into columns in the order table:
In_Process_Date DATETIME, 
Transmitted_Date DATETIME, 
Confirm_Date DATETIME, 
Backorder_Date DATETIME, 
Ship_Date DATETIME, 
Close_Date DATETIME
But of course, that doesn't work well if orders ping-pong back and forth to various statuses in their lifetime. But if you can get away with it, it makes life a hell of a lot easier. Give it some thought, though - from the example statuses you listed, I don't see why an item wouldn't hit each status just once. Throw in a few extras (delivery_date, return_date, refund_date, etc) and you can probably account for all circumstances.
posted by ivarley at 11:05 PM on August 4, 2006


Yes, blue_beetle and orthogonality have it, depending on whether you want just the last status of an order or any status in the time period.

A variation of blue_beetle's query that might be faster (depending on how clever the query optimizer is) is this:

SELECT T1.orderid
FROM tracking T1,
(SELECT T2.orderid, MAX(T2.uid) maxid
FROM tracking T2 WHERE T2.date < @datecriteria br> GROUP BY T2.orderid) T3
WHERE T1.uid = T3.maxid
AND T1.StatusID = @StatusID

posted by klausness at 4:38 AM on August 5, 2006


Oops, looks like I was previewing a stale version of the page and missed ivarley's comment. Indeed, my query (and orthogonality's and blue_beetle's) just gets the orders that entered a status in the date range, not the orders that are in that status in the date range. I guess I need to work on my reading comprehension...
posted by klausness at 4:43 AM on August 5, 2006


Best answer: ...so if you want all orders that ever had a given StatusID between MinDate and MaxDate (inclusive), I think this should work:

SELECT T1.orderid
FROM tracking T1,
(SELECT T2.orderid, MAX(T2.uid) maxid
FROM tracking T2 WHERE T2.date <= @MinDate
GROUP BY T2.orderid) T3
WHERE T1.uid = T3.maxid
AND T1.StatusID = @StatusID
UNION
SELECT T4.orderid
FROM tracking T4
WHERE T4.date > @MinDate
AND T4.date <= @MaxDate
AND T1.StatusID = @StatusID

If you don't mind duplicates, you can speed things up by using a UNION ALL instead of a UNION.
posted by klausness at 5:26 AM on August 5, 2006


Response by poster: Wow, thanks! klausness, I believe you nailed it, except that last line of code should be (for references sake):

AND T4.StatusID = @StatusID

Thanks for everyone's input, I certainly learned a lot!
posted by SoulOnIce at 7:49 AM on August 5, 2006


ivarley writes "orthogonality: that query for a range isn't correct. The difference is between the words 'all orders that were in a particular status' and 'all orders that entered a particular status'. Your query captures the latter, but not the former."


Yeah, you're right. Thanks for catching that.
posted by orthogonality at 11:28 AM on August 5, 2006


Sorry about the typo on the last line there. Glad I could help, in any case.
posted by klausness at 12:03 PM on August 5, 2006


Great Read!! Here's a twist on this topic. I have a similar need but I'd like to find duplicates based on Dates within a two week window.

Anyone know how I can do this?

Sample Data:

ID,Member,Account,DateSent
XXXXXX192725,John Jenson,5555330,300,2006-05-03 07:31:00.000
XXXXXX192725,John Jenson,5555330,300,2006-05-10 07:36:00.000
XXXXXX192725,John Jenson,5555330,300,2006-05-12 07:31:00.000 < ----- possible duplicate br> XXXXXX192725,John Jenson,5555330,300,2006-05-19 07:31:00.000
XXXXXX192725,John Jenson,5555330,300,2006-05-26 07:31:00.000
XXXXXX192725,John Jenson,8044911385,500,2006-05-02 10:00:00.000
XXXXXX192725,John Jenson,8044911385,500,2006-05-12 07:31:00.000
XXXXXX192725,John Jenson,8044911385,500,2006-05-26 07:31:00.000


I've taken a stab at this with Klausness's SQL script but it returns all the records.

SELECT Q1.UniqKey,Q1.ProcessDate
FROM BillPayInvoice Q1,
(SELECT Q2.UniqKey, MAX(Q2.MemberId) MemberId
FROM BillPayInvoice Q2
WHERE Q2.ProcessDate BETWEEN DATEADD(Day,14,ProcessDate) AND DATEADD(Day,-14,Q2.ProcessDate)
GROUP BY Q2.UniqKey) Q3
WHERE Q1.MemberId = Q3.MemberId
UNION
SELECT Q4.UniqKey,Q4.ProcessDate
FROM BillPAyInvoice Q4
WHERE Q4.ProcessDate BETWEEN DATEADD(Day,14,ProcessDate) AND DATEADD(Day,-14,Q2.ProcessDate)


Thanks in advance,

BadMonkey
posted by BadMonkey at 10:06 AM on August 17, 2006


« Older Resource problems on a web forum   |   WhiskerFilter Newer »
This thread is closed to new comments.