SQL Challenge
July 26, 2009 12:52 PM   Subscribe

SQL question: Getting all records based on a row-specific criteria

There are some number of processes that execute on a somewhat random schedule. When the process starts, we insert a record with the StartTime and a null EndTime / Succeeded value. If the process finishes, it updates the EndTime and Succeeded columns. But sometimes the process hangs/dies and never updates its record. We'd like to be able to distinguish between "Things which are probably still running" and "Things which probably died".

My schema looks like this:
ResultID    ProcessName     StartTime      EndTime      Succeeded
--------    -----------     ---------      -------      ---------
0           BuildWidgets    7:20 PM        7:25 PM      True
1           BuildWidgets    7:20 PM        7:28 PM      False
2           BuildWidgets    3:50 PM        3:55 PM      True
3           BuildSprockets  4:20 PM        5:25 PM      True
4           BuildWidgets    9:50 PM        NULL         NULL
5           BuildSprockets  9:30 PM        NULL         NULL
I want to have a stored procedure update this table every few minutes and sets Succeeded = false for all the records where it's taken longer than any prior succeeded instance of that ProcessName. In this example, when BuildWidgets passes, it always takes at most 5 minutes, whereas BuildSprockets might take an hour. For example, at 10:00 PM, we'd update the ResultID 4 BuildWidgets Passed = False, but leave the ResultID 5 BuildSprockets row alone since it might still be running.

SQL Server 2005 if it matters
posted by 0xFCAF to Computers & Internet (6 answers total)
 
Response by poster: Forgot to add I'm already familiar with DATEDIFF / GETSYSDATE, so you can skip explaining that part.
posted by 0xFCAF at 12:55 PM on July 26, 2009


I'd do something like:

update BUILD_RESULTS
set succeeded = FALSE
where datediff(mi, startTime, sysdate) > 5
and endTime is null
and succeeded is null
posted by jenkinsEar at 1:02 PM on July 26, 2009


You're either asking for an SQL based solution to the Halting Problem, or looking for a practical modification. I'll assume the latter.

Add a column to the table called "ExpectedTime", and have the process enter a value there. This will be an arbitrary value based on experience, but it can be a reasonable 'timeout' value. Have the stored procedure check if the current time is greater than StartTime + ExpectedTime. If it is, and the process hasn't updated as complete, it's probably hung and you can update succeeded to false.
posted by fatbird at 1:04 PM on July 26, 2009


Best answer: ah, I see, you want to read in the previous success length for that process.

update BUILD_RESULTS B
set B.succeeded = FALSE
where datediff(mi, B.startTime, sysdate) >
(select max(datediff(mi(C.startTime, C.endTime))
from build_results C
where C.processName = B.processName
and C.succeeded = TRUE
)
and B.endTime is null
and B.succeeded is null
posted by jenkinsEar at 1:12 PM on July 26, 2009


We'd like to be able to distinguish between "Things which are probably still running" and "Things which probably died".

Not possible. Imagine the hypothetical process that exits correctly and successfully -- in a thousand years.

I want to have a stored procedure update this table every few minutes and sets Succeeded = false for all the records where it's taken longer than any prior succeeded instance of that ProcessName.

Why do this?

Think about this: you have nothing to update. You have no new data, so what reason could you possibly have to do an update?

What you have is a hueristic, a rule of thumb: "any process that takes longer than the longest time something successfully completed is probably dead". You have an opinion about your data. You do not have new data.

See the distinction? You don't have any new information to add/insert/update into your table. Therefore, doing an update is foolish. In fact, it will cause you to lose data, as you overwrite that null with "false" -- what you're losing is that you're asserting a certainty where one does not exist. It's possible (if unlikely) that the process will at some point successfully complete, in which case setting succeeded = "false" is a lie.

I belabor this point because you need to see the need to always think clearly about your data. Most people are going to read this and not see my point. So if you go here and still don't see it, go back and read the question and my response above again.

Now, given all that, you'll respond, "but my heuristic is still correct most of the time." And you're right. Your heuristic is mostly correct.

Implement it as a view. Again, it requires no new data; it's just a transformation of the existing data:
create view process_with_hueristic_applied as 
select 
  a.*, 
  case when datediff( ss, a.starttime, getdate() )  
           > ( select max( datediff( ss, b.starttime, b.endtime ) )
           from processes b 
           where b.processname = a.processname and b.endtime is not null)
  then 1 else 0 end as probably_dead
from processes a;

Additional advantages to the view: no need for a cron job, no lag between now when I want to look at the data and when the last time the cron ran, no calculation except when I actually need the data, no updates, no updating of lies, when and if the maxtime changes, all rows are "updated" (refect the change) automatically, and anyone who wants a different heuristic she thinks is better can write her own view that will run against data not corrupted by lying updates to support your heuristic. Oh, and the "succeeded" column becomes redundant and can be removed (or calculated in a view to support existing code: "case when endtime is null then false else true as succeeded")).

Take home lesson one: updates are for changed data, not for data transformations. Views are for data transformations.

Take home lesson ultimate: don't lie to your database, or it will lie to you.
posted by orthogonality at 1:43 PM on July 26, 2009 [5 favorites]


I'm sorry, re-reading your example data I see that "succeeded" is not (solely) derived from "endtime".
posted by orthogonality at 1:54 PM on July 26, 2009


« Older What's wrong with my cat, and should I take him to...   |   We need a bulldog minus the 70 pounds of drool and... Newer »
This thread is closed to new comments.