How to pinpoint software culprit?
November 22, 2005 7:18 AM Subscribe
My computer is freezing intermittently. My gut says it's a software problem. Is there some sort of "black box" program to help figure out which program is the culprit?
For instance, a program which monitors the start up of new processes or applications, and records these events, or which monitors file accesses and records them in a log.
It should not bother keeping stuff over a certain period of time, since I'm only interested in what happens in the few seconds / minutes prior to the freeze. For instance, once I had an indexing program which used to hang the entire PC when it accessed a certain file to index it. I could never track down which file though (just knew it was the same one every time).
A solution to this problem would kill two birds with one stone: first -- find the offending program that's causing today's freezing; second -- allow me to re-start using the offending indexing program (which is now available as a plug-in to Google Desktop Search, but which still hangs the PC on the same files it always did).
Why not verify your gut feeling? It's easy:
First, download and burn a bootable recovery CD such as this one.
Now boot up with that and run some of the included test / burn-in utilities, such as MemTest86+.
If your system passes, then your gut reaction was right.
posted by dudeman at 9:54 AM on November 22, 2005
If you have friends that work in the software industry, one of them might have access to Appsight. We use it here to troubleshoot all kinds of crazy software/OS problems...
posted by starscream at 10:02 AM on November 22, 2005
To use the Event Viewer, right-click the My Computer icon and choose Manage. The event log is divided into System, Security & Application logs, and is the 1st place I look.
Ctrl+Alt+Del gives you the security panel & you can choose Task Manager, then choose the Processes tab to see all running processes.
You can download Process Explorer from Sysinternals.com.
This behavior may indicate a virus or spyware, so install/run Spybot & Ad-aware and make sure your antivirus software is up to date.
posted by theora55 at 10:14 AM on November 22, 2005
Response by poster: OK -- thanks for the suggestions so far.
Event Viewer was first port of call. There is nothing in any of the event logs to indicate anything other than an approximate time for the freeze (which to date has always happened when the computer's unattended).
Spybot / Ad-aware, etc. all report the system as clean.
dudeman's suggestion is very worthwhile -- I shouldn't be laboring under a misapprehension if it's not the software. I will, however, have to wait a day or two for the time to do this. I will report back when I have the results.
I already have Process Explorer in place, and use it mostly for tracking locked files -- a perennial problem with M$ applications (including M$Office and the MS Desktop Indexing tool...).
However, I need to figure out how to use it as a kind of rolling black-box recorder to track stuff in a log for the previous x minutes. (A log that I can inspect easily upon reboot, without it getting immediately overwritten). Hints on this are welcome.
starscream's getting the idea of what I'm looking for -- but something I can run indefinitely on a home computer. Appsight seems to have the breadth of thought -- i.e., what happened just before the freeze, in terms of processes, file handles, dialogs, application start / stops, etc.
Do any mefites know of something like Appsight, but free / affordable for home use?
posted by blue_wardrobe at 10:35 AM on November 22, 2005
If you haven't tried dudeman's suggestion of running Memtest, give it a shot (it can take a while, depending on how much RAM you have). From your description, bad RAM is my gut feeling.
posted by PurplePorpoise at 11:29 AM on November 22, 2005
Regmon and Filemon from sysinternals.com are probably what the others were implying.
However, if the PC freezes there really is no way to inspect their output. You might be able to get them to log to a file but I'm pretty sure that file would remain open and not flushed when the freeze happened.
Frankly though, it seems to me that if something is freezing the PC to the point where you can't Alt-Tab or anything, then that is a lot more than just "some program." Normally a regular program running under a recent NT-based Windows (such as XP or 2k) should not be able to do anything that takes the whole system down or makes it unresponsive. Only the UI of that program should be frozen - you should be able to switch to the Task Manager with Ctrl-Shift-Esc instantly (since taskman runs at a higher than normal priority) and kill the offending task.
If indeed the whole PC is frozen to the point where you can't do any of these things, then it suggests you have either a faulty kernel driver or faulty hardware, and I would most certainly try to rule those out first before going any further.
posted by Rhomboid at 11:52 AM on November 22, 2005
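For what it's worth, the "rolling black box" idea and the flushing worry above can be combined in a few lines of script. What follows is only a rough sketch in Python, not a substitute for Filemon or Appsight: the names `BlackBoxLog`, `diff_snapshots` and `watch` are made up for illustration, and `list_processes` is a hypothetical callable you would have to supply yourself (anything that returns the names of the running processes). It keeps only the last few minutes of events, forces each update out to disk with `os.fsync` so a freeze should lose little, and sets the previous run's log aside at startup so a reboot does not overwrite it. Note that `os.fsync` only asks the OS to write the data out; a drive with its own write cache may still lose the last entry.

```python
import os
import time
from collections import deque


class BlackBoxLog:
    """Keep only the last `window_seconds` of events, forced to disk on every write."""

    def __init__(self, path, window_seconds=300):
        self.path = path
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, message) pairs, oldest first
        # Preserve the log from before the freeze/reboot instead of overwriting it.
        if os.path.exists(path):
            os.replace(path, path + ".prev")

    def record(self, message, now=None):
        now = time.time() if now is None else now
        self.events.append((now, message))
        # Drop anything that has fallen out of the time window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
        self._write()

    def _write(self):
        # Write to a temp file, force it to disk, then swap it into place,
        # so the log on disk is always a complete, closed file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for ts, msg in self.events:
                f.write(f"{ts:.0f}\t{msg}\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def diff_snapshots(previous, current):
    """Return (started, stopped) process names between two snapshots (sets)."""
    return sorted(current - previous), sorted(previous - current)


def watch(log, list_processes, interval_s=5, cycles=None):
    """Poll `list_processes()` and record starts/stops; runs forever if cycles is None."""
    previous = set(list_processes())
    done = 0
    while cycles is None or done < cycles:
        time.sleep(interval_s)
        current = set(list_processes())
        started, stopped = diff_snapshots(previous, current)
        for name in started:
            log.record(f"START {name}")
        for name in stopped:
            log.record(f"STOP {name}")
        previous = current
        done += 1
```

After a freeze and reboot, the file ending in `.prev` holds whatever was recorded in the last window before the machine stopped. Polling will miss processes that start and exit between two snapshots, so this is coarser than a real tracing tool.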
How often does the hang happen? Does everything just freeze, or is it a reboot or bluescreen? Perhaps you could boot a Linux CD and let the computer idle for a while to see if it will freeze. If it doesn't, that would help strengthen the case for it being software. Also, you could try booting in safe mode to see if it still hangs.
I agree with PurplePorpoise: bad RAM or some device issue. Have you reseated everything?
posted by jockc at 11:53 AM on November 22, 2005
If we're talking a true freeze -- mouse stops, keyboard doesn't work (and you can't change the Caps/Num Lock lights), it just stops until you reset it, nothing in the logs -- it's almost certainly hardware, not software.
It's trivial for software to kill a machine, but modern machines have exception handlers, even last-resort ones like the Unix kernel panic and the Windows STOP error. It's very hard with modern multitasking machines to just vaporlock the machine to the point that the exception handlers don't even get a chance to play -- via software. With hardware, though, it's really easy -- and it's quite frequently heat related as well. The other two big freeze candidates are the memory controller (read, need motherboard) and a really flaky power supply shutting down the CPU (read, need power supply.)
Most other hardware failures don't cause a true freeze, though a bad video card can look like it did -- the tell is that you can cycle the numlock light, or you can ping the machine from elsewhere on the network.
Once again, we're talking a true, complete freeze. If something's getting written to the logs, that's not a freeze. If the Caps/Num/Scroll Lock lights work, it's not a hardware freeze. If you can ping the machine, it's not hardware.
Caveat: This applies to desktop and consumer grade PCs. Workstations and servers have more monitoring gear onboard that may well respond when the CPU is stopped -- but in such cases, they also *tell* you the problem. Most people aren't spending that kind of money, though.
posted by eriko at 1:53 PM on November 22, 2005
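Since the freezes happen while the machine is unattended, the ping test above could be left running from a second machine on the network. Below is a rough sketch in Python under a few assumptions: the function names are made up for illustration, it shells out to the system `ping` command (`-n 1` on Windows, `-c 1` elsewhere), and it treats the exit code as a rough signal only, since some versions of `ping` report success even when the reply is "destination unreachable". A firewall that drops pings on the target would also make it look dead.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if the system `ping` command reports one answered echo."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def monitor(host, interval_s=30, cycles=10, probe=ping_once):
    """Probe `host` `cycles` times and return (timestamp, answered) pairs."""
    samples = []
    for _ in range(cycles):
        samples.append((time.time(), probe(host)))
        time.sleep(interval_s)
    return samples


def first_silence(samples):
    """Return the timestamp where the final unbroken run of failures began, or None."""
    start = None
    for ts, answered in samples:
        if answered:
            start = None  # the machine answered again, so forget earlier failures
        elif start is None:
            start = ts
    return start
```

If the frozen machine keeps answering, that points away from a true hardware freeze by the reasoning above; if it goes silent, `first_silence` gives a tighter estimate of when the freeze happened than the event log does.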
Response by poster: OK -- so two things to think about before I take my next steps:
1) I know of a piece of software (PaperPort) whose indexer (which does OCR on the fly) freezes my machine when it encounters certain .MAX files -- same point in the indexing cycle every time. (The company that makes this also makes a plugin to GDS, which also freezes my machine reliably). The symptoms of these freezes are what eriko suggests are very hard to produce in software -- and I don't disagree with him in principle. Still -- the evidence is there: mouse stops, caps lock won't work, Ctrl-Alt-Del won't work, etc.
This is the software I referred to earlier. With the indexer disabled, I get no freezes (until now, that is). This is how I know that software can (sometimes) cause total freezes. On this issue, if I could identify exactly which files, I could convert them to PDF and solve the problem, and re-enable the indexer.
This however, is not running at present, so I am leaning more towards hardware, based on the responses so far.
2) Upon inspection, the PSU is underrated for the machine when I take into account a year-old ATI RADEON graphics card. I suppose it is incumbent upon me to find a suitable PSU replacement before I take this too much further.
However, when I get the new PSU in place, I will still need to tackle item 1) again. The indexer is something I really want to get running, since I have a ton of scanned documents.
I will post results when I have them.
posted by blue_wardrobe at 5:16 PM on November 22, 2005
If running as administrator, then the software could elevate its process priority to something higher than even the window manager or Explorer in which case if one of its threads grabbed the CPU for an extended period of time, it would appear to lock the system completely because all of the other UI threads would starve. Try running as a regular user. Or have process explorer running in the background at the same time, and set its priority as high as you can so that you can try switching to it during the hang. You can also try reducing the priority of the app that is known to cause the lockup.
Running as administrator, the process can even install services and/or drivers in realtime (without needing a reboot or anything) and the latter can run in kernel mode. Essentially, if you run a program as administrator it can do anything it wants, and this might cause the appearance of a hard lockup (when in fact it's just starving all UI threads of the CPU.) So try running it as a normal user, not an administrator.
posted by Rhomboid at 1:53 AM on November 23, 2005
"Running as administrator, the process can even install services and/or drivers in realtime (without needing a reboot or anything) and the latter can run in kernel mode."
It would have to be very seriously miswritten to stop a ctrl-alt-delete from responding, and *not* throw a blue screen.
Still, worth a try, given the amount of effort required to try it. There's lots of "fixes" -- unplug and replug all the cables, cards and RAM is the classic example -- that rarely fix a real problem, but it only takes five minutes, so it's worth trying. Swapping a motherboard takes time and a motherboard, reseating everything doesn't.
So, yeah, I'd try it, but even at 100% kernel mode CPU load, certain things work -- Caps/Num Lock are hardware based and CTRL-ALT-DELETE is the lowest NMI you can reach in software.
The nice thing is blue_wardrobe has a test case (which argues strongly against heat, btw -- heat lockups take time, if you can run FOO and lock it, it's not time.) I'm still betting on hardware, but the test case does argue, somewhat convincingly, for software.
The fast test here, if you have two nearly identical machines, is to swap hard drives and see if it repeats. If it does, you've eliminated bad hardware, and the chance of it being software has just gone up tremendously. (Note, this wouldn't eliminate buggy hardware, since the bug would be in both machines, but it would eliminate bad hardware, unless both were bad in the same way.)
posted by eriko at 6:09 AM on November 23, 2005
Response by poster: UPDATE: Problem resolved (I think)
I know it's been a while, but I was saving up for a new PSU.
I normally follow the maxim of "only change one thing at a time", but that didn't happen because...
I got to CompUSA, and was at the checkout with my new 450W PSU (nothing like overkill), when my wife called on my cellphone to say that the PC was making the most awful buzzing noise. It was still running.
When I got home with the new PSU, the buzzing noise had stopped, but so had the PC, with a "CPU Fan Malfunction".
Upon inspection, the CPU Fan was totally clogged with dust, so I removed it, vacuumed it, vacuumed the whole innards of the PC while I was at it. Ended up removing and reseating all the cards, air ducts, cables, etc.
Ah well, the thing's in pieces, may as well replace the PSU now. Oh, and upgrade my USB 1.1 2-port card to USB 2.0 4-port card. After all, the new card's been sitting staring at me for six months waiting for my round tuit to arrive.
The cleaned out fan ran perfectly out of the case, but stopped when placed back in place. This was repeatable. Hmmm. Went back to CompUSA to buy a fan. No fans of the right size available.
Took an old fan out of another similar beast and put it in. OK, this worked. So does the similar beast. My son pointed out that that fan hadn't even been connected to the power header on the mobo. Not for at least two years. Go figure.
Of course, neither machine has frozen since.
Thanks all you hardware pundits. I think you were right, but as to which component...
posted by blue_wardrobe at 1:06 PM on January 5, 2006
This thread is closed to new comments.
For more information, see the MS technet article here.
posted by richardhay at 8:54 AM on November 22, 2005