Recurring computer crashes
October 6, 2010 7:02 AM

Weird recurring computer problems - gradual data corruption? Seems like we've tried everything...

Ever since my brother moved out to his new house, he's been experiencing frustrating computer issues. At this point it seems like we've tried everything and even though I'm the computer guy, this one has me stumped.

What happens is that his computer starts crashing more and more often. At first he may get a day or two without any issues, then one day it'll suddenly reboot, the next day he may get a couple of blue screens, and it gets worse and worse until Windows refuses to boot anymore. Once that happens, we reformat the hard drive, reinstall Windows, and repeat the experience. The blue screens change from time to time: I saw a BSOD in the mouse driver, I saw stop errors, and this latest one is about a thread trying to release a resource it didn't own.

So far, we've replaced the motherboard, the CPU, the memory, the hard drive, the power supply and the DVD burner. The only part left which hasn't been replaced is essentially the video card. At some point I learned that the wiring in his new house didn't have a ground plug, so I told him to get that fixed. So now he's properly grounded. He was going through a UPS, I had him bypass that for now. The symptoms are still the same. He had win XP, we went to win 7 64-bit. Same pattern.

After replacing everything but the power supply, we ran memtest86+ for 18 hours and it encountered no errors.

I could ask him to shell out a couple hundred for a new video card, but, really? Would a video card do this? I guess I could have him run checkdisk and do a thorough bad-sector identification run, but I gave him one of my old hard drives so it seems unlikely to be the issue.

He is running out of patience, and I am running out of ideas. It seems unreal that he would be unable to have a properly functioning computer because of these issues, but we've essentially replaced everything and the problems remain.

Any bright ideas? Diagnostics we can try to isolate the issue? Please hope us!
posted by splice to Computers & Internet (19 answers total)
 
Try bringing it to another house, leaving it on for a few days, seeing if the issue persists.

If it does, then you have eliminated everything but the video card, so it would likely be time to replace that.
posted by FAMOUS MONSTER at 7:49 AM on October 6, 2010


Alternately, you could try running a system from a boot disk, perhaps an Ubuntu live CD. That might help narrow down if there are any problems with the hard drive, since you wouldn't be using it for anything. Of course, if it takes a few days for the problem to manifest, that might be too long to use an unfamiliar, non-persistent system if your brother needs to get work done. He could still surf the web and open/save files on a USB flash drive.
posted by The Winsome Parker Lewis at 8:02 AM on October 6, 2010


Seconding the Ubuntu CD. Linux fails more verbosely, and a good parallel kernel compile usually brings out any errors in the hardware. Also, I think it comes with memtest.
posted by themel at 8:37 AM on October 6, 2010


You say you replaced the power supply (I think), but did you upgrade it? If the system's peak utilization outstrips the power supply's capabilities, I'd expect random crashes as you describe.
posted by scatter gather at 8:51 AM on October 6, 2010 [1 favorite]
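
The headroom question can be sketched as a rough power budget. Every wattage below is a placeholder assumption, not a measurement from this machine, and the 20% safety margin is likewise an assumed derating. Real figures would come from the parts' datasheets:

```python
# Rough PSU headroom check. All wattages are hypothetical placeholders.

def psu_headroom(psu_watts, component_watts, safety_margin=0.2):
    """Return (peak_draw, usable_capacity, ok).

    safety_margin is an assumed derating: 0.2 means only 80% of the
    PSU's rated output is treated as usable.
    """
    peak_draw = sum(component_watts.values())
    usable = psu_watts * (1.0 - safety_margin)
    return peak_draw, usable, peak_draw <= usable


if __name__ == "__main__":
    parts = {"cpu": 95, "video_card": 150, "motherboard": 50,
             "drives": 20, "fans_and_usb": 15}
    peak, usable, ok = psu_headroom(500, parts)
    print(f"peak {peak} W, usable {usable:.0f} W, ok={ok}")
```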


I saw behaviour like this when I moved to my current apartment. All of the problems stopped when I got a ups with power conditioning. If the line voltage in your building is unreliable you will get crashes and bluescreens even when there is nothing wrong with the hardware.
posted by Zetetics at 9:09 AM on October 6, 2010 [1 favorite]


Response by poster: Scatter gather, yeah, it was upgraded as well as replaced. Power utilization is a non-issue given that the system worked fine where he was before moving (afaik). At any rate if it was an issue, it wouldn't be anymore.

He was using a UPS with power conditioning. I thought perhaps it may be defective, so he's off of it for now, but with no change either way. I'll probably ask him to go back onto it. But with an 18-hour memtest and no errors, I don't know if this is PSU related; you'd expect spikes that would cause data corruption to affect the memory tests somehow, no?

Linux and moving the computer to a different house are interesting options, but a bit complicated because of the logistics. He would have to use the computer regularly (I doubt not using it will result in any problem either way), but with Linux he's unlikely to play Starcraft 2, and I'm unsure about Minecraft with WINE. Moving the computer to another house would mean he'd have to go there periodically for a week or so to see if the problem reproduces itself under normal use.

If there was something like memtest that could diagnose different hardware parts that'd be swell, so I could run PSUTest and see, then VideoCardTest, etc.
posted by splice at 9:31 AM on October 6, 2010


I agree with Zetetics.
The house wiring might be foo. I had a problem when I first plugged my UPS in and it failed a grounding test. Had an electrician in to repair the wiring problem. That would affect the machine, whether it was plugged directly into the wall or via a UPS.

Also have you considered that the router has been hacked, which means that the machine will get infected very quickly, even when you re-install from scratch? Pay particular attention to the DNS addresses. Use OpenDNS to be safe. Or you can use Google's.

Also check for BIOS rootkits. They are no longer theoretical. Google that term.

I would also check any thumb drives or CDs or DVDs for malware. I assume your brother is recovering his user data after re-installing the OS. If the backup media is infected, you have a problem.

Good luck.
posted by PickeringPete at 9:39 AM on October 6, 2010


This might be a dumb question but did your brother plug his computer's power cord into a filtered socket on the UPS? Some UPS devices have filtered and non-filtered sockets. You might want to double check.
posted by PickeringPete at 9:45 AM on October 6, 2010


This set of facts makes me suspect that the problems are power-related. As you point out, that's hard to test for as problems will likely happen under heavy load, perhaps combined with external power factors. Running only memtest should not stress the PSU. What model of power supply and UPS is he using? Not all UPSs will compensate for all power issues.

Btw, minecraft is written in Java - no WINE required
posted by Zetetics at 9:46 AM on October 6, 2010


I tend to agree with Zetetics. It has not happened to me but apparently bad cords can cause trouble. Try another power cord from his computer to the UPS if all else fails.
posted by PickeringPete at 9:49 AM on October 6, 2010


Response by poster: I don't believe the UPS has non-filtered sockets. The model is an APC Back-UPS ES 500. It has 4 inputs for surge protection only, and 4 for battery + surge protection. He was hooking into the battery + surge protect hookups. Is there any way to test the UPS's functionality? Haven't used powerchute in a while, would it detect issues with the line conditioning?

He's not copying old user data over. Everything he installs after a Windows reinstall is from downloads, not old files left around. He doesn't install anything really suspicious (Adobe Reader, StarCraft, Java, Minecraft, video drivers, AVG Free, Chrome).

Since the PSU has been replaced, a bad cable is not going to be responsible. Cable was replaced along with the new PSU. Also, hard disk cable has been replaced as well.

I've been considering the possibility of an infected router. Seems rather remote but I guess it's possible. How do I check tho?

BIOS rootkits are crazy scary, and as far as I can see, no way to diagnose or detect them, and even flashing the BIOS might not correct the issue. Given that the mobo was replaced I would assume that's not the issue, unless he somehow got reinfected with the same rootkit and that rootkit could handle both BIOS versions...
posted by splice at 10:05 AM on October 6, 2010


While I only seconded the Linux suggestion for testing, not as an alternative, I can assure you that WINE runs StarCraft 2 perfectly; I have wasted a significant amount of time verifying that in recent weeks. Minecraft is a Java app and runs fine as well.
posted by themel at 10:21 AM on October 6, 2010


The Back-UPS ES 500 does not do what I think of as power conditioning. I guess that has become a useless term. If you have line voltage problems, that model will not help as it is just a battery and surge protector. You would want something to protect against under-voltage. Something from the APC Smart-UPS line would work and Belkin used to have less-expensive models that would also work.

If the motherboard is an ASUS model, they have utilities that can monitor and log supplied voltage and issue an alarm if it drops out of tolerance. ASUS PC Probe may be one of them. Other manufacturers may have similar utilities.
posted by Zetetics at 11:17 AM on October 6, 2010
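
If the monitoring utility can export its readings, the "out of tolerance" check is easy to sketch. The ±5% band below is the commonly quoted ATX tolerance for the positive rails. The sample format (timestamp, rail, volts) is an assumption for illustration, not PC Probe's actual log format:

```python
# Flag voltage samples outside a tolerance band around each rail's nominal value.
# The (timestamp, rail, volts) sample format is assumed, not a real log format.

NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}


def out_of_tolerance(samples, tolerance=0.05):
    """Return the samples whose voltage deviates from nominal by more
    than `tolerance` (a fraction, so 0.05 means 5%)."""
    bad = []
    for timestamp, rail, volts in samples:
        nominal = NOMINAL_RAILS[rail]
        if abs(volts - nominal) > nominal * tolerance:
            bad.append((timestamp, rail, volts))
    return bad


if __name__ == "__main__":
    log = [
        ("10:00", "+12V", 12.1),
        ("10:01", "+12V", 11.2),   # sag beyond 5%
        ("10:02", "+5V", 5.2),
        ("10:03", "+3.3V", 3.0),   # sag beyond 5%
    ]
    for entry in out_of_tolerance(log):
        print("out of tolerance:", entry)
```

Software sensors sample slowly, so very brief dips can slip past a check like this even when they are enough to crash the machine.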


Sorry, missed that you had replaced the mobo. Therefore it should not be a BIOS rootkit.

If he is not copying over user data then he is not re-infecting his machine with a previous infection. This was a stretch anyway, since the problems seemed to coincide with your brother's move. So 2 things happened: (1) there was a physical move, so perhaps there was some shaking up involved, and (2) the power system is different, as is the internet hookup.

(1) You have replaced just about everything, so I would at least re-seat the video card to be thorough. Since you have invested a lot of time in this, if you wish to exhaust the possibilities, I would disconnect all nonessential peripherals (USB hubs, printers, scanners, etc.) in case they are causing trouble. Boot in safe mode and see if the system is stable. Since the machine worked fine before, I would not think that the video card is drawing too much power. If that were the case, the machine should have been acting up before the move.

(2) You might ask the ISP to check out the cable modem or DSL modem. In my own case, the cable company warns that splitting the cable and using a TV signal booster will cause trouble. So maybe have it checked out just to rule it out.

To answer your question, the commonest thing with routers is that the DNS is changed by hackers. This was one of the more recent exploits. Quite a number of routers were vulnerable. Hackers change the DNS so that they get first dibs on the traffic and can set up a man-in-the-middle exploit; they can also trigger a drive-by download of malware.
Log into the router with your browser (probably something like 192.168.0.1 or 192.168.1.1). Navigate to the section that specifies the DNS servers. Make sure that they are valid servers. You can put the OpenDNS servers in there for safety in any case. If the DNS servers are good and the router is locked down, then I would assume that it is likely OK and not the problem. I am assuming that your brother is using a wireless connection. If he is using a LAN cable to connect to his router, that can be damaged and I have had weird problems in the past with that.
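
Once the DNS addresses have been read off the router's settings page, checking them against known resolvers can be sketched like this. The OpenDNS and Google Public DNS addresses are the published ones. The function name and report labels are made up for illustration, and an ISP's own servers will show up as unrecognized unless you pass them in:

```python
import ipaddress

# Published addresses for OpenDNS and Google Public DNS.
KNOWN_RESOLVERS = {
    "208.67.222.222": "OpenDNS",
    "208.67.220.220": "OpenDNS",
    "8.8.8.8": "Google Public DNS",
    "8.8.4.4": "Google Public DNS",
}


def review_dns(servers, isp_servers=()):
    """Label each configured DNS server as known, ISP-supplied, or unrecognized.

    isp_servers is whatever list the ISP publishes; it is trusted as given.
    Raises ValueError if an entry is not a valid IP address.
    """
    trusted_isp = {str(ipaddress.ip_address(s.strip())) for s in isp_servers}
    report = {}
    for server in servers:
        addr = str(ipaddress.ip_address(server.strip()))
        if addr in KNOWN_RESOLVERS:
            report[addr] = KNOWN_RESOLVERS[addr]
        elif addr in trusted_isp:
            report[addr] = "ISP (as supplied by caller)"
        else:
            report[addr] = "UNRECOGNIZED"
    return report


if __name__ == "__main__":
    # 203.0.113.53 is a documentation-range address used as a stand-in.
    print(review_dns(["208.67.222.222", "203.0.113.53"]))
```

An unrecognized entry is not proof of tampering. It only means the address should be looked up before trusting it.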

If you wish to be thorough, you should make sure that you have security on the router locked down. You do not want remote management enabled. UPnP is a problem for some routers as well. Obviously, it should be password protected and you probably do not want the default name; rename it. You can get the latest firmware from the vendor's site and install it. You probably want to check the manual on how to do that. It only takes a couple of minutes to do and often involves TFTP.


I still tend to agree with Zetetics about the power system but am trying to think out of the box.
In my own case, the type of UPS that I first used had a warning light for wiring faults. The electrician used a polarity/grounding test tool. You might be able to get one at an electronics supply store. Sorry, but I used an electrician.
posted by PickeringPete at 11:34 AM on October 6, 2010


Response by poster: These latest details about the UPS perhaps not doing everything it's supposed to may just be it. I was dismissing power issues based on the UPS doing what it should. Will check that out.

Will check the router out as well. If it's just about DNS servers being changed, etc, I can figure that out. At any rate if the latest diagnostic options (moving it to another house temporarily, off the router and modem it was using) result in a working system, I will wager that power issues are it. I do believe the mobo is an ASUS, we'll see about the voltage warnings.

Thanks for all the suggestions, and keep 'em coming if you have any more ideas. I'll be updating this as troubleshooting progresses.
posted by splice at 12:21 PM on October 6, 2010


I've encountered similar problems twice before. Here were the causes for mine:
1. The ribbon on the PATA cable attached to my primary hard drive was slightly torn, causing intermittent read/write and OS failures. I'm guessing you're probably using SATA cables, but it might not hurt to switch them out and see what happens.

2. My parents were having computer troubles similar to this. They had just gotten a new dog and I didn't notice that it had chewed on a USB line coming from the front of the computer case. I kept scratching my head because I had built this computer for them just about a month before. Finally I noticed the lesions in the USB cord, unplugged it, rebooted and everything was fine. My guess is that when a powered USB cable shorts (even one that isn't attached to anything else, like this one was) it can cause the system to intermittently reboot or not boot at all.

In any case maybe it's time to give all of your cables some extra scrutiny.
posted by coolxcool=rad at 12:24 PM on October 6, 2010


Response by poster: Just some quick updates...

The video card won't need reseating. We've switched the motherboard since the move, so it already has been reseated (into the new motherboard).

We switched the hard drive cables, actually all cables have been switched (PSU cables in and out, no other USB devices). Perhaps the only cable left is network.

I had him run a thorough checkdisk, no new bad sectors identified, no corruption found on disk.

He's currently running another fresh install of Win 7. I got him to install ASUS PC Probe, and he will monitor for voltage drops over the next few days.

Checked out his DNS settings, they're fine (provider's DNS).
posted by splice at 8:30 AM on October 7, 2010


Response by poster: Asus PC Probe didn't detect any funky voltages or temperatures. The problem remains.

We replaced the video card. The problem remains.

At this point all of the computer save the case has been replaced. I'm getting pretty stumped.
posted by splice at 1:28 PM on October 13, 2010


I would also like to suggest that you back up your data NOW.
posted by drstein at 2:17 PM on October 13, 2010


This thread is closed to new comments.