What is causing my new PC build to BSOD?
January 25, 2010 5:56 PM Subscribe
I built my own PC for the first time, and everything started out alright: the computer booted and Windows 7 installed with no problems. But ever since then I get about 2-5 minutes of playing around with my computer before it BSODs...
Longer version: The day I installed Win7 I had no problems, and was able to use my computer for 2+ hours, updating drivers, installing programs, etc. The next day though I got repeated BSODs shortly after starting the computer. This has been a repeated theme, where I will try something and it will work the first time, but later when I try my computer I get repeated BSODs. Sometimes it will be stable enough for me to fiddle around for longer, but usually not. A few things I've tried:
- Reinstalled Windows 7, and let it update all the drivers rather than installing them from disks myself.
- Installed Windows XP (installs fine, but as with Win7 it BSODs 2-5 minutes after logging into Windows)
- Upgraded the power supply from 380W to 650W
- Pulled out the video card and put in the video card from my old computer. (Ran fine the first time I booted, BSOD 2-5min afterwards)
- Ran in Safe Mode (I get a bit more time to play around, 10-15min, but it eventually BSODs)
- MEMTEST86+ ran overnight (10 hours, 9 passes) with no errors. Ran each stick of RAM individually overnight with no errors as well (10 hours)
- Analyzed the BSOD errors. They vary: PFN_LIST_CORRUPT is the most common, but I also get IRQL_NOT_LESS_OR_EQUAL and MEMORY_MANAGEMENT. When I looked at them with the Windows debugger, each crash was "probably caused by" something different:
ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+35ed2 )
win32k.sys ( win32k+c3ff8 )
win32k.sys ( win32k+c3ff8 )
memory_corruption ( nt!MiBadShareCount+4c )
memory_corruption ( nt!MiBadShareCount+4c )
ntkrnlmp.exe ( nt!CmpKcbCacheLookup+1dd )
cdrom.sys ( cdrom!DeviceSendRequestSynchronously+db )
The build is:
MB GIGABYTE | GA-MA770T-UD3P AMD770
CPU AMD|ATH II X4 630 AM3 RT
MEM 2Gx2|GSKILL F3-10666CL9D-4GBRL
VGA GIGABYTE|GV-R489OC-1GD 4890 RT
HD 1T|WD 7K 32M SATA2 WD1001FALS
PSU ANTEC EA650W
I'm guessing this is some kind of hardware problem, but I've never troubleshot anything like this, and I'm not sure how to proceed. Does the MEMTEST 100% eliminate the chance of faulty RAM? Is there a way I can test to see if my motherboard is bad?
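One rough way to read the debugger output in the question is to count how often each module gets the blame. The sketch below does that for the seven entries quoted there. The function name `tally_modules` is made up for illustration, and the idea that blame scattered across unrelated modules hints at hardware rather than one bad driver is a rule of thumb, not a diagnosis.

```python
from collections import Counter
import re

# "Probably caused by" entries copied from the debugger output in the question.
blamed = [
    "ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+35ed2 )",
    "win32k.sys ( win32k+c3ff8 )",
    "win32k.sys ( win32k+c3ff8 )",
    "memory_corruption ( nt!MiBadShareCount+4c )",
    "memory_corruption ( nt!MiBadShareCount+4c )",
    "ntkrnlmp.exe ( nt!CmpKcbCacheLookup+1dd )",
    "cdrom.sys ( cdrom!DeviceSendRequestSynchronously+db )",
]


def tally_modules(lines):
    """Count how often each module (the text before the first '(') is blamed."""
    names = [re.match(r"\s*([^\s(]+)", line).group(1) for line in lines]
    return Counter(names)


counts = tally_modules(blamed)
for module, n in counts.most_common():
    print(f"{module}: {n}")
```

Four different names across seven crashes, with no single driver dominating, fits the asker's guess that the fault sits below the driver level.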
Death after a few minutes is often a sign of CPU overheating. If this is the first time you've built a PC, you might not have got the heat sink paste application quite right. Immediately after a BSOD, what does the PC Health section of your BIOS setup menu show you for CPU temperature?
posted by flabdablet at 6:05 PM on January 25, 2010 [2 favorites]
Also: you're not overclocking this beast at all, are you?
posted by flabdablet at 6:06 PM on January 25, 2010
Seconding flabdablet. Make sure you've got fans installed and watch the BIOS screen for fan speed and temperature.
You can use a motherboard monitoring tool to keep an eye on this on an ongoing basis from Windows.
posted by Mr. Gunn at 6:14 PM on January 25, 2010
Check the temp on your video card; if you're able to go through 10+ hours on memtest, the card may be getting cooked. (memtest doesn't drive the card very hard). If you can swap out the card with another one for a little while, it may help you pinpoint the issue.
posted by jenkinsEar at 6:15 PM on January 25, 2010
Get speedfan or a similar program and check your temperatures in Windows, or check them on the bios screen if your bios supports that. I suspect it's a heating issue. You applied thermal grease appropriately and are using a good heatsink/fan, right? When you ran the machine the fans/ventilation on the case were functioning and appropriately plugged in to the motherboard or PSU?
posted by drpynchon at 6:55 PM on January 25, 2010
Try taking the side off, and putting a house fan up to it. Run it and see if it BSODs again. If it doesn't, then you have a cooling problem.
posted by DeltaForce at 6:56 PM on January 25, 2010
Grab a Linux distro and install it as a dual boot system. Then try using that for a while. If it's a heating problem (which is certainly possible), it should replicate in Linux exactly the way it happens in Windows 7.
Unfortunately this behavior (while typical of heating problems) could also be a bad motherboard.
posted by oddman at 7:24 PM on January 25, 2010
Response by poster: OK, I'm running SpeedFan and something is way too hot. It lists Temp1, Temp2, Temp3 and Core. They are all around 35C, except Temp3 which is 80C. I'm guessing this is one of the four cores on my CPU?
The AMD processor came with a heatsink with thermal paste already applied, so I just stuck it on there after putting the chip in the motherboard. Is applying my own thermal paste suggested?
The processor fan, videocard fan, and case fan are all running fine, and the case has been open pretty much the whole time since I've built the PC.
posted by Endure You Are Not Alone at 7:39 PM on January 25, 2010
Response by poster: Oh, and I'm not overclocking anything.
posted by Endure You Are Not Alone at 7:41 PM on January 25, 2010
It may be overheating, or it may be that one of the temperature sensors isn't connected to anything.
It would be very difficult for one core to be 176F and one only a few fractions of an inch away from it to be 98F. That's a pretty big gap.
Still, I don't have any suggestions for you except what's mentioned above; blow lots of air over everything (and making sure that the heatsink is attached to the CPU properly wouldn't hurt ... buy some good heatsink grease like Arctic Silver to put on there in case there are gaps), see if that solves the problem.
Using Linux to eliminate software problems, and then running diagnostics that stress various subsystems (memory, which you've already tested; CPU; GPU; IO) until it crashes will let you narrow things down.
posted by Kadin2048 at 9:00 PM on January 25, 2010
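For anyone checking the Fahrenheit figures in the comment above against the SpeedFan readings, here is a minimal conversion sketch. The helper name `c_to_f` is just for illustration, and the 35 stands in for the "around 35C" readings, so it is approximate.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Readings reported from SpeedFan earlier in the thread.
for label, temp_c in [("Temp3", 80), ("other sensors (approx.)", 35)]:
    print(f"{label}: {temp_c} C = {c_to_f(temp_c):.0f} F")
```

80C works out to 176F and 35C to 95F, so the gap between the odd sensor and the rest is roughly 80 Fahrenheit degrees.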
I'd suspect that that temp reflects something other than the cores -- probably the southbridge. That seems a bit hot for a few minutes of startup and idling, but I think southbridges can often run pretty hot.
posted by drpynchon at 9:48 PM on January 25, 2010
Best answer: Both times I had crashing problems with new builds, it turned out to be a RAM compatibility issue. The memory itself was fine, the BIOS just wasn't detecting the proper settings for it and was running it out of spec. Interestingly the problem didn't show up on memtest86+
To rule out RAM compatibility, there are a few things you can do. First, check to see if your RAM is on the supported list for your motherboard. Second, update to the latest version of the BIOS. Third, look around on the web for some conservative timings for the memory you have. Fourth, try a different make/model of memory.
posted by Good Brain at 11:46 PM on January 25, 2010
I'm guessing this is one of the four cores on my CPU?
No, there is only one sensor for the entire CPU die. The others are usually sensors placed at various points throughout the motherboard. It's not uncommon to get really wild readings for a temperature sensor if the program hasn't detected the hardware correctly; Temp3 might not be connected to anything, for example. You should read the temperatures from the BIOS at boot, because those are usually correct for the specifics of the board, and then match up the numbers there with what the software shows you -- in a lot of cases the labels are wrong or need adjusting.
posted by Rhomboid at 4:01 AM on January 26, 2010
My first thought was heat, too. But if you can run Memtest all night without it crashing, probably not. Second thought is video card. You said you swapped the card; does it work better with the old one? Last thought is a software driver problem, again specifically the video drivers.
I like the suggestion of booting into Linux and trying it for an hour. That'll let you quickly sort out whether it's the hardware alone or not. No need to install the OS, just boot a Live CD.
posted by Nelson at 8:36 AM on January 26, 2010
Response by poster: You said you swapped the card; does it work better with the old one?
No, I had the same issues.
You should read the temperatures of the BIOS at boot because usually those are correct for the specifics of the board and then match up the numbers there with what the software shows you -- in a lot of cases the labels are wrong or need adjusting.
I checked the temperatures in the BIOS and it listed two, the processor and system temperatures, both of which were similar to what SpeedFan listed. I'll check the temperatures when the computer is cold and see if I still get that weird high temperature.
Grab a Linux distro and install it as a dual boot system. Then try using that for a while.
I'm downloading an ISO now, and will try it tonight.
posted by Endure You Are Not Alone at 12:40 PM on January 26, 2010
The memory itself was fine, the BIOS just wasn't detecting the proper settings for it and was running it out of spec.
I recently worked around an issue very much like that on an older Gigabyte mobo in which I had replaced the original 256MB DDR1 RAM stick with a pair of 512s. I don't know whether the original stick was CL2 or CL3; the new sticks are both CL3, and there are no BIOS options to force a CL setting. Possibly as a result of this, the mobo will not run this RAM at its rated DDR 400 speed, even though it is in fact quite capable of generating the appropriate clock for this; it autoselects DDR 333 instead.
At that speed, the mobo mostly works but suffers from occasional mysterious crashes. Also, if Memtest86+ is started after the board has already warmed up, and the Block Move test (test #5) is selected for repeated running, it reports buckets of errors as the address crosses the 511->512MB and 1023->0MB boundaries. Forcing DDR 302 instead of the auto-selected speed results in reliable operation.
If I'm right about the CL2/CL3 thing, then theoretically I should be running at 266 instead of 302 (2 clock periods at 266MHz are the same length as 3 at 400MHz) but what I've done works.
The AMD processor came with a heatsink with thermal paste already applied, so I just stuck it on there after putting the chip in the motherboard. Is applying my own thermal paste suggested?
Pre-gooped heatsinks often come with a bit of foil stuck over the goop to protect it during shipping. Every now and then, I have seen that foil still in there after taking a heatsink off a CPU, so the thermal path goes CPU -> air gap -> foil -> goop -> heatsink. I think the only way this works is because the goop allows the foil to conform to the face of the CPU well enough to make the air gap very small, but it's certainly not optimal.
A skilfully cleaned unscratched heatsink with the minimum required amount of skilfully applied high quality thermal paste between itself and the CPU will keep the CPU maybe two degrees cooler than the pre-applied stock goop. The stock goop is probably about two degrees better than a ham-fisted paste job. Stock goop + protective foil is about fifteen degrees worse.
posted by flabdablet at 8:29 PM on January 26, 2010
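The CL2/CL3 arithmetic in the comment above can be checked with a short calculation. This is only a sketch: `cas_latency_ns` is an illustrative helper, and it uses the real memory clock (half the advertised DDR data rate) rather than the data rate itself. The conclusion is the same either way, since both figures are halved together.

```python
def cas_latency_ns(cas_cycles, data_rate_mt_s):
    """Absolute CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the real clock in MHz is half
    the advertised data rate (DDR 400 runs a 200 MHz clock).
    """
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles / clock_mhz * 1000


print(f"CL3 at DDR 400:   {cas_latency_ns(3, 400):.2f} ns")
print(f"CL2 at DDR 266:   {cas_latency_ns(2, 266):.2f} ns")
print(f"CL9 at DDR3-1333: {cas_latency_ns(9, 1333):.2f} ns")  # the asker's sticks
```

The first two come out within a few hundredths of a nanosecond of each other, which is the equivalence the comment relies on. The third line is included only for comparison with the 1333 / 9-9-9-24 settings reported later in the thread.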
Best answer: I had a similar question not too long ago, although with different hardware. It turned out to be the RAM and memtest86+ didn't detect it.
I found something that would consistently give a failure: Prime95 (I think it was the In-place large FFTs stress test, but it might have been Blend). I would get errors in Prime95 almost immediately and IIRC, it would often BSOD. Once I had a reliable test, my plan was to test components one at a time. I tested the RAM first (partly because the Small FFTs test never gave me problems), and fortunately that was the culprit.
I'd recommend the same approach.
posted by chndrcks at 10:45 PM on January 26, 2010
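The approach in the comment above depends on having a workload whose results can be checked, so that a wrong answer shows up as a reported error instead of a random crash. The sketch below illustrates that general idea only. It is not Prime95 and no substitute for it, and the name `repeat_checksum` and the buffer size and pass count are arbitrary choices for illustration.

```python
import hashlib
import os


def repeat_checksum(size_bytes, passes):
    """Hash copies of one random buffer and count passes that disagree.

    On healthy hardware every pass yields the same digest, so a mismatch
    points at the CPU, the RAM, or the path between them.
    """
    buffer = os.urandom(size_bytes)
    reference = hashlib.sha256(buffer).hexdigest()
    mismatches = 0
    for _ in range(passes):
        # Copy first so the data is written to and read back from fresh memory.
        copy = bytes(bytearray(buffer))
        if hashlib.sha256(copy).hexdigest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # 16 MiB and 10 passes are arbitrary illustrative values.
    print("mismatching passes:", repeat_checksum(16 * 1024 * 1024, 10))
```

A real torture test pushes the hardware far harder than this does, so a clean run here says very little; a mismatch, on the other hand, would be worth taking seriously.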
Searching for GSKILL in customer reviews of your motherboard at newegg returns a few people who had timing issues, if that helps.
posted by chndrcks at 10:53 PM on January 26, 2010
Yeah, setting the memory specs manually might help.
What's surprising is that the install goes fine, but then the OS doesn't load. My experience with these issues is that the install freaks out. But that's neither here nor there.
Also, when you run the memory tests, run them with only one stick of memory at a time. I've had machines that were flaky with all their ram installed, because the bad spot on the ram didn't get used during boot and would only freak out when it tried to use that bad spot. Using one stick at a time forces the computer to try and use more of the memory.
posted by gjc at 8:09 AM on January 27, 2010
Response by poster: I like the suggestion of booting into Linux and trying it for an hour.
I ran Ubuntu off a CD for an hour, three separate times. No crashes, although I would always get some kind of kernel failure (it would be reported but would not crash the computer).
The memory itself was fine, the BIOS just wasn't detecting the proper settings for it and was running it out of spec.
Searching for GSKILL in customer reviews of your motherboard at newegg returns a few people who had timing issues, if that helps.
Also, when you run the memory tests, run them with only one stick of memory at a time.
I checked the BIOS settings, and the RAM was automatically detected with the correct speed (1333) and timings (9-9-9-24). I tried setting the RAM to run at a lower speed as suggested in one of the NewEgg reviews, but it didn't improve things. And I ran MEMTEST86+ with the individual sticks of RAM, each one overnight, with no errors. Sometimes, though, when I start up MEMTEST86+ it detects the speed of my RAM improperly. If I reboot and try again it's correct the second time.
I found something that would consistently give a failure: Prime95
During one of the times when Windows didn't crash after two minutes, I ran Prime95. When I did the CPU-only test it would run fine for 30-60 minutes. When I did the test that heavily uses both RAM and CPU, I would get errors after 5 minutes. But it wouldn't BSOD the computer.
I'm leaning towards either faulty RAM, or a brand of RAM that doesn't play so well with my system, but Linux running so well makes me hesitate. I couldn't figure out what was causing the kernel failure though, so who knows.
Thanks to everyone who's suggested things to try!
posted by Endure You Are Not Alone at 12:37 PM on January 29, 2010
No crashes, although I would always get some kind of kernel failure ... Linux running so well
Something's wrong with the hardware. Linux systems should never report any sort of kernel errors while running.
posted by Nelson at 1:53 PM on January 29, 2010
Sometimes though when I start up MEMTEST86+ it detects the speed of my RAM improperly. If I reboot and try again it's correct the second time.
Which aspect of the speed is it getting wrong?
As far as I know, Memtest just believes what the BIOS tells it about RAM timing, so if Memtest has it wrong, the BIOS probably does too for that session, and that's probably not good; it means that the BIOS is not reading the SPD data correctly from the RAM sticks.
I'm starting to think that this might be less about timing than bus noise. You might want to try over-volting the RAM by a notch or two, if your BIOS has settings that allow that.
posted by flabdablet at 5:01 PM on January 29, 2010
Response by poster: I returned the RAM to the manufacturer, and installed the new sticks. No more blue screens! Thanks to everyone who suggested things to try. For future mefites who read this question and are tl;dr: Prime95 and a Linux boot CD were the most helpful in tracking down my RAM problems.
posted by Endure You Are Not Alone at 7:56 PM on February 23, 2010
This thread is closed to new comments.
posted by sammyo at 6:03 PM on January 25, 2010