
BSODude, This Fucking Sucks
December 7, 2007 5:02 PM

BSOD Filter: Help me pinpoint the cause. Dump file contents included.

I bought this machine about 2 months ago and it is still under warranty, but I would like to know exactly what I am looking at before sending it in or asking for replacement parts. From what I can tell, it's probably bad memory. Is there a way to tell which stick it is? Any software recommendations for testing the memory? Could it be the video card's memory instead? If it is the DDR2, should I just replace it with better memory?

Here is some information on the rig, with the dump further below:

Intel Core 2 Quad Processor Q6600 (4x 2.4GHz/8MB L2 Cache/1066FSB)
Asus P5N-E SLI nForce® 650i SLI Chipset w/6-channel CODEC, Gb LAN, S-ATA Raid, USB 2.0, IEEE-1394 Dual PCI-E MB
2GB Corsair XMS2
NVIDIA GeForce 8600GT 512MB w/DVI + TV Out Video


Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Documents and Settings\Bleeping PC\Desktop\Mini120607-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: C:\WINDOWS\Symbols
Executable search path is:
Unable to load image ntoskrnl.exe, Win32 error 0n2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
Windows XP Kernel Version 2600 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055c720
Debug session time: Thu Dec 6 08:33:14.906 2007 (GMT-7)
System Uptime: 0 days 12:38:18.528
Unable to load image ntoskrnl.exe, Win32 error 0n2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
Loading Kernel Symbols
.............................................................................................................................
Loading User Symbols
Loading unloaded module list
....................
*** WARNING: Unable to verify timestamp for hal.dll
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 9C, {0, bab3c050, b2000040, 800}



Probably caused by : memory_corruption ( nt!MmDeleteKernelStack+156 )

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
    x86 Processors
        If the processor has ONLY MCE feature available (For example Intel
        Pentium), the parameters are:
        1 - Low 32 bits of P5_MC_TYPE MSR
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of P5_MC_ADDR MSR
        4 - Low 32 bits of P5_MC_ADDR MSR
        If the processor also has MCA feature available (For example Intel
        Pentium Pro), the parameters are:
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
    IA64 Processors
        1 - Bugcheck Type
            1 - MCA_ASSERT
            2 - MCA_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing MCA.
            3 - MCA_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
            4 - MCA_FATAL
                FW reported a fatal MCA.
            5 - MCA_NONFATAL
                SAL reported a recoverable MCA and we don't support currently
                support recovery or SAL generated an MCA and then couldn't
                produce an error record.
            0xB - INIT_ASSERT
            0xC - INIT_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
            0xD - INIT_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
            0xE - INIT_FATAL
                Not used.
        2 - Address of log
        3 - Size of log
        4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
    AMD64 Processors
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: bab3c050
Arg3: b2000040
Arg4: 00000800

Debugging Details:
------------------

NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.

This error is documented in the following publication:

- IA-32 Intel(r) Architecture Software Developer's Manual
Volume 3: System Programming Guide

Bit Mask:

    MA                            Model Specific      MCA
 O  ID     Other Information        Error Code     Error Code
VV  SDP ___________|____________ _______|_______ _______|_______
AEUECRC|           |                    |               |
LRCNVVC|           |                    |               |
^^^^^^^|           |                    |               |
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011001000000000000000000100000000000000000000000000100000000000


VAL - MCi_STATUS register is valid
Indicates that the information contained within the IA32_MCi_STATUS
register is valid. When this flag is set, the processor follows the
rules given for the OVER flag in the IA32_MCi_STATUS register when
overwriting previously valid entries. The processor sets the VAL
flag and software is responsible for clearing it.

UC - Error Uncorrected
Indicates that the processor did not or was not able to correct the
error condition. When clear, this flag indicates that the processor
was able to correct the error condition.

EN - Error Enabled
Indicates that the error was enabled by the associated EEj bit of the
IA32_MCi_CTL register.

PCC - Processor Context Corrupt
Indicates that the state of the processor might have been corrupted
by the error condition detected and that reliable restarting of the
processor may not be possible.

BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
These errors match the format 0000 1PPT RRRR IILL



Concatenated Error Code:
--------------------------
_VAL_UC_EN_PCC_BUSCONNERR_0

This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.



BUGCHECK_STR: 0x9C_GenuineIntel

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

PROCESS_NAME: Idle

LAST_CONTROL_TRANSFER: from 806e7bf7 to 804f9f05

STACK_TEXT:
bab3c028 806e7bf7 0000009c 00000000 bab3c050 nt!MmDeleteKernelStack+0x156
bab3c154 806e2c52 bab38d70 00000000 00000000 hal!_allshr+0x9
00000000 00000000 00000000 00000000 00000000 hal!HalpWriteCmosTime+0xce


STACK_COMMAND: kb

FOLLOWUP_IP:
nt!MmDeleteKernelStack+156
804f9f05 5d pop ebp

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: nt!MmDeleteKernelStack+156

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP: 469f3fa8

IMAGE_NAME: memory_corruption

FAILURE_BUCKET_ID: 0x9C_GenuineIntel_nt!MmDeleteKernelStack+156

BUCKET_ID: 0x9C_GenuineIntel_nt!MmDeleteKernelStack+156

Followup: MachineOwner
---------
posted by B(oYo)BIES to Computers & Internet (6 answers total)
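For anyone wanting to check the debugger's decoding by hand: the parameter list in the dump says that on MCA-capable x86 CPUs Arg1 is the bank number and Arg3/Arg4 are the high and low 32 bits of that bank's IA32_MCi_STATUS register. The Python sketch below, assuming that layout and the IA-32 flag positions (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57), rebuilds the 64-bit value and splits it into the fields shown in the Bit Mask section. The names `decode_mci_status` and `FLAG_BITS` are made up for illustration; this only reproduces the decoding and does not by itself say which part failed.

```python
# Sketch: decode a STOP 0x9C (MACHINE_CHECK_EXCEPTION) status value by hand.
# Assumes the MCA-capable x86 parameter layout: Arg3/Arg4 = high/low
# 32 bits of IA32_MCi_STATUS. Function and table names are hypothetical.

# IA32_MCi_STATUS flag bits 63..57 (bit position -> name)
FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}


def decode_mci_status(arg3: int, arg4: int) -> dict:
    status = (arg3 << 32) | arg4  # rebuild the 64-bit MSR value
    result = {
        "status": status,
        "flags": [name for bit, name in FLAG_BITS.items()
                  if (status >> bit) & 1],
        "other_info": (status >> 32) & 0x1FFFFFF,  # bits 56..32
        "model_code": (status >> 16) & 0xFFFF,     # bits 31..16, model-specific
        "mca_code": status & 0xFFFF,               # bits 15..0, architectural
    }
    code = result["mca_code"]
    # Bus and interconnect errors match the pattern 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        result["bus"] = {
            "PP": (code >> 9) & 0b11,
            "T": (code >> 8) & 0b1,
            "RRRR": (code >> 4) & 0b1111,
            "II": (code >> 2) & 0b11,
            "LL": code & 0b11,
        }
    return result


info = decode_mci_status(0xB2000040, 0x00000800)  # Arg3, Arg4 from the dump
print(format(info["status"], "064b"))  # compare with the last Bit Mask row
print(info["flags"])                   # ['VAL', 'UC', 'EN', 'PCC']
print(hex(info["mca_code"]))           # 0x800
print(info["bus"])                     # {'PP': 0, 'T': 0, 'RRRR': 0, 'II': 0, 'LL': 0}
```

With the dump's values the flags come out as VAL, UC, EN and PCC, and the low 16 bits (0x0800) fit the bus/interconnect pattern, which agrees with the VAL/UC/EN/PCC and BUSCONNERR classification WinDbg printed.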
 
"Any software recommendations for testing the memory?"

The gold standard for this is Memtest86+. Click on "Download (Pre-Built & ISOs)", then download "Pre-Compiled Bootable ISO (.zip)". Unzip it, burn the ISO image to a CD, boot from that CD, and let it run its tests for a while.

If your memory is catastrophically bad, you'll see red warnings right away. On the other hand, sometimes it can run for 12 hours or more before finding the defect. Give it time. The top-right corner shows how many times it has run its battery of tests; let it go through at least a couple, which may take a couple of hours.
posted by CrayDrygu at 5:14 PM on December 7, 2007
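To give a feel for what that "battery of tests" does: Memtest86+ writes known bit patterns across memory and reads them back, and any value that doesn't come back as written is reported as an error. One of its test styles is called moving inversions. The sketch below is a simplified, purely illustrative Python version of that idea, run against a toy memory model with a deliberately injected stuck bit (`FlakyMemory` and `moving_inversions` are hypothetical names). It cannot test your actual RAM, since Python has no control over which physical memory a buffer lands in.

```python
# Illustrative sketch only: a simplified moving-inversions pass over a toy
# memory model. Not Memtest86+'s code; names are hypothetical.


class FlakyMemory:
    """Toy memory: a bytearray where one bit at one address is stuck at 0."""

    def __init__(self, size, bad_addr=None, bad_bit=0):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def __len__(self):
        return len(self.cells)

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= ~self.bad_mask & 0xFF  # the faulty bit never stores a 1
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def moving_inversions(mem, pattern=0x55):
    """Return the sorted addresses that did not read back as written."""
    inverse = ~pattern & 0xFF
    errors = set()
    for addr in range(len(mem)):            # 1. fill with the pattern
        mem.write(addr, pattern)
    for addr in range(len(mem)):            # 2. ascending: verify, then invert
        if mem.read(addr) != pattern:
            errors.add(addr)
        mem.write(addr, inverse)
    for addr in reversed(range(len(mem))):  # 3. descending: verify, then restore
        if mem.read(addr) != inverse:
            errors.add(addr)
        mem.write(addr, pattern)
    return sorted(errors)


print(moving_inversions(FlakyMemory(1024)))                         # []
print(moving_inversions(FlakyMemory(1024, bad_addr=700, bad_bit=1)))  # [700]
```

In the second example the stuck bit only shows up on the inverted pass, because 0x55 happens to write a 0 to that bit; that is why such tests exercise both a pattern and its complement.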


Before you waste a lot of hours running memtest86+, do open the box, and make sure the memory DIMMs are fully seated. It's a good utility, but it expects things to be fully seated :-)
posted by paulsc at 5:47 PM on December 7, 2007


Yes, memtest86 is really easy to use. One other idea: check that all the fans are working.
posted by jjj606 at 6:53 PM on December 7, 2007


You might also care to pull the DIMMs and check the gold edge connectors under a good strong light. Doofus installers sometimes handle these with bare fingers, which leaves a noticeably discolored fingerprint on the edge connector after a month or two.

If you see a fingerprint on the edge connector, you can clean it up with a white pencil eraser (don't use an ink eraser - too abrasive). Do this after discharging yourself to the chassis, don't wear synthetic clothing while doing it, and make sure you blow all the rubber crumbs off before reseating the DIMM. This is an Apple II-era trick that still works well.

If Memtest86+ runs overnight and finds no errors, your RAM is most likely fine. If it does find errors, it won't tell you explicitly which DIMM is bad, so repeat the test with one DIMM installed at a time.
posted by flabdablet at 4:07 AM on December 8, 2007


One bit of advice I would give here: If you're going to run memtest86+, I would shut down your machine overnight and let it completely cool first. Then, power it on with the memtest boot CD in the morning, say before you go to work. The reason for this is that I have seen hardware problems that only show up when the machine is either cold or fully warmed up. In this scenario, you'd get to test both cases.
posted by deadmessenger at 10:24 AM on December 8, 2007


Tech people usually recommend removing all but one stick, and then booting and running memtest, then repeating with the other sticks, to narrow down the problem.
posted by Mr. Gunn at 1:23 PM on December 8, 2007
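The one-stick-at-a-time approach suggested in this thread boils down to a simple elimination. Below is a minimal Python sketch of how the results might be read, assuming each stick is tested on its own in the same slot and you note whether Memtest86+ reported any errors; `interpret_single_dimm_runs` and the stick labels are hypothetical.

```python
# Sketch: interpret one-DIMM-at-a-time Memtest86+ runs.
# Assumes every stick was tested alone in the same slot; names are hypothetical.


def interpret_single_dimm_runs(results):
    """`results` maps a stick label (e.g. "A", "B") to True if Memtest86+
    reported errors with only that stick installed."""
    failing = sorted(dimm for dimm, failed in results.items() if failed)
    if not failing:
        # Nothing reproduces with one stick; the fault may be intermittent
        # or temperature-dependent, so longer or cold/warm runs may help.
        return "no single DIMM fails"
    if len(failing) == len(results):
        # Every stick fails alone: look at what they share (slot, seating,
        # cooling) before blaming the sticks themselves.
        return "all DIMMs fail; suspect a shared cause"
    return "suspect DIMM(s): " + ", ".join(failing)


print(interpret_single_dimm_runs({"A": False, "B": True}))  # suspect DIMM(s): B
```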

