BSOD Filter: Help me pinpoint the cause. Dump file contents included.
I bought this machine about 2 months ago and it is still under warranty, but I would like to know exactly what I am looking at before sending it in or asking for replacement parts. From what I can tell, it is bad memory. Is there a way to tell which stick it is? Any software recommendations for testing the memory? Is it possible that the problem is the video card's memory instead? If it is the DDR2, should I just replace it with better memory?
Here is some information on the rig; the dump is further below...
Intel Core 2 Quad Processor Q6600 (4x 2.4GHz/8MB L2 Cache/1066FSB)
Asus P5N-E SLI nForce® 650i SLI Chipset w/6-channel CODEC, Gb LAN, S-ATA Raid, USB 2.0, IEEE-1394 Dual PCI-E MB
2GB Corsair XMS2
NVIDIA GeForce 8600GT 512MB w/DVI + TV Out Video
Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Documents and Settings\Bleeping PC\Desktop\Mini120607-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
Symbol search path is: C:\WINDOWS\Symbols
Executable search path is:
Unable to load image ntoskrnl.exe, Win32 error 0n2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
Windows XP Kernel Version 2600 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055c720
Debug session time: Thu Dec 6 08:33:14.906 2007 (GMT-7)
System Uptime: 0 days 12:38:18.528
Unable to load image ntoskrnl.exe, Win32 error 0n2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
Loading Kernel Symbols
.............................................................................................................................
Loading User Symbols
Loading unloaded module list
....................
*** WARNING: Unable to verify timestamp for hal.dll
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 9C, {0, bab3c050, b2000040, 800}
Probably caused by : memory_corruption ( nt!MmDeleteKernelStack+156 )
Followup: MachineOwner
---------
1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
x86 Processors
If the processor has ONLY MCE feature available (For example Intel
Pentium), the parameters are:
1 - Low 32 bits of P5_MC_TYPE MSR
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of P5_MC_ADDR MSR
4 - Low 32 bits of P5_MC_ADDR MSR
If the processor also has MCA feature available (For example Intel
Pentium Pro), the parameters are:
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
IA64 Processors
1 - Bugcheck Type
1 - MCA_ASSERT
2 - MCA_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing MCA.
3 - MCA_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
4 - MCA_FATAL
FW reported a fatal MCA.
5 - MCA_NONFATAL
SAL reported a recoverable MCA and we don't support currently
support recovery or SAL generated an MCA and then couldn't
produce an error record.
0xB - INIT_ASSERT
0xC - INIT_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
0xD - INIT_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
0xE - INIT_FATAL
Not used.
2 - Address of log
3 - Size of log
4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
AMD64 Processors
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: bab3c050
Arg3: b2000040
Arg4: 00000800
Debugging Details:
------------------
NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.
This error is documented in the following publication:
- IA-32 Intel(r) Architecture Software Developer's Manual
Volume 3: System Programming Guide
Bit Mask:
    MA                           Model Specific       MCA
 O  ID      Other Information      Error Code     Error Code
VV  SDP ___________|____________ _______|_______ _______|______
AEUECRC|                        |               |
LRCNVVC|                        |               |
^^^^^^^|                        |               |
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011001000000000000000000100000000000000000000000000100000000000
VAL - MCi_STATUS register is valid
Indicates that the information contained within the IA32_MCi_STATUS
register is valid. When this flag is set, the processor follows the
rules given for the OVER flag in the IA32_MCi_STATUS register when
overwriting previously valid entries. The processor sets the VAL
flag and software is responsible for clearing it.
UC - Error Uncorrected
Indicates that the processor did not or was not able to correct the
error condition. When clear, this flag indicates that the processor
was able to correct the error condition.
EN - Error Enabled
Indicates that the error was enabled by the associated EEj bit of the
IA32_MCi_CTL register.
PCC - Processor Context Corrupt
Indicates that the state of the processor might have been corrupted
by the error condition detected and that reliable restarting of the
processor may not be possible.
BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
These errors match the format 0000 1PPT RRRR IILL
Concatenated Error Code:
--------------------------
_VAL_UC_EN_PCC_BUSCONNERR_0
This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.
BUGCHECK_STR: 0x9C_GenuineIntel
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO
PROCESS_NAME: Idle
LAST_CONTROL_TRANSFER: from 806e7bf7 to 804f9f05
STACK_TEXT:
bab3c028 806e7bf7 0000009c 00000000 bab3c050 nt!MmDeleteKernelStack+0x156
bab3c154 806e2c52 bab38d70 00000000 00000000 hal!_allshr+0x9
00000000 00000000 00000000 00000000 00000000 hal!HalpWriteCmosTime+0xce
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!MmDeleteKernelStack+156
804f9f05 5d pop ebp
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: nt!MmDeleteKernelStack+156
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
DEBUG_FLR_IMAGE_TIMESTAMP: 469f3fa8
IMAGE_NAME: memory_corruption
FAILURE_BUCKET_ID: 0x9C_GenuineIntel_nt!MmDeleteKernelStack+156
BUCKET_ID: 0x9C_GenuineIntel_nt!MmDeleteKernelStack+156
Followup: MachineOwner
---------
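For anyone who wants to double-check the bit mask in the dump by hand, here is a rough Python sketch that splits Arg3 and Arg4 (the high and low halves of MCi_STATUS) into the fields the debugger describes. The function name `decode_mci_status` and the `FLAGS` table are my own, not part of any tool. The bit positions are read off the diagram in the `!analyze -v` output, and the bus-error sub-fields follow the `0000 1PPT RRRR IILL` pattern it quotes. Treat it as an illustration, not a diagnostic tool.

```python
# Flag bit positions, read from the bit-mask diagram in the !analyze -v output.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}


def decode_mci_status(high32, low32):
    """Split a 64-bit MCi_STATUS value (given as two 32-bit halves) into fields."""
    status = (high32 << 32) | low32
    # Collect the names of the flag bits that are set, highest bit first.
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {
        "status": status,
        "flags": flags,
        "other_info": (status >> 32) & 0x1FFFFFF,       # bits 56..32
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31..16
        "mca_code": mca_code,                            # bits 15..0
    }
    # Bus and interconnect errors match 0000 1PPT RRRR IILL,
    # i.e. the top five bits of the MCA error code are 00001.
    if (mca_code >> 11) == 0b00001:
        result["bus_error"] = {
            "PP": (mca_code >> 9) & 0b11,
            "T": (mca_code >> 8) & 0b1,
            "RRRR": (mca_code >> 4) & 0b1111,
            "II": (mca_code >> 2) & 0b11,
            "LL": mca_code & 0b11,
        }
    return result


if __name__ == "__main__":
    # Arg3 and Arg4 from the dump above.
    decoded = decode_mci_status(0xB2000040, 0x00000800)
    print(format(decoded["status"], "064b"))
    print("_".join(decoded["flags"]))
    print(decoded.get("bus_error"))
```

With the values from this dump, the set flags come out as VAL, UC, EN and PCC, and the MCA error code falls into the bus/interconnect class with every sub-field at zero, which lines up with the `_VAL_UC_EN_PCC_BUSCONNERR_0` string the debugger printed. What each sub-field value means is listed in the Intel manual the debugger output cites.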
The gold standard for testing memory is Memtest86+. On the Memtest86+ site, click on "Download (Pre-Built & ISOs)", then download "Pre-Compiled Bootable ISO (.zip)". Unzip it, burn the ISO image to a CD, boot from that CD, and let it run its tests for a while.
If your memory is catastrophically bad, you'll see red warnings right away. On the other hand, sometimes it can run for 12 hours or more before finding the defect. Give it time. The top-right corner shows how many times it has run its battery of tests; let it go through at least a couple of passes, which may take a couple of hours.
posted by CrayDrygu at 5:14 PM on December 7, 2007