Basic Crash Dump Analysis

If OCA fails to identify a resolution or you are unable to submit the crash to OCA, an alternative is analyzing crashes yourself. As mentioned earlier, WinDbg and Kd both execute the same analysis engine used by OCA when you load a crash dump file, and the basic analysis can sometimes pinpoint the problem. So you might be fortunate and have the crash dump solved by the automatic analysis. But if not, there are some straightforward techniques to try to solve the crash.

This section explains how to perform basic crash analysis steps, followed by tips on leveraging the Driver Verifier to catch buggy drivers when they corrupt the system so that a crash dump analysis pinpoints them.

OCA’s automated analysis may occasionally identify a highly likely cause of a crash but not be able to inform you of the suspected driver. This happens because it only reports the cause for crashes that have their bucket ID entry populated in the OCA database, and entries are created only when Microsoft crash-analysis engineers have verified the cause. If there’s no bucket ID entry, OCA reports that the crash was caused by “unknown driver.”

You can use the Notmyfault utility from Windows Sysinternals to generate the crashes described here. Notmyfault consists of an executable named Notmyfault.exe and a driver named Myfault.sys. When you run the Notmyfault executable, it loads the driver and presents a dialog box that lets you crash the system in various ways or cause the driver to leak paged pool. The crash types offered represent the ones most commonly seen by Microsoft’s product support services. Selecting an option and clicking the Do Bug button causes the executable to tell the driver, by using the DeviceIoControl Windows API, which type of bug to trigger.

You should execute Notmyfault crashes on a test system or on a virtual machine because there is a small risk that memory it corrupts will be written to disk and result in file or disk corruption.

The names of the Notmyfault executable and driver highlight the fact that user mode cannot directly cause the system to crash. The Notmyfault executable can cause a crash only by loading a driver to perform an illegal operation for it in kernel mode.

Basic Crash Dump Analysis
The most straightforward Notmyfault crash to debug is the one caused by selecting the High IRQL Fault (Kernelmode) option and clicking the Do Bug button. This causes the driver to allocate a page of paged pool, free the pool, raise the IRQL to above DPC/dispatch level, and then touch the page it has freed. If that doesn’t cause a crash, the process continues by reading memory past the end of the page until it causes a crash by accessing invalid pages. The driver performs several illegal operations as a result:

1. It references memory that doesn’t belong to it.

2. It references paged pool at an IRQL that’s DPC/dispatch level or higher, which is illegal because page faults are not permitted when the processor IRQL is DPC/dispatch level or higher.

3. When it goes past the end of the memory that it had allocated, it tries to reference memory that is potentially invalid.

The reason the first page reference might not cause a crash is that it won’t generate a page fault if the page that the driver frees remains in the system working set. When you load a crash generated with this bug into WinDbg, the tool’s analysis displays something like this:

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\windows\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: srv*c:\programming\symbols\*
Executable search path is:
Windows Server 2008 Kernel Version 6001 (Service Pack 1) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 6001.18063.x86fre.vistasp1_gdr.080425-1930
Kernel base = 0x81804000 PsLoadedModuleList = 0x8191bc70
Debug session time: Sun Sep 21 22:58:19.994 2008 (GMT-4)
System Uptime: 2 days 0:11:17.876
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
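*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************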

Use !analyze -v to get detailed debugging information.

BugCheck D1, {a35db800, 1c, 0, 9879c3dd}

*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
Probably caused by : myfault.sys ( myfault+3dd )

Followup: MachineOwner

The first thing to note is that WinDbg reports errors trying to load symbols for Myfault.sys and Notmyfault.exe. These are expected because the symbol files for Myfault.sys and Notmyfault.exe are not on the symbol-file path (which is configured to point at the Microsoft symbol server). You’ll see similar errors for third-party drivers and executables that do not ship with the operating system.

The analysis text itself is terse, showing the numeric stop code and bug-check parameters followed by a “Probably caused by” line that shows the analysis engine’s best guess at the offending driver. In this case it’s on the mark and points directly at Myfault.sys, so there’s no need for manual analysis.
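If you triage many dumps, the terse summary is easy to pull apart with a script. The following is a minimal sketch, not part of any debugger tooling: it assumes the summary text has been saved from the debugger window, and the function names parse_summary and describe_d1 are made up for illustration. The meaning assigned to the four parameters applies only to stop code D1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL); other stop codes define their parameters differently.

```python
import re

DISPATCH_LEVEL = 2  # page faults are not permitted at or above this IRQL

# Summary lines copied from the analysis output shown above.
SUMMARY = """\
BugCheck D1, {a35db800, 1c, 0, 9879c3dd}

*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
Probably caused by : myfault.sys ( myfault+3dd )
"""


def parse_summary(text):
    """Extract the stop code, its four parameters, and the suspected module."""
    bugcheck = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", text)
    if bugcheck is None:
        raise ValueError("no BugCheck line found")
    culprit = re.search(r"Probably caused by\s*:\s*(\S+)\s*\(\s*(\S+?)\s*\)", text)
    return {
        "stop_code": int(bugcheck.group(1), 16),
        "params": [int(p.strip(), 16) for p in bugcheck.group(2).split(",")],
        "module": culprit.group(1) if culprit else None,
        "location": culprit.group(2) if culprit else None,
    }


def describe_d1(params):
    """Interpret the four parameters of stop code D1 (hypothetical helper)."""
    address, irql, operation, instruction = params
    return {
        "memory_referenced": address,
        "irql": irql,
        "irql_too_high": irql >= DISPATCH_LEVEL,
        "operation": {0: "read", 1: "write"}.get(operation, "other"),
        "faulting_instruction": instruction,
    }


if __name__ == "__main__":
    info = parse_summary(SUMMARY)
    print(hex(info["stop_code"]), info["module"], info["location"])
    print(describe_d1(info["params"]))
```

For this dump the second parameter decodes to IRQL 0x1c, well above DPC/dispatch level, which is consistent with the way the driver provokes the crash.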

The “Followup” line is not generally useful except within Microsoft, where the debugger looks for the module name in the Triage.ini file that’s located within the Triage directory of the Debugging Tools for Windows installation directory. The Microsoft-internal version of that file lists the developer or group responsible for handling crashes in a specific driver, and the debugger displays the developer’s or group’s name in the Followup line when appropriate.

Verbose Analysis
Even though the basic analysis of the Notmyfault crash identifies the faulty driver, you should always have the debugger execute a verbose analysis by entering the command:

!analyze -v

The first obvious difference between the verbose and default analysis is the description of the stop code and its parameters. Following is the output of the command when executed on the same dump:

An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arg1: a35db800, memory referenced
Arg2: 0000001c, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 9879c3dd, address which referenced memory

This saves you the trouble of opening the help file to find the same information, and the text sometimes suggests troubleshooting steps, an example of which you’ll see in the next section on advanced crash dump analysis. The other potentially useful information in a verbose analysis is the stack trace of the thread that was executing on the processor that crashed at the time of the crash. Here’s what it looks like for the same dump:

80395b78 9879c3dd badb0d00 8312d054 00000003 nt!KiTrap0E+0x2ac
WARNING: Stack unwind information not available. Following frames may be wrong.
80395c44 81a505e5 855802e0 849e26c0 849e2730 myfault+0x3dd
80395c64 81a50d8a 83746238 855802e0 00000000 nt!IopSynchronousServiceTail+0x1d9
80395d00 81a3aa61 83746238 849e26c0 00000000 nt!IopXxxControlFile+0x6b7
80395d34 8185ba7a 0000007c 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
80395d34 770f9a94 0000007c 00000000 00000000 nt!KiFastCallEntry+0x12a
0012f4a0 77e84c9b 0000007c 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xb
0012f504 004017c3 0000007c 83360018 00000000 KERNEL32!DeviceIoControl+0x100
000200ac 00000000 00000000 00000000 00000000 NotMyfault+0x17c3

The preceding stack shows that the Notmyfault executable image, shown at the bottom, invoked the DeviceIoControl function in Kernel32.dll, which in turn invoked ZwDeviceIoControlFile in Ntdll.dll, and so on, until finally the system crashed with the execution of an instruction in the Myfault image. A stack trace like this can be useful because crashes sometimes occur as the result of one driver passing another one data that is improperly formatted, corrupt, or has illegal parameters. The driver that’s passed the invalid data might cause a crash and get the blame in an analysis, when the stack reveals that another driver was involved. In this sample trace, no driver other than Myfault is listed. (The module “nt” is Ntoskrnl.)
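Checking a trace for other drivers can also be scripted. The sketch below rests on two assumptions that are not stated in the text: each frame line has the 32-bit layout shown above (frame pointer, return address, three arguments, symbol), and kernel-mode frames are the ones whose frame pointer is at or above 0x80000000, the default x86 user/kernel split. The names parse_frames and kernel_drivers are illustrative only.

```python
import re

# Stack trace copied from the verbose analysis output shown above.
TRACE = """\
80395b78 9879c3dd badb0d00 8312d054 00000003 nt!KiTrap0E+0x2ac
WARNING: Stack unwind information not available. Following frames may be wrong.
80395c44 81a505e5 855802e0 849e26c0 849e2730 myfault+0x3dd
80395c64 81a50d8a 83746238 855802e0 00000000 nt!IopSynchronousServiceTail+0x1d9
80395d00 81a3aa61 83746238 849e26c0 00000000 nt!IopXxxControlFile+0x6b7
80395d34 8185ba7a 0000007c 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
80395d34 770f9a94 0000007c 00000000 00000000 nt!KiFastCallEntry+0x12a
0012f4a0 77e84c9b 0000007c 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xb
0012f504 004017c3 0000007c 83360018 00000000 KERNEL32!DeviceIoControl+0x100
000200ac 00000000 00000000 00000000 00000000 NotMyfault+0x17c3
"""

# Frame pointer, return address, three arguments, then the symbol.
FRAME = re.compile(r"^([0-9a-f]{8}) [0-9a-f]{8} (?:[0-9a-f]{8} ){3}(\S+)$")
KERNEL_SPACE_START = 0x80000000  # assumption: default 2-GB split on 32-bit x86


def parse_frames(trace):
    """Return (frame_pointer, module, symbol) tuples, top of stack first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match is None:
            continue  # skips the WARNING line and blank lines
        symbol = match.group(2)
        module = re.split(r"[!+]", symbol, maxsplit=1)[0]
        frames.append((int(match.group(1), 16), module, symbol))
    return frames


def kernel_drivers(frames):
    """List kernel-mode modules other than nt, in order of first appearance."""
    drivers = []
    for frame_pointer, module, _symbol in frames:
        in_kernel = frame_pointer >= KERNEL_SPACE_START
        if in_kernel and module != "nt" and module not in drivers:
            drivers.append(module)
    return drivers


if __name__ == "__main__":
    print(kernel_drivers(parse_frames(TRACE)))
```

A result with more than one driver name would be a hint to look at how those drivers interact before blaming the one at the top of the stack.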

If the driver singled out by an analysis is unfamiliar to you, use the lm (list modules) command to look at the driver’s version information. Add the k (kernel modules) and v (verbose) options along with the m (match) option followed by the name of the driver and a wildcard:

lkd> lm kv m myfault*
start end module name
a98e1000 a98e1ec0 myfault (deferred)
Image path: \??\C:\Windows\system32\drivers\myfault.sys
Image name: myfault.sys
Timestamp: Sat Oct 14 16:09:18 2006 (453143EE)
CheckSum: 0000295E
ImageSize: 00000EC0
File version:
Product version:
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 3.7 Driver
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Sysinternals
ProductName: Sysinternals Myfault
InternalName: myfault.sys
OriginalFilename: myfault.sys
ProductVersion: 2.0
FileVersion: 2.0
FileDescription: Crash Test Driver
LegalCopyright: Copyright (C) M. Russinovich 2002-2004

In addition to using the description to identify the purpose of a driver, you can also use the file and product version numbers to see whether the version installed is the most up-to-date version available. (You can do this by checking the vendor Web site, for instance.) If version information isn’t present (because it might have been paged out of physical memory at the time of the crash), look at the driver image file’s properties in Windows Explorer on the system that crashed.
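The Timestamp line offers another quick check of a driver’s age. The hexadecimal value in parentheses can be read as a count of seconds since January 1, 1970 (UTC), so it can be converted without the debugger. This is a small sketch; the helper name link_time is hypothetical.

```python
from datetime import datetime, timezone


def link_time(timestamp_hex):
    """Convert the hex value from the lm Timestamp line to a UTC datetime."""
    return datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)


if __name__ == "__main__":
    built = link_time("453143EE")
    print(built.isoformat())  # 2006-10-14T20:09:18+00:00
```

The debugger output above shows 16:09:18 for the same value, which is the same instant at four hours behind UTC, matching the GMT-4 offset reported for the debug session.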

Source of Information: Microsoft Press, Windows Internals, 5th Edition
