Advanced Crash Dump Analysis - Stack Trashes

Stack overruns, or stack trashing, typically result from a buffer overrun or underrun, or from a driver passing the address of a buffer located on its stack to a lower driver on the device stack, which then performs the work asynchronously.

In the case of a buffer overrun or underrun, instead of residing in pool, as you saw with Notmyfault’s buffer overrun bug, the target buffer is on the stack of the thread that executes the bug. This type of bug is also difficult to debug, because the stack is the foundation for any crash dump analysis and a corrupted stack undermines the stack trace itself.

In the case of passing buffers on the stack to lower drivers, the problem arises when the lower driver returns to the caller immediately and finishes the work later in a completion routine, instead of completing it synchronously. When the completion routine is eventually called, it uses the stack address that was passed previously. By then the caller may have moved on, so that address could correspond to a different state on the caller’s stack, and writing to it results in corruption.
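The stale-address scenario can be illustrated with a toy model. The following Python sketch does not touch real kernel memory; it assumes a simulated stack made of numbered slots, and every name in it (SimulatedStack, upper_driver_dispatch, lower_driver_complete, and so on) is hypothetical and chosen only for the illustration.

```python
class SimulatedStack:
    """A toy thread stack: a fixed list of slots plus a top-of-stack index."""

    def __init__(self, size=8):
        self.slots = [0] * size
        self.top = 0

    def push_frame(self, nslots):
        base = self.top
        self.top += nslots
        return base

    def pop_frame(self, nslots):
        self.top -= nslots


def lower_driver_start(pending, buffer_slot):
    """Queue the work and return immediately, remembering the caller's slot."""
    pending.append(buffer_slot)


def lower_driver_complete(stack, pending, result):
    """Completion routine: writes the result to the slot saved earlier."""
    slot = pending.pop()
    stack.slots[slot] = result


def upper_driver_dispatch(stack, pending):
    base = stack.push_frame(1)          # the local buffer lives in this frame
    lower_driver_start(pending, base)   # pass its "address" down the device stack
    stack.pop_frame(1)                  # return before the work has completed
    return base


def unrelated_call(stack, value):
    base = stack.push_frame(1)          # reuses the slot that was just released
    stack.slots[base] = value
    return base


def run_demo():
    stack = SimulatedStack()
    pending = []
    stale = upper_driver_dispatch(stack, pending)
    live = unrelated_call(stack, 0x81D07FD3)   # arbitrary sample value
    before = stack.slots[live]
    lower_driver_complete(stack, pending, 0xDEADBEEF)
    after = stack.slots[live]
    return stale == live, before, after


if __name__ == "__main__":
    same_slot, before, after = run_demo()
    print(same_slot, hex(before), hex(after))
```

In this model the completion routine overwrites data belonging to an unrelated frame, because the slot it was handed has since been reused. A real driver would typically avoid the problem by allocating such a buffer from pool or by waiting for completion before returning.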

When you run Notmyfault and select Stack Trash, the Myfault driver overruns a buffer it allocates on the kernel stack of the thread that executes it. When Myfault tries to return control to the Ntoskrnl function that invoked it, it reads the return address, which is the address at which execution should continue, from the stack. The address was corrupted by the stack buffer overrun, so the thread continues execution at some different address in memory, one that might not even contain code. An exception and crash occur when the thread executes an illegal CPU instruction or references invalid memory. The driver that crash dump analysis blames for a stack overrun will vary from crash to crash, but the stop code will almost always be KMODE_EXCEPTION_NOT_HANDLED. If you execute a verbose analysis, the stack trace looks like this:

STACK_TEXT:
881fc744 81c82590 0000008e c0000005 00000000 nt!KeBugCheckEx+0x1e
881fcb14 81ca45da 881fcb30 00000000 881fcb84 nt!KiDispatchException+0x1a9
881fcb7c 81ca458e 881fcc44 00000000 badb0d00 nt!CommonDispatchException+0x4a
881fcc2c 81d07fd3 9762b658 84736e68 84736e68 nt!Kei386EoiHelper+0x186
881fcc44 81e98615 99321810 84736e68 84736ed8 nt!IofCallDriver+0x63
881fcc64 81e98dba 9762b658 99321810 00000000 nt!IopSynchronousServiceTail+0x1d9
881fcd00 81e82a8d 9762b658 84736e68 00000000 nt!IopXxxControlFile+0x6b7
881fcd34 81ca3a1a 0000007c 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
881fcd34 779e9a94 0000007c 00000000 00000000 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f9f4 00000000 00000000 00000000 00000000 0x779e9a94
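
The return-address corruption that produces a trace like this can be illustrated with a small model. The following Python sketch is not Myfault’s code. It assumes a hypothetical 32-bit x86 frame layout (a 16-byte local buffer followed by the saved EBP and the return address) and borrows two values from the trace above purely as sample data. The names make_frame, unchecked_fill, and return_address are invented for the illustration.

```python
import struct

BUF_SIZE = 16          # hypothetical size of the local buffer
FRAME_FMT = "<16sII"   # buffer, saved EBP, return address (little-endian, no padding)


def make_frame(saved_ebp, ret_addr):
    """Build a simulated stack frame as a mutable byte array."""
    return bytearray(struct.pack(FRAME_FMT, b"\x00" * BUF_SIZE, saved_ebp, ret_addr))


def unchecked_fill(frame, count, value=0x41):
    """Write `count` bytes starting at the buffer without checking BUF_SIZE."""
    for i in range(min(count, len(frame))):
        frame[i] = value


def return_address(frame):
    """Read back the return address stored after the buffer and saved EBP."""
    return struct.unpack(FRAME_FMT, bytes(frame))[2]


if __name__ == "__main__":
    frame = make_frame(0x881FCC44, 0x81D07FD3)
    unchecked_fill(frame, BUF_SIZE)        # stays inside the buffer
    print(hex(return_address(frame)))      # 0x81d07fd3
    unchecked_fill(frame, BUF_SIZE + 8)    # runs over saved EBP and return address
    print(hex(return_address(frame)))      # 0x41414141
```

Once the write exceeds the buffer, the function in this model would "return" to 0x41414141, an address that has nothing to do with its caller.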

Notice how the call to IofCallDriver leads immediately to Kei386EoiHelper and into an exception, instead of into a driver’s IRP dispatch routine. This is consistent with the stack having been corrupted and the IRP dispatch routine causing an exception when it attempted to return to its caller through a corrupted return address. Unfortunately, mechanisms like special pool and system code write protection can’t catch this type of bug. Instead, you must take some manual analysis steps to determine indirectly which driver was operating at the time of the corruption.

One way is to examine the IRPs that are in progress for the thread that was executing at the time of the stack trash. When a thread issues an I/O request, the I/O manager stores a pointer to the outstanding IRP on the IRP list of the thread’s ETHREAD structure. The !thread debugger command dumps the IRP list of the target thread. (If you don’t specify a thread object address, !thread dumps the processor’s current thread.) You can then examine the IRP with the !irp command:

lkd> !thread
THREAD 858d1aa0 Cid 0248.02c0 Teb: 7ffd9000 Win32Thread: ffad4e90 RUNNING on processor 0
IRP List:
bc5a7f68: (0006,0094) Flags: 00000000 Mdl: 00000000
Not impersonating
Attached Process 84f45d90
...

lkd> !irp bc5a7f68
Irp is active with 1 stacks 1 is current (= 0x837a7ab8)
No Mdl Thread 858d1aa0: Irp stack trace.
cmd flg cl Device File Completion-Context
>[ e, 0] 0 0 856f6378 8504f290 00000000-00000000
\Driver\MYFAULT Args: 00000000 00000000 83360010 00000000

The output shows that the IRP’s current and only stack location (designated with the “>” prefix) is owned by the Myfault driver. If this were a real crash, the next step would be to check that the installed driver version is the most recent available and to install the newer version if it isn’t. If the driver is already current, enable Driver Verifier on it (with all settings except low memory simulation).

Manually analyzing the stack is often the most powerful technique when dealing with crashes such as these. Typically, this involves dumping the memory referenced by the current stack pointer register (esp on 32-bit x86 systems and rsp on x64 systems). However, because the code responsible for crashing the system might itself modify the stack in ways that make analysis difficult, Windows provides a backing store for the current data in the stack, called KiPreBugcheckStackSaveArea, which contains a copy of the stack before any code in KeBugCheckEx executes. By using the dps (dump pointers with symbols) command in the debugger, you can dump this area (instead of the memory at the CPU’s stack pointer register) and resolve symbols in an attempt to discover any potential stack traces. In this crash, here’s what dumping the stack area eventually revealed on a 32-bit system:

kd> dps KiPreBugcheckStackSaveArea KiPreBugcheckStackSaveArea+3000
81d7dd20 881fcc44
81d7dd24 98fcf406 myfault+0x406
81d7dd28 badb0d00

Although this entry was located among many other resolved symbols, it is of special interest because it references an address in the Myfault driver, which, as we’ve seen, was processing an IRP at the time but doesn’t show up in the stack trace.
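
Picking such an entry out of a large dump amounts to checking each pointer-sized value against a module’s address range. The following Python sketch shows that idea only; it is not how the debugger is implemented. The function name find_module_hits is hypothetical, the module base 0x98fcf000 is inferred from myfault+0x406 in the output above, and the 0x1000-byte module size is an assumption made for the example.

```python
def find_module_hits(start_address, words, module_name, module_base,
                     module_size, word_size=4):
    """Return (address, value, symbol) for each word inside the module's range."""
    hits = []
    for index, value in enumerate(words):
        if module_base <= value < module_base + module_size:
            offset = value - module_base
            hits.append((start_address + index * word_size,
                         value,
                         f"{module_name}+0x{offset:x}"))
    return hits


if __name__ == "__main__":
    # The three words shown in the dps output above, starting at 0x81d7dd20.
    saved_area = [0x881FCC44, 0x98FCF406, 0xBADB0D00]
    hits = find_module_hits(0x81D7DD20, saved_area, "myfault",
                            module_base=0x98FCF000,   # inferred from the output
                            module_size=0x1000)       # assumed size
    for address, value, symbol in hits:
        print(f"{address:08x} {value:08x} {symbol}")
```

With these inputs the only match is the word at 81d7dd24, which resolves to myfault+0x406, the same entry highlighted in the debugger output.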

Source of Information : Microsoft Press Windows Internals 5th Edition
