Windows stops with MEMORY MANAGEMENT code
Environment
- Red Hat OpenShift Container Platform 4.x
- Red Hat OpenShift Virtualization 4.x
Issue
- Microsoft Windows running on OpenShift Virtualization has random BSODs with these codes:
- 0x1A - MEMORY MANAGEMENT (most common)
- 0x7A - KERNEL DATA INPAGE ERROR
- 0x50 - PAGE FAULT IN NONPAGED AREA
- 0xEF - CRITICAL PROCESS DIED
Resolution
- Ensure the storage is not producing excessive IO delays.
- Verify storage configuration with your storage vendor.
- On the server, check for any storage errors:
- For file-system-based storage, a general starting point for diagnosis is: Gathering data and logs to troubleshoot NFS issues
- For block-based storage, a general starting point for diagnosis is: Troubleshooting performance issue in RHEL
- If no storage delays are found, the cause could be a known driver bug with similar symptoms; see Windows stops with MEMORY MANAGEMENT code on low latency storage.
- Ensure the virtual machine is configured with enough RAM for its workload, to avoid excessive use of the pagefile.
- As a workaround while storage delays are investigated, a higher timeout can be set on virtio-blk (when using the VirtIO interface) and/or virtio-scsi (when using the SCSI interface). In the Windows registry, set the values below to a REG_DWORD higher than the default (60s), such as 120 (for 120s). Note that this prevents the BSODs only as long as the delays stay below the timeout; the storage issue may still be present and cause other delays and performance issues. The system may also take more time to recover from I/O issues. Storage delays should still be investigated.
HKLM\System\CurrentControlSet\Services\viostor\Parameters\IoTimeoutValue
HKLM\System\CurrentControlSet\Services\vioscsi\Parameters\IoTimeoutValue
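The timeout workaround above could be prepared as a registry import file. The following is a minimal Python sketch, not a Red Hat-provided tool: the function name `build_reg_file` is hypothetical, and 120 is only the example value from the text. It generates the text of a `.reg` file that sets `IoTimeoutValue` for both the `viostor` and `vioscsi` services.

```python
# Sketch: generate .reg file text that raises the virtio storage I/O timeout.
# Assumptions: build_reg_file is a hypothetical helper name; 120 seconds is
# only the example value used in this article.

SERVICES = ("viostor", "vioscsi")  # virtio-blk and virtio-scsi driver services
DEFAULT_TIMEOUT_SECONDS = 60       # default timeout stated in this article


def build_reg_file(timeout_seconds=120, services=SERVICES):
    """Return .reg file text setting IoTimeoutValue for each service."""
    if timeout_seconds <= DEFAULT_TIMEOUT_SECONDS:
        raise ValueError("timeout must be higher than the 60s default")
    lines = ["Windows Registry Editor Version 5.00", ""]
    for service in services:
        key = (
            "HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Services\\"
            + service
            + "\\Parameters"
        )
        lines.append("[" + key + "]")
        # REG_DWORD values are written as 8 hexadecimal digits in .reg files.
        lines.append('"IoTimeoutValue"=dword:{:08x}'.format(timeout_seconds))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(120))
```

The printed text can be saved as a `.reg` file and imported in the guest. Only the service that matches the disk interface in use needs the value, and a guest reboot may be needed before the driver picks up the new timeout (an assumption worth verifying in your environment).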
Root Cause
- These stop codes usually relate to a slowdown in storage (60s or more by default), causing delayed I/O during paging operations, which triggers a Windows BSOD.
- This was investigated in RHEL-115745 - Multiple BSODs on Windows virtual machines probably related to the storage backend.
- The CRC error (ReportPageHashError) is due to an issue with the transfer length of the virtio driver during a reset (which happens on the I/O timeout of 60s from the guest side). This is fixed in virtio-win-1.9.49-0.el9_6.iso. However, the fix only changes the BSOD code from MEMORY MANAGEMENT to CRITICAL PROCESS DIED, to more appropriately reflect the situation. The main issue caused by slower storage or stuck I/O remains and still ultimately causes the BSOD.
Diagnostic Steps
- In the guest console, look for the stop code.
- Open the crashdump with WinDbg and check for traces of page hash errors:
0: kd> k
Child-SP RetAddr Call Site
00 ffff9b8c`0c855b28 fffff806`4516e7e2 nt!KeBugCheckEx
01 (Inline Function) --------`-------- nt!MiReportPageHashError+0x25
02 ffff9b8c`0c855b30 fffff806`450cadc6 nt!MiValidatePagefilePageHash+0x30e
03 ffff9b8c`0c855c10 fffff806`44f48b0d nt!MiWaitForInPageComplete+0x1828c6
04 ffff9b8c`0c855d00 fffff806`44f6000f nt!MiIssueHardFault+0x1ad
05 ffff9b8c`0c855db0 fffff806`4506cf58 nt!MmAccessFault+0x32f
06 ffff9b8c`0c855f50 fffff806`44faf64a nt!KiPageFault+0x358
07 ffff9b8c`0c8560e0 fffff806`451ed06d nt!ExAllocateHeapPool+0xaca
08 ffff9b8c`0c8561d0 fffff80f`b8a2d6a6 nt!ExAllocatePoolWithTag+0x3d
09 ffff9b8c`0c8562b0 fffff80f`b8a2306c Ntfs!NtfsCreateLcb+0x276
0a ffff9b8c`0c856340 fffff80f`b8a1727f Ntfs!NtfsOpenFile+0xa8c
0b ffff9b8c`0c8565e0 fffff80f`b8a11450 Ntfs!NtfsCommonCreate+0x1cef
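When several crash dumps need triage, the check for page hash errors in the stack above can be scripted against saved WinDbg `k` output. This is a minimal Python sketch under stated assumptions: the helper name `find_page_hash_frames` is hypothetical, and the two symbol names it looks for are taken from the stack trace shown above.

```python
# Sketch: scan saved WinDbg stack output for page hash error frames.
# Assumption: find_page_hash_frames is a hypothetical helper; the symbol
# names come from the example stack trace in this article.

PAGE_HASH_SYMBOLS = ("MiReportPageHashError", "MiValidatePagefilePageHash")


def find_page_hash_frames(stack_text):
    """Return the stack lines that reference a page hash validation symbol."""
    hits = []
    for line in stack_text.splitlines():
        if any("nt!" + symbol in line for symbol in PAGE_HASH_SYMBOLS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "00 ffff9b8c`0c855b28 fffff806`4516e7e2 nt!KeBugCheckEx",
        "01 (Inline Function) --------`-------- nt!MiReportPageHashError+0x25",
        "02 ffff9b8c`0c855b30 fffff806`450cadc6 nt!MiValidatePagefilePageHash+0x30e",
        "03 ffff9b8c`0c855c10 fffff806`44f48b0d nt!MiWaitForInPageComplete+0x1828c6",
    ])
    for frame in find_page_hash_frames(sample):
        print(frame)
```

A non-empty result suggests the dump matches the pattern described in this article. It does not by itself confirm a storage delay, so the storage checks in the Resolution section still apply.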