NotReady node with high load and many D state processes in OpenShift 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Node
Issue
- There is a node in NotReady status with very high load and many processes in D state.
- The load average shown for the node is very high, while there are no processes using excessive CPU.
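The first symptom can be confirmed from the cluster side. The following is a minimal sketch, assuming a logged-in oc client and the default column layout of oc get nodes (NAME STATUS ROLES AGE VERSION). The function name not_ready_nodes is illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Print the names of nodes whose STATUS column contains NotReady.
# Input (stdin): lines in the format of "oc get nodes --no-headers",
# assumed to be: NAME STATUS ROLES AGE VERSION
not_ready_nodes() {
    # STATUS may be a comma-separated list such as NotReady,SchedulingDisabled
    awk '$2 ~ /(^|,)NotReady(,|$)/ { print $1 }'
}

# Live usage (assumption: oc is logged in to the cluster):
#   oc get nodes --no-headers | not_ready_nodes
```

The function only filters text, so it can also be run against a saved copy of the oc get nodes output.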
Resolution
Follow the Diagnostic Steps section to find the cause of the many processes in D state. There is additional information about processes in D state in what is "D" state (or dstate, d-state)?
If the cause is rpc_wait_bit_killable, it typically means the processes are waiting on a response from an NFS server, so the NFS server must be checked for issues.
For processes in D state caused by NFS, refer to the workaround below. For other causes shown in the WCHAN column of ps -elfL, please open a new Support Case for troubleshooting.
Workaround for processes in D state caused by NFS
To clear NFS blocked processes that have not automatically recovered, the system will need to be rebooted as explained in is there a way to kill a process in the 'Z' or 'D' state without a reboot?
> If operations do not complete due to a hardware or software fault (for example network connectivity problems) it may not be possible to eliminate all processes in D state without rebooting.
Processes stuck in D state due to NFS can be caused by the recovery behavior of the NFS client after an NFS request times out. From man nfs:
soft / hard
Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
NB: A so-called "soft" timeout **can cause silent data corruption** in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.
For configuring NFS mount options, refer to how to edit mount options for pods that are backed by NFS PersistentVolumes in OpenShift 4.
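To see which recovery behavior the existing NFS mounts on a node use, the mount options can be read from /proc/mounts. This is a minimal sketch, assuming the usual /proc/mounts field order (device, mount point, filesystem type, options) and that the option list names soft explicitly when it is in effect. The function name nfs_recovery_mode is illustrative.

```shell
#!/usr/bin/env bash
# Print "<mount point> <hard|soft>" for every NFS mount.
# Input (stdin): lines in /proc/mounts format:
#   device mountpoint fstype options dump pass
nfs_recovery_mode() {
    awk '$3 ~ /^nfs4?$/ {
        # Per man nfs, hard is the behavior when neither option is given.
        mode = "hard"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "soft") mode = "soft"
        }
        print $2, mode
    }'
}

# Live usage on the node:
#   nfs_recovery_mode < /proc/mounts
```

Mounts reported as hard retry indefinitely, which matches the behavior described in the quoted man page text.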
Root Cause
If there are many processes in D state due to rpc_wait_bit_killable, it typically means the processes are waiting on a response from an NFS server. When an NFS client is waiting on the server and doesn't receive a response, the processes go into D state. The load average calculation includes blocked processes, so the load average will be very high and close to the number of blocked processes.
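Because blocked tasks are counted in the load average, the two numbers can be compared directly on the node. This is a minimal sketch, assuming the procps ps command and /proc/loadavg are available. The function names count_d_state and load_1min are illustrative.

```shell
#!/usr/bin/env bash
# Count threads in uninterruptible sleep (state D).
# Input (stdin): one state code per line, as printed by "ps -eLo state=".
count_d_state() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Print the 1-minute load average.
# Input (stdin): a line in /proc/loadavg format.
load_1min() {
    awk '{ print $1; exit }'
}

# Live usage on the node:
#   ps -eLo state= | count_d_state
#   load_1min < /proc/loadavg
```

If the two values are of the same order of magnitude while CPU usage is low, the load is most likely driven by blocked tasks rather than by running ones.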
Diagnostic Steps
- Access the node via oc debug node/[node_name] or ssh. If accessed via oc debug node, then run chroot /host bash.
- Check if the uptime command shows a very high load average:

    # uptime
     11:25:02 up 25 days,  8:45,  4 users,  load average: 3524.92, 3524.04, 3523.24

- Check the WCHAN (waiting channel) column for the processes in D state, which shows the name of the kernel function in which the process is sleeping:

    # ps -elfL | awk '{if($2~"D"){print $13}}'
    3519 rpc_wa

- If the output shows rpc_wa, it is rpc_wait_bit_killable and it typically means the processes are waiting on a response from an NFS server.
- Check if there are messages related to NFS in the dmesg:

    $ grep -i nfs sos_commands/kernel/dmesg | wc -l
    10750
    $ grep -i nfs sos_commands/kernel/dmesg | less
    [...]
    [204462.604701] nfs: server 10.0.0.1 not responding, timed out
    [...]
    [9585786.855380] nfs: server 10.0.0.1 not responding, still trying
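The WCHAN and dmesg checks above can be condensed into per-function and per-server counts. This is a minimal sketch, not the article's exact commands. It assumes the procps format specifier wchan:32 to widen the WCHAN column so that names such as rpc_wait_bit_killable are not truncated, and it assumes kernel messages of the form shown above. The function names summarize_wchan and nfs_not_responding are illustrative.

```shell
#!/usr/bin/env bash
# Count D-state threads per wait channel, highest count first.
# Input (stdin): two columns "state wchan", as printed by
# "ps -eLo state,wchan:32" (the header line is ignored because
# its first column is "S", not "D").
summarize_wchan() {
    awk '$1 ~ /^D/ { c[$2]++ } END { for (w in c) print c[w], w }' | sort -rn
}

# Count "not responding" kernel messages per NFS server.
# Input (stdin): dmesg output, live or from an sosreport.
nfs_not_responding() {
    awk '/nfs: server .* not responding/ {
        # The server address is the field that follows the word "server".
        for (i = 1; i < NF; i++) {
            if ($i == "server") { c[$(i + 1)]++; break }
        }
    } END { for (s in c) print c[s], s }' | sort -rn
}

# Live usage on the node:
#   ps -eLo state,wchan:32 | summarize_wchan
#   dmesg | nfs_not_responding
# From an sosreport:
#   nfs_not_responding < sos_commands/kernel/dmesg
```

A single dominant wait channel together with a single unresponsive server address points at that NFS server as the place to investigate first.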
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.