RHEL 7 system experiences connectivity issues while under load.
Environment
- Red Hat Enterprise Linux 7
- Possibly triggered by heavy network or memory load
- Possibly more likely to occur in environments with a large MTU (jumbo frames)
- Seen on an NFS client with kernel 3.10.0-957.5.1.el7 and on an NFS server with kernel 3.10.0-862.14.4.el7, but applications other than NFS are affected as well
- ixgbe and mlx5 based NICs (possibly others)
Issue
- A RHEL 7 host may lose some network connectivity, possibly for minutes at a time.
- May affect NFS and trigger "nfs: server [...] not responding, still trying" log messages on an NFS client machine.
- Connectivity with some subset of remote hosts may continue to function as expected while this is occurring.
- No OS network error counter obviously related to the issue is incremented.
Resolution
- Increase the sysctl vm.min_free_kbytes to something like ten times its default value; see How to tune vm.min_free_kbytes.
- Newer kernels include an SNMP counter, TcpExtPFMemallocDrop, which is incremented each time a packet is dropped because of this condition (see Root Cause). The counter is available in all RHEL 8 kernels and in RHEL 7.7 (kernel-3.10.0-1062.el7 and above). See the Diagnostic Steps section below for how to use it.
Root Cause
- Packets may be silently lost on receive in the sk_filter_trim_cap() function if it returns -ENOMEM:

```c
/**
 * sk_filter_trim_cap - run a packet through a socket filter
 * @sk: sock associated with &sk_buff
 * @skb: buffer to filter
 * @cap: limit on how short the eBPF program may trim the packet
 *
 * Run the filter code and then cut skb->data to correct size returned by
 * sk_run_filter. If pkt_len is 0 we toss packet. If skb->len is smaller
 * than pkt_len we keep whole skb->data. This is the socket level
 * wrapper to sk_run_filter. It returns 0 if the packet should
 * be accepted or -EPERM if the packet should be tossed.
 *
 */
int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
{
	int err;
	struct sk_filter *filter;

	/*
	 * If the skb was allocated from pfmemalloc reserves, only
	 * allow SOCK_MEMALLOC sockets to use it as this socket is
	 * helping free memory
	 */
	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
		return -ENOMEM;

	err = security_sock_rcv_skb(sk, skb);
	if (err)
		return err;

	rcu_read_lock();
	filter = rcu_dereference(sk->sk_filter);
	if (filter) {
		unsigned int pkt_len = SK_RUN_FILTER(filter, skb);

		err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
	}
	rcu_read_unlock();

	return err;
}
EXPORT_SYMBOL(sk_filter_trim_cap);
```

  The relevant check is the skb_pfmemalloc() test at the top of the function. If the receive buffer was allocated from the pfmemalloc emergency reserves, which can happen when free memory is below the minimum watermark, the packet is only accepted by sockets flagged SOCK_MEMALLOC (sockets that help free memory, such as those used for swap over NFS). For any other socket the packet is dropped with -ENOMEM.
- Increasing the sysctl vm.min_free_kbytes avoids the condition: it raises the zone watermarks, so memory is reclaimed earlier and receive buffers are less likely to be allocated from the reserves.
- The issue has been reported on systems using ixgbe based and mlx5 based interfaces.
- Newer kernels add an SNMP counter in sk_filter_trim_cap() so the condition can be recognized more easily; see the upstream commit "net: add LINUX_MIB_PFMEMALLOCDROP counter" on git.kernel.org.
Diagnostic Steps
- Check the kernel version in use:

```
$ uname -r
```

- For RHEL 8 and RHEL 7.7+ (kernel-3.10.0-1062.el7 and above), the nstat command can be used to check the TcpExtPFMemallocDrop counter; a non-zero value means packets have been dropped for this reason:

```
$ nstat -rsz | grep TcpExtPFMemallocDrop
TcpExtPFMemallocDrop            0                  0.0
```

- For older RHEL 7 kernels, nothing is logged and no counter is incremented when sk_filter_trim_cap() returns -ENOMEM. In this case, the return value of the function can be probed with tools such as perf or SystemTap. An example perf probe that watches for the condition for 10 seconds:

```
# perf probe -a 'sk_filter_trim_cap%return return=$retval:s32'
# perf record -e probe:sk_filter_trim_cap -agR --filter 'return < 0' sleep 10
```

  On kernels older than 3.10.0-1062.el7, record the probe:sk_filter_trim_cap__return event instead:

```
# perf record -e probe:sk_filter_trim_cap__return -agR --filter 'return < 0' sleep 10
```

  Then display the results with perf report. A return value of -12 corresponds to -ENOMEM (errno 12), the value returned by the pfmemalloc check shown in the Root Cause section:

```
# perf report
Samples: 692 of event 'probe:sk_filter_trim_cap', Event count (approx.): 692

  Children      Self  Trace output
+  100.00%   100.00%  (ffffffff8a0551c0 <- ffffffff8a0a735c) return=-12
```
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.