RHEL 6.9 nfs or lockd server not responding - due to TCP 3-way handshake failure during reconnect, NFS client erroneously sends burst of SYNs
Environment
- Red Hat Enterprise Linux 6.9
- kernels 2.6.32-696.el6 up to, but not including 2.6.32-696.6.3.el6
- Fixed in 2.6.32-696.6.3.el6
- NFS
- seen with NFSv3 and lockd traffic
- so far this issue has not been seen with NFSv4.0 or NFS4.1
- seen with iptables / nf_conntrack
- NOTE: It is possible iptables is not a necessary condition for this problem to occur, and there may be other failure modes unrelated to iptables
Issue
- A RHEL 6.9 client fails to reconnect to the NFS server and server not responding messages are seen. At the TCP level, the second step (the SYN,ACK) of the 3-way TCP handshake is failing with
ICMP 102 Destination unreachable (Host administratively prohibited)sent by the NFS client. - NFS share cannot reconnect due to the 3-way TCP handshake failure. The NFS client erroneously sends multiple SYN packets from the same TCP port but different sequence numbers, the NFS server responds with SYN,ACK to one of the SYNs, but not the others, and this sequence leads to confusion between the NFS client and server's TCP stacks. As a result of the confusion, the NFS client never sends the final ACK and so the NFS share cannot be reconnected, leading to a DoS of the NFS share.
- NFS client TCP connection goes idle and disconnects, and upon attempt to re-use the NFS share, the share is not usable and the following errors are seen:
xs_tcp_setup_socket: connect returned unhandled error -107
Resolution
Red Hat Enterprise Linux 6
- The issue (bz1450850) has been resolved with errata RHSA-2017:1723 with the following package(s):
kernel-2.6.32-696.6.3.el6or later.
Workarounds
- Any of the following should work around the problem, though a few workarounds listed are not 100% reliable
- Disable iptables, even only temporarily (not fully reliable). After the connection to the NFS server is established, iptables can be restarted with unchanged rules - however, the issue is likely to reoccur. NOTE: This workaround is only appropriate if iptables is active in the environment, and even if you disable iptables you may still hit this problem.
- Boot back to a kernel prior to 2.6.32-696.*el6
- Use some approach to ensure the NFS (and lockd) TCP connections do not go idle and drop (Note that the idle timer will drop a connection after 5 minutes). One possibility is to use a variation of a script to monitor for hung NFS mount points as described in <https://access.redhat.com/solutions/97873>.
- Use NFSv4.1 or NFSv4.0 (not fully reliable). NOTE: This issue can still occur on NFSv4.0 or NFSv4.1, but it is believed to be harder to reproduce due to the fact that NFS4.x periodically issues either NFS4 RENEW or SEQUENCE operations to keep the TCP connection active. However, on NFS 4.x any NFS operation completing with an error code (i.e. NFS4ERR_BAD_STATEID) will cause the TCP connection to drop so this issue can occur.
Root Cause
-
Regression that will result in multiple NFS / RPC commands initiating a connection on a socket after a connection has dropped. The regression occurs due to a series of patches related to the re-use of ports after a sunrpc TCP connection drops and reconnect occurs. One patch was missed in this series and caused this problem.
-
Note that TCP port re-use has changed through the lifetime of RHEL6. For more information, see <https://access.redhat.com/solutions/2979751>
-
Currently the following are the main findings:
- The tcpdumps show an ICMP error which normally indicates a firewall problem, and disabling iptables resolves the issue.
6816 15:40:15.891399 6.005801 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429565362
6817 15:40:15.891491 0.000092 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
-
A series of sunrpc commits were included with the 6.9 kernel which change how rpc choses the source port for TCP connections. It now uses the SO_REUSEPORT socket option.
-
The following commit is under test and believed to fix this problem
commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Wed Sep 16 23:43:17 2015 -0400
SUNRPC: Ensure that we wait for connections to complete before retrying
Commit 718ba5b87343, moved the responsibility for unlocking the socket to
xs_tcp_setup_socket, meaning that the socket will be unlocked before we
know that it has finished trying to connect. The following patch is based on
an initial patch by Russell King to ensure that we delay clearing the
XPRT_CONNECTING flag until we either know that we failed to initiate
a connection attempt, or the connection attempt itself failed.
Diagnostic Steps
General steps
- Gather a tcpdump at the time of the unresponsive issue.
- Analyze the tcpdump with tshark or wireshark, and make sure to turn off relative sequence numbers. With tshark, this is the following option: "-o tcp.relative_sequence_numbers:FALSE "
- Look for multiple 'SYN' packets from the NFS client being sent within the same fraction of a second.
Analysis of tcpdump: NLM connection with iptables
- In the pcap there is a connection to the rpcbind process at port 111. This tells the client to connect to port 304:
$ TZ=EST5EDT tshark -t a -r tcpdump-2017-04-24-15-39-45.pcap0.gz -Y "tcp.stream == 2 && portmap"
6775 15:40:15.868378 0.000000 10.0.0.36 → 10.0.0.182 Portmap 154 V2 GETPORT Call NLM(100021) V:4 TCP
6777 15:40:15.868822 0.000444 10.0.0.182 → 10.0.0.36 Portmap 98 V2 GETPORT Reply (Call In 6775) Port:304
- The client 10.0.0.36 has already been making attempts to connect to 10.0.0.182:304 TCP and it gets a SYN ACK from 10.0.0.182 (frame 1227). However, this SYN ACK has a completed unexpected ACK number given the SEQ used in the clients SYN. This is presumably why the client responds with an ICMP Destination Unreachable. If the client 10.0.0.36 is running something like iptables its connection tracking (nf_conntrack) handler will not match (ACCEPT) the SYN ACK with the unexpected ACK value and the packet will presumably fall through to some REJECT rule.
$ TZ=EST5EDT tshark -t a -r tcpdump-2017-04-24-15-39-45.pcap0.gz -Y "tcp.stream == 1"
1226 15:39:50.891862 0.000000 10.0.0.36 → 10.0.0.182 TCP 74 866 → 304 [SYN] Seq=4115864593 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=429549362 TSecr=0 WS=128
1227 15:39:50.891963 0.000101 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429549362
1228 15:39:50.892008 0.000045 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
1984 15:39:53.887879 2.995871 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429549362
1985 15:39:53.887921 0.000042 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
3434 15:39:59.896455 6.008534 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429549362
3435 15:39:59.896515 0.000060 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
3995 15:40:06.891804 6.995289 10.0.0.36 → 10.0.0.182 TCP 74 [TCP Retransmission] 866 → 304 [SYN] Seq=4115864593 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=429565362 TSecr=0 WS=128
3996 15:40:06.891898 0.000094 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429565362
3997 15:40:06.891936 0.000038 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
4867 15:40:09.885497 2.993561 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429565362
4868 15:40:09.885598 0.000101 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
6816 15:40:15.891399 6.005801 10.0.0.182 → 10.0.0.36 TCP 74 [TCP ACKed unseen segment] [TCP Retransmission] 304 → 866 [SYN, ACK] Seq=2901992808 Ack=956456355 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=2932388808 TSecr=429565362
6817 15:40:15.891491 0.000092 10.0.0.36 → 10.0.0.182 ICMP 102 Destination unreachable (Host administratively prohibited)
- In another tcpdump tracing TCP reconnect with NLM, a burst of SYN packets from the NFS client, each from the same TCP port, but different sequence numbers, all in the same fraction of a second. The server is a touch slower to respond. By the time the server's SYN ACK for the first client SYN arrives, the client already has three other outstanding SYNs and nf_conntrack does not like it. The burst of SYN packets from the NFS client on the same source port is unusual, and looks like a bug in the NFS client reconnect logic, which causes a DoS on the NFS client being unable to finish the 3-way handshake to the NFS server.
809763 13:06:49.637735 399.059843 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Port numbers reused] 705 → 304 [SYN] Seq=2301595334 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=110171009 TSecr=1300375205 WS=128
809764 13:06:49.637825 0.000090 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Port numbers reused] 705 → 304 [SYN] Seq=4241129203 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110171009 TSecr=0 WS=128
809765 13:06:49.637886 0.000061 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Port numbers reused] 705 → 304 [SYN] Seq=4241130150 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110171009 TSecr=0 WS=128
809766 13:06:49.637933 0.000047 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Port numbers reused] 705 → 304 [SYN] Seq=4241130922 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110171009 TSecr=0 WS=128
809767 13:06:49.637976 0.000043 10.0.0.81 → 10.0.0.47 TCP 74 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110171009
809768 13:06:49.638001 0.000025 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
809769 13:06:49.638086 0.000085 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Out-Of-Order] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110171009
809770 13:06:49.638095 0.000009 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
809771 13:06:49.638131 0.000036 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Out-Of-Order] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110171009
809772 13:06:49.638139 0.000008 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
809773 13:06:49.638147 0.000008 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Out-Of-Order] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110171009
809774 13:06:49.638151 0.000004 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
810089 13:06:50.636568 0.998417 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Retransmission] 705 → 304 [SYN] Seq=4241130922 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110172009 TSecr=0 WS=128
810090 13:06:50.636754 0.000186 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Retransmission] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110172009
810091 13:06:50.636807 0.000053 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
810593 13:06:52.636640 1.999833 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Retransmission] 705 → 304 [SYN] Seq=4241130922 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110174009 TSecr=0 WS=128
810594 13:06:52.636881 0.000241 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Retransmission] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110174009
810595 13:06:52.636979 0.000098 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
810682 13:06:55.638900 3.001921 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Retransmission] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110174009
810683 13:06:55.638966 0.000066 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
810719 13:06:56.636591 0.997625 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Retransmission] 705 → 304 [SYN] Seq=4241130922 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110178009 TSecr=0 WS=128
810720 13:06:56.636710 0.000119 10.0.0.81 → 10.0.0.47 TCP 74 [TCP Retransmission] 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110178009
810721 13:06:56.636752 0.000042 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
810901 13:06:59.630044 2.993292 10.0.0.47 → 10.0.0.81 TCP 74 [TCP Port numbers reused] 705 → 304 [SYN] Seq=102290257 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=110181002 TSecr=0 WS=128
810902 13:06:59.630124 0.000080 10.0.0.81 → 10.0.0.47 TCP 74 304 → 705 [SYN, ACK] Seq=1488229741 Ack=2301595335 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=1978101347 TSecr=110181002
810903 13:06:59.630145 0.000021 10.0.0.47 → 10.0.0.81 ICMP 102 Destination unreachable (Host administratively prohibited)
Analysis of tcpdump: NFS example without iptables
- In this variant of this problem, the TCP connection for NFS runs into problems due to the multiple SYNs from the NFS client, and the NFS server responding to the second SYN, which confuses the NFS client's TCP stack.
- The following sequence occurs
- Frames 48-49: NFS client sends a duplicate SYN, the first one has Seq=3677241340, and the second one has Seq=3677245016, and both are sent in the same fraction of a second.
- Frame 50: NFS server responds with Ack=3677241341, which is a response to the first SYN from the NFS client
- Frame 51: NFS client responds with RST and Seq=3677241341, indicating the Ack packet in frame 50 is not understood, and the connection should be reset
- Frame 52: NFS server responds with RST, ACK, indicating it has reset the connection
- Frames 53-57: The sequence in 1-4 repeats
- Frames 58-62: The sequence in 1-4 repeats
48 2017-05-17 20:24:50.684597 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677241340 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966741 TSecr=0 WS=128
49 2017-05-17 20:24:50.684833 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677245016 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966742 TSecr=0 WS=128
50 2017-05-17 20:24:50.685595 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1365940280 Ack=3677241341 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462612 TSecr=966741 WS=128
51 2017-05-17 20:24:50.685625 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677241341 Win=0 Len=0
52 2017-05-17 20:24:50.685654 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677245017 Win=0 Len=0
53 2017-05-17 20:24:50.689371 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677316102 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
54 2017-05-17 20:24:50.689452 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677317394 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
55 2017-05-17 20:24:50.689666 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1366005876 Ack=3677316103 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462616 TSecr=966746 WS=128
56 2017-05-17 20:24:50.689688 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677316103 Win=0 Len=0
57 2017-05-17 20:24:50.689711 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677317395 Win=0 Len=0
58 2017-05-17 20:24:50.689766 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677322298 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966746 TSecr=0 WS=128
59 2017-05-17 20:24:50.689826 192.168.122.18 -> 192.168.122.16 TCP 74 [TCP Port numbers reused] 815 > 2049 [SYN] Seq=3677323273 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=966747 TSecr=0 WS=128
60 2017-05-17 20:24:50.689968 192.168.122.16 -> 192.168.122.18 TCP 74 2049 > 815 [SYN, ACK] Seq=1366010498 Ack=3677322299 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1741462617 TSecr=966746 WS=128
61 2017-05-17 20:24:50.689980 192.168.122.18 -> 192.168.122.16 TCP 54 815 > 2049 [RST] Seq=3677322299 Win=0 Len=0
62 2017-05-17 20:24:50.690001 192.168.122.16 -> 192.168.122.18 TCP 54 2049 > 815 [RST, ACK] Seq=0 Ack=3677323274 Win=0 Len=0
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.