NFS hangs when sysctl tcp_frto=2 due to TCP recovery issues
Environment
- Red Hat Enterprise Linux
- NFS
Issue
- The problem was originally reported after a RHEL 5 to RHEL 6 upgrade, but it can occur on any RHEL release where "net.ipv4.tcp_frto = 2", which is the default from RHEL 6 onwards.
- We have eight RHEL 5 servers that we started upgrading to RHEL 6 (fresh, clean install). After upgrading a couple of them, we discovered that on RHEL 6 the NFS mount locks up after a period of heavier NFS traffic:
Jul 14 14:02:05 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:10 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:10 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:38 client kernel: nfs: server OK
Jul 14 14:02:38 client kernel: nfs: server OK
We can reproduce this every time. However, we cannot reproduce it on RHEL 5 servers on the same network. It reproduces on different hardware and even in a VM. We use the same NFS mount options as we did on RHEL 5, but it locks up every time.
Resolution
Change the net.ipv4.tcp_frto sysctl from 2 to 0 by adding the following line to /etc/sysctl.conf:
net.ipv4.tcp_frto = 0
Then execute sysctl -p for the setting to take effect:
# sysctl -p
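The edit to /etc/sysctl.conf can also be scripted. The following is a minimal sketch, not part of the original solution: set_sysctl_conf is a hypothetical helper name, and it assumes a sysctl.conf-style file with one "key = value" entry per line. It replaces an existing entry for the key, or appends one if none is present, so that running it twice does not create duplicate lines.

```shell
#!/bin/sh
# Hypothetical helper (not a Red Hat tool): set "key = value" in a
# sysctl.conf-style file, replacing an existing entry or appending one.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        # Rewrite the existing entry in place; a backup is kept as FILE.bak.
        sed -i.bak "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        # No entry yet: append one.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root):
#   set_sysctl_conf net.ipv4.tcp_frto 0 /etc/sysctl.conf
#   sysctl -p
#   sysctl net.ipv4.tcp_frto    # prints the running value for verification
```

To change only the running value without touching the file, sysctl -w net.ipv4.tcp_frto=0 can be used, but that setting does not persist across a reboot.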
Root Cause
If TCP data is missing from a TCP stream, the receiver sends a series of duplicate ACKs to trigger a fast retransmission. This is expected in normal TCP operation, as long as the connection recovers from the packet drop. In this case, however, duplicate ACKs were sent every few milliseconds in both directions, slowing recovery and causing the RPC layer to report "nfs: server not responding, still trying" errors. The net.ipv4.tcp_frto value has been 2 for a long time, and it is not understood what triggered this behaviour.
The default value of net.ipv4.tcp_frto changed from 0 in RHEL 5 to 2 in RHEL 6. The tcp(7) man page describes the setting as follows:
tcp_frto (integer; default: 0; since Linux 2.4.21/2.6)
    Enable F-RTO, an enhanced recovery algorithm for TCP retransmission
    timeouts (RTOs). It is particularly beneficial in wireless
    environments where packet loss is typically due to random radio
    interference rather than intermediate router congestion. See
    RFC 4138 for more details.

    This file can have one of the following values:

    0  Disabled.
    1  The basic version F-RTO algorithm is enabled.
    2  Enable SACK-enhanced F-RTO if flow uses SACK. The basic version
       can be used also when SACK is in use though in that case
       scenario(s) exists where F-RTO interacts badly with the packet
       counting of the SACK-enabled TCP flow.
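To see which of these modes a host is currently running, the value can be read from /proc/sys/net/ipv4/tcp_frto. The snippet below is only an illustrative sketch: describe_tcp_frto is a hypothetical helper name, and its labels simply paraphrase the three values listed in the man page excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: map a tcp_frto value to the meaning given in the
# man page excerpt (0, 1, or 2). Returns non-zero for any other value.
describe_tcp_frto() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "basic F-RTO" ;;
        2) echo "SACK-enhanced F-RTO (when the flow uses SACK)" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}

# Report the running value if the kernel exposes it (Linux only).
frto_file=/proc/sys/net/ipv4/tcp_frto
if [ -r "$frto_file" ]; then
    value=$(cat "$frto_file")
    echo "net.ipv4.tcp_frto = $value: $(describe_tcp_frto "$value")"
fi
```

A host that reports 2 is running with the setting this solution associates with the NFS hangs.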
Diagnostic Steps
- The pcap trace shows hundreds of duplicate ACKs followed by retransmissions. The duplicate ACKs alternate in both directions ("client->server" and "server->client"), indicating a stalled TCP session, as follows:
$ tshark -r /tmp/small.pcap 'frame.number > 124297 && frame.number < 124309'
2015-07-15 05:01:26.598318 12.307675 server -> client TCP 66 [TCP Dup ACK 123984#157] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124298 0.000000 335889396
2015-07-15 05:01:26.598339 12.307696 client -> server TCP 66 [TCP Dup ACK 123985#157] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253991 TSecr=3002711338 124299 0.000021 3432221061
2015-07-15 05:01:26.600256 12.309613 server -> client TCP 66 [TCP Dup ACK 123984#158] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124300 0.001917 335889396
2015-07-15 05:01:26.600277 12.309634 client -> server TCP 66 [TCP Dup ACK 123985#158] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253993 TSecr=3002711338 124301 0.000021 3432221061
2015-07-15 05:01:26.601468 12.310825 server -> client TCP 66 [TCP Dup ACK 123984#159] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124302 0.001191 335889396
2015-07-15 05:01:26.601489 12.310846 client -> server TCP 66 [TCP Dup ACK 123985#159] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253994 TSecr=3002711338 124303 0.000021 3432221061
2015-07-15 05:01:26.603866 12.313223 server -> client TCP 66 [TCP Dup ACK 123984#160] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124304 0.002377 335889396
2015-07-15 05:01:26.603887 12.313244 client -> server TCP 66 [TCP Dup ACK 123985#160] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253996 TSecr=3002711338 124305 0.000021 3432221061
2015-07-15 05:01:26.604202 12.313559 server -> client TCP 66 [TCP Dup ACK 123984#161] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124306 0.000315 335889396
2015-07-15 05:01:26.604215 12.313572 client -> server TCP 66 [TCP Dup ACK 123985#161] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253997 TSecr=3002711338 124307 0.000013 3432221061
Followed by a retransmission:
2015-07-15 05:01:26.607051 12.316408 client -> server RPC 1434 [TCP Fast Retransmission] Continuation 124308 0.002836 3432155397 3432156765
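Rather than reading through the trace by eye, the duplicate ACKs can be tallied per direction. The following is a rough sketch, not part of the original diagnosis: count_dup_acks is a hypothetical helper name, and it assumes tshark text output on stdin in which each packet line contains "source -> destination" and the "[TCP Dup ACK" annotation, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: count "TCP Dup ACK" lines per direction in tshark
# text output read from stdin. Assumes "src -> dst" appears on each line.
count_dup_acks() {
    awk '/\[TCP Dup ACK/ {
             # Find the first "->" token; its neighbours are src and dst.
             for (i = 2; i < NF; i++) {
                 if ($i == "->") {
                     count[$(i-1) " -> " $(i+1)]++
                     break
                 }
             }
         }
         END { for (d in count) print count[d], d }' | sort -k2
}

# Usage:
#   tshark -r /tmp/small.pcap | count_dup_acks
```

Large and roughly equal counts in both directions over a short interval match the stalled pattern described here. On tshark versions that support the -Y option, the display filter tcp.analysis.duplicate_ack can be used to select only these packets before counting.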