NFS hangs when sysctl tcp_frto=2 due to TCP recovery issues

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux
  • NFS

Issue

  • Although the problem was originally reported after a RHEL 5 to RHEL 6 upgrade, it can happen on any RHEL release where "net.ipv4.tcp_frto = 2", which is the default from RHEL 6 onwards.
  • We have eight RHEL 5 servers that we started upgrading to RHEL 6 (fresh, clean installs). After upgrading a couple of them, we discovered that on RHEL 6 the NFS mount locks up after a period of heavier NFS traffic, with messages such as:
Jul 14 14:02:05 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:10 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:10 client kernel: nfs: server server not responding, still trying
Jul 14 14:02:38 client kernel: nfs: server  OK
Jul 14 14:02:38 client kernel: nfs: server  OK

We can reproduce this every time. However, we cannot reproduce it on the RHEL 5 servers on the same network. We can reproduce it on different hardware and even on a VM. We use the same NFS mount options and other settings as we did with RHEL 5, but it locks up every time.
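To check whether a client is showing the same symptom, the stall and recovery messages can be pulled out of the system log. This is a minimal sketch, assuming the kernel messages land in /var/log/messages (the usual RHEL rsyslog target; pass another file if yours differ). The function name nfs_stall_messages is hypothetical.

```shell
# Hypothetical helper: print the kernel NFS "not responding" and "OK" lines
# from a syslog-style file. The default path is an assumption.
nfs_stall_messages() {
    local log="${1:-/var/log/messages}"
    # Match both the stall message and the recovery message.
    grep -E 'kernel: nfs: server .*(not responding, still trying|OK)$' "$log"
}
```

Comparing the timestamp of the first "not responding" line with the following "OK" line gives the length of each stall (33 seconds in the excerpt above).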

Resolution

Change net.ipv4.tcp_frto from 2 to 0 by adding the following line to /etc/sysctl.conf:

net.ipv4.tcp_frto = 0

Then run sysctl -p for the setting to take effect:

# sysctl -p
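When the change has to be rolled out to several servers, it helps to make the edit idempotent so that an existing tcp_frto line is rewritten rather than duplicated. This is a minimal sketch, assuming GNU sed (as shipped with RHEL); set_tcp_frto_zero is a hypothetical helper name, and the file argument only exists so it can be tried on a copy first.

```shell
# Hypothetical helper: make sure a sysctl config file sets net.ipv4.tcp_frto = 0.
# An existing net.ipv4.tcp_frto line is rewritten in place; otherwise one is appended.
set_tcp_frto_zero() {
    local conf="${1:-/etc/sysctl.conf}"
    if grep -Eq '^[[:space:]]*net\.ipv4\.tcp_frto[[:space:]]*=' "$conf" 2>/dev/null; then
        # Replace whatever value is currently configured (for example 2).
        sed -i -E 's/^[[:space:]]*net\.ipv4\.tcp_frto[[:space:]]*=.*/net.ipv4.tcp_frto = 0/' "$conf"
    else
        printf '%s\n' 'net.ipv4.tcp_frto = 0' >> "$conf"
    fi
}
```

Run as root, set_tcp_frto_zero && sysctl -p persists and applies the setting; sysctl net.ipv4.tcp_frto should then print net.ipv4.tcp_frto = 0. By contrast, sysctl -w net.ipv4.tcp_frto=0 alone changes only the running kernel and is lost at reboot.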

Root Cause

If data is missing from a TCP stream, the receiver sends a series of duplicate ACKs to trigger a fast retransmission. This is expected in normal TCP operation as long as the connection recovers from the packet drop. In this case, however, duplicate ACKs were exchanged every few milliseconds in both directions, slowing recovery and causing the RPC layer to report "nfs: server not responding, still trying" errors. The net.ipv4.tcp_frto value has been 2 for a long time, and it is not understood what triggered this behaviour.
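The duplicate-ACK mechanism described above can be illustrated with a small sketch of the classic fast-retransmit rule from RFC 5681: the sender retransmits once it sees the third duplicate ACK for the same sequence number. This is an illustration only, not the Linux implementation (Linux adapts the threshold to observed reordering), so the threshold of 3 and the helper name fast_retransmit_points are assumptions.

```shell
# Illustration only: read cumulative ACK numbers (one per line) in the order a
# sender receives them and report where the duplicate-ACK threshold is reached.
# DUPTHRESH defaults to 3, the RFC 5681 value; real stacks may adapt it.
fast_retransmit_points() {
    awk -v dupthresh="${DUPTHRESH:-3}" '
        NR == 1 { last = $1; dups = 0; next }
        $1 == last {
            dups++
            # Fire once, when the duplicate count first reaches the threshold.
            if (dups == dupthresh) print "fast retransmit from seq " $1 " (line " NR ")"
            next
        }
        { last = $1; dups = 0 }   # a new cumulative ACK resets the counter
    '
}
```

For example, the ACK sequence 1000, 2000, 2000, 2000, 2000, 3000 triggers a retransmission starting at sequence 2000 on the fifth ACK. In the trace under Diagnostic Steps, the client's fast retransmission likewise starts at 3432155397, the Ack value the server keeps repeating.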

The default value of net.ipv4.tcp_frto changed from 0 in RHEL 5 to 2 in RHEL 6. From the tcp(7) man page:

tcp_frto (integer; default: 0; since Linux 2.4.21/2.6)
              Enable F-RTO, an enhanced recovery algorithm for TCP
              retransmission timeouts (RTOs). It is particularly beneficial
              in wireless environments where packet loss is typically due to
              random radio interference rather than intermediate router
              congestion. See RFC 4138 for more details.

              This file can have one of the following values:

              0  Disabled.

              1  The basic version F-RTO algorithm is enabled.

              2  Enable SACK-enhanced F-RTO if flow uses SACK. The basic
                 version can be used also when SACK is in use though in that
                 case scenario(s) exists where F-RTO interacts badly with the
                 packet counting of the SACK-enabled TCP flow.
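To see which of these modes a host is actually running with, the live value can be read from /proc and mapped onto the descriptions above. This is a minimal sketch: describe_tcp_frto is a hypothetical helper, its messages paraphrase the tcp(7) excerpt rather than any kernel output, and newer kernels may interpret the values differently.

```shell
# Hypothetical helper: read the running kernel's tcp_frto value and describe it
# using the meanings listed in the tcp(7) excerpt. The optional argument lets
# the function be pointed at a copy of the file instead.
describe_tcp_frto() {
    local path="${1:-/proc/sys/net/ipv4/tcp_frto}"
    local value
    value=$(cat "$path") || return 1
    case "$value" in
        0) echo "tcp_frto=0: F-RTO disabled" ;;
        1) echo "tcp_frto=1: basic F-RTO enabled" ;;
        2) echo "tcp_frto=2: SACK-enhanced F-RTO if the flow uses SACK" ;;
        *) echo "tcp_frto=$value: value not described in the tcp(7) excerpt" ;;
    esac
}
```

On an unmodified RHEL 6 host this is expected to report the SACK-enhanced mode (2); after the Resolution step it should report 0.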

Diagnostic Steps

  • The pcap trace shows hundreds of duplicate ACKs followed by retransmissions. The duplicate ACKs alternate in both directions (client->server and server->client), indicating a stalled TCP session, as follows:
$ tshark -r /tmp/small.pcap 'frame.number > 124297 && frame.number < 124309'
2015-07-15 05:01:26.598318  12.307675 server -> client TCP 66 [TCP Dup ACK 123984#157] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124298 0.000000 335889396   
2015-07-15 05:01:26.598339  12.307696 client -> server TCP 66 [TCP Dup ACK 123985#157] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253991 TSecr=3002711338 124299 0.000021 3432221061   
2015-07-15 05:01:26.600256  12.309613 server -> client TCP 66 [TCP Dup ACK 123984#158] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124300 0.001917 335889396   
2015-07-15 05:01:26.600277  12.309634 client -> server TCP 66 [TCP Dup ACK 123985#158] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253993 TSecr=3002711338 124301 0.000021 3432221061   
2015-07-15 05:01:26.601468  12.310825 server -> client TCP 66 [TCP Dup ACK 123984#159] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124302 0.001191 335889396   
2015-07-15 05:01:26.601489  12.310846 client -> server TCP 66 [TCP Dup ACK 123985#159] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253994 TSecr=3002711338 124303 0.000021 3432221061   
2015-07-15 05:01:26.603866  12.313223 server -> client TCP 66 [TCP Dup ACK 123984#160] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124304 0.002377 335889396   
2015-07-15 05:01:26.603887  12.313244 client -> server TCP 66 [TCP Dup ACK 123985#160] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253996 TSecr=3002711338 124305 0.000021 3432221061   
2015-07-15 05:01:26.604202  12.313559 server -> client TCP 66 [TCP Dup ACK 123984#161] nfs > netconfsoapbeep [PSH, ACK] Seq=335889396 Ack=3432155397 Win=33120 Len=0 TSval=3002711016 TSecr=253665 124306 0.000315 335889396   
2015-07-15 05:01:26.604215  12.313572 client -> server TCP 66 [TCP Dup ACK 123985#161] netconfsoapbeep > nfs [ACK] Seq=3432221061 Ack=335889396 Win=501 Len=0 TSval=253997 TSecr=3002711338 124307 0.000013 3432221061

Followed by a retransmission:

2015-07-15 05:01:26.607051  12.316408 client -> server RPC 1434 [TCP Fast Retransmission] Continuation 124308 0.002836 3432155397 3432156765  
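To quantify this pattern in a large capture rather than reading it by eye, the duplicate ACKs can be tallied per direction. This is a minimal sketch, assuming IPv4 traffic (use ipv6.src and ipv6.dst otherwise) and a tshark version that supports the -Y display-filter option; dup_acks_by_direction is a hypothetical helper name.

```shell
# Hypothetical helper: count duplicate ACKs per direction from tab-separated
# "source<TAB>destination" lines, such as the output of:
#   tshark -r /tmp/small.pcap -Y tcp.analysis.duplicate_ack -T fields -e ip.src -e ip.dst
dup_acks_by_direction() {
    awk -F '\t' 'NF >= 2 { count[$1 " -> " $2]++ }
                 END { for (d in count) print d, count[d] }' | sort
}
```

Large, roughly equal counts in both directions for the same client/server pair match the stalled session shown above. Replacing the filter with tcp.analysis.fast_retransmission lists the retransmissions instead.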

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.