Sudden connection failures in TCP traffic such as SSH, NFS, CIFS with Cisco ASA firewall
Environment
- Red Hat Enterprise Linux
- Cisco ASA, seen on firmware versions 9.6(2) or 9.1(7)9
- TCP connections. Long-lived TCP connections such as SSH, NFS, and CIFS are more likely to be affected.
Issue
- After a Cisco ASA firmware upgrade, connections are dropped. The Cisco ASA reports a TCP PAWS failure and drops packets:
33: 19:10:43.774236 802.1Q vlan#123 P0 10.0.0.24:22 > 192.168.0.110:60018: P 299327905:299328005(100) ack 207669419 win 339 <nop,nop,timestamp 2703 2363294310> Drop-reason: (tcp-paws-fail) TCP packet failed PAWS test
- TCP sessions like SSH appear to hang, but establishing a new session works again straight away
- NFS Client reports "nfs: server not responding" then "nfs: server OK" 10 minutes later
Resolution
Pursue a firmware fix with Cisco. The exact firmware release number which resolves this is not known at this time.
Workaround - Downgrade firewall
Return to the previous known working firmware version.
Workaround - Disable TCP Timestamps
echo "# $(date) YOUR-NAME-GOES-HERE -- disable tcp timestamps until Cisco ASA is fixed" >> /etc/sysctl.conf
echo "net.ipv4.tcp_timestamps = 0" >> /etc/sysctl.conf
sysctl -p
This should only be considered a temporary workaround. TCP Timestamps are important on high-speed networks: they prevent TCP session hangs caused by wrapped TCP sequence numbers, and they improve RTT estimation, which lets the TCP window grow accurately and results in better high-speed performance.
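Once the firewall runs a fixed firmware, the workaround has to be undone. The following is a minimal sketch, assuming the two lines were appended to /etc/sysctl.conf exactly as shown in the workaround. The helper name revert_ts_workaround is hypothetical, and 1 is the Linux default for net.ipv4.tcp_timestamps.

```shell
# Hypothetical helper: remove the workaround lines from a sysctl config file.
# Assumes the lines were added exactly as in the workaround commands.
revert_ts_workaround() {
    conf="${1:-/etc/sysctl.conf}"
    tmp="$(mktemp)" || return 1
    # Keep every line except the setting and its marker comment.
    # grep exits 1 when nothing is left to print, which is acceptable here.
    grep -v -e '^net\.ipv4\.tcp_timestamps *= *0$' \
            -e 'disable tcp timestamps until Cisco ASA is fixed' \
            "$conf" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# As root, once the firewall is fixed:
#   sysctl -n net.ipv4.tcp_timestamps      # show the current value
#   revert_ts_workaround /etc/sysctl.conf
#   sysctl -w net.ipv4.tcp_timestamps=1    # re-enable without a reboot
```

Taking the file path as an argument makes it possible to try the helper on a copy of the file first.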
Root Cause
- The Cisco ASA firmware incorrectly handles TCP Timestamp value wrap.
- On Linux the TCP Timestamp value is incremented by 1 each millisecond, so the 32-bit value wraps approximately every 50 days or less. After a reboot, the TCP Timestamp is initialized from jiffies, a kernel counter of timer ticks that starts from a fixed value at boot. This means that at a given point after boot the value is always more or less the same. In any case, the TCP Timestamp wrap should be handled by the firewall's PAWS logic.
- When the firewall mishandles the wrap, it drops either the sender's data packets or the receiver's ACK packets. The sender then retransmits until the connection fails.
- Establishing a new session causes a new TCP Timestamp value to be used, resulting in a new connection working fine.
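To illustrate why a wrap should be harmless, here is a minimal sketch of the wrap period and of a wrap-aware timestamp comparison in the style of RFC 7323, where a timestamp counts as older only if the difference modulo 2^32 falls in the upper half of the range. The function name ts_is_older is hypothetical, the sample values are borrowed from the log and capture excerpts in this article purely for illustration, and this is not the Cisco ASA implementation. It assumes a shell with 64-bit arithmetic, such as bash.

```shell
# Wrap period of a 32-bit counter ticking once per millisecond, in whole days.
echo $(( 4294967296 / 1000 / 86400 ))    # prints 49

# Succeeds if timestamp $1 is older than timestamp $2, modulo 2^32.
ts_is_older() {
    [ $(( ($1 - $2) & 0xFFFFFFFF )) -ge 2147483648 ]
}

before=4294965815   # shortly before the wrap
after=2703          # shortly after the wrap

# A plain numeric comparison wrongly treats the post-wrap value as stale.
if [ "$after" -lt "$before" ]; then echo "naive: stale"; fi

# The wrap-aware comparison accepts it as newer.
if ts_is_older "$after" "$before"; then echo "paws: stale"; else echo "paws: ok"; fi
```

A PAWS check that behaves like the naive comparison would drop every packet sent after the wrap, which matches the symptoms described here.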
Diagnostic Steps
The Cisco ASA log will report a PAWS failure:
33: 19:10:43.774236 802.1Q vlan#123 P0 10.0.0.24:22 > 192.168.0.110:60018: P 299327905:299328005(100) ack 207669419 win 339 <nop,nop,timestamp 2703 2363294310> Drop-reason: (tcp-paws-fail) TCP packet failed PAWS test
A packet capture on the sender will show retransmissions of data packets and no ACKs from the receiver.
If the sender's TCP Timestamp has wrapped, a packet capture on the receiver will not show these data packets from the sender. If the receiver's TCP Timestamp has wrapped, a packet capture on the receiver will show both the sender's data packets and the receiver's ACK packets.
A packet capture can show the TCP Timestamp value of the packets just before the failure is very close to the wrap value:
$ tshark -nr tcpdump.cap -O tcp "frame.number == 68734 || frame.number == 68735" | grep TSval
Timestamps: TSval 4294965815, TSecr 1160473651
Timestamps: TSval 1160477215, TSecr 4294965815
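When the frame numbers around the failure are not known in advance, the capture can be scanned for the wrap itself. The following is a rough sketch, assuming tshark is available and that the capture is named tcpdump.cap as in the command above. The function name find_tsval_wraps is hypothetical. It flags any TSval that drops by more than 2^31 within one TCP stream and sender, so that small decreases from reordering or retransmission are ignored.

```shell
# Hypothetical helper: read "frame,stream,source,tsval" lines on stdin and
# report frames where a sender's TSval fell by more than half the 32-bit range.
find_tsval_wraps() {
    awk -F, '
        $4 != "" {
            key = $2 "," $3      # one TSval sequence per stream and sender
            if ((key in last) && (last[key] + 0) - ($4 + 0) > 2147483648)
                print "frame " $1 ": " $3 " TSval wrapped from " last[key] " to " $4
            last[key] = $4
        }'
}

# Usage against a capture file (IPv4 traffic assumed):
#   tshark -nr tcpdump.cap -T fields -E separator=, \
#       -e frame.number -e tcp.stream -e ip.src \
#       -e tcp.options.timestamp.tsval | find_tsval_wraps
```

Each reported frame is the first packet after the wrap, so the frames just before and after it are the ones worth inspecting.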
The TCP timestamp is a 32-bit (4 byte) value, so the wrap value is:
$ printf "%d\n" 0xffffffff
4294967295
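As a small worked example, the time left before a given TSval wraps can be computed directly, assuming the 1 ms tick described under Root Cause. The function name ms_until_wrap is hypothetical.

```shell
# Milliseconds left before a 32-bit TSval wraps, assuming one tick per ms.
ms_until_wrap() {
    echo $(( 4294967296 - $1 ))
}

ms_until_wrap 4294965815    # prints 1481
```

For the TSval 4294965815 seen in the capture, the wrap was less than 1.5 seconds away.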