TCP connection failures due to dropped SYN-ACK packets
Environment
- Red Hat Enterprise Linux 6.4 and 6.5
- Cisco WS-C4948-10GE edge switch running cat4500-entservicesk9-mz.122-31.SGA9.bin with QoS and/or traffic shaping enabled; no layer 4 management functions enabled
- Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe - drv tg3 v3.124 / fw FFV7.2.20 bc 5720-v1.25
- Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe - drv tg3 v3.132 / fw FFV7.2.20 bc 5720-v1.25
Issue
The SYN-ACK, the second packet in the TCP three-way handshake, occasionally appears to be dropped before it reaches a Red Hat Enterprise Linux client that is connecting to various servers.
The packet:
- does not show up in tcpdump on the bond or physical interfaces
- does not show up in netstat statistics captured close to the event, or in /proc/net/dev
- is apparently dropped whether NIC offloading options are enabled or disabled
- is apparently dropped with bonding and firewalls enabled or disabled
- is apparently dropped with old or new firmware on the NIC
- is apparently dropped even after network cards are replaced
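One way to confirm that the receive counters in /proc/net/dev really stay flat across a failed handshake is to snapshot them before and after a reproduction attempt and compare. The following is a minimal sketch, assuming the standard /proc/net/dev text layout (two header lines, then one "iface: counters" row per interface with eight receive columns first). The helper names are hypothetical and not part of any Red Hat tool.

```python
from typing import Dict

# {interface: {receive counter name: value}}
RxCounters = Dict[str, Dict[str, int]]

# Receive-side columns of /proc/net/dev, in the order the kernel prints them.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")


def parse_proc_net_dev(text: str) -> RxCounters:
    """Parse /proc/net/dev-formatted text into per-interface receive counters."""
    stats: RxCounters = {}
    for line in text.splitlines()[2:]:  # skip the two column-header lines
        if ":" not in line:
            continue
        # Split on the first colon only: large counters can follow the
        # interface name with no space, e.g. "eth0:12345".
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        # zip() stops after the eight receive fields; transmit columns are ignored.
        stats[name.strip()] = dict(zip(RX_FIELDS, values))
    return stats


def rx_deltas(before: RxCounters, after: RxCounters) -> RxCounters:
    """Increase in every receive counter for interfaces present in both snapshots."""
    return {
        iface: {field: after[iface][field] - before[iface][field] for field in RX_FIELDS}
        for iface in after
        if iface in before
    }


def read_proc_net_dev(path: str = "/proc/net/dev") -> RxCounters:
    """Take a live snapshot from the running kernel (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

In use, call read_proc_net_dev() immediately before and after a connection attempt that stalls, then inspect rx_deltas(...) for the bond and its slave interfaces (for example a hypothetical bond0). A SYN-ACK that never reaches the NIC would leave errs, drop and fifo unchanged, which matches the observation above.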
Network traces from a sniffer connected to a SPAN port on the edge switch show the packets, but they never seem to reach the server's kernel.
Resolution
Engage your switch hardware vendor's support. It is not yet clear whether reloading the switch resolves the issue or whether it is caused by a physical failure of the switch.
Root Cause
An issue internal to the switch occasionally caused a SYN-ACK packet to be lost in specific conversations, even while hundreds of other conversations continued normally at the time of the event. Nothing was logged in the switch's counters or in its event log. Similarly, the server showed no indication that the packet was ever received or dropped, because the packet never made it down the wire to the server at all.
Diagnostic Steps
- Network captures taken from a SPAN port on the directly connected switch showed the packet, even though it was not being transmitted to the server
- tcpdump was run on the server and never saw the SYN-ACK
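Comparing the two captures comes down to a set difference: every SYN-ACK present in the SPAN-port capture but absent from the server's tcpdump capture is a candidate lost packet. The sketch below does this under stated assumptions: both files are classic libpcap files (such as those written by tcpdump -w) with an Ethernet link type, traffic is IPv4, and frames carry at most one 802.1Q tag. It does not read pcapng, Linux "cooked" captures from tcpdump -i any, or IPv6; a pcapng file from a sniffer would need converting to classic pcap first. The function names are hypothetical.

```python
import socket
import struct
from typing import Iterator, Set, Tuple

# (source IP, source port, destination IP, destination port, sequence number)
SynAck = Tuple[str, int, str, int, int]

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond variants
LINKTYPE_ETHERNET = 1
TCP_SYN, TCP_ACK = 0x02, 0x10


def iter_frames(data: bytes) -> Iterator[bytes]:
    """Yield the captured bytes of each record in a classic pcap file."""
    for endian in ("<", ">"):  # the magic number reveals the file's byte order
        if struct.unpack(endian + "I", data[:4])[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("unsupported link type %d" % linktype)
    offset = 24  # size of the global header
    while offset + 16 <= len(data):  # 16-byte per-record header
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def syn_acks(data: bytes) -> Set[SynAck]:
    """Collect the identity of every IPv4 TCP segment with both SYN and ACK set."""
    found: Set[SynAck] = set()
    for frame in iter_frames(data):
        if len(frame) < 14:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        ip_start = 14
        if ethertype == 0x8100 and len(frame) >= 18:  # skip one 802.1Q VLAN tag
            ethertype = struct.unpack("!H", frame[16:18])[0]
            ip_start = 18
        if ethertype != 0x0800:  # IPv4 only
            continue
        ip = frame[ip_start:]
        if len(ip) < 20 or ip[9] != 6:  # protocol 6 = TCP
            continue
        tcp = ip[(ip[0] & 0x0F) * 4:]  # IHL is in 32-bit words
        if len(tcp) < 14:
            continue
        if (tcp[13] & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK):
            sport, dport, seq = struct.unpack("!HHI", tcp[:8])
            found.add((socket.inet_ntoa(ip[12:16]), sport,
                       socket.inet_ntoa(ip[16:20]), dport, seq))
    return found


def missing_on_host(span_pcap: bytes, host_pcap: bytes) -> Set[SynAck]:
    """SYN-ACKs seen on the switch SPAN port but never captured on the host."""
    return syn_acks(span_pcap) - syn_acks(host_pcap)
```

Keying on addresses, ports and the initial sequence number, rather than on timestamps, means the sniffer and the server clocks do not need to be synchronized, and a retransmitted SYN-ACK collapses onto the same entry as the original.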
The best way to troubleshoot this is to place a network tap or in-line sniffer between the switch and the server. This would prove whether the packet truly left the switch port.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.