MTU Mismatches and PMTUD Black Holes

The Silent Killer of Network Connectivity: MTU Mismatches and PMTUD Black Holes

Ever found yourself in a networking nightmare where pings work perfectly, but then SSH sessions hang mysteriously, or large file transfers just die? You’ve checked your firewalls, verified your DNS, and still, something feels off. Welcome to the perplexing world of MTU Mismatches and PMTUD Black Holes, often the hidden culprits behind such frustrating network behavior.

What is MTU? (And why does it matter?)

MTU stands for Maximum Transmission Unit. It’s the largest packet, in bytes and including the IP header, that a network interface can send in a single frame without fragmentation. For standard Ethernet networks, the MTU is typically 1500 bytes. Think of it as the maximum size of a box your network can ship in one go.

When a packet needs to traverse different network segments, it’s crucial that it doesn’t exceed the MTU of any link along its path. If a packet is too large for a given link, it must be either:

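  1. Fragmented: Broken into smaller pieces by a router (IPv4 only, and only when the packet’s Don’t Fragment (DF) bit is clear).
  2. Dropped: If fragmentation is not allowed. This is the case for IPv4 packets with the DF bit set, and for all IPv6 packets, because IPv6 routers never fragment and hosts are expected to discover the path MTU.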
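
The two outcomes above can be sketched in a few lines of Python. This is an illustrative model only, not router code: the function names are made up for this example, and the fragment calculation assumes a plain 20-byte IPv4 header with no options.

```python
def router_action(packet_len, link_mtu, ipv6=False, df=False):
    """Return what a router does with a packet of packet_len bytes."""
    if packet_len <= link_mtu:
        return "forward"
    if ipv6 or df:
        # Too big and not allowed to fragment: drop it and report back via ICMP.
        return "drop"
    return "fragment"


def ipv4_fragment_sizes(packet_len, link_mtu, header_len=20):
    """Total lengths of the fragments an IPv4 router would emit.

    Assumes a fixed header_len (no IP options). Every fragment except the
    last must carry a payload that is a multiple of 8 bytes.
    """
    if packet_len <= link_mtu:
        return [packet_len]
    payload = packet_len - header_len
    max_payload = (link_mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes
```

For a 1500-byte packet crossing a 1450-byte link, the sketch yields one 1444-byte fragment and one 76-byte fragment: two packets and two headers where one would have done, which is part of why fragmentation is considered inefficient.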

The Role of PMTUD: Path MTU Discovery

To avoid fragmentation (which is inefficient) and ensure packets don’t get dropped, sending hosts use Path MTU Discovery (PMTUD).

  • How it works: The sender marks its packets as “Don’t Fragment” (always the case for IPv6). When such a packet is larger than a link’s MTU, the router in front of that link drops it and should send back an ICMP message (IPv4: “Fragmentation Needed and Don’t Fragment (DF) bit set” / IPv6: “Packet Too Big”). This message carries the MTU of the link that was too small and tells the sending device to reduce its effective MTU for that specific path.
  • Why it’s crucial: PMTUD allows devices to dynamically determine the smallest MTU along the entire path to a destination, ensuring efficient and successful communication.
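
As a rough illustration of that feedback loop, here is a small Python simulation. It is a sketch under simple assumptions: the path is just a list of link MTUs, every ICMP error reaches the sender, and the function name is invented for this example.

```python
def discover_path_mtu(link_mtus, local_mtu=1500):
    """Simulate PMTUD along a path described by its link MTUs.

    Returns (path_mtu, attempts). Assumes every ICMP error is delivered
    and reports the MTU of the link that rejected the packet.
    """
    size = local_mtu
    attempts = 0
    while True:
        attempts += 1
        # The first link that is too small drops the packet and reports its MTU.
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size, attempts  # packet fit through every link
        size = bottleneck  # sender lowers its estimate and retries


# Example path: Ethernet, a PPPoE link, an overlay network, Ethernet again.
print(discover_path_mtu([1500, 1492, 1450, 1500]))  # (1450, 3)
```

Each round trip teaches the sender about one bottleneck, so in this example it takes three attempts to settle on 1450 bytes.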

The PMTUD Black Hole: When Good Packets Go to Die

A PMTUD Black Hole occurs when PMTUD fails. This happens if:

  1. ICMP messages are blocked: A firewall (or an overloaded router) along the path mistakenly filters or drops the crucial ICMP “Packet Too Big” or “Fragmentation Needed” messages.
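  2. Asymmetric routing: The return path for ICMP messages differs from the data path, for example through a stateful firewall that never saw the original flow and discards the ICMP error as unrelated traffic, so the messages never reach the original sender.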

The Symptom: Your device tries to send a packet larger than the actual Path MTU. The intermediate router drops it and tries to send an ICMP error message, but that message is blocked. The sending device never learns to send smaller packets, so it keeps retransmitting oversized packets, which keep getting dropped. The connection then hangs indefinitely, or until a timeout occurs (often 2 minutes for SSH). Small packets still fit, so pings and the TCP handshake keep working, and the stall only appears once the first full-size packet is sent, which makes the problem even more perplexing.
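
The difference between a healthy path and a black hole can be made concrete with a toy model. The sketch below is an assumption-laden simplification (fixed retry count, no real timers, hypothetical function names) meant only to show why small packets succeed while large ones stall.

```python
def send_once(packet_len, path_mtu, icmp_delivered):
    """Outcome of one Don't-Fragment packet, as seen by the sender."""
    if packet_len <= path_mtu:
        return "delivered"
    return "icmp_error" if icmp_delivered else "silent_drop"


def transfer(packet_len, path_mtu, icmp_delivered, max_retries=5):
    """Return ("ok", attempts) or ("hang", attempts) for one packet."""
    size = packet_len
    for attempt in range(1, max_retries + 1):
        outcome = send_once(size, path_mtu, icmp_delivered)
        if outcome == "delivered":
            return ("ok", attempt)
        if outcome == "icmp_error":
            size = path_mtu  # PMTUD works: shrink and retry
        # On a silent drop the sender learns nothing and retransmits as is.
    return ("hang", max_retries)


print(transfer(84, 1450, icmp_delivered=False))    # small ping: ("ok", 1)
print(transfer(1500, 1450, icmp_delivered=True))   # PMTUD works: ("ok", 2)
print(transfer(1500, 1450, icmp_delivered=False))  # black hole: ("hang", 5)
```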

Troubleshooting an MTU Mismatch / PMTUD Black Hole

When you suspect an MTU issue, here’s a systematic approach:

  1. Identify the Problem:
    • “Ping works, but SSH/HTTP/large transfers hang/fail”: This is the classic symptom.
    • Timeouts: Note the exact timeout duration. Common timeouts (e.g., 2 minutes for SSH, 30-60 seconds for HTTP) can be clues.
    • ssh -vvv output: Look for hangs after the initial SSH2_MSG_KEXINIT exchange. This suggests the KEX packets are too large.
  2. Determine the Path MTU:
    • From your client: Use ping with the “Don’t Fragment” (DF) bit set and a specific packet size.
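      • Linux: ping -M do -s <packet_size> <destination_IP> (IPv4) or ping -6 -M do -s <packet_size> <destination_IP> (IPv6). Note that on Linux, -D prints timestamps and does not set the DF bit.
      • macOS: ping -D -s <packet_size> <destination_IP> (IPv4) or ping6 -D -s <packet_size> <destination_IP> (IPv6).
      • Windows: ping -f -l <packet_size> <destination_IP> (IPv4 only).
      • packet_size is the ICMP data payload only. Add 28 bytes (20 IP + 8 ICMP) for IPv4 or 48 bytes (40 IP + 8 ICMP) for IPv6 to get the total packet size. For a 1500-byte MTU, start with 1472 (IPv4) or 1452 (IPv6), then decrease by 10 or 20 until you find the largest size that successfully returns.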
    • From the server: Repeat the above from the server to your client.
  3. Check for Common MTU Reductions:
    • Are you using a VPN (WireGuard, OpenVPN, IPsec)? These typically reduce MTU (e.g., 1420-1472 bytes).
    • Are you using a virtualized network (VMs, containers, cloud instances)? Hypervisors or network overlays (VXLAN, GRE) can add headers, reducing the effective MTU. A 1450 MTU for a virtual DMZ is a prime example.
  4. Inspect Router/Firewall Rules (ICMP Filtering):
    • Review your firewall rules (e.g., OPNsense, pfSense, Cisco, Juniper). Ensure that ICMP Fragmentation Needed (Type 3, Code 4 for IPv4) and ICMPv6 Packet Too Big (Type 2 for IPv6) messages are explicitly allowed or that your firewall is configured to handle PMTUD correctly. This is often overlooked.
  5. Adjust MTU on Involved Interfaces:
    • Once you’ve identified the optimal (or a safe) MTU, configure it on the network interfaces of the communicating hosts.
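    • Apply the change: on Linux, sudo ip link set dev <interface> mtu <value> takes effect immediately but does not survive a reboot. To make it permanent, set the MTU in your distribution’s network configuration and restart networking (on Debian-style systems: sudo systemctl restart networking or sudo ifdown <interface>; sudo ifup <interface>).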
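
Stepping the ping size down by hand (step 2 above) gets tedious, so here is one way it might be automated with a binary search. This is a sketch, not a polished tool: ping_probe assumes the Linux iputils ping (the -M do, -c, -W and -s options) and IPv4, and both function names are invented for this example. A single lost reply will skew the result, so treat the output as a hint and re-run it.

```python
import subprocess


def ping_probe(host, payload):
    """Send one DF-marked ping with the given payload size.

    Assumes Linux iputils ping; other platforms use different flags.
    Returns True if a reply came back.
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Largest payload in [low, high] for which probe(payload) is True.

    Returns None if even the smallest payload fails (host unreachable).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid fits, search upwards
        else:
            high = mid - 1   # mid is too big, search downwards
    return low


# Usage (needs network access; 192.0.2.10 is a placeholder address):
# payload = largest_payload(lambda size: ping_probe("192.0.2.10", size))
# if payload is not None:
#     print("Path MTU is about", payload + 28)  # add IPv4 + ICMP headers
```

Passing the probe in as a function keeps the search logic separate from the platform-specific ping command, so the same search can be reused with a different probe on macOS or Windows.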

By understanding the mechanics of MTU and PMTUD, and systematically troubleshooting, you can conquer these insidious network issues that often leave engineers scratching their heads. In the world of networking, sometimes the smallest details (like packet size) can make the biggest difference!

🧪 Use MSS clamping (firewall workaround)

In OPNsense:

  1. Go to Firewall > Rules > [LAN]
  2. Edit the rule that matches LAN→DMZ
  3. Expand Advanced Options
  4. Set:
    • TCP flags: SYN
    • Max MSS: 1410 (your MTU minus 40 bytes for IPv4, or minus 60 bytes for IPv6; 1410 matches a 1450-byte MTU over IPv4)

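This rewrites the MSS option in TCP SYN packets, so both ends agree to send segments small enough to fit the path without fragmentation. It only helps TCP; UDP and other protocols still depend on working PMTUD.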
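
The arithmetic behind the Max MSS value is simple enough to write down. The helper below is a minimal sketch with a made-up name; it assumes fixed-size headers (20-byte IPv4 or 40-byte IPv6, plus a 20-byte TCP header).

```python
def mss_for_mtu(path_mtu, ipv6=False, tcp_header=20):
    """MSS that lets a full TCP segment fit into path_mtu bytes.

    Assumes IP headers without options or extension headers.
    """
    ip_header = 40 if ipv6 else 20
    return path_mtu - ip_header - tcp_header


print(mss_for_mtu(1450))             # 1410, the value used above
print(mss_for_mtu(1450, ipv6=True))  # 1390
print(mss_for_mtu(1500))             # 1460, standard Ethernet
```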
