homelab 5 April 2026

WireGuard throughput: from 5 Mbps to saturating the link

Investigating why remote users were getting 5 Mbps through a WireGuard tunnel despite 400+ Mbps available on both ends. MTU fragmentation, TCP congestion collapse, and UDP buffer exhaustion.

wireguard networking prometheus synology

Remote users accessing my self-hosted Immich instance (photos.gread.uk) were getting 2–5 Mbps throughput. The architecture is straightforward:

Client → VPS (Helsinki) → nginx TCP proxy → WireGuard tunnel → NAS (London) → Traefik → Immich

Every link in the chain had plenty of bandwidth:

  • Home upload: 433 Mbps
  • VPS: 3,415 Mbps
  • Test client in Italy: 213 Mbps

Something in the middle was the problem.

Ruling things out

| Suspect | Finding | Method |
|---|---|---|
| VPS bandwidth | 3,415 Mbps | speedtest-cli on VPS |
| Home upload | 433 Mbps | Cloudflare speedtest via Home Assistant |
| NAS CPU | 85.5% idle during transfers | Node exporter / Grafana |
| Disk speed | HDD vs SSD made no difference | DSM Drive download test |
| Encrypted share | Same speed as unencrypted | Comparative download |
| Immich application | Faster than DSM Drive | Comparative download |

None of these were the bottleneck. The problem was in the tunnel itself.

Root cause 1: MTU fragmentation

The WireGuard interfaces on both the VPS and the NAS were running the default MTU of 1420. Without explicit tuning, packets were being fragmented at the WireGuard boundary, causing TCP retransmits.

Evidence: TCP segment retransmits visible in node exporter metrics during transfers.

Fix: set MTU = 1280 on both ends to eliminate fragmentation.

# Added to [Interface] on both VPS and NAS wg0.conf
MTU = 1280

Result: 5 Mbps → 53 Mbps (10x improvement). Better, but still well below what the link should support.

Root cause 2: TCP congestion collapse

iperf3 testing revealed asymmetric tunnel performance:

  • NAS → VPS: 325 Mbps sustained, 0 retransmits
  • VPS → NAS (uncapped): starts at 374 Mbps, collapses to ~160 Mbps average, 441 retransmits

The 60-second test showed periodic degradation every ~15–20 seconds — a cyclical bottleneck, not random congestion.
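The asymmetric numbers came from running iperf3 in both directions across the tunnel. A sketch of the invocations, using the NAS tunnel address from the PMTUD tests (the exact commands and VPS-side address are assumptions):

```shell
# On the NAS: run an iperf3 server bound to its tunnel address
iperf3 -s -B 10.8.0.2

# On the VPS: test the VPS -> NAS direction for 60 seconds
iperf3 -c 10.8.0.2 -t 60

# On the VPS: -R reverses the flow to measure NAS -> VPS
iperf3 -c 10.8.0.2 -t 60 -R
```

The client-side summary reports the retransmit count directly, which is where the 441-retransmit figure comes from.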

Node exporter metrics during sustained transfer:

| Metric | Idle baseline | During test |
|---|---|---|
| Context switches | ~12K ops/s | 88–131K ops/s |
| Interrupts | ~7K ops/s | 46–66K ops/s |
| Out-of-order queued | 1 p/s | 207 p/s |
| CPU 2 packet processing | baseline | 33.7K p/s (vs 3–5K p/s on other cores) |

Synology DSM 7’s userspace WireGuard implementation pins network IRQ handling predominantly to CPU 2. Under high load, CPU 2 saturates, the packet queue fills, out-of-order packets accumulate, and TCP backs off. When the queue drains, TCP recovers, hence the periodic oscillation.
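The per-CPU skew is also visible without Grafana, straight from procfs on the NAS (interface name is an assumption; adjust for your NIC):

```shell
# Header row shows the CPU columns
head -1 /proc/interrupts

# Network IRQ rows; pattern is a guess at common NIC names
grep -E 'eth|ens|enp' /proc/interrupts || true

# Per-CPU softirq counters; NET_RX is where packet processing lands
grep NET_RX /proc/softirqs
```

A lopsided NET_RX row, with one CPU's counter growing far faster than the others during a transfer, is the same signal the node exporter graphs showed.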

Bandwidth threshold testing

| Cap | Avg achieved | Retransmits | Stable? |
|---|---|---|---|
| Uncapped | 165 Mbps | 672 | Collapse |
| 300 Mbps | 276 Mbps | 473 | Degrading |
| 250 Mbps | 231 Mbps | 151 | Partial |
| 200 Mbps | 200 Mbps | 115 | Short term |
| 150 Mbps | 142 Mbps | 132 | Sustained 60s |

150 Mbps was the sweet spot — below the IRQ saturation threshold, sustained reliably with minimal retransmits.
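The threshold table was produced by stepping the cap down and re-measuring at each level; a rough sketch of that loop (run from the VPS, assuming an iperf3 server is already listening at the NAS tunnel address):

```shell
#!/bin/sh
# Sweep tc token-bucket caps on wg0 and measure each with iperf3.
for rate in 300 250 200 150; do
    tc qdisc replace dev wg0 root tbf rate "${rate}mbit" burst 1mb latency 400ms
    echo "=== cap: ${rate} Mbit ==="
    iperf3 -c 10.8.0.2 -t 60 | tail -n 4    # summary incl. retransmit count
done
```

`tc qdisc replace` swaps the shaper in place between runs, so each 60-second test starts against a clean cap.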

Fix: Linux traffic control (tc) token bucket filter on VPS wg0 interface.

tc qdisc replace dev wg0 root tbf rate 150mbit burst 1mb latency 400ms

Why rate limiting improves real-world experience

Counterintuitively, capping at 150 Mbps outperformed uncapped (mean 165 Mbps) because uncapped oscillates between 180 and 90 Mbps — video buffers stall during the collapses. Consistent throughput is more valuable than higher peak throughput for streaming workloads.

Result at this point: 5 Mbps → ~188 Mbps (37x improvement).

The second investigation: pushing further

With a structured experiment plan, I set up 120-second iperf3 baselines across four configurations (single/parallel streams, with/without tc shaper) and started working through the variables: MTU correction, RPS, tc cap tuning.

MTU correction

The 1280 MTU was overly conservative. PMTUD testing from the VPS:

ping -M do -s 1380 10.8.0.2  # pass
ping -M do -s 1400 10.8.0.2  # pass
ping -M do -s 1412 10.8.0.2  # pass
ping -M do -s 1420 10.8.0.2  # 100% packet loss

Correct MTU: 1412 + 28 = 1440. Running at 1280 instead of 1440 had been sacrificing roughly 10% of each packet's payload for no benefit.
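The arithmetic, spelled out: `ping -s` sets the ICMP payload, and 28 bytes of ICMP + IPv4 headers ride on top, so the largest passing payload gives the tunnel MTU directly. That 1440 then fits the 1500-byte WAN MTU once WireGuard's own overhead is added back:

```shell
# Largest ICMP payload that passed ping -M do
payload=1412
# ICMP header (8 bytes) + IPv4 header (20 bytes)
headers=28
tunnel_mtu=$((payload + headers))
echo "tunnel MTU: ${tunnel_mtu}"    # 1440

# WireGuard over IPv4 adds 60 bytes: 20 IP + 8 UDP + 32 WireGuard framing
outer=$((tunnel_mtu + 60))
echo "outer packet: ${outer}"       # 1500, matching the WAN MTU
```

This also explains why the 1420 default exists: it leaves room for the larger 80-byte overhead of WireGuard over IPv6.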

UDP receive buffer exhaustion

Going through netstat metrics and building a dedicated Grafana dashboard for these tests, I found the real remaining bottleneck: UDP buffer overflows on the NAS, tracked by the node_netstat_Udp_RcvbufErrors metric.

Synology’s defaults:

net.core.rmem_max = 212992 # 208 KB
net.core.netdev_max_backlog = 1000

208 KB receive buffers trying to handle hundreds of megabits of WireGuard UDP traffic. The fix:

sudo sysctl -w net.core.rmem_max=26214400 # 25 MB
sudo sysctl -w net.core.netdev_max_backlog=5000
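`sysctl -w` doesn't survive a reboot. A sketch of the persistent form (the path is an assumption; on stock DSM, /etc/sysctl.conf can be clobbered by updates, so a scheduled boot task re-running the `sysctl -w` commands is often the safer route):

```
# /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/ where supported)
net.core.rmem_max = 26214400
net.core.netdev_max_backlog = 5000
```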

The result was immediate and dramatic: the connection fully saturated. Single stream: 350+ Mbps sustained. Parallel streams: 430+ Mbps.

The Grafana dashboards tell the story clearly. Network throughput hitting the link ceiling:

[Grafana graph: network traffic after UDP buffer fix, sustained 500+ Mbps across single capped, 4x parallel capped, single uncapped, and 4x parallel uncapped tests]

TCP errors remained low during the single uncapped test but increased during the parallel tests: one naughty stream kept hitting the link ceiling and triggering TCP congestion collapse on that stream alone. 3 out of 4 staying clean is pretty good in my eyes.

TCP errors during testing

And the retransmitted segments tell the same story.

TCP retransmit segments

Final configuration

tc shaper on VPS raised to 450 Mbps to provide breathing room for competing parallel uploads:

tc qdisc replace dev wg0 root tbf rate 450mbit burst 32mb latency 400ms

The same UDP buffer issue existed on the VPS (identical 208 KB defaults). After fixing both ends:

  • Single stream NAS → VPS: 352 Mbps mean, 2 retransmits (down from 377)
  • Parallel streams: 433 Mbps combined

No tc shaping needed on the NAS outbound — TCP self-regulates acceptably.

Summary

| State | Speed | Fix |
|---|---|---|
| Original | ~5 Mbps | |
| After MTU fix (1280) | ~53 Mbps | Eliminated fragmentation |
| After tc rate limit (150 Mbps) | ~188 Mbps | Prevented IRQ saturation collapse |
| After MTU correction (1440) | ~200 Mbps | Recovered per-packet overhead |
| After UDP buffer fix (25 MB) | ~450 Mbps | Eliminated buffer exhaustion |

90x improvement from the original baseline. The final bottleneck is the physical fibre upload speed.

Monitoring

Key metrics to watch for regression:

# TCP retransmits — should stay low
rate(node_netstat_Tcp_RetransSegs[5m])

# wg0 receive throughput
rate(node_network_receive_bytes_total{device="wg0"}[$__rate_interval]) * 8

# UDP receive buffer errors — should be zero
rate(node_netstat_Udp_RcvbufErrors[5m])
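Those queries translate directly into alerting rules. A sketch of a Prometheus rule for the buffer-error metric (group name, duration, and severity are my own choices):

```yaml
groups:
  - name: wireguard-tunnel
    rules:
      - alert: WireGuardUdpBufferErrors
        # Any sustained nonzero rate means the receive buffer is overflowing again
        expr: rate(node_netstat_Udp_RcvbufErrors[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "UDP receive buffer errors on {{ $labels.instance }}"
          description: "RcvbufErrors is increasing; check net.core.rmem_max."
```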