From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Daniel Borkmann <daniel@iogearbox.net>,
"Jason A. Donenfeld" <Jason@zx2c4.com>
Subject: [PATCH 6.6 23/39] Revert "wireguard: device: enable threaded NAPI"
Date: Tue, 17 Feb 2026 21:30:45 +0100
Message-ID: <20260217200005.106779227@linuxfoundation.org>
In-Reply-To: <20260217200004.221651386@linuxfoundation.org>
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann <daniel@iogearbox.net>
This reverts commit 8c9e9cd398777fd60ba202211da1110614cb5bc5 which is
commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream.
We have had three independent production user reports, all involving
Cilium using WireGuard as the underlying encryption, that k8s Pod E/W
traffic to certain peer nodes stalled completely. The situation appears
as follows:
- Occurs very rarely but at random times under heavy networking load.
- Once the issue triggers the decryption side stops working completely
for that WireGuard peer, other peers keep working fine. The stall
happens also for newly initiated connections towards that particular
WireGuard peer.
- Only the decryption side is affected, never the encryption side.
- Once it triggers, it never recovers and remains in this state,
the CPU/mem on that node looks normal, no leak, busy loop or crash.
- bpftrace on the affected system shows that wg_prev_queue_enqueue
fails, meaning the MAX_QUEUED_PACKETS limit (1024 skbs!) for the
peer's rx_queue has been reached.
- Also, bpftrace shows that wg_packet_rx_poll for that peer is never
called again after reaching this state for that peer. For other
peers wg_packet_rx_poll does get called normally.
- Commit db9ae3b ("wireguard: device: enable threaded NAPI")
switched WireGuard to threaded NAPI by default. The default was
not changed on the affected systems, nor did CPU hotplugging
occur (i.e. the case addressed by 5bd8de2 ("wireguard: queueing:
always return valid online CPU in wg_cpumask_choose_online()")).
- The issue has been observed with stable kernels of v5.15 as well as
v6.1. It was reported to us that v5.10 stable is working fine, and
there is no report on v6.6 stable either (somewhat related
discussion in [0] though).
- In the WireGuard driver the only material difference between v5.10
stable and v5.15 stable is the switch to threaded NAPI by default.
[0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/
Breakdown of the problem:
1) skbs arriving for decryption are enqueued to the peer->rx_queue in
wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer.
2) The latter only moves the skb into the MPSC peer queue if doing so
does not exceed MAX_QUEUED_PACKETS (1024), which is tracked in an
atomic counter via wg_prev_queue_enqueue.
3) In case enqueueing was successful, the skb is also queued up
in the device queue, round-robin picks a next online CPU, and
schedules the decryption worker.
4) The wg_packet_decrypt_worker, once scheduled, picks these up
from the queue, decrypts the packets and once done calls into
wg_queue_enqueue_per_peer_rx.
5) The latter updates the state to PACKET_STATE_CRYPTED on success
and calls napi_schedule on the per peer->napi instance.
6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks
on the peer->rx_queue. It will wg_prev_queue_dequeue if the
queue->peeked skb was not cached yet, or just return the latter
otherwise. (wg_prev_queue_drop_peeked later clears the cache.)
7) From an ordering perspective, the peer->rx_queue has skbs in order
while the device queue with the per-CPU worker threads from a
global ordering PoV can finish the decryption and signal the skb
PACKET_STATE_CRYPTED out of order.
8) A situation can be observed that the first packet coming in will
be stuck waiting for the decryption worker to be scheduled for
a longer time when the system is under pressure.
9) While this is the case, the other CPUs in the meantime finish
decryption and call into napi_schedule.
10) Now in wg_packet_rx_poll it picks up the first in-order skb
from the peer->rx_queue and sees that its state is still
PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits early
with work_done = 0 and calls napi_complete_done, signalling
it "finished" processing.
11) The assumption in wg_packet_decrypt_worker is that when the
decryption finished the subsequent napi_schedule will always
lead to a later invocation of wg_packet_rx_poll to pick up
the finished packet.
12) However, it appears that a later napi_schedule does /not/
schedule a later poll and thus no wg_packet_rx_poll.
13) If this situation happens exactly for the corner case where
the decryption worker of the first packet is stuck and waiting
to be scheduled, and the network load for WireGuard is very
high then the queue can build up to MAX_QUEUED_PACKETS.
14) If this situation occurs, then no new decryption worker will
be scheduled and also no new napi_schedule to make forward
progress.
15) This means the peer->rx_queue stops processing packets completely
and they are indefinitely stuck waiting for a new NAPI poll on
that peer which never happens. New packets for that peer are
then dropped due to full queue, as it has been observed on the
production machines.
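The ordering problem in steps 1) to 10) and the resulting queue
build-up can be illustrated with a small user-space model. This is
only a sketch and not kernel code: struct peer_rx_queue, rx_enqueue,
decrypt_done, rx_poll and fill_behind_stuck_head are hypothetical
names invented for the illustration, and the model does not attempt
to reproduce NAPI scheduling itself. It only shows that an in-order
queue whose head is still PACKET_STATE_UNCRYPTED makes every poll
return work_done = 0 until MAX_QUEUED_PACKETS is reached, and that
forward progress then depends entirely on one more poll running
after the head has been decrypted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model of the per-peer rx queue; NOT kernel code. */
#define MAX_QUEUED_PACKETS 1024

enum packet_state { PACKET_STATE_UNCRYPTED, PACKET_STATE_CRYPTED };

struct peer_rx_queue {
	enum packet_state state[MAX_QUEUED_PACKETS]; /* in-order ring */
	size_t head;  /* next packet the poll routine has to deliver */
	size_t count; /* packets currently queued */
};

/* Steps 1-2: queue in arrival order, refuse once the limit is hit.
 * Returns the slot index, or -1 if the queue is full. */
static int rx_enqueue(struct peer_rx_queue *q)
{
	size_t slot;

	if (q->count >= MAX_QUEUED_PACKETS)
		return -1;
	slot = (q->head + q->count) % MAX_QUEUED_PACKETS;
	q->state[slot] = PACKET_STATE_UNCRYPTED;
	q->count++;
	return (int)slot;
}

/* Steps 4-5: a worker finished one packet; completion order is
 * independent of queue order (step 7). */
static void decrypt_done(struct peer_rx_queue *q, int slot)
{
	q->state[slot] = PACKET_STATE_CRYPTED;
}

/* Steps 6 and 10: deliver strictly in order and stop at the first
 * packet that is not decrypted yet. Returns work_done. */
static int rx_poll(struct peer_rx_queue *q, int budget)
{
	int work_done = 0;

	while (work_done < budget && q->count > 0 &&
	       q->state[q->head] == PACKET_STATE_CRYPTED) {
		q->head = (q->head + 1) % MAX_QUEUED_PACKETS;
		q->count--;
		work_done++;
	}
	return work_done;
}

/* Steps 8-13: the worker for the first packet is delayed while all
 * later packets are decrypted right away. Returns how many packets
 * were accepted before the queue refused more, or -1 if a poll
 * unexpectedly made progress. */
static int fill_behind_stuck_head(struct peer_rx_queue *q)
{
	int accepted = 0;
	int slot;

	if (rx_enqueue(q) < 0) /* head packet, worker not run yet */
		return -1;
	accepted++;

	while ((slot = rx_enqueue(q)) >= 0) {
		decrypt_done(q, slot);
		if (rx_poll(q, 64) != 0) /* head blocks every poll */
			return -1;
		accepted++;
	}
	return accepted;
}
```

In this model the queue drains as soon as the head is marked
PACKET_STATE_CRYPTED and rx_poll runs once more. In the reported
stall that final poll is the piece that never happens.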
Technically, the backport of commit db9ae3b6b43c ("wireguard: device:
enable threaded NAPI") to stable should not have happened, since it is
more of an optimization than a pure fix and addresses a NAPI situation
that arises when many WireGuard tunnel devices are used in parallel.
Revert it from stable given that the backport triggers a regression
for the mentioned kernels.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/wireguard/device.c | 1 -
1 file changed, 1 deletion(-)
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -369,7 +369,6 @@ static int wg_newlink(struct net *src_ne
 	if (ret < 0)
 		goto err_free_handshake_queue;
 
-	dev_set_threaded(dev, true);
 	ret = register_netdevice(dev);
 	if (ret < 0)
 		goto err_uninit_ratelimiter;