netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late
@ 2025-05-30  3:04 Jason A. Donenfeld
  2025-05-30  3:04 ` [PATCH net-next 1/1] wireguard: device: enable threaded NAPI Jason A. Donenfeld
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jason A. Donenfeld @ 2025-05-30  3:04 UTC (permalink / raw)
  To: netdev, kuba, pabeni; +Cc: Jason A. Donenfeld

Hey Jakub/Paolo,

This one patch missed the cutoff for the series I sent last week for
net-next. It's a one-liner, almost trivial, and I suppose it could be a
"net" patch, but we're still in the merge window. I was hoping that if
you're planning on doing a net-next part 2 pull, you might include this.
If not, I'll send it later in 6.16 as a "net" patch.

Thanks,
Jason

Mirco Barone (1):
  wireguard: device: enable threaded NAPI

 drivers/net/wireguard/device.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.48.1



* [PATCH net-next 1/1] wireguard: device: enable threaded NAPI
  2025-05-30  3:04 [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Jason A. Donenfeld
@ 2025-05-30  3:04 ` Jason A. Donenfeld
  2025-06-03  8:25 ` [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Paolo Abeni
  2025-06-05 15:10 ` [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: Jason A. Donenfeld @ 2025-05-30  3:04 UTC (permalink / raw)
  To: netdev, kuba, pabeni; +Cc: Mirco Barone, Jason A. Donenfeld

From: Mirco Barone <mirco.barone@polito.it>

Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host.  This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).

The problem is that, during a traffic surge involving multiple tunnels
at the same time, NAPI polling for all of these wg devices tends to
converge onto the same core, underutilizing the CPU and bottlenecking
performance.

This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.

As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
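
To make the pinning mechanism concrete, the scheduling decision amounts
to roughly the following. This is a simplified, hand-written paraphrase
of the core networking behavior, not the actual net/core/dev.c code,
and the helper name is made up for illustration:

  #include <linux/netdevice.h>
  #include <linux/interrupt.h>
  #include <linux/sched.h>

  /* Paraphrased sketch: how a scheduled NAPI instance gets serviced. */
  static void napi_schedule_sketch(struct napi_struct *napi)
  {
          if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
                  /* Threaded mode: wake the per-device kthread; the
                   * scheduler may place it on any idle CPU. */
                  wake_up_process(napi->thread);
                  return;
          }
          /* Softirq mode: queue on *this* CPU's poll_list and let
           * NET_RX_SOFTIRQ service it on the same CPU, so tunnels
           * scheduled from one core keep being polled on that core. */
          list_add_tail(&napi->poll_list,
                        &this_cpu_ptr(&softnet_data)->poll_list);
          __raise_softirq_irqoff(NET_RX_SOFTIRQ);
  }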

In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.

The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
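
For reference, the generic driver-side pattern this corresponds to (a
minimal sketch with made-up names, not WireGuard's actual setup code)
is simply to flip the device to threaded polling before registering it:

  #include <linux/netdevice.h>

  static int example_poll(struct napi_struct *napi, int budget)
  {
          /* ...consume up to @budget packets, then napi_complete_done()... */
          return 0;
  }

  static int example_register(struct net_device *dev, struct napi_struct *napi)
  {
          netif_napi_add(dev, napi, example_poll); /* softirq NAPI by default */
          dev_set_threaded(dev, true);             /* poll in a kthread instead */
          return register_netdevice(dev);
  }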

On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.

We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.

More details are available in our Netdev paper.

Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone <mirco.barone@polito.it>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/net/wireguard/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 3ffeeba5dccf..4a529f1f9bea 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -366,6 +366,7 @@ static int wg_newlink(struct net_device *dev,
 	if (ret < 0)
 		goto err_free_handshake_queue;
 
+	dev_set_threaded(dev, true);
 	ret = register_netdevice(dev);
 	if (ret < 0)
 		goto err_uninit_ratelimiter;
-- 
2.48.1



* Re: [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late
  2025-05-30  3:04 [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Jason A. Donenfeld
  2025-05-30  3:04 ` [PATCH net-next 1/1] wireguard: device: enable threaded NAPI Jason A. Donenfeld
@ 2025-06-03  8:25 ` Paolo Abeni
  2025-06-03  8:30   ` Paolo Abeni
  2025-06-05 15:10 ` [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late patchwork-bot+netdevbpf
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-06-03  8:25 UTC (permalink / raw)
  To: Jason A. Donenfeld, netdev, kuba

On 5/30/25 5:04 AM, Jason A. Donenfeld wrote:
> This one patch missed the cutoff for the series I sent last week for
> net-next. It's a one-liner, almost trivial, and I suppose it could be a
> "net" patch, but we're still in the merge window. I was hoping that if
> you're planning on doing a net-next part 2 pull, you might include this.
> If not, I'll send it later in 6.16 as a "net" patch.

We usually (always AFAIR) send a single PR for net-next, mostly because
there is no additional material due to net-next being closed in the
merge window.

Anyhow, I can apply this patch directly to the net tree and it will be
included in this week's net PR.

Side note: no need for a cover letter for a single-patch series,
unless it's a formal PR.

Cheers,

Paolo



* Re: [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late
  2025-06-03  8:25 ` [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Paolo Abeni
@ 2025-06-03  8:30   ` Paolo Abeni
  2025-06-05 12:06     ` [PATCH net-next v2 1/1] wireguard: device: enable threaded NAPI Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-06-03  8:30 UTC (permalink / raw)
  To: Jason A. Donenfeld, netdev, kuba

On 6/3/25 10:25 AM, Paolo Abeni wrote:
> On 5/30/25 5:04 AM, Jason A. Donenfeld wrote:
>> This one patch missed the cutoff for the series I sent last week for
>> net-next. It's a one-liner, almost trivial, and I suppose it could be a
>> "net" patch, but we're still in the merge window. I was hoping that if
>> you're planning on doing a net-next part 2 pull, you might include this.
>> If not, I'll send it later in 6.16 as a "net" patch.
> 
> We usually (always AFAIR) send a single PR for net-next, mostly because
> there is no additional material due to net-next being closed in the
> merge window.
> 
> Anyhow, I can apply this patch directly to the net tree and it will be
> included in this week's net PR.

I'm sorry, I rushed my reply a bit. Could you please provide a suitable
Fixes: tag for this patch?

Thanks,

Paolo



* [PATCH net-next v2 1/1] wireguard: device: enable threaded NAPI
  2025-06-03  8:30   ` Paolo Abeni
@ 2025-06-05 12:06     ` Jason A. Donenfeld
  2025-06-05 15:10       ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2025-06-05 12:06 UTC (permalink / raw)
  To: pabeni, netdev; +Cc: Mirco Barone, Jason A. Donenfeld

From: Mirco Barone <mirco.barone@polito.it>

Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host.  This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).

The problem is that, during a traffic surge involving multiple tunnels
at the same time, NAPI polling for all of these wg devices tends to
converge onto the same core, underutilizing the CPU and bottlenecking
performance.

This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.

As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.

In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.

The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.

On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.

We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
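
One quick way to confirm the new default from userspace is to read the
standard per-netdev "threaded" sysfs attribute. A hypothetical check
(assuming an interface named wg0; not part of this patch) could look
like:

  #include <stdio.h>

  int main(void)
  {
          /* /sys/class/net/<dev>/threaded reports the NAPI threading state. */
          FILE *f = fopen("/sys/class/net/wg0/threaded", "r");
          char buf[4] = "";

          if (!f) {
                  perror("open threaded attribute");
                  return 1;
          }
          if (fgets(buf, sizeof(buf), f))
                  printf("wg0 threaded NAPI: %s", buf); /* expect "1" */
          fclose(f);
          return 0;
  }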

More details are available in our Netdev paper.

Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone <mirco.barone@polito.it>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- Add Fixes tag.

 drivers/net/wireguard/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 3ffeeba5dccf..4a529f1f9bea 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -366,6 +366,7 @@ static int wg_newlink(struct net_device *dev,
 	if (ret < 0)
 		goto err_free_handshake_queue;
 
+	dev_set_threaded(dev, true);
 	ret = register_netdevice(dev);
 	if (ret < 0)
 		goto err_uninit_ratelimiter;
-- 
2.48.1



* Re: [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late
  2025-05-30  3:04 [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Jason A. Donenfeld
  2025-05-30  3:04 ` [PATCH net-next 1/1] wireguard: device: enable threaded NAPI Jason A. Donenfeld
  2025-06-03  8:25 ` [PATCH net-next 0/1] wireguard updates for 6.16, part 2, late Paolo Abeni
@ 2025-06-05 15:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-06-05 15:10 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: netdev, kuba, pabeni

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 30 May 2025 05:04:57 +0200 you wrote:
> Hey Jakub/Paolo,
> 
> This one patch missed the cutoff for the series I sent last week for
> net-next. It's a one-liner, almost trivial, and I suppose it could be a
> "net" patch, but we're still in the merge window. I was hoping that if
> you're planning on doing a net-next part 2 pull, you might include this.
> If not, I'll send it later in 6.16 as a "net" patch.
> 
> [...]

Here is the summary with links:
  - [net-next,1/1] wireguard: device: enable threaded NAPI
    https://git.kernel.org/netdev/net/c/db9ae3b6b43c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v2 1/1] wireguard: device: enable threaded NAPI
  2025-06-05 12:06     ` [PATCH net-next v2 1/1] wireguard: device: enable threaded NAPI Jason A. Donenfeld
@ 2025-06-05 15:10       ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-06-05 15:10 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: pabeni, netdev, mirco.barone

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  5 Jun 2025 14:06:16 +0200 you wrote:
> From: Mirco Barone <mirco.barone@polito.it>
> 
> Enable threaded NAPI by default for WireGuard devices in response to low
> performance behavior that we observed when multiple tunnels (and thus
> multiple wg devices) are deployed on a single host.  This affects any
> kind of multi-tunnel deployment, regardless of whether the tunnels share
> the same endpoints or not (i.e., a VPN concentrator type of gateway
> would also be affected).
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/1] wireguard: device: enable threaded NAPI
    https://git.kernel.org/netdev/net/c/db9ae3b6b43c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


