Netdev List
* [PATCH net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ
@ 2026-05-13  6:46 Tariq Toukan
  2026-05-17 10:30 ` Simon Horman
  2026-05-18 23:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Tariq Toukan @ 2026-05-13  6:46 UTC
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Tariq Toukan, Maxim Mikityanskiy, Daniel Borkmann, Saeed Mahameed,
	netdev, linux-rdma, linux-kernel, Gal Pressman, Paul Saab,
	Dragos Tatulea, Paul Saab

From: Dragos Tatulea <dtatulea@nvidia.com>

During NAPI poll, when the affinity changes and there is still XSK work
to be done, we trigger an ICOSQ interrupt on the new CPU. However, this
trigger writes to the ICOSQ without holding any lock.

There are 2 such races:

A) mlx5e_trigger_irq() is called while mlx5e_xsk_alloc_rx_mpwqe() is
running on a different CPU due to an affinity change. This can happen
because IRQ triggering is done after napi_complete_done(), at which point
the NAPI can already be scheduled on a different CPU, like this:

  CPU A (old affinity, NAPI tail)    CPU B (new affinity, fresh NAPI)
  -------------------------------    --------------------------------
  napi_complete_done()  clears SCHED
  mlx5e_cq_arm(...)
                                     napi_schedule_prep() sets SCHED
                                     mlx5e_napi_poll()
                                       mlx5e_xsk_alloc_rx_mpwqe()
                                         mlx5e_icosq_sync_lock() // noop
                                         memcpy 640 B UMR body
                                         advance sq->pc by 10
  mlx5e_trigger_irq(&c->icosq)
    wqe_info[pi] = {NOP, 1}
    mlx5e_post_nop() advances sq->pc

B) mlx5e_trigger_irq() is called on the ICOSQ when
mlx5e_trigger_napi_icosq() is running.
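
For readers who want to replay race A outside the driver, here is a
minimal single-threaded sketch in plain C. It is not mlx5 code: the toy_*
names, the 16-slot ring and the exact interleaving (both CPUs sample the
producer index before either one commits) are assumptions made for
illustration. Only the sizes (a 10-slot UMR, a 1-slot NOP) come from the
diagram above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE  16u  /* hypothetical ring size, power of two */
#define TOY_UMR_WQEBBS 10u  /* 640 B UMR body / 64 B per slot, as in the diagram */

enum toy_op { TOY_EMPTY = 0, TOY_NOP, TOY_UMR };

struct toy_wqe_info {
	enum toy_op op;
	uint8_t num_wqebbs;
};

/* Toy stand-in for a send queue: producer counter, slots, bookkeeping. */
struct toy_sq {
	uint16_t pc;
	enum toy_op slot[TOY_RING_SIZE];
	struct toy_wqe_info wqe_info[TOY_RING_SIZE];
};

/* Step 1 of a post: sample the producer index from the counter. */
static uint16_t toy_get_pi(const struct toy_sq *sq)
{
	return sq->pc & (TOY_RING_SIZE - 1u);
}

/* Step 2 of a post: fill n slots at pi, record bookkeeping, advance pc. */
static void toy_commit(struct toy_sq *sq, uint16_t pi, enum toy_op op, uint8_t n)
{
	assert(n > 0 && n <= TOY_RING_SIZE);
	sq->wqe_info[pi].op = op;
	sq->wqe_info[pi].num_wqebbs = n;
	for (uint8_t i = 0; i < n; i++)
		sq->slot[(pi + i) & (TOY_RING_SIZE - 1u)] = op;
	sq->pc = (uint16_t)(sq->pc + n);
}

/* Replay one bad interleaving: both writers sample pi before committing. */
static struct toy_sq toy_replay_race(void)
{
	struct toy_sq sq;

	memset(&sq, 0, sizeof(sq));
	uint16_t pi_b = toy_get_pi(&sq);  /* CPU B: about to post the UMR */
	uint16_t pi_a = toy_get_pi(&sq);  /* CPU A: same index, no lock held */
	toy_commit(&sq, pi_b, TOY_UMR, TOY_UMR_WQEBBS);
	toy_commit(&sq, pi_a, TOY_NOP, 1);
	return sq;
}
```

In this replay the NOP lands on the first slot of the UMR and replaces
its bookkeeping entry, while the producer counter ends up at 11 even
though only slots 0 through 9 were ever written. A ring left in that kind
of inconsistent state is the sort of thing that could plausibly lead to
error completions like the ones in the splat below.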

The obvious fix would be to lock the ICOSQ, but the ICOSQ has an optimized
locking scheme that doesn't work for this scenario. Kick the async ICOSQ
instead, which is always accessed under its lock.
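
The idea of a queue whose every writer takes the same lock can be
sketched in userspace C as follows. This is a toy model, not the driver's
implementation: the demo_* names are hypothetical, a C11 atomic_flag
stands in for whatever lock guards the async ICOSQ, and a try-lock is
used only so that the example stays deterministic in a single thread (a
kernel lock would wait rather than give up).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy queue where the producer counter is only touched under the lock. */
struct demo_async_sq {
	atomic_flag lock;  /* hypothetical stand-in for the queue lock */
	uint16_t pc;       /* producer counter */
};

#define DEMO_ASYNC_SQ_INIT { ATOMIC_FLAG_INIT, 0 }

/* Returns true if the caller now owns the lock. */
static bool demo_try_lock(struct demo_async_sq *sq)
{
	assert(sq != NULL);
	return !atomic_flag_test_and_set_explicit(&sq->lock,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_async_sq *sq)
{
	atomic_flag_clear_explicit(&sq->lock, memory_order_release);
}

/*
 * Post one single-slot entry. Every caller goes through the lock, no
 * matter which CPU or context it runs in, so two posters can never
 * sample and advance pc at the same time.
 */
static bool demo_try_trigger(struct demo_async_sq *sq)
{
	if (!demo_try_lock(sq))
		return false;  /* someone else is writing to the queue */
	sq->pc++;
	demo_unlock(sq);
	return true;
}
```

While one writer holds the lock, demo_try_trigger() from another context
is refused and pc stays untouched; once the lock is released the post
goes through and pc advances by exactly one.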

This issue was noticed in the wild with the following splat:

  netdevice: ge-0-0-1: Bad OP in ICOSQ CQE: 0xd
  WARNING: drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:826 [...]
  [...]
  Call Trace:
   <IRQ>
   mlx5e_napi_poll+0x11d/0x7f0 [mlx5_core]
   __napi_poll+0x30/0x200
   ? skb_defer_free_flush+0x9c/0xc0
   net_rx_action+0x2fe/0x3f0
   handle_softirqs+0xd8/0x340
   __irq_exit_rcu+0xbc/0xe0
   common_interrupt+0x85/0xa0
   </IRQ>
   <TASK>
   asm_common_interrupt+0x26/0x40
  [...]
  ---[ end trace 0000000000000000 ]---
  mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2022, qn 0x8f4,
  opcode 0xd, syndrome 0x2, vendor syndrome 0x68
  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000030: 00 00 00 00 01 00 68 02 01 00 08 f4 de 14 59 d2
  WQE DUMP: WQ size 16384 WQ cur size 0, WQE index 0x1e14, len: 64
  00000000: 00 00 00 01 d9 ed 80 02 00 00 00 01 d9 ed 90 02
  00000010: 00 00 00 01 d9 ed a0 02 00 00 00 01 d9 ed b0 02
  00000020: 00 00 00 01 d9 ed c0 02 00 00 00 01 d9 ed d0 02
  00000030: 00 00 00 01 d9 ed e0 02 00 00 00 01 d9 ed f0 02
  mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2023, qn 0x8f4,
  opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000030: 00 00 00 00 01 00 f9 05 01 00 08 f4 de 15 cf d2

Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Reported-by: Paul Saab <ps@mu.org>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index b31f689fe271..e90c6c6df835 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -252,7 +252,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 		mlx5e_cq_arm(&c->xdpsq->cq);
 
 	if (unlikely(aff_change && busy_xsk)) {
-		mlx5e_trigger_irq(&c->icosq);
+		mlx5e_trigger_napi_async_icosq(c);
 		ch_stats->force_irq++;
 	}
 

base-commit: f5b2772d14884f4be9e718644f1203d4d0e6f0d6
-- 
2.44.0



* Re: [PATCH net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ
  2026-05-13  6:46 [PATCH net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ Tariq Toukan
@ 2026-05-17 10:30 ` Simon Horman
  2026-05-18 23:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2026-05-17 10:30 UTC
  To: Tariq Toukan
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Saeed Mahameed, Mark Bloch, Leon Romanovsky,
	Tariq Toukan, Maxim Mikityanskiy, Daniel Borkmann, Saeed Mahameed,
	netdev, linux-rdma, linux-kernel, Gal Pressman, Paul Saab,
	Dragos Tatulea, Paul Saab

On Wed, May 13, 2026 at 09:46:13AM +0300, Tariq Toukan wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
> 
> During napi poll, when the affinity changes and there's still XSK work
> to be done, we trigger an ICOSQ interrupt on the new CPU. However, this
> triggering on the ICOSQ is done unprotected.
> 
> There are 2 such races:
> 
> A) mlx5e_trigger_irq() is called while mlx5e_xsk_alloc_rx_mpwqe() is
> running from a different CPU due to affinity change. This can happen
> because IRQ triggering is done after napi_complete_done(). At this point
> the NAPI can be scheduled on a different CPU. Like this:
> 
>   CPU A (old affinity, NAPI tail)    CPU B (new affinity, fresh NAPI)
>   -------------------------------    --------------------------------
>   napi_complete_done()  clears SCHED
>   mlx5e_cq_arm(...)
>                                      napi_schedule_prep() sets SCHED
>                                      mlx5e_napi_poll()
>                                        mlx5e_xsk_alloc_rx_mpwqe()
>                                          mlx5e_icosq_sync_lock() // noop
>                                          memcpy 640 B UMR body
>                                          advance sq->pc by 10
>   mlx5e_trigger_irq(&c->icosq)
>     wqe_info[pi] = {NOP, 1}
>     mlx5e_post_nop() advances sq->pc
> 
> B) mlx5e_trigger_irq() is called on the ICOSQ when
> mlx5e_trigger_napi_icosq() is running.
> 
> The obvious fix would be to lock the ICOSQ. But ICOSQ has an optimized
> locking scheme that doesn't work for this scenario. Kick the async ICOSQ
> instead which is always locked.

...

> Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
> Reported-by: Paul Saab <ps@mu.org>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ
  2026-05-13  6:46 [PATCH net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ Tariq Toukan
  2026-05-17 10:30 ` Simon Horman
@ 2026-05-18 23:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-18 23:20 UTC
  To: Tariq Toukan
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, mbloch,
	leon, tariqt, maxtram95, daniel, saeedm, netdev, linux-rdma,
	linux-kernel, gal, ps, dtatulea, ps

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 13 May 2026 09:46:13 +0300 you wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
> 
> During napi poll, when the affinity changes and there's still XSK work
> to be done, we trigger an ICOSQ interrupt on the new CPU. However, this
> triggering on the ICOSQ is done unprotected.
> 
> There are 2 such races:
> 
> [...]

Here is the summary with links:
  - [net] net/mlx5e: xsk: Fix unlocked writing to ICOSQ
    https://git.kernel.org/netdev/net/c/c326f9c68921

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


