* [PATCH net] net/mlx5e: XSK, Fix unintended ICOSQ change
@ 2026-02-17 7:45 Tariq Toukan
2026-02-17 16:48 ` Alice Mikityanska
2026-02-19 1:10 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 3+ messages in thread
From: Tariq Toukan @ 2026-02-17 7:45 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
Gal Pressman, Moshe Shemesh, Alice Mikityanska, William Tu,
David Wei, Dragos Tatulea
XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.
The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.
XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.
Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.
This fixes multiple issues:
1. Illegal synchronize_rcu() in an RCU read-side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().
2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():
[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
.../ethernet/mellanox/mlx5/core/en/xsk/pool.c | 4 ++--
.../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++------
4 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a7de3a3efc49..19fce51117c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1103,6 +1103,7 @@ int mlx5e_open_locked(struct net_device *netdev);
int mlx5e_close_locked(struct net_device *netdev);
void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c);
+void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c);
void mlx5e_trigger_napi_sched(struct napi_struct *napi);
int mlx5e_open_channels(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index db776e515b6a..5c5360a25c64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -127,7 +127,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
goto err_remove_pool;
mlx5e_activate_xsk(c);
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
/* Don't wait for WQEs, because the newer xdpsock sample doesn't provide
* any Fill Ring entries at the setup stage.
@@ -179,7 +179,7 @@ static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
c = priv->channels.c[ix];
mlx5e_activate_rq(&c->rq);
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
mlx5e_wait_for_min_rx_wqes(&c->rq, MLX5E_RQ_WQES_TIMEOUT);
mlx5e_rx_res_xsk_update(priv->rx_res, &priv->channels, ix, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 9e33156fac8a..8aeab4b21035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -34,7 +34,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
&c->async_icosq->state))
return 0;
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
}
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4b8084420816..6a7ca4571c19 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2744,16 +2744,26 @@ static int mlx5e_channel_stats_alloc(struct mlx5e_priv *priv, int ix, int cpu)
void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c)
{
+ struct mlx5e_icosq *sq = &c->icosq;
bool locked;
- if (!test_and_set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state))
- synchronize_net();
+ set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
+ synchronize_net();
- locked = mlx5e_icosq_sync_lock(&c->icosq);
- mlx5e_trigger_irq(&c->icosq);
- mlx5e_icosq_sync_unlock(&c->icosq, locked);
+ locked = mlx5e_icosq_sync_lock(sq);
+ mlx5e_trigger_irq(sq);
+ mlx5e_icosq_sync_unlock(sq, locked);
- clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state);
+ clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
+}
+
+void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c)
+{
+ struct mlx5e_icosq *sq = c->async_icosq;
+
+ spin_lock_bh(&sq->lock);
+ mlx5e_trigger_irq(sq);
+ spin_unlock_bh(&sq->lock);
}
void mlx5e_trigger_napi_sched(struct napi_struct *napi)
@@ -2836,7 +2846,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
netif_napi_set_irq_locked(&c->napi, irq);
- async_icosq_needed = !!xsk_pool || priv->ktls_rx_was_enabled;
+ async_icosq_needed = !!params->xdp_prog || priv->ktls_rx_was_enabled;
err = mlx5e_open_queues(c, params, cparam, async_icosq_needed);
if (unlikely(err))
goto err_napi_del;
base-commit: ee5492fd88cfc079c19fbeac78e9e53b7f6c04f3
--
2.44.0
* Re: [PATCH net] net/mlx5e: XSK, Fix unintended ICOSQ change
2026-02-17 7:45 [PATCH net] net/mlx5e: XSK, Fix unintended ICOSQ change Tariq Toukan
@ 2026-02-17 16:48 ` Alice Mikityanska
2026-02-19 1:10 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: Alice Mikityanska @ 2026-02-17 16:48 UTC
To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
linux-rdma, linux-kernel, bpf, Gal Pressman, Moshe Shemesh,
William Tu, David Wei, Dragos Tatulea, alice
On Tue, Feb 17, 2026, at 09:45, Tariq Toukan wrote:
> XSK wakeup must use the async ICOSQ (with proper locking), as it is not
> guaranteed to run on the same CPU as the channel.
>
> The commit that converted the NAPI trigger path to use the sync ICOSQ
> incorrectly applied the same change to XSK, causing XSK wakeups to use
> the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.
>
> XDP program attach/detach triggers channel reopen, while XSK pool
> enable/disable can happen on-the-fly via NDOs without reopening
> channels. As a result, xsk_pool state cannot be reliably used at
> mlx5e_open_channel() time to decide whether an async ICOSQ is needed.
>
> Update the async_icosq_needed logic to depend on the presence of an XDP
> program rather than the xsk_pool, ensuring the async ICOSQ is available
> when XSK wakeups are enabled.
>
> This fixes multiple issues:
>
> 1. Illegal synchronize_rcu() in an RCU read-side critical section via
> mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
> synchronize_net(). The stack holds RCU read-lock in xsk_poll().
>
> 2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():
>
> [] BUG: kernel NULL pointer dereference, address: 0000000000000240
> [] #PF: supervisor read access in kernel mode
> [] #PF: error_code(0x0000) - not-present page
> [] PGD 0 P4D 0
> [] Oops: Oops: 0000 [#1] SMP
> [] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted
> 6.19.0-rc5+ #229 PREEMPT(none)
> [] Hardware name: [...]
> [] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]
>
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Closes:
> https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
> Fixes: 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
> Tested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
> .../ethernet/mellanox/mlx5/core/en/xsk/pool.c | 4 ++--
> .../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++------
> 4 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index a7de3a3efc49..19fce51117c9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -1103,6 +1103,7 @@ int mlx5e_open_locked(struct net_device *netdev);
> int mlx5e_close_locked(struct net_device *netdev);
>
> void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c);
> +void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c);
> void mlx5e_trigger_napi_sched(struct napi_struct *napi);
>
> int mlx5e_open_channels(struct mlx5e_priv *priv,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
> index db776e515b6a..5c5360a25c64 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
> @@ -127,7 +127,7 @@ static int mlx5e_xsk_enable_locked(struct
> mlx5e_priv *priv,
> goto err_remove_pool;
>
> mlx5e_activate_xsk(c);
> - mlx5e_trigger_napi_icosq(c);
> + mlx5e_trigger_napi_async_icosq(c);
>
> /* Don't wait for WQEs, because the newer xdpsock sample doesn't
> provide
> * any Fill Ring entries at the setup stage.
> @@ -179,7 +179,7 @@ static int mlx5e_xsk_disable_locked(struct
> mlx5e_priv *priv, u16 ix)
> c = priv->channels.c[ix];
>
> mlx5e_activate_rq(&c->rq);
> - mlx5e_trigger_napi_icosq(c);
> + mlx5e_trigger_napi_async_icosq(c);
> mlx5e_wait_for_min_rx_wqes(&c->rq, MLX5E_RQ_WQES_TIMEOUT);
>
> mlx5e_rx_res_xsk_update(priv->rx_res, &priv->channels, ix, false);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> index 9e33156fac8a..8aeab4b21035 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> @@ -34,7 +34,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid,
> u32 flags)
> &c->async_icosq->state))
> return 0;
>
> - mlx5e_trigger_napi_icosq(c);
> + mlx5e_trigger_napi_async_icosq(c);
> }
>
> return 0;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 4b8084420816..6a7ca4571c19 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -2744,16 +2744,26 @@ static int mlx5e_channel_stats_alloc(struct
> mlx5e_priv *priv, int ix, int cpu)
>
> void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c)
> {
> + struct mlx5e_icosq *sq = &c->icosq;
> bool locked;
>
> - if (!test_and_set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state))
> - synchronize_net();
> + set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
> + synchronize_net();
>
> - locked = mlx5e_icosq_sync_lock(&c->icosq);
> - mlx5e_trigger_irq(&c->icosq);
> - mlx5e_icosq_sync_unlock(&c->icosq, locked);
> + locked = mlx5e_icosq_sync_lock(sq);
> + mlx5e_trigger_irq(sq);
> + mlx5e_icosq_sync_unlock(sq, locked);
>
> - clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state);
> + clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
> +}
> +
> +void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c)
> +{
> + struct mlx5e_icosq *sq = c->async_icosq;
> +
> + spin_lock_bh(&sq->lock);
> + mlx5e_trigger_irq(sq);
> + spin_unlock_bh(&sq->lock);
> }
>
> void mlx5e_trigger_napi_sched(struct napi_struct *napi)
> @@ -2836,7 +2846,7 @@ static int mlx5e_open_channel(struct mlx5e_priv
> *priv, int ix,
> netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
> netif_napi_set_irq_locked(&c->napi, irq);
>
> - async_icosq_needed = !!xsk_pool || priv->ktls_rx_was_enabled;
> + async_icosq_needed = !!params->xdp_prog || priv->ktls_rx_was_enabled;
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
With a follow-up suggestion that we discussed at:
https://lore.kernel.org/netdev/8a3a3ff4-16c7-4d99-8854-38d741cc6b82@gmail.com/
> err = mlx5e_open_queues(c, params, cparam, async_icosq_needed);
> if (unlikely(err))
> goto err_napi_del;
>
> base-commit: ee5492fd88cfc079c19fbeac78e9e53b7f6c04f3
> --
> 2.44.0
* Re: [PATCH net] net/mlx5e: XSK, Fix unintended ICOSQ change
2026-02-17 7:45 [PATCH net] net/mlx5e: XSK, Fix unintended ICOSQ change Tariq Toukan
2026-02-17 16:48 ` Alice Mikityanska
@ 2026-02-19 1:10 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-19 1:10 UTC
To: Tariq Toukan
Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
mbloch, ast, daniel, hawk, john.fastabend, netdev, linux-rdma,
linux-kernel, bpf, gal, moshe, alice.kernel, witu, dw, dtatulea
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 17 Feb 2026 09:45:25 +0200 you wrote:
> XSK wakeup must use the async ICOSQ (with proper locking), as it is not
> guaranteed to run on the same CPU as the channel.
>
> The commit that converted the NAPI trigger path to use the sync ICOSQ
> incorrectly applied the same change to XSK, causing XSK wakeups to use
> the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.
>
> [...]
Here is the summary with links:
- [net] net/mlx5e: XSK, Fix unintended ICOSQ change
https://git.kernel.org/netdev/net/c/0da1dba72616
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html