From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>, netdev@vger.kernel.org
Cc: William Tu <witu@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>,
David Wei <dw@davidwei.uk>, Jakub Kicinski <kuba@kernel.org>,
Gal Pressman <gal@nvidia.com>
Subject: Re: [PATCH net-next] net/mlx5e: Undo saving per-channel async ICOSQ
Date: Sun, 1 Feb 2026 14:50:30 +0200 [thread overview]
Message-ID: <421d9b09-a157-4f95-8775-7883ea097038@gmail.com> (raw)
In-Reply-To: <6188b9f5-ce38-4de9-80b7-c7b1cab48595@iogearbox.net>
[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]
On 26/01/2026 11:23, Daniel Borkmann wrote:
> Hi Tariq,
>
> On 1/25/26 9:33 AM, Tariq Toukan wrote:
>> On 24/01/2026 0:39, Daniel Borkmann wrote:
>>> This reverts the following commits:
>>>
>>> - ea945f4f3991 ("net/mlx5e: Move async ICOSQ lock into ICOSQ struct")
>>> - 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
>>> - 1b080bd74840 ("net/mlx5e: Move async ICOSQ to dynamic allocation")
>>> - abed42f9cd80 ("net/mlx5e: Conditionally create async ICOSQ")
>>>
>>> There are a couple of regressions on the xsk side I ran into:
>>>
>>> Commit 56aca3e0f730 triggers an illegal synchronize_rcu() in an RCU
>>> read-
>>> side critical section via mlx5e_xsk_wakeup() ->
>>> mlx5e_trigger_napi_icosq()
>>> -> synchronize_net(). The stack holds RCU read-lock in xsk_poll().
>>>
>>> Additionally, this also hits a NULL pointer dereference in
>>> mlx5e_xsk_wakeup():
>>>
>>> [ 103.963735] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000240
>>> [ 103.963743] #PF: supervisor read access in kernel mode
>>> [ 103.963746] #PF: error_code(0x0000) - not-present page
>>> [ 103.963749] PGD 0 P4D 0
>>> [ 103.963752] Oops: Oops: 0000 [#1] SMP
>>> [ 103.963756] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not
>>> tainted 6.19.0-rc5+ #229 PREEMPT(none)
>>> [ 103.963761] Hardware name: [...]
>>> [ 103.963765] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]
>>>
>>> What happens is that c->async_icosq is NULL when in mlx5e_xsk_wakeup()
>>> and therefore access to c->async_icosq->state triggers it. (On the NIC
>>> there is an XDP program installed by the control plane where traffic
>>> gets redirected into an xsk map - there was no xsk pool set up yet.
>>> At some later time a xsk pool is set up and the related xsk socket is
>>> added to the xsk map of the XDP program.)
>>
>> Thanks for your report.
>>
>>> Reverting the series fixes the problems again.
>>
>> Revert is too aggressive here. A fix is preferable.
>> We're investigating the issue in order to fix it.
>> We'll update.
> Ok, sounds good. Certainly the kTLS fixes seem independent, from the cause
> of the issues I've hit it just seemed to me that they were quite
> fundamental
> and that perhaps a different approach would be needed (or alternatively
> only
> kTLS would need fixing, and the xsk optimization left as it was
> originally).
> Anyway, I'll keep the revert locally for now, and happy to test patches.
Hi Daniel,
Please check attached patch.
We were able to repro the issues internally and verify the fix.
We're finalizing it before submission.
I'd be glad if you can confirm it solves the issues for you.
Thanks,
Tariq
[-- Attachment #2: 0001-net-mlx5e-XSK-Fix-unintended-ICOSQ-change.patch --]
[-- Type: text/plain, Size: 4503 bytes --]
From fd51958ad083528859a171a84aeba7021989872b Mon Sep 17 00:00:00 2001
From: Tariq Toukan <tariqt@nvidia.com>
Date: Mon, 26 Jan 2026 17:07:29 +0200
Subject: [PATCH] net/mlx5e: XSK, Fix unintended ICOSQ change
Change-Id: Iffaa183f9ed8a9190c38be2c5290b44bf6ef731a
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
.../ethernet/mellanox/mlx5/core/en/xsk/pool.c | 4 +--
.../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 28 ++++++++++++++-----
4 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 19b9683f4622..63da9f680e87 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1109,6 +1109,7 @@ int mlx5e_open_locked(struct net_device *netdev);
int mlx5e_close_locked(struct net_device *netdev);
void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c);
+void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c);
void mlx5e_trigger_napi_sched(struct napi_struct *napi);
int mlx5e_open_channels(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index db776e515b6a..5c5360a25c64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -127,7 +127,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
goto err_remove_pool;
mlx5e_activate_xsk(c);
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
/* Don't wait for WQEs, because the newer xdpsock sample doesn't provide
* any Fill Ring entries at the setup stage.
@@ -179,7 +179,7 @@ static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
c = priv->channels.c[ix];
mlx5e_activate_rq(&c->rq);
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
mlx5e_wait_for_min_rx_wqes(&c->rq, MLX5E_RQ_WQES_TIMEOUT);
mlx5e_rx_res_xsk_update(priv->rx_res, &priv->channels, ix, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 9e33156fac8a..8aeab4b21035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -34,7 +34,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
&c->async_icosq->state))
return 0;
- mlx5e_trigger_napi_icosq(c);
+ mlx5e_trigger_napi_async_icosq(c);
}
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e1e05c9e7ebb..4a6887ba5741 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2789,16 +2789,30 @@ static int mlx5e_channel_stats_alloc(struct mlx5e_priv *priv, int ix, int cpu)
void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c)
{
+ struct mlx5e_icosq *sq = &c->icosq;
bool locked;
- if (!test_and_set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state))
- synchronize_net();
+ WARN_ON(test_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state));
- locked = mlx5e_icosq_sync_lock(&c->icosq);
- mlx5e_trigger_irq(&c->icosq);
- mlx5e_icosq_sync_unlock(&c->icosq, locked);
+ set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
+ synchronize_net();
- clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state);
+ locked = mlx5e_icosq_sync_lock(sq);
+ mlx5e_trigger_irq(sq);
+ mlx5e_icosq_sync_unlock(sq, locked);
+
+ clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state);
+}
+
+void mlx5e_trigger_napi_async_icosq(struct mlx5e_channel *c)
+{
+ struct mlx5e_icosq *sq = c->async_icosq;
+
+ WARN_ON(!sq);
+
+ spin_lock_bh(&sq->lock);
+ mlx5e_trigger_irq(sq);
+ spin_unlock_bh(&sq->lock);
}
void mlx5e_trigger_napi_sched(struct napi_struct *napi)
@@ -2881,7 +2895,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
netif_napi_set_irq_locked(&c->napi, irq);
- async_icosq_needed = !!xsk_pool || priv->ktls_rx_was_enabled;
+ async_icosq_needed = !!params->xdp_prog || priv->ktls_rx_was_enabled;
err = mlx5e_open_queues(c, params, cparam, async_icosq_needed);
if (unlikely(err))
goto err_napi_del;
base-commit: d8f87aa5fa0a4276491fa8ef436cd22605a3f9ba
--
2.44.0
next prev parent reply other threads:[~2026-02-01 12:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-23 22:39 [PATCH net-next] net/mlx5e: Undo saving per-channel async ICOSQ Daniel Borkmann
2026-01-25 8:33 ` Tariq Toukan
2026-01-26 9:23 ` Daniel Borkmann
2026-02-01 12:50 ` Tariq Toukan [this message]
2026-02-02 16:13 ` Daniel Borkmann
2026-02-03 9:20 ` Alice Mikityanska
2026-02-03 14:58 ` Tariq Toukan
2026-02-03 17:54 ` Alice Mikityanska
2026-02-04 6:20 ` Tariq Toukan
2026-02-03 12:30 ` Tariq Toukan
2026-01-26 16:06 ` Alice Mikityanska
2026-01-26 20:54 ` Tariq Toukan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=421d9b09-a157-4f95-8775-7883ea097038@gmail.com \
--to=ttoukan.linux@gmail.com \
--cc=daniel@iogearbox.net \
--cc=dw@davidwei.uk \
--cc=gal@nvidia.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=tariqt@nvidia.com \
--cc=witu@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox