From: Cosmin Ratiu <cratiu@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
"matt@readmodwrite.com" <matt@readmodwrite.com>,
"leon@kernel.org" <leon@kernel.org>
Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
"davem@davemloft.net" <davem@davemloft.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kernel-team@cloudflare.com" <kernel-team@cloudflare.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"edumazet@google.com" <edumazet@google.com>,
"pabeni@redhat.com" <pabeni@redhat.com>,
"mfleming@cloudflare.com" <mfleming@cloudflare.com>
Subject: Re: [PATCH net] net/mlx5e: Fix use-after-free in mlx5e_tx_reporter_timeout_recover
Date: Tue, 12 May 2026 11:08:48 +0000
Message-ID: <00ce6b5b1081bea03195a39c525b9230e3524256.camel@nvidia.com>
In-Reply-To: <20260408184458.1274662-1-matt@readmodwrite.com>
On Wed, 2026-04-08 at 19:44 +0100, Matt Fleming wrote:
> From: Matt Fleming <mfleming@cloudflare.com>
First of all, apologies for the delay. I missed this, and it seems
nobody else reacted for more than a month.
Next time, you will probably get a faster response if you directly CC
the authors of the patch that introduced the bug. That will also keep
the patchwork checks happy.
>
> mlx5e_tx_reporter_timeout_recover() accesses sq->netdev after
> mlx5e_safe_reopen_channels() has torn down and freed the channel (and
> its embedded SQs). Replace the three sq->netdev references with
> priv->netdev which is safe because priv outlives channel teardown.
>
> The netdev_err() call already used priv->netdev for this reason; make
> the trylock/unlock and health_channel_eq_recover calls consistent.
>
> This fixes the following KASAN splat:
>
> BUG: KASAN: use-after-free in mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
> Read of size 8 at addr ffff889860ed0b28 by task kworker/u113:2/5277
>
> Call Trace:
> mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
> devlink_health_reporter_recover+0xa2/0x150
> devlink_health_report+0x254/0x7c0
> mlx5e_reporter_tx_timeout+0x297/0x380 [mlx5_core]
> mlx5e_tx_timeout_work+0x109/0x170 [mlx5_core]
> process_one_work+0x677/0xf20
> worker_thread+0x51f/0xd90
> kthread+0x3a5/0x810
> ret_from_fork+0x208/0x400
> ret_from_fork_asm+0x1a/0x30
>
> Fixes: 83ac0304a2d7 ("net/mlx5e: Fix deadlocks between devlink and netdev instance locks")
> Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> index afdeb1b3d425..8409ae73768f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> @@ -160,13 +160,13 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> * channels are being closed for other reason and this work is not
> * relevant anymore.
> */
> - while (!netdev_trylock(sq->netdev)) {
> + while (!netdev_trylock(priv->netdev)) {
> if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> return 0;
> msleep(20);
> }
>
> - err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
> + err = mlx5e_health_channel_eq_recover(priv->netdev, eq, sq->cq.ch_stats);
> if (!err) {
> to_ctx->status = 0; /* this sq recovered */
> goto out;
> @@ -186,7 +186,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
> err);
> out:
> - netdev_unlock(sq->netdev);
> + netdev_unlock(priv->netdev);
> return err;
> }
>
Thank you for the fix. This is a real problem: it can happen when direct
SQ recovery fails and all channels need to be reopened, which is
apparently what happened in your KASAN report.
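
For readers following along, the lifetime issue can be sketched roughly
as below. This is only a userspace model under stated assumptions, not
the driver code: struct netdev, struct sq, struct priv,
reopen_channels(), timeout_recover() and run_demo() are all hypothetical
stand-ins, and a plain counter takes the place of the netdev instance
lock. The point it illustrates is that the lock is reached through priv,
which survives the reopen, and never through the SQ that the reopen
frees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the real mlx5 types. */
struct netdev { int lock_depth; };
struct sq { struct netdev *netdev; };
struct priv {
	struct netdev *netdev;	/* outlives any channel teardown */
	struct sq *sq;		/* replaced on every channel reopen */
};

/* Models a full channel reopen: the old SQ is freed, a new one installed. */
static int reopen_channels(struct priv *priv)
{
	struct sq *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return -1;
	fresh->netdev = priv->netdev;
	free(priv->sq);		/* any saved pointer to the old SQ now dangles */
	priv->sq = fresh;
	return 0;
}

/* Recovery in the fixed style: the lock is reached only through priv. */
static int timeout_recover(struct priv *priv)
{
	int err;

	priv->netdev->lock_depth++;	/* stands in for netdev_trylock() */
	err = reopen_channels(priv);
	priv->netdev->lock_depth--;	/* stands in for netdev_unlock() */
	return err;
}

/* Runs one recovery; returns the final lock depth, or -1 on failure. */
static int run_demo(void)
{
	struct netdev dev = { 0 };
	struct priv priv = { &dev, NULL };
	int err;

	priv.sq = malloc(sizeof(*priv.sq));
	if (!priv.sq)
		return -1;
	priv.sq->netdev = &dev;

	err = timeout_recover(&priv);
	assert(priv.sq->netdev == &dev);	/* current SQ is still valid */
	free(priv.sq);
	return err ? -1 : dev.lock_depth;
}
```

Had timeout_recover() kept a pointer to the original SQ and released the
lock through it after reopen_channels(), it would read freed memory,
which is the pattern the KASAN report caught.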
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Thread overview: 4+ messages
2026-04-08 18:44 [PATCH net] net/mlx5e: Fix use-after-free in mlx5e_tx_reporter_timeout_recover Matt Fleming
2026-05-01 10:03 ` Matt Fleming
2026-05-12 11:08 ` Cosmin Ratiu [this message]
2026-05-12 11:12 ` Tariq Toukan