From: Moshe Shemesh <moshe@nvidia.com>
To: Gerd Bayer <gbayer@linux.ibm.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
"Mark Bloch" <mbloch@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Shay Drory <shayd@nvidia.com>,
Simon Horman <horms@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>,
Bjorn Helgaas <helgaas@kernel.org>,
"Niklas Schnelle" <schnelle@linux.ibm.com>,
Farhan Ali <alifm@linux.ibm.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-s390@vger.kernel.org>, <linux-pci@vger.kernel.org>
Subject: Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
Date: Thu, 4 Dec 2025 19:07:20 +0200
Message-ID: <1bef8fd9-e9b8-4184-98be-98d016df20d0@nvidia.com>
In-Reply-To: <502727b0ad4a9bc34afb421d465646248c69f7d4.camel@linux.ibm.com>
On 12/4/2025 11:48 AM, Gerd Bayer wrote:
>
> On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
>>
>> On 12/2/2025 1:12 PM, Gerd Bayer wrote:
>>>
>
> [ ... snip ... ]
>
>>>
>>> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
>>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
>>> ---
>>> Hi Shay et al,
>>>
>>
>> Hi Gerd,
>> I hit this bug recently too, without s390, and was about to
>> submit the same fix :) So, as you wrote, it is unrelated to Lukas' patches
>> and this fix is correct.
>
> Good to hear. I wonder if you could share how you got to run into this?
>
mlx5_unload_one() can be called from several flows.
Even though it is always called with the devlink lock held, calling
mlx5_unload_one() twice in a row triggers the bug. I hit it on fw_reset
and shutdown. I will also submit a patch soon that calls
mlx5_drain_fw_reset() on shutdown.
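
To illustrate the failure mode, here is a minimal user-space sketch, not the
actual mlx5 code. All names in it (struct comp, struct dev, comp_register(),
comp_unregister(), dev_unload()) are hypothetical stand-ins. It only shows the
general pattern the fix relies on: clear the stored reference right after
unregistering, so that a second serial teardown call becomes a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a devcom component handle. */
struct comp {
	int registered;
};

/* Hypothetical stand-in for the device; only the fields the sketch needs. */
struct dev {
	struct comp *hca_ports;	/* NULL when no component is registered */
	int unregister_calls;	/* counts real unregister operations */
};

static struct comp *comp_register(void)
{
	struct comp *c = calloc(1, sizeof(*c));

	if (c)
		c->registered = 1;
	return c;
}

static void comp_unregister(struct comp *c)
{
	/* Catches a NULL handle; a stale (already freed) one cannot be detected. */
	assert(c != NULL);
	free(c);
}

/* Teardown that may run twice in a row, e.g. from two different flows. */
static void dev_unload(struct dev *d)
{
	if (!d->hca_ports)
		return;			/* already torn down: nothing to do */

	comp_unregister(d->hca_ports);
	d->hca_ports = NULL;		/* the step that was missing */
	d->unregister_calls++;
}
```

Without the assignment to NULL, the second dev_unload() call would pass an
already freed handle to comp_unregister() again.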
>>
>>>
>>> I've spotted two additional places where the devcom reference is not
>>> cleared after calling mlx5_devcom_unregister_component() in
>>> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
>>> addressed with a patch, since I'm unclear about how to test these
>>> paths.
>>
>> As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
>> IPsec cleanup over MPV device") and two other cases on shared clock and
>> SD, but I don't see any flow in which the shared clock or SD can fail.
>> Specifically, mlx5_sd_cleanup() checks the sd pointer at the beginning of
>> the function and nullifies it right after sd_unregister(), which frees
>> devcom.
>
> I didn't locate any calls to mlx5_devcom_unregister_component() in
> "shared clock" - is that not yet upstream?
See mlx5_shared_clock_unregister() in
drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
>
> Regarding SD, I follow that sd_cleanup() does the clean-up immediately
> after sd_unregister(). One path remains uncovered
> though: The error exit at
> https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
>
> Not sure how likely that is...
That exit is on an error flow, but it comes after a successful
mlx5_devcom_register_component() in sd_register(). The error then
propagates to the error flow in mlx5_sd_init(), which calls sd_cleanup() too.
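
The shape of that flow can be sketched roughly as below. This is a simplified
user-space model, not the code in lib/sd.c: the *_sketch functions, struct
sd_state and the fail_late flag are hypothetical names made up for the
illustration. It assumes only what is described above: registration succeeds,
a later step fails, and the init error path runs the cleanup, which clears the
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical component handle. */
struct sd_comp {
	int id;
};

/* Hypothetical state holding the component reference. */
struct sd_state {
	struct sd_comp *comp;	/* NULL when not registered */
	int frees;		/* counts real unregister operations */
};

/* Registration succeeds first; a later step may then fail (fail_late). */
static int sd_register_sketch(struct sd_state *s, int fail_late)
{
	s->comp = calloc(1, sizeof(*s->comp));
	if (!s->comp)
		return -1;
	if (fail_late)
		return -1;	/* error exit, component still registered */
	return 0;
}

/* Cleanup checks the pointer first and clears it after freeing. */
static void sd_cleanup_sketch(struct sd_state *s)
{
	if (!s->comp)
		return;
	free(s->comp);
	s->comp = NULL;
	s->frees++;
}

/* The init error flow releases whatever registration left behind. */
static int sd_init_sketch(struct sd_state *s, int fail_late)
{
	int err = sd_register_sketch(s, fail_late);

	if (err)
		sd_cleanup_sketch(s);
	assert(err == 0 || s->comp == NULL);
	return err;
}
```

In this model a late failure leaves no registered component behind, and a
further sd_cleanup_sketch() call afterwards does nothing.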
>
> Thanks,
> Gerd