* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
@ 2025-12-03 15:14 ` Moshe Shemesh
2025-12-04 9:48 ` Gerd Bayer
2025-12-03 21:10 ` Farhan Ali
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Moshe Shemesh @ 2025-12-03 15:14 UTC (permalink / raw)
To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
linux-rdma, linux-kernel, linux-s390, linux-pci
On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
>
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
> Fault in home space mode while using kernel ASCE.
> AS:00000000705c4007 R3:0000000000000024
> Oops: 0038 ilc:3 [#1]SMP
>
> CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
> 6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
>
> Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
> 0000000000000000 0000000000000000 0000000000000001 0000000000000000
> 0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
> 0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
> Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832
> 0000020fc86aa1d2: a7840211 brc 8,0000020fc86aa5f4
> *0000020fc86aa1d6: c09000df0b25 larl %r9,0000020fca28b820
> >0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2)
> 0000020fc86aa1e2: a7840209 brc 8,0000020fc86aa5f4
> 0000020fc86aa1e6: c0e001100401 larl %r14,0000020fca8aa9e8
> 0000020fc86aa1ec: c01000e25a00 larl %r1,0000020fca2f55ec
> 0000020fc86aa1f2: a7eb00e8 aghi %r14,232
>
> Call Trace:
> __lock_acquire+0x5c/0x15f0
> lock_acquire.part.0+0xf8/0x270
> lock_acquire+0xb0/0x1b0
> down_write+0x5a/0x250
> mlx5_detach_device+0x42/0x110 [mlx5_core]
> mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
> mlx5_unload_one+0x42/0x60 [mlx5_core]
> mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
> zpci_event_attempt_error_recovery+0xcc/0x388
>
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> Hi Shay et al,
>
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
>
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.
Hi Gerd,
I stepped on this bug recently too, without s390 and was about to
submit same fix :) So as you wrote it is unrelated to Lukas' patches and
this fix is correct.
>
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.
As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
IPsec cleanup over MPV device") and two other cases on shared clock and
SD but I don't see any flow the shared clock or SD can fail,
specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
function and nullify it right after sd_unregister() that free devcom.
Thanks,
Moshe.>
> Thanks,
> Gerd
>
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
> static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
> {
> mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> + dev->priv.hca_devcom_comp = NULL;
> }
>
> static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev)
>
> ---
> base-commit: 4a26e7032d7d57c998598c08a034872d6f0d3945
> change-id: 20251202-fix_lag-6a59b39a0b3c
>
> Best regards,
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-03 15:14 ` Moshe Shemesh
@ 2025-12-04 9:48 ` Gerd Bayer
2025-12-04 17:07 ` Moshe Shemesh
0 siblings, 1 reply; 9+ messages in thread
From: Gerd Bayer @ 2025-12-04 9:48 UTC (permalink / raw)
To: Moshe Shemesh, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
linux-rdma, linux-kernel, linux-s390, linux-pci
On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
>
> On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> >
[ ... snip ... ]
> >
> > Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> > Hi Shay et al,
> >
>
> Hi Gerd,
> I stepped on this bug recently too, without s390 and was about to
> submit same fix :) So as you wrote it is unrelated to Lukas' patches and
> this fix is correct.
Good to hear. I wonder if you could share how you got to run into this?
>
> >
> > I've spotted two additional places where the devcom reference is not
> > cleared after calling mlx5_devcom_unregister_component() in
> > drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> > addressed with a patch, since I'm unclear about how to test these
> > paths.
>
> As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
> IPsec cleanup over MPV device") and two other cases on shared clock and
> SD but I don't see any flow the shared clock or SD can fail,
> specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
> function and nullify it right after sd_unregister() that free devcom.
I didn't locate any calls to mxl5_devcom_unregister_component() in
"shared clock" - is that not yet upstream?
Regarding SD, I follow that sd_cleanup() is followed immediately after
sd_unregister() and does the clean-up. One path remains uncovered
though: The error exit at
https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
Not sure, how likely that is...
Thanks,
Gerd
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-04 9:48 ` Gerd Bayer
@ 2025-12-04 17:07 ` Moshe Shemesh
2025-12-05 8:23 ` Gerd Bayer
0 siblings, 1 reply; 9+ messages in thread
From: Moshe Shemesh @ 2025-12-04 17:07 UTC (permalink / raw)
To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
linux-rdma, linux-kernel, linux-s390, linux-pci
On 12/4/2025 11:48 AM, Gerd Bayer wrote:
>
> On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
>>
>> On 12/2/2025 1:12 PM, Gerd Bayer wrote:
>>>
>
> [ ... snip ... ]
>
>>>
>>> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
>>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
>>> Hi Shay et al,
>>>
>>
>> Hi Gerd,
>> I stepped on this bug recently too, without s390 and was about to
>> submit same fix :) So as you wrote it is unrelated to Lukas' patches and
>> this fix is correct.
>
> Good to hear. I wonder if you could share how you got to run into this?
>
mlx5_unload_one() can be called from few flows.
Even that it is always called with devlink lock, serial of
mlx5_unload_one() twice caused it. I got it on fw_reset and shutdown. I
I will submit also a patch for calling mlx5_drain_fw_reset() on shutdown
soon.
>>
>>>
>>> I've spotted two additional places where the devcom reference is not
>>> cleared after calling mlx5_devcom_unregister_component() in
>>> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
>>> addressed with a patch, since I'm unclear about how to test these
>>> paths.
>>
>> As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
>> IPsec cleanup over MPV device") and two other cases on shared clock and
>> SD but I don't see any flow the shared clock or SD can fail,
>> specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
>> function and nullify it right after sd_unregister() that free devcom.
>
> I didn't locate any calls to mxl5_devcom_unregister_component() in
> "shared clock" - is that not yet upstream?
mlx5_shared_clock_unregister() in
drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
>
> Regarding SD, I follow that sd_cleanup() is followed immediately after
> sd_unregister() and does the clean-up. One path remains uncovered
> though: The error exit at
> https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
>
> Not sure, how likely that is...
It comes on error flow but after successful
mlx5_devcom_register_component() in sd_register(), and that error leads
to error flow in mlx5_sd_init(), which calls sd_cleanup() too.
>
> Thanks,
> Gerd
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-04 17:07 ` Moshe Shemesh
@ 2025-12-05 8:23 ` Gerd Bayer
0 siblings, 0 replies; 9+ messages in thread
From: Gerd Bayer @ 2025-12-05 8:23 UTC (permalink / raw)
To: Moshe Shemesh, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
linux-rdma, linux-kernel, linux-s390, linux-pci
On Thu, 2025-12-04 at 19:07 +0200, Moshe Shemesh wrote:
>
> On 12/4/2025 11:48 AM, Gerd Bayer wrote:
> >
> > On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
> > >
> > > On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> > > >
> >
> > [ ... snip ... ]
> >
> > > >
> > > > Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> > > > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > >
> > > Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> > > > Hi Shay et al,
> > > >
> > >
> > > Hi Gerd,
> > > I stepped on this bug recently too, without s390 and was about to
> > > submit same fix :) So as you wrote it is unrelated to Lukas' patches and
> > > this fix is correct.
> >
> > Good to hear. I wonder if you could share how you got to run into this?
> >
>
> mlx5_unload_one() can be called from few flows.
> Even that it is always called with devlink lock, serial of
> mlx5_unload_one() twice caused it. I got it on fw_reset and shutdown. I
> I will submit also a patch for calling mlx5_drain_fw_reset() on shutdown
> soon.
I agree, serialization through the devlink lock does not help if
mlx5_unload_one() does not clean up all the references.
>
> > >
> > > >
> > > > I've spotted two additional places where the devcom reference is not
> > > > cleared after calling mlx5_devcom_unregister_component() in
> > > > drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> > > > addressed with a patch, since I'm unclear about how to test these
> > > > paths.
> > >
> > > As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
> > > IPsec cleanup over MPV device") and two other cases on shared clock and
> > > SD but I don't see any flow the shared clock or SD can fail,
> > > specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
> > > function and nullify it right after sd_unregister() that free devcom.
> >
> > I didn't locate any calls to mxl5_devcom_unregister_component() in
> > "shared clock" - is that not yet upstream?
>
> mlx5_shared_clock_unregister() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
Hah - my fault! I was searching through the indexer's parameterized
cross-references, and w/o CONFIG_PTP_1588_CLOCK that file was excluded.
>
> >
> > Regarding SD, I follow that sd_cleanup() is followed immediately after
> > sd_unregister() and does the clean-up. One path remains uncovered
> > though: The error exit at
> > https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
> >
> > Not sure, how likely that is...
>
> It comes on error flow but after successful
> mlx5_devcom_register_component() in sd_register(), and that error leads
> to error flow in mlx5_sd_init(), which calls sd_cleanup() too.
>
> >
> > Thanks,
> > Gerd
Thanks for you explanations,
Gerd
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
2025-12-03 15:14 ` Moshe Shemesh
@ 2025-12-03 21:10 ` Farhan Ali
2025-12-04 8:27 ` Tariq Toukan
2025-12-04 9:00 ` Tariq Toukan
2025-12-04 14:30 ` patchwork-bot+netdevbpf
3 siblings, 1 reply; 9+ messages in thread
From: Farhan Ali @ 2025-12-03 21:10 UTC (permalink / raw)
To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, netdev, linux-rdma,
linux-kernel, linux-s390, linux-pci
On 12/2/2025 3:12 AM, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
>
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
> Fault in home space mode while using kernel ASCE.
> AS:00000000705c4007 R3:0000000000000024
> Oops: 0038 ilc:3 [#1]SMP
>
> CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
> 6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
>
> Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
> 0000000000000000 0000000000000000 0000000000000001 0000000000000000
> 0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
> 0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
> Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832
> 0000020fc86aa1d2: a7840211 brc 8,0000020fc86aa5f4
> *0000020fc86aa1d6: c09000df0b25 larl %r9,0000020fca28b820
> >0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2)
> 0000020fc86aa1e2: a7840209 brc 8,0000020fc86aa5f4
> 0000020fc86aa1e6: c0e001100401 larl %r14,0000020fca8aa9e8
> 0000020fc86aa1ec: c01000e25a00 larl %r1,0000020fca2f55ec
> 0000020fc86aa1f2: a7eb00e8 aghi %r14,232
>
> Call Trace:
> __lock_acquire+0x5c/0x15f0
> lock_acquire.part.0+0xf8/0x270
> lock_acquire+0xb0/0x1b0
> down_write+0x5a/0x250
> mlx5_detach_device+0x42/0x110 [mlx5_core]
> mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
> mlx5_unload_one+0x42/0x60 [mlx5_core]
> mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
> zpci_event_attempt_error_recovery+0xcc/0x388
>
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi Shay et al,
>
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
>
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.
>
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.
>
> Thanks,
> Gerd
>
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
> static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
> {
> mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> + dev->priv.hca_devcom_comp = NULL;
> }
Though this fix looks correct to me in freeing hca_devcom_comp (not too
familiar with mlx5 internals), I wonder if it would be better to just
set devcom = NULL in devcom_free_comp_dev() after the kfree? This would
also take care of other places where devcom is not set to NULL?
Thanks
Farhan
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-03 21:10 ` Farhan Ali
@ 2025-12-04 8:27 ` Tariq Toukan
0 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-12-04 8:27 UTC (permalink / raw)
To: Farhan Ali, Gerd Bayer, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, Mark Bloch, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shay Drory,
Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, netdev, linux-rdma,
linux-kernel, linux-s390, linux-pci
On 03/12/2025 23:10, Farhan Ali wrote:
>
> On 12/2/2025 3:12 AM, Gerd Bayer wrote:
>> Clear hca_devcom_comp in device's private data after unregistering it in
>> LAG teardown. Otherwise a slightly lagging second pass through
>> mlx5_unload_one() might try to unregister it again and trip over
>> use-after-free.
>>
>> On s390 almost all PCI level recovery events trigger two passes through
>> mxl5_unload_one() - one through the poll_health() method and one through
>> mlx5_pci_err_detected() as callback from generic PCI error recovery.
>> While testing PCI error recovery paths with more kernel debug features
>> enabled, this issue reproducibly led to kernel panics with the following
>> call chain:
>>
>> Unable to handle kernel pointer dereference in virtual kernel
>> address space
>> Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
>> Fault in home space mode while using kernel ASCE.
>> AS:00000000705c4007 R3:0000000000000024
>> Oops: 0038 ilc:3 [#1]SMP
>>
>> CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
>> 6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1
>> PREEMPT
>>
>> Krnl PSW : 0404e00180000000 0000020fc86aa1dc
>> (__lock_acquire+0x5c/0x15f0)
>> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>> Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33
>> 0000000000000000
>> 0000000000000000 0000000000000000 0000000000000001
>> 0000000000000000
>> 0000000000000000 0000020fca28b820 0000000000000000
>> 0000010a1ced8100
>> 0000010a1ced8100 0000020fc9775068 0000018fce14f8b8
>> 0000018fce14f7f8
>> Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832
>> 0000020fc86aa1d2: a7840211 brc
>> 8,0000020fc86aa5f4
>> *0000020fc86aa1d6: c09000df0b25 larl
>> %r9,0000020fca28b820
>> >0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2)
>> 0000020fc86aa1e2: a7840209 brc
>> 8,0000020fc86aa5f4
>> 0000020fc86aa1e6: c0e001100401 larl
>> %r14,0000020fca8aa9e8
>> 0000020fc86aa1ec: c01000e25a00 larl
>> %r1,0000020fca2f55ec
>> 0000020fc86aa1f2: a7eb00e8 aghi %r14,232
>>
>> Call Trace:
>> __lock_acquire+0x5c/0x15f0
>> lock_acquire.part.0+0xf8/0x270
>> lock_acquire+0xb0/0x1b0
>> down_write+0x5a/0x250
>> mlx5_detach_device+0x42/0x110 [mlx5_core]
>> mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
>> mlx5_unload_one+0x42/0x60 [mlx5_core]
>> mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
>> zpci_event_attempt_error_recovery+0xcc/0x388
>>
>> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG
>> layer")
>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>> ---
>> Hi Shay et al,
>>
>> while checking for potential regressions by Lukas Wunner's recent work
>> on pci_save/restore_state() for the recoverability of mlx5 functions I
>> consistently hit this bug. (Bjorn has queued this up for 6.19, according
>> to [0] and [1])
>>
>> Apparently, the issue is unrelated to Lukas' work but can be reproduced
>> with master. It appears to be timing-sensitive, since it shows up only
>> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
>> timing can change for other reasons, too.
>>
>> I've spotted two additional places where the devcom reference is not
>> cleared after calling mlx5_devcom_unregister_component() in
>> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
>> addressed with a patch, since I'm unclear about how to test these
>> paths.
>>
>> Thanks,
>> Gerd
>>
>> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
>> [1] https://lore.kernel.org/linux-pci/
>> cover.1763483367.git.lukas@wunner.de/
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/
>> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> index
>> 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct
>> mlx5_core_dev *dev)
>> static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev
>> *dev)
>> {
>> mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
>> + dev->priv.hca_devcom_comp = NULL;
>> }
>
> Though this fix looks correct to me in freeing hca_devcom_comp (not too
> familiar with mlx5 internals), I wonder if it would be better to just
> set devcom = NULL in devcom_free_comp_dev() after the kfree? This would
> also take care of other places where devcom is not set to NULL?
>
Setting NULL after the kfree will have no impact, it won't nullify the
original field, but the function parameter copy (by-value).
devcom_free_comp_dev() and mlx5_devcom_unregister_component() get struct
mlx5_devcom_comp_dev *devcom to work with, they can't nullify it for the
caller context.
> Thanks
>
> Farhan
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
2025-12-03 15:14 ` Moshe Shemesh
2025-12-03 21:10 ` Farhan Ali
@ 2025-12-04 9:00 ` Tariq Toukan
2025-12-04 14:30 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-12-04 9:00 UTC (permalink / raw)
To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
linux-rdma, linux-kernel, linux-s390, linux-pci
On 02/12/2025 13:12, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
>
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
> Fault in home space mode while using kernel ASCE.
> AS:00000000705c4007 R3:0000000000000024
> Oops: 0038 ilc:3 [#1]SMP
>
> CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
> 6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
>
> Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
> 0000000000000000 0000000000000000 0000000000000001 0000000000000000
> 0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
> 0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
> Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832
> 0000020fc86aa1d2: a7840211 brc 8,0000020fc86aa5f4
> *0000020fc86aa1d6: c09000df0b25 larl %r9,0000020fca28b820
> >0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2)
> 0000020fc86aa1e2: a7840209 brc 8,0000020fc86aa5f4
> 0000020fc86aa1e6: c0e001100401 larl %r14,0000020fca8aa9e8
> 0000020fc86aa1ec: c01000e25a00 larl %r1,0000020fca2f55ec
> 0000020fc86aa1f2: a7eb00e8 aghi %r14,232
>
> Call Trace:
> __lock_acquire+0x5c/0x15f0
> lock_acquire.part.0+0xf8/0x270
> lock_acquire+0xb0/0x1b0
> down_write+0x5a/0x250
> mlx5_detach_device+0x42/0x110 [mlx5_core]
> mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
> mlx5_unload_one+0x42/0x60 [mlx5_core]
> mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
> zpci_event_attempt_error_recovery+0xcc/0x388
>
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi Shay et al,
>
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
>
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.
>
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.
>
> Thanks,
> Gerd
>
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
> static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
> {
> mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> + dev->priv.hca_devcom_comp = NULL;
> }
>
> static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev)
>
> ---
> base-commit: 4a26e7032d7d57c998598c08a034872d6f0d3945
> change-id: 20251202-fix_lag-6a59b39a0b3c
>
> Best regards,
Thanks for your patch.
Acked-by: Tariq Toukan <tariqt@nvidia.com>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
` (2 preceding siblings ...)
2025-12-04 9:00 ` Tariq Toukan
@ 2025-12-04 14:30 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-12-04 14:30 UTC (permalink / raw)
To: Gerd Bayer
Cc: saeedm, leon, tariqt, mbloch, andrew+netdev, davem, edumazet,
kuba, pabeni, shayd, horms, lukas, helgaas, schnelle, alifm,
netdev, linux-rdma, linux-kernel, linux-s390, linux-pci
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Tue, 02 Dec 2025 12:12:57 +0100 you wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
>
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
>
> [...]
Here is the summary with links:
- [net] net/mlx5: Fix double unregister of HCA_PORTS component
https://git.kernel.org/netdev/net/c/6a107cfe9c99
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread