[PATCH net] net/mlx5: Fix double unregister of HCA

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
@ 2025-12-02 11:12 Gerd Bayer
  2025-12-03 15:14 ` Moshe Shemesh
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Gerd Bayer @ 2025-12-02 11:12 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci, Gerd Bayer

Clear hca_devcom_comp in device's private data after unregistering it in
LAG teardown. Otherwise a slightly lagging second pass through
mlx5_unload_one() might try to unregister it again and trip over
use-after-free.

On s390 almost all PCI level recovery events trigger two passes through
mxl5_unload_one() - one through the poll_health() method and one through
mlx5_pci_err_detected() as callback from generic PCI error recovery.
While testing PCI error recovery paths with more kernel debug features
enabled, this issue reproducibly led to kernel panics with the following
call chain:

 Unable to handle kernel pointer dereference in virtual kernel address space
 Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
 Fault in home space mode while using kernel ASCE.
 AS:00000000705c4007 R3:0000000000000024
 Oops: 0038 ilc:3 [#1]SMP

 CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
      6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT

 Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
 Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
            0000000000000000 0000000000000000 0000000000000001 0000000000000000
            0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
            0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
 Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
            0000020fc86aa1d2: a7840211           brc     8,0000020fc86aa5f4
           *0000020fc86aa1d6: c09000df0b25       larl    %r9,0000020fca28b820
           >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
            0000020fc86aa1e2: a7840209           brc     8,0000020fc86aa5f4
            0000020fc86aa1e6: c0e001100401       larl    %r14,0000020fca8aa9e8
            0000020fc86aa1ec: c01000e25a00       larl    %r1,0000020fca2f55ec
            0000020fc86aa1f2: a7eb00e8           aghi    %r14,232

 Call Trace:
  __lock_acquire+0x5c/0x15f0
  lock_acquire.part.0+0xf8/0x270
  lock_acquire+0xb0/0x1b0
  down_write+0x5a/0x250
  mlx5_detach_device+0x42/0x110 [mlx5_core]
  mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
  mlx5_unload_one+0x42/0x60 [mlx5_core]
  mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
  zpci_event_attempt_error_recovery+0xcc/0x388

Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
Hi Shay et al,

while checking for potential regressions by Lukas Wunner's recent work
on pci_save/restore_state() for the recoverability of mlx5 functions I
consistently hit this bug. (Bjorn has queued this up for 6.19, according
to [0] and [1]) 

Apparently, the issue is unrelated to Lukas' work but can be reproduced
with master. It appears to be timing-sensitive, since it shows up only
when I use s390's debug_defconfig, but I think needs fixing anyhow, as
timing can change for other reasons, too.

I've spotted two additional places where the devcom reference is not
cleared after calling mlx5_devcom_unregister_component() in
drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
addressed with a patch, since I'm unclear about how to test these
paths.

Thanks,
Gerd

[0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
[1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
 static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
 {
 	mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
+	dev->priv.hca_devcom_comp = NULL;
 }

 static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev)

---
base-commit: 4a26e7032d7d57c998598c08a034872d6f0d3945
change-id: 20251202-fix_lag-6a59b39a0b3c

Best regards,
-- 
Gerd Bayer <gbayer@linux.ibm.com>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
@ 2025-12-03 15:14 ` Moshe Shemesh
  2025-12-04  9:48   ` Gerd Bayer
  2025-12-03 21:10 ` Farhan Ali
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Moshe Shemesh @ 2025-12-03 15:14 UTC (permalink / raw)
  To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci



On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
> 
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
> 
>   Unable to handle kernel pointer dereference in virtual kernel address space
>   Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
>   Fault in home space mode while using kernel ASCE.
>   AS:00000000705c4007 R3:0000000000000024
>   Oops: 0038 ilc:3 [#1]SMP
> 
>   CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
>        6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
> 
>   Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
>              R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>   Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
>              0000000000000000 0000000000000000 0000000000000001 0000000000000000
>              0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
>              0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
>   Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
>              0000020fc86aa1d2: a7840211           brc     8,0000020fc86aa5f4
>             *0000020fc86aa1d6: c09000df0b25       larl    %r9,0000020fca28b820
>             >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
>              0000020fc86aa1e2: a7840209           brc     8,0000020fc86aa5f4
>              0000020fc86aa1e6: c0e001100401       larl    %r14,0000020fca8aa9e8
>              0000020fc86aa1ec: c01000e25a00       larl    %r1,0000020fca2f55ec
>              0000020fc86aa1f2: a7eb00e8           aghi    %r14,232
> 
>   Call Trace:
>    __lock_acquire+0x5c/0x15f0
>    lock_acquire.part.0+0xf8/0x270
>    lock_acquire+0xb0/0x1b0
>    down_write+0x5a/0x250
>    mlx5_detach_device+0x42/0x110 [mlx5_core]
>    mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
>    mlx5_unload_one+0x42/0x60 [mlx5_core]
>    mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
>    zpci_event_attempt_error_recovery+0xcc/0x388
> 
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> Hi Shay et al,
> 
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
> 
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.

Hi Gerd,
  I stepped on this bug recently too, without s390 and was about to 
submit same fix :) So as you wrote it is unrelated to Lukas' patches and 
this fix is correct.

> 
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.

As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix 
IPsec cleanup over MPV device") and two other cases on shared clock and 
SD but I don't see any flow the shared clock or SD can fail, 
specifically mlx5_sd_cleanup() checks sd pointer at beginning of the 
function and nullify it right after sd_unregister() that free devcom.

Thanks,
Moshe.>
> Thanks,
> Gerd
> 
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
>   static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
>   {
>   	mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> +	dev->priv.hca_devcom_comp = NULL;
>   }
>   
>   static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev)
> 
> ---
> base-commit: 4a26e7032d7d57c998598c08a034872d6f0d3945
> change-id: 20251202-fix_lag-6a59b39a0b3c
> 
> Best regards,


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
  2025-12-03 15:14 ` Moshe Shemesh
@ 2025-12-03 21:10 ` Farhan Ali
  2025-12-04  8:27   ` Tariq Toukan
  2025-12-04  9:00 ` Tariq Toukan
  2025-12-04 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 9+ messages in thread
From: Farhan Ali @ 2025-12-03 21:10 UTC (permalink / raw)
  To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, netdev, linux-rdma,
	linux-kernel, linux-s390, linux-pci


On 12/2/2025 3:12 AM, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
>
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
>
>   Unable to handle kernel pointer dereference in virtual kernel address space
>   Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
>   Fault in home space mode while using kernel ASCE.
>   AS:00000000705c4007 R3:0000000000000024
>   Oops: 0038 ilc:3 [#1]SMP
>
>   CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
>        6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
>
>   Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
>              R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>   Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
>              0000000000000000 0000000000000000 0000000000000001 0000000000000000
>              0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
>              0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
>   Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
>              0000020fc86aa1d2: a7840211           brc     8,0000020fc86aa5f4
>             *0000020fc86aa1d6: c09000df0b25       larl    %r9,0000020fca28b820
>             >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
>              0000020fc86aa1e2: a7840209           brc     8,0000020fc86aa5f4
>              0000020fc86aa1e6: c0e001100401       larl    %r14,0000020fca8aa9e8
>              0000020fc86aa1ec: c01000e25a00       larl    %r1,0000020fca2f55ec
>              0000020fc86aa1f2: a7eb00e8           aghi    %r14,232
>
>   Call Trace:
>    __lock_acquire+0x5c/0x15f0
>    lock_acquire.part.0+0xf8/0x270
>    lock_acquire+0xb0/0x1b0
>    down_write+0x5a/0x250
>    mlx5_detach_device+0x42/0x110 [mlx5_core]
>    mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
>    mlx5_unload_one+0x42/0x60 [mlx5_core]
>    mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
>    zpci_event_attempt_error_recovery+0xcc/0x388
>
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi Shay et al,
>
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
>
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.
>
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.
>
> Thanks,
> Gerd
>
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
>   static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
>   {
>   	mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> +	dev->priv.hca_devcom_comp = NULL;
>   }

Though this fix looks correct to me in freeing hca_devcom_comp (not too 
familiar with mlx5 internals), I wonder if it would be better to just 
set devcom = NULL in devcom_free_comp_dev() after the kfree? This would 
also take care of other places where devcom is not set to NULL?

Thanks

Farhan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-03 21:10 ` Farhan Ali
@ 2025-12-04  8:27   ` Tariq Toukan
  0 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-12-04  8:27 UTC (permalink / raw)
  To: Farhan Ali, Gerd Bayer, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Mark Bloch, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shay Drory,
	Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, netdev, linux-rdma,
	linux-kernel, linux-s390, linux-pci



On 03/12/2025 23:10, Farhan Ali wrote:
> 
> On 12/2/2025 3:12 AM, Gerd Bayer wrote:
>> Clear hca_devcom_comp in device's private data after unregistering it in
>> LAG teardown. Otherwise a slightly lagging second pass through
>> mlx5_unload_one() might try to unregister it again and trip over
>> use-after-free.
>>
>> On s390 almost all PCI level recovery events trigger two passes through
>> mxl5_unload_one() - one through the poll_health() method and one through
>> mlx5_pci_err_detected() as callback from generic PCI error recovery.
>> While testing PCI error recovery paths with more kernel debug features
>> enabled, this issue reproducibly led to kernel panics with the following
>> call chain:
>>
>>   Unable to handle kernel pointer dereference in virtual kernel 
>> address space
>>   Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
>>   Fault in home space mode while using kernel ASCE.
>>   AS:00000000705c4007 R3:0000000000000024
>>   Oops: 0038 ilc:3 [#1]SMP
>>
>>   CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
>>        6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 
>> PREEMPT
>>
>>   Krnl PSW : 0404e00180000000 0000020fc86aa1dc 
>> (__lock_acquire+0x5c/0x15f0)
>>              R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>>   Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 
>> 0000000000000000
>>              0000000000000000 0000000000000000 0000000000000001 
>> 0000000000000000
>>              0000000000000000 0000020fca28b820 0000000000000000 
>> 0000010a1ced8100
>>              0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 
>> 0000018fce14f7f8
>>   Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
>>              0000020fc86aa1d2: a7840211           brc     
>> 8,0000020fc86aa5f4
>>             *0000020fc86aa1d6: c09000df0b25       larl    
>> %r9,0000020fca28b820
>>             >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
>>              0000020fc86aa1e2: a7840209           brc     
>> 8,0000020fc86aa5f4
>>              0000020fc86aa1e6: c0e001100401       larl    
>> %r14,0000020fca8aa9e8
>>              0000020fc86aa1ec: c01000e25a00       larl    
>> %r1,0000020fca2f55ec
>>              0000020fc86aa1f2: a7eb00e8           aghi    %r14,232
>>
>>   Call Trace:
>>    __lock_acquire+0x5c/0x15f0
>>    lock_acquire.part.0+0xf8/0x270
>>    lock_acquire+0xb0/0x1b0
>>    down_write+0x5a/0x250
>>    mlx5_detach_device+0x42/0x110 [mlx5_core]
>>    mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
>>    mlx5_unload_one+0x42/0x60 [mlx5_core]
>>    mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
>>    zpci_event_attempt_error_recovery+0xcc/0x388
>>
>> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG 
>> layer")
>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>> ---
>> Hi Shay et al,
>>
>> while checking for potential regressions by Lukas Wunner's recent work
>> on pci_save/restore_state() for the recoverability of mlx5 functions I
>> consistently hit this bug. (Bjorn has queued this up for 6.19, according
>> to [0] and [1])
>>
>> Apparently, the issue is unrelated to Lukas' work but can be reproduced
>> with master. It appears to be timing-sensitive, since it shows up only
>> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
>> timing can change for other reasons, too.
>>
>> I've spotted two additional places where the devcom reference is not
>> cleared after calling mlx5_devcom_unregister_component() in
>> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
>> addressed with a patch, since I'm unclear about how to test these
>> paths.
>>
>> Thanks,
>> Gerd
>>
>> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
>> [1] https://lore.kernel.org/linux-pci/ 
>> cover.1763483367.git.lukas@wunner.de/
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/ 
>> drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> index 
>> 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
>> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct 
>> mlx5_core_dev *dev)
>>   static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev 
>> *dev)
>>   {
>>       mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
>> +    dev->priv.hca_devcom_comp = NULL;
>>   }
> 
> Though this fix looks correct to me in freeing hca_devcom_comp (not too 
> familiar with mlx5 internals), I wonder if it would be better to just 
> set devcom = NULL in devcom_free_comp_dev() after the kfree? This would 
> also take care of other places where devcom is not set to NULL?
> 

Setting NULL after the kfree will have no impact, it won't nullify the 
original field, but the function parameter copy (by-value).

devcom_free_comp_dev() and mlx5_devcom_unregister_component() get struct 
mlx5_devcom_comp_dev *devcom to work with, they can't nullify it for the 
caller context.



> Thanks
> 
> Farhan
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
  2025-12-03 15:14 ` Moshe Shemesh
  2025-12-03 21:10 ` Farhan Ali
@ 2025-12-04  9:00 ` Tariq Toukan
  2025-12-04 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-12-04  9:00 UTC (permalink / raw)
  To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci



On 02/12/2025 13:12, Gerd Bayer wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
> 
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
> 
>   Unable to handle kernel pointer dereference in virtual kernel address space
>   Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
>   Fault in home space mode while using kernel ASCE.
>   AS:00000000705c4007 R3:0000000000000024
>   Oops: 0038 ilc:3 [#1]SMP
> 
>   CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
>        6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
> 
>   Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
>              R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>   Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
>              0000000000000000 0000000000000000 0000000000000001 0000000000000000
>              0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
>              0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
>   Krnl Code: 0000020fc86aa1cc: e3b003400004        lg      %r11,832
>              0000020fc86aa1d2: a7840211           brc     8,0000020fc86aa5f4
>             *0000020fc86aa1d6: c09000df0b25       larl    %r9,0000020fca28b820
>             >0000020fc86aa1dc: d50790002000       clc     0(8,%r9),0(%r2)
>              0000020fc86aa1e2: a7840209           brc     8,0000020fc86aa5f4
>              0000020fc86aa1e6: c0e001100401       larl    %r14,0000020fca8aa9e8
>              0000020fc86aa1ec: c01000e25a00       larl    %r1,0000020fca2f55ec
>              0000020fc86aa1f2: a7eb00e8           aghi    %r14,232
> 
>   Call Trace:
>    __lock_acquire+0x5c/0x15f0
>    lock_acquire.part.0+0xf8/0x270
>    lock_acquire+0xb0/0x1b0
>    down_write+0x5a/0x250
>    mlx5_detach_device+0x42/0x110 [mlx5_core]
>    mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
>    mlx5_unload_one+0x42/0x60 [mlx5_core]
>    mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
>    zpci_event_attempt_error_recovery+0xcc/0x388
> 
> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi Shay et al,
> 
> while checking for potential regressions by Lukas Wunner's recent work
> on pci_save/restore_state() for the recoverability of mlx5 functions I
> consistently hit this bug. (Bjorn has queued this up for 6.19, according
> to [0] and [1])
> 
> Apparently, the issue is unrelated to Lukas' work but can be reproduced
> with master. It appears to be timing-sensitive, since it shows up only
> when I use s390's debug_defconfig, but I think needs fixing anyhow, as
> timing can change for other reasons, too.
> 
> I've spotted two additional places where the devcom reference is not
> cleared after calling mlx5_devcom_unregister_component() in
> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> addressed with a patch, since I'm unclear about how to test these
> paths.
> 
> Thanks,
> Gerd
> 
> [0] https://lore.kernel.org/all/cover.1760274044.git.lukas@wunner.de/
> [1] https://lore.kernel.org/linux-pci/cover.1763483367.git.lukas@wunner.de/
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
>   static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
>   {
>   	mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
> +	dev->priv.hca_devcom_comp = NULL;
>   }
>   
>   static int mlx5_lag_register_hca_devcom_comp(struct mlx5_core_dev *dev)
> 
> ---
> base-commit: 4a26e7032d7d57c998598c08a034872d6f0d3945
> change-id: 20251202-fix_lag-6a59b39a0b3c
> 
> Best regards,

Thanks for your patch.
Acked-by: Tariq Toukan <tariqt@nvidia.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-03 15:14 ` Moshe Shemesh
@ 2025-12-04  9:48   ` Gerd Bayer
  2025-12-04 17:07     ` Moshe Shemesh
  0 siblings, 1 reply; 9+ messages in thread
From: Gerd Bayer @ 2025-12-04  9:48 UTC (permalink / raw)
  To: Moshe Shemesh, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci

On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
> 
> On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> > 

  [ ... snip ... ]

> > 
> > Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> 
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> > Hi Shay et al,
> > 
> 
> Hi Gerd,
>   I stepped on this bug recently too, without s390 and was about to 
> submit same fix :) So as you wrote it is unrelated to Lukas' patches and 
> this fix is correct.

Good to hear. I wonder if you could share how you got to run into this?

> 
> > 
> > I've spotted two additional places where the devcom reference is not
> > cleared after calling mlx5_devcom_unregister_component() in
> > drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> > addressed with a patch, since I'm unclear about how to test these
> > paths.
> 
> As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix 
> IPsec cleanup over MPV device") and two other cases on shared clock and 
> SD but I don't see any flow the shared clock or SD can fail, 
> specifically mlx5_sd_cleanup() checks sd pointer at beginning of the 
> function and nullify it right after sd_unregister() that free devcom.

I didn't locate any calls to mxl5_devcom_unregister_component() in
"shared clock" - is that not yet upstream?

Regarding SD, I follow that sd_cleanup() is followed immediately after
sd_unregister() and does the clean-up. One path remains uncovered
though: The error exit at
https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265

Not sure, how likely that is...

Thanks,
Gerd

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
                   ` (2 preceding siblings ...)
  2025-12-04  9:00 ` Tariq Toukan
@ 2025-12-04 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-12-04 14:30 UTC (permalink / raw)
  To: Gerd Bayer
  Cc: saeedm, leon, tariqt, mbloch, andrew+netdev, davem, edumazet,
	kuba, pabeni, shayd, horms, lukas, helgaas, schnelle, alifm,
	netdev, linux-rdma, linux-kernel, linux-s390, linux-pci

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 02 Dec 2025 12:12:57 +0100 you wrote:
> Clear hca_devcom_comp in device's private data after unregistering it in
> LAG teardown. Otherwise a slightly lagging second pass through
> mlx5_unload_one() might try to unregister it again and trip over
> use-after-free.
> 
> On s390 almost all PCI level recovery events trigger two passes through
> mxl5_unload_one() - one through the poll_health() method and one through
> mlx5_pci_err_detected() as callback from generic PCI error recovery.
> While testing PCI error recovery paths with more kernel debug features
> enabled, this issue reproducibly led to kernel panics with the following
> call chain:
> 
> [...]

Here is the summary with links:
  - [net] net/mlx5: Fix double unregister of HCA_PORTS component
    https://git.kernel.org/netdev/net/c/6a107cfe9c99

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-04  9:48   ` Gerd Bayer
@ 2025-12-04 17:07     ` Moshe Shemesh
  2025-12-05  8:23       ` Gerd Bayer
  0 siblings, 1 reply; 9+ messages in thread
From: Moshe Shemesh @ 2025-12-04 17:07 UTC (permalink / raw)
  To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci



On 12/4/2025 11:48 AM, Gerd Bayer wrote:
> 
> On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
>>
>> On 12/2/2025 1:12 PM, Gerd Bayer wrote:
>>>
> 
>    [ ... snip ... ]
> 
>>>
>>> Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
>>> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
>>> Hi Shay et al,
>>>
>>
>> Hi Gerd,
>>    I stepped on this bug recently too, without s390 and was about to
>> submit same fix :) So as you wrote it is unrelated to Lukas' patches and
>> this fix is correct.
> 
> Good to hear. I wonder if you could share how you got to run into this?
> 

mlx5_unload_one() can be called from few flows.
Even that it is always called with devlink lock, serial of 
mlx5_unload_one() twice caused it. I got it on fw_reset and shutdown. I 
I will submit also a patch for calling mlx5_drain_fw_reset() on shutdown 
soon.

>>
>>>
>>> I've spotted two additional places where the devcom reference is not
>>> cleared after calling mlx5_devcom_unregister_component() in
>>> drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
>>> addressed with a patch, since I'm unclear about how to test these
>>> paths.
>>
>> As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
>> IPsec cleanup over MPV device") and two other cases on shared clock and
>> SD but I don't see any flow the shared clock or SD can fail,
>> specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
>> function and nullify it right after sd_unregister() that free devcom.
> 
> I didn't locate any calls to mxl5_devcom_unregister_component() in
> "shared clock" - is that not yet upstream?

mlx5_shared_clock_unregister() in 
drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c

> 
> Regarding SD, I follow that sd_cleanup() is followed immediately after
> sd_unregister() and does the clean-up. One path remains uncovered
> though: The error exit at
> https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
> 
> Not sure, how likely that is...

It comes on error flow but after successful 
mlx5_devcom_register_component() in sd_register(), and that error leads 
to error flow in mlx5_sd_init(), which calls sd_cleanup() too.

> 
> Thanks,
> Gerd


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
  2025-12-04 17:07     ` Moshe Shemesh
@ 2025-12-05  8:23       ` Gerd Bayer
  0 siblings, 0 replies; 9+ messages in thread
From: Gerd Bayer @ 2025-12-05  8:23 UTC (permalink / raw)
  To: Moshe Shemesh, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shay Drory, Simon Horman
  Cc: Lukas Wunner, Bjorn Helgaas, Niklas Schnelle, Farhan Ali, netdev,
	linux-rdma, linux-kernel, linux-s390, linux-pci

On Thu, 2025-12-04 at 19:07 +0200, Moshe Shemesh wrote:
> 
> On 12/4/2025 11:48 AM, Gerd Bayer wrote:
> > 
> > On Wed, 2025-12-03 at 17:14 +0200, Moshe Shemesh wrote:
> > > 
> > > On 12/2/2025 1:12 PM, Gerd Bayer wrote:
> > > > 
> > 
> >    [ ... snip ... ]
> > 
> > > > 
> > > > Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
> > > > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > > 
> > > Reviewed-by: Moshe Shemesh <moshe@nvidia.com>> ---
> > > > Hi Shay et al,
> > > > 
> > > 
> > > Hi Gerd,
> > >    I stepped on this bug recently too, without s390 and was about to
> > > submit same fix :) So as you wrote it is unrelated to Lukas' patches and
> > > this fix is correct.
> > 
> > Good to hear. I wonder if you could share how you got to run into this?
> > 
> 
> mlx5_unload_one() can be called from few flows.
> Even that it is always called with devlink lock, serial of 
> mlx5_unload_one() twice caused it. I got it on fw_reset and shutdown. I 
> I will submit also a patch for calling mlx5_drain_fw_reset() on shutdown 
> soon.

I agree, serialization through the devlink lock does not help if
mlx5_unload_one() does not clean up all the references.

> 
> > > 
> > > > 
> > > > I've spotted two additional places where the devcom reference is not
> > > > cleared after calling mlx5_devcom_unregister_component() in
> > > > drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
> > > > addressed with a patch, since I'm unclear about how to test these
> > > > paths.
> > > 
> > > As for the other cases, we had the patch 664f76be38a1 ("net/mlx5: Fix
> > > IPsec cleanup over MPV device") and two other cases on shared clock and
> > > SD but I don't see any flow the shared clock or SD can fail,
> > > specifically mlx5_sd_cleanup() checks sd pointer at beginning of the
> > > function and nullify it right after sd_unregister() that free devcom.
> > 
> > I didn't locate any calls to mxl5_devcom_unregister_component() in
> > "shared clock" - is that not yet upstream?
> 
> mlx5_shared_clock_unregister() in 
> drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c

Hah - my fault! I was searching through the indexer's parameterized
cross-references, and w/o CONFIG_PTP_1588_CLOCK that file was excluded.

> 
> > 
> > Regarding SD, I follow that sd_cleanup() is followed immediately after
> > sd_unregister() and does the clean-up. One path remains uncovered
> > though: The error exit at
> > https://elixir.bootlin.com/linux/v6.18/source/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c#L265
> > 
> > Not sure, how likely that is...
> 
> It comes on error flow but after successful 
> mlx5_devcom_register_component() in sd_register(), and that error leads 
> to error flow in mlx5_sd_init(), which calls sd_cleanup() too.
> 
> > 
> > Thanks,
> > Gerd

Thanks for you explanations,
Gerd

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-12-05  8:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-02 11:12 [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component Gerd Bayer
2025-12-03 15:14 ` Moshe Shemesh
2025-12-04  9:48   ` Gerd Bayer
2025-12-04 17:07     ` Moshe Shemesh
2025-12-05  8:23       ` Gerd Bayer
2025-12-03 21:10 ` Farhan Ali
2025-12-04  8:27   ` Tariq Toukan
2025-12-04  9:00 ` Tariq Toukan
2025-12-04 14:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).