* [PATCH] PCI/DPC: Extend DPC recovery timeout
@ 2025-07-07 10:30 Andy Xu
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
0 siblings, 1 reply; 8+ messages in thread
From: Andy Xu @ 2025-07-07 10:30 UTC (permalink / raw)
To: bhelgaas, lukas
Cc: mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du,
Hongbo Yao
From: Hongbo Yao <andy.xu@hj-micro.com>
Extend the DPC recovery timeout from 4 seconds to 7 seconds to
support Mellanox ConnectX series network adapters.
My environment:
- Platform: arm64 N2 based server
- Endpoint1: Mellanox Technologies MT27800 Family [ConnectX-5]
- Endpoint2: Mellanox Technologies MT2910 Family [ConnectX-7]
With the original 4s timeout, hotplug would still be triggered:
[ 81.012463] pcieport 0004:00:00.0: DPC: containment event, status:0x1f01 source:0x0000
[ 81.014536] pcieport 0004:00:00.0: DPC: unmasked uncorrectable error detected
[ 81.029598] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 81.040830] pcieport 0004:00:00.0: device [0823:0110] error status/mask=00008000/04d40000
[ 81.049870] pcieport 0004:00:00.0: [ 0] ERCR (First)
[ 81.053520] pcieport 0004:00:00.0: AER: TLP Header: 60008010 010000ff 00001000 9c4c0000
[ 81.065793] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device state = 1 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
[ 81.076183] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:231:(pid 1618): start
[ 81.083307] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:252:(pid 1618): PCI channel offline, stop waiting for NIC IFC
[ 81.077428] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 81.486693] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
[ 81.496965] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
[ 82.395040] mlx5_core 0004:01:00.1: print_health:819:(pid 0): Fatal error detected
[ 82.395493] mlx5_core 0004:01:00.1: print_health_info:423:(pid 0): PCI slot 1 is unavailable
[ 83.431094] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device state = 2 pci_status: 0. Exit, result = 3, need reset
[ 83.442100] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device state = 2 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
[ 83.441801] mlx5_core 0004:01:00.0: mlx5_crdump_collect:50:(pid 2239): crdump: failed to lock gw status -13
[ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid 1618): start
[ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid 1618): PCI channel offline, stop waiting for NIC IFC
[ 83.849429] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 83.858892] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
[ 83.869464] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
[ 85.201433] pcieport 0004:00:00.0: pciehp: Slot(41): Link Down
[ 85.815016] mlx5_core 0004:01:00.1: mlx5_health_try_recover:335:(pid 2239): handling bad device here
[ 85.824164] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid 2239): start
[ 85.831283] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid 2239): PCI channel offline, stop waiting for NIC IFC
[ 85.841899] mlx5_core 0004:01:00.1: mlx5_unload_one_dev_locked:1612:(pid 2239): mlx5_unload_one_dev_locked: interface is down, NOP
[ 85.853799] mlx5_core 0004:01:00.1: mlx5_health_wait_pci_up:325:(pid 2239): PCI channel offline, stop waiting for PCI
[ 85.863494] mlx5_core 0004:01:00.1: mlx5_health_try_recover:338:(pid 2239): health recovery flow aborted, PCI reads still not working
[ 85.873231] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device state = 2 pci_status: 0. Exit, result = 3, need reset
[ 85.879899] mlx5_core 0004:01:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 85.921428] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 85.930491] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
[ 85.940849] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
[ 85.949971] mlx5_core 0004:01:00.1: mlx5_uninit_one:1528:(pid 1617): mlx5_uninit_one: interface is down, NOP
[ 85.959944] mlx5_core 0004:01:00.1: E-Switch: cleanup
[ 86.035541] mlx5_core 0004:01:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 86.077568] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
[ 86.071727] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
[ 86.096577] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
[ 86.106909] mlx5_core 0004:01:00.0: mlx5_uninit_one:1528:(pid 1617): mlx5_uninit_one: interface is down, NOP
[ 86.115940] pcieport 0004:00:00.0: AER: subordinate device reset failed
[ 86.122557] pcieport 0004:00:00.0: AER: device recovery failed
[ 86.128571] mlx5_core 0004:01:00.0: E-Switch: cleanup
I added some prints and found that:
- ConnectX-5 requires >5s for full recovery
- ConnectX-7 requires >6s for full recovery
Setting timeout to 7s covers both devices with safety margin.
Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com>
---
drivers/pci/pcie/dpc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..35a37fd86dcd 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -118,10 +118,10 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
/*
* Need a timeout in case DPC never completes due to failure of
* dpc_wait_rp_inactive(). The spec doesn't mandate a time limit,
- * but reports indicate that DPC completes within 4 seconds.
+ * but reports indicate that DPC completes within 7 seconds.
*/
wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
- msecs_to_jiffies(4000));
+ msecs_to_jiffies(7000));
return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-07-07 10:30 [PATCH] PCI/DPC: Extend DPC recovery timeout Andy Xu
@ 2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
2025-07-11 3:20 ` Hongbo Yao
0 siblings, 1 reply; 8+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-07-07 17:04 UTC (permalink / raw)
To: Andy Xu, bhelgaas, lukas
Cc: mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du
On 7/7/25 3:30 AM, Andy Xu wrote:
> From: Hongbo Yao <andy.xu@hj-micro.com>
>
> Extend the DPC recovery timeout from 4 seconds to 7 seconds to
> support Mellanox ConnectX series network adapters.
>
> My environment:
> - Platform: arm64 N2 based server
> - Endpoint1: Mellanox Technologies MT27800 Family [ConnectX-5]
> - Endpoint2: Mellanox Technologies MT2910 Family [ConnectX-7]
>
> With the original 4s timeout, hotplug would still be triggered:
>
> [ 81.012463] pcieport 0004:00:00.0: DPC: containment event, status:0x1f01 source:0x0000
> [ 81.014536] pcieport 0004:00:00.0: DPC: unmasked uncorrectable error detected
> [ 81.029598] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> [ 81.040830] pcieport 0004:00:00.0: device [0823:0110] error status/mask=00008000/04d40000
> [ 81.049870] pcieport 0004:00:00.0: [ 0] ERCR (First)
> [ 81.053520] pcieport 0004:00:00.0: AER: TLP Header: 60008010 010000ff 00001000 9c4c0000
> [ 81.065793] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device state = 1 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
> [ 81.076183] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:231:(pid 1618): start
> [ 81.083307] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:252:(pid 1618): PCI channel offline, stop waiting for NIC IFC
> [ 81.077428] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 81.486693] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
> [ 81.496965] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
> [ 82.395040] mlx5_core 0004:01:00.1: print_health:819:(pid 0): Fatal error detected
> [ 82.395493] mlx5_core 0004:01:00.1: print_health_info:423:(pid 0): PCI slot 1 is unavailable
> [ 83.431094] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device state = 2 pci_status: 0. Exit, result = 3, need reset
> [ 83.442100] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device state = 2 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
> [ 83.441801] mlx5_core 0004:01:00.0: mlx5_crdump_collect:50:(pid 2239): crdump: failed to lock gw status -13
> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid 1618): start
> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid 1618): PCI channel offline, stop waiting for NIC IFC
> [ 83.849429] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 83.858892] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
> [ 83.869464] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1618): Skipping wait for vf pages stage
> [ 85.201433] pcieport 0004:00:00.0: pciehp: Slot(41): Link Down
> [ 85.815016] mlx5_core 0004:01:00.1: mlx5_health_try_recover:335:(pid 2239): handling bad device here
> [ 85.824164] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid 2239): start
> [ 85.831283] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid 2239): PCI channel offline, stop waiting for NIC IFC
> [ 85.841899] mlx5_core 0004:01:00.1: mlx5_unload_one_dev_locked:1612:(pid 2239): mlx5_unload_one_dev_locked: interface is down, NOP
> [ 85.853799] mlx5_core 0004:01:00.1: mlx5_health_wait_pci_up:325:(pid 2239): PCI channel offline, stop waiting for PCI
> [ 85.863494] mlx5_core 0004:01:00.1: mlx5_health_try_recover:338:(pid 2239): health recovery flow aborted, PCI reads still not working
> [ 85.873231] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device state = 2 pci_status: 0. Exit, result = 3, need reset
> [ 85.879899] mlx5_core 0004:01:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 85.921428] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 85.930491] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
> [ 85.940849] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
> [ 85.949971] mlx5_core 0004:01:00.1: mlx5_uninit_one:1528:(pid 1617): mlx5_uninit_one: interface is down, NOP
> [ 85.959944] mlx5_core 0004:01:00.1: E-Switch: cleanup
> [ 86.035541] mlx5_core 0004:01:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 86.077568] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
> [ 86.071727] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
> [ 86.096577] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid 1617): Skipping wait for vf pages stage
> [ 86.106909] mlx5_core 0004:01:00.0: mlx5_uninit_one:1528:(pid 1617): mlx5_uninit_one: interface is down, NOP
> [ 86.115940] pcieport 0004:00:00.0: AER: subordinate device reset failed
> [ 86.122557] pcieport 0004:00:00.0: AER: device recovery failed
> [ 86.128571] mlx5_core 0004:01:00.0: E-Switch: cleanup
>
> I added some prints and found that:
> - ConnectX-5 requires >5s for full recovery
> - ConnectX-7 requires >6s for full recovery
>
> Setting timeout to 7s covers both devices with safety margin.
Instead of updating the recovery time, can you check why your device recovery takes
such a long time and how to fix it from the device end?
> Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com>
> ---
> drivers/pci/pcie/dpc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index fc18349614d7..35a37fd86dcd 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -118,10 +118,10 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
> /*
> * Need a timeout in case DPC never completes due to failure of
> * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit,
> - * but reports indicate that DPC completes within 4 seconds.
> + * but reports indicate that DPC completes within 7 seconds.
> */
> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
> - msecs_to_jiffies(4000));
> + msecs_to_jiffies(7000));
>
> return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> }
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
@ 2025-07-11 3:20 ` Hongbo Yao
2025-07-11 4:13 ` Lukas Wunner
2025-08-07 2:00 ` Ethan Zhao
0 siblings, 2 replies; 8+ messages in thread
From: Hongbo Yao @ 2025-07-11 3:20 UTC (permalink / raw)
To: Sathyanarayanan Kuppuswamy, bhelgaas, lukas
Cc: mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du
On 2025/7/8 1:04, Sathyanarayanan Kuppuswamy wrote:
>
> On 7/7/25 3:30 AM, Andy Xu wrote:
>> From: Hongbo Yao <andy.xu@hj-micro.com>
>>
>> Extend the DPC recovery timeout from 4 seconds to 7 seconds to
>> support Mellanox ConnectX series network adapters.
>>
>> My environment:
>> - Platform: arm64 N2 based server
>> - Endpoint1: Mellanox Technologies MT27800 Family [ConnectX-5]
>> - Endpoint2: Mellanox Technologies MT2910 Family [ConnectX-7]
>>
>> With the original 4s timeout, hotplug would still be triggered:
>>
>> [ 81.012463] pcieport 0004:00:00.0: DPC: containment event,
>> status:0x1f01 source:0x0000
>> [ 81.014536] pcieport 0004:00:00.0: DPC: unmasked uncorrectable error
>> detected
>> [ 81.029598] pcieport 0004:00:00.0: PCIe Bus Error:
>> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
>> [ 81.040830] pcieport 0004:00:00.0: device [0823:0110] error status/
>> mask=00008000/04d40000
>> [ 81.049870] pcieport 0004:00:00.0: [ 0] ERCR (First)
>> [ 81.053520] pcieport 0004:00:00.0: AER: TLP Header: 60008010 010000ff
>> 00001000 9c4c0000
>> [ 81.065793] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device
>> state = 1 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
>> [ 81.076183] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:231:(pid
>> 1618): start
>> [ 81.083307] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:252:(pid
>> 1618): PCI channel offline, stop waiting for NIC IFC
>> [ 81.077428] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY),
>> nvfs(0), neovfs(0), active vports(0)
>> [ 81.486693] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>> 1618): Skipping wait for vf pages stage
>> [ 81.496965] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>> 1618): Skipping wait for vf pages stage
>> [ 82.395040] mlx5_core 0004:01:00.1: print_health:819:(pid 0): Fatal
>> error detected
>> [ 82.395493] mlx5_core 0004:01:00.1: print_health_info:423:(pid 0):
>> PCI slot 1 is unavailable
>> [ 83.431094] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device
>> state = 2 pci_status: 0. Exit, result = 3, need reset
>> [ 83.442100] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device
>> state = 2 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
>> [ 83.441801] mlx5_core 0004:01:00.0: mlx5_crdump_collect:50:(pid
>> 2239): crdump: failed to lock gw status -13
>> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid
>> 1618): start
>> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid
>> 1618): PCI channel offline, stop waiting for NIC IFC
>> [ 83.849429] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY),
>> nvfs(0), neovfs(0), active vports(0)
>> [ 83.858892] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>> 1618): Skipping wait for vf pages stage
>> [ 83.869464] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>> 1618): Skipping wait for vf pages stage
>> [ 85.201433] pcieport 0004:00:00.0: pciehp: Slot(41): Link Down
>> [ 85.815016] mlx5_core 0004:01:00.1: mlx5_health_try_recover:335:(pid
>> 2239): handling bad device here
>> [ 85.824164] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid
>> 2239): start
>> [ 85.831283] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid
>> 2239): PCI channel offline, stop waiting for NIC IFC
>> [ 85.841899] mlx5_core 0004:01:00.1: mlx5_unload_one_dev_locked:1612:
>> (pid 2239): mlx5_unload_one_dev_locked: interface is down, NOP
>> [ 85.853799] mlx5_core 0004:01:00.1: mlx5_health_wait_pci_up:325:(pid
>> 2239): PCI channel offline, stop waiting for PCI
>> [ 85.863494] mlx5_core 0004:01:00.1: mlx5_health_try_recover:338:(pid
>> 2239): health recovery flow aborted, PCI reads still not working
>> [ 85.873231] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device
>> state = 2 pci_status: 0. Exit, result = 3, need reset
>> [ 85.879899] mlx5_core 0004:01:00.1: E-Switch: Unload vfs:
>> mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
>> [ 85.921428] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY),
>> nvfs(0), neovfs(0), active vports(0)
>> [ 85.930491] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>> 1617): Skipping wait for vf pages stage
>> [ 85.940849] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>> 1617): Skipping wait for vf pages stage
>> [ 85.949971] mlx5_core 0004:01:00.1: mlx5_uninit_one:1528:(pid 1617):
>> mlx5_uninit_one: interface is down, NOP
>> [ 85.959944] mlx5_core 0004:01:00.1: E-Switch: cleanup
>> [ 86.035541] mlx5_core 0004:01:00.0: E-Switch: Unload vfs:
>> mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
>> [ 86.077568] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY),
>> nvfs(0), neovfs(0), active vports(0)
>> [ 86.071727] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>> 1617): Skipping wait for vf pages stage
>> [ 86.096577] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>> 1617): Skipping wait for vf pages stage
>> [ 86.106909] mlx5_core 0004:01:00.0: mlx5_uninit_one:1528:(pid 1617):
>> mlx5_uninit_one: interface is down, NOP
>> [ 86.115940] pcieport 0004:00:00.0: AER: subordinate device reset failed
>> [ 86.122557] pcieport 0004:00:00.0: AER: device recovery failed
>> [ 86.128571] mlx5_core 0004:01:00.0: E-Switch: cleanup
>>
>> I added some prints and found that:
>> - ConnectX-5 requires >5s for full recovery
>> - ConnectX-7 requires >6s for full recovery
>>
>> Setting timeout to 7s covers both devices with safety margin.
>
>
> Instead of updating the recovery time, can you check why your device
> recovery takes
> such a long time and how to fix it from the device end?
>
Hi, Sathyanarayanan.
Thanks for the valuable feedback and suggestions.
I fully agree that ideally the root cause should be addressed on the
device side to reduce the DPC recovery latency, and that waiting longer
in the kernel is not a perfect solution.
However, the current 4-second timeout in pci_dpc_recovered() is indeed
an empirical value rather than a hard requirement from the PCIe
specification. In real-world scenarios, like with Mellanox ConnectX-5/7
adapters, we've observed that full DPC recovery can take more than 5-6
seconds, which leads to premature hotplug processing and device removal.
To improve robustness and maintain flexibility, I’m considering
introducing a module parameter to allow tuning the DPC recovery timeout
dynamically. Would you like me to prepare and submit such a patch for
review?
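For illustration, such a module parameter could look roughly like the
sketch below. This is untested and the parameter name is made up; it is
only meant to show the shape of the change against drivers/pci/pcie/dpc.c,
not a real patch:

```c
/* Hypothetical, untested sketch: a tunable DPC recovery timeout. */
static uint dpc_recovery_timeout_ms = 4000;
module_param(dpc_recovery_timeout_ms, uint, 0644);
MODULE_PARM_DESC(dpc_recovery_timeout_ms,
		 "Time to wait for DPC recovery before hotplug handling (ms)");

bool pci_dpc_recovered(struct pci_dev *pdev)
{
	...
	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
			   msecs_to_jiffies(dpc_recovery_timeout_ms));

	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
}
```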
Best regards,
Hongbo Yao
>
>> Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com>
>> ---
>> drivers/pci/pcie/dpc.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index fc18349614d7..35a37fd86dcd 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -118,10 +118,10 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
>> /*
>> * Need a timeout in case DPC never completes due to failure of
>> * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit,
>> - * but reports indicate that DPC completes within 4 seconds.
>> + * but reports indicate that DPC completes within 7 seconds.
>> */
>> wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
>> - msecs_to_jiffies(4000));
>> + msecs_to_jiffies(7000));
>> return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>> }
>
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-07-11 3:20 ` Hongbo Yao
@ 2025-07-11 4:13 ` Lukas Wunner
2025-08-06 21:34 ` Bjorn Helgaas
2025-08-07 2:00 ` Ethan Zhao
1 sibling, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2025-07-11 4:13 UTC (permalink / raw)
To: Hongbo Yao
Cc: Sathyanarayanan Kuppuswamy, bhelgaas, mahesh, oohall, linux-pci,
linux-kernel, jemma.zhang, peter.du
On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote:
> 2025/7/8 1:04, Sathyanarayanan Kuppuswamy:
> > On 7/7/25 3:30 AM, Andy Xu wrote:
> > > Setting timeout to 7s covers both devices with safety margin.
> >
> > Instead of updating the recovery time, can you check why your device
> > recovery takes
> > such a long time and how to fix it from the device end?
>
> I fully agree that ideally the root cause should be addressed on the
> device side to reduce the DPC recovery latency, and that waiting longer
> in the kernel is not a perfect solution.
>
> However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> an empirical value rather than a hard requirement from the PCIe
> specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> adapters, we've observed that full DPC recovery can take more than 5-6
> seconds, which leads to premature hotplug processing and device removal.
I think Sathya's point was: Have you made an effort to talk to the
vendor and ask them to root-cause and fix the issue e.g. with a firmware
update.
> To improve robustness and maintain flexibility, I'm considering
> introducing a module parameter to allow tuning the DPC recovery timeout
> dynamically. Would you like me to prepare and submit such a patch for
> review?
We try to avoid adding new module parameters. Things should just work
out of the box without the user having to adjust the kernel command
line for their system.
So the solution is indeed to either adjust the delay for everyone
(as you've done) or introduce an unsigned int to struct pci_dev
which can be assigned the delay after reset for the device to be
responsive.
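As a rough illustration of that second option (the field name, quirk
function, and the choice of ConnectX-5 device ID 0x1017 are all my
assumptions about how this could be wired up, not existing code):

```c
/* Sketch: per-device DPC recovery delay in struct pci_dev.
 * The field name is invented; 0 means "use the 4000 ms default".
 */
struct pci_dev {
	...
	u16	dpc_recovery_delay_ms;
	...
};

/* Assigned via a device quirk, e.g. for ConnectX-5: */
static void quirk_mlx_dpc_delay(struct pci_dev *dev)
{
	dev->dpc_recovery_delay_ms = 7000;
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MELLANOX, 0x1017,
			quirk_mlx_dpc_delay);

/* Consumed in pci_dpc_recovered(): */
	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
			   msecs_to_jiffies(pdev->dpc_recovery_delay_ms ?: 4000));
```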
For comparison, we're allowing up to 60 sec for devices to become
available after a Fundamental Reset or Conventional Reset
(PCIE_RESET_READY_POLL_MS). That's how long we're waiting in
dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and
we're not consistent with that when we wait only 4 sec in
pci_dpc_recovered().
I think the reason is that we weren't really sure whether this approach
to synchronize hotplug with DPC works well and how to choose delays.
But we've had this for a few years now and it seems to have worked nicely
for people. I think this is the first report where it's not been
working out of the box.
Thanks,
Lukas
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-07-11 4:13 ` Lukas Wunner
@ 2025-08-06 21:34 ` Bjorn Helgaas
2025-08-06 21:52 ` Keith Busch
0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2025-08-06 21:34 UTC (permalink / raw)
To: Lukas Wunner
Cc: Hongbo Yao, Sathyanarayanan Kuppuswamy, bhelgaas, mahesh, oohall,
linux-pci, linux-kernel, jemma.zhang, peter.du
On Fri, Jul 11, 2025 at 06:13:01AM +0200, Lukas Wunner wrote:
> On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote:
> > 2025/7/8 1:04, Sathyanarayanan Kuppuswamy:
> > > On 7/7/25 3:30 AM, Andy Xu wrote:
> > > > Setting timeout to 7s covers both devices with safety margin.
> > >
> > > Instead of updating the recovery time, can you check why your device
> > > recovery takes
> > > such a long time and how to fix it from the device end?
> >
> > I fully agree that ideally the root cause should be addressed on the
> > device side to reduce the DPC recovery latency, and that waiting longer
> > in the kernel is not a perfect solution.
> >
> > However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> > an empirical value rather than a hard requirement from the PCIe
> > specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> > adapters, we've observed that full DPC recovery can take more than 5-6
> > seconds, which leads to premature hotplug processing and device removal.
>
> I think Sathya's point was: Have you made an effort to talk to the
> vendor and ask them to root-cause and fix the issue e.g. with a firmware
> update.
Would definitely be great, but unless we have a number in the spec to
point to, they might just shrug and ask what the requirement is.
> > To improve robustness and maintain flexibility, I'm considering
> > introducing a module parameter to allow tuning the DPC recovery timeout
> > dynamically. Would you like me to prepare and submit such a patch for
> > review?
>
> We try to avoid adding new module parameters. Things should just work
> out of the box without the user having to adjust the kernel command
> line for their system.
>
> So the solution is indeed to either adjust the delay for everyone
> (as you've done) or introduce an unsigned int to struct pci_dev
> which can be assigned the delay after reset for the device to be
> responsive.
>
> For comparison, we're allowing up to 60 sec for devices to become
> available after a Fundamental Reset or Conventional Reset
> (PCIE_RESET_READY_POLL_MS). That's how long we're waiting in
> dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and
> we're not consistent with that when we wait only 4 sec in
> pci_dpc_recovered().
>
> I think the reason is that we weren't really sure whether this approach
> to synchronize hotplug with DPC works well and how to choose delays.
> But we've had this for a few years now and it seems to have worked nicely
> for people. I think this is the first report where it's not been
> working out of the box.
Why would we wait less than PCIE_RESET_READY_POLL_MS? DPC disables
the link, so that's basically a reset for the device. Seems like we
should allow as much time as we do for any other kind of reset.
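Concretely, that would mean reusing the existing constant rather than
introducing another magic number, something like (sketch only):

```c
/* Sketch: wait as long for DPC recovery as for any other reset.
 * PCIE_RESET_READY_POLL_MS is the existing 60 s budget defined in
 * drivers/pci/pci.h.
 */
wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
		   msecs_to_jiffies(PCIE_RESET_READY_POLL_MS));
```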
Bjorn
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-08-06 21:34 ` Bjorn Helgaas
@ 2025-08-06 21:52 ` Keith Busch
2025-08-07 1:54 ` Ethan Zhao
0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-08-06 21:52 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Lukas Wunner, Hongbo Yao, Sathyanarayanan Kuppuswamy, bhelgaas,
mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du
On Wed, Aug 06, 2025 at 04:34:09PM -0500, Bjorn Helgaas wrote:
> > > However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
> > > an empirical value rather than a hard requirement from the PCIe
> > > specification. In real-world scenarios, like with Mellanox ConnectX-5/7
> > > adapters, we've observed that full DPC recovery can take more than 5-6
> > > seconds, which leads to premature hotplug processing and device removal.
> >
> > I think Sathya's point was: Have you made an effort to talk to the
> > vendor and ask them to root-cause and fix the issue e.g. with a firmware
> > update.
>
> Would definitely be great, but unless we have a number in the spec to
> point to, they might just shrug and ask what the requirement is.
I agree, and I have similar problems with other arbitrary kernel timing
decisions. Specifically RRL, where there's no spec-defined number, yet my
patch to modify it has not received much consideration.
https://lore.kernel.org/linux-pci/20250218165444.2406119-1-kbusch@meta.com/
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-08-06 21:52 ` Keith Busch
@ 2025-08-07 1:54 ` Ethan Zhao
0 siblings, 0 replies; 8+ messages in thread
From: Ethan Zhao @ 2025-08-07 1:54 UTC (permalink / raw)
To: Keith Busch, Bjorn Helgaas
Cc: Lukas Wunner, Hongbo Yao, Sathyanarayanan Kuppuswamy, bhelgaas,
mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du
On 8/7/2025 5:52 AM, Keith Busch wrote:
> On Wed, Aug 06, 2025 at 04:34:09PM -0500, Bjorn Helgaas wrote:
>>>> However, the current 4 seconds timeout in pci_dpc_recovered() is indeed
>>>> an empirical value rather than a hard requirement from the PCIe
>>>> specification. In real-world scenarios, like with Mellanox ConnectX-5/7
>>>> adapters, we've observed that full DPC recovery can take more than 5-6
>>>> seconds, which leads to premature hotplug processing and device removal.
>>>
>>> I think Sathya's point was: Have you made an effort to talk to the
>>> vendor and ask them to root-cause and fix the issue e.g. with a firmware
>>> update.
>>
>> Would definitely be great, but unless we have a number in the spec to
>> point to, they might just shrug and ask what the requirement is.
>
> I agree, and I have similar problems with other arbitrary kernel timing
> decicsions. Specifically RRL where there's no spec defined number yet my
> patch to modify it has not received much consideration.
>
> https://lore.kernel.org/linux-pci/20250218165444.2406119-1-kbusch@meta.com/
>
At least with this patch we have a workaround in hand to make some
devices work.
Thanks,
Ethan
* Re: [PATCH] PCI/DPC: Extend DPC recovery timeout
2025-07-11 3:20 ` Hongbo Yao
2025-07-11 4:13 ` Lukas Wunner
@ 2025-08-07 2:00 ` Ethan Zhao
1 sibling, 0 replies; 8+ messages in thread
From: Ethan Zhao @ 2025-08-07 2:00 UTC (permalink / raw)
To: Hongbo Yao, Sathyanarayanan Kuppuswamy, bhelgaas, lukas
Cc: mahesh, oohall, linux-pci, linux-kernel, jemma.zhang, peter.du
On 7/11/2025 11:20 AM, Hongbo Yao wrote:
>
>
> On 2025/7/8 1:04, Sathyanarayanan Kuppuswamy wrote:
>>
>> On 7/7/25 3:30 AM, Andy Xu wrote:
>>> From: Hongbo Yao <andy.xu@hj-micro.com>
>>>
>>> Extend the DPC recovery timeout from 4 seconds to 7 seconds to
>>> support Mellanox ConnectX series network adapters.
>>>
>>> My environment:
>>> - Platform: arm64 N2 based server
>>> - Endpoint1: Mellanox Technologies MT27800 Family [ConnectX-5]
>>> - Endpoint2: Mellanox Technologies MT2910 Family [ConnectX-7]
>>>
>>> With the original 4s timeout, hotplug would still be triggered:
>>>
>>> [ 81.012463] pcieport 0004:00:00.0: DPC: containment event,
>>> status:0x1f01 source:0x0000
>>> [ 81.014536] pcieport 0004:00:00.0: DPC: unmasked uncorrectable error
>>> detected
>>> [ 81.029598] pcieport 0004:00:00.0: PCIe Bus Error:
>>> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
>>> [ 81.040830] pcieport 0004:00:00.0: device [0823:0110] error status/
>>> mask=00008000/04d40000
>>> [ 81.049870] pcieport 0004:00:00.0: [ 0] ERCR (First)
>>> [ 81.053520] pcieport 0004:00:00.0: AER: TLP Header: 60008010 010000ff
>>> 00001000 9c4c0000
>>> [ 81.065793] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device
>>> state = 1 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
>>> [ 81.076183] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:231:(pid
>>> 1618): start
>>> [ 81.083307] mlx5_core 0004:01:00.0: mlx5_error_sw_reset:252:(pid
>>> 1618): PCI channel offline, stop waiting for NIC IFC
>>> [ 81.077428] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY),
>>> nvfs(0), neovfs(0), active vports(0)
>>> [ 81.486693] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>>> 1618): Skipping wait for vf pages stage
>>> [ 81.496965] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>>> 1618): Skipping wait for vf pages stage
>>> [ 82.395040] mlx5_core 0004:01:00.1: print_health:819:(pid 0): Fatal
>>> error detected
>>> [ 82.395493] mlx5_core 0004:01:00.1: print_health_info:423:(pid 0):
>>> PCI slot 1 is unavailable
>>> [ 83.431094] mlx5_core 0004:01:00.0: mlx5_pci_err_detected Device
>>> state = 2 pci_status: 0. Exit, result = 3, need reset
>>> [ 83.442100] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device
>>> state = 2 health sensors: 1 pci_status: 1. Enter, pci channel state = 2
>>> [ 83.441801] mlx5_core 0004:01:00.0: mlx5_crdump_collect:50:(pid
>>> 2239): crdump: failed to lock gw status -13
>>> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid
>>> 1618): start
>>> [ 83.454050] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid
>>> 1618): PCI channel offline, stop waiting for NIC IFC
>>> [ 83.849429] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY),
>>> nvfs(0), neovfs(0), active vports(0)
>>> [ 83.858892] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>>> 1618): Skipping wait for vf pages stage
>>> [ 83.869464] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>>> 1618): Skipping wait for vf pages stage
>>> [ 85.201433] pcieport 0004:00:00.0: pciehp: Slot(41): Link Down
>>> [ 85.815016] mlx5_core 0004:01:00.1: mlx5_health_try_recover:335:(pid
>>> 2239): handling bad device here
>>> [ 85.824164] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:231:(pid
>>> 2239): start
>>> [ 85.831283] mlx5_core 0004:01:00.1: mlx5_error_sw_reset:252:(pid
>>> 2239): PCI channel offline, stop waiting for NIC IFC
>>> [ 85.841899] mlx5_core 0004:01:00.1: mlx5_unload_one_dev_locked:1612:
>>> (pid 2239): mlx5_unload_one_dev_locked: interface is down, NOP
>>> [ 85.853799] mlx5_core 0004:01:00.1: mlx5_health_wait_pci_up:325:(pid
>>> 2239): PCI channel offline, stop waiting for PCI
>>> [ 85.863494] mlx5_core 0004:01:00.1: mlx5_health_try_recover:338:(pid
>>> 2239): health recovery flow aborted, PCI reads still not working
>>> [ 85.873231] mlx5_core 0004:01:00.1: mlx5_pci_err_detected Device
>>> state = 2 pci_status: 0. Exit, result = 3, need reset
>>> [ 85.879899] mlx5_core 0004:01:00.1: E-Switch: Unload vfs:
>>> mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
>>> [ 85.921428] mlx5_core 0004:01:00.1: E-Switch: Disable: mode(LEGACY),
>>> nvfs(0), neovfs(0), active vports(0)
>>> [ 85.930491] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>>> 1617): Skipping wait for vf pages stage
>>> [ 85.940849] mlx5_core 0004:01:00.1: mlx5_wait_for_pages:786:(pid
>>> 1617): Skipping wait for vf pages stage
>>> [ 85.949971] mlx5_core 0004:01:00.1: mlx5_uninit_one:1528:(pid 1617):
>>> mlx5_uninit_one: interface is down, NOP
>>> [ 85.959944] mlx5_core 0004:01:00.1: E-Switch: cleanup
>>> [ 86.035541] mlx5_core 0004:01:00.0: E-Switch: Unload vfs:
>>> mode(LEGACY), nvfs(0), neovfs(0), active vports(0)
>>> [ 86.077568] mlx5_core 0004:01:00.0: E-Switch: Disable: mode(LEGACY),
>>> nvfs(0), neovfs(0), active vports(0)
>>> [ 86.071727] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>>> 1617): Skipping wait for vf pages stage
>>> [ 86.096577] mlx5_core 0004:01:00.0: mlx5_wait_for_pages:786:(pid
>>> 1617): Skipping wait for vf pages stage
>>> [ 86.106909] mlx5_core 0004:01:00.0: mlx5_uninit_one:1528:(pid 1617):
>>> mlx5_uninit_one: interface is down, NOP
>>> [ 86.115940] pcieport 0004:00:00.0: AER: subordinate device reset failed
>>> [ 86.122557] pcieport 0004:00:00.0: AER: device recovery failed
>>> [ 86.128571] mlx5_core 0004:01:00.0: E-Switch: cleanup
>>>
>>> I added some prints and found that:
>>> - ConnectX-5 requires >5s for full recovery
>>> - ConnectX-7 requires >6s for full recovery
>>>
>>> Setting timeout to 7s covers both devices with safety margin.
>>
>>
>> Instead of updating the recovery time, can you check why your device's
>> recovery takes such a long time, and whether it can be fixed on the
>> device end?
>>
> Hi, Sathyanarayanan.
>
> Thanks for the valuable feedback and suggestions.
>
> I fully agree that ideally the root cause should be addressed on the
> device side to reduce the DPC recovery latency, and that waiting longer
> in the kernel is not a perfect solution.
>
> However, the current 4-second timeout in pci_dpc_recovered() is indeed
> an empirical value rather than a hard requirement from the PCIe
> specification. In real-world scenarios, such as with Mellanox
> ConnectX-5/7 adapters, we've observed that full DPC recovery can take
> more than 5-6 seconds, which leads to premature hotplug processing and
> device removal.
>
> To improve robustness and maintain flexibility, I’m considering
> introducing a module parameter to allow tuning the DPC recovery timeout
> dynamically. Would you like me to prepare and submit such a patch for
> review?
>
What if another device needs 7.1 seconds to recover? Revise the
timeout again? No spec mandates 4 seconds. Having a kernel parameter
to override its default value is one way to work around this. Ask the
FW folks to fix it? What justification do we have?
Thanks,
Ethan
>
> Best regards,
> Hongbo Yao
>
>
>>
>>> Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com>
>>> ---
>>> drivers/pci/pcie/dpc.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>>> index fc18349614d7..35a37fd86dcd 100644
>>> --- a/drivers/pci/pcie/dpc.c
>>> +++ b/drivers/pci/pcie/dpc.c
>>> @@ -118,10 +118,10 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
>>>  	/*
>>>  	 * Need a timeout in case DPC never completes due to failure of
>>>  	 * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit,
>>> -	 * but reports indicate that DPC completes within 4 seconds.
>>> +	 * but reports indicate that DPC completes within 7 seconds.
>>>  	 */
>>>  	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
>>> -			   msecs_to_jiffies(4000));
>>> +			   msecs_to_jiffies(7000));
>>>  	return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>>>  }
>>
>
>
2025-07-07 10:30 [PATCH] PCI/DPC: Extend DPC recovery timeout Andy Xu
2025-07-07 17:04 ` Sathyanarayanan Kuppuswamy
2025-07-11 3:20 ` Hongbo Yao
2025-07-11 4:13 ` Lukas Wunner
2025-08-06 21:34 ` Bjorn Helgaas
2025-08-06 21:52 ` Keith Busch
2025-08-07 1:54 ` Ethan Zhao
2025-08-07 2:00 ` Ethan Zhao