linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off
@ 2025-06-19  9:32 Hongbo Yao
  2025-06-19 11:52 ` Lukas Wunner
  0 siblings, 1 reply; 4+ messages in thread
From: Hongbo Yao @ 2025-06-19  9:32 UTC (permalink / raw)
  To: bhelgaas, lukas
  Cc: linux-pci, linux-kernel, linux-arm-kernel, peter.du, jemma.zhang,
	Hongbo Yao

Fixed 1-second delay in remove_board() fails to accommodate certain
hardware like multi-host OCP cards, which exhibit longer power-off
latencies.

Failure scenario:
1. Software clears pending_events prematurely
2. Hardware triggers DLLSC after software clearance
3. Subsequent power-off process may generate AER interrupts
4. System misinterprets residual DLLSC as valid hotplug event

Fix this by replacing the fixed delay with polling for DLLLA bit in
the link status register.

Logs before fix:
[157.778307] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0001 from Slot Status
[157.778321] pcieport 0003:00:00.0: pciehp: Slot(31): Attention button pressed
[157.785445] pcieport 0003:00:00.0: pciehp: Slot(31): Powering off due to button press
[157.798931] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status
[157.799263] pcieport 0003:00:00.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 2c0
[157.799273] pcieport 000b:00:02.0: pciehp: Slot(113): Attention button pressed
[157.800484] pcieport 000b:00:02.0: pciehp: Slot(113): Powering off due to button press
[157.800830] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 2c0
[162.850249] pcieport 0003:00:00.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 12e1
[162.850251] pcieport 000b:00:02.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 12e1
[162.850253] pcieport 0003:00:00.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0003:01:00
[162.850254] pcieport 000b:00:02.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 000b:02:00
[162.850278] mlx5_core 000b:02:00.1: PME# disabled
[164.862362] mlx5_core 000b:02:00.1: E-Switch: cleanup
[165.171541] mlx5_core 000b:02:00.1: disabling bus mastering
[165.171591] mlx5_core 000b:02:00.0: PME# disabled
[167.214351] mlx5_core 000b:02:00.0: E-Switch: cleanup
[167.539871] mlx5_core 000b:02:00.0: disabling bus mastering
[167.540342] pcieport 000b:00:02.0: pciehp: pciehp_power_off_slot: SLOTCTRL 88 write cmd 400
[167.540345] mlx5_core 0003:01:00.1: PME# disabled
[168.546269] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 300
[169.590320] mlx5_core 0003:01:00.1: E-Switch: cleanup
[169.899852] mlx5_core 0003:01:00.1: disabling bus mastering
[169.899241] mlx5_core 0003:01:00.0: PME# disabled
[171.870344] mlx5_core 0003:01:00.0: E-Switch: cleanup
[172.289046] mlx5_core 0003:01:00.0: disabling bus mastering
[172.289366] pcieport 0003:00:00.0: pciehp: pciehp_power_off_slot: SLOTCTRL 88 write cmd 400
[172.302385] pcieport 0003:00:00.0: AER: Corrected error message received from 0003:00:00.0
[172.302396] pcieport 000b:00:02.0: AER: Corrected error message received from 000b:00:02.0
[172.310641] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[172.310892] pcieport 000b:00:02.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[172.310893] pcieport 0003:00:00.0:   device [0823:0110] error status/mask=00000001/0000e000
[172.310894] pcieport 0003:00:00.0:    [ 0] RxErr
[172.310893] pcieport 000b:00:02.0:   device [0823:0112] error status/mask=00000001/0000e000
[172.310894] pcieport 000b:00:02.0:    [ 0] RxErr                  (First)
[172.332915] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0100 from Slot Status
[172.353876] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0100 from Slot Status
[172.360212] pcieport 000b:00:02.0: pciehp: pciehp_check_link_active: lnk_status = d011
[172.360362] pcieport 000b:00:02.0: pciehp: Slot(113): Card present
[172.374314] pcieport 000b:00:02.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 17e1
[172.374357] pcieport 000b:00:02.0: pciehp: pciehp_power_on_slot: SLOTCTRL 88 write cmd 0
[172.374383] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 200
[173.314226] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 300
[174.111245] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0100 from Slot Status
[174.111232] pcieport 0003:00:00.0: pciehp: pciehp_check_link_active: lnk_status = f085
[174.111234] pcieport 0003:00:00.0: pciehp: Slot(31): Card present
[174.117136] pcieport 0003:00:00.0: pciehp: Slot(31): Link Up
[174.122963] pcieport 0003:00:00.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 17e1
[174.122964] pcieport 0003:00:00.0: pciehp: pciehp_power_on_slot: SLOTCTRL 88 write cmd 0
[174.122967] pcieport 0003:00:00.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 200

Logs after fix:
[143.610100] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status
[143.616525] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0001 from Slot Status
[143.629921] pcieport 000b:00:02.0: pciehp: Slot(113): Attention button pressed
[143.629923] pcieport 000b:00:02.0: pciehp: Slot(113): Powering off due to button press
[143.629926] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 2c0
[143.651725] pcieport 0003:00:00.0: pciehp: Slot(31): Attention button pressed
[143.658848] pcieport 0003:00:00.0: pciehp: Slot(31): Powering off due to button press
[143.666666] pcieport 0003:00:00.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 2c0
[148.863526] pcieport 000b:00:02.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 12e1
[148.870562] pcieport 0003:00:00.0: pciehp: pciehp_get_power_status: SLOTCTRL 88 value read 12e1
[148.870572] pcieport 0003:00:00.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0003:01:00
[148.870579] pcieport 000b:00:02.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 000b:02:00
[148.870605] mlx5_core 0003:01:00.1: PME# disabled
[152.536968] mlx5_core 0003:01:00.1: E-Switch: cleanup
[152.877666] mlx5_core 0003:01:00.1: disabling bus mastering
[152.878202] mlx5_core 0003:01:00.0: PME# disabled
[156.592567] mlx5_core 0003:01:00.0: E-Switch: cleanup
[156.904767] mlx5_core 0003:01:00.0: disabling bus mastering
[156.905195] pcieport 0003:00:00.0: pciehp: pciehp_power_off_slot: SLOTCTRL 88 write cmd 400
[156.905231] mlx5_core 000b:02:00.1: PME# disabled
[160.360660] mlx5_core 000b:02:00.1: E-Switch: cleanup
[161.272871] mlx5_core 000b:02:00.1: disabling bus mastering
[161.273719] mlx5_core 000b:02:00.0: PME# disabled
[165.028632] mlx5_core 000b:02:00.0: E-Switch: cleanup
[165.454213] mlx5_core 000b:02:00.0: disabling bus mastering
[165.454925] pcieport 000b:00:02.0: pciehp: pciehp_power_off_slot: SLOTCTRL 88 write cmd 400
[165.464333] pcieport 0003:00:00.0: AER: Corrected error message received from 0003:00:00.0
[165.478716] pcieport 000b:00:02.0: AER: Corrected error message received from 000b:00:02.0
[165.481156] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[165.481158] pcieport 0003:00:00.0:   device [0823:0110] error status/mask=00000001/0000e000
[165.481166] pcieport 0003:00:00.0:    [ 0] RxErr                  (First)
[165.528873] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status
[165.535384] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0100 from Slot Status
[165.535403] pcieport 000b:00:02.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[165.535403] pcieport 000b:00:02.0:   device [0823:0112] error status/mask=00000001/0000e000
[165.541315] pcieport 0003:00:00.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 300
[165.550871] pcieport 0003:00:00.0:    [ 0] RxErr                  (First)
[165.578591] pcieport 000b:00:02.0: pciehp: pciehp_set_indicators: SLOTCTRL 88 write cmd 300

Signed-off-by: Hongbo Yao <andy.xu@hj-micro.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index bcc938d4420f..179ebe4f7797 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -30,6 +30,25 @@
 #define SAFE_REMOVAL	 true
 #define SURPRISE_REMOVAL false
 
+static void pciehp_wait_for_link_inactive(struct controller *ctrl)
+{
+	u16 lnk_status;
+	int timeout = 10000, step = 20;
+
+	do {
+		pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
+					  &lnk_status);
+
+		if (!(lnk_status & PCI_EXP_LNKSTA_DLLLA))
+			return;
+
+		msleep(step);
+		timeout -= step;
+	} while (timeout >= 0);
+
+	ctrl_dbg(ctrl, "Timeout waiting for link inactive state\n");
+}
+
 static void set_slot_off(struct controller *ctrl)
 {
 	/*
@@ -119,8 +138,11 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
 		 * After turning power off, we must wait for at least 1 second
 		 * before taking any action that relies on power having been
 		 * removed from the slot/adapter.
+		 *
+		 * Extended wait with polling to ensure hardware has completed
+		 * power-off sequence.
 		 */
-		msleep(1000);
+		pciehp_wait_for_link_inactive(ctrl);
 
 		/* Ignore link or presence changes caused by power off */
 		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off
  2025-06-19  9:32 [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off Hongbo Yao
@ 2025-06-19 11:52 ` Lukas Wunner
  2025-06-20  5:54   ` Hongbo Yao
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Wunner @ 2025-06-19 11:52 UTC (permalink / raw)
  To: Hongbo Yao
  Cc: bhelgaas, linux-pci, linux-kernel, linux-arm-kernel, peter.du,
	jemma.zhang

On Thu, Jun 19, 2025 at 05:32:28PM +0800, Hongbo Yao wrote:
> Fixed 1-second delay in remove_board() fails to accommodate certain
> hardware like multi-host OCP cards, which exhibit longer power-off
> latencies.

Please name the affected product(s).

They don't seem to comply to the spec.  How prevalent are they?
If there are only few deployed, quirks like this are probably
best addressed by an out-of-tree patch.

> Logs before fix:
> [157.778307] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0001 from Slot Status
> [157.778321] pcieport 0003:00:00.0: pciehp: Slot(31): Attention button pressed
> [157.785445] pcieport 0003:00:00.0: pciehp: Slot(31): Powering off due to button press
> [157.798931] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status

This log excerpt mixes messages from two separate hotplug ports
(0003:00:00.0 and 000b:00:02.0).  Are these hotplug ports related?
If not, please reduce the log excerpt to a single hotplug port
to avoid confusion.

> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -30,6 +30,25 @@
>  #define SAFE_REMOVAL	 true
>  #define SURPRISE_REMOVAL false
>  
> +static void pciehp_wait_for_link_inactive(struct controller *ctrl)
> +{
> +	u16 lnk_status;
> +	int timeout = 10000, step = 20;
> +
> +	do {
> +		pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
> +					  &lnk_status);
> +
> +		if (!(lnk_status & PCI_EXP_LNKSTA_DLLLA))
> +			return;
> +
> +		msleep(step);
> +		timeout -= step;
> +	} while (timeout >= 0);
> +
> +	ctrl_dbg(ctrl, "Timeout waiting for link inactive state\n");
> +}

Any chance you can use one of the existing helpers, such as
pcie_wait_for_link()?

Is the 10 second delay chosen arbitrarily or how did you come up
with it?  How much time do the affected products really need?

> @@ -119,8 +138,11 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>  		 * After turning power off, we must wait for at least 1 second
>  		 * before taking any action that relies on power having been
>  		 * removed from the slot/adapter.
> +		 *
> +		 * Extended wait with polling to ensure hardware has completed
> +		 * power-off sequence.
>  		 */
> -		msleep(1000);
> +		pciehp_wait_for_link_inactive(ctrl);
>  
>  		/* Ignore link or presence changes caused by power off */
>  		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),

Please keep the msleep(1000), that's the minimum we need to wait
per PCIe r6.3 sec 6.7.1.8.

Please make the extra wait for link down conditional on
ctrl->pcie->port->link_active_reporting.  (DLLLA reporting is
optional for hotplug ports conforming to older spec revisions.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off
  2025-06-19 11:52 ` Lukas Wunner
@ 2025-06-20  5:54   ` Hongbo Yao
  2025-06-23 18:00     ` Lukas Wunner
  0 siblings, 1 reply; 4+ messages in thread
From: Hongbo Yao @ 2025-06-20  5:54 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: bhelgaas, linux-pci, linux-kernel, linux-arm-kernel, peter.du,
	jemma.zhang



在 2025/6/19 19:52, Lukas Wunner 写道:
> On Thu, Jun 19, 2025 at 05:32:28PM +0800, Hongbo Yao wrote:
>> Fixed 1-second delay in remove_board() fails to accommodate certain
>> hardware like multi-host OCP cards, which exhibit longer power-off
>> latencies.
> 
> Please name the affected product(s).
> 
> They don't seem to comply to the spec.  How prevalent are they?
> If there are only few deployed, quirks like this are probably
> best addressed by an out-of-tree patch.
> 

Hi Lukas,
Thank you for reviewing the patch.

The affected hardware configuration:
 - Host system: Arm Neoverse N2 based server
 - Multi-host OCP card: Mellanox Technologies MT2910 Family [ConnectX-7]


>> Logs before fix:
>> [157.778307] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0001 from Slot Status
>> [157.778321] pcieport 0003:00:00.0: pciehp: Slot(31): Attention button pressed
>> [157.785445] pcieport 0003:00:00.0: pciehp: Slot(31): Powering off due to button press
>> [157.798931] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status
> 
> This log excerpt mixes messages from two separate hotplug ports
> (0003:00:00.0 and 000b:00:02.0).  Are these hotplug ports related?
> If not, please reduce the log excerpt to a single hotplug port
> to avoid confusion.
> 
Sorry for not providing adequate context in the patch submission.

Yes, these two hotplug ports are related - they are part of the same
physical multi-host OCP card.

Key points:
1. The OCP card has two independent PCIe endpoints
2. Each endpoint connected to a PCIe root port:
   - Endpoint 1 → Port 0003:00:00.0
   - Endpoint 2 → Port 000b:00:02.0
3. Both endpoints share a common power domain
4. Full power-off occurs only after BOTH endpoints are powered down
5. DLLSC is triggered only after complete power-off

Critical log events:
[157.778307] Both ports: Attention button pressed
[167.540342] Port 0003:00:00.0 power off command issued
[172.289366] Port 000b:00:02.0 power off command issued
[172.302385] Card fully powered off, trigger AER interrupts and DLLSC

Full power-off occurs only after BOTH ports complete their sequences,
taking about 5s total.

>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -30,6 +30,25 @@
>>  #define SAFE_REMOVAL	 true
>>  #define SURPRISE_REMOVAL false
>>  
>> +static void pciehp_wait_for_link_inactive(struct controller *ctrl)
>> +{
>> +	u16 lnk_status;
>> +	int timeout = 10000, step = 20;
>> +
>> +	do {
>> +		pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
>> +					  &lnk_status);
>> +
>> +		if (!(lnk_status & PCI_EXP_LNKSTA_DLLLA))
>> +			return;
>> +
>> +		msleep(step);
>> +		timeout -= step;
>> +	} while (timeout >= 0);
>> +
>> +	ctrl_dbg(ctrl, "Timeout waiting for link inactive state\n");
>> +}
> 
> Any chance you can use one of the existing helpers, such as
> pcie_wait_for_link()?
> 
> Is the 10 second delay chosen arbitrarily or how did you come up
> with it?  How much time do the affected products really need?
>
Ok,  I will try to use pcie_wait_for_link().

The 10-second timeout was determined from actual log observations. The
power-off process for the multi-host OCP card takes approximately 5-9
seconds in our measurements.

>> @@ -119,8 +138,11 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>>  		 * After turning power off, we must wait for at least 1 second
>>  		 * before taking any action that relies on power having been
>>  		 * removed from the slot/adapter.
>> +		 *
>> +		 * Extended wait with polling to ensure hardware has completed
>> +		 * power-off sequence.
>>  		 */
>> -		msleep(1000);
>> +		pciehp_wait_for_link_inactive(ctrl);
>>  
>>  		/* Ignore link or presence changes caused by power off */
>>  		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> 
> Please keep the msleep(1000), that's the minimum we need to wait
> per PCIe r6.3 sec 6.7.1.8.
> 
> Please make the extra wait for link down conditional on
> ctrl->pcie->port->link_active_reporting.  (DLLLA reporting is
> optional for hotplug ports conforming to older spec revisions.)
> 
Thank you for the valuable suggestion. i'll  revise the patch

Best regards,
Hongbo.> Thanks,
> 
> Lukas
> 
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off
  2025-06-20  5:54   ` Hongbo Yao
@ 2025-06-23 18:00     ` Lukas Wunner
  0 siblings, 0 replies; 4+ messages in thread
From: Lukas Wunner @ 2025-06-23 18:00 UTC (permalink / raw)
  To: Hongbo Yao
  Cc: bhelgaas, linux-pci, linux-kernel, linux-arm-kernel, peter.du,
	jemma.zhang

On Fri, Jun 20, 2025 at 01:54:05PM +0800, Hongbo Yao wrote:
> The affected hardware configuration:
>  - Host system: Arm Neoverse N2 based server
>  - Multi-host OCP card: Mellanox Technologies MT2910 Family [ConnectX-7]

Thanks for providing context for this patch submission.
When respinning, please make sure to include the information above
(as well as all the information in your e-mail) in the commit message.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-06-23 18:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-19  9:32 [RFC PATCH] PCI: pciehp: Replace fixed delay with polling for slot power-off Hongbo Yao
2025-06-19 11:52 ` Lukas Wunner
2025-06-20  5:54   ` Hongbo Yao
2025-06-23 18:00     ` Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).