public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed.
@ 2025-03-03  2:36 Feng Tang
  2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Feng Tang @ 2025-03-03  2:36 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

There was problem reported by firmware developers that they received
two PCIe hotplug commands in very short intervals on an ARM server,
which doesn't comply with PCIe spec, and broke their state machine and
work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
to wait at least 1 second for the command-complete event, before
resending the command or sending a new command.

In the failure case, the first PCIe hotplug command firmware received
is from get_port_device_capability(), which sends command to disable
PCIe hotplug interrupts without waiting for its completion, and the
second command comes from pcie_enable_notification() of pciehp driver,
which enables hotplug interrupts again.

One solution is to add the necessary delay after the first command [1],
while Lukas proposed an optimization that if the pciehp driver will be
loaded soon and handle the interrupts, then the hotplug and the wait
are not needed and can be saved, for every root port.

So fix it by only disabling the hotplug interrupts when pciehp driver
is not enabled.

[1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@linux.alibaba.com/t/#u

Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:

  since v3:
    * Separate this patch from patches dealing with irq storm in nomsi case
    * Take Lukas's suggestion (Lukas Wunner)  

  since v2:
    * Add patch 0001, which move the waiting logic of pcie_poll_cmd from pciehp
      driver to PCIe port driver for code reuse (Bjorn Helgaas)
    * Separate Lucas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)  
    * Avoid hotplug command waiting for HW without command-complete
      event support (Bjorn Helgaas)
    * Fix spell issue in commit log (Bjorn and Markus)
    * Add cover-letter for whole patchset (Markus Elfring)
    * Handle a set-but-unused build warning (0Day lkp bot)

  since v1:
    * Add the Originally-by for Liguang for patch 0002. The issue was found on
      a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tar ball
      without git info to debug the issue, and made the patch. Thanks to Guanghui
      who recently pointed me to tree https://gitee.com/anolis/cloud-kernel which
      show the wait logic in 5.10 was originally from Liguang, and never hit
      mainline.
    * Make the irq disabling not dependent on wthether pciehp service driver
      will be loaded (Lukas Wunner) 
    * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
      Kuppuswamy)
    * Fix wrong email address (Markus Elfring)
    * Add logic to skip irq disabling if it is already disabled.


 drivers/pci/pcie/portdrv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 02e73099bad0..e8318fd5f6ed 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -228,10 +228,12 @@ static int get_port_device_capability(struct pci_dev *dev)
 
 		/*
 		 * Disable hot-plug interrupts in case they have been enabled
-		 * by the BIOS and the hot-plug service driver is not loaded.
+		 * by the BIOS and the hot-plug service driver won't be loaded
+		 * to handle them.
 		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+		if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+			pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+				PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
 	}
 
 #ifdef CONFIG_PCIEAER
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed.
  2025-03-03  2:36 [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed Feng Tang
@ 2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
  2025-03-03  5:55   ` Feng Tang
  2025-03-03  8:54 ` Lukas Wunner
  2025-03-04 23:17 ` Bjorn Helgaas
  2 siblings, 1 reply; 5+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-03-03  3:35 UTC (permalink / raw)
  To: Feng Tang, Bjorn Helgaas, Lukas Wunner, Liguang Zhang,
	Guanghui Feng, rafael
  Cc: Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel


On 3/2/25 6:36 PM, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
>
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
>
> One solution is to add the necessary delay after the first command [1],
> while Lukas proposed an optimization that if the pciehp driver will be
> loaded soon and handle the interrupts, then the hotplug and the wait
> are not needed and can be saved, for every root port.

I think above part of the commit message might need some rewording.

The way you are fixing this issue is to make the first command conditional
on hotplug driver not enabled. That way you can skip one of these
commands in both enable/disable hotplug driver case.

I also recommend adding some info about why making this change
should not affect the original intention of commit # 2bd50dd800b5

Code wise looks fine.

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

>
> So fix it by only disabling the hotplug interrupts when pciehp driver
> is not enabled.
>
> [1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@linux.alibaba.com/t/#u
>
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
> Changelog:
>
>    since v3:
>      * Separate this patch from patches dealing with irq storm in nomsi case
>      * Take Lukas's suggestion (Lukas Wunner)
>
>    since v2:
>      * Add patch 0001, which move the waiting logic of pcie_poll_cmd from pciehp
>        driver to PCIe port driver for code reuse (Bjorn Helgaas)
>      * Separate Lucas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)
>      * Avoid hotplug command waiting for HW without command-complete
>        event support (Bjorn Helgaas)
>      * Fix spell issue in commit log (Bjorn and Markus)
>      * Add cover-letter for whole patchset (Markus Elfring)
>      * Handle a set-but-unused build warning (0Day lkp bot)
>
>    since v1:
>      * Add the Originally-by for Liguang for patch 0002. The issue was found on
>        a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tar ball
>        without git info to debug the issue, and made the patch. Thanks to Guanghui
>        who recently pointed me to tree https://gitee.com/anolis/cloud-kernel which
>        show the wait logic in 5.10 was originally from Liguang, and never hit
>        mainline.
>      * Make the irq disabling not dependent on wthether pciehp service driver
>        will be loaded (Lukas Wunner)
>      * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
>        Kuppuswamy)
>      * Fix wrong email address (Markus Elfring)
>      * Add logic to skip irq disabling if it is already disabled.
>
>
>   drivers/pci/pcie/portdrv.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..e8318fd5f6ed 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -228,10 +228,12 @@ static int get_port_device_capability(struct pci_dev *dev)
>   
>   		/*
>   		 * Disable hot-plug interrupts in case they have been enabled
> -		 * by the BIOS and the hot-plug service driver is not loaded.
> +		 * by the BIOS and the hot-plug service driver won't be loaded
I think you can use "is not enabled" instead of won't be loaded.

> +		 * to handle them.
>   		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> +			pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +				PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>   	}
>   
>   #ifdef CONFIG_PCIEAER

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed.
  2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
@ 2025-03-03  5:55   ` Feng Tang
  0 siblings, 0 replies; 5+ messages in thread
From: Feng Tang @ 2025-03-03  5:55 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Bjorn Helgaas, Lukas Wunner, Liguang Zhang, Guanghui Feng, rafael,
	Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel

Hi Sathyanarayanan,

On Sun, Mar 02, 2025 at 07:35:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> 
> On 3/2/25 6:36 PM, Feng Tang wrote:
> > There was problem reported by firmware developers that they received
> > two PCIe hotplug commands in very short intervals on an ARM server,
> > which doesn't comply with PCIe spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> > to wait at least 1 second for the command-complete event, before
> > resending the command or sending a new command.
> > 
> > In the failure case, the first PCIe hotplug command firmware received
> > is from get_port_device_capability(), which sends command to disable
> > PCIe hotplug interrupts without waiting for its completion, and the
> > second command comes from pcie_enable_notification() of pciehp driver,
> > which enables hotplug interrupts again.
> > 
> > One solution is to add the necessary delay after the first command [1],
> > while Lukas proposed an optimization that if the pciehp driver will be
> > loaded soon and handle the interrupts, then the hotplug and the wait
> > are not needed and can be saved, for every root port.
> 
> I think above part of the commit message might need some rewording.
> 
> The way you are fixing this issue is to make the first command conditional
> on hotplug driver not enabled. That way you can skip one of these
> commands in both enable/disable hotplug driver case.
> 
> I also recommend adding some info about why making this change
> should not affect the original intention of commit # 2bd50dd800b5

Sure. Will add.

> Code wise looks fine.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com>
 
Thank you!

- Feng

> > 
> > So fix it by only disabling the hotplug interrupts when pciehp driver
> > is not enabled.
> > 
> > [1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@linux.alibaba.com/t/#u
> > 
> > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> > Changelog:
> > 
> >    since v3:
> >      * Separate this patch from patches dealing with irq storm in nomsi case
> >      * Take Lukas's suggestion (Lukas Wunner)
> > 
> >    since v2:
> >      * Add patch 0001, which move the waiting logic of pcie_poll_cmd from pciehp
> >        driver to PCIe port driver for code reuse (Bjorn Helgaas)
> >      * Separate Lucas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)
> >      * Avoid hotplug command waiting for HW without command-complete
> >        event support (Bjorn Helgaas)
> >      * Fix spell issue in commit log (Bjorn and Markus)
> >      * Add cover-letter for whole patchset (Markus Elfring)
> >      * Handle a set-but-unused build warning (0Day lkp bot)
> > 
> >    since v1:
> >      * Add the Originally-by for Liguang for patch 0002. The issue was found on
> >        a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tar ball
> >        without git info to debug the issue, and made the patch. Thanks to Guanghui
> >        who recently pointed me to tree https://gitee.com/anolis/cloud-kernel which
> >        show the wait logic in 5.10 was originally from Liguang, and never hit
> >        mainline.
> >      * Make the irq disabling not dependent on wthether pciehp service driver
> >        will be loaded (Lukas Wunner)
> >      * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
> >        Kuppuswamy)
> >      * Fix wrong email address (Markus Elfring)
> >      * Add logic to skip irq disabling if it is already disabled.
> > 
> > 
> >   drivers/pci/pcie/portdrv.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 02e73099bad0..e8318fd5f6ed 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -228,10 +228,12 @@ static int get_port_device_capability(struct pci_dev *dev)
> >   		/*
> >   		 * Disable hot-plug interrupts in case they have been enabled
> > -		 * by the BIOS and the hot-plug service driver is not loaded.
> > +		 * by the BIOS and the hot-plug service driver won't be loaded
> I think you can use "is not enabled" instead of won't be loaded.
> 
> > +		 * to handle them.
> >   		 */
> > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +		if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> > +			pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > +				PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> >   	}
> >   #ifdef CONFIG_PCIEAER
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed.
  2025-03-03  2:36 [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed Feng Tang
  2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
@ 2025-03-03  8:54 ` Lukas Wunner
  2025-03-04 23:17 ` Bjorn Helgaas
  2 siblings, 0 replies; 5+ messages in thread
From: Lukas Wunner @ 2025-03-03  8:54 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, rafael, Markus Elfring, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Mon, Mar 03, 2025 at 10:36:30AM +0800, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
> 
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
> 
> One solution is to add the necessary delay after the first command [1],
> while Lukas proposed an optimization that if the pciehp driver will be
> loaded soon and handle the interrupts, then the hotplug and the wait
> are not needed and can be saved, for every root port.
> 
> So fix it by only disabling the hotplug interrupts when pciehp driver
> is not enabled.
> 
> [1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@linux.alibaba.com/t/#u
> 
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed.
  2025-03-03  2:36 [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed Feng Tang
  2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
  2025-03-03  8:54 ` Lukas Wunner
@ 2025-03-04 23:17 ` Bjorn Helgaas
  2 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2025-03-04 23:17 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael, Markus Elfring,
	Jonathan Cameron, ilpo.jarvinen, linux-pci, linux-kernel

On Mon, Mar 03, 2025 at 10:36:30AM +0800, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
> 
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
> 
> One solution is to add the necessary delay after the first command [1],
> while Lukas proposed an optimization that if the pciehp driver will be
> loaded soon and handle the interrupts, then the hotplug and the wait
> are not needed and can be saved, for every root port.
> 
> So fix it by only disabling the hotplug interrupts when pciehp driver
> is not enabled.
> 
> [1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@linux.alibaba.com/t/#u
> 
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>

Applied to pci/hotplug for v6.15, thanks!

> ---
> Changelog:
> 
>   since v3:
>     * Separate this patch from patches dealing with irq storm in nomsi case
>     * Take Lukas's suggestion (Lukas Wunner)  
> 
>   since v2:
>     * Add patch 0001, which move the waiting logic of pcie_poll_cmd from pciehp
>       driver to PCIe port driver for code reuse (Bjorn Helgaas)
>     * Separate Lucas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)  
>     * Avoid hotplug command waiting for HW without command-complete
>       event support (Bjorn Helgaas)
>     * Fix spell issue in commit log (Bjorn and Markus)
>     * Add cover-letter for whole patchset (Markus Elfring)
>     * Handle a set-but-unused build warning (0Day lkp bot)
> 
>   since v1:
>     * Add the Originally-by for Liguang for patch 0002. The issue was found on
>       a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tar ball
>       without git info to debug the issue, and made the patch. Thanks to Guanghui
>       who recently pointed me to tree https://gitee.com/anolis/cloud-kernel which
>       show the wait logic in 5.10 was originally from Liguang, and never hit
>       mainline.
>     * Make the irq disabling not dependent on wthether pciehp service driver
>       will be loaded (Lukas Wunner) 
>     * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
>       Kuppuswamy)
>     * Fix wrong email address (Markus Elfring)
>     * Add logic to skip irq disabling if it is already disabled.
> 
> 
>  drivers/pci/pcie/portdrv.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..e8318fd5f6ed 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -228,10 +228,12 @@ static int get_port_device_capability(struct pci_dev *dev)
>  
>  		/*
>  		 * Disable hot-plug interrupts in case they have been enabled
> -		 * by the BIOS and the hot-plug service driver is not loaded.
> +		 * by the BIOS and the hot-plug service driver won't be loaded
> +		 * to handle them.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> +			pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +				PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>  	}
>  
>  #ifdef CONFIG_PCIEAER
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-03-04 23:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-03  2:36 [PATCH v4] PCI/portdrv: Only disable hotplug interrupts early when needed Feng Tang
2025-03-03  3:35 ` Sathyanarayanan Kuppuswamy
2025-03-03  5:55   ` Feng Tang
2025-03-03  8:54 ` Lukas Wunner
2025-03-04 23:17 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox