* [PATCH v2 2/2] PCI: Disable PCIE hotplug interrupts early when msi is disabled
2025-02-18 3:48 [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
@ 2025-02-18 3:48 ` Feng Tang
2025-02-18 9:00 ` [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Markus Elfring
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Feng Tang @ 2025-02-18 3:48 UTC (permalink / raw)
To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
Liguang Zhang, Guanghui Feng, rafael
Cc: Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
linux-kernel, Feng Tang
An IRQ storm was seen when testing the "pci=nomsi" case. The root
cause: 'nomsi' disables MSI, so devices and root ports fall back to
legacy INTx interrupts, and several devices/ports are likely to share
one interrupt line. In the failure case, the BIOS doesn't disable the
PCIe hotplug interrupts, and actually asserts the command-complete interrupt.
So the timeline is:
1. pciehp's CCIE/HPIE enabled and command-complete interrupts asserted
2. the interrupt is shared by pcie root port and nvme/nic device
3. nvme/nic driver's probe function enables the interrupt line
4. pciehp driver is loaded later or not loaded
And the "nobody cared" IRQ storm happens between 3 and 4. This is not
an issue in the normal MSI case, as each interrupt is controlled by its
own driver. When the driver is not loaded, the interrupt won't be
delivered to the kernel even if it is physically asserted.
So disable the PCIe hotplug CCIE/HPIE interrupts in the early boot phase
when MSI is not enabled.
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:
Since v1:
* Modify the commit log
drivers/pci/probe.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b6536ed599c3..10d72156da9a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1664,6 +1664,15 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
if (reg32 & PCI_EXP_SLTCAP_HPC)
pdev->is_hotplug_bridge = 1;
+
+ /*
+ * When MSI is disabled, the root port will use legacy INTx, and likely
+ * share the INTx interrupt line with other devices like NIC/NVMe. There
+ * was a real-world issue where the CCIE IRQ is asserted after boot, but
+ * is not handled and causes an IRQ storm. So disable it early.
+ */
+ if (!pci_msi_enabled())
+ pcie_disable_hp_interrupts_early(pdev);
}
static void set_pcie_thunderbolt(struct pci_dev *dev)
--
2.43.5
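
For illustration only, the timeline in the commit message can be modeled in user space. The sketch below is not kernel code: struct model_port, port_asserts_intx(), count_unhandled() and disable_hp_interrupts() are hypothetical names, and the rule that the port drives INTx only while CC is pending with both CCIE and HPIE set is a simplifying assumption. Only the register bit values are taken from pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Slot Control / Slot Status bits, values as in pci_regs.h */
#define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */
#define PCI_EXP_SLTSTA_CC   0x0010 /* Command Completed */

/* Hypothetical model of a root port's hotplug registers. */
struct model_port {
	unsigned short sltctl;
	unsigned short sltsta;
};

/* Assumption: the port drives the shared level-triggered INTx line while a
 * command-complete event is pending and both enable bits are set. */
bool port_asserts_intx(const struct model_port *p)
{
	return (p->sltsta & PCI_EXP_SLTSTA_CC) &&
	       (p->sltctl & PCI_EXP_SLTCTL_CCIE) &&
	       (p->sltctl & PCI_EXP_SLTCTL_HPIE);
}

/* Hypothetical ISR type: returns true if the handler's own device
 * raised the interrupt. */
typedef bool (*model_handler)(void);

/* The NVMe/NIC driver shares the line but has nothing pending. */
bool nvme_handler(void)
{
	return false;
}

/* Deliver the line up to max_rounds times and count the rounds in which the
 * line was asserted but no registered handler claimed it ("nobody cared"). */
int count_unhandled(const struct model_port *p, const model_handler *handlers,
		    size_t n, int max_rounds)
{
	int unhandled = 0;

	for (int round = 0; round < max_rounds; round++) {
		bool claimed = false;

		if (!port_asserts_intx(p))
			break;	/* line deasserted, nothing more to deliver */
		for (size_t i = 0; i < n; i++)
			claimed |= handlers[i]();
		if (!claimed)
			unhandled++;
	}
	return unhandled;
}

/* The idea of the patch: clear CCIE/HPIE before a sharer enables the line. */
void disable_hp_interrupts(struct model_port *p)
{
	p->sltctl &= (unsigned short)~(PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
}
```

With pciehp not loaded, no handler ever clears the pending event, so every delivery goes unclaimed. Once the enable bits are cleared, the modeled line is no longer asserted.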
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 3:48 [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
2025-02-18 3:48 ` [PATCH v2 2/2] PCI: Disable PCIE hotplug interrupts early when msi is disabled Feng Tang
@ 2025-02-18 9:00 ` Markus Elfring
2025-02-19 2:19 ` Feng Tang
2025-02-18 18:58 ` Sathyanarayanan Kuppuswamy
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Markus Elfring @ 2025-02-18 9:00 UTC (permalink / raw)
To: Feng Tang, linux-pci, Bjorn Helgaas, Guanghui Feng, Liguang Zhang,
Lukas Wunner, Rafael J. Wysocki, Sathyanarayanan Kuppuswamy
Cc: LKML, Ilpo Järvinen, Jonathan Cameron
> There was problem reported by firmware developers that they received
> 2 pcie link control commands in very short intervals on an ARM server,
> which doesn't comply with pcie spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
Would you like to use key words in consistent ways also in such a change description?
> to wait at least 1 second for the command-complete event, before
> resending the cmd or …
command?
…
> ---
> Changelog:
>
> since v1:
…
Are cover letters generally desirable for patch series?
Regards,
Markus
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 9:00 ` [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Markus Elfring
@ 2025-02-19 2:19 ` Feng Tang
0 siblings, 0 replies; 10+ messages in thread
From: Feng Tang @ 2025-02-19 2:19 UTC (permalink / raw)
To: Markus Elfring
Cc: linux-pci, Bjorn Helgaas, Guanghui Feng, Liguang Zhang,
Lukas Wunner, Rafael J. Wysocki, Sathyanarayanan Kuppuswamy, LKML,
Ilpo Järvinen, Jonathan Cameron
Hi Markus,
On Tue, Feb 18, 2025 at 10:00:33AM +0100, Markus Elfring wrote:
> > There was problem reported by firmware developers that they received
> > 2 pcie link control commands in very short intervals on an ARM server,
> > which doesn't comply with pcie spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
>
> Would you like to use key words in consistent ways also in such a change description?
Will do, thanks.
>
> > to wait at least 1 second for the command-complete event, before
> > resending the cmd or …
>
> command?
Yes.
> …
> > ---
> > Changelog:
> >
> > since v1:
> …
>
> Are cover letters generally desirable for patch series?
The two patches solve different issues and are not logically related.
But I'll try in the next version.
Thanks,
Feng
> Regards,
> Markus
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 3:48 [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
2025-02-18 3:48 ` [PATCH v2 2/2] PCI: Disable PCIE hotplug interrupts early when msi is disabled Feng Tang
2025-02-18 9:00 ` [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Markus Elfring
@ 2025-02-18 18:58 ` Sathyanarayanan Kuppuswamy
2025-02-19 6:53 ` Feng Tang
2025-02-18 22:33 ` Bjorn Helgaas
2025-02-19 5:57 ` kernel test robot
4 siblings, 1 reply; 10+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-02-18 18:58 UTC (permalink / raw)
To: Feng Tang, Bjorn Helgaas, Lukas Wunner, Liguang Zhang,
Guanghui Feng, rafael
Cc: Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
linux-kernel
On 2/17/25 7:48 PM, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> 2 pcie link control commands in very short intervals on an ARM server,
> which doesn't comply with pcie spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the cmd or sending a new cmd.
>
> And the first link control command firmware received is from
> get_port_device_capability(), which sends cmd to disable pcie hotplug
> interrupts without waiting for its completion.
Were you able to narrow down the source of the second command? The
reason I am asking is that the commit you are trying to fix seems to
have existed for 10+ years and no one has faced any issues with it. So
I am wondering whether this needs to be fixed at this place or before
executing the second command.
>
> Fix it by adding the necessary wait to comply with PCIe spec, referring
> pcie_poll_cmd().
>
> Also make the interrupt disabling not dependent on whether pciehp
> service driver will be loaded as suggested by Lukas.
Maybe this needs a new patch?
>
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
Code-wise, it looks fine to me.
> Changelog:
>
> since v1:
> * Add the Originally-by for Liguang. The issue was found on a 5.10 kernel,
> then 6.6. I was initially given a 5.10 kernel tarball without git info to
> debug the issue, and made the patch. Thanks to Guanghui who recently pointed
> me to tree https://gitee.com/anolis/cloud-kernel which shows the wait logic
> in 5.10 was originally from Liguang, and never hit mainline.
> * Make the irq disabling not dependent on whether pciehp service driver
> will be loaded (Lukas Wunner)
> * Use read_poll_timeout() API to simplify the waiting logic (Sathyanarayanan
> Kuppuswamy)
> * Add logic to skip irq disabling if it is already disabled.
>
> drivers/pci/pci.h | 2 ++
> drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++-----
> 2 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 01e51db8d285..c1e234d1b81d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> #ifdef CONFIG_PCIEPORTBUS
> void pcie_reset_lbms_count(struct pci_dev *port);
> int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> #else
> static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> {
> return -EOPNOTSUPP;
> }
> +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> #endif
>
> struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..2470333bba2f 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
> #include <linux/string.h>
> #include <linux/slab.h>
> #include <linux/aer.h>
> +#include <linux/iopoll.h>
>
> #include "../pci.h"
> #include "portdrv.h"
> @@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> return 0;
> }
>
> +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
> +{
> + u16 slot_status = 0;
> + int ret, ret1, timeout_us;
> +
> + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
> + timeout_us = 1000000;
> + ret = read_poll_timeout(pcie_capability_read_word, ret1,
> + (slot_status & PCI_EXP_SLTSTA_CC), 10000,
> + timeout_us, true, dev, PCI_EXP_SLTSTA,
> + &slot_status);
> + if (!ret)
> + pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> + PCI_EXP_SLTSTA_CC);
> +
> + return ret;
> +}
> +
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> +{
> + u16 slot_ctrl = 0;
> +
> + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> + /* Bail out early if it is already disabled */
> + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> + return;
> +
> + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +
> + if (pcie_wait_sltctl_cmd_raw(dev))
> + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
> +}
> +
> /**
> * get_port_device_capability - discover capabilities of a PCI Express port
> * @dev: PCI Express port to examine
> @@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev)
>
> if (dev->is_hotplug_bridge &&
> (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> - pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> - (pcie_ports_native || host->native_pcie_hotplug)) {
> - services |= PCIE_PORT_SERVICE_HP;
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> + if (pcie_ports_native || host->native_pcie_hotplug)
> + services |= PCIE_PORT_SERVICE_HP;
>
> /*
> * Disable hot-plug interrupts in case they have been enabled
> * by the BIOS and the hot-plug service driver is not loaded.
> */
> - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> + pcie_disable_hp_interrupts_early(dev);
> }
>
> #ifdef CONFIG_PCIEAER
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 18:58 ` Sathyanarayanan Kuppuswamy
@ 2025-02-19 6:53 ` Feng Tang
0 siblings, 0 replies; 10+ messages in thread
From: Feng Tang @ 2025-02-19 6:53 UTC (permalink / raw)
To: Sathyanarayanan Kuppuswamy
Cc: Bjorn Helgaas, Lukas Wunner, Liguang Zhang, Guanghui Feng, rafael,
Markus Elfring, Jonathan Cameron, ilpo.jarvinen, linux-pci,
linux-kernel
Hi Sathyanarayanan,
On Tue, Feb 18, 2025 at 10:58:19AM -0800, Sathyanarayanan Kuppuswamy wrote:
>
> On 2/17/25 7:48 PM, Feng Tang wrote:
> > There was problem reported by firmware developers that they received
> > 2 pcie link control commands in very short intervals on an ARM server,
> > which doesn't comply with pcie spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> > to wait at least 1 second for the command-complete event, before
> > resending the cmd or sending a new cmd.
> >
> > And the first link control command firmware received is from
> > get_port_device_capability(), which sends cmd to disable pcie hotplug
> > interrupts without waiting for its completion.
>
> Were you able to narrow down the source of the second command? The
> reason I am asking is, the commit you are trying to fix seems to have
> existed for 10+ years and no one had faced any issues with it. So
> I am wondering whether this needs to fixed at this place or before
> executing the second command.
The second command comes from pcie_enable_notification(), which in our
case sends a command to enable the hotplug interrupt again.
The firmware developer found the problem when handling a device
hotplug case, a kind of stress test.
I think it's better to add the wait after this first command,
which follows the PCIe spec naturally. Also, the v2 patch adds logic
to skip the interrupt-disabling command if the interrupts were already
disabled earlier, either by the kernel or the BIOS.
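
The skip-then-wait behavior described above can be sketched in user space as follows. This is a minimal model under stated assumptions, not the patch itself: struct mock_port, mock_read_sltsta() and the enum are hypothetical, the poll budget stands in for the 1 second timeout (roughly 100 polls at 10 ms in the quoted patch), and the mock never sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in pci_regs.h */
#define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */
#define PCI_EXP_SLTSTA_CC   0x0010 /* Command Completed */

/* Hypothetical mock of a hotplug port; not a kernel structure. */
struct mock_port {
	uint16_t sltctl;
	uint16_t sltsta;
	int polls_until_cc;	/* status reads without CC before it is set; < 0 = never */
	int writes;		/* Slot Control writes (commands) issued */
};

enum disable_result { ALREADY_DISABLED, DISABLED, TIMED_OUT };

/* Mock Slot Status read: CC shows up once polls_until_cc reaches zero. */
uint16_t mock_read_sltsta(struct mock_port *p)
{
	if (p->polls_until_cc == 0)
		p->sltsta |= PCI_EXP_SLTSTA_CC;
	else if (p->polls_until_cc > 0)
		p->polls_until_cc--;
	return p->sltsta;
}

enum disable_result disable_hp_interrupts_early(struct mock_port *p,
						int max_polls)
{
	const uint16_t mask = PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE;

	/* Nothing enabled: skip the write, so no new command is issued. */
	if (!(p->sltctl & mask))
		return ALREADY_DISABLED;

	p->sltctl &= (uint16_t)~mask;
	p->writes++;

	/* Bounded wait for Command Completed before the next command. */
	for (int i = 0; i < max_polls; i++) {
		if (mock_read_sltsta(p) & PCI_EXP_SLTSTA_CC) {
			/* Models the write-1-to-clear of the CC status bit. */
			p->sltsta &= (uint16_t)~PCI_EXP_SLTSTA_CC;
			return DISABLED;
		}
	}
	return TIMED_OUT;
}
```

A second call on the same port returns ALREADY_DISABLED without touching Slot Control, which is the case where a later pcie_enable_notification() would not find a command still in flight.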
>
> >
> > Fix it by adding the necessary wait to comply with PCIe spec, referring
> > pcie_poll_cmd().
> >
> > Also make the interrupt disabling not dependent on whether pciehp
> > service driver will be loaded as suggested by Lukas.
>
> May be this needs a new patch?
Yes, will do.
> >
> > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> > Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> > Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
>
> Code wise it looks fine to me.
Thanks for the review!
- Feng
> > Changelog:
> >
> > since v1:
> > * Add the Originally-by for Liguang. The issue was found on a 5.10 kernel,
> > then 6.6. I was initially given a 5.10 kernel tarball without git info to
> > debug the issue, and made the patch. Thanks to Guanghui who recently pointed
> > me to tree https://gitee.com/anolis/cloud-kernel which shows the wait logic
> > in 5.10 was originally from Liguang, and never hit mainline.
> > * Make the irq disabling not dependent on whether pciehp service driver
> > will be loaded (Lukas Wunner)
> > * Use read_poll_timeout() API to simplify the waiting logic (Sathyanarayanan
> > Kuppuswamy)
> > * Add logic to skip irq disabling if it is already disabled.
> >
> > drivers/pci/pci.h | 2 ++
> > drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++-----
> > 2 files changed, 41 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 01e51db8d285..c1e234d1b81d 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> > #ifdef CONFIG_PCIEPORTBUS
> > void pcie_reset_lbms_count(struct pci_dev *port);
> > int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> > #else
> > static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> > static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> > {
> > return -EOPNOTSUPP;
> > }
> > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> > #endif
> > struct pci_dev_reset_methods {
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 02e73099bad0..2470333bba2f 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -18,6 +18,7 @@
> > #include <linux/string.h>
> > #include <linux/slab.h>
> > #include <linux/aer.h>
> > +#include <linux/iopoll.h>
> > #include "../pci.h"
> > #include "portdrv.h"
> > @@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> > return 0;
> > }
> > +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
> > +{
> > + u16 slot_status = 0;
> > + int ret, ret1, timeout_us;
> > +
> > + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
> > + timeout_us = 1000000;
> > + ret = read_poll_timeout(pcie_capability_read_word, ret1,
> > + (slot_status & PCI_EXP_SLTSTA_CC), 10000,
> > + timeout_us, true, dev, PCI_EXP_SLTSTA,
> > + &slot_status);
> > + if (!ret)
> > + pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> > + PCI_EXP_SLTSTA_CC);
> > +
> > + return ret;
> > +}
> > +
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> > +{
> > + u16 slot_ctrl = 0;
> > +
> > + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> > + /* Bail out early if it is already disabled */
> > + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> > + return;
> > +
> > + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +
> > + if (pcie_wait_sltctl_cmd_raw(dev))
> > + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
> > +}
> > +
> > /**
> > * get_port_device_capability - discover capabilities of a PCI Express port
> > * @dev: PCI Express port to examine
> > @@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev)
> > if (dev->is_hotplug_bridge &&
> > (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > - pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> > - (pcie_ports_native || host->native_pcie_hotplug)) {
> > - services |= PCIE_PORT_SERVICE_HP;
> > + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > + if (pcie_ports_native || host->native_pcie_hotplug)
> > + services |= PCIE_PORT_SERVICE_HP;
> > /*
> > * Disable hot-plug interrupts in case they have been enabled
> > * by the BIOS and the hot-plug service driver is not loaded.
> > */
> > - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > + pcie_disable_hp_interrupts_early(dev);
> > }
> > #ifdef CONFIG_PCIEAER
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 3:48 [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
` (2 preceding siblings ...)
2025-02-18 18:58 ` Sathyanarayanan Kuppuswamy
@ 2025-02-18 22:33 ` Bjorn Helgaas
2025-02-19 2:53 ` Feng Tang
2025-02-19 5:57 ` kernel test robot
4 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2025-02-18 22:33 UTC (permalink / raw)
To: Feng Tang
Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
Liguang Zhang, Guanghui Feng, rafael, Markus Elfring,
Jonathan Cameron, ilpo.jarvinen, linux-pci, linux-kernel
On Tue, Feb 18, 2025 at 11:48:58AM +0800, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> 2 pcie link control commands in very short intervals on an ARM server,
> which doesn't comply with pcie spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the cmd or sending a new cmd.
s/link control/hotplug/ (also below)
s/2/two/
s/pcie/PCIe/ (also below)
> And the first link control command firmware received is from
> get_port_device_capability(), which sends cmd to disable pcie hotplug
> interrupts without waiting for its completion.
>
> Fix it by adding the necessary wait to comply with PCIe spec, referring
> pcie_poll_cmd().
>
> Also make the interrupt disabling not dependent on whether pciehp
> service driver will be loaded as suggested by Lukas.
This sounds like maybe it should be two separate patches.
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
> Changelog:
>
> since v1:
> * Add the Originally-by for Liguang. The issue was found on a 5.10 kernel,
> then 6.6. I was initially given a 5.10 kernel tarball without git info to
> debug the issue, and made the patch. Thanks to Guanghui who recently pointed
> me to tree https://gitee.com/anolis/cloud-kernel which shows the wait logic
> in 5.10 was originally from Liguang, and never hit mainline.
> * Make the irq disabling not dependent on whether pciehp service driver
> will be loaded (Lukas Wunner)
> * Use read_poll_timeout() API to simplify the waiting logic (Sathyanarayanan
> Kuppuswamy)
> * Add logic to skip irq disabling if it is already disabled.
>
> drivers/pci/pci.h | 2 ++
> drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++-----
> 2 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 01e51db8d285..c1e234d1b81d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> #ifdef CONFIG_PCIEPORTBUS
> void pcie_reset_lbms_count(struct pci_dev *port);
> int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> #else
> static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> {
> return -EOPNOTSUPP;
> }
> +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> #endif
>
> struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..2470333bba2f 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
> #include <linux/string.h>
> #include <linux/slab.h>
> #include <linux/aer.h>
> +#include <linux/iopoll.h>
>
> #include "../pci.h"
> #include "portdrv.h"
> @@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> return 0;
> }
>
> +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
> +{
> + u16 slot_status = 0;
> + int ret, ret1, timeout_us;
> +
> + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
> + timeout_us = 1000000;
> + ret = read_poll_timeout(pcie_capability_read_word, ret1,
> + (slot_status & PCI_EXP_SLTSTA_CC), 10000,
> + timeout_us, true, dev, PCI_EXP_SLTSTA,
> + &slot_status);
> + if (!ret)
> + pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> + PCI_EXP_SLTSTA_CC);
> +
> + return ret;
Ugh. I really don't like the way this basically duplicates
pcie_poll_cmd(). I don't have a great suggestion to fix it; maybe we
need a way to build part of pciehp unconditionally. At the very least
we need a comment here pointing to pcie_poll_cmd().
And IIUC this will add a one second delay for ports that don't need
command completed events. I don't think that's fair to those ports.
> +}
> +
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> +{
> + u16 slot_ctrl = 0;
> +
> + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> + /* Bail out early if it is already disabled */
> + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> + return;
> +
> + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +
> + if (pcie_wait_sltctl_cmd_raw(dev))
> + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
s/PCIE/PCIe/
> +}
> +
> /**
> * get_port_device_capability - discover capabilities of a PCI Express port
> * @dev: PCI Express port to examine
> @@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev)
>
> if (dev->is_hotplug_bridge &&
> (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> - pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> - (pcie_ports_native || host->native_pcie_hotplug)) {
> - services |= PCIE_PORT_SERVICE_HP;
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> + if (pcie_ports_native || host->native_pcie_hotplug)
> + services |= PCIE_PORT_SERVICE_HP;
>
> /*
> * Disable hot-plug interrupts in case they have been enabled
> * by the BIOS and the hot-plug service driver is not loaded.
> */
> - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> + pcie_disable_hp_interrupts_early(dev);
> }
>
> #ifdef CONFIG_PCIEAER
> --
> 2.43.5
>
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 22:33 ` Bjorn Helgaas
@ 2025-02-19 2:53 ` Feng Tang
2025-02-19 11:12 ` Feng Tang
0 siblings, 1 reply; 10+ messages in thread
From: Feng Tang @ 2025-02-19 2:53 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
Liguang Zhang, Guanghui Feng, rafael, Markus Elfring,
Jonathan Cameron, ilpo.jarvinen, linux-pci, linux-kernel
Hi Bjorn Helgaas,
Thanks for the review!
On Tue, Feb 18, 2025 at 04:33:54PM -0600, Bjorn Helgaas wrote:
> On Tue, Feb 18, 2025 at 11:48:58AM +0800, Feng Tang wrote:
> > There was problem reported by firmware developers that they received
> > 2 pcie link control commands in very short intervals on an ARM server,
> > which doesn't comply with pcie spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> > to wait at least 1 second for the command-complete event, before
> > resending the cmd or sending a new cmd.
>
> s/link control/hotplug/ (also below)
> s/2/two/
> s/pcie/PCIe/ (also below)
Will follow.
> > And the first link control command firmware received is from
> > get_port_device_capability(), which sends cmd to disable pcie hotplug
> > interrupts without waiting for its completion.
> >
> > Fix it by adding the necessary wait to comply with PCIe spec, referring
> > pcie_poll_cmd().
> >
> > Also make the interrupt disabling not dependent on whether pciehp
> > service driver will be loaded as suggested by Lukas.
>
> This sounds like maybe it should be two separate patches.
OK.
[...]
> >
> > +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
> > +{
> > + u16 slot_status = 0;
> > + int ret, ret1, timeout_us;
> > +
> > + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
> > + timeout_us = 1000000;
> > + ret = read_poll_timeout(pcie_capability_read_word, ret1,
> > + (slot_status & PCI_EXP_SLTSTA_CC), 10000,
> > + timeout_us, true, dev, PCI_EXP_SLTSTA,
> > + &slot_status);
> > + if (!ret)
> > + pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> > + PCI_EXP_SLTSTA_CC);
> > +
> > + return ret;
>
> Ugh. I really don't like the way this basically duplicates
> pcie_poll_cmd(). I don't have a great suggestion to fix it; maybe we
> need a way to build part of pciehp unconditionally.
Yes, I also thought about this. One idea is to unify the two functions,
and let pcie_poll_cmd() reuse this pcie_wait_sltctl_cmd_raw() here in
portdrv.c. As CONFIG_HOTPLUG_PCI_PCIE depends on CONFIG_PCIEPORTBUS,
there should be no dependency issue. What do you think?
> At the very least
> we need a comment here pointing to pcie_poll_cmd().
Aha, I mentioned 'referring pcie_poll_cmd()' in commit log, but forgot
to add it here.
>
> And IIUC this will add a one second delay for ports that don't need
> command completed events. I don't think that's fair to those ports.
Good catch! So we should add a read of PCI_EXP_SLTCAP register and
check if PCI_EXP_SLTCAP_HPC bit is set.
> > +}
> > +
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> > +{
> > + u16 slot_ctrl = 0;
> > +
> > + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> > + /* Bail out early if it is already disabled */
> > + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> > + return;
> > +
> > + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +
> > + if (pcie_wait_sltctl_cmd_raw(dev))
> > + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
>
> s/PCIE/PCIe/
Will change.
Thanks,
Feng
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-19 2:53 ` Feng Tang
@ 2025-02-19 11:12 ` Feng Tang
0 siblings, 0 replies; 10+ messages in thread
From: Feng Tang @ 2025-02-19 11:12 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
Liguang Zhang, Guanghui Feng, rafael, Markus Elfring,
Jonathan Cameron, ilpo.jarvinen, linux-pci, linux-kernel
On Wed, Feb 19, 2025 at 10:53:44AM +0800, Feng Tang wrote:
[...]
> >
> > And IIUC this will add a one second delay for ports that don't need
> > command completed events. I don't think that's fair to those ports.
>
> Good catch! So we should add a read of PCI_EXP_SLTCAP register and
> check if PCI_EXP_SLTCAP_HPC bit is set.
Maybe something like this?
	if ((slot_cap & PCI_EXP_SLTCAP_HPC) &&
	    !(slot_cap & PCI_EXP_SLTCAP_NCCS) &&
	    !pdev->broken_cmd_compl)
		do_the_wait();
Thanks,
Feng
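
The proposed check can be written as a small standalone predicate, which makes the three cases easy to test in user space. This is only a sketch: needs_cmd_complete_wait() is a hypothetical name, the broken_cmd_compl flag is passed in as a plain bool instead of being read from struct pci_dev, and only the bit values come from pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Slot Capabilities bits, values as in pci_regs.h */
#define PCI_EXP_SLTCAP_HPC  0x00000040 /* Hot-Plug Capable */
#define PCI_EXP_SLTCAP_NCCS 0x00040000 /* No Command Completed Support */

/* Hypothetical helper: should software wait for Command Completed
 * after writing Slot Control on this port? */
bool needs_cmd_complete_wait(uint32_t slot_cap, bool broken_cmd_compl)
{
	/* Not hot-plug capable: no hot-plug controller to wait for. */
	if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
		return false;
	/* The port advertises that it never signals Command Completed. */
	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
		return false;
	/* Quirked ports are known not to signal completion reliably. */
	return !broken_cmd_compl;
}
```

Only a hot-plug capable port without NCCS and without the quirk flag would pay the up-to-one-second wait, which addresses the concern about penalizing ports that do not need command completed events.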
* Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
2025-02-18 3:48 [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
` (3 preceding siblings ...)
2025-02-18 22:33 ` Bjorn Helgaas
@ 2025-02-19 5:57 ` kernel test robot
4 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-02-19 5:57 UTC (permalink / raw)
To: Feng Tang, Bjorn Helgaas, Lukas Wunner,
Sathyanarayanan Kuppuswamy, Liguang Zhang, Guanghui Feng, rafael
Cc: oe-kbuild-all, Markus Elfring, Jonathan Cameron, ilpo.jarvinen,
linux-pci, linux-kernel, Feng Tang
Hi Feng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus linus/master v6.14-rc3 next-20250218]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Feng-Tang/PCI-Disable-PCIE-hotplug-interrupts-early-when-msi-is-disabled/20250218-115134
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20250218034859.40397-1-feng.tang%40linux.alibaba.com
patch subject: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
config: csky-randconfig-002-20250219 (https://download.01.org/0day-ci/archive/20250219/202502191308.uQbXkZna-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250219/202502191308.uQbXkZna-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502191308.uQbXkZna-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/pci/pcie/portdrv.c: In function 'pcie_wait_sltctl_cmd_raw':
>> drivers/pci/pcie/portdrv.c:212:18: warning: variable 'ret1' set but not used [-Wunused-but-set-variable]
212 | int ret, ret1, timeout_us;
| ^~~~
vim +/ret1 +212 drivers/pci/pcie/portdrv.c
208
209 static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
210 {
211 u16 slot_status = 0;
> 212 int ret, ret1, timeout_us;
213
214 /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
215 timeout_us = 1000000;
216 ret = read_poll_timeout(pcie_capability_read_word, ret1,
217 (slot_status & PCI_EXP_SLTSTA_CC), 10000,
218 timeout_us, true, dev, PCI_EXP_SLTSTA,
219 &slot_status);
220 if (!ret)
221 pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
222 PCI_EXP_SLTSTA_CC);
223
224 return ret;
225 }
226
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
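
One possible way to address the W=1 warning, sketched in user space under stated assumptions: let the polling condition also test the return value of the read, so the variable is used and a failed read stops the loop. POLL_UNTIL, fake_read_word() and wait_cmd_complete() are hypothetical stand-ins; POLL_UNTIL only loosely mimics read_poll_timeout() and does not sleep. The thread does not say which fix was chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_SLTSTA_CC 0x0010 /* Command Completed, value as in pci_regs.h */

/* Hypothetical polling macro: assigns op(args) to val on each iteration and
 * stops when cond becomes true or max_polls iterations have run. */
#define POLL_UNTIL(op, val, cond, max_polls, timed_out, ...)	\
	do {							\
		int polls_left_ = (max_polls);			\
		(timed_out) = true;				\
		while (polls_left_-- > 0) {			\
			(val) = op(__VA_ARGS__);		\
			if (cond) {				\
				(timed_out) = false;		\
				break;				\
			}					\
		}						\
	} while (0)

/* Hypothetical fake device standing in for a config-space read. */
struct fake_dev {
	uint16_t sltsta;
	int read_err;	/* nonzero makes every read fail with this code */
	int reads;	/* number of reads performed */
};

int fake_read_word(struct fake_dev *d, uint16_t *val)
{
	d->reads++;
	*val = d->read_err ? 0 : d->sltsta;
	return d->read_err;
}

/* Returns 0 when CC was seen, -1 on a read failure, -2 on timeout. */
int wait_cmd_complete(struct fake_dev *d)
{
	uint16_t slot_status = 0;
	int read_ret = 0;
	bool timed_out;

	/* read_ret is part of the condition, so it is no longer
	 * "set but not used", and a failed read ends the polling early. */
	POLL_UNTIL(fake_read_word, read_ret,
		   read_ret || (slot_status & PCI_EXP_SLTSTA_CC),
		   100, timed_out, d, &slot_status);
	if (read_ret)
		return -1;
	return timed_out ? -2 : 0;
}
```

Whether a failed read should abort the wait or keep polling is a design choice; the sketch aborts so that an inaccessible device does not consume the whole timeout.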