linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
@ 2023-08-24  4:46 Kai-Heng Feng
  2023-08-24  4:46 ` [PATCH 2/3] PCI/AER: Disable AER service on suspend Kai-Heng Feng
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-24  4:46 UTC (permalink / raw)
  To: bhelgaas
  Cc: mika.westerberg, koba.ko, sathyanarayanan.kuppuswamy,
	Kai-Heng Feng, linux-pci, linux-kernel

In addition to nearest upstream bridge, driver may want to know if the
entire hierarchy can be powered off to perform different action.

So walk higher up the hierarchy to find out if any device has valid
_PR3.

The user will be introduced in next patch.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/pci.c   | 16 ++++++++++++++++
 include/linux/pci.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..79961936ed1b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6500,6 +6500,22 @@ bool pci_pr3_present(struct pci_dev *pdev)
 		acpi_has_method(adev->handle, "_PR3");
 }
 EXPORT_SYMBOL_GPL(pci_pr3_present);
+
+bool pci_ancestor_pr3_present(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pdev;
+
+	if (acpi_disabled)
+		return false;
+
+	while ((parent = pci_upstream_bridge(parent))) {
+		if (pci_pr3_present(pdev))
+			return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(pci_ancestor_pr3_present);
 #endif
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c69a2cc1f412..1f30c20753a3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2512,10 +2512,12 @@ struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
 void
 pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *));
 bool pci_pr3_present(struct pci_dev *pdev);
+bool pci_ancestor_pr3_present(struct pci_dev *pdev);
 #else
 static inline struct irq_domain *
 pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
 static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; }
+static inline bool pci_ancestor_pr3_present(struct pci_dev *pdev) { return false; }
 #endif
 
 #ifdef CONFIG_EEH
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] PCI/AER: Disable AER service on suspend
  2023-08-24  4:46 [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Kai-Heng Feng
@ 2023-08-24  4:46 ` Kai-Heng Feng
  2023-08-24  4:46 ` [PATCH 3/3] PCI/DPC: Disable DPC " Kai-Heng Feng
  2023-08-24 11:56 ` [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Mika Westerberg
  2 siblings, 0 replies; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-24  4:46 UTC (permalink / raw)
  To: bhelgaas
  Cc: mika.westerberg, koba.ko, sathyanarayanan.kuppuswamy,
	Kai-Heng Feng, Mahesh J Salgaonkar, Oliver O'Halloran,
	linuxppc-dev, linux-pci, linux-kernel

When the power rail gets cut off, the hardware can create some electric
noise on the link that triggers AER. If IRQ is shared between AER with
PME, such AER noise will cause a spurious wakeup on system suspend.

When the power rail gets back, the firmware of the device resets itself
and can create unexpected behavior like sending PTM messages. For this
case, the driver will always be too late to toggle off features should
be disabled.

As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power
Management", TLP and DLLP transmission are disabled for a Link in L2/L3
Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if
the power will be turned off during suspend process, disable AER service
and re-enable it during the resume process. This should not affect the
basic functionality.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295

v7:
 - Wording
 - Disable AER completely (again) if power will be turned off

v6:
v5:
 - Wording.

v4:
v3:
 - No change.

v2:
 - Only disable AER IRQ.
 - No more check on PME IRQ#.
 - Use helper.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..b5161bfedd8b 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -28,6 +28,7 @@
 #include <linux/delay.h>
 #include <linux/kfifo.h>
 #include <linux/slab.h>
+#include <linux/suspend.h>
 #include <acpi/apei.h>
 #include <ras/ras_event.h>
 
@@ -1340,6 +1341,28 @@ static int aer_probe(struct pcie_device *dev)
 	return 0;
 }
 
+static int aer_suspend(struct pcie_device *dev)
+{
+	struct aer_rpc *rpc = get_service_data(dev);
+	struct pci_dev *pdev = rpc->rpd;
+
+	if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware())
+		aer_disable_rootport(rpc);
+
+	return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+	struct aer_rpc *rpc = get_service_data(dev);
+	struct pci_dev *pdev = rpc->rpd;
+
+	if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware())
+		aer_enable_rootport(rpc);
+
+	return 0;
+}
+
 /**
  * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1411,6 +1434,8 @@ static struct pcie_port_service_driver aerdriver = {
 	.service	= PCIE_PORT_SERVICE_AER,
 
 	.probe		= aer_probe,
+	.suspend	= aer_suspend,
+	.resume		= aer_resume,
 	.remove		= aer_remove,
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] PCI/DPC: Disable DPC service on suspend
  2023-08-24  4:46 [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Kai-Heng Feng
  2023-08-24  4:46 ` [PATCH 2/3] PCI/AER: Disable AER service on suspend Kai-Heng Feng
@ 2023-08-24  4:46 ` Kai-Heng Feng
  2023-08-24 11:56 ` [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Mika Westerberg
  2 siblings, 0 replies; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-24  4:46 UTC (permalink / raw)
  To: bhelgaas
  Cc: mika.westerberg, koba.ko, sathyanarayanan.kuppuswamy,
	Kai-Heng Feng, Mahesh J Salgaonkar, Oliver O'Halloran,
	linuxppc-dev, linux-pci, linux-kernel

When the power rail gets cut off, the hardware can create some electric
noise on the link that triggers AER. If IRQ is shared between AER with
PME, such AER noise will cause a spurious wakeup on system suspend.

When the power rail gets back, the firmware of the device resets itself
and can create unexpected behavior like sending PTM messages. For this
case, the driver will always be too late to toggle off features should
be disabled.

As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power
Management", TLP and DLLP transmission are disabled for a Link in L2/L3
Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if
the power will be turned off during suspend process, disable DPC service
and re-enable it during the resume process. This should not affect the
basic functionality.

Since DPC depends on AER to function, also disable DPC.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295

v7:
 - Wording.
 - Disable DPC completely (again) if power will be turned off

v6:
v5:
 - Wording.

v4:
v3:
 - No change.

v2:
 - Only disable DPC IRQ.
 - No more check on PME IRQ#.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/pcie/dpc.c | 56 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3ceed8e3de41..73426addb2f1 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -13,6 +13,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/pci.h>
+#include <linux/suspend.h>
 
 #include "portdrv.h"
 #include "../pci.h"
@@ -347,13 +348,34 @@ void pci_dpc_init(struct pci_dev *pdev)
 	}
 }
 
+static void dpc_enable(struct pcie_device *dev)
+{
+	struct pci_dev *pdev = dev->port;
+	u16 ctl;
+
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+
+	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
+static void dpc_disable(struct pcie_device *dev)
+{
+	struct pci_dev *pdev = dev->port;
+	u16 ctl;
+
+	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+	ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
 	struct pci_dev *pdev = dev->port;
 	struct device *device = &dev->device;
 	int status;
-	u16 ctl, cap;
+	u16 cap;
 
 	if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native)
 		return -ENOTSUPP;
@@ -368,10 +390,7 @@ static int dpc_probe(struct pcie_device *dev)
 	}
 
 	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap);
-	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-
-	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
-	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+	dpc_enable(dev);
 	pci_info(pdev, "enabled with IRQ %d\n", dev->irq);
 
 	pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
@@ -384,14 +403,29 @@ static int dpc_probe(struct pcie_device *dev)
 	return status;
 }
 
-static void dpc_remove(struct pcie_device *dev)
+static int dpc_suspend(struct pcie_device *dev)
 {
 	struct pci_dev *pdev = dev->port;
-	u16 ctl;
 
-	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-	ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
-	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+	if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware())
+		dpc_disable(dev);
+
+	return 0;
+}
+
+static int dpc_resume(struct pcie_device *dev)
+{
+	struct pci_dev *pdev = dev->port;
+
+	if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware())
+		dpc_enable(dev);
+
+	return 0;
+}
+
+static void dpc_remove(struct pcie_device *dev)
+{
+	dpc_disable(dev);
 }
 
 static struct pcie_port_service_driver dpcdriver = {
@@ -399,6 +433,8 @@ static struct pcie_port_service_driver dpcdriver = {
 	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_DPC,
 	.probe		= dpc_probe,
+	.suspend	= dpc_suspend,
+	.resume		= dpc_resume,
 	.remove		= dpc_remove,
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-24  4:46 [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Kai-Heng Feng
  2023-08-24  4:46 ` [PATCH 2/3] PCI/AER: Disable AER service on suspend Kai-Heng Feng
  2023-08-24  4:46 ` [PATCH 3/3] PCI/DPC: Disable DPC " Kai-Heng Feng
@ 2023-08-24 11:56 ` Mika Westerberg
  2023-08-24 13:46   ` Kai-Heng Feng
  2 siblings, 1 reply; 11+ messages in thread
From: Mika Westerberg @ 2023-08-24 11:56 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: bhelgaas, koba.ko, sathyanarayanan.kuppuswamy, linux-pci,
	linux-kernel

Hi,

On Thu, Aug 24, 2023 at 12:46:43PM +0800, Kai-Heng Feng wrote:
> In addition to nearest upstream bridge, driver may want to know if the
> entire hierarchy can be powered off to perform different action.
> 
> So walk higher up the hierarchy to find out if any device has valid
> _PR3.

I'm not entirely sure this is good idea. The drivers should expect that
the power will be turned off pretty soon after device enters D3hot. Also
_PR3 is not PCI concept it's ACPI concept so API like this would only
work on systems with ACPI.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-24 11:56 ` [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Mika Westerberg
@ 2023-08-24 13:46   ` Kai-Heng Feng
  2023-08-25  5:29     ` Mika Westerberg
  0 siblings, 1 reply; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-24 13:46 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: bhelgaas, koba.ko, sathyanarayanan.kuppuswamy, linux-pci,
	linux-kernel

Hi,

On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Hi,
>
> On Thu, Aug 24, 2023 at 12:46:43PM +0800, Kai-Heng Feng wrote:
> > In addition to nearest upstream bridge, driver may want to know if the
> > entire hierarchy can be powered off to perform different action.
> >
> > So walk higher up the hierarchy to find out if any device has valid
> > _PR3.
>
> I'm not entirely sure this is good idea. The drivers should expect that
> the power will be turned off pretty soon after device enters D3hot. Also
> _PR3 is not PCI concept it's ACPI concept so API like this would only
> work on systems with ACPI.

IIUC, Bjorn wants to limit the AER/DPC disablement when device power
is really off.
Is "the power will be turned off pretty soon after device enters
D3hot" applicable to most devices? Since config space is still
accessible when device is in D3hot.
Unless there are cases when device firmware behave differently to
D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
and system S3.

Kai-Heng

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-24 13:46   ` Kai-Heng Feng
@ 2023-08-25  5:29     ` Mika Westerberg
  2023-08-25  5:43       ` Kai-Heng Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Mika Westerberg @ 2023-08-25  5:29 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: bhelgaas, koba.ko, sathyanarayanan.kuppuswamy, linux-pci,
	linux-kernel

On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> Hi,
> 
> On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > Hi,
> >
> > On Thu, Aug 24, 2023 at 12:46:43PM +0800, Kai-Heng Feng wrote:
> > > In addition to nearest upstream bridge, driver may want to know if the
> > > entire hierarchy can be powered off to perform different action.
> > >
> > > So walk higher up the hierarchy to find out if any device has valid
> > > _PR3.
> >
> > I'm not entirely sure this is good idea. The drivers should expect that
> > the power will be turned off pretty soon after device enters D3hot. Also
> > _PR3 is not PCI concept it's ACPI concept so API like this would only
> > work on systems with ACPI.
> 
> IIUC, Bjorn wants to limit the AER/DPC disablement when device power
> is really off.
> Is "the power will be turned off pretty soon after device enters
> D3hot" applicable to most devices? Since config space is still
> accessible when device is in D3hot.

Well the device may be part of a topology, say Thunderbolt/USB4 (but can
be NVMe or similar) where it initially goes into D3hot but in the end
the whole topology is put into D3cold. The device driver really should
expect that this happens always and not try to distinguish between the
D3hot or D3cold.

> Unless there are cases when device firmware behave differently to
> D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> and system S3.

Yes, this makes sense.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-25  5:29     ` Mika Westerberg
@ 2023-08-25  5:43       ` Kai-Heng Feng
  2023-08-25  6:39         ` Mika Westerberg
  0 siblings, 1 reply; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-25  5:43 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: bhelgaas, koba.ko, sathyanarayanan.kuppuswamy, linux-pci,
	linux-kernel

On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > Hi,
> >
> > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Aug 24, 2023 at 12:46:43PM +0800, Kai-Heng Feng wrote:
> > > > In addition to nearest upstream bridge, driver may want to know if the
> > > > entire hierarchy can be powered off to perform different action.
> > > >
> > > > So walk higher up the hierarchy to find out if any device has valid
> > > > _PR3.
> > >
> > > I'm not entirely sure this is good idea. The drivers should expect that
> > > the power will be turned off pretty soon after device enters D3hot. Also
> > > _PR3 is not PCI concept it's ACPI concept so API like this would only
> > > work on systems with ACPI.
> >
> > IIUC, Bjorn wants to limit the AER/DPC disablement when device power
> > is really off.
> > Is "the power will be turned off pretty soon after device enters
> > D3hot" applicable to most devices? Since config space is still
> > accessible when device is in D3hot.
>
> Well the device may be part of a topology, say Thunderbolt/USB4 (but can
> be NVMe or similar) where it initially goes into D3hot but in the end
> the whole topology is put into D3cold. The device driver really should
> expect that this happens always and not try to distinguish between the
> D3hot or D3cold.

What if the device is not in such topology? There are cases that the
rootport doesn't have Power Resources associated so the rootport also
stays in D3hot.
I think what Bjorn suggested is to keep AER enabled for D3hot, and
only disable it for D3cold and S3.

>
> > Unless there are cases when device firmware behave differently to
> > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > and system S3.
>
> Yes, this makes sense.

I agree that differentiate between D3hot and D3cold unnecessarily make
things more complicated, but Bjorn suggested errors reported by AER
under D3hot should still be recorded.
Do you have more compelling data to persuade Bjorn that AER should be
disabled for both D3 states?

Kai-Heng

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-25  5:43       ` Kai-Heng Feng
@ 2023-08-25  6:39         ` Mika Westerberg
  2023-08-26 13:11           ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Mika Westerberg @ 2023-08-25  6:39 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: bhelgaas, koba.ko, sathyanarayanan.kuppuswamy, linux-pci,
	linux-kernel

On Fri, Aug 25, 2023 at 01:43:08PM +0800, Kai-Heng Feng wrote:
> On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > > Hi,
> > >
> > > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > > <mika.westerberg@linux.intel.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Aug 24, 2023 at 12:46:43PM +0800, Kai-Heng Feng wrote:
> > > > > In addition to nearest upstream bridge, driver may want to know if the
> > > > > entire hierarchy can be powered off to perform different action.
> > > > >
> > > > > So walk higher up the hierarchy to find out if any device has valid
> > > > > _PR3.
> > > >
> > > > I'm not entirely sure this is good idea. The drivers should expect that
> > > > the power will be turned off pretty soon after device enters D3hot. Also
> > > > _PR3 is not PCI concept it's ACPI concept so API like this would only
> > > > work on systems with ACPI.
> > >
> > > IIUC, Bjorn wants to limit the AER/DPC disablement when device power
> > > is really off.
> > > Is "the power will be turned off pretty soon after device enters
> > > D3hot" applicable to most devices? Since config space is still
> > > accessible when device is in D3hot.
> >
> > Well the device may be part of a topology, say Thunderbolt/USB4 (but can
> > be NVMe or similar) where it initially goes into D3hot but in the end
> > the whole topology is put into D3cold. The device driver really should
> > expect that this happens always and not try to distinguish between the
> > D3hot or D3cold.
> 
> What if the device is not in such topology? There are cases that the
> rootport doesn't have Power Resources associated so the rootport also
> stays in D3hot.

Yes and this is why the driver should not care. Otherwise it just
complicates things.

For instance, a root port may have _PR3 that allows the whole thing to
enter L2/3 (D3cold) but some device has d3cold_allowed set to false or
there may be a device that does not support PME on D3cold which prevents
the power to be turned off.

> I think what Bjorn suggested is to keep AER enabled for D3hot, and
> only disable it for D3cold and S3.
> 
> >
> > > Unless there are cases when device firmware behave differently to
> > > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > > and system S3.
> >
> > Yes, this makes sense.
> 
> I agree that differentiate between D3hot and D3cold unnecessarily make
> things more complicated, but Bjorn suggested errors reported by AER
> under D3hot should still be recorded.
> Do you have more compelling data to persuade Bjorn that AER should be
> disabled for both D3 states?

Is there even an AER error that can happen when a device is in D3hot
(link is in L1) or D3cold (link is in L2/3)? I'm not an expert in AER
but AFAICT these errors are reported when the device is in active state
not when it is in low power state.

All that said, Bjorn is the maintainer so he gets to decide. If this is
what he suggested then I'm fine with it. Just wanted to mention this. I
too want to see this issue fixed, one way or another ;-) Thanks for
working on this BTW!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-25  6:39         ` Mika Westerberg
@ 2023-08-26 13:11           ` Bjorn Helgaas
  2023-08-28  7:29             ` Kai-Heng Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2023-08-26 13:11 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Kai-Heng Feng, bhelgaas, koba.ko, sathyanarayanan.kuppuswamy,
	linux-pci, linux-kernel

On Fri, Aug 25, 2023 at 09:39:48AM +0300, Mika Westerberg wrote:
> On Fri, Aug 25, 2023 at 01:43:08PM +0800, Kai-Heng Feng wrote:
> > On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > > On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > > > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > > > <mika.westerberg@linux.intel.com> wrote:

> > I think what Bjorn suggested is to keep AER enabled for D3hot, and
> > only disable it for D3cold and S3.
> > 
> > > > Unless there are cases when device firmware behave differently to
> > > > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > > > and system S3.
> > >
> > > Yes, this makes sense.
> > 
> > I agree that differentiate between D3hot and D3cold unnecessarily make
> > things more complicated, but Bjorn suggested errors reported by AER
> > under D3hot should still be recorded.
> > Do you have more compelling data to persuade Bjorn that AER should be
> > disabled for both D3 states?
> 
> Is there even an AER error that can happen when a device is in D3hot
> (link is in L1) or D3cold (link is in L2/3)? I'm not an expert in AER
> but AFAICT these errors are reported when the device is in active state
> not when it is in low power state.

I don't think a device in D3cold can signal its own errors.  But the
link transition to L2/L3 as a device goes to D3cold may cause the
bridge above to log an error.  And of course a config access to a
device in D3cold will probably result in an Unsupported Request being
logged by the bridge above it.  I think these are the sorts of errors
we do need to avoid or ignore somehow.

But Configuration and Message requests definitely happen in D3hot, and
they can cause errors reported via AER.  The spec (r6.0, sec 2.2.8)
recommends that Messages be handled the same in D0-D3hot.

PTM is an example of where we had errors being reported at suspend/
resume because we had it configured incorrectly.  If we disabled AER
in D3hot we might not learn about that kind of configuration problem.
That's what makes me think there's some value in keeping AER enabled
in D3hot.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-26 13:11           ` Bjorn Helgaas
@ 2023-08-28  7:29             ` Kai-Heng Feng
  2023-08-28 16:23               ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Kai-Heng Feng @ 2023-08-28  7:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, bhelgaas, koba.ko, sathyanarayanan.kuppuswamy,
	linux-pci, linux-kernel

On Sat, Aug 26, 2023 at 9:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Aug 25, 2023 at 09:39:48AM +0300, Mika Westerberg wrote:
> > On Fri, Aug 25, 2023 at 01:43:08PM +0800, Kai-Heng Feng wrote:
> > > On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
> > > <mika.westerberg@linux.intel.com> wrote:
> > > > On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > > > > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > > > > <mika.westerberg@linux.intel.com> wrote:
>
> > > I think what Bjorn suggested is to keep AER enabled for D3hot, and
> > > only disable it for D3cold and S3.
> > >
> > > > > Unless there are cases when device firmware behave differently to
> > > > > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > > > > and system S3.
> > > >
> > > > Yes, this makes sense.
> > >
> > > I agree that differentiate between D3hot and D3cold unnecessarily make
> > > things more complicated, but Bjorn suggested errors reported by AER
> > > under D3hot should still be recorded.
> > > Do you have more compelling data to persuade Bjorn that AER should be
> > > disabled for both D3 states?
> >
> > Is there even an AER error that can happen when a device is in D3hot
> > (link is in L1) or D3cold (link is in L2/3)? I'm not an expert in AER
> > but AFAICT these errors are reported when the device is in active state
> > not when it is in low power state.
>
> I don't think a device in D3cold can signal its own errors.  But the
> link transition to L2/L3 as a device goes to D3cold may cause the
> bridge above to log an error.  And of course a config access to a
> device in D3cold will probably result in an Unsupported Request being
> logged by the bridge above it.  I think these are the sorts of errors
> we do need to avoid or ignore somehow.

In addition to that, we can't really control what device behaves
during the D3hot (L2) transition.
The kernel can't control what the firmware on the device may respond.

>
> But Configuration and Message requests definitely happen in D3hot, and
> they can cause errors reported via AER.  The spec (r6.0, sec 2.2.8)
> recommends that Messages be handled the same in D0-D3hot.
>
> PTM is an example of where we had errors being reported at suspend/
> resume because we had it configured incorrectly.  If we disabled AER
> in D3hot we might not learn about that kind of configuration problem.
> That's what makes me think there's some value in keeping AER enabled
> in D3hot.

In this particular case, the firmware of the device gets power cycled
and starts sending PTM because of that.
For this case, we want to know the error happens, but in the meantime
there's nothing much can be done.

So is it reasonable to log Corrected Errors, but skipping the AER reset?

Kai-Heng

>
> Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold
  2023-08-28  7:29             ` Kai-Heng Feng
@ 2023-08-28 16:23               ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2023-08-28 16:23 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Mika Westerberg, bhelgaas, koba.ko, sathyanarayanan.kuppuswamy,
	linux-pci, linux-kernel

On Mon, Aug 28, 2023 at 03:29:08PM +0800, Kai-Heng Feng wrote:
> On Sat, Aug 26, 2023 at 9:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Aug 25, 2023 at 09:39:48AM +0300, Mika Westerberg wrote:
> > > On Fri, Aug 25, 2023 at 01:43:08PM +0800, Kai-Heng Feng wrote:
> > > > On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
> > > > <mika.westerberg@linux.intel.com> wrote:
> > > > > On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > > > > > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > > > > > <mika.westerberg@linux.intel.com> wrote:
> >
> > > > I think what Bjorn suggested is to keep AER enabled for D3hot, and
> > > > only disable it for D3cold and S3.
> > > >
> > > > > > Unless there are cases when device firmware behave differently to
> > > > > > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > > > > > and system S3.
> > > > >
> > > > > Yes, this makes sense.
> > > >
> > > > I agree that differentiate between D3hot and D3cold unnecessarily make
> > > > things more complicated, but Bjorn suggested errors reported by AER
> > > > under D3hot should still be recorded.
> > > > Do you have more compelling data to persuade Bjorn that AER should be
> > > > disabled for both D3 states?
> > >
> > > Is there even an AER error that can happen when a device is in D3hot
> > > (link is in L1) or D3cold (link is in L2/3)? I'm not an expert in AER
> > > but AFAICT these errors are reported when the device is in active state
> > > not when it is in low power state.
> >
> > I don't think a device in D3cold can signal its own errors.  But the
> > link transition to L2/L3 as a device goes to D3cold may cause the
> > bridge above to log an error.  And of course a config access to a
> > device in D3cold will probably result in an Unsupported Request being
> > logged by the bridge above it.  I think these are the sorts of errors
> > we do need to avoid or ignore somehow.
> 
> In addition to that, we can't really control what device behaves
> during the D3hot (L2) transition.

I don't think a link in L2 (main power off) can lead to a device in
D3hot, can it?  I assume that a device in D3hot can be returned to D0
by a config write to the PM CSR, and the link must be usable for that.

> The kernel can't control what the firmware on the device may
> respond.

The kernel can't directly control the internal behavior of the device,
but the behavior that's observable on the PCIe link should always
conform to the spec.

Do you see devices where a transition to D3hot may cause some kind of
non-compliant behavior on the link?  If so, then I guess we have to
consider whether to quirk them avoid D3hot completely or to work
around it somehow.

> > But Configuration and Message requests definitely happen in D3hot, and
> > they can cause errors reported via AER.  The spec (r6.0, sec 2.2.8)
> > recommends that Messages be handled the same in D0-D3hot.
> >
> > PTM is an example of where we had errors being reported at suspend/
> > resume because we had it configured incorrectly.  If we disabled AER
> > in D3hot we might not learn about that kind of configuration problem.
> > That's what makes me think there's some value in keeping AER enabled
> > in D3hot.
> 
> In this particular case, the firmware of the device gets power cycled
> and starts sending PTM because of that.
> For this case, we want to know the error happens, but in the meantime
> there's nothing much can be done.

So simply putting the device in D3hot restarts firmware on the device?
And it starts sending PTM requests after the restart?  I *assume* that
at least it only sends the PTM requests if the PTM Enable bit is set,
right?  That shouldn't cause us trouble unless we configured something
wrong.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-08-28 16:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-24  4:46 [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Kai-Heng Feng
2023-08-24  4:46 ` [PATCH 2/3] PCI/AER: Disable AER service on suspend Kai-Heng Feng
2023-08-24  4:46 ` [PATCH 3/3] PCI/DPC: Disable DPC " Kai-Heng Feng
2023-08-24 11:56 ` [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold Mika Westerberg
2023-08-24 13:46   ` Kai-Heng Feng
2023-08-25  5:29     ` Mika Westerberg
2023-08-25  5:43       ` Kai-Heng Feng
2023-08-25  6:39         ` Mika Westerberg
2023-08-26 13:11           ` Bjorn Helgaas
2023-08-28  7:29             ` Kai-Heng Feng
2023-08-28 16:23               ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).