linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] PCI: Prevent power state transition of erroneous device
@ 2025-05-19 10:28 Raag Jadav
  2025-05-19 10:41 ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-19 10:28 UTC (permalink / raw)
  To: rafael, mahesh, oohall, bhelgaas
  Cc: linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, superm1, benato.denis96, Raag Jadav

If error status is set on an AER capable device, most likely either the
device recovery is in progress or has already failed. Neither of the
cases are well suited for power state transition of the device, since
this can lead to unpredictable consequences like resume failure, or in
worst case the device is lost because of it. Leave the device in its
existing power state to avoid such issues.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---

v2: Synchronize AER handling with PCI PM (Rafael)
v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
    Elaborate "why" (Bjorn)
v4: Rely on error status instead of device status
    Condense comment (Lukas)

More discussion on [1].
[1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/

 drivers/pci/pci.c      |  9 +++++++++
 drivers/pci/pcie/aer.c | 13 +++++++++++++
 include/linux/aer.h    |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..a20018692933 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/aer.h>
 #include <linux/kernel.h>
 #include <linux/delay.h>
 #include <linux/dmi.h>
@@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
 	   || (state == PCI_D2 && !dev->d2_support))
 		return -EIO;
 
+	/*
+	 * If error status is set on an AER capable device, it is not well
+	 * suited for power state transition. Leave it in its existing power
+	 * state to avoid issues like unpredictable resume failure.
+	 */
+	if (pci_aer_in_progress(dev))
+		return -EIO;
+
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
 		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a1cf8c7ef628..617fbac0d38a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
 
+bool pci_aer_in_progress(struct pci_dev *dev)
+{
+	int aer = dev->aer_cap;
+	u32 cor, uncor;
+
+	if (!pcie_aer_is_native(dev))
+		return false;
+
+	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
+	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
+	return cor || uncor;
+}
+
 static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
 	int rc;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 02940be66324..e6a380bb2e68 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -56,12 +56,14 @@ struct aer_capability_regs {
 #if defined(CONFIG_PCIEAER)
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pcie_aer_is_native(struct pci_dev *dev);
+bool pci_aer_in_progress(struct pci_dev *dev);
 #else
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
 static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
+static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
 #endif
 
 void pci_print_aer(struct pci_dev *dev, int aer_severity,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-19 10:28 [PATCH v4] PCI: Prevent power state transition of erroneous device Raag Jadav
@ 2025-05-19 10:41 ` Raag Jadav
  2025-05-19 21:42   ` Denis Benato
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-19 10:41 UTC (permalink / raw)
  To: rafael, mahesh, oohall, bhelgaas
  Cc: linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, superm1, benato.denis96

On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> If error status is set on an AER capable device, most likely either the
> device recovery is in progress or has already failed. Neither of the
> cases are well suited for power state transition of the device, since
> this can lead to unpredictable consequences like resume failure, or in
> worst case the device is lost because of it. Leave the device in its
> existing power state to avoid such issues.
> 
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> 
> v2: Synchronize AER handling with PCI PM (Rafael)
> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>     Elaborate "why" (Bjorn)
> v4: Rely on error status instead of device status
>     Condense comment (Lukas)

Since pci_aer_in_progress() is changed I've not included Rafael's tag with
my understanding of this needing a revisit. If this was a mistake, please
let me know.

Denis, Mario, does this fix your issue?

> More discussion on [1].
> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
> 
>  drivers/pci/pci.c      |  9 +++++++++
>  drivers/pci/pcie/aer.c | 13 +++++++++++++
>  include/linux/aer.h    |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4d7c9f64ea24..a20018692933 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include <linux/acpi.h>
> +#include <linux/aer.h>
>  #include <linux/kernel.h>
>  #include <linux/delay.h>
>  #include <linux/dmi.h>
> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>  	   || (state == PCI_D2 && !dev->d2_support))
>  		return -EIO;
>  
> +	/*
> +	 * If error status is set on an AER capable device, it is not well
> +	 * suited for power state transition. Leave it in its existing power
> +	 * state to avoid issues like unpredictable resume failure.
> +	 */
> +	if (pci_aer_in_progress(dev))
> +		return -EIO;
> +
>  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>  	if (PCI_POSSIBLE_ERROR(pmcsr)) {
>  		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a1cf8c7ef628..617fbac0d38a 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>  
> +bool pci_aer_in_progress(struct pci_dev *dev)
> +{
> +	int aer = dev->aer_cap;
> +	u32 cor, uncor;
> +
> +	if (!pcie_aer_is_native(dev))
> +		return false;
> +
> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
> +	return cor || uncor;
> +}
> +
>  static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
>  	int rc;
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..e6a380bb2e68 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>  #if defined(CONFIG_PCIEAER)
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>  int pcie_aer_is_native(struct pci_dev *dev);
> +bool pci_aer_in_progress(struct pci_dev *dev);
>  #else
>  static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
>  static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>  #endif
>  
>  void pci_print_aer(struct pci_dev *dev, int aer_severity,
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-19 10:41 ` Raag Jadav
@ 2025-05-19 21:42   ` Denis Benato
  2025-05-20  9:48     ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Denis Benato @ 2025-05-19 21:42 UTC (permalink / raw)
  To: Raag Jadav, rafael, mahesh, oohall, bhelgaas
  Cc: linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, superm1


On 5/19/25 12:41, Raag Jadav wrote:
> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>> If error status is set on an AER capable device, most likely either the
>> device recovery is in progress or has already failed. Neither of the
>> cases are well suited for power state transition of the device, since
>> this can lead to unpredictable consequences like resume failure, or in
>> worst case the device is lost because of it. Leave the device in its
>> existing power state to avoid such issues.
>>
>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>> ---
>>
>> v2: Synchronize AER handling with PCI PM (Rafael)
>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>     Elaborate "why" (Bjorn)
>> v4: Rely on error status instead of device status
>>     Condense comment (Lukas)
> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> my understanding of this needing a revisit. If this was a mistake, please
> let me know.
>
> Denis, Mario, does this fix your issue?
>
Hello,

Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL

>> More discussion on [1].
>> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
>>
>>  drivers/pci/pci.c      |  9 +++++++++
>>  drivers/pci/pcie/aer.c | 13 +++++++++++++
>>  include/linux/aer.h    |  2 ++
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 4d7c9f64ea24..a20018692933 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -9,6 +9,7 @@
>>   */
>>  
>>  #include <linux/acpi.h>
>> +#include <linux/aer.h>
>>  #include <linux/kernel.h>
>>  #include <linux/delay.h>
>>  #include <linux/dmi.h>
>> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>>  	   || (state == PCI_D2 && !dev->d2_support))
>>  		return -EIO;
>>  
>> +	/*
>> +	 * If error status is set on an AER capable device, it is not well
>> +	 * suited for power state transition. Leave it in its existing power
>> +	 * state to avoid issues like unpredictable resume failure.
>> +	 */
>> +	if (pci_aer_in_progress(dev))
>> +		return -EIO;
>> +
>>  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>  	if (PCI_POSSIBLE_ERROR(pmcsr)) {
>>  		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index a1cf8c7ef628..617fbac0d38a 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>>  
>> +bool pci_aer_in_progress(struct pci_dev *dev)
>> +{
>> +	int aer = dev->aer_cap;
>> +	u32 cor, uncor;
>> +
>> +	if (!pcie_aer_is_native(dev))
>> +		return false;
>> +
>> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
>> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
>> +	return cor || uncor;
>> +}
>> +
>>  static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>  {
>>  	int rc;
>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>> index 02940be66324..e6a380bb2e68 100644
>> --- a/include/linux/aer.h
>> +++ b/include/linux/aer.h
>> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>>  #if defined(CONFIG_PCIEAER)
>>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>  int pcie_aer_is_native(struct pci_dev *dev);
>> +bool pci_aer_in_progress(struct pci_dev *dev);
>>  #else
>>  static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>  {
>>  	return -EINVAL;
>>  }
>>  static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>>  #endif
>>  
>>  void pci_print_aer(struct pci_dev *dev, int aer_severity,
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-19 21:42   ` Denis Benato
@ 2025-05-20  9:48     ` Raag Jadav
  2025-05-20 15:23       ` Mario Limonciello
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-20  9:48 UTC (permalink / raw)
  To: Denis Benato
  Cc: rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty, superm1

On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> On 5/19/25 12:41, Raag Jadav wrote:
> > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> >> If error status is set on an AER capable device, most likely either the
> >> device recovery is in progress or has already failed. Neither of the
> >> cases are well suited for power state transition of the device, since
> >> this can lead to unpredictable consequences like resume failure, or in
> >> worst case the device is lost because of it. Leave the device in its
> >> existing power state to avoid such issues.
> >>
> >> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> >> ---
> >>
> >> v2: Synchronize AER handling with PCI PM (Rafael)
> >> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> >>     Elaborate "why" (Bjorn)
> >> v4: Rely on error status instead of device status
> >>     Condense comment (Lukas)
> > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > my understanding of this needing a revisit. If this was a mistake, please
> > let me know.
> >
> > Denis, Mario, does this fix your issue?
> >
> Hello,
> 
> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL

Thanks for the test. It seems there's no hotplug event this time around
and endpoint device is still intact without any PCI related failure.

Also,

amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot

Which means whatever you're facing is either not related to this patch,
or at best exposed some nasty side-effect that's not handled correctly
by the driver.

I'd say amdgpu folks would be of better help for your case.

Raag

> >> More discussion on [1].
> >> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
> >>
> >>  drivers/pci/pci.c      |  9 +++++++++
> >>  drivers/pci/pcie/aer.c | 13 +++++++++++++
> >>  include/linux/aer.h    |  2 ++
> >>  3 files changed, 24 insertions(+)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 4d7c9f64ea24..a20018692933 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -9,6 +9,7 @@
> >>   */
> >>  
> >>  #include <linux/acpi.h>
> >> +#include <linux/aer.h>
> >>  #include <linux/kernel.h>
> >>  #include <linux/delay.h>
> >>  #include <linux/dmi.h>
> >> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
> >>  	   || (state == PCI_D2 && !dev->d2_support))
> >>  		return -EIO;
> >>  
> >> +	/*
> >> +	 * If error status is set on an AER capable device, it is not well
> >> +	 * suited for power state transition. Leave it in its existing power
> >> +	 * state to avoid issues like unpredictable resume failure.
> >> +	 */
> >> +	if (pci_aer_in_progress(dev))
> >> +		return -EIO;
> >> +
> >>  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >>  	if (PCI_POSSIBLE_ERROR(pmcsr)) {
> >>  		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index a1cf8c7ef628..617fbac0d38a 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
> >>  }
> >>  EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
> >>  
> >> +bool pci_aer_in_progress(struct pci_dev *dev)
> >> +{
> >> +	int aer = dev->aer_cap;
> >> +	u32 cor, uncor;
> >> +
> >> +	if (!pcie_aer_is_native(dev))
> >> +		return false;
> >> +
> >> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
> >> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
> >> +	return cor || uncor;
> >> +}
> >> +
> >>  static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
> >>  {
> >>  	int rc;
> >> diff --git a/include/linux/aer.h b/include/linux/aer.h
> >> index 02940be66324..e6a380bb2e68 100644
> >> --- a/include/linux/aer.h
> >> +++ b/include/linux/aer.h
> >> @@ -56,12 +56,14 @@ struct aer_capability_regs {
> >>  #if defined(CONFIG_PCIEAER)
> >>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> >>  int pcie_aer_is_native(struct pci_dev *dev);
> >> +bool pci_aer_in_progress(struct pci_dev *dev);
> >>  #else
> >>  static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> >>  {
> >>  	return -EINVAL;
> >>  }
> >>  static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> >> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
> >>  #endif
> >>  
> >>  void pci_print_aer(struct pci_dev *dev, int aer_severity,
> >> -- 
> >> 2.34.1
> >>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20  9:48     ` Raag Jadav
@ 2025-05-20 15:23       ` Mario Limonciello
  2025-05-20 15:47         ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-05-20 15:23 UTC (permalink / raw)
  To: Raag Jadav, Denis Benato
  Cc: rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty

On 5/20/2025 4:48 AM, Raag Jadav wrote:
> On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
>> On 5/19/25 12:41, Raag Jadav wrote:
>>> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>>>> If error status is set on an AER capable device, most likely either the
>>>> device recovery is in progress or has already failed. Neither of the
>>>> cases are well suited for power state transition of the device, since
>>>> this can lead to unpredictable consequences like resume failure, or in
>>>> worst case the device is lost because of it. Leave the device in its
>>>> existing power state to avoid such issues.
>>>>
>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>> ---
>>>>
>>>> v2: Synchronize AER handling with PCI PM (Rafael)
>>>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>>>      Elaborate "why" (Bjorn)
>>>> v4: Rely on error status instead of device status
>>>>      Condense comment (Lukas)
>>> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
>>> my understanding of this needing a revisit. If this was a mistake, please
>>> let me know.
>>>
>>> Denis, Mario, does this fix your issue?
>>>
>> Hello,
>>
>> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> 
> Thanks for the test. It seems there's no hotplug event this time around
> and endpoint device is still intact without any PCI related failure.
> 
> Also,
> 
> amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> 
> Which means whatever you're facing is either not related to this patch,
> or at best exposed some nasty side-effect that's not handled correctly
> by the driver.
> 
> I'd say amdgpu folks would be of better help for your case.
> 
> Raag

So according to the logs Denis shared with v4 
(https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. 
This stands for "Bus off Chip Off"

amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm

If it's going to D3hot - that's not going to be BOCO, it should be going 
to D3cold.

Denis, can you redo your logs with out Raag's patch patch and set 
CONFIG_PCI_DEBUG to compare?  The 6.14.6 log you shared already 
(https://pastebin.com/kLZtibcD) also chooses BOCO but I'm suspecting 
picks D3cold like it should.

> 
>>>> More discussion on [1].
>>>> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
>>>>
>>>>   drivers/pci/pci.c      |  9 +++++++++
>>>>   drivers/pci/pcie/aer.c | 13 +++++++++++++
>>>>   include/linux/aer.h    |  2 ++
>>>>   3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 4d7c9f64ea24..a20018692933 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -9,6 +9,7 @@
>>>>    */
>>>>   
>>>>   #include <linux/acpi.h>
>>>> +#include <linux/aer.h>
>>>>   #include <linux/kernel.h>
>>>>   #include <linux/delay.h>
>>>>   #include <linux/dmi.h>
>>>> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>>>>   	   || (state == PCI_D2 && !dev->d2_support))
>>>>   		return -EIO;
>>>>   
>>>> +	/*
>>>> +	 * If error status is set on an AER capable device, it is not well
>>>> +	 * suited for power state transition. Leave it in its existing power
>>>> +	 * state to avoid issues like unpredictable resume failure.
>>>> +	 */
>>>> +	if (pci_aer_in_progress(dev))
>>>> +		return -EIO;
>>>> +
>>>>   	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>>   	if (PCI_POSSIBLE_ERROR(pmcsr)) {
>>>>   		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index a1cf8c7ef628..617fbac0d38a 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>>>   }
>>>>   EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>>>>   
>>>> +bool pci_aer_in_progress(struct pci_dev *dev)
>>>> +{
>>>> +	int aer = dev->aer_cap;
>>>> +	u32 cor, uncor;
>>>> +
>>>> +	if (!pcie_aer_is_native(dev))
>>>> +		return false;
>>>> +
>>>> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
>>>> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
>>>> +	return cor || uncor;
>>>> +}
>>>> +
>>>>   static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>>>   {
>>>>   	int rc;
>>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>>> index 02940be66324..e6a380bb2e68 100644
>>>> --- a/include/linux/aer.h
>>>> +++ b/include/linux/aer.h
>>>> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>>>>   #if defined(CONFIG_PCIEAER)
>>>>   int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>>   int pcie_aer_is_native(struct pci_dev *dev);
>>>> +bool pci_aer_in_progress(struct pci_dev *dev);
>>>>   #else
>>>>   static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>   {
>>>>   	return -EINVAL;
>>>>   }
>>>>   static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>>> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>>>>   #endif
>>>>   
>>>>   void pci_print_aer(struct pci_dev *dev, int aer_severity,
>>>> -- 
>>>> 2.34.1
>>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 15:23       ` Mario Limonciello
@ 2025-05-20 15:47         ` Raag Jadav
  2025-05-20 15:49           ` Mario Limonciello
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-20 15:47 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty

On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > If error status is set on an AER capable device, most likely either the
> > > > > device recovery is in progress or has already failed. Neither of the
> > > > > cases are well suited for power state transition of the device, since
> > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > worst case the device is lost because of it. Leave the device in its
> > > > > existing power state to avoid such issues.
> > > > > 
> > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > ---
> > > > > 
> > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > >      Elaborate "why" (Bjorn)
> > > > > v4: Rely on error status instead of device status
> > > > >      Condense comment (Lukas)
> > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > let me know.
> > > > 
> > > > Denis, Mario, does this fix your issue?
> > > > 
> > > Hello,
> > > 
> > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > 
> > Thanks for the test. It seems there's no hotplug event this time around
> > and endpoint device is still intact without any PCI related failure.
> > 
> > Also,
> > 
> > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > 
> > Which means whatever you're facing is either not related to this patch,
> > or at best exposed some nasty side-effect that's not handled correctly
> > by the driver.
> > 
> > I'd say amdgpu folks would be of better help for your case.
> > 
> > Raag
> 
> So according to the logs Denis shared with v4
> (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> stands for "Bus off Chip Off"
> 
> amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> 
> If it's going to D3hot - that's not going to be BOCO, it should be going to
> D3cold.

Yes, because upstream port is in D0 for some reason (might be this patch
but not sure) and so will be the root port.

pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
pcieport 0000:07:00.0: PCI PM: Skipped

and my best guess is the driver is not able to cope with the lack of D3cold.

Raag

> Denis, can you redo your logs with out Raag's patch patch and set
> CONFIG_PCI_DEBUG to compare?  The 6.14.6 log you shared already
> (https://pastebin.com/kLZtibcD) also chooses BOCO but I'm suspecting picks
> D3cold like it should.
> 
> > 
> > > > > More discussion on [1].
> > > > > [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
> > > > > 
> > > > >   drivers/pci/pci.c      |  9 +++++++++
> > > > >   drivers/pci/pcie/aer.c | 13 +++++++++++++
> > > > >   include/linux/aer.h    |  2 ++
> > > > >   3 files changed, 24 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > index 4d7c9f64ea24..a20018692933 100644
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -9,6 +9,7 @@
> > > > >    */
> > > > >   #include <linux/acpi.h>
> > > > > +#include <linux/aer.h>
> > > > >   #include <linux/kernel.h>
> > > > >   #include <linux/delay.h>
> > > > >   #include <linux/dmi.h>
> > > > > @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
> > > > >   	   || (state == PCI_D2 && !dev->d2_support))
> > > > >   		return -EIO;
> > > > > +	/*
> > > > > +	 * If error status is set on an AER capable device, it is not well
> > > > > +	 * suited for power state transition. Leave it in its existing power
> > > > > +	 * state to avoid issues like unpredictable resume failure.
> > > > > +	 */
> > > > > +	if (pci_aer_in_progress(dev))
> > > > > +		return -EIO;
> > > > > +
> > > > >   	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > > > >   	if (PCI_POSSIBLE_ERROR(pmcsr)) {
> > > > >   		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
> > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > > index a1cf8c7ef628..617fbac0d38a 100644
> > > > > --- a/drivers/pci/pcie/aer.c
> > > > > +++ b/drivers/pci/pcie/aer.c
> > > > > @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
> > > > >   }
> > > > >   EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
> > > > > +bool pci_aer_in_progress(struct pci_dev *dev)
> > > > > +{
> > > > > +	int aer = dev->aer_cap;
> > > > > +	u32 cor, uncor;
> > > > > +
> > > > > +	if (!pcie_aer_is_native(dev))
> > > > > +		return false;
> > > > > +
> > > > > +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
> > > > > +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
> > > > > +	return cor || uncor;
> > > > > +}
> > > > > +
> > > > >   static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
> > > > >   {
> > > > >   	int rc;
> > > > > diff --git a/include/linux/aer.h b/include/linux/aer.h
> > > > > index 02940be66324..e6a380bb2e68 100644
> > > > > --- a/include/linux/aer.h
> > > > > +++ b/include/linux/aer.h
> > > > > @@ -56,12 +56,14 @@ struct aer_capability_regs {
> > > > >   #if defined(CONFIG_PCIEAER)
> > > > >   int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > > > >   int pcie_aer_is_native(struct pci_dev *dev);
> > > > > +bool pci_aer_in_progress(struct pci_dev *dev);
> > > > >   #else
> > > > >   static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > > > >   {
> > > > >   	return -EINVAL;
> > > > >   }
> > > > >   static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> > > > > +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
> > > > >   #endif
> > > > >   void pci_print_aer(struct pci_dev *dev, int aer_severity,
> > > > > -- 
> > > > > 2.34.1
> > > > > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 15:47         ` Raag Jadav
@ 2025-05-20 15:49           ` Mario Limonciello
  2025-05-20 17:22             ` Denis Benato
  0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-05-20 15:49 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty

On 5/20/2025 10:47 AM, Raag Jadav wrote:
> On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
>> On 5/20/2025 4:48 AM, Raag Jadav wrote:
>>> On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
>>>> On 5/19/25 12:41, Raag Jadav wrote:
>>>>> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>>>>>> If error status is set on an AER capable device, most likely either the
>>>>>> device recovery is in progress or has already failed. Neither of the
>>>>>> cases are well suited for power state transition of the device, since
>>>>>> this can lead to unpredictable consequences like resume failure, or in
>>>>>> worst case the device is lost because of it. Leave the device in its
>>>>>> existing power state to avoid such issues.
>>>>>>
>>>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>>>> ---
>>>>>>
>>>>>> v2: Synchronize AER handling with PCI PM (Rafael)
>>>>>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>>>>>       Elaborate "why" (Bjorn)
>>>>>> v4: Rely on error status instead of device status
>>>>>>       Condense comment (Lukas)
>>>>> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
>>>>> my understanding of this needing a revisit. If this was a mistake, please
>>>>> let me know.
>>>>>
>>>>> Denis, Mario, does this fix your issue?
>>>>>
>>>> Hello,
>>>>
>>>> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
>>>
>>> Thanks for the test. It seems there's no hotplug event this time around
>>> and endpoint device is still intact without any PCI related failure.
>>>
>>> Also,
>>>
>>> amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
>>>
>>> Which means whatever you're facing is either not related to this patch,
>>> or at best exposed some nasty side-effect that's not handled correctly
>>> by the driver.
>>>
>>> I'd say amdgpu folks would be of better help for your case.
>>>
>>> Raag
>>
>> So according to the logs Denis shared with v4
>> (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
>> stands for "Bus off Chip Off"
>>
>> amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
>>
>> If it's going to D3hot - that's not going to be BOCO, it should be going to
>> D3cold.
> 
> Yes, because upstream port is in D0 for some reason (might be this patch
> but not sure) and so will be the root port.
> 
> pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> pcieport 0000:07:00.0: PCI PM: Skipped
> 
> and my best guess is the driver is not able to cope with the lack of D3cold.

Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't 
get it, chaos ensues.

I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify 
this patch is what is changing that upstream port behavior.

> 
> Raag
> 
>> Denis, can you redo your logs with out Raag's patch patch and set
>> CONFIG_PCI_DEBUG to compare?  The 6.14.6 log you shared already
>> (https://pastebin.com/kLZtibcD) also chooses BOCO but I'm suspecting picks
>> D3cold like it should.
>>
>>>
>>>>>> More discussion on [1].
>>>>>> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
>>>>>>
>>>>>>    drivers/pci/pci.c      |  9 +++++++++
>>>>>>    drivers/pci/pcie/aer.c | 13 +++++++++++++
>>>>>>    include/linux/aer.h    |  2 ++
>>>>>>    3 files changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>> index 4d7c9f64ea24..a20018692933 100644
>>>>>> --- a/drivers/pci/pci.c
>>>>>> +++ b/drivers/pci/pci.c
>>>>>> @@ -9,6 +9,7 @@
>>>>>>     */
>>>>>>    #include <linux/acpi.h>
>>>>>> +#include <linux/aer.h>
>>>>>>    #include <linux/kernel.h>
>>>>>>    #include <linux/delay.h>
>>>>>>    #include <linux/dmi.h>
>>>>>> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>>>>>>    	   || (state == PCI_D2 && !dev->d2_support))
>>>>>>    		return -EIO;
>>>>>> +	/*
>>>>>> +	 * If error status is set on an AER capable device, it is not well
>>>>>> +	 * suited for power state transition. Leave it in its existing power
>>>>>> +	 * state to avoid issues like unpredictable resume failure.
>>>>>> +	 */
>>>>>> +	if (pci_aer_in_progress(dev))
>>>>>> +		return -EIO;
>>>>>> +
>>>>>>    	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>>>>    	if (PCI_POSSIBLE_ERROR(pmcsr)) {
>>>>>>    		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
>>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>>>> index a1cf8c7ef628..617fbac0d38a 100644
>>>>>> --- a/drivers/pci/pcie/aer.c
>>>>>> +++ b/drivers/pci/pcie/aer.c
>>>>>> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>>>>>    }
>>>>>>    EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev)
>>>>>> +{
>>>>>> +	int aer = dev->aer_cap;
>>>>>> +	u32 cor, uncor;
>>>>>> +
>>>>>> +	if (!pcie_aer_is_native(dev))
>>>>>> +		return false;
>>>>>> +
>>>>>> +	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
>>>>>> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
>>>>>> +	return cor || uncor;
>>>>>> +}
>>>>>> +
>>>>>>    static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>>>>>    {
>>>>>>    	int rc;
>>>>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>>>>> index 02940be66324..e6a380bb2e68 100644
>>>>>> --- a/include/linux/aer.h
>>>>>> +++ b/include/linux/aer.h
>>>>>> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>>>>>>    #if defined(CONFIG_PCIEAER)
>>>>>>    int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>>>>    int pcie_aer_is_native(struct pci_dev *dev);
>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev);
>>>>>>    #else
>>>>>>    static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>>>    {
>>>>>>    	return -EINVAL;
>>>>>>    }
>>>>>>    static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>>>>> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>>>>>>    #endif
>>>>>>    void pci_print_aer(struct pci_dev *dev, int aer_severity,
>>>>>> -- 
>>>>>> 2.34.1
>>>>>>
>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 15:49           ` Mario Limonciello
@ 2025-05-20 17:22             ` Denis Benato
  2025-05-20 17:39               ` Mario Limonciello
  2025-05-21 13:39               ` Lukas Wunner
  0 siblings, 2 replies; 27+ messages in thread
From: Denis Benato @ 2025-05-20 17:22 UTC (permalink / raw)
  To: Mario Limonciello, Raag Jadav
  Cc: rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty


On 5/20/25 17:49, Mario Limonciello wrote:
> On 5/20/2025 10:47 AM, Raag Jadav wrote:
>> On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
>>> On 5/20/2025 4:48 AM, Raag Jadav wrote:
>>>> On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
>>>>> On 5/19/25 12:41, Raag Jadav wrote:
>>>>>> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>>>>>>> If error status is set on an AER capable device, most likely either the
>>>>>>> device recovery is in progress or has already failed. Neither of the
>>>>>>> cases are well suited for power state transition of the device, since
>>>>>>> this can lead to unpredictable consequences like resume failure, or in
>>>>>>> worst case the device is lost because of it. Leave the device in its
>>>>>>> existing power state to avoid such issues.
>>>>>>>
>>>>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>>>>> ---
>>>>>>>
>>>>>>> v2: Synchronize AER handling with PCI PM (Rafael)
>>>>>>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>>>>>>       Elaborate "why" (Bjorn)
>>>>>>> v4: Rely on error status instead of device status
>>>>>>>       Condense comment (Lukas)
>>>>>> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
>>>>>> my understanding of this needing a revisit. If this was a mistake, please
>>>>>> let me know.
>>>>>>
>>>>>> Denis, Mario, does this fix your issue?
>>>>>>
>>>>> Hello,
>>>>>
>>>>> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
>>>>
>>>> Thanks for the test. It seems there's no hotplug event this time around
>>>> and endpoint device is still intact without any PCI related failure.
>>>>
>>>> Also,
>>>>
>>>> amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
>>>>
>>>> Which means whatever you're facing is either not related to this patch,
>>>> or at best exposed some nasty side-effect that's not handled correctly
>>>> by the driver.
>>>>
>>>> I'd say amdgpu folks would be of better help for your case.
>>>>
>>>> Raag
>>>
>>> So according to the logs Denis shared with v4
>>> (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
>>> stands for "Bus off Chip Off"
>>>
>>> amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
>>>
>>> If it's going to D3hot - that's not going to be BOCO, it should be going to
>>> D3cold.
>>
>> Yes, because upstream port is in D0 for some reason (might be this patch
>> but not sure) and so will be the root port.
>>
>> pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
>> pcieport 0000:07:00.0: PCI PM: Skipped
>>
>> and my best guess is the driver is not able to cope with the lack of D3cold.
>
> Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
>
> I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.


This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C


Both previous kernel and this one have CONFIG_PCI_DEBUG=y.

Removed the initial bootup sequence to be able to use pastebin.

>
>>
>> Raag
>>
>>> Denis, can you redo your logs with out Raag's patch patch and set
>>> CONFIG_PCI_DEBUG to compare?  The 6.14.6 log you shared already
>>> (https://pastebin.com/kLZtibcD) also chooses BOCO but I'm suspecting picks
>>> D3cold like it should.
>>>
>>>>
>>>>>>> More discussion on [1].
>>>>>>> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
>>>>>>>
>>>>>>>    drivers/pci/pci.c      |  9 +++++++++
>>>>>>>    drivers/pci/pcie/aer.c | 13 +++++++++++++
>>>>>>>    include/linux/aer.h    |  2 ++
>>>>>>>    3 files changed, 24 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>> index 4d7c9f64ea24..a20018692933 100644
>>>>>>> --- a/drivers/pci/pci.c
>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>> @@ -9,6 +9,7 @@
>>>>>>>     */
>>>>>>>    #include <linux/acpi.h>
>>>>>>> +#include <linux/aer.h>
>>>>>>>    #include <linux/kernel.h>
>>>>>>>    #include <linux/delay.h>
>>>>>>>    #include <linux/dmi.h>
>>>>>>> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>>>>>>>           || (state == PCI_D2 && !dev->d2_support))
>>>>>>>            return -EIO;
>>>>>>> +    /*
>>>>>>> +     * If error status is set on an AER capable device, it is not well
>>>>>>> +     * suited for power state transition. Leave it in its existing power
>>>>>>> +     * state to avoid issues like unpredictable resume failure.
>>>>>>> +     */
>>>>>>> +    if (pci_aer_in_progress(dev))
>>>>>>> +        return -EIO;
>>>>>>> +
>>>>>>>        pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>>>>>        if (PCI_POSSIBLE_ERROR(pmcsr)) {
>>>>>>>            pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
>>>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>>>>> index a1cf8c7ef628..617fbac0d38a 100644
>>>>>>> --- a/drivers/pci/pcie/aer.c
>>>>>>> +++ b/drivers/pci/pcie/aer.c
>>>>>>> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>>>>>>    }
>>>>>>>    EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev)
>>>>>>> +{
>>>>>>> +    int aer = dev->aer_cap;
>>>>>>> +    u32 cor, uncor;
>>>>>>> +
>>>>>>> +    if (!pcie_aer_is_native(dev))
>>>>>>> +        return false;
>>>>>>> +
>>>>>>> +    pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
>>>>>>> +    pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
>>>>>>> +    return cor || uncor;
>>>>>>> +}
>>>>>>> +
>>>>>>>    static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>>>>>>    {
>>>>>>>        int rc;
>>>>>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>>>>>> index 02940be66324..e6a380bb2e68 100644
>>>>>>> --- a/include/linux/aer.h
>>>>>>> +++ b/include/linux/aer.h
>>>>>>> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>>>>>>>    #if defined(CONFIG_PCIEAER)
>>>>>>>    int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>>>>>    int pcie_aer_is_native(struct pci_dev *dev);
>>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev);
>>>>>>>    #else
>>>>>>>    static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>>>>    {
>>>>>>>        return -EINVAL;
>>>>>>>    }
>>>>>>>    static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>>>>>> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>>>>>>>    #endif
>>>>>>>    void pci_print_aer(struct pci_dev *dev, int aer_severity,
>>>>>>> -- 
>>>>>>> 2.34.1
>>>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 17:22             ` Denis Benato
@ 2025-05-20 17:39               ` Mario Limonciello
  2025-05-20 18:42                 ` Raag Jadav
  2025-05-21 13:39               ` Lukas Wunner
  1 sibling, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-05-20 17:39 UTC (permalink / raw)
  To: Denis Benato, Raag Jadav
  Cc: rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty

On 5/20/2025 12:22 PM, Denis Benato wrote:
> 
> On 5/20/25 17:49, Mario Limonciello wrote:
>> On 5/20/2025 10:47 AM, Raag Jadav wrote:
>>> On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
>>>> On 5/20/2025 4:48 AM, Raag Jadav wrote:
>>>>> On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
>>>>>> On 5/19/25 12:41, Raag Jadav wrote:
>>>>>>> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>>>>>>>> If error status is set on an AER capable device, most likely either the
>>>>>>>> device recovery is in progress or has already failed. Neither of the
>>>>>>>> cases are well suited for power state transition of the device, since
>>>>>>>> this can lead to unpredictable consequences like resume failure, or in
>>>>>>>> worst case the device is lost because of it. Leave the device in its
>>>>>>>> existing power state to avoid such issues.
>>>>>>>>
>>>>>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> v2: Synchronize AER handling with PCI PM (Rafael)
>>>>>>>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>>>>>>>        Elaborate "why" (Bjorn)
>>>>>>>> v4: Rely on error status instead of device status
>>>>>>>>        Condense comment (Lukas)
>>>>>>> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
>>>>>>> my understanding of this needing a revisit. If this was a mistake, please
>>>>>>> let me know.
>>>>>>>
>>>>>>> Denis, Mario, does this fix your issue?
>>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
>>>>>
>>>>> Thanks for the test. It seems there's no hotplug event this time around
>>>>> and endpoint device is still intact without any PCI related failure.
>>>>>
>>>>> Also,
>>>>>
>>>>> amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
>>>>>
>>>>> Which means whatever you're facing is either not related to this patch,
>>>>> or at best exposed some nasty side-effect that's not handled correctly
>>>>> by the driver.
>>>>>
>>>>> I'd say amdgpu folks would be of better help for your case.
>>>>>
>>>>> Raag
>>>>
>>>> So according to the logs Denis shared with v4
>>>> (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
>>>> stands for "Bus off Chip Off"
>>>>
>>>> amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
>>>>
>>>> If it's going to D3hot - that's not going to be BOCO, it should be going to
>>>> D3cold.
>>>
>>> Yes, because upstream port is in D0 for some reason (might be this patch
>>> but not sure) and so will be the root port.
>>>
>>> pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
>>> pcieport 0000:07:00.0: PCI PM: Skipped
>>>
>>> and my best guess is the driver is not able to cope with the lack of D3cold.
>>
>> Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
>>
>> I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> 
> 
> This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> 
> 
> Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> 
> Removed the initial bootup sequence to be able to use pastebin.

Thanks - this confirms that the problem is the root port not going to 
D3.  This new log shows:

pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot

So I feel we should fixate on solving that.
> 
>>
>>>
>>> Raag
>>>
>>>> Denis, can you redo your logs with out Raag's patch patch and set
>>>> CONFIG_PCI_DEBUG to compare?  The 6.14.6 log you shared already
>>>> (https://pastebin.com/kLZtibcD) also chooses BOCO but I'm suspecting picks
>>>> D3cold like it should.
>>>>
>>>>>
>>>>>>>> More discussion on [1].
>>>>>>>> [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
>>>>>>>>
>>>>>>>>     drivers/pci/pci.c      |  9 +++++++++
>>>>>>>>     drivers/pci/pcie/aer.c | 13 +++++++++++++
>>>>>>>>     include/linux/aer.h    |  2 ++
>>>>>>>>     3 files changed, 24 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>> index 4d7c9f64ea24..a20018692933 100644
>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>> @@ -9,6 +9,7 @@
>>>>>>>>      */
>>>>>>>>     #include <linux/acpi.h>
>>>>>>>> +#include <linux/aer.h>
>>>>>>>>     #include <linux/kernel.h>
>>>>>>>>     #include <linux/delay.h>
>>>>>>>>     #include <linux/dmi.h>
>>>>>>>> @@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
>>>>>>>>            || (state == PCI_D2 && !dev->d2_support))
>>>>>>>>             return -EIO;
>>>>>>>> +    /*
>>>>>>>> +     * If error status is set on an AER capable device, it is not well
>>>>>>>> +     * suited for power state transition. Leave it in its existing power
>>>>>>>> +     * state to avoid issues like unpredictable resume failure.
>>>>>>>> +     */
>>>>>>>> +    if (pci_aer_in_progress(dev))
>>>>>>>> +        return -EIO;
>>>>>>>> +
>>>>>>>>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>>>>>>         if (PCI_POSSIBLE_ERROR(pmcsr)) {
>>>>>>>>             pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
>>>>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>>>>>> index a1cf8c7ef628..617fbac0d38a 100644
>>>>>>>> --- a/drivers/pci/pcie/aer.c
>>>>>>>> +++ b/drivers/pci/pcie/aer.c
>>>>>>>> @@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
>>>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev)
>>>>>>>> +{
>>>>>>>> +    int aer = dev->aer_cap;
>>>>>>>> +    u32 cor, uncor;
>>>>>>>> +
>>>>>>>> +    if (!pcie_aer_is_native(dev))
>>>>>>>> +        return false;
>>>>>>>> +
>>>>>>>> +    pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
>>>>>>>> +    pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
>>>>>>>> +    return cor || uncor;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>>>>>>>     {
>>>>>>>>         int rc;
>>>>>>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>>>>>>> index 02940be66324..e6a380bb2e68 100644
>>>>>>>> --- a/include/linux/aer.h
>>>>>>>> +++ b/include/linux/aer.h
>>>>>>>> @@ -56,12 +56,14 @@ struct aer_capability_regs {
>>>>>>>>     #if defined(CONFIG_PCIEAER)
>>>>>>>>     int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>>>>>>     int pcie_aer_is_native(struct pci_dev *dev);
>>>>>>>> +bool pci_aer_in_progress(struct pci_dev *dev);
>>>>>>>>     #else
>>>>>>>>     static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>>>>>     {
>>>>>>>>         return -EINVAL;
>>>>>>>>     }
>>>>>>>>     static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>>>>>>> +static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
>>>>>>>>     #endif
>>>>>>>>     void pci_print_aer(struct pci_dev *dev, int aer_severity,
>>>>>>>> -- 
>>>>>>>> 2.34.1
>>>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 17:39               ` Mario Limonciello
@ 2025-05-20 18:42                 ` Raag Jadav
  2025-05-20 18:56                   ` Mario Limonciello
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-20 18:42 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty

On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> On 5/20/2025 12:22 PM, Denis Benato wrote:
> > On 5/20/25 17:49, Mario Limonciello wrote:
> > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > ---
> > > > > > > > > 
> > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > >        Elaborate "why" (Bjorn)
> > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > >        Condense comment (Lukas)
> > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > let me know.
> > > > > > > > 
> > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > 
> > > > > > > Hello,
> > > > > > > 
> > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > 
> > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > 
> > > > > > Also,
> > > > > > 
> > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > 
> > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > by the driver.
> > > > > > 
> > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > 
> > > > > > Raag
> > > > > 
> > > > > So according to the logs Denis shared with v4
> > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > stands for "Bus off Chip Off"
> > > > > 
> > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > 
> > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > D3cold.
> > > > 
> > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > but not sure) and so will be the root port.
> > > > 
> > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > 
> > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > 
> > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > 
> > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > 
> > 
> > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > 
> > 
> > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > 
> > Removed the initial bootup sequence to be able to use pastebin.
> 
> Thanks - this confirms that the problem is the root port not going to D3.
> This new log shows:
> 
> pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> 
> So I feel we should fixate on solving that.

Which means what you're looking for is error flag being set somewhere in
the hierarchy that is preventing suspend.

But regardless of it, my understanding is that root port suspend depends
on a lot of factors (now errors flags being one of them with this patch)
and endpoint driver can't possibly enforce or guarantee it - the best it
can do is try.

What's probably needed is D3cold failure handling on driver side, but I'm
no PCI PM expert and perhaps Rafael can comment on it.

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 18:42                 ` Raag Jadav
@ 2025-05-20 18:56                   ` Mario Limonciello
  2025-05-21  8:54                     ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-05-20 18:56 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On 5/20/2025 1:42 PM, Raag Jadav wrote:
> On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
>> On 5/20/2025 12:22 PM, Denis Benato wrote:
>>> On 5/20/25 17:49, Mario Limonciello wrote:
>>>> On 5/20/2025 10:47 AM, Raag Jadav wrote:
>>>>> On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
>>>>>> On 5/20/2025 4:48 AM, Raag Jadav wrote:
>>>>>>> On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
>>>>>>>> On 5/19/25 12:41, Raag Jadav wrote:
>>>>>>>>> On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
>>>>>>>>>> If error status is set on an AER capable device, most likely either the
>>>>>>>>>> device recovery is in progress or has already failed. Neither of the
>>>>>>>>>> cases are well suited for power state transition of the device, since
>>>>>>>>>> this can lead to unpredictable consequences like resume failure, or in
>>>>>>>>>> worst case the device is lost because of it. Leave the device in its
>>>>>>>>>> existing power state to avoid such issues.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> v2: Synchronize AER handling with PCI PM (Rafael)
>>>>>>>>>> v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
>>>>>>>>>>         Elaborate "why" (Bjorn)
>>>>>>>>>> v4: Rely on error status instead of device status
>>>>>>>>>>         Condense comment (Lukas)
>>>>>>>>> Since pci_aer_in_progress() is changed I've not included Rafael's tag with
>>>>>>>>> my understanding of this needing a revisit. If this was a mistake, please
>>>>>>>>> let me know.
>>>>>>>>>
>>>>>>>>> Denis, Mario, does this fix your issue?
>>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
>>>>>>>
>>>>>>> Thanks for the test. It seems there's no hotplug event this time around
>>>>>>> and endpoint device is still intact without any PCI related failure.
>>>>>>>
>>>>>>> Also,
>>>>>>>
>>>>>>> amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
>>>>>>>
>>>>>>> Which means whatever you're facing is either not related to this patch,
>>>>>>> or at best exposed some nasty side-effect that's not handled correctly
>>>>>>> by the driver.
>>>>>>>
>>>>>>> I'd say amdgpu folks would be of better help for your case.
>>>>>>>
>>>>>>> Raag
>>>>>>
>>>>>> So according to the logs Denis shared with v4
>>>>>> (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
>>>>>> stands for "Bus off Chip Off"
>>>>>>
>>>>>> amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
>>>>>>
>>>>>> If it's going to D3hot - that's not going to be BOCO, it should be going to
>>>>>> D3cold.
>>>>>
>>>>> Yes, because upstream port is in D0 for some reason (might be this patch
>>>>> but not sure) and so will be the root port.
>>>>>
>>>>> pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
>>>>> pcieport 0000:07:00.0: PCI PM: Skipped
>>>>>
>>>>> and my best guess is the driver is not able to cope with the lack of D3cold.
>>>>
>>>> Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
>>>>
>>>> I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
>>>
>>>
>>> This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
>>>
>>>
>>> Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
>>>
>>> Removed the initial bootup sequence to be able to use pastebin.
>>
>> Thanks - this confirms that the problem is the root port not going to D3.
>> This new log shows:
>>
>> pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
>>
>> So I feel we should fixate on solving that.
> 
> Which means what you're looking for is error flag being set somewhere in
> the hierarchy that is preventing suspend.

Is the issue perhaps that this is now gated on both correctable and 
uncorrectable errors?

Perhaps should *correctable errors* be emitted with a warning and the 
*uncorrectable errors* be fatal?

> 
> But regardless of it, my understanding is that root port suspend depends
> on a lot of factors (now errors flags being one of them with this patch)
> and endpoint driver can't possibly enforce or guarantee it - the best it
> can do is try.
> 
> What's probably needed is D3cold failure handling on driver side, but I'm
> no PCI PM expert and perhaps Rafael can comment on it.
> 
> Raag

 From the driver perspective it does have expectations that the parts 
outside the driver did the right thing.  If the driver was expecting the 
root port to be powered down at suspend and it wasn't there are hardware 
components that didn't power cycle and that's what we're seeing here.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 18:56                   ` Mario Limonciello
@ 2025-05-21  8:54                     ` Raag Jadav
  2025-05-21 11:27                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-21  8:54 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > On 5/20/2025 12:22 PM, Denis Benato wrote:
> > > > On 5/20/25 17:49, Mario Limonciello wrote:
> > > > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > > 
> > > > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > > > >         Elaborate "why" (Bjorn)
> > > > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > > > >         Condense comment (Lukas)
> > > > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > > > let me know.
> > > > > > > > > > 
> > > > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > > > 
> > > > > > > > > Hello,
> > > > > > > > > 
> > > > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > > > 
> > > > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > > > 
> > > > > > > > Also,
> > > > > > > > 
> > > > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > > > 
> > > > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > > > by the driver.
> > > > > > > > 
> > > > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > > > 
> > > > > > > > Raag
> > > > > > > 
> > > > > > > So according to the logs Denis shared with v4
> > > > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > > > stands for "Bus off Chip Off"
> > > > > > > 
> > > > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > > > 
> > > > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > > > D3cold.
> > > > > > 
> > > > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > > > but not sure) and so will be the root port.
> > > > > > 
> > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > > > 
> > > > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > > > 
> > > > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > > > 
> > > > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > > > 
> > > > 
> > > > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > > > 
> > > > 
> > > > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > > > 
> > > > Removed the initial bootup sequence to be able to use pastebin.
> > > 
> > > Thanks - this confirms that the problem is the root port not going to D3.
> > > This new log shows:
> > > 
> > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> > > 
> > > So I feel we should fixate on solving that.
> > 
> > Which means what you're looking for is error flag being set somewhere in
> > the hierarchy that is preventing suspend.
> 
> Is the issue perhaps that this is now gated on both correctable and
> uncorrectable errors?
> 
> Perhaps should *correctable errors* be emitted with a warning and the
> *uncorrectable errors* be fatal?

That'd be more or less inline with hiding the issue, and it can also race
with err_handler callback if driver has registered it.

> > But regardless of it, my understanding is that root port suspend depends
> > on a lot of factors (now errors flags being one of them with this patch)
> > and endpoint driver can't possibly enforce or guarantee it - the best it
> > can do is try.
> > 
> > What's probably needed is D3cold failure handling on driver side, but I'm
> > no PCI PM expert and perhaps Rafael can comment on it.
> > 
> > Raag
> 
> From the driver perspective it does have expectations that the parts outside
> the driver did the right thing.  If the driver was expecting the root port
> to be powered down at suspend and it wasn't there are hardware components
> that didn't power cycle and that's what we're seeing here.

Which means the expectation set by the driver is the opposite of the
purpose of this patch, and it's going to fail if any kind of error is
detected under root port during suspend.

Raag
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-21  8:54                     ` Raag Jadav
@ 2025-05-21 11:27                       ` Rafael J. Wysocki
  2025-05-23 15:23                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-05-21 11:27 UTC (permalink / raw)
  To: Raag Jadav, Mario Limonciello
  Cc: Denis Benato, rafael, mahesh, oohall, bhelgaas, linux-pci,
	linux-pm, linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
>
> On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > On 5/20/2025 12:22 PM, Denis Benato wrote:
> > > > > On 5/20/25 17:49, Mario Limonciello wrote:
> > > > > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >
> > > > > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > > > > >         Elaborate "why" (Bjorn)
> > > > > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > > > > >         Condense comment (Lukas)
> > > > > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > > > > let me know.
> > > > > > > > > > >
> > > > > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > > > >
> > > > > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > > > >
> > > > > > > > > Also,
> > > > > > > > >
> > > > > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > > > >
> > > > > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > > > > by the driver.
> > > > > > > > >
> > > > > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > > > >
> > > > > > > > > Raag
> > > > > > > >
> > > > > > > > So according to the logs Denis shared with v4
> > > > > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > > > > stands for "Bus off Chip Off"
> > > > > > > >
> > > > > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > > > >
> > > > > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > > > > D3cold.
> > > > > > >
> > > > > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > > > > but not sure) and so will be the root port.
> > > > > > >
> > > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > > > >
> > > > > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > > > >
> > > > > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > > > >
> > > > > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > > > >
> > > > >
> > > > > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > > > >
> > > > >
> > > > > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > > > >
> > > > > Removed the initial bootup sequence to be able to use pastebin.
> > > >
> > > > Thanks - this confirms that the problem is the root port not going to D3.
> > > > This new log shows:
> > > >
> > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> > > >
> > > > So I feel we should fixate on solving that.
> > >
> > > Which means what you're looking for is error flag being set somewhere in
> > > the hierarchy that is preventing suspend.
> >
> > Is the issue perhaps that this is now gated on both correctable and
> > uncorrectable errors?
> >
> > Perhaps should *correctable errors* be emitted with a warning and the
> > *uncorrectable errors* be fatal?
>
> That'd be more or less inline with hiding the issue, and it can also race
> with err_handler callback if driver has registered it.
>
> > > But regardless of it, my understanding is that root port suspend depends
> > > on a lot of factors (now errors flags being one of them with this patch)
> > > and endpoint driver can't possibly enforce or guarantee it - the best it
> > > can do is try.
> > >
> > > What's probably needed is D3cold failure handling on driver side, but I'm
> > > no PCI PM expert and perhaps Rafael can comment on it.
> > >
> > > Raag
> >
> > From the driver perspective it does have expectations that the parts outside
> > the driver did the right thing.  If the driver was expecting the root port
> > to be powered down at suspend and it wasn't there are hardware components
> > that didn't power cycle and that's what we're seeing here.
>
> Which means the expectation set by the driver is the opposite of the
> purpose of this patch, and it's going to fail if any kind of error is
> detected under root port during suspend.

And IMV this driver's expectation is questionable at least.

There is no promise whatsoever that the device will always be put into
D3cold during system suspend.

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-20 17:22             ` Denis Benato
  2025-05-20 17:39               ` Mario Limonciello
@ 2025-05-21 13:39               ` Lukas Wunner
  2025-05-21 17:06                 ` Mario Limonciello
  1 sibling, 1 reply; 27+ messages in thread
From: Lukas Wunner @ 2025-05-21 13:39 UTC (permalink / raw)
  To: Denis Benato
  Cc: Mario Limonciello, Raag Jadav, rafael, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen,
	aravind.iddamsetty

On Tue, May 20, 2025 at 07:22:04PM +0200, Denis Benato wrote:
> This is the very same exact kernel, minus the patch in question:
> https://pastebin.com/rwMYgG7C
> 
> Both previous kernel and this one have CONFIG_PCI_DEBUG=y.

This log excerpt shows that the ASMedia Thunderbolt controller
below the Intel Thunderbolt controller couldn't be enumerated
on boot:

mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: retraining failed

However, the Thunderbolt tunnel goes up and the devices are
enumerated 24 seconds later:

mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Card present
mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Link Up

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-21 13:39               ` Lukas Wunner
@ 2025-05-21 17:06                 ` Mario Limonciello
  2025-05-21 20:28                   ` Denis Benato
  0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-05-21 17:06 UTC (permalink / raw)
  To: Lukas Wunner, Denis Benato
  Cc: Raag Jadav, rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, aravind.iddamsetty

On 5/21/25 08:39, Lukas Wunner wrote:
> On Tue, May 20, 2025 at 07:22:04PM +0200, Denis Benato wrote:
>> This is the very same exact kernel, minus the patch in question:
>> https://pastebin.com/rwMYgG7C
>>
>> Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> 
> This log excerpt shows that the ASMedia Thunderbolt controller
> below the Intel Thunderbolt controller couldn't be enumerated
> on boot:
> 
> mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
> mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: retraining failed
> 
> However, the Thunderbolt tunnel goes up and the devices are
> enumerated 24 seconds later:
> 
> mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Card present
> mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Link Up
> 
> Thanks,
> 
> Lukas

Are you suggesting that the training failure is why there was an error?

If so maybe that should be cleared when it does eventually train.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-21 17:06                 ` Mario Limonciello
@ 2025-05-21 20:28                   ` Denis Benato
  2025-05-22  7:31                     ` Lukas Wunner
  0 siblings, 1 reply; 27+ messages in thread
From: Denis Benato @ 2025-05-21 20:28 UTC (permalink / raw)
  To: Mario Limonciello, Lukas Wunner
  Cc: Raag Jadav, rafael, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, aravind.iddamsetty


On 5/21/25 19:06, Mario Limonciello wrote:
> On 5/21/25 08:39, Lukas Wunner wrote:
>> On Tue, May 20, 2025 at 07:22:04PM +0200, Denis Benato wrote:
>>> This is the very same exact kernel, minus the patch in question:
>>> https://pastebin.com/rwMYgG7C
>>>
>>> Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
>>
>> This log excerpt shows that the ASMedia Thunderbolt controller
>> below the Intel Thunderbolt controller couldn't be enumerated
>> on boot:
>>
>> mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
>> mag 20 18:42:20 denis-pc kernel: pci 0000:03:01.0: retraining failed
>>
>> However, the Thunderbolt tunnel goes up and the devices are
>> enumerated 24 seconds later:
>>
>> mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Card present
>> mag 20 18:42:44 denis-pc kernel: pcieport 0000:03:01.0: pciehp: Slot(1-1): Link Up
>>
>> Thanks,
>>
>> Lukas
>
> Are you suggesting that the training failure is why there was an error?
>
> If so maybe that should be cleared when it does eventually train.
Oftentimes the gpu is not initialized when I boot my laptop: in that dmesg I posted I detached it and re-plugged when I saw displays not lighting up.

Using the patch proposed here the gpu is always there, that's why you don't see the same in the two dmesg in that regard.

Also another reason I want this patch to work.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-21 20:28                   ` Denis Benato
@ 2025-05-22  7:31                     ` Lukas Wunner
  0 siblings, 0 replies; 27+ messages in thread
From: Lukas Wunner @ 2025-05-22  7:31 UTC (permalink / raw)
  To: Denis Benato
  Cc: Mario Limonciello, Raag Jadav, rafael, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen,
	aravind.iddamsetty, Mika Westerberg, Mathias Nyman

[cc += Mika, Mathias]

On Tue, May 20, 2025 at 07:22:04PM +0200, Denis Benato wrote:
> This is the very same exact kernel, minus the patch in question:
> https://pastebin.com/rwMYgG7C

I note that your machine uses an Intel Maple Ridge 2C Thunderbolt
controller.  It looks like we're missing that one in the list of
XHCI devices which are allowed to runtime suspend.  Only the 4C
variant of that controller was allowed to runtime suspend by commit
5a8e3229ac27 ("xhci-pci: Allow host runtime PM as default for Intel
Maple Ridge xHCI").

Commit a611bf473d1f ("xhci-pci: Set runtime PM as default policy on
all xHC 1.2 or later devices") has since obviated the need to
continuously amend the list of controllers, but it's unclear to me
whether it encompasses Maple Ridge 2C or not.  Chances are it doesn't
because the 4C variant was kept in the whitelist.

What do you get if you run:

cat /sys/bus/pci/devices/0000:38:00.0/power/runtime_enabled
cat /sys/bus/pci/devices/0000:38:00.0/power/runtime_status

Below is a small patch to add the 2C variant to the whitelist.
Does it help in any way?  What's the output for the two commands
above if you apply the patch?

Upthread it was said that it's a problem if the Root Port stays in D0.
Of course if the XHCI doesn't runtime suspend, that will keep the
Root Port in D0 as well.  The patch may eliminate one reason for the
Root Port to stay in D0.

The PCI IDs database was also missing the IDs of the 2C variant,
I just went ahead and fixed that.

Thanks,

Lukas

-- >8 --

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 0c481cb..5233d59 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -61,7 +61,8 @@
 #define PCI_DEVICE_ID_PHYTIUM_XHCI			0xdc27
 
 /* Thunderbolt */
-#define PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_XHCI		0x1138
+#define PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_2C_XHCI		0x1135
+#define PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_4C_XHCI		0x1138
 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_2C_XHCI	0x15b5
 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_XHCI	0x15b6
 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_XHCI	0x15c1
@@ -387,7 +388,8 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 	     pdev->device == PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_XHCI ||
 	     pdev->device == PCI_DEVICE_ID_INTEL_ICE_LAKE_XHCI ||
 	     pdev->device == PCI_DEVICE_ID_INTEL_TIGER_LAKE_XHCI ||
-	     pdev->device == PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_XHCI))
+	     pdev->device == PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_2C_XHCI ||
+	     pdev->device == PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_4C_XHCI))
 		xhci->quirks |= XHCI_DEFAULT_PM_RUNTIME_ALLOW;
 
 	if (pdev->vendor == PCI_VENDOR_ID_ETRON &&

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-21 11:27                       ` Rafael J. Wysocki
@ 2025-05-23 15:23                         ` Rafael J. Wysocki
  2025-05-30 17:23                           ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-05-23 15:23 UTC (permalink / raw)
  To: Raag Jadav, Mario Limonciello
  Cc: Denis Benato, mahesh, oohall, bhelgaas, linux-pci, linux-pm,
	linux-kernel, ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> >
> > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > > On 5/20/2025 12:22 PM, Denis Benato wrote:
> > > > > > On 5/20/25 17:49, Mario Limonciello wrote:
> > > > > > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > > > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > > > > > >         Elaborate "why" (Bjorn)
> > > > > > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > > > > > >         Condense comment (Lukas)
> > > > > > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > > > > > let me know.
> > > > > > > > > > > >
> > > > > > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > > > > >
> > > > > > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > > > > >
> > > > > > > > > > Also,
> > > > > > > > > >
> > > > > > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > > > > >
> > > > > > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > > > > > by the driver.
> > > > > > > > > >
> > > > > > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > > > > >
> > > > > > > > > > Raag
> > > > > > > > >
> > > > > > > > > So according to the logs Denis shared with v4
> > > > > > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > > > > > stands for "Bus off Chip Off"
> > > > > > > > >
> > > > > > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > > > > >
> > > > > > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > > > > > D3cold.
> > > > > > > >
> > > > > > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > > > > > but not sure) and so will be the root port.
> > > > > > > >
> > > > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > > > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > > > > >
> > > > > > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > > > > >
> > > > > > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > > > > >
> > > > > > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > > > > >
> > > > > >
> > > > > > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > > > > >
> > > > > >
> > > > > > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > > > > >
> > > > > > Removed the initial bootup sequence to be able to use pastebin.
> > > > >
> > > > > Thanks - this confirms that the problem is the root port not going to D3.
> > > > > This new log shows:
> > > > >
> > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> > > > >
> > > > > So I feel we should fixate on solving that.
> > > >
> > > > Which means what you're looking for is error flag being set somewhere in
> > > > the hierarchy that is preventing suspend.
> > >
> > > Is the issue perhaps that this is now gated on both correctable and
> > > uncorrectable errors?
> > >
> > > Perhaps should *correctable errors* be emitted with a warning and the
> > > *uncorrectable errors* be fatal?
> >
> > That'd be more or less inline with hiding the issue, and it can also race
> > with err_handler callback if driver has registered it.
> >
> > > > But regardless of it, my understanding is that root port suspend depends
> > > > on a lot of factors (now errors flags being one of them with this patch)
> > > > and endpoint driver can't possibly enforce or guarantee it - the best it
> > > > can do is try.
> > > >
> > > > What's probably needed is D3cold failure handling on driver side, but I'm
> > > > no PCI PM expert and perhaps Rafael can comment on it.
> > > >
> > > > Raag
> > >
> > > From the driver perspective it does have expectations that the parts outside
> > > the driver did the right thing.  If the driver was expecting the root port
> > > to be powered down at suspend and it wasn't there are hardware components
> > > that didn't power cycle and that's what we're seeing here.
> >
> > Which means the expectation set by the driver is the opposite of the
> > purpose of this patch, and it's going to fail if any kind of error is
> > detected under root port during suspend.
>
> And IMV this driver's expectation is questionable at least.
>
> There is no promise whatsoever that the device will always be put into
> D3cold during system suspend.

For instance, user space may disable D3cold for any PCI device via the
d3cold_allowed attribute in sysfs.

If the driver cannot handle this, it needs to be fixed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-23 15:23                         ` Rafael J. Wysocki
@ 2025-05-30 17:23                           ` Raag Jadav
  2025-05-30 17:49                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-05-30 17:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Denis Benato, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, amd-gfx@lists.freedesktop.org, Alex Deucher

On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > > > On 5/20/2025 12:22 PM, Denis Benato wrote:
> > > > > > > On 5/20/25 17:49, Mario Limonciello wrote:
> > > > > > > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > > > > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > > > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > > > > > > >         Elaborate "why" (Bjorn)
> > > > > > > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > > > > > > >         Condense comment (Lukas)
> > > > > > > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > > > > > > let me know.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > > > > > >
> > > > > > > > > > > Also,
> > > > > > > > > > >
> > > > > > > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > > > > > >
> > > > > > > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > > > > > > by the driver.
> > > > > > > > > > >
> > > > > > > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > > > > > >
> > > > > > > > > > > Raag
> > > > > > > > > >
> > > > > > > > > > So according to the logs Denis shared with v4
> > > > > > > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > > > > > > stands for "Bus off Chip Off"
> > > > > > > > > >
> > > > > > > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > > > > > >
> > > > > > > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > > > > > > D3cold.
> > > > > > > > >
> > > > > > > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > > > > > > but not sure) and so will be the root port.
> > > > > > > > >
> > > > > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > > > > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > > > > > >
> > > > > > > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > > > > > >
> > > > > > > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > > > > > >
> > > > > > > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > > > > > >
> > > > > > >
> > > > > > > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > > > > > >
> > > > > > >
> > > > > > > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > > > > > >
> > > > > > > Removed the initial bootup sequence to be able to use pastebin.
> > > > > >
> > > > > > Thanks - this confirms that the problem is the root port not going to D3.
> > > > > > This new log shows:
> > > > > >
> > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> > > > > >
> > > > > > So I feel we should fixate on solving that.
> > > > >
> > > > > Which means what you're looking for is error flag being set somewhere in
> > > > > the hierarchy that is preventing suspend.
> > > >
> > > > Is the issue perhaps that this is now gated on both correctable and
> > > > uncorrectable errors?
> > > >
> > > > Perhaps should *correctable errors* be emitted with a warning and the
> > > > *uncorrectable errors* be fatal?
> > >
> > > That'd be more or less inline with hiding the issue, and it can also race
> > > with err_handler callback if driver has registered it.
> > >
> > > > > But regardless of it, my understanding is that root port suspend depends
> > > > > on a lot of factors (now errors flags being one of them with this patch)
> > > > > and endpoint driver can't possibly enforce or guarantee it - the best it
> > > > > can do is try.
> > > > >
> > > > > What's probably needed is D3cold failure handling on driver side, but I'm
> > > > > no PCI PM expert and perhaps Rafael can comment on it.
> > > > >
> > > > > Raag
> > > >
> > > > From the driver perspective it does have expectations that the parts outside
> > > > the driver did the right thing.  If the driver was expecting the root port
> > > > to be powered down at suspend and it wasn't there are hardware components
> > > > that didn't power cycle and that's what we're seeing here.
> > >
> > > Which means the expectation set by the driver is the opposite of the
> > > purpose of this patch, and it's going to fail if any kind of error is
> > > detected under root port during suspend.
> >
> > And IMV this driver's expectation is questionable at least.
> >
> > There is no promise whatsoever that the device will always be put into
> > D3cold during system suspend.
> 
> For instance, user space may disable D3cold for any PCI device via the
> d3cold_allowed attribute in sysfs.
> 
> If the driver cannot handle this, it needs to be fixed.

Thanks for confirming. So should we consider this patch to be valid
and worth moving forward?

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-30 17:23                           ` Raag Jadav
@ 2025-05-30 17:49                             ` Rafael J. Wysocki
  2025-06-04 15:42                               ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-05-30 17:49 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Rafael J. Wysocki, Mario Limonciello, Denis Benato, mahesh,
	oohall, bhelgaas, linux-pci, linux-pm, linux-kernel,
	ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
>
> On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > > > > On 5/20/2025 12:22 PM, Denis Benato wrote:
> > > > > > > > On 5/20/25 17:49, Mario Limonciello wrote:
> > > > > > > > > On 5/20/2025 10:47 AM, Raag Jadav wrote:
> > > > > > > > > > On Tue, May 20, 2025 at 10:23:57AM -0500, Mario Limonciello wrote:
> > > > > > > > > > > On 5/20/2025 4:48 AM, Raag Jadav wrote:
> > > > > > > > > > > > On Mon, May 19, 2025 at 11:42:31PM +0200, Denis Benato wrote:
> > > > > > > > > > > > > On 5/19/25 12:41, Raag Jadav wrote:
> > > > > > > > > > > > > > On Mon, May 19, 2025 at 03:58:08PM +0530, Raag Jadav wrote:
> > > > > > > > > > > > > > > If error status is set on an AER capable device, most likely either the
> > > > > > > > > > > > > > > device recovery is in progress or has already failed. Neither of the
> > > > > > > > > > > > > > > cases are well suited for power state transition of the device, since
> > > > > > > > > > > > > > > this can lead to unpredictable consequences like resume failure, or in
> > > > > > > > > > > > > > > worst case the device is lost because of it. Leave the device in its
> > > > > > > > > > > > > > > existing power state to avoid such issues.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > v2: Synchronize AER handling with PCI PM (Rafael)
> > > > > > > > > > > > > > > v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
> > > > > > > > > > > > > > >         Elaborate "why" (Bjorn)
> > > > > > > > > > > > > > > v4: Rely on error status instead of device status
> > > > > > > > > > > > > > >         Condense comment (Lukas)
> > > > > > > > > > > > > > Since pci_aer_in_progress() is changed I've not included Rafael's tag with
> > > > > > > > > > > > > > my understanding of this needing a revisit. If this was a mistake, please
> > > > > > > > > > > > > > let me know.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Denis, Mario, does this fix your issue?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unfortunately no, I have prepared a dmesg but had to remove the bootup process because it was too long of a few kb: https://pastebin.com/1uBEA1FL
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the test. It seems there's no hotplug event this time around
> > > > > > > > > > > > and endpoint device is still intact without any PCI related failure.
> > > > > > > > > > > >
> > > > > > > > > > > > Also,
> > > > > > > > > > > >
> > > > > > > > > > > > amdgpu 0000:09:00.0: PCI PM: Suspend power state: D3hot
> > > > > > > > > > > >
> > > > > > > > > > > > Which means whatever you're facing is either not related to this patch,
> > > > > > > > > > > > or at best exposed some nasty side-effect that's not handled correctly
> > > > > > > > > > > > by the driver.
> > > > > > > > > > > >
> > > > > > > > > > > > I'd say amdgpu folks would be of better help for your case.
> > > > > > > > > > > >
> > > > > > > > > > > > Raag
> > > > > > > > > > >
> > > > > > > > > > > So according to the logs Denis shared with v4
> > > > > > > > > > > (https://pastebin.com/1uBEA1FL) the GPU should have been going to BOCO. This
> > > > > > > > > > > stands for "Bus off Chip Off"
> > > > > > > > > > >
> > > > > > > > > > > amdgpu 0000:09:00.0: amdgpu: Using BOCO for runtime pm
> > > > > > > > > > >
> > > > > > > > > > > If it's going to D3hot - that's not going to be BOCO, it should be going to
> > > > > > > > > > > D3cold.
> > > > > > > > > >
> > > > > > > > > > Yes, because upstream port is in D0 for some reason (might be this patch
> > > > > > > > > > but not sure) and so will be the root port.
> > > > > > > > > >
> > > > > > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D0
> > > > > > > > > > pcieport 0000:07:00.0: PCI PM: Skipped
> > > > > > > > > >
> > > > > > > > > > and my best guess is the driver is not able to cope with the lack of D3cold.
> > > > > > > > >
> > > > > > > > > Yes; if the driver is configured to expect BOCO (D3cold) if it doesn't get it, chaos ensues.
> > > > > > > > >
> > > > > > > > > I guess let's double check the behavior with CONFIG_PCI_DEBUG to verify this patch is what is changing that upstream port behavior.
> > > > > > > >
> > > > > > > >
> > > > > > > > This is the very same exact kernel, minus the patch in question:  https://pastebin.com/rwMYgG7C
> > > > > > > >
> > > > > > > >
> > > > > > > > Both previous kernel and this one have CONFIG_PCI_DEBUG=y.
> > > > > > > >
> > > > > > > > Removed the initial bootup sequence to be able to use pastebin.
> > > > > > >
> > > > > > > Thanks - this confirms that the problem is the root port not going to D3.
> > > > > > > This new log shows:
> > > > > > >
> > > > > > > pcieport 0000:07:00.0: PCI PM: Suspend power state: D3hot
> > > > > > >
> > > > > > > So I feel we should fixate on solving that.
> > > > > >
> > > > > > Which means what you're looking for is error flag being set somewhere in
> > > > > > the hierarchy that is preventing suspend.
> > > > >
> > > > > Is the issue perhaps that this is now gated on both correctable and
> > > > > uncorrectable errors?
> > > > >
> > > > > Perhaps should *correctable errors* be emitted with a warning and the
> > > > > *uncorrectable errors* be fatal?
> > > >
> > > > That'd be more or less inline with hiding the issue, and it can also race
> > > > with err_handler callback if driver has registered it.
> > > >
> > > > > > But regardless of it, my understanding is that root port suspend depends
> > > > > > on a lot of factors (now errors flags being one of them with this patch)
> > > > > > and endpoint driver can't possibly enforce or guarantee it - the best it
> > > > > > can do is try.
> > > > > >
> > > > > > What's probably needed is D3cold failure handling on driver side, but I'm
> > > > > > no PCI PM expert and perhaps Rafael can comment on it.
> > > > > >
> > > > > > Raag
> > > > >
> > > > > From the driver perspective it does have expectations that the parts outside
> > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > that didn't power cycle and that's what we're seeing here.
> > > >
> > > > Which means the expectation set by the driver is the opposite of the
> > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > detected under root port during suspend.
> > >
> > > And IMV this driver's expectation is questionable at least.
> > >
> > > There is no promise whatsoever that the device will always be put into
> > > D3cold during system suspend.
> >
> > For instance, user space may disable D3cold for any PCI device via the
> > d3cold_allowed attribute in sysfs.
> >
> > If the driver cannot handle this, it needs to be fixed.
>
> Thanks for confirming. So should we consider this patch to be valid
> and worth moving forward?

It doesn't do anything that would be invalid in principle IMV.

You need to consider one more thing, though: It may be necessary to
power-cycle the device in order to kick it out of the erroneous state
and the patch effectively blocks this if I'm not mistaken.

But admittedly I'm not sure if this really matters.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-05-30 17:49                             ` Rafael J. Wysocki
@ 2025-06-04 15:42                               ` Raag Jadav
  2025-06-04 18:19                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-06-04 15:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Denis Benato, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, amd-gfx@lists.freedesktop.org, Alex Deucher

On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:

...

> > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > that didn't power cycle and that's what we're seeing here.
> > > > >
> > > > > Which means the expectation set by the driver is the opposite of the
> > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > detected under root port during suspend.
> > > >
> > > > And IMV this driver's expectation is questionable at least.
> > > >
> > > > There is no promise whatsoever that the device will always be put into
> > > > D3cold during system suspend.
> > >
> > > For instance, user space may disable D3cold for any PCI device via the
> > > d3cold_allowed attribute in sysfs.
> > >
> > > If the driver cannot handle this, it needs to be fixed.
> >
> > Thanks for confirming. So should we consider this patch to be valid
> > and worth moving forward?
> 
> It doesn't do anything that would be invalid in principle IMV.
> 
> You need to consider one more thing, though: It may be necessary to
> power-cycle the device in order to kick it out of the erroneous state
> and the patch effectively blocks this if I'm not mistaken.
> 
> But admittedly I'm not sure if this really matters.

Wouldn't something like bus reset (SBR) be more predictable?

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-04 15:42                               ` Raag Jadav
@ 2025-06-04 18:19                                 ` Rafael J. Wysocki
  2025-06-05 11:44                                   ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-06-04 18:19 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Rafael J. Wysocki, Mario Limonciello, Denis Benato, mahesh,
	oohall, bhelgaas, linux-pci, linux-pm, linux-kernel,
	ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
>
> On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
>
> ...
>
> > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > >
> > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > detected under root port during suspend.
> > > > >
> > > > > And IMV this driver's expectation is questionable at least.
> > > > >
> > > > > There is no promise whatsoever that the device will always be put into
> > > > > D3cold during system suspend.
> > > >
> > > > For instance, user space may disable D3cold for any PCI device via the
> > > > d3cold_allowed attribute in sysfs.
> > > >
> > > > If the driver cannot handle this, it needs to be fixed.
> > >
> > > Thanks for confirming. So should we consider this patch to be valid
> > > and worth moving forward?
> >
> > It doesn't do anything that would be invalid in principle IMV.
> >
> > You need to consider one more thing, though: It may be necessary to
> > power-cycle the device in order to kick it out of the erroneous state
> > and the patch effectively blocks this if I'm not mistaken.
> >
> > But admittedly I'm not sure if this really matters.
>
> Wouldn't something like bus reset (SBR) be more predictable?

Maybe.

The device state is most likely inconsistent in that case, so it depends.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-04 18:19                                 ` Rafael J. Wysocki
@ 2025-06-05 11:44                                   ` Raag Jadav
  2025-06-05 12:26                                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-06-05 11:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Denis Benato, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, amd-gfx@lists.freedesktop.org, Alex Deucher

On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> >
> > ...
> >
> > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > >
> > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > detected under root port during suspend.
> > > > > >
> > > > > > And IMV this driver's expectation is questionable at least.
> > > > > >
> > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > D3cold during system suspend.
> > > > >
> > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > d3cold_allowed attribute in sysfs.
> > > > >
> > > > > If the driver cannot handle this, it needs to be fixed.
> > > >
> > > > Thanks for confirming. So should we consider this patch to be valid
> > > > and worth moving forward?
> > >
> > > It doesn't do anything that would be invalid in principle IMV.
> > >
> > > You need to consider one more thing, though: It may be necessary to
> > > power-cycle the device in order to kick it out of the erroneous state
> > > and the patch effectively blocks this if I'm not mistaken.
> > >
> > > But admittedly I'm not sure if this really matters.
> >
> > Wouldn't something like bus reset (SBR) be more predictable?
> 
> Maybe.
> 
> The device state is most likely inconsistent in that case, so it depends.

My limited understanding is that if SBR doesn't help, at that point all
bets are off including PMCSR configuration and probably a cold boot is
needed.

Please correct me if I've misunderstood.

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-05 11:44                                   ` Raag Jadav
@ 2025-06-05 12:26                                     ` Rafael J. Wysocki
  2025-06-10 13:44                                       ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-06-05 12:26 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Rafael J. Wysocki, Mario Limonciello, Denis Benato, mahesh,
	oohall, bhelgaas, linux-pci, linux-pm, linux-kernel,
	ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
>
> On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > >
> > > ...
> > >
> > > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > > >
> > > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > > detected under root port during suspend.
> > > > > > >
> > > > > > > And IMV this driver's expectation is questionable at least.
> > > > > > >
> > > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > > D3cold during system suspend.
> > > > > >
> > > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > > d3cold_allowed attribute in sysfs.
> > > > > >
> > > > > > If the driver cannot handle this, it needs to be fixed.
> > > > >
> > > > > Thanks for confirming. So should we consider this patch to be valid
> > > > > and worth moving forward?
> > > >
> > > > It doesn't do anything that would be invalid in principle IMV.
> > > >
> > > > You need to consider one more thing, though: It may be necessary to
> > > > power-cycle the device in order to kick it out of the erroneous state
> > > > and the patch effectively blocks this if I'm not mistaken.
> > > >
> > > > But admittedly I'm not sure if this really matters.
> > >
> > > Wouldn't something like bus reset (SBR) be more predictable?
> >
> > Maybe.
> >
> > The device state is most likely inconsistent in that case, so it depends.
>
> My limited understanding is that if SBR doesn't help, at that point all
> bets are off including PMCSR configuration and probably a cold boot is
> needed.

I'm not talking about PMCSR, I'm talking about power removal (D3cold).
This should be equivalent to a cold boot for the particular device
except that cold boot would also reset the hierarchy above it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-05 12:26                                     ` Rafael J. Wysocki
@ 2025-06-10 13:44                                       ` Raag Jadav
  2025-06-10 13:53                                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Raag Jadav @ 2025-06-10 13:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Denis Benato, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, amd-gfx@lists.freedesktop.org, Alex Deucher

On Thu, Jun 05, 2025 at 02:26:05PM +0200, Rafael J. Wysocki wrote:
> On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > >
> > > > ...
> > > >
> > > > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > > > >
> > > > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > > > detected under root port during suspend.
> > > > > > > >
> > > > > > > > And IMV this driver's expectation is questionable at least.
> > > > > > > >
> > > > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > > > D3cold during system suspend.
> > > > > > >
> > > > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > > > d3cold_allowed attribute in sysfs.
> > > > > > >
> > > > > > > If the driver cannot handle this, it needs to be fixed.
> > > > > >
> > > > > > Thanks for confirming. So should we consider this patch to be valid
> > > > > > and worth moving forward?
> > > > >
> > > > > It doesn't do anything that would be invalid in principle IMV.
> > > > >
> > > > > You need to consider one more thing, though: It may be necessary to
> > > > > power-cycle the device in order to kick it out of the erroneous state
> > > > > and the patch effectively blocks this if I'm not mistaken.
> > > > >
> > > > > But admittedly I'm not sure if this really matters.
> > > >
> > > > Wouldn't something like bus reset (SBR) be more predictable?
> > >
> > > Maybe.
> > >
> > > The device state is most likely inconsistent in that case, so it depends.
> >
> > My limited understanding is that if SBR doesn't help, at that point all
> > bets are off including PMCSR configuration and probably a cold boot is
> > needed.
> 
> I'm not talking about PMCSR, I'm talking about power removal (D3cold).
> This should be equivalent to a cold boot for the particular device
> except that cold boot would also reset the hierarchy above it.

Sure. But for D3cold we rely on everything else under root port to also
be suspended, which we can't predict while in endpoint suspend path. And
with that we're back to "no guarantees" problem, which was the case either
way. The patch might just prevent any further damage than what's already
done.

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-10 13:44                                       ` Raag Jadav
@ 2025-06-10 13:53                                         ` Rafael J. Wysocki
  2025-06-20 12:14                                           ` Raag Jadav
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2025-06-10 13:53 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Rafael J. Wysocki, Mario Limonciello, Denis Benato, mahesh,
	oohall, bhelgaas, linux-pci, linux-pm, linux-kernel,
	ilpo.jarvinen, lukas, aravind.iddamsetty,
	amd-gfx@lists.freedesktop.org, Alex Deucher

On Tue, Jun 10, 2025 at 3:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
>
> On Thu, Jun 05, 2025 at 02:26:05PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > > > > >
> > > > > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > > > > detected under root port during suspend.
> > > > > > > > >
> > > > > > > > > And IMV this driver's expectation is questionable at least.
> > > > > > > > >
> > > > > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > > > > D3cold during system suspend.
> > > > > > > >
> > > > > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > > > > d3cold_allowed attribute in sysfs.
> > > > > > > >
> > > > > > > > If the driver cannot handle this, it needs to be fixed.
> > > > > > >
> > > > > > > Thanks for confirming. So should we consider this patch to be valid
> > > > > > > and worth moving forward?
> > > > > >
> > > > > > It doesn't do anything that would be invalid in principle IMV.
> > > > > >
> > > > > > You need to consider one more thing, though: It may be necessary to
> > > > > > power-cycle the device in order to kick it out of the erroneous state
> > > > > > and the patch effectively blocks this if I'm not mistaken.
> > > > > >
> > > > > > But admittedly I'm not sure if this really matters.
> > > > >
> > > > > Wouldn't something like bus reset (SBR) be more predictable?
> > > >
> > > > Maybe.
> > > >
> > > > The device state is most likely inconsistent in that case, so it depends.
> > >
> > > My limited understanding is that if SBR doesn't help, at that point all
> > > bets are off including PMCSR configuration and probably a cold boot is
> > > needed.
> >
> > I'm not talking about PMCSR, I'm talking about power removal (D3cold).
> > This should be equivalent to a cold boot for the particular device
> > except that cold boot would also reset the hierarchy above it.
>
> Sure. But for D3cold we rely on everything else under root port to also
> be suspended, which we can't predict while in endpoint suspend path. And
> with that we're back to "no guarantees" problem, which was the case either
> way. The patch might just prevent any further damage than what's already
> done.

Fair enough.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
  2025-06-10 13:53                                         ` Rafael J. Wysocki
@ 2025-06-20 12:14                                           ` Raag Jadav
  0 siblings, 0 replies; 27+ messages in thread
From: Raag Jadav @ 2025-06-20 12:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mario Limonciello, Denis Benato, mahesh, oohall, bhelgaas,
	linux-pci, linux-pm, linux-kernel, ilpo.jarvinen, lukas,
	aravind.iddamsetty, amd-gfx@lists.freedesktop.org, Alex Deucher

On Tue, Jun 10, 2025 at 03:53:10PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jun 10, 2025 at 3:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > On Thu, Jun 05, 2025 at 02:26:05PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> > > > > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@intel.com> wrote:
> > > > > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > > > > > the driver did the right thing.  If the driver was expecting the root port
> > > > > > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > > > > > >
> > > > > > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > > > > > detected under root port during suspend.
> > > > > > > > > >
> > > > > > > > > > And IMV this driver's expectation is questionable at least.
> > > > > > > > > >
> > > > > > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > > > > > D3cold during system suspend.
> > > > > > > > >
> > > > > > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > > > > > d3cold_allowed attribute in sysfs.
> > > > > > > > >
> > > > > > > > > If the driver cannot handle this, it needs to be fixed.
> > > > > > > >
> > > > > > > > Thanks for confirming. So should we consider this patch to be valid
> > > > > > > > and worth moving forward?
> > > > > > >
> > > > > > > It doesn't do anything that would be invalid in principle IMV.
> > > > > > >
> > > > > > > You need to consider one more thing, though: It may be necessary to
> > > > > > > power-cycle the device in order to kick it out of the erroneous state
> > > > > > > and the patch effectively blocks this if I'm not mistaken.
> > > > > > >
> > > > > > > But admittedly I'm not sure if this really matters.
> > > > > >
> > > > > > Wouldn't something like bus reset (SBR) be more predictable?
> > > > >
> > > > > Maybe.
> > > > >
> > > > > The device state is most likely inconsistent in that case, so it depends.
> > > >
> > > > My limited understanding is that if SBR doesn't help, at that point all
> > > > bets are off including PMCSR configuration and probably a cold boot is
> > > > needed.
> > >
> > > I'm not talking about PMCSR, I'm talking about power removal (D3cold).
> > > This should be equivalent to a cold boot for the particular device
> > > except that cold boot would also reset the hierarchy above it.
> >
> > Sure. But for D3cold we rely on everything else under root port to also
> > be suspended, which we can't predict while in endpoint suspend path. And
> > with that we're back to "no guarantees" problem, which was the case either
> > way. The patch might just prevent any further damage than what's already
> > done.
> 
> Fair enough.

So anything I can do to move this forward?
Sorry I didn't include your tag since I changed the core logic.

Raag

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-06-20 12:14 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-19 10:28 [PATCH v4] PCI: Prevent power state transition of erroneous device Raag Jadav
2025-05-19 10:41 ` Raag Jadav
2025-05-19 21:42   ` Denis Benato
2025-05-20  9:48     ` Raag Jadav
2025-05-20 15:23       ` Mario Limonciello
2025-05-20 15:47         ` Raag Jadav
2025-05-20 15:49           ` Mario Limonciello
2025-05-20 17:22             ` Denis Benato
2025-05-20 17:39               ` Mario Limonciello
2025-05-20 18:42                 ` Raag Jadav
2025-05-20 18:56                   ` Mario Limonciello
2025-05-21  8:54                     ` Raag Jadav
2025-05-21 11:27                       ` Rafael J. Wysocki
2025-05-23 15:23                         ` Rafael J. Wysocki
2025-05-30 17:23                           ` Raag Jadav
2025-05-30 17:49                             ` Rafael J. Wysocki
2025-06-04 15:42                               ` Raag Jadav
2025-06-04 18:19                                 ` Rafael J. Wysocki
2025-06-05 11:44                                   ` Raag Jadav
2025-06-05 12:26                                     ` Rafael J. Wysocki
2025-06-10 13:44                                       ` Raag Jadav
2025-06-10 13:53                                         ` Rafael J. Wysocki
2025-06-20 12:14                                           ` Raag Jadav
2025-05-21 13:39               ` Lukas Wunner
2025-05-21 17:06                 ` Mario Limonciello
2025-05-21 20:28                   ` Denis Benato
2025-05-22  7:31                     ` Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).