Linux Power Management development
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure
@ 2026-07-03 15:05 Praveen Talari
  2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
  2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari
  0 siblings, 2 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
  To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
	Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
	Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
	Douglas Anderson
  Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
	aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
	driver-core, Praveen Talari

When a runtime PM resume callback returns an error, rpm_callback() sets
power.runtime_error on the device.  All subsequent rpm_resume() calls
then return -EINVAL immediately at the top of the function, permanently
blocking any future resume attempt — including those triggered by
consumers trying to power up their suppliers — until runtime PM is
explicitly re-initialized.

Unlike suspend failures, resume failures should be retryable.  The first
patch fixes this in the core by only setting power.runtime_error when a
suspend callback fails, leaving resume failures transient by nature.

The second patch fixes a pre-existing issue in the spi-geni-qcom driver
that this scenario exposed: pm_runtime_get_sync() was called in
spi_geni_init() without checking the return value, so a resume failure
would silently proceed to access hardware registers on a device that was
not powered up.

Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
Changes in v2:
- Reworked the fix per maintainer feedback: instead of calling
  pm_runtime_set_suspended() in rpm_get_suppliers(), fix the root cause
  in rpm_callback() by not setting power.runtime_error on resume
  callback failures.
- Removed other patches which are not needed now.
- Link to v1: https://patch.msgid.link/20260702-fix_sticky_-einval_after_pm_runtime_api_failure-v1-0-6ddc317011c0@oss.qualcomm.com

---
Praveen Talari (2):
      PM: runtime: Only set runtime_error on suspend callback failures
      spi: qcom-geni: Fix missing error check on pm_runtime_get_sync()

 drivers/base/power/runtime.c |  8 +++++++-
 drivers/spi/spi-geni-qcom.c  | 17 ++++++++++-------
 2 files changed, 17 insertions(+), 8 deletions(-)
---
base-commit: 4e5dfb7c84012007c3c7061126491bbc92d71bf1
change-id: 20260625-fix_sticky_-einval_after_pm_runtime_api_failure-6797d0a5c4d0

Best regards,
--  
Praveen Talari <praveen.talari@oss.qualcomm.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
  2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
@ 2026-07-03 15:05 ` Praveen Talari
  2026-07-03 20:23   ` Rafael J. Wysocki (Intel)
  2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari
  1 sibling, 1 reply; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
  To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
	Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
	Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
	Douglas Anderson
  Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
	aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
	driver-core, Praveen Talari

When a runtime resume callback returns an error, rpm_callback() sets
power.runtime_error on the device.  This causes all subsequent calls to
rpm_resume() to return -EINVAL immediately at the top of the function
without invoking the callback again, making the failure permanent until
runtime PM is explicitly re-initialized.

Unlike suspend failures, resume failures should be retryable.  If a
device's resume callback fails transiently, there is no reason to
permanently block future resume attempts on that device and all of its
consumers.

Fix this by conditioning the power.runtime_error assignment in
rpm_callback() on the device being in the RPM_SUSPENDING state.  At the
point rpm_callback() runs, __update_runtime_status() has already set the
device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
are reliably distinguishable.  Suspend callback failures continue to set
power.runtime_error as before.  Resume callback failures return the error
to the caller but leave power.runtime_error clear, allowing the next
resume attempt to invoke the callback normally.

Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
 drivers/base/power/runtime.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 335288e8b5b3..70d933bcd295 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
 	if (retval == -EACCES)
 		retval = -EAGAIN;
 
-	if (retval != -EAGAIN && retval != -EBUSY)
+	/*
+	 * Only stick the error on suspend failures.  Resume failures are not
+	 * treated as permanent so that the next resume attempt will run the
+	 * callback again rather than short-circuiting on runtime_error.
+	 */
+	if (retval != -EAGAIN && retval != -EBUSY &&
+	    dev->power.runtime_status == RPM_SUSPENDING)
 		dev->power.runtime_error = retval;
 
 	return retval;

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync()
  2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
  2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
@ 2026-07-03 15:05 ` Praveen Talari
  1 sibling, 0 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
  To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
	Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
	Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
	Douglas Anderson
  Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
	aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
	driver-core, Praveen Talari

spi_geni_init() calls pm_runtime_get_sync() to power up the device
before accessing hardware registers, but never checks the return value.
If the runtime resume fails, the function silently proceeds to read and
write hardware registers on a device that may not be powered up, leading
to register access faults.

Fix this by replacing pm_runtime_get_sync() with the
PM_RUNTIME_ACQUIRE_IF_ENABLED() macro and checking the result via
PM_RUNTIME_ACQUIRE_ERR(), propagating any error back to the caller
immediately before any hardware access occurs.

Since the macro handles its own cleanup on failure, the out_pm label and
the corresponding pm_runtime_put() call are no longer needed. Replace
all goto out_pm paths with direct return ret statements and remove the
label entirely.

Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
 drivers/spi/spi-geni-qcom.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index 26e723cfea61..a55a3afc0ebd 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -613,25 +613,30 @@ static int spi_geni_init(struct spi_geni_master *mas)
 	u32 spi_tx_cfg, fifo_disable;
 	int ret = -ENXIO;
 
-	pm_runtime_get_sync(mas->dev);
+	PM_RUNTIME_ACQUIRE_IF_ENABLED(mas->dev, pm);
+	ret = PM_RUNTIME_ACQUIRE_ERR(&pm);
+	if (ret < 0) {
+		dev_err(mas->dev, "Failed to resume and get %d\n", ret);
+		return ret;
+	}
 
 	proto = geni_se_read_proto(se);
 
 	if (spi->target) {
 		if (proto != GENI_SE_SPI_SLAVE) {
 			dev_err(mas->dev, "Invalid proto %d\n", proto);
-			goto out_pm;
+			return ret;
 		}
 		spi_slv_setup(mas);
 	} else if (proto == GENI_SE_INVALID_PROTO) {
 		ret = geni_load_se_firmware(se, GENI_SE_SPI);
 		if (ret) {
 			dev_err(mas->dev, "spi master firmware load failed ret: %d\n", ret);
-			goto out_pm;
+			return ret;
 		}
 	} else if (proto != GENI_SE_SPI) {
 		dev_err(mas->dev, "Invalid proto %d\n", proto);
-		goto out_pm;
+		return ret;
 	}
 	mas->tx_fifo_depth = geni_se_get_tx_fifo_depth(se);
 
@@ -664,7 +669,7 @@ static int spi_geni_init(struct spi_geni_master *mas)
 			dev_dbg(mas->dev, "Using GPI DMA mode for SPI\n");
 			break;
 		} else if (ret == -EPROBE_DEFER) {
-			goto out_pm;
+			return ret;
 		}
 		/*
 		 * in case of failure to get gpi dma channel, we can still do the
@@ -693,8 +698,6 @@ static int spi_geni_init(struct spi_geni_master *mas)
 		writel(spi_tx_cfg, se->base + SE_SPI_TRANS_CFG);
 	}
 
-out_pm:
-	pm_runtime_put(mas->dev);
 	return ret;
 }
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
  2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
@ 2026-07-03 20:23   ` Rafael J. Wysocki (Intel)
  2026-07-04  5:50     ` Praveen Talari
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki (Intel) @ 2026-07-03 20:23 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
	Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
	Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
	Douglas Anderson, linux-spi, linux-kernel, linux-arm-msm,
	Mukesh Kumar Savaliya, aniket.randive, chandana.chiluveru,
	jyothi.seerapu, linux-pm, driver-core

On Fri, Jul 3, 2026 at 5:05 PM Praveen Talari
<praveen.talari@oss.qualcomm.com> wrote:
>
> When a runtime resume callback returns an error, rpm_callback() sets
> power.runtime_error on the device.  This causes all subsequent calls to
> rpm_resume() to return -EINVAL immediately at the top of the function
> without invoking the callback again, making the failure permanent until
> runtime PM is explicitly re-initialized.
>
> Unlike suspend failures, resume failures should be retryable.  If a
> device's resume callback fails transiently, there is no reason to
> permanently block future resume attempts on that device and all of its
> consumers.
>
> Fix this by conditioning the power.runtime_error assignment in
> rpm_callback() on the device being in the RPM_SUSPENDING state.  At the
> point rpm_callback() runs, __update_runtime_status() has already set the
> device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
> are reliably distinguishable.  Suspend callback failures continue to set
> power.runtime_error as before.  Resume callback failures return the error
> to the caller but leave power.runtime_error clear, allowing the next
> resume attempt to invoke the callback normally.
>
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> ---
>  drivers/base/power/runtime.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 335288e8b5b3..70d933bcd295 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
>         if (retval == -EACCES)
>                 retval = -EAGAIN;
>
> -       if (retval != -EAGAIN && retval != -EBUSY)
> +       /*
> +        * Only stick the error on suspend failures.  Resume failures are not
> +        * treated as permanent so that the next resume attempt will run the
> +        * callback again rather than short-circuiting on runtime_error.
> +        */
> +       if (retval != -EAGAIN && retval != -EBUSY &&
> +           dev->power.runtime_status == RPM_SUSPENDING)
>                 dev->power.runtime_error = retval;

Why don't you move this check to rpm_suspend()?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
  2026-07-03 20:23   ` Rafael J. Wysocki (Intel)
@ 2026-07-04  5:50     ` Praveen Talari
  0 siblings, 0 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-04  5:50 UTC (permalink / raw)
  To: Rafael J. Wysocki (Intel)
  Cc: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
	Alok Chauhan, bjorn.andersson, Konrad Dybcio, Len Brown,
	Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
	Douglas Anderson, linux-spi, linux-kernel, linux-arm-msm,
	Mukesh Kumar Savaliya, aniket.randive, chandana.chiluveru,
	jyothi.seerapu, linux-pm, driver-core

Hi Rafael,

On 04-07-2026 01:53, Rafael J. Wysocki (Intel) wrote:
> On Fri, Jul 3, 2026 at 5:05 PM Praveen Talari
> <praveen.talari@oss.qualcomm.com> wrote:
>> When a runtime resume callback returns an error, rpm_callback() sets
>> power.runtime_error on the device.  This causes all subsequent calls to
>> rpm_resume() to return -EINVAL immediately at the top of the function
>> without invoking the callback again, making the failure permanent until
>> runtime PM is explicitly re-initialized.
>>
>> Unlike suspend failures, resume failures should be retryable.  If a
>> device's resume callback fails transiently, there is no reason to
>> permanently block future resume attempts on that device and all of its
>> consumers.
>>
>> Fix this by conditioning the power.runtime_error assignment in
>> rpm_callback() on the device being in the RPM_SUSPENDING state.  At the
>> point rpm_callback() runs, __update_runtime_status() has already set the
>> device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
>> are reliably distinguishable.  Suspend callback failures continue to set
>> power.runtime_error as before.  Resume callback failures return the error
>> to the caller but leave power.runtime_error clear, allowing the next
>> resume attempt to invoke the callback normally.
>>
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>> ---
>>   drivers/base/power/runtime.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 335288e8b5b3..70d933bcd295 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
>>          if (retval == -EACCES)
>>                  retval = -EAGAIN;
>>
>> -       if (retval != -EAGAIN && retval != -EBUSY)
>> +       /*
>> +        * Only stick the error on suspend failures.  Resume failures are not
>> +        * treated as permanent so that the next resume attempt will run the
>> +        * callback again rather than short-circuiting on runtime_error.
>> +        */
>> +       if (retval != -EAGAIN && retval != -EBUSY &&
>> +           dev->power.runtime_status == RPM_SUSPENDING)
>>                  dev->power.runtime_error = retval;
> Why don't you move this check to rpm_suspend()?
Yes, we can do that.
I hope the changes below address your concerns. Please let me know if 
there is anything else that needs to be adjusted.

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 335288e8b5b3..fab38bc98113 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -469,9 +469,6 @@ static int rpm_callback(int (*cb)(struct device *), 
struct device *dev)
         if (retval == -EACCES)
                 retval = -EAGAIN;

-       if (retval != -EAGAIN && retval != -EBUSY)
-               dev->power.runtime_error = retval;
-
         return retval;
  }

@@ -751,6 +748,9 @@ static int rpm_suspend(struct device *dev, int rpmflags)
         dev->power.deferred_resume = false;
         wake_up_all(&dev->power.wait_queue);

+       if (retval != -EAGAIN && retval != -EBUSY)
+               dev->power.runtime_error = retval;

+


Thanks,

Praveen Talari



^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-07-04  5:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
2026-07-03 20:23   ` Rafael J. Wysocki (Intel)
2026-07-04  5:50     ` Praveen Talari
2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox