* [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure
@ 2026-07-03 15:05 Praveen Talari
2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari
0 siblings, 2 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
Douglas Anderson
Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
driver-core, Praveen Talari
When a runtime PM resume callback returns an error, rpm_callback() sets
power.runtime_error on the device. All subsequent rpm_resume() calls
then return -EINVAL immediately at the top of the function, permanently
blocking any future resume attempt — including those triggered by
consumers trying to power up their suppliers — until runtime PM is
explicitly re-initialized.
Unlike suspend failures, resume failures should be retryable. The first
patch fixes this in the core by only setting power.runtime_error when a
suspend callback fails, leaving resume failures transient by nature.
The second patch fixes a pre-existing issue in the spi-geni-qcom driver
that this scenario exposed: pm_runtime_get_sync() was called in
spi_geni_init() without checking the return value, so a resume failure
would silently proceed to access hardware registers on a device that was
not powered up.
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
Changes in v2:
- Reworked the fix per maintainer feedback: instead of calling
pm_runtime_set_suspended() in rpm_get_suppliers(), fix the root cause
in rpm_callback() by not setting power.runtime_error on resume
callback failures.
- Removed other patches which are not needed now.
- Link to v1: https://patch.msgid.link/20260702-fix_sticky_-einval_after_pm_runtime_api_failure-v1-0-6ddc317011c0@oss.qualcomm.com
---
Praveen Talari (2):
PM: runtime: Only set runtime_error on suspend callback failures
spi: qcom-geni: Fix missing error check on pm_runtime_get_sync()
drivers/base/power/runtime.c | 8 +++++++-
drivers/spi/spi-geni-qcom.c | 17 ++++++++++-------
2 files changed, 17 insertions(+), 8 deletions(-)
---
base-commit: 4e5dfb7c84012007c3c7061126491bbc92d71bf1
change-id: 20260625-fix_sticky_-einval_after_pm_runtime_api_failure-6797d0a5c4d0
Best regards,
--
Praveen Talari <praveen.talari@oss.qualcomm.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
@ 2026-07-03 15:05 ` Praveen Talari
2026-07-03 20:23 ` Rafael J. Wysocki (Intel)
2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari
1 sibling, 1 reply; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
Douglas Anderson
Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
driver-core, Praveen Talari
When a runtime resume callback returns an error, rpm_callback() sets
power.runtime_error on the device. This causes all subsequent calls to
rpm_resume() to return -EINVAL immediately at the top of the function
without invoking the callback again, making the failure permanent until
runtime PM is explicitly re-initialized.
Unlike suspend failures, resume failures should be retryable. If a
device's resume callback fails transiently, there is no reason to
permanently block future resume attempts on that device and all of its
consumers.
Fix this by conditioning the power.runtime_error assignment in
rpm_callback() on the device being in the RPM_SUSPENDING state. At the
point rpm_callback() runs, __update_runtime_status() has already set the
device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
are reliably distinguishable. Suspend callback failures continue to set
power.runtime_error as before. Resume callback failures return the error
to the caller but leave power.runtime_error clear, allowing the next
resume attempt to invoke the callback normally.
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
drivers/base/power/runtime.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 335288e8b5b3..70d933bcd295 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
if (retval == -EACCES)
retval = -EAGAIN;
- if (retval != -EAGAIN && retval != -EBUSY)
+ /*
+ * Only stick the error on suspend failures. Resume failures are not
+ * treated as permanent so that the next resume attempt will run the
+ * callback again rather than short-circuiting on runtime_error.
+ */
+ if (retval != -EAGAIN && retval != -EBUSY &&
+ dev->power.runtime_status == RPM_SUSPENDING)
dev->power.runtime_error = retval;
return retval;
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync()
2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
@ 2026-07-03 15:05 ` Praveen Talari
1 sibling, 0 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-03 15:05 UTC (permalink / raw)
To: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
Douglas Anderson
Cc: linux-spi, linux-kernel, linux-arm-msm, Mukesh Kumar Savaliya,
aniket.randive, chandana.chiluveru, jyothi.seerapu, linux-pm,
driver-core, Praveen Talari
spi_geni_init() calls pm_runtime_get_sync() to power up the device
before accessing hardware registers, but never checks the return value.
If the runtime resume fails, the function silently proceeds to read and
write hardware registers on a device that may not be powered up, leading
to register access faults.
Fix this by replacing pm_runtime_get_sync() with the
PM_RUNTIME_ACQUIRE_IF_ENABLED() macro and checking the result via
PM_RUNTIME_ACQUIRE_ERR(), propagating any error back to the caller
immediately before any hardware access occurs.
Since the macro handles its own cleanup on failure, the out_pm label and
the corresponding pm_runtime_put() call are no longer needed. Replace
all goto out_pm paths with direct return ret statements and remove the
label entirely.
Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
drivers/spi/spi-geni-qcom.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index 26e723cfea61..a55a3afc0ebd 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -613,25 +613,30 @@ static int spi_geni_init(struct spi_geni_master *mas)
u32 spi_tx_cfg, fifo_disable;
int ret = -ENXIO;
- pm_runtime_get_sync(mas->dev);
+ PM_RUNTIME_ACQUIRE_IF_ENABLED(mas->dev, pm);
+ ret = PM_RUNTIME_ACQUIRE_ERR(&pm);
+ if (ret < 0) {
+ dev_err(mas->dev, "Failed to resume and get %d\n", ret);
+ return ret;
+ }
proto = geni_se_read_proto(se);
if (spi->target) {
if (proto != GENI_SE_SPI_SLAVE) {
dev_err(mas->dev, "Invalid proto %d\n", proto);
- goto out_pm;
+ return ret;
}
spi_slv_setup(mas);
} else if (proto == GENI_SE_INVALID_PROTO) {
ret = geni_load_se_firmware(se, GENI_SE_SPI);
if (ret) {
dev_err(mas->dev, "spi master firmware load failed ret: %d\n", ret);
- goto out_pm;
+ return ret;
}
} else if (proto != GENI_SE_SPI) {
dev_err(mas->dev, "Invalid proto %d\n", proto);
- goto out_pm;
+ return ret;
}
mas->tx_fifo_depth = geni_se_get_tx_fifo_depth(se);
@@ -664,7 +669,7 @@ static int spi_geni_init(struct spi_geni_master *mas)
dev_dbg(mas->dev, "Using GPI DMA mode for SPI\n");
break;
} else if (ret == -EPROBE_DEFER) {
- goto out_pm;
+ return ret;
}
/*
* in case of failure to get gpi dma channel, we can still do the
@@ -693,8 +698,6 @@ static int spi_geni_init(struct spi_geni_master *mas)
writel(spi_tx_cfg, se->base + SE_SPI_TRANS_CFG);
}
-out_pm:
- pm_runtime_put(mas->dev);
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
@ 2026-07-03 20:23 ` Rafael J. Wysocki (Intel)
2026-07-04 5:50 ` Praveen Talari
0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki (Intel) @ 2026-07-03 20:23 UTC (permalink / raw)
To: Praveen Talari
Cc: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
Alok Chauhan, bjorn.andersson, Konrad Dybcio, Rafael J. Wysocki,
Len Brown, Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
Douglas Anderson, linux-spi, linux-kernel, linux-arm-msm,
Mukesh Kumar Savaliya, aniket.randive, chandana.chiluveru,
jyothi.seerapu, linux-pm, driver-core
On Fri, Jul 3, 2026 at 5:05 PM Praveen Talari
<praveen.talari@oss.qualcomm.com> wrote:
>
> When a runtime resume callback returns an error, rpm_callback() sets
> power.runtime_error on the device. This causes all subsequent calls to
> rpm_resume() to return -EINVAL immediately at the top of the function
> without invoking the callback again, making the failure permanent until
> runtime PM is explicitly re-initialized.
>
> Unlike suspend failures, resume failures should be retryable. If a
> device's resume callback fails transiently, there is no reason to
> permanently block future resume attempts on that device and all of its
> consumers.
>
> Fix this by conditioning the power.runtime_error assignment in
> rpm_callback() on the device being in the RPM_SUSPENDING state. At the
> point rpm_callback() runs, __update_runtime_status() has already set the
> device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
> are reliably distinguishable. Suspend callback failures continue to set
> power.runtime_error as before. Resume callback failures return the error
> to the caller but leave power.runtime_error clear, allowing the next
> resume attempt to invoke the callback normally.
>
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> ---
> drivers/base/power/runtime.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 335288e8b5b3..70d933bcd295 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
> if (retval == -EACCES)
> retval = -EAGAIN;
>
> - if (retval != -EAGAIN && retval != -EBUSY)
> + /*
> + * Only stick the error on suspend failures. Resume failures are not
> + * treated as permanent so that the next resume attempt will run the
> + * callback again rather than short-circuiting on runtime_error.
> + */
> + if (retval != -EAGAIN && retval != -EBUSY &&
> + dev->power.runtime_status == RPM_SUSPENDING)
> dev->power.runtime_error = retval;
Why don't you move this check to rpm_suspend()?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures
2026-07-03 20:23 ` Rafael J. Wysocki (Intel)
@ 2026-07-04 5:50 ` Praveen Talari
0 siblings, 0 replies; 5+ messages in thread
From: Praveen Talari @ 2026-07-04 5:50 UTC (permalink / raw)
To: Rafael J. Wysocki (Intel)
Cc: Mark Brown, Dilip Kota, Stephen Boyd, Girish Mahadevan,
Alok Chauhan, bjorn.andersson, Konrad Dybcio, Len Brown,
Pavel Machek, Greg Kroah-Hartman, Danilo Krummrich,
Douglas Anderson, linux-spi, linux-kernel, linux-arm-msm,
Mukesh Kumar Savaliya, aniket.randive, chandana.chiluveru,
jyothi.seerapu, linux-pm, driver-core
Hi Rafael,
On 04-07-2026 01:53, Rafael J. Wysocki (Intel) wrote:
> On Fri, Jul 3, 2026 at 5:05 PM Praveen Talari
> <praveen.talari@oss.qualcomm.com> wrote:
>> When a runtime resume callback returns an error, rpm_callback() sets
>> power.runtime_error on the device. This causes all subsequent calls to
>> rpm_resume() to return -EINVAL immediately at the top of the function
>> without invoking the callback again, making the failure permanent until
>> runtime PM is explicitly re-initialized.
>>
>> Unlike suspend failures, resume failures should be retryable. If a
>> device's resume callback fails transiently, there is no reason to
>> permanently block future resume attempts on that device and all of its
>> consumers.
>>
>> Fix this by conditioning the power.runtime_error assignment in
>> rpm_callback() on the device being in the RPM_SUSPENDING state. At the
>> point rpm_callback() runs, __update_runtime_status() has already set the
>> device status to either RPM_SUSPENDING or RPM_RESUMING, so the two paths
>> are reliably distinguishable. Suspend callback failures continue to set
>> power.runtime_error as before. Resume callback failures return the error
>> to the caller but leave power.runtime_error clear, allowing the next
>> resume attempt to invoke the callback normally.
>>
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>> ---
>> drivers/base/power/runtime.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 335288e8b5b3..70d933bcd295 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -469,7 +469,13 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
>> if (retval == -EACCES)
>> retval = -EAGAIN;
>>
>> - if (retval != -EAGAIN && retval != -EBUSY)
>> + /*
>> + * Only stick the error on suspend failures. Resume failures are not
>> + * treated as permanent so that the next resume attempt will run the
>> + * callback again rather than short-circuiting on runtime_error.
>> + */
>> + if (retval != -EAGAIN && retval != -EBUSY &&
>> + dev->power.runtime_status == RPM_SUSPENDING)
>> dev->power.runtime_error = retval;
> Why don't you move this check to rpm_suspend()?
Yes, we can do that.
I hope the changes below address your concerns. Please let me know if
there is anything else that needs to be adjusted.
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 335288e8b5b3..fab38bc98113 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -469,9 +469,6 @@ static int rpm_callback(int (*cb)(struct device *),
struct device *dev)
if (retval == -EACCES)
retval = -EAGAIN;
- if (retval != -EAGAIN && retval != -EBUSY)
- dev->power.runtime_error = retval;
-
return retval;
}
@@ -751,6 +748,9 @@ static int rpm_suspend(struct device *dev, int rpmflags)
dev->power.deferred_resume = false;
wake_up_all(&dev->power.wait_queue);
+ if (retval != -EAGAIN && retval != -EBUSY)
+ dev->power.runtime_error = retval;
+
Thanks,
Praveen Talari
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-07-04 5:51 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-03 15:05 [PATCH v2 0/2] Fix sticky -EINVAL after resume callback failure Praveen Talari
2026-07-03 15:05 ` [PATCH v2 1/2] PM: runtime: Only set runtime_error on suspend callback failures Praveen Talari
2026-07-03 20:23 ` Rafael J. Wysocki (Intel)
2026-07-04 5:50 ` Praveen Talari
2026-07-03 15:05 ` [PATCH v2 2/2] spi: qcom-geni: Fix missing error check on pm_runtime_get_sync() Praveen Talari
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox