* [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak
@ 2026-06-05 12:23 William Theesfeld
2026-06-05 12:35 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
0 siblings, 2 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 12:23 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
starfive_wdt_pm_start() calls pm_runtime_get_sync(), but
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error. On the error path this function just
returns the error code without a matching pm_runtime_put_noidle(),
so the usage counter is leaked on every failed resume.
Switch to pm_runtime_resume_and_get(), which decrements the usage
counter automatically on failure, leaving the caller with the simple
"check return value and propagate" pattern that the existing code
was already trying to follow.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index af55adc4a..5a3254c83 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -371,7 +371,7 @@ static void starfive_wdt_stop(struct starfive_wdt *wdt)
static int starfive_wdt_pm_start(struct watchdog_device *wdd)
{
struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
- int ret = pm_runtime_get_sync(wdd->parent);
+ int ret = pm_runtime_resume_and_get(wdd->parent);
if (ret < 0)
return ret;
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak
2026-06-05 12:23 [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak William Theesfeld
@ 2026-06-05 12:35 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
1 sibling, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 12:35 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
Pre-existing issues:
- [Critical] System callbacks unconditionally access hardware registers while the device may be runtime-suspended, which will trigger a synchronous external abort.
- [High] starfive_wdt_pm_start() leaks the runtime PM usage counter if the hardware start operation fails, an edge case missed by the patch.
- [High] starfive_wdt_pm_stop() and starfive_wdt_probe() incorrectly handle a return value of 1 from pm_runtime_put_sync(), interpreting a success case as an error.
- [High] Multiple error paths in starfive_wdt_probe() leak the PM usage counter and fail to disable PM runtime.
- [High] starfive_wdt_shutdown() unconditionally drops the runtime PM usage counter, causing an underflow if the watchdog is not active.
- [High] Runtime PM usage counter leaks on module removal when early_enable is true and the device was never opened.
--
commit 716832af83b52aac9fc1a5fdc00be97db3d21459
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak
starfive_wdt_pm_start() calls pm_runtime_get_sync(), but
pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error. On the error path this function just
returns the error code without a matching pm_runtime_put_noidle(),
so the usage counter is leaked on every failed resume.
Switch to pm_runtime_resume_and_get(), which decrements the usage
counter automatically on failure, leaving the caller with the simple
"check return value and propagate" pattern that the existing code
was already trying to follow.
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index af55adc4a3c69..5a3254c83d3bb 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -371,7 +371,7 @@ static void starfive_wdt_stop(struct starfive_wdt *wdt)
> static int starfive_wdt_pm_start(struct watchdog_device *wdd)
> {
> struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
> - int ret = pm_runtime_get_sync(wdd->parent);
> + int ret = pm_runtime_resume_and_get(wdd->parent);
>
> if (ret < 0)
> return ret;
>
> return starfive_wdt_start(wdt);
[Severity: High]
This isn't a bug introduced by this patch, but does this code leak the runtime
PM usage counter if starfive_wdt_start() fails? It appears that the function
returns the error directly without calling pm_runtime_put(), keeping the
reference acquired by pm_runtime_resume_and_get().
While reviewing this file, I noticed a few other pre-existing issues.
drivers/watchdog/starfive-wdt.c:starfive_wdt_pm_stop() {
...
starfive_wdt_stop(wdt);
return pm_runtime_put_sync(wdd->parent);
}
[Severity: High]
This is a pre-existing issue, but does this break the watchdog core's
expectation of returning 0 on success?
According to Runtime PM API contracts, pm_runtime_put_sync() returns 1 if the
device is already in the RPM_SUSPENDED state, which indicates success, not an
error. By returning this raw result, it seems we force the watchdog core to
assume the stop operation failed.
drivers/watchdog/starfive-wdt.c:starfive_wdt_probe() {
...
if (!early_enable) {
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_put_sync(&pdev->dev);
if (ret)
goto err_unregister_wdt;
...
[Severity: High]
This is a pre-existing issue, but does this erroneously fail the probe if
pm_runtime_put_sync() returns 1?
drivers/watchdog/starfive-wdt.c:starfive_wdt_probe() {
...
pm_runtime_enable(&pdev->dev);
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0)
return ret;
...
[Severity: High]
This is a pre-existing issue, but do these direct return paths bypass the
required pm_runtime_disable() cleanup after pm_runtime_enable() was called?
drivers/watchdog/starfive-wdt.c:starfive_wdt_probe() {
...
err_exit:
starfive_wdt_disable_clock(wdt);
pm_runtime_disable(&pdev->dev);
return ret;
}
[Severity: High]
This is a pre-existing issue, but does the err_exit label miss dropping the
runtime PM reference acquired via pm_runtime_resume_and_get() earlier in the
probe?
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
static void starfive_wdt_remove(struct platform_device *pdev)
{
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
starfive_wdt_stop(wdt);
...
drivers/watchdog/starfive-wdt.c:starfive_wdt_shutdown() {
...
static void starfive_wdt_shutdown(struct platform_device *pdev)
{
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
starfive_wdt_pm_stop(&wdt->wdd);
...
drivers/watchdog/starfive-wdt.c:starfive_wdt_suspend() {
...
static int starfive_wdt_suspend(struct device *dev)
{
struct starfive_wdt *wdt = dev_get_drvdata(dev);
/* Save watchdog state, and turn it off. */
wdt->reload = starfive_wdt_get_count(wdt);
/* Note that WTCNT doesn't need to be saved. */
starfive_wdt_stop(wdt);
...
drivers/watchdog/starfive-wdt.c:starfive_wdt_resume() {
...
static int starfive_wdt_resume(struct device *dev)
{
...
starfive_wdt_unlock(wdt);
/* Restore watchdog state. */
starfive_wdt_set_reload_count(wdt, wdt->reload);
starfive_wdt_lock(wdt);
...
[Severity: Critical]
These are pre-existing issues, but do these system callbacks risk triggering a
synchronous external abort by unconditionally accessing hardware registers
while the device might be runtime-suspended and unclocked?
drivers/watchdog/starfive-wdt.c:starfive_wdt_shutdown() {
...
static void starfive_wdt_shutdown(struct platform_device *pdev)
{
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
starfive_wdt_pm_stop(&wdt->wdd);
}
[Severity: High]
This is a pre-existing issue, but does unconditionally calling
starfive_wdt_pm_stop() (which contains pm_runtime_put_sync()) cause a PM
usage counter underflow if the watchdog is already inactive and its counter
is 0?
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
if (pm_runtime_enabled(&pdev->dev))
pm_runtime_disable(&pdev->dev);
else
/* disable clock without PM */
starfive_wdt_disable_clock(wdt);
}
[Severity: High]
This is a pre-existing issue, but does the remove function fail to decrement
the PM usage counter if early_enable is true and the device was never opened
by userspace?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605122336.159546-1-william@theesfeld.net?part=1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup
2026-06-05 12:23 [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak William Theesfeld
2026-06-05 12:35 ` sashiko-bot
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:11 ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
` (6 more replies)
1 sibling, 7 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
The v1 patch was a one-line conversion of pm_runtime_get_sync() to
pm_runtime_resume_and_get() in starfive_wdt_pm_start(). On review,
the StarFive maintainers pointed out one edge case the v1 fix had
missed, plus five pre-existing issues in the surrounding runtime PM
handling that all need to be addressed before the original fix is
useful in practice.
This v2 turns the single fix into a six-patch series. Each patch
addresses one independent issue and can be reviewed in isolation;
they are ordered roughly from the simplest correctness fix to the
ones requiring a small amount of restructuring.
The series:
1/6 Expand the original v1 fix so the runtime PM reference is also
released when starfive_wdt_start() fails after a successful
pm_runtime_resume_and_get().
2/6 Treat a return value of 1 from pm_runtime_put_sync() as success
in both starfive_wdt_pm_stop() and the probe path; the watchdog
framework currently propagates the value verbatim and the probe
takes the err_unregister_wdt path on a non-error return.
3/6 Restructure starfive_wdt_probe() error handling into three
labels (err_pm_disable / err_put_pm / err_unregister_wdt) so
every failure path balances exactly the resources it has
acquired. Previously two early failure paths returned without
calling pm_runtime_disable() and several later paths leaked the
runtime PM reference.
4/6 Gate the system suspend / resume callbacks with
pm_runtime_status_suspended() so the WDOGLOAD / WDOGCONTROL
accesses run only when the device is actually clocked. When
the device was already runtime-suspended the previous code
triggered a synchronous external abort.
5/6 Make starfive_wdt_shutdown() only drop the PM refcount when
WDOG_ACTIVE is set, and stop just the hardware (no PM touch)
when only WDOG_HW_RUNNING is set. This avoids the refcount
underflow that the previous unconditional pm_stop() produced.
6/6 Release the early_enable PM refcount in starfive_wdt_remove()
when WDOG_HW_RUNNING is still asserted at unregister time. The
watchdog framework's stop-on-unregister path only runs the
stop op when WDOG_ACTIVE is set, so the early-enable +
never-opened combination otherwise leaks the probe-time
reference.
The whole series builds cleanly with CONFIG_STARFIVE_WATCHDOG=m on
x86_64 defconfig. Runtime testing would require StarFive JH7100 /
JH7110 hardware and has not been performed.
William Theesfeld (6):
watchdog: starfive: balance PM refcount when start operation fails
watchdog: starfive: treat pm_runtime_put_sync() positive return as
success
watchdog: starfive: balance PM refcount and disable in probe error
paths
watchdog: starfive: guard system suspend/resume hardware access
watchdog: starfive: avoid PM refcount underflow in shutdown
watchdog: starfive: release early_enable PM refcount on remove
drivers/watchdog/starfive-wdt.c | 92 ++++++++++++++++++++++++++-------
1 file changed, 73 insertions(+), 19 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:23 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
` (5 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
After pm_runtime_resume_and_get() succeeds in starfive_wdt_pm_start(),
the runtime PM usage counter is incremented. If starfive_wdt_start()
subsequently fails, the function returns the error without releasing
that reference, leaking the usage counter.
Call pm_runtime_put_sync() on the start failure path so the counter
is balanced.
This also folds in the v1 conversion of pm_runtime_get_sync() to
pm_runtime_resume_and_get(), which was the original purpose of this
patch; both fixes together close the refcount-leak window in this
function as flagged by the maintainer on the v1 thread.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index af55adc4a..ed8c5711a 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -371,12 +371,16 @@ static void starfive_wdt_stop(struct starfive_wdt *wdt)
static int starfive_wdt_pm_start(struct watchdog_device *wdd)
{
struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
- int ret = pm_runtime_get_sync(wdd->parent);
+ int ret = pm_runtime_resume_and_get(wdd->parent);
if (ret < 0)
return ret;
- return starfive_wdt_start(wdt);
+ ret = starfive_wdt_start(wdt);
+ if (ret)
+ pm_runtime_put_sync(wdd->parent);
+
+ return ret;
}
static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 13:11 ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:26 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
` (4 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
pm_runtime_put_sync() returns 1 when the device cannot be suspended
because it is still in use (typically due to other runtime PM users
holding references); only negative return values are real errors.
Both starfive_wdt_pm_stop() and starfive_wdt_probe() currently treat
any non-zero return as failure: pm_stop returns the value verbatim,
which the watchdog framework propagates as an error, and probe takes
the err_unregister_wdt path even on a successful but non-zero return.
Mask off the positive return value in pm_stop and tighten the probe
check to "< 0" so the legitimate non-suspended case is no longer
mishandled.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index ed8c5711a..c821b38d2 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -386,9 +386,17 @@ static int starfive_wdt_pm_start(struct watchdog_device *wdd)
static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
{
struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
+ int ret;
starfive_wdt_stop(wdt);
- return pm_runtime_put_sync(wdd->parent);
+ ret = pm_runtime_put_sync(wdd->parent);
+ /*
+ * pm_runtime_put_sync() can return 1 when the device is still in
+ * use (and therefore did not actually suspend). That is not an
+ * error from the caller's point of view; only propagate negative
+ * return values.
+ */
+ return ret < 0 ? ret : 0;
}
static int starfive_wdt_set_timeout(struct watchdog_device *wdd,
@@ -503,7 +511,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (!early_enable) {
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_put_sync(&pdev->dev);
- if (ret)
+ if (ret < 0)
goto err_unregister_wdt;
}
}
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 13:11 ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 13:11 ` [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:24 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
` (3 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
The probe path takes a runtime PM reference via
pm_runtime_resume_and_get() (or enables the clocks directly when
runtime PM is unavailable), but several error paths after that point
do not release that reference before returning, and the two earliest
error paths return without calling pm_runtime_disable() at all even
though pm_runtime_enable() has already run.
Restructure the error handling into three labels so every failure
path balances exactly the resources it has acquired:
err_pm_disable: pm_runtime_enable() ran but no clock/refcount was
taken yet (resume_and_get / enable_clock failed).
err_put_pm: clock or PM refcount is held; release it, then
fall through to disable runtime PM.
err_unregister_wdt: watchdog_register_device() succeeded; unregister
before releasing the refcount.
Update the in-function goto targets to use these labels and remove
the early "return ret;" paths so pm_runtime_disable() is always run
once pm_runtime_enable() has been called.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index c821b38d2..6539bf089 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -460,17 +460,17 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0)
- return ret;
+ goto err_pm_disable;
} else {
/* runtime PM is disabled but clocks need to be enabled */
ret = starfive_wdt_enable_clock(wdt);
if (ret)
- return ret;
+ goto err_pm_disable;
}
ret = starfive_wdt_reset_init(&pdev->dev);
if (ret)
- goto err_exit;
+ goto err_put_pm;
watchdog_set_drvdata(&wdt->wdd, wdt);
wdt->wdd.info = &starfive_wdt_info;
@@ -482,7 +482,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (!wdt->freq) {
dev_err(&pdev->dev, "get clock rate failed.\n");
ret = -EINVAL;
- goto err_exit;
+ goto err_put_pm;
}
wdt->wdd.min_timeout = 1;
@@ -498,7 +498,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (early_enable) {
ret = starfive_wdt_start(wdt);
if (ret)
- goto err_exit;
+ goto err_put_pm;
set_bit(WDOG_HW_RUNNING, &wdt->wdd.status);
} else {
starfive_wdt_stop(wdt);
@@ -506,7 +506,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
ret = watchdog_register_device(&wdt->wdd);
if (ret)
- goto err_exit;
+ goto err_put_pm;
if (!early_enable) {
if (pm_runtime_enabled(&pdev->dev)) {
@@ -520,8 +520,12 @@ static int starfive_wdt_probe(struct platform_device *pdev)
err_unregister_wdt:
watchdog_unregister_device(&wdt->wdd);
-err_exit:
- starfive_wdt_disable_clock(wdt);
+err_put_pm:
+ if (pm_runtime_enabled(&pdev->dev))
+ pm_runtime_put_sync(&pdev->dev);
+ else
+ starfive_wdt_disable_clock(wdt);
+err_pm_disable:
pm_runtime_disable(&pdev->dev);
return ret;
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (2 preceding siblings ...)
2026-06-05 13:11 ` [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:27 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
` (2 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
starfive_wdt_suspend() reads from STARFIVE_WDT_VALUE and writes to
WDOGCONTROL via starfive_wdt_get_count() and starfive_wdt_stop()
before calling pm_runtime_force_suspend(). If the device was already
runtime-suspended when system suspend began, the apb and core clocks
are gated and those register accesses produce a synchronous external
abort.
starfive_wdt_resume() has the mirror problem: it unconditionally
restores the WDOGLOAD value after pm_runtime_force_resume() returns,
but pm_runtime_force_resume() leaves the device in whichever runtime
PM state it was in pre-suspend. If the device stays suspended, the
restore writes race with gated clocks.
Gate both halves with pm_runtime_status_suspended() so the hardware
is only touched when it is actually active. When the watchdog was
suspended at runtime PM level there is no state to save or restore.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index 6539bf089..4926984bf 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -556,11 +556,16 @@ static int starfive_wdt_suspend(struct device *dev)
{
struct starfive_wdt *wdt = dev_get_drvdata(dev);
- /* Save watchdog state, and turn it off. */
- wdt->reload = starfive_wdt_get_count(wdt);
-
- /* Note that WTCNT doesn't need to be saved. */
- starfive_wdt_stop(wdt);
+ /*
+ * Only access the hardware while the device is runtime-resumed; if
+ * runtime PM has already suspended the device, its clocks are gated
+ * and a register read/write would trigger a synchronous external
+ * abort. WTCNT does not need to be saved.
+ */
+ if (!pm_runtime_status_suspended(dev)) {
+ wdt->reload = starfive_wdt_get_count(wdt);
+ starfive_wdt_stop(wdt);
+ }
return pm_runtime_force_suspend(dev);
}
@@ -574,8 +579,16 @@ static int starfive_wdt_resume(struct device *dev)
if (ret)
return ret;
+ /*
+ * pm_runtime_force_resume() leaves the device in whichever runtime
+ * PM state it was in before system suspend. Only restore hardware
+ * state when the device is actually resumed; otherwise the register
+ * writes below would race with gated clocks.
+ */
+ if (pm_runtime_status_suspended(dev))
+ return 0;
+
starfive_wdt_unlock(wdt);
- /* Restore watchdog state. */
starfive_wdt_set_reload_count(wdt, wdt->reload);
starfive_wdt_lock(wdt);
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (3 preceding siblings ...)
2026-06-05 13:11 ` [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:29 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
starfive_wdt_shutdown() unconditionally calls starfive_wdt_pm_stop(),
which drops a runtime PM reference via pm_runtime_put_sync(). If the
device was never opened (so there was no matching pm_start()), and
early_enable is also off (so probe already balanced its own get/put),
the counter is at zero and put_sync() drives it negative.
Only call pm_stop() when WDOG_ACTIVE is set (meaning a paired
pm_start() happened on open). For the early_enable-and-never-opened
case, the framework state is WDOG_HW_RUNNING without WDOG_ACTIVE; in
that case stop just the hardware so the watchdog does not keep
ticking through shutdown, while leaving the PM refcount for the
remove path to release.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index 4926984bf..848935dc6 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -549,7 +549,19 @@ static void starfive_wdt_shutdown(struct platform_device *pdev)
{
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
- starfive_wdt_pm_stop(&wdt->wdd);
+ /*
+ * Only drop a runtime PM reference when we actually hold one. The
+ * watchdog framework increments it via pm_start() on open; when
+ * early_enable started the hardware without an open, the reference
+ * was taken at probe time and the framework state reads as
+ * WDOG_HW_RUNNING without WDOG_ACTIVE. In the inactive-and-not-
+ * running case there is nothing for us to do and dropping the
+ * counter unconditionally would underflow it.
+ */
+ if (watchdog_active(&wdt->wdd))
+ starfive_wdt_pm_stop(&wdt->wdd);
+ else if (test_bit(WDOG_HW_RUNNING, &wdt->wdd.status))
+ starfive_wdt_stop(wdt);
}
static int starfive_wdt_suspend(struct device *dev)
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (4 preceding siblings ...)
2026-06-05 13:11 ` [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
@ 2026-06-05 13:11 ` William Theesfeld
2026-06-05 13:33 ` sashiko-bot
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
6 siblings, 1 reply; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 13:11 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
When early_enable starts the hardware at probe time, the runtime PM
reference taken by pm_runtime_resume_and_get() is intentionally kept
to hold the device runtime-resumed. In the normal (userspace-opened)
case the watchdog framework unwinds that reference via pm_stop() on
close, and again via stop_on_unregister() during
watchdog_unregister_device().
But watchdog_unregister_device() only runs the stop path when
WDOG_ACTIVE is set (see drivers/watchdog/watchdog_dev.c). If
early_enable started the hardware and userspace never opened it,
WDOG_HW_RUNNING is set without WDOG_ACTIVE: unregister returns
without touching pm_stop, and the probe-time PM reference is leaked.
Drop that reference in starfive_wdt_remove() before
pm_runtime_disable() runs, when WDOG_HW_RUNNING is still asserted.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index 848935dc6..d9175b045 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -538,6 +538,19 @@ static void starfive_wdt_remove(struct platform_device *pdev)
starfive_wdt_stop(wdt);
watchdog_unregister_device(&wdt->wdd);
+ /*
+ * watchdog_unregister_device() only calls our pm_stop op when
+ * WDOG_ACTIVE is set. When early_enable started the hardware at
+ * probe time and userspace never opened the watchdog, the framework
+ * state is WDOG_HW_RUNNING without WDOG_ACTIVE: the unregister path
+ * leaves the runtime PM reference taken at probe outstanding. Drop
+ * it here before disabling runtime PM so the usage counter does not
+ * leak.
+ */
+ if (test_bit(WDOG_HW_RUNNING, &wdt->wdd.status) &&
+ pm_runtime_enabled(&pdev->dev))
+ pm_runtime_put_sync(&pdev->dev);
+
if (pm_runtime_enabled(&pdev->dev))
pm_runtime_disable(&pdev->dev);
else
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails
2026-06-05 13:11 ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
@ 2026-06-05 13:23 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:23 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Hardware registers are accessed in `set_timeout` and `get_timeleft` callbacks without ensuring the device is runtime-resumed.
--
commit 634df4af4e48b2df41f8b9f5b5a99c2ef88e4d38
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: balance PM refcount when start operation fails
This commit fixes a PM runtime usage counter leak in starfive_wdt_pm_start()
by calling pm_runtime_put_sync() if starfive_wdt_start() fails. It also
updates the resume call to use pm_runtime_resume_and_get().
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index af55adc4a3c69..ed8c5711a3853 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -371,12 +371,16 @@ static void starfive_wdt_stop(struct starfive_wdt *wdt)
> static int starfive_wdt_pm_start(struct watchdog_device *wdd)
> {
> struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
> - int ret = pm_runtime_get_sync(wdd->parent);
> + int ret = pm_runtime_resume_and_get(wdd->parent);
>
> if (ret < 0)
> return ret;
>
> - return starfive_wdt_start(wdt);
> + ret = starfive_wdt_start(wdt);
> + if (ret)
> + pm_runtime_put_sync(wdd->parent);
> +
> + return ret;
> }
[Severity: High]
This isn't a bug introduced by this patch, but since we are modifying PM
runtime handling in this driver, does the driver safely handle hardware
register accesses when the watchdog is inactive?
The watchdog framework allows ioctls like WDIOC_SETTIMEOUT and
WDIOC_GETTIMELEFT to be called even when the watchdog is stopped. In the
inactive state, the PM runtime usage counter drops to 0, which suspends the
device and gates its APB and core clocks.
In drivers/watchdog/starfive-wdt.c:starfive_wdt_set_timeout():
starfive_wdt_unlock(wdt);
starfive_wdt_disable(wdt);
starfive_wdt_set_reload_count(wdt, wdt->count);
starfive_wdt_enable(wdt);
starfive_wdt_lock(wdt);
And in drivers/watchdog/starfive-wdt.c:starfive_wdt_get_timeleft():
count = starfive_wdt_get_count(wdt);
if (wdt->variant->double_timeout && !starfive_wdt_raise_irq_status(wdt))
count += wdt->count;
Since these functions directly access MMIO registers without ensuring the
clocks are active via pm_runtime_resume_and_get(), could these unclocked
bus accesses trigger a bus fault or panic if called by userspace while the
watchdog is inactive?
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=1
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths
2026-06-05 13:11 ` [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
@ 2026-06-05 13:24 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:24 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Treating pm_runtime_put_sync() return value as a fatal error causes spurious probe failures and a double PM refcount drop.
Pre-existing issues:
- [High] Missing runtime PM synchronization in remove() leads to unclocked hardware access and potential kernel panic.
--
commit b65fb45d5687c6419949ad39d8553aec8eec2e77
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: balance PM refcount and disable in probe error paths
The commit restructures the error handling in the probe function of the
starfive watchdog driver. It adds specific labels to ensure runtime PM
refcounts and clocks are properly balanced and disabled during failures.
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index c821b38d26d22..6539bf0893583 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
[ ... ]
> @@ -506,7 +506,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
>
> ret = watchdog_register_device(&wdt->wdd);
> if (ret)
> - goto err_exit;
> + goto err_put_pm;
>
> if (!early_enable) {
> if (pm_runtime_enabled(&pdev->dev)) {
> @@ -520,8 +520,12 @@ static int starfive_wdt_probe(struct platform_device *pdev)
>
> err_unregister_wdt:
> watchdog_unregister_device(&wdt->wdd);
> -err_exit:
> - starfive_wdt_disable_clock(wdt);
> +err_put_pm:
> + if (pm_runtime_enabled(&pdev->dev))
> + pm_runtime_put_sync(&pdev->dev);
[Severity: High]
Is there a risk of a double put and refcount underflow here?
If pm_runtime_put_sync() returns a negative value like -EAGAIN
because another thread incremented the usage count before the idle check,
it successfully decremented the refcount.
If we jump to err_unregister_wdt, the code will fall through to err_put_pm
and call pm_runtime_put_sync() a second time.
Does this cause an unbalanced put and spurious probe failures?
[Severity: High]
This isn't a bug introduced by this patch, but looking at the PM handling
in starfive_wdt_remove():
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
starfive_wdt_stop(wdt);
watchdog_unregister_device(&wdt->wdd);
...
}
Is there missing runtime PM synchronization here?
starfive_wdt_remove() unconditionally calls starfive_wdt_stop(), which
accesses hardware registers. If the watchdog is not open, the device might
be runtime-suspended with its APB and core clocks gated.
Could accessing unclocked hardware registers here trigger a synchronous
external abort and a kernel panic?
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=3
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success
2026-06-05 13:11 ` [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
@ 2026-06-05 13:26 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:26 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Medium] The commit message and inline comment justify the code change with a fundamentally incorrect explanation of Runtime PM return values.
Pre-existing issues:
- [High] Propagating normal negative returns from `pm_runtime_put_sync()` as errors breaks the watchdog state machine and probe function.
- [High] The error path in `starfive_wdt_probe` calls `pm_runtime_put_sync()` twice when the initial PM put fails, leading to a usage count underflow.
- [High] The driver unconditionally accesses hardware registers during driver removal without guaranteeing the clocks are enabled, risking bus faults.
--
commit ff0d7567f1fad840688f8c89badc9b0278568af6
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: treat pm_runtime_put_sync() positive return as success
This commit updates the watchdog driver to mask off positive return values
from pm_runtime_put_sync() in starfive_wdt_pm_stop() and tightens the error
check in starfive_wdt_probe() to only catch negative returns. The intention
is to stop treating the non-suspended state as an error condition.
> watchdog: starfive: treat pm_runtime_put_sync() positive return as success
>
> pm_runtime_put_sync() returns 1 when the device cannot be suspended
> because it is still in use (typically due to other runtime PM users
> holding references); only negative return values are real errors.
[Severity: Medium]
Is this description of the return value accurate?
According to the Runtime PM guidelines, pm_runtime_put_sync() returns 1 when
the device is already suspended, not when it is still in use. When a device
is still in use (i.e., usage counter is non-zero), it returns -EAGAIN.
Documenting incorrect API semantics here could misguide future maintainers.
[ ... ]
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index ed8c5711a3853..c821b38d26d22 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -386,9 +386,17 @@ static int starfive_wdt_pm_start(struct watchdog_device *wdd)
> static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
> {
> struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
> + int ret;
>
> starfive_wdt_stop(wdt);
> - return pm_runtime_put_sync(wdd->parent);
> + ret = pm_runtime_put_sync(wdd->parent);
> + /*
> + * pm_runtime_put_sync() can return 1 when the device is still in
> + * use (and therefore did not actually suspend). That is not an
> + * error from the caller's point of view; only propagate negative
> + * return values.
> + */
[Severity: Medium]
Is the comment's description of the return value correct?
As mentioned regarding the commit message, pm_runtime_put_sync() returns 1
when the device is already suspended, not when it is still in use.
> + return ret < 0 ? ret : 0;
[Severity: High]
This isn't a bug introduced by this patch, but does propagating normal
negative returns as errors break the watchdog state machine?
If the device cannot immediately suspend due to a non-zero usage counter,
pm_runtime_put_sync() correctly returns -EAGAIN or -EBUSY.
By propagating these negative values as errors, the watchdog core assumes the
hardware stop failed. It will leave the WDOG_ACTIVE bit set and continue the
background ping worker.
Because the driver has already successfully disabled the hardware and dropped
its PM reference, the device will eventually suspend. The rogue ping worker
would then access unclocked MMIO registers and trigger a fatal bus fault.
Should this return 0 if the hardware was successfully stopped?
> }
>
> static int starfive_wdt_set_timeout(struct watchdog_device *wdd,
[ ... ]
> @@ -503,7 +511,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
> if (!early_enable) {
> if (pm_runtime_enabled(&pdev->dev)) {
> ret = pm_runtime_put_sync(&pdev->dev);
> - if (ret)
> + if (ret < 0)
[Severity: High]
This is a pre-existing issue, but does treating -EAGAIN as a failure cause
the probe to fail unnecessarily?
If pm_runtime_put_sync() returns -EAGAIN due to a concurrent sysfs access,
propagating it via ret < 0 turns a harmless concurrency race into a fatal
failure, unregistering a working watchdog.
[Severity: High]
This is a pre-existing issue, but does this error path cause a usage count
underflow?
If pm_runtime_put_sync() returns an error (like -EAGAIN), the code branches
to err_unregister_wdt and falls through to the err_put_pm label:
err_unregister_wdt:
watchdog_unregister_device(&wdt->wdd);
err_put_pm:
if (pm_runtime_enabled(&pdev->dev))
pm_runtime_put_sync(&pdev->dev);
Since the initial pm_runtime_put_sync() unconditionally decrements the usage
count regardless of its return value, calling it a second time here causes
an unbalanced decrement.
> goto err_unregister_wdt;
> }
> }
[Severity: High]
This is a pre-existing issue, but does starfive_wdt_remove() risk bus faults
by unconditionally accessing hardware registers without ensuring clocks are
active?
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
starfive_wdt_stop(wdt);
...
}
If the watchdog was never opened by userspace and early_enable was false, the
device is in the RPM_SUSPENDED state with its clocks gated. Accessing
unclocked MMIO registers in starfive_wdt_stop() triggers a synchronous
external abort.
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=2
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access
2026-06-05 13:11 ` [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
@ 2026-06-05 13:27 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:27 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] A watchdog started via early_enable is permanently stopped after system suspend/resume.
- [High] starfive_wdt_remove unconditionally accesses gated hardware, triggering a synchronous external abort.
--
commit c23d412b644d160f8e059fcbfd66f48b0a689897
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: guard system suspend/resume hardware access
This commit guards hardware access in starfive_wdt_suspend() and
starfive_wdt_resume() with pm_runtime_status_suspended() checks. It prevents
register reads and writes when the device is runtime-suspended and its clocks
are gated, avoiding synchronous external aborts.
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index 6539bf0893583..4926984bf1e93 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
[ ... ]
> @@ -574,8 +579,16 @@ static int starfive_wdt_resume(struct device *dev)
> if (ret)
> return ret;
>
> + /*
> + * pm_runtime_force_resume() leaves the device in whichever runtime
> + * PM state it was in before system suspend. Only restore hardware
> + * state when the device is actually resumed; otherwise the register
> + * writes below would race with gated clocks.
> + */
> + if (pm_runtime_status_suspended(dev))
> + return 0;
> +
> starfive_wdt_unlock(wdt);
> - /* Restore watchdog state. */
> starfive_wdt_set_reload_count(wdt, wdt->reload);
> starfive_wdt_lock(wdt);
>
[Severity: High]
This isn't a bug introduced by this patch, but does this skip restarting the
hardware for early-enabled watchdogs?
Immediately following this block, the function checks:
if (watchdog_active(&wdt->wdd))
return starfive_wdt_start(wdt);
When early_enable is used but userspace hasn't opened the watchdog, the
framework state is WDOG_HW_RUNNING without WDOG_ACTIVE. Since
watchdog_active() only checks WDOG_ACTIVE, it evaluates to false here.
Does this leave the hardware watchdog permanently stopped after a system
resume?
[Severity: High]
This is a pre-existing issue, but I noticed a similar hardware access issue
can happen during driver unbind:
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
starfive_wdt_stop(wdt);
...
}
During driver unbind, the driver core (__device_release_driver() in
drivers/base/dd.c) drops the runtime PM usage counter to 0 immediately before
calling device_remove().
If the watchdog is not active (e.g. early_enable is false and userspace
never opened it), the device is suspended and its clocks are gated. Does
calling starfive_wdt_stop() here unconditionally access the WDOGCONTROL
and WDOGLOCK registers while clocks are gated, triggering a synchronous
external abort?
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=4
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown
2026-06-05 13:11 ` [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
@ 2026-06-05 13:29 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:29 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
New issues:
- [High] Race condition in `starfive_wdt_shutdown()` causes an external abort due to prematurely gating clocks and TOCTOU.
Pre-existing issues:
- [High] Unconditional hardware access in `starfive_wdt_remove()` causes an external abort when the device is runtime-suspended.
- [High] Double-put of the runtime PM usage counter in the `starfive_wdt_probe` error path.
- [High] The watchdog is left permanently disabled after a system resume if started via `early_enable` but not yet opened by userspace.
- [High] The `starfive_wdt_shutdown()` callback causes a double runtime PM put and a synchronous external abort by duplicating the watchdog core reboot notifier.
- [High] `starfive_wdt_pm_stop` incorrectly propagates `-EAGAIN` when the device is still in use, breaking the watchdog core state machine.
--
commit a92c83041e6a1832c846458c02228e9b4ab617c8
Author: William Theesfeld <william@theesfeld.net>
Subject: watchdog: starfive: avoid PM refcount underflow in shutdown
This commit prevents a runtime pm reference counter underflow in the
shutdown path by only dropping the reference if the watchdog is actively
running. It checks WDOG_ACTIVE and WDOG_HW_RUNNING to conditionally stop
the hardware or drop the runtime pm reference.
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index 4926984bf1e93..848935dc6dca6 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -549,7 +549,19 @@ static void starfive_wdt_shutdown(struct platform_device *pdev)
> {
> struct starfive_wdt *wdt = platform_get_drvdata(pdev);
>
> - starfive_wdt_pm_stop(&wdt->wdd);
> + /*
> + * Only drop a runtime PM reference when we actually hold one. The
> + * watchdog framework increments it via pm_start() on open; when
> + * early_enable started the hardware without an open, the reference
> + * was taken at probe time and the framework state reads as
> + * WDOG_HW_RUNNING without WDOG_ACTIVE. In the inactive-and-not-
> + * running case there is nothing for us to do and dropping the
> + * counter unconditionally would underflow it.
> + */
> + if (watchdog_active(&wdt->wdd))
[Severity: High]
Can this lockless check race with watchdog_release() closing the watchdog
concurrently?
If device_shutdown() runs concurrently with a user-space process closing
the watchdog, processor 0 could execute watchdog_stop and call
starfive_wdt_pm_stop() to gate clocks, but might not have cleared
WDOG_ACTIVE yet.
If processor 1 running starfive_wdt_shutdown() sees watchdog_active() is
still true, could it call starfive_wdt_pm_stop() again and access gated
clocks, triggering a synchronous external abort?
[Severity: High]
This is a pre-existing issue, but does this shutdown callback duplicate the
watchdog core reboot notifier and cause a double runtime pm put?
The watchdog core registers a reboot notifier that calls wdd->ops->stop()
during reboot, gating the clocks and dropping the runtime pm reference, but
it does not clear the WDOG_ACTIVE or WDOG_HW_RUNNING flags.
When device_shutdown() subsequently invokes starfive_wdt_shutdown(), it
checks these uncleared flags. Could this erroneously call
starfive_wdt_pm_stop() or starfive_wdt_stop() a second time, triggering a
bus fault by accessing memory-mapped registers after clocks are gated?
> + starfive_wdt_pm_stop(&wdt->wdd);
> + else if (test_bit(WDOG_HW_RUNNING, &wdt->wdd.status))
> + starfive_wdt_stop(wdt);
> }
[ ... ]
While reviewing this code, I noticed a few pre-existing issues in other
functions of this driver.
[Severity: High]
This isn't a bug introduced by this patch, but does the unconditional
hardware access in starfive_wdt_remove() trigger an external abort when
clocks are gated?
drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
starfive_wdt_stop(wdt);
watchdog_unregister_device(&wdt->wdd);
...
}
If the watchdog was not started via early_enable and was never opened, or
was opened and subsequently closed, its runtime pm usage counter drops to 0,
causing the device to enter runtime suspend and its clocks to be gated.
When the driver is unbound, starfive_wdt_remove() unconditionally calls
starfive_wdt_stop(), which attempts to write to memory-mapped registers.
Could accessing these registers while clocks are gated trigger a synchronous
external abort?
[Severity: High]
This isn't a bug introduced by this patch, but does the error path in
starfive_wdt_probe() double-put the runtime pm usage counter?
drivers/watchdog/starfive-wdt.c:starfive_wdt_probe() {
...
ret = pm_runtime_put_sync(&pdev->dev);
if (ret < 0)
goto err_unregister_wdt;
...
}
If pm_runtime_put_sync() returns an error like -EAGAIN or -EBUSY, the usage
counter was already unconditionally decremented.
The code jumps to err_unregister_wdt and falls through to err_put_pm, where
pm_runtime_put_sync() is called a second time. Can this cause the usage
counter to underflow?
[Severity: High]
This isn't a bug introduced by this patch, but is the watchdog left
permanently disabled after a system resume if started via early_enable
but not yet opened by userspace?
drivers/watchdog/starfive-wdt.c:starfive_wdt_resume() {
...
if (watchdog_active(&wdt->wdd))
return starfive_wdt_start(wdt);
return 0;
}
When early_enable is true, probe enables the hardware and sets
WDOG_HW_RUNNING, but watchdog_active() remains false until userspace opens
the device.
On system resume, starfive_wdt_resume() only calls starfive_wdt_start() if
watchdog_active() is true, leaving the hardware disabled.
Could the core's ping worker see WDOG_HW_RUNNING and continue to call
starfive_wdt_keepalive(), reloading the counter but not enabling the
hardware? Does this leave the system unprotected after a suspend/resume
cycle?
[Severity: High]
This isn't a bug introduced by this patch, but does starfive_wdt_pm_stop()
incorrectly propagate -EAGAIN when the device is still legitimately in use?
drivers/watchdog/starfive-wdt.c:starfive_wdt_pm_stop() {
...
return ret < 0 ? ret : 0;
}
If early_enable is true, probe leaves an outstanding runtime pm usage count.
When the watchdog is opened, the count increments to 2.
When closed, starfive_wdt_pm_stop() drops the count to 1 via
pm_runtime_put_sync(), which returns -EAGAIN because the device is still
active.
If this -EAGAIN is propagated back to the watchdog core, does
watchdog_release() treat this as an error, leaving WDOG_ACTIVE set and
spinning the background ping worker indefinitely, even though the driver
actually stopped the hardware?
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=5
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove
2026-06-05 13:11 ` [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
@ 2026-06-05 13:33 ` sashiko-bot
0 siblings, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-06-05 13:33 UTC (permalink / raw)
To: William Theesfeld; +Cc: linux-watchdog
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [High] `starfive_wdt_remove` unconditionally accesses hardware registers before checking PM state, causing a crash on unbind if the device is PM-suspended.
- [High] System resume fails to restart the hardware watchdog if it was running purely due to `early_enable`.
- [Medium] Double decrement of runtime PM usage counter in `starfive_wdt_probe` error path.
--
commit 2328d244e274598a728c0306f052404720065d11
Author: William Theesfeld <william@theesfeld.net>
watchdog: starfive: release early_enable PM refcount on remove
This commit drops the runtime PM reference taken at probe time if the
watchdog was started by early_enable but never opened by userspace. It
ensures the usage counter does not leak when the driver is removed.
> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index 848935dc6dca6..d9175b045d60f 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -538,6 +538,19 @@ static void starfive_wdt_remove(struct platform_device *pdev)
> {
> struct starfive_wdt *wdt = platform_get_drvdata(pdev);
>
> starfive_wdt_stop(wdt);
> watchdog_unregister_device(&wdt->wdd);
[Severity: High]
This isn't a bug introduced by this patch, but does starfive_wdt_stop()
access hardware registers before checking the PM state?
If the device is runtime suspended (for example, early_enable is false and
the watchdog is not open), its clocks would be gated. Calling
starfive_wdt_stop() without waking up the device via pm_runtime_get_sync()
could access memory-mapped registers with the clock gated.
Could this trigger a synchronous external abort or bus fault during driver
unbind?
[ ... ]
While reviewing the early_enable and PM logic in this driver, I noticed a few
other pre-existing issues:
[Severity: High]
This is a pre-existing issue, but looking at the suspend and resume
behavior with early_enable:
drivers/watchdog/starfive-wdt.c:starfive_wdt_resume() {
...
if (watchdog_active(&wdt->wdd))
return starfive_wdt_start(wdt);
return 0;
}
During suspend, starfive_wdt_suspend() unconditionally stops the hardware if
it's active. On resume, the code above only restarts the hardware if
WDOG_ACTIVE is set. If the watchdog was running solely due to early_enable,
WDOG_ACTIVE is false, so it is left stopped, but WDOG_HW_RUNNING remains
set in the watchdog core.
If userspace later opens the watchdog, would the core only ping it via
starfive_wdt_keepalive() instead of explicitly starting it? It seems this
might leave the hardware watchdog disabled indefinitely.
[Severity: Medium]
This is also a pre-existing issue, but looking at the error path in
starfive_wdt_probe():
drivers/watchdog/starfive-wdt.c:starfive_wdt_probe() {
...
ret = pm_runtime_put_sync(&pdev->dev);
if (ret < 0)
goto err_unregister_wdt;
...
err_unregister_wdt:
watchdog_unregister_device(&wdt->wdd);
err_put_pm:
if (pm_runtime_enabled(&pdev->dev))
pm_runtime_put_sync(&pdev->dev);
}
If the initial pm_runtime_put_sync() returns an error (like -EAGAIN), the
code jumps to err_unregister_wdt and falls through to err_put_pm, where it
calls pm_runtime_put_sync() a second time.
Because __pm_runtime_idle() decrements the usage counter regardless of its
return value, could this cause a usage counter underflow warning in the PM
core?
--
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=6
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (5 preceding siblings ...)
2026-06-05 13:11 ` [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
` (5 more replies)
6 siblings, 6 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
This series addresses runtime PM correctness issues in the StarFive
watchdog driver. v3 fixes three issues that were raised against v2:
Changes in v3:
- 2/6 (pm_runtime_put_sync positive return): the v2 commit message
and code comment described the semantics of a "1" return as "the
device is still in use", which is wrong — that case is signalled
by -EAGAIN. A positive return indicates a non-error condition such
as the device already being in the requested state. Reword both
the commit message and the in-function comment to describe this
correctly. No code change beyond that.
- 3/6 (probe error path restructure): the v2 patch left a path that
put the runtime PM usage counter twice. When the success-path
pm_runtime_put_sync() (line ~513) returns an error, the function
jumps to err_unregister_wdt. In v2 that label fell through to
err_put_pm, which called pm_runtime_put_sync() a second time on a
counter that had already been decremented by the first put.
Insert an explicit "goto err_pm_disable;" after the unregister so
the post-register error path skips the second put.
- 4/6 (system suspend/resume guard): the v2 patch added a
pm_runtime_status_suspended() check on the resume path but kept
the final restart predicate as watchdog_active(). For watchdogs
started via early_enable, the framework state is WDOG_HW_RUNNING
without WDOG_ACTIVE, so the existing predicate left the hardware
permanently stopped after a system suspend/resume cycle (the
matching suspend path had already stopped it). Extend the
predicate to also accept watchdog_hw_running().
Patches 1, 5, and 6 are unchanged from v2.
Several pre-existing issues in adjacent code paths were flagged
during v2 review (hardware access during driver unbind, possible
races between device_shutdown() and the watchdog reboot notifier,
SETTIMEOUT/GETTIMELEFT ioctls touching gated registers). They are
not addressed in this series; each is a separate logical change
that deserves its own patch and its own justification. Happy to
follow up on any the maintainers want addressed first.
The whole series builds cleanly with CONFIG_STARFIVE_WATCHDOG=m on
x86_64 defconfig. Runtime testing would require StarFive JH7100 /
JH7110 hardware and has not been performed.
William Theesfeld (6):
watchdog: starfive: balance PM refcount when start operation fails
watchdog: starfive: treat pm_runtime_put_sync() positive return as
success
watchdog: starfive: balance PM refcount and disable in probe error
paths
watchdog: starfive: guard system suspend/resume hardware access
watchdog: starfive: avoid PM refcount underflow in shutdown
watchdog: starfive: release early_enable PM refcount on remove
drivers/watchdog/starfive-wdt.c | 109 ++++++++++++++++++++++++++------
1 file changed, 89 insertions(+), 20 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
` (4 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
After pm_runtime_resume_and_get() succeeds in starfive_wdt_pm_start(),
the runtime PM usage counter is incremented. If starfive_wdt_start()
subsequently fails, the function returns the error without releasing
that reference, leaking the usage counter.
Call pm_runtime_put_sync() on the start failure path so the counter
is balanced.
This also folds in the v1 conversion of pm_runtime_get_sync() to
pm_runtime_resume_and_get(), which was the original purpose of this
patch; both fixes together close the refcount-leak window in this
function as flagged by the maintainer on the v1 thread.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index af55adc4a..ed8c5711a 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -371,12 +371,16 @@ static void starfive_wdt_stop(struct starfive_wdt *wdt)
static int starfive_wdt_pm_start(struct watchdog_device *wdd)
{
struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
- int ret = pm_runtime_get_sync(wdd->parent);
+ int ret = pm_runtime_resume_and_get(wdd->parent);
if (ret < 0)
return ret;
- return starfive_wdt_start(wdt);
+ ret = starfive_wdt_start(wdt);
+ if (ret)
+ pm_runtime_put_sync(wdd->parent);
+
+ return ret;
}
static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v3 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 17:19 ` [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
` (3 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
pm_runtime_put_sync() can return a positive value to signal a
non-error condition (for example, the device was already in the
requested state); only negative return values are real errors.
Both starfive_wdt_pm_stop() and starfive_wdt_probe() currently treat
any non-zero return as failure: pm_stop returns the value verbatim,
which the watchdog framework propagates as an error, and probe takes
the err_unregister_wdt path even on a successful but non-zero return.
Mask off the positive return value in pm_stop and tighten the probe
check to "< 0" so the legitimate positive return is no longer
mishandled.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index ed8c5711a..e047f52b0 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -386,9 +386,17 @@ static int starfive_wdt_pm_start(struct watchdog_device *wdd)
static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
{
struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
+ int ret;
starfive_wdt_stop(wdt);
- return pm_runtime_put_sync(wdd->parent);
+ ret = pm_runtime_put_sync(wdd->parent);
+ /*
+ * pm_runtime_put_sync() can return a positive value to signal a
+ * non-error condition (for example, the device was already in the
+ * requested state and no suspend callback was needed). Only
+ * propagate negative return values as failures.
+ */
+ return ret < 0 ? ret : 0;
}
static int starfive_wdt_set_timeout(struct watchdog_device *wdd,
@@ -503,7 +511,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (!early_enable) {
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_put_sync(&pdev->dev);
- if (ret)
+ if (ret < 0)
goto err_unregister_wdt;
}
}
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v3 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 17:19 ` [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 17:19 ` [PATCH v3 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
` (2 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
The probe path takes a runtime PM reference via
pm_runtime_resume_and_get() (or enables the clocks directly when
runtime PM is unavailable), but several error paths after that point
do not release that reference before returning, and the two earliest
error paths return without calling pm_runtime_disable() at all even
though pm_runtime_enable() has already run.
Restructure the error handling into three labels so every failure
path balances exactly the resources it has acquired:
err_pm_disable: pm_runtime_enable() ran but no clock/refcount
was taken yet (resume_and_get / enable_clock
failed).
err_put_pm: clock or PM refcount is held; release it, then
fall through to disable runtime PM.
err_unregister_wdt: watchdog_register_device() succeeded and the
success-path pm_runtime_put_sync() returned an
error. The put has already decremented the
counter, so this path jumps directly to
err_pm_disable rather than falling through to
err_put_pm; otherwise the counter would be
decremented a second time and underflow.
Update the in-function goto targets to use these labels and remove
the early "return ret;" paths so pm_runtime_disable() is always run
once pm_runtime_enable() has been called.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index e047f52b0..856e55f04 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -460,17 +460,17 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (pm_runtime_enabled(&pdev->dev)) {
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0)
- return ret;
+ goto err_pm_disable;
} else {
/* runtime PM is disabled but clocks need to be enabled */
ret = starfive_wdt_enable_clock(wdt);
if (ret)
- return ret;
+ goto err_pm_disable;
}
ret = starfive_wdt_reset_init(&pdev->dev);
if (ret)
- goto err_exit;
+ goto err_put_pm;
watchdog_set_drvdata(&wdt->wdd, wdt);
wdt->wdd.info = &starfive_wdt_info;
@@ -482,7 +482,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (!wdt->freq) {
dev_err(&pdev->dev, "get clock rate failed.\n");
ret = -EINVAL;
- goto err_exit;
+ goto err_put_pm;
}
wdt->wdd.min_timeout = 1;
@@ -498,7 +498,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
if (early_enable) {
ret = starfive_wdt_start(wdt);
if (ret)
- goto err_exit;
+ goto err_put_pm;
set_bit(WDOG_HW_RUNNING, &wdt->wdd.status);
} else {
starfive_wdt_stop(wdt);
@@ -506,7 +506,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
ret = watchdog_register_device(&wdt->wdd);
if (ret)
- goto err_exit;
+ goto err_put_pm;
if (!early_enable) {
if (pm_runtime_enabled(&pdev->dev)) {
@@ -520,8 +520,20 @@ static int starfive_wdt_probe(struct platform_device *pdev)
err_unregister_wdt:
watchdog_unregister_device(&wdt->wdd);
-err_exit:
- starfive_wdt_disable_clock(wdt);
+ /*
+ * The only path into err_unregister_wdt is the post-register
+ * pm_runtime_put_sync() that returned an error. That call already
+ * decremented the runtime PM usage counter, so falling through to
+ * err_put_pm would put again and underflow the counter. Jump
+ * straight to err_pm_disable.
+ */
+ goto err_pm_disable;
+err_put_pm:
+ if (pm_runtime_enabled(&pdev->dev))
+ pm_runtime_put_sync(&pdev->dev);
+ else
+ starfive_wdt_disable_clock(wdt);
+err_pm_disable:
pm_runtime_disable(&pdev->dev);
return ret;
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v3 4/6] watchdog: starfive: guard system suspend/resume hardware access
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (2 preceding siblings ...)
2026-06-05 17:19 ` [PATCH v3 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
2026-06-05 17:19 ` [PATCH v3 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
starfive_wdt_suspend() reads from STARFIVE_WDT_VALUE and writes to
WDOGCONTROL via starfive_wdt_get_count() and starfive_wdt_stop()
before calling pm_runtime_force_suspend(). If the device was already
runtime-suspended when system suspend began, the apb and core clocks
are gated and those register accesses produce a synchronous external
abort.
starfive_wdt_resume() has the mirror problem: it unconditionally
restores the WDOGLOAD value after pm_runtime_force_resume() returns,
but pm_runtime_force_resume() leaves the device in whichever runtime
PM state it was in pre-suspend. If the device stays suspended, the
restore writes race with gated clocks.
Gate both halves with pm_runtime_status_suspended() so the hardware
is only touched when it is actually active. When the watchdog was
suspended at runtime PM level there is no state to save or restore.
Additionally, restart the hardware on resume for the early_enable
case (WDOG_HW_RUNNING set without WDOG_ACTIVE). Without this, a
system suspend stops the early-enabled hardware via the suspend
callback above, but the resume callback only restarts when
watchdog_active() is true, leaving the watchdog permanently stopped
after the suspend/resume cycle.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 34 ++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index 856e55f04..e3a47c0c3 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -564,11 +564,16 @@ static int starfive_wdt_suspend(struct device *dev)
{
struct starfive_wdt *wdt = dev_get_drvdata(dev);
- /* Save watchdog state, and turn it off. */
- wdt->reload = starfive_wdt_get_count(wdt);
-
- /* Note that WTCNT doesn't need to be saved. */
- starfive_wdt_stop(wdt);
+ /*
+ * Only access the hardware while the device is runtime-resumed; if
+ * runtime PM has already suspended the device, its clocks are gated
+ * and a register read/write would trigger a synchronous external
+ * abort. WTCNT does not need to be saved.
+ */
+ if (!pm_runtime_status_suspended(dev)) {
+ wdt->reload = starfive_wdt_get_count(wdt);
+ starfive_wdt_stop(wdt);
+ }
return pm_runtime_force_suspend(dev);
}
@@ -582,12 +587,27 @@ static int starfive_wdt_resume(struct device *dev)
if (ret)
return ret;
+ /*
+ * pm_runtime_force_resume() leaves the device in whichever runtime
+ * PM state it was in before system suspend. Only restore hardware
+ * state when the device is actually resumed; otherwise the register
+ * writes below would race with gated clocks.
+ */
+ if (pm_runtime_status_suspended(dev))
+ return 0;
+
starfive_wdt_unlock(wdt);
- /* Restore watchdog state. */
starfive_wdt_set_reload_count(wdt, wdt->reload);
starfive_wdt_lock(wdt);
- if (watchdog_active(&wdt->wdd))
+ /*
+ * Restart the hardware on resume for both userspace-opened
+ * watchdogs (WDOG_ACTIVE) and watchdogs started via early_enable
+ * (WDOG_HW_RUNNING). Checking only WDOG_ACTIVE leaves the
+ * early_enable case stopped after a suspend/resume cycle even
+ * though the framework still considers the hardware running.
+ */
+ if (watchdog_active(&wdt->wdd) || watchdog_hw_running(&wdt->wdd))
return starfive_wdt_start(wdt);
return 0;
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v3 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (3 preceding siblings ...)
2026-06-05 17:19 ` [PATCH v3 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
2026-06-05 17:19 ` [PATCH v3 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
starfive_wdt_shutdown() unconditionally calls starfive_wdt_pm_stop(),
which drops a runtime PM reference via pm_runtime_put_sync(). If the
device was never opened (so there was no matching pm_start()), and
early_enable is also off (so probe already balanced its own get/put),
the counter is at zero and put_sync() drives it negative.
Only call pm_stop() when WDOG_ACTIVE is set (meaning a paired
pm_start() happened on open). For the early_enable-and-never-opened
case, the framework state is WDOG_HW_RUNNING without WDOG_ACTIVE; in
that case stop just the hardware so the watchdog does not keep
ticking through shutdown, while leaving the PM refcount for the
remove path to release.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index e3a47c0c3..6137e98f1 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -557,7 +557,19 @@ static void starfive_wdt_shutdown(struct platform_device *pdev)
{
struct starfive_wdt *wdt = platform_get_drvdata(pdev);
- starfive_wdt_pm_stop(&wdt->wdd);
+ /*
+ * Only drop a runtime PM reference when we actually hold one. The
+ * watchdog framework increments it via pm_start() on open; when
+ * early_enable started the hardware without an open, the reference
+ * was taken at probe time and the framework state reads as
+ * WDOG_HW_RUNNING without WDOG_ACTIVE. In the inactive-and-not-
+ * running case there is nothing for us to do and dropping the
+ * counter unconditionally would underflow it.
+ */
+ if (watchdog_active(&wdt->wdd))
+ starfive_wdt_pm_stop(&wdt->wdd);
+ else if (test_bit(WDOG_HW_RUNNING, &wdt->wdd.status))
+ starfive_wdt_stop(wdt);
}
static int starfive_wdt_suspend(struct device *dev)
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v3 6/6] watchdog: starfive: release early_enable PM refcount on remove
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
` (4 preceding siblings ...)
2026-06-05 17:19 ` [PATCH v3 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
@ 2026-06-05 17:19 ` William Theesfeld
5 siblings, 0 replies; 22+ messages in thread
From: William Theesfeld @ 2026-06-05 17:19 UTC (permalink / raw)
To: Xingyu Wu
Cc: Ziv Xu, Wim Van Sebroeck, Guenter Roeck, linux-watchdog,
linux-kernel
When early_enable starts the hardware at probe time, the runtime PM
reference taken by pm_runtime_resume_and_get() is intentionally kept
to hold the device runtime-resumed. In the normal (userspace-opened)
case the watchdog framework unwinds that reference via pm_stop() on
close, and again via stop_on_unregister() during
watchdog_unregister_device().
But watchdog_unregister_device() only runs the stop path when
WDOG_ACTIVE is set (see drivers/watchdog/watchdog_dev.c). If
early_enable started the hardware and userspace never opened it,
WDOG_HW_RUNNING is set without WDOG_ACTIVE: unregister returns
without touching pm_stop, and the probe-time PM reference is leaked.
Drop that reference in starfive_wdt_remove() before
pm_runtime_disable() runs, when WDOG_HW_RUNNING is still asserted.
Signed-off-by: William Theesfeld <william@theesfeld.net>
---
drivers/watchdog/starfive-wdt.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
index 6137e98f1..ad5bb67d4 100644
--- a/drivers/watchdog/starfive-wdt.c
+++ b/drivers/watchdog/starfive-wdt.c
@@ -546,6 +546,19 @@ static void starfive_wdt_remove(struct platform_device *pdev)
starfive_wdt_stop(wdt);
watchdog_unregister_device(&wdt->wdd);
+ /*
+ * watchdog_unregister_device() only calls our pm_stop op when
+ * WDOG_ACTIVE is set. When early_enable started the hardware at
+ * probe time and userspace never opened the watchdog, the framework
+ * state is WDOG_HW_RUNNING without WDOG_ACTIVE: the unregister path
+ * leaves the runtime PM reference taken at probe outstanding. Drop
+ * it here before disabling runtime PM so the usage counter does not
+ * leak.
+ */
+ if (test_bit(WDOG_HW_RUNNING, &wdt->wdd.status) &&
+ pm_runtime_enabled(&pdev->dev))
+ pm_runtime_put_sync(&pdev->dev);
+
if (pm_runtime_enabled(&pdev->dev))
pm_runtime_disable(&pdev->dev);
else
--
2.54.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2026-06-05 17:20 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-05 12:23 [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak William Theesfeld
2026-06-05 12:35 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 13:11 ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 13:23 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
2026-06-05 13:26 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
2026-06-05 13:24 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
2026-06-05 13:27 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
2026-06-05 13:29 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
2026-06-05 13:33 ` sashiko-bot
2026-06-05 17:19 ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 17:19 ` [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 17:19 ` [PATCH v3 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
2026-06-05 17:19 ` [PATCH v3 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
2026-06-05 17:19 ` [PATCH v3 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
2026-06-05 17:19 ` [PATCH v3 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
2026-06-05 17:19 ` [PATCH v3 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.