Linux Watchdog driver development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "William Theesfeld" <william@theesfeld.net>
Cc: linux-watchdog@vger.kernel.org
Subject: Re: [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success
Date: Fri, 05 Jun 2026 13:26:07 +0000	[thread overview]
Message-ID: <20260605132607.DB5001F00893@smtp.kernel.org> (raw)
In-Reply-To: <bc91d669bb732784d1e28d4bd259612b6dd07342.1780664848.git.william@theesfeld.net>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] The commit message and inline comment justify the code change with a fundamentally incorrect explanation of Runtime PM return values.

Pre-existing issues:
- [High] Propagating normal negative returns from `pm_runtime_put_sync()` as errors breaks the watchdog state machine and probe function.
- [High] The error path in `starfive_wdt_probe` calls `pm_runtime_put_sync()` twice when the initial PM put fails, leading to a usage count underflow.
- [High] The driver unconditionally accesses hardware registers during driver removal without guaranteeing the clocks are enabled, risking bus faults.
--

commit ff0d7567f1fad840688f8c89badc9b0278568af6
Author: William Theesfeld <william@theesfeld.net>

watchdog: starfive: treat pm_runtime_put_sync() positive return as success

This commit updates the watchdog driver to mask off positive return values
from pm_runtime_put_sync() in starfive_wdt_pm_stop() and tightens the error
check in starfive_wdt_probe() to only catch negative returns. The intention
is to stop treating the non-suspended state as an error condition.

> watchdog: starfive: treat pm_runtime_put_sync() positive return as success
> 
> pm_runtime_put_sync() returns 1 when the device cannot be suspended
> because it is still in use (typically due to other runtime PM users
> holding references); only negative return values are real errors.

[Severity: Medium]
Is this description of the return value accurate?

According to the Runtime PM guidelines, pm_runtime_put_sync() returns 1 when
the device is already suspended, not when it is still in use. When a device
is still in use (i.e., usage counter is non-zero), it returns -EAGAIN.

Documenting incorrect API semantics here could misguide future maintainers.

[ ... ]

> diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-wdt.c
> index ed8c5711a3853..c821b38d26d22 100644
> --- a/drivers/watchdog/starfive-wdt.c
> +++ b/drivers/watchdog/starfive-wdt.c
> @@ -386,9 +386,17 @@ static int starfive_wdt_pm_start(struct watchdog_device *wdd)
>  static int starfive_wdt_pm_stop(struct watchdog_device *wdd)
>  {
>  	struct starfive_wdt *wdt = watchdog_get_drvdata(wdd);
> +	int ret;
>  
>  	starfive_wdt_stop(wdt);
> -	return pm_runtime_put_sync(wdd->parent);
> +	ret = pm_runtime_put_sync(wdd->parent);
> +	/*
> +	 * pm_runtime_put_sync() can return 1 when the device is still in
> +	 * use (and therefore did not actually suspend).  That is not an
> +	 * error from the caller's point of view; only propagate negative
> +	 * return values.
> +	 */

[Severity: Medium]
Is the comment's description of the return value correct?

As mentioned regarding the commit message, pm_runtime_put_sync() returns 1
when the device is already suspended, not when it is still in use.

> +	return ret < 0 ? ret : 0;

[Severity: High]
This isn't a bug introduced by this patch, but does propagating normal
negative returns as errors break the watchdog state machine?

If the device cannot immediately suspend due to a non-zero usage counter,
pm_runtime_put_sync() correctly returns -EAGAIN or -EBUSY.

By propagating these negative values as errors, the watchdog core assumes the
hardware stop failed. It will leave the WDOG_ACTIVE bit set and continue the
background ping worker.

Because the driver has already successfully disabled the hardware and dropped
its PM reference, the device will eventually suspend. The rogue ping worker
would then access unclocked MMIO registers and trigger a fatal bus fault.

Should this return 0 if the hardware was successfully stopped?

>  }
>  
>  static int starfive_wdt_set_timeout(struct watchdog_device *wdd,

[ ... ]

> @@ -503,7 +511,7 @@ static int starfive_wdt_probe(struct platform_device *pdev)
>  	if (!early_enable) {
>  		if (pm_runtime_enabled(&pdev->dev)) {
>  			ret = pm_runtime_put_sync(&pdev->dev);
> -			if (ret)
> +			if (ret < 0)

[Severity: High]
This is a pre-existing issue, but does treating -EAGAIN as a failure cause
the probe to fail unnecessarily?

If pm_runtime_put_sync() returns -EAGAIN due to a concurrent sysfs access,
propagating it via ret < 0 turns a harmless concurrency race into a fatal
failure, unregistering a working watchdog.

[Severity: High]
This is a pre-existing issue, but does this error path cause a usage count
underflow?

If pm_runtime_put_sync() returns an error (like -EAGAIN), the code branches
to err_unregister_wdt and falls through to the err_put_pm label:

err_unregister_wdt:
	watchdog_unregister_device(&wdt->wdd);
err_put_pm:
	if (pm_runtime_enabled(&pdev->dev))
		pm_runtime_put_sync(&pdev->dev);

Since the initial pm_runtime_put_sync() unconditionally decrements the usage
count regardless of its return value, calling it a second time here causes
an unbalanced decrement.

>  				goto err_unregister_wdt;
>  		}
>  	}

[Severity: High]
This is a pre-existing issue, but does starfive_wdt_remove() risk bus faults
by unconditionally accessing hardware registers without ensuring clocks are
active?

drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() {
...
	struct starfive_wdt *wdt = platform_get_drvdata(pdev);

	starfive_wdt_stop(wdt);
...
}

If the watchdog was never opened by userspace and early_enable was false, the
device is in the RPM_SUSPENDED state with its clocks gated. Accessing
unclocked MMIO registers in starfive_wdt_stop() triggers a synchronous
external abort.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780664848.git.william@theesfeld.net?part=2

  reply	other threads:[~2026-06-05 13:26 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-05 12:23 [PATCH] watchdog: starfive: use pm_runtime_resume_and_get() to fix refcount leak William Theesfeld
2026-06-05 12:35 ` sashiko-bot
2026-06-05 13:11 ` [PATCH v2 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 13:11   ` [PATCH v2 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 13:23     ` sashiko-bot
2026-06-05 13:11   ` [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
2026-06-05 13:26     ` sashiko-bot [this message]
2026-06-05 13:11   ` [PATCH v2 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
2026-06-05 13:24     ` sashiko-bot
2026-06-05 13:11   ` [PATCH v2 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
2026-06-05 13:27     ` sashiko-bot
2026-06-05 13:11   ` [PATCH v2 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
2026-06-05 13:29     ` sashiko-bot
2026-06-05 13:11   ` [PATCH v2 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld
2026-06-05 13:33     ` sashiko-bot
2026-06-05 17:19   ` [PATCH v3 0/6] watchdog: starfive: runtime PM cleanup William Theesfeld
2026-06-05 17:19     ` [PATCH v3 1/6] watchdog: starfive: balance PM refcount when start operation fails William Theesfeld
2026-06-05 17:19     ` [PATCH v3 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success William Theesfeld
2026-06-05 17:19     ` [PATCH v3 3/6] watchdog: starfive: balance PM refcount and disable in probe error paths William Theesfeld
2026-06-05 17:19     ` [PATCH v3 4/6] watchdog: starfive: guard system suspend/resume hardware access William Theesfeld
2026-06-05 17:19     ` [PATCH v3 5/6] watchdog: starfive: avoid PM refcount underflow in shutdown William Theesfeld
2026-06-05 17:19     ` [PATCH v3 6/6] watchdog: starfive: release early_enable PM refcount on remove William Theesfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605132607.DB5001F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-watchdog@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=william@theesfeld.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox