From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61D4840E8C7 for ; Fri, 5 Jun 2026 13:26:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780665969; cv=none; b=fHtUpHHV3wvScYyPlQF54fxlcQV1JmjW7dQ3AOV1C4mxzxtkXB5fO/ZlZSy8wfnUX60tyTBowYnOPuKNLMLEF8AVAKX0sL2BtwL19IQRehTrHNCGIcIEbIEAG7/RqAZi/5YeS7ykBeUYaRnH05hIToRifzDVsRPx2hfvadPd7Vc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780665969; c=relaxed/simple; bh=Ht5vRTW0svFQq6S0bsbJjzx/Z2WZYS2HgANd8yx8Vm8=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=Q3Qt088s/Tb3G694D0gF6kwEsXdseiQkI2eoY6d7qhCa+edXnJO9EI56XtflRr6p6BCNMgHslp3bkyChxuOjeicScL3EBUSlyEYrhFyJjaUE3vqTgguHTSUOcSv4v+NWmNFLmGNildRvsB661glYYYu1hRKBB9Xjyb/yibOBpp4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Lz8HElMp; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Lz8HElMp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DB5001F00893; Fri, 5 Jun 2026 13:26:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780665968; bh=bvn2OmNHeNQA8djHRkjEtnbHRxYksNi+jbeocc8J/J4=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=Lz8HElMpF1OIrnjrb7Yf8ujcmGkTnASpQlXj37maHFxs8K3YXfQUfQoqJ0U3kAShb iAZf7l1dyHNiTL86J4hblv2h57N9/73JFrxzaURoAgTOaxGLd2gL9t3CDmciqv3Ixu WGbGnQBAS4jNaxtzeb2DeLh7Wma4NpI18V+sd5noUvKrF9uN0xcxuWFv7NTdeYaO0L R2bZ7gaRSfwP/ur9kU43awKoYCQDTykW+635VihoV3vGeoNGl+E5nyAlgSG/b8fDJS Yn+AUmvbx3LAlNepJvjfrBsl0XieysBrgHYxIqmC/tFdJ99e5dU1gaoLk5jiOvzWyY PqXn35iJI7Qig== From: sashiko-bot@kernel.org Subject: Re: [PATCH v2 2/6] watchdog: starfive: treat pm_runtime_put_sync() positive return as success Reply-To: sashiko-reviews@lists.linux.dev To: "William Theesfeld" Cc: linux-watchdog@vger.kernel.org In-Reply-To: References: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Fri, 05 Jun 2026 13:26:07 +0000 Message-Id: <20260605132607.DB5001F00893@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-watchdog@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: New issues: - [Medium] The commit message and inline comment justify the code change wi= th a fundamentally incorrect explanation of Runtime PM return values. Pre-existing issues: - [High] Propagating normal negative returns from `pm_runtime_put_sync()` a= s errors breaks the watchdog state machine and probe function. - [High] The error path in `starfive_wdt_probe` calls `pm_runtime_put_sync(= )` twice when the initial PM put fails, leading to a usage count underflow. - [High] The driver unconditionally accesses hardware registers during driv= er removal without guaranteeing the clocks are enabled, risking bus faults. -- commit ff0d7567f1fad840688f8c89badc9b0278568af6 Author: William Theesfeld watchdog: starfive: treat pm_runtime_put_sync() positive return as success This commit updates the watchdog driver to mask off positive return values from pm_runtime_put_sync() in starfive_wdt_pm_stop() and tightens the error check in starfive_wdt_probe() to only catch negative returns. The intention is to stop treating the non-suspended state as an error condition. > watchdog: starfive: treat pm_runtime_put_sync() positive return as success >=20 > pm_runtime_put_sync() returns 1 when the device cannot be suspended > because it is still in use (typically due to other runtime PM users > holding references); only negative return values are real errors. [Severity: Medium] Is this description of the return value accurate? According to the Runtime PM guidelines, pm_runtime_put_sync() returns 1 when the device is already suspended, not when it is still in use. When a device is still in use (i.e., usage counter is non-zero), it returns -EAGAIN. Documenting incorrect API semantics here could misguide future maintainers. [ ... ] > diff --git a/drivers/watchdog/starfive-wdt.c b/drivers/watchdog/starfive-= wdt.c > index ed8c5711a3853..c821b38d26d22 100644 > --- a/drivers/watchdog/starfive-wdt.c > +++ b/drivers/watchdog/starfive-wdt.c > @@ -386,9 +386,17 @@ static int starfive_wdt_pm_start(struct watchdog_dev= ice *wdd) > static int starfive_wdt_pm_stop(struct watchdog_device *wdd) > { > struct starfive_wdt *wdt =3D watchdog_get_drvdata(wdd); > + int ret; > =20 > starfive_wdt_stop(wdt); > - return pm_runtime_put_sync(wdd->parent); > + ret =3D pm_runtime_put_sync(wdd->parent); > + /* > + * pm_runtime_put_sync() can return 1 when the device is still in > + * use (and therefore did not actually suspend). That is not an > + * error from the caller's point of view; only propagate negative > + * return values. > + */ [Severity: Medium] Is the comment's description of the return value correct? As mentioned regarding the commit message, pm_runtime_put_sync() returns 1 when the device is already suspended, not when it is still in use. > + return ret < 0 ? ret : 0; [Severity: High] This isn't a bug introduced by this patch, but does propagating normal negative returns as errors break the watchdog state machine? If the device cannot immediately suspend due to a non-zero usage counter, pm_runtime_put_sync() correctly returns -EAGAIN or -EBUSY. By propagating these negative values as errors, the watchdog core assumes t= he hardware stop failed. It will leave the WDOG_ACTIVE bit set and continue the background ping worker. Because the driver has already successfully disabled the hardware and dropp= ed its PM reference, the device will eventually suspend. The rogue ping worker would then access unclocked MMIO registers and trigger a fatal bus fault. Should this return 0 if the hardware was successfully stopped? > } > =20 > static int starfive_wdt_set_timeout(struct watchdog_device *wdd, [ ... ] > @@ -503,7 +511,7 @@ static int starfive_wdt_probe(struct platform_device = *pdev) > if (!early_enable) { > if (pm_runtime_enabled(&pdev->dev)) { > ret =3D pm_runtime_put_sync(&pdev->dev); > - if (ret) > + if (ret < 0) [Severity: High] This is a pre-existing issue, but does treating -EAGAIN as a failure cause the probe to fail unnecessarily? If pm_runtime_put_sync() returns -EAGAIN due to a concurrent sysfs access, propagating it via ret < 0 turns a harmless concurrency race into a fatal failure, unregistering a working watchdog. [Severity: High] This is a pre-existing issue, but does this error path cause a usage count underflow? If pm_runtime_put_sync() returns an error (like -EAGAIN), the code branches to err_unregister_wdt and falls through to the err_put_pm label: err_unregister_wdt: watchdog_unregister_device(&wdt->wdd); err_put_pm: if (pm_runtime_enabled(&pdev->dev)) pm_runtime_put_sync(&pdev->dev); Since the initial pm_runtime_put_sync() unconditionally decrements the usage count regardless of its return value, calling it a second time here causes an unbalanced decrement. > goto err_unregister_wdt; > } > } [Severity: High] This is a pre-existing issue, but does starfive_wdt_remove() risk bus faults by unconditionally accessing hardware registers without ensuring clocks are active? drivers/watchdog/starfive-wdt.c:starfive_wdt_remove() { ... struct starfive_wdt *wdt =3D platform_get_drvdata(pdev); starfive_wdt_stop(wdt); ... } If the watchdog was never opened by userspace and early_enable was false, t= he device is in the RPM_SUSPENDED state with its clocks gated. Accessing unclocked MMIO registers in starfive_wdt_stop() triggers a synchronous external abort. --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/cover.1780664848.gi= t.william@theesfeld.net?part=3D2