* [PATCH AUTOSEL 6.16-5.15] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition
[not found] <20250805130945.471732-1-sashal@kernel.org>
@ 2025-08-05 13:08 ` Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] watchdog: iTCO_wdt: Report error if timeout configuration fails Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] watchdog: dw_wdt: Fix default timeout Sasha Levin
2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2025-08-05 13:08 UTC (permalink / raw)
To: patches, stable
Cc: Aaron Plattner, Timur Tabi, Guenter Roeck, Wim Van Sebroeck,
Sasha Levin, linux-watchdog
From: Aaron Plattner <aplattner@nvidia.com>
[ Upstream commit 48defdf6b083f74a44e1f742db284960d3444aec ]
The MediaTek implementation of the sbsa_gwdt watchdog has a race
condition where a write to SBSA_GWDT_WRR is ignored if it occurs while
the hardware is processing a timeout refresh that asserts WS0.
Detect this based on the hardware implementer and adjust
wdd->min_hw_heartbeat_ms to avoid the race by forcing the keepalive ping
to be one second later.
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Acked-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250721230640.2244915-1-aplattner@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Nature of the Fix
This is a **hardware-specific bug fix** that addresses a race condition
in MediaTek's implementation of the SBSA generic watchdog. The race
occurs when a watchdog keepalive ping (`writel(0, gwdt->refresh_base +
SBSA_GWDT_WRR)`) happens simultaneously with the hardware asserting the
WS0 timeout signal. In affected hardware, this write is silently
ignored, potentially leading to unexpected system resets.
## Key Code Changes Analysis
1. **Hardware Detection** (lines 78-79, 217-219):
- Adds `SBSA_GWDT_IMPL_MEDIATEK` constant (0x426) to identify
MediaTek hardware
- Reads hardware implementer ID from `SBSA_GWDT_W_IIDR` register
- Sets `need_ws0_race_workaround` flag only for MediaTek
implementations when `action=0`
2. **Workaround Implementation** (lines 171-191):
- Adjusts `wdd->min_hw_heartbeat_ms` to `timeout * 500 + 1000`
- This forces keepalive pings to occur 1 second after the WOR timeout
- Avoids the exact moment when WS0 is being asserted
3. **Minimum Timeout Adjustment** (lines 337-343):
- Sets minimum timeout to 3 seconds for affected hardware
- Ensures WOR is set to 1.5 seconds with heartbeats every 2.5 seconds
## Why This Qualifies for Stable Backport
1. **Fixes a Real Bug**: Addresses a hardware race condition that can
cause **unexpected system resets** - a critical reliability issue in
production systems.
2. **Limited Scope**:
- Only affects MediaTek SBSA watchdog implementations
- Workaround is conditionally applied based on hardware detection
- No impact on other implementations
3. **Minimal Risk**:
- Changes are confined to timing adjustments
- No architectural changes or new features
- Preserves existing behavior for all non-MediaTek hardware
4. **Clear Hardware Issue**: The commit message explicitly describes the
race condition where "a write to SBSA_GWDT_WRR is ignored if it
occurs while the hardware is processing a timeout refresh that
asserts WS0."
5. **Production Impact**: Without this fix, systems with MediaTek SBSA
watchdog can experience spurious reboots when the race condition is
triggered, affecting system availability.
6. **Follows Stable Rules**:
- Fixes an actual bug (not adding features)
- Small, contained change (~50 lines)
- Already reviewed by subsystem maintainers
- Hardware-specific fix with clear problem/solution
## Comparison with Similar Commits
Looking at recent watchdog commits that were backported:
- `c6e89348fd58`: Fixed incorrect timeout calculations - backported with
Cc: stable
- `041fdbe73de9`: Fixed timeout programming limits - backported to
stable
This commit follows the same pattern of fixing hardware-specific bugs
that affect system reliability.
The fix is essential for MediaTek-based systems using SBSA watchdog to
prevent unexpected reboots caused by the hardware race condition.
drivers/watchdog/sbsa_gwdt.c | 50 +++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)
diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
index 5f23913ce3b4..6ce1bfb39064 100644
--- a/drivers/watchdog/sbsa_gwdt.c
+++ b/drivers/watchdog/sbsa_gwdt.c
@@ -75,11 +75,17 @@
#define SBSA_GWDT_VERSION_MASK 0xF
#define SBSA_GWDT_VERSION_SHIFT 16
+#define SBSA_GWDT_IMPL_MASK 0x7FF
+#define SBSA_GWDT_IMPL_SHIFT 0
+#define SBSA_GWDT_IMPL_MEDIATEK 0x426
+
/**
* struct sbsa_gwdt - Internal representation of the SBSA GWDT
* @wdd: kernel watchdog_device structure
* @clk: store the System Counter clock frequency, in Hz.
* @version: store the architecture version
+ * @need_ws0_race_workaround:
+ * indicate whether to adjust wdd->timeout to avoid a race with WS0
* @refresh_base: Virtual address of the watchdog refresh frame
* @control_base: Virtual address of the watchdog control frame
*/
@@ -87,6 +93,7 @@ struct sbsa_gwdt {
struct watchdog_device wdd;
u32 clk;
int version;
+ bool need_ws0_race_workaround;
void __iomem *refresh_base;
void __iomem *control_base;
};
@@ -161,6 +168,31 @@ static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
*/
sbsa_gwdt_reg_write(((u64)gwdt->clk / 2) * timeout, gwdt);
+ /*
+ * Some watchdog hardware has a race condition where it will ignore
+ * sbsa_gwdt_keepalive() if it is called at the exact moment that a
+ * timeout occurs and WS0 is being asserted. Unfortunately, the default
+ * behavior of the watchdog core is very likely to trigger this race
+ * when action=0 because it programs WOR to be half of the desired
+ * timeout, and watchdog_next_keepalive() chooses the exact same time to
+ * send keepalive pings.
+ *
+ * This triggers a race where sbsa_gwdt_keepalive() can be called right
+ * as WS0 is being asserted, and affected hardware will ignore that
+ * write and continue to assert WS0. After another (timeout / 2)
+ * seconds, the same race happens again. If the driver wins then the
+ * explicit refresh will reset WS0 to false but if the hardware wins,
+ * then WS1 is asserted and the system resets.
+ *
+ * Avoid the problem by scheduling keepalive heartbeats one second later
+ * than the WOR timeout.
+ *
+ * This workaround might not be needed in a future revision of the
+ * hardware.
+ */
+ if (gwdt->need_ws0_race_workaround)
+ wdd->min_hw_heartbeat_ms = timeout * 500 + 1000;
+
return 0;
}
@@ -202,12 +234,15 @@ static int sbsa_gwdt_keepalive(struct watchdog_device *wdd)
static void sbsa_gwdt_get_version(struct watchdog_device *wdd)
{
struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
- int ver;
+ int iidr, ver, impl;
- ver = readl(gwdt->control_base + SBSA_GWDT_W_IIDR);
- ver = (ver >> SBSA_GWDT_VERSION_SHIFT) & SBSA_GWDT_VERSION_MASK;
+ iidr = readl(gwdt->control_base + SBSA_GWDT_W_IIDR);
+ ver = (iidr >> SBSA_GWDT_VERSION_SHIFT) & SBSA_GWDT_VERSION_MASK;
+ impl = (iidr >> SBSA_GWDT_IMPL_SHIFT) & SBSA_GWDT_IMPL_MASK;
gwdt->version = ver;
+ gwdt->need_ws0_race_workaround =
+ !action && (impl == SBSA_GWDT_IMPL_MEDIATEK);
}
static int sbsa_gwdt_start(struct watchdog_device *wdd)
@@ -299,6 +334,15 @@ static int sbsa_gwdt_probe(struct platform_device *pdev)
else
wdd->max_hw_heartbeat_ms = GENMASK_ULL(47, 0) / gwdt->clk * 1000;
+ if (gwdt->need_ws0_race_workaround) {
+ /*
+ * A timeout of 3 seconds means that WOR will be set to 1.5
+ * seconds and the heartbeat will be scheduled every 2.5
+ * seconds.
+ */
+ wdd->min_timeout = 3;
+ }
+
status = readl(cf_base + SBSA_GWDT_WCS);
if (status & SBSA_GWDT_WCS_WS1) {
dev_warn(dev, "System reset by WDT.\n");
--
2.39.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH AUTOSEL 6.16-5.15] watchdog: iTCO_wdt: Report error if timeout configuration fails
[not found] <20250805130945.471732-1-sashal@kernel.org>
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.15] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition Sasha Levin
@ 2025-08-05 13:09 ` Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] watchdog: dw_wdt: Fix default timeout Sasha Levin
2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2025-08-05 13:09 UTC (permalink / raw)
To: patches, stable
Cc: Ziyan Fu, Guenter Roeck, Wim Van Sebroeck, Sasha Levin,
linux-watchdog
From: Ziyan Fu <fuzy5@lenovo.com>
[ Upstream commit 40efc43eb7ffb5a4e2f998c13b8cfb555e671b92 ]
The driver probes with the invalid timeout value when
'iTCO_wdt_set_timeout()' fails, as its return value is not checked. In
this case, when executing "wdctl", we may get:
Device: /dev/watchdog0
Timeout: 30 seconds
Timeleft: 613 seconds
The timeout value is the value of "heartbeat" or "WATCHDOG_TIMEOUT", and
the timeleft value is calculated from the register value we actually read
(0xffff) by masking with 0x3ff and converting ticks to seconds (* 6 / 10).
Add error handling to return the failure code if 'iTCO_wdt_set_timeout()'
fails, ensuring the driver probe fails and prevents invalid operation.
Signed-off-by: Ziyan Fu <fuzy5@lenovo.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250704073518.7838-1-13281011316@163.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the
following reasons:
## Bug Fix Analysis
1. **Critical Bug Being Fixed**: The commit fixes a probe failure
handling bug where the driver continues to initialize with an invalid
timeout value when `iTCO_wdt_set_timeout()` fails. This can lead to:
- Watchdog operating with incorrect timeout values
- Misleading information reported to userspace (timeout shows 30s but
actual timer reads 0xffff)
- Potential system instability due to watchdog misbehavior
2. **Clear Error Path Issue**: Looking at the code changes in
drivers/watchdog/iTCO_wdt.c:579-584, the original code ignored the
return value of the fallback `iTCO_wdt_set_timeout()` call. If this
second attempt to set the default timeout also fails, the driver
would continue initialization with a potentially corrupted watchdog
state.
3. **Hardware Interaction Failure**: The `iTCO_wdt_set_timeout()`
function (lines 351-395) performs hardware register writes and
verifies them. A failure here means:
- Hardware register writes failed to take effect
- Verification showed the written value didn't match what was read
back
- The watchdog timer is in an undefined/invalid state
## Stable Backport Criteria
1. **Small and Contained Fix**: The change is minimal - only 4 lines
added:
- Captures return value of fallback timeout setting
- Adds error checking and proper error return
- Adds appropriate error logging
2. **No Architectural Changes**: This is a straightforward error
handling fix that doesn't change any interfaces or introduce new
features.
3. **Fixes Real User-Visible Issue**: The commit message shows this
causes real problems visible to users through `wdctl` command showing
incorrect timeout/timeleft values.
4. **Low Risk of Regression**: The change only adds error checking where
none existed before. If the timeout setting succeeds (normal case),
behavior is unchanged. If it fails, the driver now properly fails to
probe instead of continuing with invalid state.
5. **Critical Subsystem**: Watchdog drivers are critical for system
reliability and recovery. A malfunctioning watchdog that reports
incorrect values or operates with wrong timeouts can lead to
unexpected system reboots or failure to reboot when needed.
6. **Part of a Series of Fixes**: This appears to be from the same
author (Ziyan Fu) who previously fixed a related issue in commit
158f9f2f7152 about updating the heartbeat value. This suggests
ongoing work to fix error handling in this driver.
The fix prevents the driver from operating in an invalid state when
hardware initialization fails, which is exactly the type of bug fix that
stable kernels should include to maintain system reliability.
drivers/watchdog/iTCO_wdt.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index 9ab769aa0244..4ab3405ef8e6 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -577,7 +577,11 @@ static int iTCO_wdt_probe(struct platform_device *pdev)
/* Check that the heartbeat value is within it's range;
if not reset to the default */
if (iTCO_wdt_set_timeout(&p->wddev, heartbeat)) {
- iTCO_wdt_set_timeout(&p->wddev, WATCHDOG_TIMEOUT);
+ ret = iTCO_wdt_set_timeout(&p->wddev, WATCHDOG_TIMEOUT);
+ if (ret != 0) {
+ dev_err(dev, "Failed to set watchdog timeout (%d)\n", WATCHDOG_TIMEOUT);
+ return ret;
+ }
dev_info(dev, "timeout value out of range, using %d\n",
WATCHDOG_TIMEOUT);
heartbeat = WATCHDOG_TIMEOUT;
--
2.39.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH AUTOSEL 6.16-5.4] watchdog: dw_wdt: Fix default timeout
[not found] <20250805130945.471732-1-sashal@kernel.org>
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.15] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] watchdog: iTCO_wdt: Report error if timeout configuration fails Sasha Levin
@ 2025-08-05 13:09 ` Sasha Levin
2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2025-08-05 13:09 UTC (permalink / raw)
To: patches, stable
Cc: Sebastian Reichel, Guenter Roeck, Wim Van Sebroeck, Sasha Levin,
linux-watchdog
From: Sebastian Reichel <sebastian.reichel@collabora.com>
[ Upstream commit ac3dbb91e0167d017f44701dd51c1efe30d0c256 ]
The Synopsys Watchdog driver sets the default timeout to 30 seconds,
but on some devices this is not a valid timeout. E.g. on RK3588 the
actual timeout being used is 44 seconds instead.
Once the watchdog is started the value is updated accordingly, but
it would be better to expose a sensible timeout to userspace without
the need to first start the watchdog.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250717-dw-wdt-fix-initial-timeout-v1-1-86dc864d48dd@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit, here's my assessment:
**Backport Status: YES**
## Extensive Explanation:
This commit is a clear bugfix that should be backported to stable kernel
trees for the following reasons:
### 1. **It fixes a real user-visible bug**
The commit addresses a problem where the watchdog driver exposes an
incorrect timeout value (30 seconds) to userspace when that value is not
actually valid for the hardware. On RK3588 specifically, the hardware
uses 44 seconds instead of the requested 30 seconds, creating a
discrepancy between what userspace sees and what the hardware actually
does.
### 2. **The fix is minimal and contained**
The change adds only two lines of code:
```c
+ /* Limit timeout value to hardware constraints. */
+ dw_wdt_set_timeout(wdd, wdd->timeout);
```
This simply calls the existing `dw_wdt_set_timeout()` function during
initialization to ensure the timeout value is properly validated and
adjusted to hardware constraints before the watchdog is registered with
the kernel.
### 3. **No architectural changes or new features**
The commit doesn't introduce any new functionality or change the
driver's architecture. It merely ensures that the existing timeout
validation logic (which was already being called when the watchdog
starts) is also applied during initialization.
### 4. **Low risk of regression**
The `dw_wdt_set_timeout()` function is already being called in other
code paths (when the watchdog is started, when timeout is changed). This
commit just ensures it's also called during probe to validate the
initial timeout. The function itself uses `dw_wdt_find_best_top()` to
find a valid hardware timeout value that's closest to the requested
value.
### 5. **Affects user-facing behavior**
Without this fix, userspace applications that query the watchdog timeout
before starting it will get an incorrect value (30 seconds) that doesn't
match what the hardware will actually use (e.g., 44 seconds on RK3588).
This could lead to incorrect assumptions about watchdog behavior in
monitoring and recovery systems.
### 6. **The fix follows established patterns**
The commit message notes "it would be better to expose a sensible
timeout to userspace without the need to first start the watchdog" -
this is a reasonable expectation that userspace should have accurate
information about hardware settings without needing to activate the
hardware first.
### 7. **Subsystem maintainer review**
The commit has been reviewed by Guenter Roeck (the watchdog subsystem
maintainer) and signed off by both subsystem maintainers, indicating
it's a proper fix that they consider important.
The fix ensures that the watchdog timeout value exposed to userspace
through sysfs and ioctl interfaces accurately reflects the hardware's
actual timeout capability from the moment the driver is loaded, rather
than showing an incorrect default value until the watchdog is started.
This is particularly important for systems that need to make decisions
based on watchdog capabilities before actually starting the watchdog
timer.
drivers/watchdog/dw_wdt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/watchdog/dw_wdt.c b/drivers/watchdog/dw_wdt.c
index 26efca9ae0e7..c3fbb6068c52 100644
--- a/drivers/watchdog/dw_wdt.c
+++ b/drivers/watchdog/dw_wdt.c
@@ -644,6 +644,8 @@ static int dw_wdt_drv_probe(struct platform_device *pdev)
} else {
wdd->timeout = DW_WDT_DEFAULT_SECONDS;
watchdog_init_timeout(wdd, 0, dev);
+ /* Limit timeout value to hardware constraints. */
+ dw_wdt_set_timeout(wdd, wdd->timeout);
}
platform_set_drvdata(pdev, dw_wdt);
--
2.39.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-08-05 13:11 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250805130945.471732-1-sashal@kernel.org>
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.15] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] watchdog: iTCO_wdt: Report error if timeout configuration fails Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] watchdog: dw_wdt: Fix default timeout Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).