* [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation
@ 2026-02-12 21:12 Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-02-12 21:12 UTC (permalink / raw)
To: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Douglas Anderson, Andrew Morton
Cc: linux-kernel, linux-doc, Mayank Rungta
This series addresses limitations in the hardlockup detector implementations
and updates the documentation to reflect actual behavior and recent changes.
The changes are structured as follows:
Hardlockup Detection Improvements (Patches 1 & 3)
=================================================
The hardlockup detector logic relies on updating saved interrupt counts to
determine if the CPU is making progress.
Patch 1 ensures that the saved interrupt count is updated unconditionally
before checking the "touched" flag. This prevents stale comparisons which
can delay detection. This is a logic fix that ensures the detector remains
accurate even when the watchdog is frequently touched.
Patch 3 improves the Buddy detector's timeliness. The current checking
interval (every 3rd sample) causes high variability in detection time (up
to 24s). This patch changes the Buddy detector to check at every hrtimer
interval (4s) with a missed-interrupt threshold of 3, narrowing the
detection window to a consistent 8-12 second range.
Documentation Updates (Patches 2 & 4)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.
Patch 2 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector. It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).
Patch 4 adds a dedicated section for the Buddy detector, which was previously
undocumented. It details the mechanism, the new timing logic, and known
limitations.
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
Mayank Rungta (4):
watchdog/hardlockup: Always update saved interrupts during check
doc: watchdog: Clarify hardlockup detection timing
watchdog/hardlockup: improve buddy system detection timeliness
doc: watchdog: Document buddy detector
Documentation/admin-guide/lockup-watchdogs.rst | 132 +++++++++++++++++++++----
include/linux/nmi.h | 1 +
kernel/watchdog.c | 41 ++++++--
kernel/watchdog_buddy.c | 9 +-
4 files changed, 146 insertions(+), 37 deletions(-)
---
base-commit: 0dddf20b4fd4afd59767acc144ad4da60259f21f
change-id: 20260211-hardlockup-watchdog-fixes-60317598ac20
Best regards,
--
Mayank Rungta <mrungta@google.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
@ 2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
` (2 subsequent siblings)
3 siblings, 2 replies; 21+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-02-12 21:12 UTC (permalink / raw)
To: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Douglas Anderson, Andrew Morton
Cc: linux-kernel, linux-doc, Mayank Rungta
From: Mayank Rungta <mrungta@google.com>
Currently, arch_touch_nmi_watchdog() causes an early return that
skips updating hrtimer_interrupts_saved. This leads to stale
comparisons and delayed lockup detection.
Update the saved interrupt count before checking the touched flag
to ensure detection timeliness.
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
kernel/watchdog.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7d675781bc917d709aa3fb499629eeac86934f55..b71aa814edcf9ad8f73644eb5bcd1eeb3264e4ed 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
+ bool is_hl;
int hardlockup_all_cpu_backtrace;
+ /*
+ * Check for a hardlockup by making sure the CPU's timer
+ * interrupt is incrementing. The timer interrupt should have
+ * fired multiple times before we overflow'd. If it hasn't
+ * then this is a good indication the cpu is stuck
+ *
+ * Purposely check this _before_ checking watchdog_hardlockup_touched
+ * so we make sure we still update the saved value of the interrupts.
+ * Without that we'll take an extra round through this function before
+ * we can detect a lockup.
+ */
+
+ is_hl = is_hardlockup(cpu);
if (per_cpu(watchdog_hardlockup_touched, cpu)) {
per_cpu(watchdog_hardlockup_touched, cpu) = false;
@@ -195,13 +209,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
1 : sysctl_hardlockup_all_cpu_backtrace;
- /*
- * Check for a hardlockup by making sure the CPU's timer
- * interrupt is incrementing. The timer interrupt should have
- * fired multiple times before we overflow'd. If it hasn't
- * then this is a good indication the cpu is stuck
- */
- if (is_hardlockup(cpu)) {
+
+ if (is_hl) {
unsigned int this_cpu = smp_processor_id();
unsigned long flags;
--
2.53.0.273.g2a3d683680-goog
* [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
@ 2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
3 siblings, 2 replies; 21+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-02-12 21:12 UTC (permalink / raw)
To: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Douglas Anderson, Andrew Morton
Cc: linux-kernel, linux-doc, Mayank Rungta
From: Mayank Rungta <mrungta@google.com>
The current documentation implies that a hardlockup is strictly defined
as looping for "more than 10 seconds." However, the detection mechanism
is periodic (based on `watchdog_thresh`), meaning detection time varies
significantly depending on when the lockup occurs relative to the NMI
perf event.
Update the definition to remove the strict "more than 10 seconds"
constraint in the introduction and defer details to the Implementation
section.
Additionally, add a "Detection Overhead" section illustrating the
Best Case (~6s) and Worst Case (~20s) detection scenarios to provide
administrators with a clearer understanding of the watchdog's
latency.
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
Documentation/admin-guide/lockup-watchdogs.rst | 41 +++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 3e09284a8b9bef75c0ac1607a1809ac3b8a4c1ea..1b374053771f676d874716b3210cade55ae89b28 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -16,7 +16,7 @@ details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
provided for this.
A 'hardlockup' is defined as a bug that causes the CPU to loop in
-kernel mode for more than 10 seconds (see "Implementation" below for
+kernel mode for several seconds (see "Implementation" below for
details), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayed
upon detection and the system will stay locked up unless the default
@@ -64,6 +64,45 @@ administrators to configure the period of the hrtimer and the perf
event. The right value for a particular environment is a trade-off
between fast response to lockups and detection overhead.
+Detection Overhead
+------------------
+
+The hardlockup detector checks for lockups using a periodic NMI perf
+event. This means the time to detect a lockup can vary depending on
+when the lockup occurs relative to the NMI check window.
+
+**Best Case:**
+In the best case scenario, the lockup occurs just before the first
+heartbeat is due. The detector will notice the missing hrtimer
+interrupt almost immediately during the next check.
+
+::
+
+ Time 100.0: cpu 1 heartbeat
+ Time 100.1: hardlockup_check, cpu1 stores its state
+ Time 103.9: Hard Lockup on cpu1
+ Time 104.0: cpu 1 heartbeat never comes
+ Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+ Time to detection: ~6 seconds
+
+**Worst Case:**
+In the worst case scenario, the lockup occurs shortly after a valid
+interrupt (heartbeat) which itself happened just after the NMI check.
+The next NMI check sees that the interrupt count has changed (due to
+that one heartbeat), assumes the CPU is healthy, and resets the
+baseline. The lockup is only detected at the subsequent check.
+
+::
+
+ Time 100.0: hardlockup_check, cpu1 stores its state
+ Time 100.1: cpu 1 heartbeat
+ Time 100.2: Hard Lockup on cpu1
+ Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+ Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+ Time to detection: ~20 seconds
+
By default, the watchdog runs on all online cores. However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
on the housekeeping cores, not the cores specified in the "nohz_full"
--
2.53.0.273.g2a3d683680-goog
* [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
@ 2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
3 siblings, 2 replies; 21+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-02-12 21:12 UTC (permalink / raw)
To: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Douglas Anderson, Andrew Morton
Cc: linux-kernel, linux-doc, Mayank Rungta
From: Mayank Rungta <mrungta@google.com>
Currently, the buddy system only performs checks every 3rd sample. With
a 4-second hrtimer interval, checks are 12 seconds apart, so a missed
check window can delay hard lockup detection for up to 24 seconds.
Modify the buddy system to perform checks at every interval (4s).
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.
Best and worst case detection scenarios:
Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
Detected in ~24s (missed check + 12s till the baseline-storing check
+ 12s till the detecting check).
After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
include/linux/nmi.h | 1 +
kernel/watchdog.c | 18 ++++++++++++++++--
kernel/watchdog_buddy.c | 9 +--------
3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 207156f2143c5f43e89e81cbf0215331eae9bd49..bc1162895f3558bff178dd6c2c839344162f8adc 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -21,6 +21,7 @@ void lockup_detector_soft_poweroff(void);
extern int watchdog_user_enabled;
extern int watchdog_thresh;
extern unsigned long watchdog_enabled;
+extern int watchdog_hardlockup_miss_thresh;
extern struct cpumask watchdog_cpumask;
extern unsigned long *watchdog_cpumask_bits;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b71aa814edcf9ad8f73644eb5bcd1eeb3264e4ed..30199eaeb5d7e0fd229657a31ffff4463c97332c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -60,6 +60,13 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
# endif /* CONFIG_SMP */
+/*
+ * Number of consecutive missed interrupts before declaring a lockup.
+ * Default to 1 (immediate) for NMI/Perf. Buddy will overwrite this to 3.
+ */
+int __read_mostly watchdog_hardlockup_miss_thresh = 1;
+EXPORT_SYMBOL_GPL(watchdog_hardlockup_miss_thresh);
+
/*
* Should we panic when a soft-lockup or hard-lockup occurs:
*/
@@ -137,6 +144,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(int, hrtimer_interrupts_missed);
static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
static unsigned long hard_lockup_nmi_warn;
@@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
{
int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
- if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
- return true;
+ if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
+ per_cpu(hrtimer_interrupts_missed, cpu)++;
+ if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
+ return true;
+
+ return false;
+ }
/*
* NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
@@ -172,6 +185,7 @@ static bool is_hardlockup(unsigned int cpu)
* written/read by a single CPU.
*/
per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+ per_cpu(hrtimer_interrupts_missed, cpu) = 0;
return false;
}
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index ee754d767c2131e3cd34bccf26d8e6cf0e0b5f75..3a1e57080c1c6a645c974b3b6eebec87df9e69e9 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -21,6 +21,7 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
int __init watchdog_hardlockup_probe(void)
{
+ watchdog_hardlockup_miss_thresh = 3;
return 0;
}
@@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
{
unsigned int next_cpu;
- /*
- * Test for hardlockups every 3 samples. The sample period is
- * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
- * watchdog_thresh (over by 20%).
- */
- if (hrtimer_interrupts % 3 != 0)
- return;
-
/* check for a hardlockup on the next CPU */
next_cpu = watchdog_next_cpu(smp_processor_id());
if (next_cpu >= nr_cpu_ids)
--
2.53.0.273.g2a3d683680-goog
* [PATCH 4/4] doc: watchdog: Document buddy detector
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
` (2 preceding siblings ...)
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
@ 2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
3 siblings, 1 reply; 21+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-02-12 21:12 UTC (permalink / raw)
To: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Douglas Anderson, Andrew Morton
Cc: linux-kernel, linux-doc, Mayank Rungta
From: Mayank Rungta <mrungta@google.com>
The current documentation generalizes the hardlockup detector as primarily
NMI-perf-based and lacks details on the SMP "Buddy" detector.
Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to explicitly
separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
"Hardlockup Detector (Buddy)".
Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
Documentation/admin-guide/lockup-watchdogs.rst | 149 +++++++++++++++++--------
1 file changed, 101 insertions(+), 48 deletions(-)
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 1b374053771f676d874716b3210cade55ae89b28..7ae7ce3abd2c838ff29c70f7a32ffaf58531e150 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
to cause the system to reboot automatically after a specified amount
of time.
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
Implementation
==============
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
The watchdog job runs in a stop scheduling thread that updates a
timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
will call panic if it was instructed to do so or resume execution of
other kernel code.
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
+
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
+
+Hardlockup Detector (NMI/Perf)
+------------------------------
+
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
+
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
+
+**Detection Overhead (NMI):**
+
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
+
+* **Best Case:** The lockup occurs just before the first heartbeat is
+ due. The detector will notice the missing hrtimer interrupt almost
+ immediately during the next check.
+
+ ::
+
+ Time 100.0: cpu 1 heartbeat
+ Time 100.1: hardlockup_check, cpu1 stores its state
+ Time 103.9: Hard Lockup on cpu1
+ Time 104.0: cpu 1 heartbeat never comes
+ Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+ Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+ (heartbeat) which itself happened just after the NMI check. The next
+ NMI check sees that the interrupt count has changed (due to that one
+ heartbeat), assumes the CPU is healthy, and resets the baseline. The
+ lockup is only detected at the subsequent check.
+
+ ::
+
+ Time 100.0: hardlockup_check, cpu1 stores its state
+ Time 100.1: cpu 1 heartbeat
+ Time 100.2: Hard Lockup on cpu1
+ Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+ Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+ Time to detection: ~20 seconds
-Detection Overhead
-------------------
+Hardlockup Detector (Buddy)
+---------------------------
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
-::
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
- Time 100.0: cpu 1 heartbeat
- Time 100.1: hardlockup_check, cpu1 stores its state
- Time 103.9: Hard Lockup on cpu1
- Time 104.0: cpu 1 heartbeat never comes
- Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (Buddy):**
- Time to detection: ~6 seconds
+With a default check interval of 4 seconds (watchdog_thresh = 10):
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best case:** Lockup occurs just before a check.
+ Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+ Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
-::
+**Limitations of the Buddy Detector:**
- Time 100.0: hardlockup_check, cpu1 stores its state
- Time 100.1: cpu 1 heartbeat
- Time 100.2: Hard Lockup on cpu1
- Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
- Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+ detector cannot detect the condition because the monitoring CPUs
+ are also frozen.
+2. **Stack Traces:** Unlike the NMI detector, the buddy detector
+ cannot directly interrupt the locked CPU to grab a stack trace.
+ It relies on architecture-specific mechanisms (like NMI backtrace
+ support) to try and retrieve the status of the locked CPU. If
+ such support is missing, the log may only show that a lockup
+ occurred without providing the locked CPU's stack.
- Time to detection: ~20 seconds
+Watchdog Core Exclusion
+=======================
By default, the watchdog runs on all online cores. However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
--
2.53.0.273.g2a3d683680-goog
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
@ 2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
1 sibling, 0 replies; 21+ messages in thread
From: Doug Anderson @ 2026-02-13 16:29 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Feb 12, 2026 at 1:12 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> From: Mayank Rungta <mrungta@google.com>
>
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
>
> Update the saved interrupt count before checking the touched flag
> to ensure detection timeliness.
>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
> ---
> kernel/watchdog.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
I pre-reviewed this patch series for Mayank, so unsurprisingly I have
no comments. ;-)
Reviewed-by: Douglas Anderson <dianders@chromium.org>
* Re: [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
@ 2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
1 sibling, 0 replies; 21+ messages in thread
From: Doug Anderson @ 2026-02-13 16:29 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Feb 12, 2026 at 1:12 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> From: Mayank Rungta <mrungta@google.com>
>
> The current documentation implies that a hardlockup is strictly defined
> as looping for "more than 10 seconds." However, the detection mechanism
> is periodic (based on `watchdog_thresh`), meaning detection time varies
> significantly depending on when the lockup occurs relative to the NMI
> perf event.
>
> Update the definition to remove the strict "more than 10 seconds"
> constraint in the introduction and defer details to the Implementation
> section.
>
> Additionally, add a "Detection Overhead" section illustrating the
> Best Case (~6s) and Worst Case (~20s) detection scenarios to provide
> administrators with a clearer understanding of the watchdog's
> latency.
>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
> ---
> Documentation/admin-guide/lockup-watchdogs.rst | 41 +++++++++++++++++++++++++-
> 1 file changed, 40 insertions(+), 1 deletion(-)
Thanks for updating the docs! Again, given that I pre-reviewed, I
unsurprisingly have no further comments.
Reviewed-by: Douglas Anderson <dianders@chromium.org>
* Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
@ 2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
1 sibling, 0 replies; 21+ messages in thread
From: Doug Anderson @ 2026-02-13 16:30 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Feb 12, 2026 at 1:12 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> @@ -21,6 +21,7 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
>
> int __init watchdog_hardlockup_probe(void)
> {
> + watchdog_hardlockup_miss_thresh = 3;
> return 0;
> }
>
> @@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
> {
> unsigned int next_cpu;
>
> - /*
> - * Test for hardlockups every 3 samples. The sample period is
> - * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> - * watchdog_thresh (over by 20%).
> - */
> - if (hrtimer_interrupts % 3 != 0)
> - return;
I really like that this solution achieves a tighter detection range
without any downside (no extra wakeups, etc). :-)
Reviewed-by: Douglas Anderson <dianders@chromium.org>
* Re: [PATCH 4/4] doc: watchdog: Document buddy detector
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
@ 2026-02-13 16:30 ` Doug Anderson
0 siblings, 0 replies; 21+ messages in thread
From: Doug Anderson @ 2026-02-13 16:30 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Petr Mladek, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Feb 12, 2026 at 1:12 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> From: Mayank Rungta <mrungta@google.com>
>
> The current documentation generalizes the hardlockup detector as primarily
> NMI-perf-based and lacks details on the SMP "Buddy" detector.
>
> Update the documentation to add a detailed description of the Buddy
> detector, and also restructure the "Implementation" section to explicitly
> separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
> "Hardlockup Detector (Buddy)".
>
> Clarify that the softlockup hrtimer acts as the heartbeat generator for
> both hardlockup mechanisms and centralize the configuration details in a
> "Frequency and Heartbeats" section.
>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
> ---
> Documentation/admin-guide/lockup-watchdogs.rst | 149 +++++++++++++++++--------
> 1 file changed, 101 insertions(+), 48 deletions(-)
Thank you for updating the docs! I consider it my bug that I didn't
think to update this doc when the buddy lockup detector first landed.
I'm glad it's updated now, at least! :-)
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
@ 2026-03-04 14:44 ` Petr Mladek
2026-03-05 0:58 ` Doug Anderson
1 sibling, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2026-03-04 14:44 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Jinchao Wang, Yunhui Cui, Stephane Eranian,
Ian Rogers, Li Huafei, Feng Tang, Max Kellermann,
Douglas Anderson, Andrew Morton, linux-kernel, linux-doc
On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
>
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
>
> Update the saved interrupt count before checking the touched flag
> to ensure detection timeliness.
IMHO, it is not that easy, see below.
So I am curious. Have you found this when debugging a particular
problem or just by reading the code, please?
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
>
> void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> {
> + bool is_hl;
> int hardlockup_all_cpu_backtrace;
> + /*
> + * Check for a hardlockup by making sure the CPU's timer
> + * interrupt is incrementing. The timer interrupt should have
> + * fired multiple times before we overflow'd. If it hasn't
> + * then this is a good indication the cpu is stuck
> + *
> + * Purposely check this _before_ checking watchdog_hardlockup_touched
> + * so we make sure we still update the saved value of the interrupts.
> + * Without that we'll take an extra round through this function before
> + * we can detect a lockup.
> + */
> +
> + is_hl = is_hardlockup(cpu);
>
> if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> per_cpu(watchdog_hardlockup_touched, cpu) = false;
Hmm, this does not look correct to me.
1. First, let's agree on the meaning of "watchdog_hardlockup_touched".
My understanding is that arch_touch_nmi_watchdog() is called by code
which might block interrupts (timers) for a long time and wants to
hide it.
Blocking interrupts for too long is _bad_. In the ideal world,
nobody should call arch_touch_nmi_watchdog() because we want
to know about all sinners.
In the real world, we allow hiding some sinners because
they might produce "false" positives, see touch_nmi_watchdog()
callers:
+ Most callers are printk() related.
We might argue whether it is a false positive or not.
The argument for "touching the watchdog" is that slow serial
consoles might block IRQs for a long time. But they work as
expected and can't do better.
Also the stall is kind of expected in this case. We could
confuse users and/or hide the original problem if the stall
was reported.
Note that there is a bigger problem with the legacy console
drivers. printk() tries to emit messages immediately. And the
current console_lock() owner becomes responsible for flushing
new messages added by other CPUs in parallel.
It is better with the new NBCON console drivers which are
offloaded to a kthread. Here, printk() tries to flush them directly
only when called in an emergency mode (WARN, stall report)
or in panic().
+ There are some other callers, for example, multi_stop_cpu(),
or hv_do_rep_hypercall_ex(). IMHO, they create stalls on
purpose.
2. Let's look at is_hardlockup() in detail:
static bool is_hardlockup(unsigned int cpu)
{
int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
per_cpu(hrtimer_interrupts_missed, cpu)++;
if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
return true;
return false;
}
per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
per_cpu(hrtimer_interrupts_missed, cpu) = 0;
return false;
}
If we call it when the watchdog was touched then
(per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
=> per_cpu(hrtimer_interrupts_missed, cpu)++;
is called even when watchdog was touched.
As a result, we might report a stall which should have been hidden,
for example:
CPU0 CPU1
<NMI>
watchdog_hardlockup_check() # passes
is_hardlockup() # no
hr_int_saved = hr_int;
hr_int_missed = 0;
</NMI>
local_irq_save()
printk()
console_trylock()
console_unlock()
console_flush_all()
touch_nmi_watchdog()
// Other CPUs print many messages,
// e.g. during boot when initializing a lot of HW
for (i=0; i<1000; i++) do
printk();
<NMI>
watchdog_hardlockup_check()
is_hardlockup() # yes
hr_int_missed++ # 1
# skip because touched
</NMI>
touch_nmi_watchdog()
<NMI>
watchdog_hardlockup_check()
is_hardlockup() # yes
hr_int_missed++ # 2
# skip because touched
</NMI>
... repeat many times ...
local_irq_restore()
# this might normally trigger handling of pending IRQs
# including the timers. But IMHO, it can be offloaded
# to a kthread (at least on RT)
<NMI>
watchdog_hardlockup_check()
is_hardlockup() # yes
hr_int_missed++ # might be already 3, 4,...
Report hardlockup even when all the "hr_int_missed"
values should have been ignored because of
touch_watchdog.
</NMI>
A solution might be clearing "hrtimer_interrupts_missed"
when the watchdog was touched.
But honestly, I am not sure if this is worth the complexity.
Higher level look:
------------------
My understanding is that this patch has an effect only when
touch_nmi_watchdog() is called as frequently as
watchdog_hardlockup_check().
The original code gives the system more time to recover after
a known stall (touch_nmi_watchdog() called).
The new code is more eager to report a stall. It might be more prone
to report "false" positives.
IMHO, the root of the problem is that touch_nmi_watchdog() is
called too frequently. And this patch is rather dancing around
than fixing it.
Alternative:
------------
An alternative solution might be to detect and report when too many
watchdog_hardlockup_check() calls are ignored because of
touch_nmi_watchdog().
It might help to find a mis-use of touch_nmi_watchdog(). The question
is what details should be reported in this case.
It should be optional because touch_nmi_watchdog() is supposed
to hide "well-known" sinners after all.
> @@ -195,13 +209,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>
> hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> 1 : sysctl_hardlockup_all_cpu_backtrace;
> - /*
> - * Check for a hardlockup by making sure the CPU's timer
> - * interrupt is incrementing. The timer interrupt should have
> - * fired multiple times before we overflow'd. If it hasn't
> - * then this is a good indication the cpu is stuck
> - */
> - if (is_hardlockup(cpu)) {
> +
> + if (is_hl) {
> unsigned int this_cpu = smp_processor_id();
> unsigned long flags;
>
Best Regards,
Petr
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-04 14:44 ` Petr Mladek
@ 2026-03-05 0:58 ` Doug Anderson
2026-03-05 11:27 ` Petr Mladek
0 siblings, 1 reply; 21+ messages in thread
From: Doug Anderson @ 2026-03-05 0:58 UTC (permalink / raw)
To: Petr Mladek
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Wed, Mar 4, 2026 at 6:44 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> > From: Mayank Rungta <mrungta@google.com>
> >
> > Currently, arch_touch_nmi_watchdog() causes an early return that
> > skips updating hrtimer_interrupts_saved. This leads to stale
> > comparisons and delayed lockup detection.
> >
> > Update the saved interrupt count before checking the touched flag
> > to ensure detection timeliness.
>
> IMHO, it is not that easy, see below.
>
> So I am curious. Have you found this when debugging a particular
> problem or just by reading the code, please?
As I understand it, Mayank found this because the watchdog was
reacting significantly more slowly than he expected. In his case, he
tracked it down to the fact that the 8250 console driver has several calls
to touch_nmi_watchdog(), including on every call to console_write().
This caused the watchdog to take _much_ longer to fire.
On devices that are fairly chatty w/ their output to the serial console,
console_write() is called almost constantly. That means that the
watchdog is being touched constantly. If I remember correctly, Mayank
tracked it down as:
* watchdog_hardlockup_check() called and saves counter (1000)
* timer runs and updates the timer (1000 -> 1001)
* touch_nmi_watchdog() is called
* CPU locks up
* 10 seconds pass
* watchdog_hardlockup_check() called and saves counter (1001)
* 10 seconds pass
* watchdog_hardlockup_check() called and notices touch
* 10 seconds pass
* watchdog_hardlockup_check() called and finally detects lockup
...so we detect the lockup after 30 seconds, which is pretty bad. With
his new scheme, we'd detect the lockup in 20 seconds.
> > @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
> >
> > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > {
> > + bool is_hl;
> > int hardlockup_all_cpu_backtrace;
> > + /*
> > + * Check for a hardlockup by making sure the CPU's timer
> > + * interrupt is incrementing. The timer interrupt should have
> > + * fired multiple times before we overflow'd. If it hasn't
> > + * then this is a good indication the cpu is stuck
> > + *
> > + * Purposely check this _before_ checking watchdog_hardlockup_touched
> > + * so we make sure we still update the saved value of the interrupts.
> > + * Without that we'll take an extra round through this function before
> > + * we can detect a lockup.
> > + */
> > +
> > + is_hl = is_hardlockup(cpu);
> >
> > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > per_cpu(watchdog_hardlockup_touched, cpu) = false;
>
> Hmm, this does not look correct to me.
>
> 1. First, let's agree on the meaning of "watchdog_hardlockup_touched".
>
> My understanding is that arch_touch_nmi_watchdog() is called by code
> which might block interrupts (timers) for a long time and wants to
> hide it.
>
> Blocking interrupts for too long is _bad_. In the ideal world,
> nobody should call arch_touch_nmi_watchdog() because we want
> to know about all sinners.
>
> In the real world, we allow hiding some sinners because
> they might produce "false" positives, see touch_nmi_watchdog()
> callers:
>
> + Most callers are printk() related.
>
> We might argue whether it is a false positive or not.
>
> The argument for "touching the watchdog" is that slow serial
> consoles might block IRQs for a long time. But they work as
> expected and can't do better.
>
> Also the stall is kind of expected in this case. We could
> confuse users and/or hide the original problem if the stall
> was reported.
>
> Note that there is a bigger problem with the legacy console
> drivers. printk() tries to emit messages immediately. And the
> current console_lock() owner becomes responsible for flushing
> new messages added by other CPUs in parallel.
>
> It is better with the new NBCON console drivers which are
> offloaded to a kthread. Here, printk() tries to flush them directly
> only when called in an emergency mode (WARN, stall report)
> or in panic().
>
> + There are some other callers, for example, multi_stop_cpu(),
> or hv_do_rep_hypercall_ex(). IMHO, they create stalls on
> purpose.
I guess the problem is that these sinners tend to touch the watchdog
because they _might_ end up blocking too long. ...and they touch the
watchdog constantly. Essentially:
while (something_might_be_slow()) {
touch_nmi_watchdog();
do_something();
}
Even if the operation is _usually_ not slow (like console write), if
the code has a chance of being slow then it will have the touch.
> 2. Let's look at is_hardlockup() in detail:
>
> static bool is_hardlockup(unsigned int cpu)
> {
> int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> per_cpu(hrtimer_interrupts_missed, cpu)++;
> if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> return true;
>
> return false;
> }
>
> per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> per_cpu(hrtimer_interrupts_missed, cpu) = 0;
>
> return false;
> }
>
> If we call it when the watchdog was touched then
> (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
>
> => per_cpu(hrtimer_interrupts_missed, cpu)++;
>
> is called even when watchdog was touched.
>
> As a result, we might report a stall which should have been hidden,
> for example:
>
> CPU0 CPU1
>
> <NMI>
> watchdog_hardlockup_check() # passes
> is_hardlockup() # no
> hr_int_saved = hr_int;
> hr_int_missed = 0;
> </NMI>
>
> local_irq_save()
> printk()
> console_trylock()
> console_unlock()
> console_flush_all()
>
> touch_nmi_watchdog()
>
> // Other CPUs print many messages,
> // e.g. during boot when initializing a lot of HW
> for (i=0; i<1000; i++) do
> printk();
>
> <NMI>
> watchdog_hardlockup_check()
> is_hardlockup() # yes
> hr_int_missed++ # 1
>
> # skip because touched
> </NMI>
>
> touch_nmi_watchdog()
>
> <NMI>
> watchdog_hardlockup_check()
> is_hardlockup() # yes
> hr_int_missed++ # 2
>
> # skip because touched
> </NMI>
>
> ... repeat many times ...
>
> local_irq_restore()
>
> # this might normally trigger handling of pending IRQs
> # including the timers. But IMHO, it can be offloaded
> # to a kthread (at least on RT)
>
> <NMI>
> watchdog_hardlockup_check()
> is_hardlockup() # yes
> hr_int_missed++ # might be already 3, 4,...
>
> Report hardlockup even when all the "hr_int_missed"
> values should have been ignored because of
> touch_watchdog.
>
> </NMI>
>
>
> A solution might be clearing "hrtimer_interrupts_missed"
> when the watchdog was touched.
Great catch! When I was thinking about Mayank's patches, I thought
about them independently. ...and I believe that independently, each
patch is fine. The problem is that together they have exactly the
problem you indicated.
Clearing "hrtimer_interrupts_missed" seems like the right solution in
Mayank's patch #3.
> But honestly, I am not sure if this is worth the complexity.
>
>
> Higher level look:
> ------------------
>
> My understanding is that this patch has an effect only when
> touch_nmi_watchdog() is called as frequently as
> watchdog_hardlockup_check().
>
> The original code gives the system more time to recover after
> a known stall (touch_nmi_watchdog() called).
>
> The new code is more eager to report a stall. It might be more prone
> to report "false" positives.
>
> IMHO, the root of the problem is that touch_nmi_watchdog() is
> called too frequently. And this patch is rather dancing around
> than fixing it.
I don't think it's really any more likely to report false positives
after the bug you pointed out is fixed. The old watchdog was just too
conservative. With Mayank's proposal I think calling
touch_nmi_watchdog() should reset the watchdog the same amount as
letting the hrtimer run once and that seems like a very reasonable
interpretation.
> Alternative:
> ------------
>
> An alternative solution might be to detect and report when too many
> watchdog_hardlockup_check() calls are ignored because of
> touch_nmi_watchdog().
>
> It might help to find a mis-use of touch_nmi_watchdog(). The question
> is what details should be reported in this case.
>
> It should be optional because touch_nmi_watchdog() is supposed
> to hide "well-known" sinners after all.
Hmmmm. I certainly support trying to reduce the number of places that
call touch_nmi_watchdog(), but at the same time I don't think Mayank's
patch is "dancing around" the problem. IMO considering the
touch_nmi_watchdog() to be "pretend a timer interrupt fired" is the
intuitive way one would think the call should work. The fact that the
code gave an entire extra 10 seconds before the watchdog could be
caught just feels like a bug that should be fixed.
For the 8250 driver in particular, it looks like the
touch_nmi_watchdog() was removed from serial8250_console_write() as
part of nbcon, but then that got reverted. That would still leave two
other touch_nmi_watchdog() calls in that driver...
-Doug
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-05 0:58 ` Doug Anderson
@ 2026-03-05 11:27 ` Petr Mladek
2026-03-05 16:13 ` Doug Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2026-03-05 11:27 UTC (permalink / raw)
To: Doug Anderson
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
On Wed 2026-03-04 16:58:35, Doug Anderson wrote:
> Hi,
>
> On Wed, Mar 4, 2026 at 6:44 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> > > From: Mayank Rungta <mrungta@google.com>
> > >
> > > Currently, arch_touch_nmi_watchdog() causes an early return that
> > > skips updating hrtimer_interrupts_saved. This leads to stale
> > > comparisons and delayed lockup detection.
> > >
> > > Update the saved interrupt count before checking the touched flag
> > > to ensure detection timeliness.
> >
> > IMHO, it is not that easy, see below.
> >
> > So I am curious. Have you found this when debugging a particular
> > problem or just by reading the code, please?
>
> As I understand it, Mayank found this because the watchdog was
> reacting significantly more slowly than he expected. In his case, he
> tracked it down to the fact that the 8250 console driver has several calls
> to touch_nmi_watchdog(), including on every call to console_write().
> This caused the watchdog to take _much_ longer to fire.
>
> On devices that are fairly chatty w/ their output to the serial console,
> console_write() is called almost constantly. That means that the
> watchdog is being touched constantly. If I remember correctly, Mayank
> tracked it down as:
>
> * watchdog_hardlockup_check() called and saves counter (1000)
> * timer runs and updates the timer (1000 -> 1001)
> * touch_nmi_watchdog() is called
> * CPU locks up
> * 10 seconds pass
> * watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and notices touch
Great visualization!
Nit: It seems to be actually the other way around:
* 10 seconds pass
* watchdog_hardlockup_check() called and notices touch and skips updating counters
* 10 seconds pass
* watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and finally detects lockup
>
> ...so we detect the lockup after 30 seconds, which is pretty bad. With
> his new scheme, we'd detect the lockup in 20 seconds.
Fair enough.
> > > @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
> > >
> > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > {
> > > + bool is_hl;
> > > int hardlockup_all_cpu_backtrace;
> > > + /*
> > > + * Check for a hardlockup by making sure the CPU's timer
> > > + * interrupt is incrementing. The timer interrupt should have
> > > + * fired multiple times before we overflow'd. If it hasn't
> > > + * then this is a good indication the cpu is stuck
> > > + *
> > > + * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > + * so we make sure we still update the saved value of the interrupts.
> > > + * Without that we'll take an extra round through this function before
> > > + * we can detect a lockup.
> > > + */
> > > +
> > > + is_hl = is_hardlockup(cpu);
> > >
> > > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> >
> > Hmm, this does not look correct to me.
> >
> > 2. Let's look at is_hardlockup() in detail:
> >
> > static bool is_hardlockup(unsigned int cpu)
> > {
> > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > per_cpu(hrtimer_interrupts_missed, cpu)++;
> > if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > return true;
> >
> > return false;
> > }
> >
> > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> >
> > return false;
> > }
> >
> > If we call it when the watchdog was touched then
> > (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> >
> > => per_cpu(hrtimer_interrupts_missed, cpu)++;
> >
> > is called even when watchdog was touched.
> >
> > As a result, we might report a stall which should have been hidden,
> > for example:
> >
> > CPU0 CPU1
> >
> > <NMI>
> > watchdog_hardlockup_check() # passes
> > is_hardlockup() # no
> > hr_int_saved = hr_int;
> > hr_int_missed = 0;
> > </NMI>
> >
> > local_irq_save()
> > printk()
> > console_trylock()
> > console_unlock()
> > console_flush_all()
> >
> > touch_nmi_watchdog()
> >
> > // Other CPUs print many messages,
> > // e.g. during boot when initializing a lot of HW
> > for (i=0; i<1000; i++) do
> > printk();
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # 1
> >
> > # skip because touched
> > </NMI>
> >
> > touch_nmi_watchdog()
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # 2
> >
> > # skip because touched
> > </NMI>
> >
> > ... repeat many times ...
> >
> > local_irq_restore()
> >
> > # this might normally trigger handling of pending IRQs
> > # including the timers. But IMHO, it can be offloaded
> > # to a kthread (at least on RT)
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # might be already 3, 4,...
> >
> > Report hardlockup even when all the "hr_int_missed"
> > values should have been ignored because of
> > touch_watchdog.
> >
> > </NMI>
> >
> >
> > A solution might be clearing "hrtimer_interrupts_missed"
> > when the watchdog was touched.
>
> Great catch! When I was thinking about Mayank's patches, I thought
> about them independently. ...and I believe that independently, each
> patch is fine. The problem is that together they have exactly the
> problem you indicated.
Heh, I was not aware that "hrtimer_interrupts_missed" was added by
the 3rd patch. I looked at the final code with all patches applied ;-)
> Clearing "hrtimer_interrupts_missed" seems like the right solution in
> Mayank's patch #3.
OK, this 1st patch moves "is_hardlockup()" up because it has some
"side effects". It adds a 4-line comment to explain it.
But it still causes problems in the 3rd patch.
A better solution might be to separate the check and update/reset
of the values. Something like (on top of this patchset, just
compilation tested):
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 30199eaeb5d7..4d0851f0f412 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
per_cpu(watchdog_hardlockup_touched, cpu) = true;
}
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
{
int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
- if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
- per_cpu(hrtimer_interrupts_missed, cpu)++;
- if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
- return true;
-
- return false;
- }
-
/*
* NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
* for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
@@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
*/
per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
per_cpu(hrtimer_interrupts_missed, cpu) = 0;
+}
- return false;
+static bool is_hardlockup(unsigned int cpu)
+{
+ int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+ if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
+ return false;
+
+ per_cpu(hrtimer_interrupts_missed, cpu)++;
+ if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
+ return false;
+
+ return true;
}
static void watchdog_hardlockup_kick(void)
@@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
- bool is_hl;
int hardlockup_all_cpu_backtrace;
- /*
- * Check for a hardlockup by making sure the CPU's timer
- * interrupt is incrementing. The timer interrupt should have
- * fired multiple times before we overflow'd. If it hasn't
- * then this is a good indication the cpu is stuck
- *
- * Purposely check this _before_ checking watchdog_hardlockup_touched
- * so we make sure we still update the saved value of the interrupts.
- * Without that we'll take an extra round through this function before
- * we can detect a lockup.
- */
-
- is_hl = is_hardlockup(cpu);
if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+ watchdog_hardlockup_update_reset(cpu);
per_cpu(watchdog_hardlockup_touched, cpu) = false;
return;
}
@@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
1 : sysctl_hardlockup_all_cpu_backtrace;
- if (is_hl) {
+ /*
+ * Check for a hardlockup by making sure the CPU's timer
+ * interrupt is incrementing. The timer interrupt should have
+ * fired multiple times before we overflow'd. If it hasn't
+ * then this is a good indication the cpu is stuck
+ */
+ if (is_hardlockup(cpu)) {
unsigned int this_cpu = smp_processor_id();
unsigned long flags;
@@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
per_cpu(watchdog_hardlockup_warned, cpu) = true;
} else {
+ watchdog_hardlockup_update_reset(cpu);
per_cpu(watchdog_hardlockup_warned, cpu) = false;
}
}
> > But honestly, I am not sure if this is worth the complexity.
> >
> >
> > Higher level look:
> > ------------------
> >
> > My understanding is that this patch has an effect only when
> > touch_nmi_watchdog() is called as frequently as
> > watchdog_hardlockup_check().
> >
> > The original code gives the system more time to recover after
> > a known stall (touch_nmi_watchdog() called).
> >
> > The new code is more eager to report a stall. It might be more prone
> > to report "false" positives.
> >
> > IMHO, the root of the problem is that touch_nmi_watchdog() is
> > called too frequently. And this patch is rather dancing around
> > than fixing it.
>
> I don't think it's really any more likely to report false positives
> after the bug you pointed out is fixed. The old watchdog was just too
> conservative. With Mayank's proposal I think calling
> touch_nmi_watchdog() should reset the watchdog the same amount as
> letting the hrtimer run once and that seems like a very reasonable
> interpretation.
Fair enough.
> > Alternative:
> > ------------
> >
> > An alternative solution might be to detect and report when too many
> > watchdog_hardlockup_check() calls are ignored because of
> > touch_nmi_watchdog().
> >
> > It might help to find a mis-use of touch_nmi_watchdog(). The question
> > is what details should be reported in this case.
> >
> > It should be optional because touch_nmi_watchdog() is supposed
> > to hide "well-known" sinners after all.
>
> Hmmmm. I certainly support trying to reduce the number of places that
> call touch_nmi_watchdog(), but at the same time I don't think Mayank's
> patch is "dancing around" the problem. IMO considering the
> touch_nmi_watchdog() to be "pretend a timer interrupt fired" is the
> intuitive way one would think the call should work. The fact that the
> code gave an entire extra 10 seconds before the watchdog could be
> caught just feels like a bug that should be fixed.
>
> For the 8250 driver in particular, it looks like the
> touch_nmi_watchdog() was removed from serial8250_console_write() as
> part of nbcon, but then that got reverted. That would still leave two
> other touch_nmi_watchdog() calls in that driver...
Sigh, it seems that touch_nmi_watchdog() can't be removed easily.
Best Regards,
Petr
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
@ 2026-03-05 12:33 ` Petr Mladek
1 sibling, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2026-03-05 12:33 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Jinchao Wang, Yunhui Cui, Stephane Eranian,
Ian Rogers, Li Huafei, Feng Tang, Max Kellermann,
Douglas Anderson, Andrew Morton, linux-kernel, linux-doc
On Thu 2026-02-12 14:12:11, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
>
> The current documentation implies that a hardlockup is strictly defined
> as looping for "more than 10 seconds." However, the detection mechanism
> is periodic (based on `watchdog_thresh`), meaning detection time varies
> significantly depending on when the lockup occurs relative to the NMI
> perf event.
>
> Update the definition to remove the strict "more than 10 seconds"
> constraint in the introduction and defer details to the Implementation
> section.
>
> Additionally, add a "Detection Overhead" section illustrating the
> Best Case (~6s) and Worst Case (~20s) detection scenarios to provide
> administrators with a clearer understanding of the watchdog's
> latency.
>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
Great addition:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
@ 2026-03-05 13:46 ` Petr Mladek
2026-03-05 16:45 ` Doug Anderson
1 sibling, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2026-03-05 13:46 UTC (permalink / raw)
To: mrungta
Cc: Jonathan Corbet, Jinchao Wang, Yunhui Cui, Stephane Eranian,
Ian Rogers, Li Huafei, Feng Tang, Max Kellermann,
Douglas Anderson, Andrew Morton, linux-kernel, linux-doc
On Thu 2026-02-12 14:12:12, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
>
> Currently, the buddy system only performs checks every 3rd sample, with
> a 4-second sample interval. If a check window is missed, the next check occurs
> 12 seconds later, potentially delaying hard lockup detection for up to
> 24 seconds.
>
> Modify the buddy system to perform checks at every interval (4s).
> Introduce a missed-interrupt threshold to maintain the existing grace
> period while reducing the detection window to 8-12 seconds.
>
> Best and worst case detection scenarios:
>
> Before (12s check window):
> - Best case: Lockup occurs after first check but just before heartbeat
> interval. Detected in ~8s (8s till next check).
> - Worst case: Lockup occurs just after a check.
> Detected in ~24s (missed check + 12s till next check + 12s logic).
>
> After (4s check window with threshold of 3):
> - Best case: Lockup occurs just before a check.
> Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
> - Worst case: Lockup occurs just after a check.
> Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
One might argue that the interval <8s,24s> is not much worse than
<6s,20s> achieved by the perf detector.
But I personally like that the dispersion of <8s,12s> is lower so that
the result is more predictable. And it is relatively cheap.
People might have different opinions. But I am fine with this change.
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> {
> int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> - return true;
> + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> + per_cpu(hrtimer_interrupts_missed, cpu)++;
> + if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
This would return true for every check when missed >= 3.
As a result, the hardlockup would be reported every 4s.
I would keep the 12s cadence and change this to:
if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
> + return true;
> +
> + return false;
> + }
>
> /*
> * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> --- a/kernel/watchdog_buddy.c
> +++ b/kernel/watchdog_buddy.c
> @@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
> {
> unsigned int next_cpu;
>
> - /*
> - * Test for hardlockups every 3 samples. The sample period is
> - * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> - * watchdog_thresh (over by 20%).
> - */
> - if (hrtimer_interrupts % 3 != 0)
> - return;
It would be symmetric with the "% 3" above.
> -
> /* check for a hardlockup on the next CPU */
> next_cpu = watchdog_next_cpu(smp_processor_id());
> if (next_cpu >= nr_cpu_ids)
Best Regards,
Petr
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-05 11:27 ` Petr Mladek
@ 2026-03-05 16:13 ` Doug Anderson
2026-03-09 13:33 ` Petr Mladek
0 siblings, 1 reply; 21+ messages in thread
From: Doug Anderson @ 2026-03-05 16:13 UTC (permalink / raw)
To: Petr Mladek
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
>
> > * watchdog_hardlockup_check() called and saves counter (1000)
> > * timer runs and updates the timer (1000 -> 1001)
> > * touch_nmi_watchdog() is called
> > * CPU locks up
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and saves counter (1001)
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and notices touch
>
> Great visualization!
>
> Nit: It seems to be actually the other way around:
>
> * 10 seconds pass
> * watchdog_hardlockup_check() called and notices touch and skips updating counters
> * 10 seconds pass
> * watchdog_hardlockup_check() called and saves counter (1001)
Oops, right! :-) Mayank: it's probably worth adding some form of the
(corrected) example here to the commit message. Also, you could
mention in the commit message that you were seeing real problems
because of the 8250 console prints, per the general rule that if
someone asks a question during a review it's worth including that
info in the next version of the commit message. ;-)
> A better solution might be to separate the check and update/reset
> of the values. Something like (on top of this patchset, just
> compilation tested):
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 30199eaeb5d7..4d0851f0f412 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> per_cpu(watchdog_hardlockup_touched, cpu) = true;
> }
>
> -static bool is_hardlockup(unsigned int cpu)
> +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> {
> int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> - per_cpu(hrtimer_interrupts_missed, cpu)++;
> - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> - return true;
> -
> - return false;
> - }
> -
> /*
> * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> */
> per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> +}
>
> - return false;
> +static bool is_hardlockup(unsigned int cpu)
> +{
> + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> +
> + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> + return false;
> +
> + per_cpu(hrtimer_interrupts_missed, cpu)++;
> + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> + return false;
> +
> + return true;
> }
>
> static void watchdog_hardlockup_kick(void)
> @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
>
> void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> {
> - bool is_hl;
> int hardlockup_all_cpu_backtrace;
> - /*
> - * Check for a hardlockup by making sure the CPU's timer
> - * interrupt is incrementing. The timer interrupt should have
> - * fired multiple times before we overflow'd. If it hasn't
> - * then this is a good indication the cpu is stuck
> - *
> - * Purposely check this _before_ checking watchdog_hardlockup_touched
> - * so we make sure we still update the saved value of the interrupts.
> - * Without that we'll take an extra round through this function before
> - * we can detect a lockup.
> - */
> -
> - is_hl = is_hardlockup(cpu);
>
> if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> + watchdog_hardlockup_update_reset(cpu);
> per_cpu(watchdog_hardlockup_touched, cpu) = false;
> return;
> }
> @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> 1 : sysctl_hardlockup_all_cpu_backtrace;
>
> - if (is_hl) {
> + /*
> + * Check for a hardlockup by making sure the CPU's timer
> + * interrupt is incrementing. The timer interrupt should have
> + * fired multiple times before we overflow'd. If it hasn't
> + * then this is a good indication the cpu is stuck
> + */
> + if (is_hardlockup(cpu)) {
> unsigned int this_cpu = smp_processor_id();
> unsigned long flags;
>
> @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>
> per_cpu(watchdog_hardlockup_warned, cpu) = true;
> } else {
> + watchdog_hardlockup_update_reset(cpu);
> per_cpu(watchdog_hardlockup_warned, cpu) = false;
> }
> }
I haven't tested it, but that actually looks like a pretty nice final
result to me. Mayank: What do you think? You'd have to figure out how
to rework your two patches to incorporate Petr's ideas.
Petr: Since you gave your ideas as a diff, what are you thinking in
terms of tags on Mayank's v2? You didn't provide a Signed-off-by on
your diff, so I guess you're expecting Mayank not to incorporate it
directly but take it as a "suggestion" for improving his patches (AKA
not add any of your tags to his v2).
One nit: in the final result, it might be nice to invert the
"is_hardlockup()" test so we can return early and get rid of a level
of indentation. AKA:
if (!is_hardlockup(cpu)) {
watchdog_hardlockup_update_reset(cpu);
per_cpu(watchdog_hardlockup_warned, cpu) = false;
return;
}
Not only does it reduce indentation, but it also keeps the two calls
to watchdog_hardlockup_update_reset() closer to each other.
-Doug
* Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-03-05 13:46 ` Petr Mladek
@ 2026-03-05 16:45 ` Doug Anderson
2026-03-11 14:07 ` Petr Mladek
0 siblings, 1 reply; 21+ messages in thread
From: Doug Anderson @ 2026-03-05 16:45 UTC (permalink / raw)
To: Petr Mladek
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@suse.com> wrote:
>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> > {
> > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > - return true;
> > + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > + if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
>
> This would return true for every check when missed >= 3.
> As a result, the hardlockup would be reported every 4s.
>
> I would keep the 12s cadence and change this to:
>
> if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
I could be confused, but I don't think this is needed because we clear
"hrtimer_interrupts_missed" to 0 any time we save the timer count.
While I believe the "%" will functionally work, it seems harder to
understand, at least to me.
> > + return true;
> > +
> > + return false;
> > + }
> >
> > /*
> > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > --- a/kernel/watchdog_buddy.c
> > +++ b/kernel/watchdog_buddy.c
> > @@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
> > {
> > unsigned int next_cpu;
> >
> > - /*
> > - * Test for hardlockups every 3 samples. The sample period is
> > - * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> > - * watchdog_thresh (over by 20%).
> > - */
> > - if (hrtimer_interrupts % 3 != 0)
> > - return;
>
> It would be symmetric with the "% 3" above.
Here we weren't resetting the count, so the "%" _was_ important. In
the new code where we're resetting the count back to 0...
-Doug
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-05 16:13 ` Doug Anderson
@ 2026-03-09 13:33 ` Petr Mladek
2026-03-11 2:51 ` Mayank Rungta
0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2026-03-09 13:33 UTC (permalink / raw)
To: Doug Anderson
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > > * watchdog_hardlockup_check() called and saves counter (1000)
> > > * timer runs and updates the timer (1000 -> 1001)
> > > * touch_nmi_watchdog() is called
> > > * CPU locks up
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and saves counter (1001)
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and notices touch
> >
> > Great visualization!
> >
> > Nit: It seems to be actually the other way around:
> >
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and notices touch and skips updating counters
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and saves counter (1001)
>
> Oops, right! :-) Mayank: it's probably worth adding some form of the
> (corrected) example here to the commit message. Also, you could
> mention in the commit message that you were seeing real problems
> because of the 8250 console prints with the general rule that if
> > someone asks a question during a review it's worth including that
> info in the next version of the commit message. ;-)
>
>
> > A better solution might be to separate the check and update/reset
> > of the values. Something like (on top of this patchset, just
> > compilation tested):
> >
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 30199eaeb5d7..4d0851f0f412 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > }
> >
> > -static bool is_hardlockup(unsigned int cpu)
> > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > {
> > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > - per_cpu(hrtimer_interrupts_missed, cpu)++;
> > - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > - return true;
> > -
> > - return false;
> > - }
> > -
> > /*
> > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > */
> > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > +}
> >
> > - return false;
> > +static bool is_hardlockup(unsigned int cpu)
> > +{
> > + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > +
> > + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > + return false;
> > +
> > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > + return false;
> > +
> > + return true;
> > }
> >
> > static void watchdog_hardlockup_kick(void)
> > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> >
> > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > {
> > - bool is_hl;
> > int hardlockup_all_cpu_backtrace;
> > - /*
> > - * Check for a hardlockup by making sure the CPU's timer
> > - * interrupt is incrementing. The timer interrupt should have
> > - * fired multiple times before we overflow'd. If it hasn't
> > - * then this is a good indication the cpu is stuck
> > - *
> > - * Purposely check this _before_ checking watchdog_hardlockup_touched
> > - * so we make sure we still update the saved value of the interrupts.
> > - * Without that we'll take an extra round through this function before
> > - * we can detect a lockup.
> > - */
> > -
> > - is_hl = is_hardlockup(cpu);
> >
> > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > + watchdog_hardlockup_update_reset(cpu);
> > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > return;
> > }
> > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > 1 : sysctl_hardlockup_all_cpu_backtrace;
> >
> > - if (is_hl) {
> > + /*
> > + * Check for a hardlockup by making sure the CPU's timer
> > + * interrupt is incrementing. The timer interrupt should have
> > + * fired multiple times before we overflow'd. If it hasn't
> > + * then this is a good indication the cpu is stuck
> > + */
> > + if (is_hardlockup(cpu)) {
> > unsigned int this_cpu = smp_processor_id();
> > unsigned long flags;
> >
> > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >
> > per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > } else {
> > + watchdog_hardlockup_update_reset(cpu);
> > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > }
> > }
>
> I haven't tested it, but that actually looks like a pretty nice final
> result to me. Mayank: What do you think? You'd have to figure out how
> to rework your two patches to incorporate Petr's ideas.
>
> Petr: Since you gave your ideas as a diff, what are you thinking in
> terms of tags on Mayank's v2? You didn't provide a Signed-off-by on
> your diff, so I guess you're expecting Mayank not to incorporate it
> directly but take it as a "suggestion" for improving his patches (AKA
> not add any of your tags to his v2).
I expected that Mayank could rework his patchset using ideas from the
diff. Feel free to use the changes as they are and copy&paste them
from my diff. It is just a refactoring.
> One nit: in the final result, it might be nice to invert the
> "is_hardlockup()" test so we can return early and get rid of a level
> of indentation. AKA:
>
> if (!is_hardlockup(cpu)) {
> watchdog_hardlockup_update_reset(cpu);
> per_cpu(watchdog_hardlockup_warned, cpu) = false;
> return;
> }
>
> Not only does it reduce indentation, but it also keeps the two calls
> to watchdog_hardlockup_update_reset() closer to each other.
Yeah, that would be great. I actually wanted to do it in my diff
as well. But I did not do it to keep the diff simple.
It might be better to invert the logic as a separate preparation
patch so that we do not hide other changes in the reshuffling.
Best Regards,
Petr
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-09 13:33 ` Petr Mladek
@ 2026-03-11 2:51 ` Mayank Rungta
2026-03-11 13:56 ` Petr Mladek
0 siblings, 1 reply; 21+ messages in thread
From: Mayank Rungta @ 2026-03-11 2:51 UTC (permalink / raw)
To: Petr Mladek
Cc: Doug Anderson, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Thanks, guys, for ironing out the details. I tried to implement the
ideas mentioned in this thread.
> On Mon, Mar 9, 2026 at 6:33 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > > * watchdog_hardlockup_check() called and saves counter (1000)
> > > > * timer runs and updates the timer (1000 -> 1001)
> > > > * touch_nmi_watchdog() is called
> > > > * CPU locks up
> > > > * 10 seconds pass
> > > > * watchdog_hardlockup_check() called and saves counter (1001)
> > > > * 10 seconds pass
> > > > * watchdog_hardlockup_check() called and notices touch
> > >
> > > Great visualization!
> > >
> > > Nit: It seems to be actually the other way around:
> > >
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and notices touch and skips updating counters
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and saves counter (1001)
> >
> > Oops, right! :-) Mayank: it's probably worth adding some form of the
> > (corrected) example here to the commit message. Also, you could
> > mention in the commit message that you were seeing real problems
> > because of the 8250 console prints with the general rule that if
> > someone asks a question during a review it's worth including that
> > info in the next version of the commit message. ;-)
> >
That's a good idea; sure, I will add this to the commit message for V2.
> >
> > > A better solution might be to separate the check and update/reset
> > > of the values. Something like (on top of this patchset, just
> > > compilation tested):
> > >
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index 30199eaeb5d7..4d0851f0f412 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > }
> > >
> > > -static bool is_hardlockup(unsigned int cpu)
> > > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > > {
> > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > >
> > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > - per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > > - return true;
> > > -
> > > - return false;
> > > - }
> > > -
> > > /*
> > > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > > */
> > > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > > +}
> > >
> > > - return false;
> > > +static bool is_hardlockup(unsigned int cpu)
> > > +{
> > > + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > +
> > > + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > > + return false;
> > > +
> > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > > + return false;
> > > +
> > > + return true;
> > > }
> > >
> > > static void watchdog_hardlockup_kick(void)
> > > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> > >
> > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > {
> > > - bool is_hl;
> > > int hardlockup_all_cpu_backtrace;
> > > - /*
> > > - * Check for a hardlockup by making sure the CPU's timer
> > > - * interrupt is incrementing. The timer interrupt should have
> > > - * fired multiple times before we overflow'd. If it hasn't
> > > - * then this is a good indication the cpu is stuck
> > > - *
> > > - * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > - * so we make sure we still update the saved value of the interrupts.
> > > - * Without that we'll take an extra round through this function before
> > > - * we can detect a lockup.
> > > - */
> > > -
> > > - is_hl = is_hardlockup(cpu);
> > >
> > > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > + watchdog_hardlockup_update_reset(cpu);
> > > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > > return;
> > > }
> > > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > > 1 : sysctl_hardlockup_all_cpu_backtrace;
> > >
> > > - if (is_hl) {
> > > + /*
> > > + * Check for a hardlockup by making sure the CPU's timer
> > > + * interrupt is incrementing. The timer interrupt should have
> > > + * fired multiple times before we overflow'd. If it hasn't
> > > + * then this is a good indication the cpu is stuck
> > > + */
> > > + if (is_hardlockup(cpu)) {
> > > unsigned int this_cpu = smp_processor_id();
> > > unsigned long flags;
> > >
> > > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > >
> > > per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > > } else {
> > > + watchdog_hardlockup_update_reset(cpu);
> > > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > > }
> > > }
> >
> > I haven't tested it, but that actually looks like a pretty nice final
> > result to me. Mayank: What do you think? You'd have to figure out how
> > to rework your two patches to incorporate Petr's ideas.
> >
Thanks for your suggestion; this is pretty close, but we cannot call
watchdog_hardlockup_update_reset(cpu) in a general else block. If we
did, the hrtimer_interrupts_missed count would be reset on every check
that isn't a hardlockup, even when no progress was actually made, and
we would never hit the watchdog threshold.
I’ve moved the update/reset inside the "progress detected" path within
is_hardlockup() instead:
static bool is_hardlockup(unsigned int cpu)
{
	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));

	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
		watchdog_hardlockup_update_reset(cpu);
		return false;
	}

	per_cpu(hrtimer_interrupts_missed, cpu)++;
	if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
		return false;

	return true;
}
> > Petr: Since you gave your ideas as a diff, what are you thinking in
> > terms of tags on Mayank's v2? You didn't provide a Signed-off-by on
> > your diff, so I guess you're expecting Mayank not to incorporate it
> > directly but take it as a "suggestion" for improving his patches (AKA
> > not add any of your tags to his v2).
>
> I expected that Mayank could rework his patchset using ideas from the
> diff. Feel free to use the changes as they are and copy&paste them
> from my diff. It is just a refactoring.
>
> > One nit: in the final result, it might be nice to invert the
> > "is_hardlockup()" test so we can return early and get rid of a level
> > of indentation. AKA:
> >
> > if (!is_hardlockup(cpu)) {
> > watchdog_hardlockup_update_reset(cpu);
> > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > return;
> > }
> >
> > Not only does it reduce indentation, but it also keeps the two calls
> > to watchdog_hardlockup_update_reset() closer to each other.
>
> Yeah, that would be great. I actually wanted to do it in my diff
> as well. But I did not do it to keep the diff simple.
>
> It might be better to invert the logic as a separate preparation
> patch so that we do not hide other changes in the reshuffling.
>
> Best Regards,
> Petr
I still went ahead and flipped the `is_hardlockup` test in the main
check function to keep the indentation clean.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
2026-03-11 2:51 ` Mayank Rungta
@ 2026-03-11 13:56 ` Petr Mladek
0 siblings, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2026-03-11 13:56 UTC (permalink / raw)
To: Mayank Rungta
Cc: Doug Anderson, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
On Tue 2026-03-10 19:51:21, Mayank Rungta wrote:
> > On Mon, Mar 9, 2026 at 6:33 AM Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> > > On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> > > > A better solution might be to separate the check and update/reset
> > > > of the values. Something like (on top of this patchset, just
> > > > compilation tested):
> > > >
> > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > > index 30199eaeb5d7..4d0851f0f412 100644
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > > per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > > }
> > > >
> > > > -static bool is_hardlockup(unsigned int cpu)
> > > > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > > > {
> > > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > >
> > > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > > - per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > > > - return true;
> > > > -
> > > > - return false;
> > > > - }
> > > > -
> > > > /*
> > > > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > > * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > > > */
> > > > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > > > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > > > +}
> > > >
> > > > - return false;
> > > > +static bool is_hardlockup(unsigned int cpu)
> > > > +{
> > > > + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > > +
> > > > + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > > > + return false;
> > > > +
> > > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > > > + return false;
> > > > +
> > > > + return true;
> > > > }
> > > >
> > > > static void watchdog_hardlockup_kick(void)
> > > > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> > > >
> > > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > {
> > > > - bool is_hl;
> > > > int hardlockup_all_cpu_backtrace;
> > > > - /*
> > > > - * Check for a hardlockup by making sure the CPU's timer
> > > > - * interrupt is incrementing. The timer interrupt should have
> > > > - * fired multiple times before we overflow'd. If it hasn't
> > > > - * then this is a good indication the cpu is stuck
> > > > - *
> > > > - * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > > - * so we make sure we still update the saved value of the interrupts.
> > > > - * Without that we'll take an extra round through this function before
> > > > - * we can detect a lockup.
> > > > - */
> > > > -
> > > > - is_hl = is_hardlockup(cpu);
> > > >
> > > > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > > + watchdog_hardlockup_update_reset(cpu);
> > > > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > > > return;
> > > > }
> > > > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > > > 1 : sysctl_hardlockup_all_cpu_backtrace;
> > > >
> > > > - if (is_hl) {
> > > > + /*
> > > > + * Check for a hardlockup by making sure the CPU's timer
> > > > + * interrupt is incrementing. The timer interrupt should have
> > > > + * fired multiple times before we overflow'd. If it hasn't
> > > > + * then this is a good indication the cpu is stuck
> > > > + */
> > > > + if (is_hardlockup(cpu)) {
> > > > unsigned int this_cpu = smp_processor_id();
> > > > unsigned long flags;
> > > >
> > > > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > >
> > > > per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > > > } else {
> > > > + watchdog_hardlockup_update_reset(cpu);
> > > > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > > > }
> > > > }
> > >
> > > I haven't tested it, but that actually looks like a pretty nice final
> > > result to me. Mayank: What do you think? You'd have to figure out how
> > > to rework your two patches to incorporate Petr's ideas.
> > >
>
> Thanks for your suggestion, this is pretty close, but we cannot call
> watchdog_hardlockup_update_reset(cpu) in a general else block. If we
> did, the hrtimer_interrupts_missed count would be reset on every check
> that isn't a hardlockup; even when no progress was actually made and
> we would never hit the watchdog_threshold.
Great catch!
> I’ve moved the update/reset inside the "progress detected" path within
> is_hardlockup() instead:
>
> static bool is_hardlockup(unsigned int cpu)
> {
> int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
> watchdog_hardlockup_update_reset(cpu);
> return false;
> }
>
> per_cpu(hrtimer_interrupts_missed, cpu)++;
> if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> return false;
>
> return true;
> }
Looks good to me.
> I still went ahead and flipped the `is_hardlockup` test in the main
> check function to keep the indentation clean
Great.
Best Regards,
Petr
* Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-03-05 16:45 ` Doug Anderson
@ 2026-03-11 14:07 ` Petr Mladek
2026-03-12 21:02 ` Doug Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2026-03-11 14:07 UTC (permalink / raw)
To: Doug Anderson
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
On Thu 2026-03-05 08:45:35, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> > > {
> > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > >
> > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > > - return true;
> > > + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > + if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> >
> > This would return true for every check when missed >= 3.
> > As a result, the hardlockup would be reported every 4s.
> >
> > I would keep the 12s cadence and change this to:
> >
> > if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
>
> I could be confused, but I don't think this is needed because we clear
> "hrtimer_interrupts_missed" to 0 any time we save the timer count.
> While I believe the "%" will functionally work, it seems harder to
> understand, at least to me.
My understanding is that we save the number of interrupts
and reset the missed counter only when:

+ the number of interrupts is different (the timer on the watched CPU fired)
+ the watchdog was touched (hiding a delay)

=> the missed counter is only incremented when the timer did not fire
(the hardlockup scenario).

In particular, it is _not_ reset when we report the hardlockup.

Or am I missing something?
Best Regards,
Petr
* Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
2026-03-11 14:07 ` Petr Mladek
@ 2026-03-12 21:02 ` Doug Anderson
0 siblings, 0 replies; 21+ messages in thread
From: Doug Anderson @ 2026-03-12 21:02 UTC (permalink / raw)
To: Petr Mladek
Cc: mrungta, Jonathan Corbet, Jinchao Wang, Yunhui Cui,
Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang,
Max Kellermann, Andrew Morton, linux-kernel, linux-doc
Hi,
On Wed, Mar 11, 2026 at 7:07 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2026-03-05 08:45:35, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> > > > {
> > > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > >
> > > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > > > - return true;
> > > > + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > + if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > >
> > > This would return true for every check when missed >= 3.
> > > As a result, the hardlockup would be reported every 4s.
> > >
> > > I would keep the 12s cadence and change this to:
> > >
> > > if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
> >
> > I could be confused, but I don't think this is needed because we clear
> > "hrtimer_interrupts_missed" to 0 any time we save the timer count.
> > While I believe the "%" will functionally work, it seems harder to
> > understand, at least to me.
>
> My understanding is that we save the number of interrupts
> and reset the missed counter only when:
>
> + the number of interrupts is different (the timer on the watched CPU fired)
> + the watchdog was touched (hiding the delay)
>
> => it is only incremented when the timer was not called
> (the hardlockup scenario).
>
> In particular, it is _not_ reset when we report the hardlockup.
>
> Or am I missing something?
Ah, I wasn't thinking about the "non-panic" case. You are correct, we
need the "%" syntax in order to handle that case.
-Doug
end of thread, other threads:[~2026-03-12 21:02 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
2026-03-05 0:58 ` Doug Anderson
2026-03-05 11:27 ` Petr Mladek
2026-03-05 16:13 ` Doug Anderson
2026-03-09 13:33 ` Petr Mladek
2026-03-11 2:51 ` Mayank Rungta
2026-03-11 13:56 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
2026-03-05 16:45 ` Doug Anderson
2026-03-11 14:07 ` Petr Mladek
2026-03-12 21:02 ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson