public inbox for linux-doc@vger.kernel.org
* [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation
@ 2026-03-12 23:22 Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

This series addresses limitations in the hardlockup detector implementations
and updates the documentation to reflect actual behavior and recent changes.

The changes are structured as follows:

Refactoring (Patch 1)
=====================
Patch 1 refactors watchdog_hardlockup_check() to return early if no
lockup is detected. This reduces the indentation level of the main
logic block, serving as a clean base for the subsequent changes.

Hardlockup Detection Improvements (Patches 2 & 4)
=================================================
The hardlockup detector logic relies on updating saved interrupt counts to
determine if the CPU is making progress.

Patch 2 ensures that the saved interrupt count is updated even when the
"touched" flag causes an early return. This prevents stale comparisons
that can delay detection and keeps the detector accurate even when the
watchdog is frequently touched.

Patch 4 improves the Buddy detector's timeliness. The current checking
interval (every 3rd sample) causes high variability in detection time (up
to 24s). This patch changes the Buddy detector to check at every hrtimer
interval (4s) with a missed-interrupt threshold of 3, narrowing the
detection window to a consistent 8-12 second range.

Documentation Updates (Patches 3 & 5)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.

Patch 3 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector. It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).

Patch 5 adds a dedicated section for the Buddy detector, which was previously
undocumented. It details the mechanism, the new timing logic, and known
limitations.

Signed-off-by: Mayank Rungta <mrungta@google.com>
---
Changes in v2:
- Added Patch 1 to refactor watchdog_hardlockup_check() by returning
  early (Suggested by Douglas Anderson)
- Introduced the `watchdog_hardlockup_update_reset()` API (Suggested by
  Petr Mladek)
- Shifted original v1 patches to Patches 2-5 and rebased them on top of
  the new refactoring.
- Link to v1: https://lore.kernel.org/r/20260212-hardlockup-watchdog-fixes-v1-0-745f1dce04c3@google.com

---
Mayank Rungta (5):
      watchdog: Return early in watchdog_hardlockup_check()
      watchdog: Update saved interrupts during check
      doc: watchdog: Clarify hardlockup detection timing
      watchdog/hardlockup: improve buddy system detection timeliness
      doc: watchdog: Document buddy detector

 Documentation/admin-guide/lockup-watchdogs.rst | 132 ++++++++++++++++++----
 include/linux/nmi.h                            |   1 +
 kernel/watchdog.c                              | 148 ++++++++++++++-----------
 kernel/watchdog_buddy.c                        |   9 +-
 4 files changed, 199 insertions(+), 91 deletions(-)
---
base-commit: b4f0dd314b39ea154f62f3bd3115ed0470f9f71e
change-id: 20260211-hardlockup-watchdog-fixes-60317598ac20

Best regards,
-- 
Mayank Rungta <mrungta@google.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check()
  2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
@ 2026-03-12 23:22 ` Mayank Rungta via B4 Relay
  2026-03-13 15:27   ` Doug Anderson
  2026-03-23 15:47   ` Petr Mladek
  2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

From: Mayank Rungta <mrungta@google.com>

Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()`
to return early when a hardlockup is not detected. This flattens the
main logic block, reducing the indentation level and making the code
easier to read and maintain.

This refactoring serves as a preparation patch for future hardlockup
changes.

Signed-off-by: Mayank Rungta <mrungta@google.com>
---
 kernel/watchdog.c | 117 +++++++++++++++++++++++++++---------------------------
 1 file changed, 59 insertions(+), 58 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7d675781bc91..4c5b47495745 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -187,6 +187,8 @@ static void watchdog_hardlockup_kick(void)
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
 	int hardlockup_all_cpu_backtrace;
+	unsigned int this_cpu;
+	unsigned long flags;
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
@@ -201,74 +203,73 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 * fired multiple times before we overflow'd. If it hasn't
 	 * then this is a good indication the cpu is stuck
 	 */
-	if (is_hardlockup(cpu)) {
-		unsigned int this_cpu = smp_processor_id();
-		unsigned long flags;
+	if (!is_hardlockup(cpu)) {
+		per_cpu(watchdog_hardlockup_warned, cpu) = false;
+		return;
+	}
 
 #ifdef CONFIG_SYSFS
-		++hardlockup_count;
+	++hardlockup_count;
 #endif
-		/*
-		 * A poorly behaving BPF scheduler can trigger hard lockup by
-		 * e.g. putting numerous affinitized tasks in a single queue and
-		 * directing all CPUs at it. The following call can return true
-		 * only once when sched_ext is enabled and will immediately
-		 * abort the BPF scheduler and print out a warning message.
-		 */
-		if (scx_hardlockup(cpu))
-			return;
+	/*
+	 * A poorly behaving BPF scheduler can trigger hard lockup by
+	 * e.g. putting numerous affinitized tasks in a single queue and
+	 * directing all CPUs at it. The following call can return true
+	 * only once when sched_ext is enabled and will immediately
+	 * abort the BPF scheduler and print out a warning message.
+	 */
+	if (scx_hardlockup(cpu))
+		return;
 
-		/* Only print hardlockups once. */
-		if (per_cpu(watchdog_hardlockup_warned, cpu))
-			return;
+	/* Only print hardlockups once. */
+	if (per_cpu(watchdog_hardlockup_warned, cpu))
+		return;
 
-		/*
-		 * Prevent multiple hard-lockup reports if one cpu is already
-		 * engaged in dumping all cpu back traces.
-		 */
-		if (hardlockup_all_cpu_backtrace) {
-			if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
-				return;
-		}
+	/*
+	 * Prevent multiple hard-lockup reports if one cpu is already
+	 * engaged in dumping all cpu back traces.
+	 */
+	if (hardlockup_all_cpu_backtrace) {
+		if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
+			return;
+	}
 
-		/*
-		 * NOTE: we call printk_cpu_sync_get_irqsave() after printing
-		 * the lockup message. While it would be nice to serialize
-		 * that printout, we really want to make sure that if some
-		 * other CPU somehow locked up while holding the lock associated
-		 * with printk_cpu_sync_get_irqsave() that we can still at least
-		 * get the message about the lockup out.
-		 */
-		pr_emerg("CPU%u: Watchdog detected hard LOCKUP on cpu %u\n", this_cpu, cpu);
-		printk_cpu_sync_get_irqsave(flags);
+	/*
+	 * NOTE: we call printk_cpu_sync_get_irqsave() after printing
+	 * the lockup message. While it would be nice to serialize
+	 * that printout, we really want to make sure that if some
+	 * other CPU somehow locked up while holding the lock associated
+	 * with printk_cpu_sync_get_irqsave() that we can still at least
+	 * get the message about the lockup out.
+	 */
+	this_cpu = smp_processor_id();
+	pr_emerg("CPU%u: Watchdog detected hard LOCKUP on cpu %u\n", this_cpu, cpu);
+	printk_cpu_sync_get_irqsave(flags);
 
-		print_modules();
-		print_irqtrace_events(current);
-		if (cpu == this_cpu) {
-			if (regs)
-				show_regs(regs);
-			else
-				dump_stack();
-			printk_cpu_sync_put_irqrestore(flags);
-		} else {
-			printk_cpu_sync_put_irqrestore(flags);
-			trigger_single_cpu_backtrace(cpu);
-		}
+	print_modules();
+	print_irqtrace_events(current);
+	if (cpu == this_cpu) {
+		if (regs)
+			show_regs(regs);
+		else
+			dump_stack();
+		printk_cpu_sync_put_irqrestore(flags);
+	} else {
+		printk_cpu_sync_put_irqrestore(flags);
+		trigger_single_cpu_backtrace(cpu);
+	}
 
-		if (hardlockup_all_cpu_backtrace) {
-			trigger_allbutcpu_cpu_backtrace(cpu);
-			if (!hardlockup_panic)
-				clear_bit_unlock(0, &hard_lockup_nmi_warn);
-		}
+	if (hardlockup_all_cpu_backtrace) {
+		trigger_allbutcpu_cpu_backtrace(cpu);
+		if (!hardlockup_panic)
+			clear_bit_unlock(0, &hard_lockup_nmi_warn);
+	}
 
-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
-		if (hardlockup_panic)
-			nmi_panic(regs, "Hard LOCKUP");
+	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+	if (hardlockup_panic)
+		nmi_panic(regs, "Hard LOCKUP");
 
-		per_cpu(watchdog_hardlockup_warned, cpu) = true;
-	} else {
-		per_cpu(watchdog_hardlockup_warned, cpu) = false;
-	}
+	per_cpu(watchdog_hardlockup_warned, cpu) = true;
 }
 
 #else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */

-- 
2.53.0.851.ga537e3e6e9-goog




* [PATCH v2 2/5] watchdog: Update saved interrupts during check
  2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
@ 2026-03-12 23:22 ` Mayank Rungta via B4 Relay
  2026-03-13 15:27   ` Doug Anderson
  2026-03-23 15:58   ` Petr Mladek
  2026-03-12 23:22 ` [PATCH v2 3/5] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

From: Mayank Rungta <mrungta@google.com>

Currently, arch_touch_nmi_watchdog() causes an early return that
skips updating hrtimer_interrupts_saved. This leads to stale
comparisons and delayed lockup detection.

I found this issue because in our system the serial console is fairly
chatty. For example, the 8250 console driver frequently calls
touch_nmi_watchdog() via console_write(). If a CPU locks up after a
timer interrupt but before the next watchdog check, we see the following
sequence:

  * watchdog_hardlockup_check() saves counter (e.g., 1000)
  * Timer runs and updates the counter (1001)
  * touch_nmi_watchdog() is called
  * CPU locks up
  * 10s pass: check() notices touch, returns early, skips update
  * 10s pass: check() saves counter (1001)
  * 10s pass: check() finally detects lockup

This delays detection to 30 seconds. With this fix, we detect the
lockup in 20 seconds.
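The sequence above can be replayed with a minimal user-space model of the
check logic (variable and function names are illustrative, not the kernel
API):

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model of the detector state (names mirror the kernel's
 * per-CPU variables, but this is plain user-space code). */
static int hrtimer_interrupts;        /* bumped by each timer tick */
static int hrtimer_interrupts_saved;  /* snapshot from the last check */
static bool touched;                  /* set by touch_nmi_watchdog() */

/* Old behavior: a touch returns early WITHOUT refreshing the snapshot. */
static bool check_old(void)
{
	if (touched) {
		touched = false;
		return false;  /* stale snapshot survives */
	}
	if (hrtimer_interrupts_saved == hrtimer_interrupts)
		return true;   /* no progress: lockup */
	hrtimer_interrupts_saved = hrtimer_interrupts;
	return false;
}

/* Fixed behavior: refresh the snapshot even on a touch. */
static bool check_new(void)
{
	if (touched) {
		hrtimer_interrupts_saved = hrtimer_interrupts;
		touched = false;
		return false;
	}
	if (hrtimer_interrupts_saved == hrtimer_interrupts)
		return true;
	hrtimer_interrupts_saved = hrtimer_interrupts;
	return false;
}

/* Replay the sequence from above: snapshot taken at 1000, one more
 * timer tick (1001), a touch, then the CPU locks up so the counter
 * never moves again.  Returns how many 10s checks elapse before the
 * lockup is declared. */
static int checks_until_detect(bool (*check)(void))
{
	hrtimer_interrupts = 1001;
	hrtimer_interrupts_saved = 1000;
	touched = true;

	for (int n = 1; n <= 10; n++)
		if (check())
			return n;
	return -1;
}
```

With 10-second checks, the old logic needs three checks (~30s) because the
stale snapshot forces one extra reset cycle, while the fixed logic detects
on the second check (~20s).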

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
 kernel/watchdog.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4c5b47495745..431c540bd035 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -159,21 +159,28 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }
 
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
-		return true;
-
 	/*
 	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
 	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
 	 * written/read by a single CPU.
 	 */
 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+}
+
+static bool is_hardlockup(unsigned int cpu)
+{
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
+		watchdog_hardlockup_update(cpu);
+		return false;
+	}
 
-	return false;
+	return true;
 }
 
 static void watchdog_hardlockup_kick(void)
@@ -191,6 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	unsigned long flags;
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+		watchdog_hardlockup_update(cpu);
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
 	}

-- 
2.53.0.851.ga537e3e6e9-goog




* [PATCH v2 3/5] doc: watchdog: Clarify hardlockup detection timing
  2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
@ 2026-03-12 23:22 ` Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
  2026-03-12 23:22 ` [PATCH v2 5/5] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
  4 siblings, 0 replies; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

From: Mayank Rungta <mrungta@google.com>

The current documentation implies that a hardlockup is strictly defined
as looping for "more than 10 seconds." However, the detection mechanism
is periodic (based on `watchdog_thresh`), meaning detection time varies
significantly depending on when the lockup occurs relative to the NMI
perf event.

Update the definition to remove the strict "more than 10 seconds"
constraint in the introduction and defer details to the Implementation
section.

Additionally, add a "Detection Overhead" section illustrating the
Best Case (~6s) and Worst Case (~20s) detection scenarios to provide
administrators with a clearer understanding of the watchdog's
latency.
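
The variability can be captured in a rough user-space model (the check and
heartbeat times below are taken from the scenarios in the patch, not from
the kernel itself):

```c
#include <assert.h>
#include <math.h>

/* Detection latency for a periodic detector: NMI checks fire every
 * `thresh` seconds starting at `check0`; hrtimer heartbeats fire every
 * `period` seconds starting at `hb0` until the lockup at `t_lock`.
 * The lockup is declared at the first check whose window since the
 * previous check contained no heartbeat. */
static double detect_latency(double check0, double thresh,
			     double hb0, double period, double t_lock)
{
	double prev = check0;

	for (double c = check0 + thresh; ; c += thresh) {
		int beat = 0;

		/* did any heartbeat land in the window (prev, c]? */
		for (double h = hb0; h < t_lock; h += period)
			if (h > prev && h <= c)
				beat = 1;
		if (!beat)
			return c - t_lock;  /* lockup declared at time c */
		prev = c;
	}
}
```

Plugging in the best-case numbers (check at 100.1, heartbeat at 100.0,
lockup at 103.9) gives ~6.2s; the worst-case numbers (check at 100.0,
heartbeat at 100.1, lockup at 100.2) give ~19.8s, matching the ~6s and
~20s figures above.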

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
 Documentation/admin-guide/lockup-watchdogs.rst | 41 +++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 3e09284a8b9b..1b374053771f 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -16,7 +16,7 @@ details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
 provided for this.
 
 A 'hardlockup' is defined as a bug that causes the CPU to loop in
-kernel mode for more than 10 seconds (see "Implementation" below for
+kernel mode for several seconds (see "Implementation" below for
 details), without letting other interrupts have a chance to run.
 Similarly to the softlockup case, the current stack trace is displayed
 upon detection and the system will stay locked up unless the default
@@ -64,6 +64,45 @@ administrators to configure the period of the hrtimer and the perf
 event. The right value for a particular environment is a trade-off
 between fast response to lockups and detection overhead.
 
+Detection Overhead
+------------------
+
+The hardlockup detector checks for lockups using a periodic NMI perf
+event. This means the time to detect a lockup can vary depending on
+when the lockup occurs relative to the NMI check window.
+
+**Best Case:**
+In the best case scenario, the lockup occurs just before the first
+heartbeat is due. The detector will notice the missing hrtimer
+interrupt almost immediately during the next check.
+
+::
+
+  Time 100.0: cpu 1 heartbeat
+  Time 100.1: hardlockup_check, cpu1 stores its state
+  Time 103.9: Hard Lockup on cpu1
+  Time 104.0: cpu 1 heartbeat never comes
+  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+  Time to detection: ~6 seconds
+
+**Worst Case:**
+In the worst case scenario, the lockup occurs shortly after a valid
+interrupt (heartbeat) which itself happened just after the NMI check.
+The next NMI check sees that the interrupt count has changed (due to
+that one heartbeat), assumes the CPU is healthy, and resets the
+baseline. The lockup is only detected at the subsequent check.
+
+::
+
+  Time 100.0: hardlockup_check, cpu1 stores its state
+  Time 100.1: cpu 1 heartbeat
+  Time 100.2: Hard Lockup on cpu1
+  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+  Time to detection: ~20 seconds
+
 By default, the watchdog runs on all online cores.  However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only
 on the housekeeping cores, not the cores specified in the "nohz_full"

-- 
2.53.0.851.ga537e3e6e9-goog




* [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness
  2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
                   ` (2 preceding siblings ...)
  2026-03-12 23:22 ` [PATCH v2 3/5] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
@ 2026-03-12 23:22 ` Mayank Rungta via B4 Relay
  2026-03-23 16:26   ` Petr Mladek
  2026-03-12 23:22 ` [PATCH v2 5/5] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
  4 siblings, 1 reply; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

From: Mayank Rungta <mrungta@google.com>

Currently, the buddy system only performs checks every 3rd sample, with
a 4-second interval between samples. If a check window is missed, the
next check occurs 12 seconds later, potentially delaying hard lockup
detection for up to 24 seconds.

Modify the buddy system to perform checks at every interval (4s).
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.

Best and worst case detection scenarios:

Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
  interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
  Detected in ~24s (missed check + 12s till next check + 12s logic).

After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
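
The per-interval check with a missed-interrupt threshold can be sketched
in isolation (a user-space model mirroring the modulo test in the patch;
names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

#define MISS_THRESH 3  /* consecutive stale samples before declaring lockup */

static int saved;   /* buddy's interrupt count seen at the last reset */
static int missed;  /* consecutive checks that saw no progress */

/* Called every hrtimer interval (4s with watchdog_thresh = 10).
 * Returns true once the buddy CPU has made no progress for
 * MISS_THRESH consecutive checks. */
static bool buddy_check(int hrtimer_interrupts)
{
	if (saved != hrtimer_interrupts) {
		saved = hrtimer_interrupts;
		missed = 0;
		return false;  /* progress: reset the baseline */
	}
	return (++missed % MISS_THRESH) == 0;
}
```

Three stale 4-second samples are needed before the lockup is declared, so
detection lands 8-12 seconds after the lockup depending on where it falls
in the check cycle, as in the scenarios above.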

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
 include/linux/nmi.h     |  1 +
 kernel/watchdog.c       | 19 ++++++++++++++++---
 kernel/watchdog_buddy.c |  9 +--------
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 207156f2143c..bc1162895f35 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -21,6 +21,7 @@ void lockup_detector_soft_poweroff(void);
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
+extern int watchdog_hardlockup_miss_thresh;
 
 extern struct cpumask watchdog_cpumask;
 extern unsigned long *watchdog_cpumask_bits;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 431c540bd035..87dd5e0f6968 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -60,6 +60,13 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
 # endif /* CONFIG_SMP */
 
+/*
+ * Number of consecutive missed interrupts before declaring a lockup.
+ * Default to 1 (immediate) for NMI/Perf. Buddy will overwrite this to 3.
+ */
+int __read_mostly watchdog_hardlockup_miss_thresh = 1;
+EXPORT_SYMBOL_GPL(watchdog_hardlockup_miss_thresh);
+
 /*
  * Should we panic when a soft-lockup or hard-lockup occurs:
  */
@@ -137,6 +144,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
 static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(int, hrtimer_interrupts_missed);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
 static unsigned long hard_lockup_nmi_warn;
@@ -159,7 +167,7 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }
 
-static void watchdog_hardlockup_update(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
@@ -169,6 +177,7 @@ static void watchdog_hardlockup_update(unsigned int cpu)
 	 * written/read by a single CPU.
 	 */
 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
 }
 
 static bool is_hardlockup(unsigned int cpu)
@@ -176,10 +185,14 @@ static bool is_hardlockup(unsigned int cpu)
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
 	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
-		watchdog_hardlockup_update(cpu);
+		watchdog_hardlockup_update_reset(cpu);
 		return false;
 	}
 
+	per_cpu(hrtimer_interrupts_missed, cpu)++;
+	if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh)
+		return false;
+
 	return true;
 }
 
@@ -198,7 +211,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	unsigned long flags;
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
-		watchdog_hardlockup_update(cpu);
+		watchdog_hardlockup_update_reset(cpu);
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
 	}
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index ee754d767c21..3a1e57080c1c 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -21,6 +21,7 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
 
 int __init watchdog_hardlockup_probe(void)
 {
+	watchdog_hardlockup_miss_thresh = 3;
 	return 0;
 }
 
@@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 {
 	unsigned int next_cpu;
 
-	/*
-	 * Test for hardlockups every 3 samples. The sample period is
-	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
-	 *  watchdog_thresh (over by 20%).
-	 */
-	if (hrtimer_interrupts % 3 != 0)
-		return;
-
 	/* check for a hardlockup on the next CPU */
 	next_cpu = watchdog_next_cpu(smp_processor_id());
 	if (next_cpu >= nr_cpu_ids)

-- 
2.53.0.851.ga537e3e6e9-goog




* [PATCH v2 5/5] doc: watchdog: Document buddy detector
  2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
                   ` (3 preceding siblings ...)
  2026-03-12 23:22 ` [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
@ 2026-03-12 23:22 ` Mayank Rungta via B4 Relay
  2026-03-23 17:26   ` Petr Mladek
  4 siblings, 1 reply; 13+ messages in thread
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

From: Mayank Rungta <mrungta@google.com>

The current documentation generalizes the hardlockup detector as primarily
NMI-perf-based and lacks details on the SMP "Buddy" detector.

Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to explicitly
separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
"Hardlockup Detector (Buddy)".

Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mayank Rungta <mrungta@google.com>
---
 Documentation/admin-guide/lockup-watchdogs.rst | 149 +++++++++++++++++--------
 1 file changed, 101 insertions(+), 48 deletions(-)

diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 1b374053771f..7ae7ce3abd2c 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
 to cause the system to reboot automatically after a specified amount
 of time.
 
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
 Implementation
 ==============
 
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
 
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
 
 The watchdog job runs in a stop scheduling thread that updates a
 timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
 will call panic if it was instructed to do so or resume execution of
 other kernel code.
 
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
+
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
+
+Hardlockup Detector (NMI/Perf)
+------------------------------
+
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
+
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
+
+**Detection Overhead (NMI):**
+
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
+
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.
+
+  ::
+
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+    Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
 
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+    Time to detection: ~20 seconds
 
-Detection Overhead
-------------------
+Hardlockup Detector (Buddy)
+---------------------------
 
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
 
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
 
-::
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
 
-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (Buddy):**
 
-  Time to detection: ~6 seconds
+With a default check interval of 4 seconds (watchdog_thresh = 10):
 
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best case:** Lockup occurs just before a check.
+  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
 
-::
+**Limitations of the Buddy Detector:**
 
-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+1.  **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+    detector cannot detect the condition because the monitoring CPUs
+    are also frozen.
+2.  **Stack Traces:** Unlike the NMI detector, the buddy detector
+    cannot directly interrupt the locked CPU to grab a stack trace.
+    It relies on architecture-specific mechanisms (like NMI backtrace
+    support) to try and retrieve the status of the locked CPU. If
+    such support is missing, the log may only show that a lockup
+    occurred without providing the locked CPU's stack.
 
-  Time to detection: ~20 seconds
+Watchdog Core Exclusion
+=======================
 
 By default, the watchdog runs on all online cores.  However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only

-- 
2.53.0.851.ga537e3e6e9-goog



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check()
  2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
@ 2026-03-13 15:27   ` Doug Anderson
  2026-03-23 15:47   ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Doug Anderson @ 2026-03-13 15:27 UTC (permalink / raw)
  To: mrungta
  Cc: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

Hi,

On Thu, Mar 12, 2026 at 4:22 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> From: Mayank Rungta <mrungta@google.com>
>
> Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()`
> to return early when a hardlockup is not detected. This flattens the
> main logic block, reducing the indentation level and making the code
> easier to read and maintain.
>
> This refactoring serves as a preparation patch for future hardlockup
> changes.
>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
> ---
>  kernel/watchdog.c | 117 +++++++++++++++++++++++++++---------------------------
>  1 file changed, 59 insertions(+), 58 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 7d675781bc91..4c5b47495745 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -187,6 +187,8 @@ static void watchdog_hardlockup_kick(void)
>  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
>         int hardlockup_all_cpu_backtrace;
> +       unsigned int this_cpu;
> +       unsigned long flags;
>
>         if (per_cpu(watchdog_hardlockup_touched, cpu)) {
>                 per_cpu(watchdog_hardlockup_touched, cpu) = false;
> @@ -201,74 +203,73 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>          * fired multiple times before we overflow'd. If it hasn't
>          * then this is a good indication the cpu is stuck
>          */
> -       if (is_hardlockup(cpu)) {
> -               unsigned int this_cpu = smp_processor_id();
> -               unsigned long flags;
> +       if (!is_hardlockup(cpu)) {
> +               per_cpu(watchdog_hardlockup_warned, cpu) = false;
> +               return;
> +       }

IMO not worth spinning for, but potentially the
"hardlockup_all_cpu_backtrace" assignment could be moved down below
the new "if" test, since it's not needed if we "early out".
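
Sketched out, that suggestion would look something like this (a rough
fragment, not the actual patch; surrounding code elided):

```c
void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
	/* ... touched-flag handling ... */

	if (!is_hardlockup(cpu)) {
		per_cpu(watchdog_hardlockup_warned, cpu) = false;
		return;
	}

	/* Only needed on the lockup path, so assign after the early out. */
	hardlockup_all_cpu_backtrace = sysctl_hardlockup_all_cpu_backtrace;

	/* ... warning / panic handling ... */
}
```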

In any case:

Reviewed-by: Douglas Anderson <dianders@chromium.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/5] watchdog: Update saved interrupts during check
  2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
@ 2026-03-13 15:27   ` Doug Anderson
  2026-03-23 15:58   ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Doug Anderson @ 2026-03-13 15:27 UTC (permalink / raw)
  To: mrungta
  Cc: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

Hi,

On Thu, Mar 12, 2026 at 4:22 PM Mayank Rungta via B4 Relay
<devnull+mrungta.google.com@kernel.org> wrote:
>
> From: Mayank Rungta <mrungta@google.com>
>
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
>
> I found this issue because in our system the serial console is fairly
> chatty. For example, the 8250 console driver frequently calls
> touch_nmi_watchdog() via console_write(). If a CPU locks up after a
> timer interrupt but before next watchdog check, we see the following
> sequence:
>
>   * watchdog_hardlockup_check() saves counter (e.g., 1000)
>   * Timer runs and updates the counter (1001)
>   * touch_nmi_watchdog() is called
>   * CPU locks up
>   * 10s pass: check() notices touch, returns early, skips update
>   * 10s pass: check() saves counter (1001)
>   * 10s pass: check() finally detects lockup
>
> This delays detection to 30 seconds. With this fix, we detect the
> lockup in 20 seconds.
>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Mayank Rungta <mrungta@google.com>
> ---
>  kernel/watchdog.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 4c5b47495745..431c540bd035 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -159,21 +159,28 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
>         per_cpu(watchdog_hardlockup_touched, cpu) = true;
>  }
>
> -static bool is_hardlockup(unsigned int cpu)
> +static void watchdog_hardlockup_update(unsigned int cpu)
>  {
>         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> -       if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> -               return true;
> -
>         /*
>          * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
>          * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
>          * written/read by a single CPU.
>          */
>         per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> +}
> +
> +static bool is_hardlockup(unsigned int cpu)
> +{
> +       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> +
> +       if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
> +               watchdog_hardlockup_update(cpu);
> +               return false;
> +       }
>
> -       return false;
> +       return true;
>  }
>
>  static void watchdog_hardlockup_kick(void)
> @@ -191,6 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>         unsigned long flags;
>
>         if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> +               watchdog_hardlockup_update(cpu);

In the new solution, we read `hrtimer_interrupts` twice instead of
once. That means that (potentially) those two reads could give us back
different values. I spent time thinking about whether this is a
problem, and I don't think it is.

The first time we read `hrtimer_interrupts`, we only care about
whether the value is the same as the saved value. If it is the same,
we won't read `hrtimer_interrupts` again anyway. If it isn't the same,
then we will read it again. ...but that's OK. All we cared about was
whether it was the same as the (old) saved value. The second time we
read `hrtimer_interrupts` it could only have become more different (by
getting incremented again).

That's a longwinded way of saying:

Reviewed-by: Douglas Anderson <dianders@chromium.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check()
  2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
  2026-03-13 15:27   ` Doug Anderson
@ 2026-03-23 15:47   ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2026-03-23 15:47 UTC (permalink / raw)
  To: mrungta
  Cc: Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers, Li Huafei,
	Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

On Thu 2026-03-12 16:22:02, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()`
> to return early when a hardlockup is not detected. This flattens the
> main logic block, reducing the indentation level and making the code
> easier to read and maintain.
> 
> This refactoring serves as a preparation patch for future hardlockup
> changes.
> 
> Signed-off-by: Mayank Rungta <mrungta@google.com>

LGTM:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/5] watchdog: Update saved interrupts during check
  2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
  2026-03-13 15:27   ` Doug Anderson
@ 2026-03-23 15:58   ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2026-03-23 15:58 UTC (permalink / raw)
  To: mrungta
  Cc: Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers, Li Huafei,
	Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

On Thu 2026-03-12 16:22:03, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
> 
> I found this issue because in our system the serial console is fairly
> chatty. For example, the 8250 console driver frequently calls
> touch_nmi_watchdog() via console_write(). If a CPU locks up after a
> timer interrupt but before next watchdog check, we see the following
> sequence:
> 
>   * watchdog_hardlockup_check() saves counter (e.g., 1000)
>   * Timer runs and updates the counter (1001)
>   * touch_nmi_watchdog() is called
>   * CPU locks up
>   * 10s pass: check() notices touch, returns early, skips update
>   * 10s pass: check() saves counter (1001)
>   * 10s pass: check() finally detects lockup
> 
> This delays detection to 30 seconds. With this fix, we detect the
> lockup in 20 seconds.
> 
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Mayank Rungta <mrungta@google.com>

I agree with Doug's analysis and it looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness
  2026-03-12 23:22 ` [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
@ 2026-03-23 16:26   ` Petr Mladek
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2026-03-23 16:26 UTC (permalink / raw)
  To: mrungta
  Cc: Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers, Li Huafei,
	Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

On Thu 2026-03-12 16:22:05, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Currently, the buddy system only performs checks every 3rd sample with
> a 4-second interval. If a check window is missed, the next check occurs
> 12 seconds later, potentially delaying hard lockup detection for up to
> 24 seconds.
> 
> Modify the buddy system to perform checks at every interval (4s).
> Introduce a missed-interrupt threshold to maintain the existing grace
> period while reducing the detection window to 8-12 seconds.
> 
> Best and worst case detection scenarios:
> 
> Before (12s check window):
> - Best case: Lockup occurs after first check but just before heartbeat
>   interval. Detected in ~8s (8s till next check).
> - Worst case: Lockup occurs just after a check.
>   Detected in ~24s (missed check + 12s till next check + 12s logic).
> 
> After (4s check window with threshold of 3):
> - Best case: Lockup occurs just before a check.
>   Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
> - Worst case: Lockup occurs just after a check.
>   Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
> 
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Mayank Rungta <mrungta@google.com>

LGTM:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 5/5] doc: watchdog: Document buddy detector
  2026-03-12 23:22 ` [PATCH v2 5/5] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
@ 2026-03-23 17:26   ` Petr Mladek
  2026-03-23 22:45     ` Doug Anderson
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2026-03-23 17:26 UTC (permalink / raw)
  To: mrungta
  Cc: Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers, Li Huafei,
	Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

On Thu 2026-03-12 16:22:06, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> The current documentation generalizes the hardlockup detector as primarily
> NMI-perf-based and lacks details on the SMP "Buddy" detector.
> 
> Update the documentation to add a detailed description of the Buddy
> detector, and also restructure the "Implementation" section to explicitly
> separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
> "Hardlockup Detector (Buddy)".
> 
> Clarify that the softlockup hrtimer acts as the heartbeat generator for
> both hardlockup mechanisms and centralize the configuration details in a
> "Frequency and Heartbeats" section.

This is a great step forward. See a few nits below:

> --- a/Documentation/admin-guide/lockup-watchdogs.rst
> +++ b/Documentation/admin-guide/lockup-watchdogs.rst
> @@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
>  to cause the system to reboot automatically after a specified amount
>  of time.
>  
> +Configuration
> +=============
> +
> +A kernel knob is provided that allows administrators to configure
> +this period. The "watchdog_thresh" parameter (default 10 seconds)
> +controls the threshold. The right value for a particular environment
> +is a trade-off between fast response to lockups and detection overhead.
> +
>  Implementation
>  ==============
>  
> -The soft and hard lockup detectors are built on top of the hrtimer and
> -perf subsystems, respectively. A direct consequence of this is that,
> -in principle, they should work in any architecture where these
> -subsystems are present.
> +The soft lockup detector is built on top of the hrtimer subsystem.
> +The hard lockup detector is built on top of the perf subsystem
> +(on architectures that support it) or uses an SMP "buddy" system.

This looks like a too big simplification. In fact, the hrtimer is
the core of all these detectors. The buddy detector uses only
the hrtimer. Also it would be nice to mention the scheduled
job used by the softlockup detector.

See below for a proposal.

> -A periodic hrtimer runs to generate interrupts and kick the watchdog
> -job. An NMI perf event is generated every "watchdog_thresh"
> -(compile-time initialized to 10 and configurable through sysctl of the
> -same name) seconds to check for hardlockups. If any CPU in the system
> -does not receive any hrtimer interrupt during that time the
> -'hardlockup detector' (the handler for the NMI perf event) will
> -generate a kernel warning or call panic, depending on the
> -configuration.
> +Softlockup Detector
> +-------------------
>  
>  The watchdog job runs in a stop scheduling thread that updates a
>  timestamp every time it is scheduled. If that timestamp is not updated
> @@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
>  will call panic if it was instructed to do so or resume execution of
>  other kernel code.
>  
> -The period of the hrtimer is 2*watchdog_thresh/5, which means it has
> -two or three chances to generate an interrupt before the hardlockup
> -detector kicks in.
> +Frequency and Heartbeats
> +------------------------
> +
> +The hrtimer used by the softlockup detector serves a dual purpose:
> +it detects softlockups, and it also generates the interrupts
> +(heartbeats) that the hardlockup detectors use to verify CPU liveness.
> +
> +The period of this hrtimer is 2*watchdog_thresh/5. This means the
> +hrtimer has two or three chances to generate an interrupt before the
> +NMI hardlockup detector kicks in.

As I said, the hrtimer is the core of all detectors. I would explain
this first.

I propose the following changes on top of this one:

From f1cfdc330cfbc68568dfe6bf2513bde9373c89d7 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Mon, 23 Mar 2026 18:21:38 +0100
Subject: [PATCH] doc: watchdog: Further improvements

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 .../admin-guide/lockup-watchdogs.rst          | 44 ++++++++++---------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 7ae7ce3abd2c..d0773edf3396 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -41,31 +41,35 @@ is a trade-off between fast response to lockups and detection overhead.
 Implementation
 ==============
 
-The soft lockup detector is built on top of the hrtimer subsystem.
-The hard lockup detector is built on top of the perf subsystem
-(on architectures that support it) or uses an SMP "buddy" system.
-
-Softlockup Detector
--------------------
-
-The watchdog job runs in a stop scheduling thread that updates a
-timestamp every time it is scheduled. If that timestamp is not updated
-for 2*watchdog_thresh seconds (the softlockup threshold) the
-'softlockup detector' (coded inside the hrtimer callback function)
-will dump useful debug information to the system log, after which it
-will call panic if it was instructed to do so or resume execution of
-other kernel code.
+The soft and hard lockup detectors are built around a hrtimer.
+In addition, the softlockup detector regularly schedules a job, and
+the hard lockup detector might use Perf/NMI events on architectures
+that support it.
 
 Frequency and Heartbeats
 ------------------------
 
-The hrtimer used by the softlockup detector serves a dual purpose:
-it detects softlockups, and it also generates the interrupts
-(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+The core of the detectors is a hrtimer. It serves multiple purposes:
 
-The period of this hrtimer is 2*watchdog_thresh/5. This means the
-hrtimer has two or three chances to generate an interrupt before the
-NMI hardlockup detector kicks in.
+- schedules watchdog job for the softlockup detector
+- bumps the interrupt counter for hardlockup detectors (heartbeat)
+- detects softlockups
+- detects hardlockups in Buddy mode
+
+The period of this hrtimer is 2*watchdog_thresh/5, which is 4 seconds
+by default. The hrtimer has two or three chances to generate an interrupt
+(heartbeat) before the hardlockup detector kicks in.
+
+Softlockup Detector
+-------------------
+
+The watchdog job is scheduled by the hrtimer and runs in a stop scheduling
+thread. It updates a timestamp every time it is scheduled. If that timestamp
+is not updated for 2*watchdog_thresh seconds (the softlockup threshold) the
+'softlockup detector' (coded inside the hrtimer callback function)
+will dump useful debug information to the system log, after which it
+will call panic if it was instructed to do so or resume execution of
+other kernel code.
 
 Hardlockup Detector (NMI/Perf)
 ------------------------------
-- 
2.53.0


Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 5/5] doc: watchdog: Document buddy detector
  2026-03-23 17:26   ` Petr Mladek
@ 2026-03-23 22:45     ` Doug Anderson
  0 siblings, 0 replies; 13+ messages in thread
From: Doug Anderson @ 2026-03-23 22:45 UTC (permalink / raw)
  To: Petr Mladek
  Cc: mrungta, Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers,
	Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Andrew Morton, Florian Delizy, Shuah Khan, linux-kernel,
	linux-doc

Hi,

On Mon, Mar 23, 2026 at 10:26 AM Petr Mladek <pmladek@suse.com> wrote:
>
> From f1cfdc330cfbc68568dfe6bf2513bde9373c89d7 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Mon, 23 Mar 2026 18:21:38 +0100
> Subject: [PATCH] doc: watchdog: Futher improvements
>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  .../admin-guide/lockup-watchdogs.rst          | 44 ++++++++++---------
>  1 file changed, 24 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
> index 7ae7ce3abd2c..d0773edf3396 100644
> --- a/Documentation/admin-guide/lockup-watchdogs.rst
> +++ b/Documentation/admin-guide/lockup-watchdogs.rst
> @@ -41,31 +41,35 @@ is a trade-off between fast response to lockups and detection overhead.
>  Implementation
>  ==============
>
> -The soft lockup detector is built on top of the hrtimer subsystem.
> -The hard lockup detector is built on top of the perf subsystem
> -(on architectures that support it) or uses an SMP "buddy" system.
> -
> -Softlockup Detector
> --------------------
> -
> -The watchdog job runs in a stop scheduling thread that updates a
> -timestamp every time it is scheduled. If that timestamp is not updated
> -for 2*watchdog_thresh seconds (the softlockup threshold) the
> -'softlockup detector' (coded inside the hrtimer callback function)
> -will dump useful debug information to the system log, after which it
> -will call panic if it was instructed to do so or resume execution of
> -other kernel code.
> +The soft and hard lockup detectors are built around a hrtimer.
> +In addition, the softlockup detector regularly schedules a job, and
> +the hard lockup detector might use Perf/NMI events on architectures
> +that support it.
>
>  Frequency and Heartbeats
>  ------------------------
>
> -The hrtimer used by the softlockup detector serves a dual purpose:
> -it detects softlockups, and it also generates the interrupts
> -(heartbeats) that the hardlockup detectors use to verify CPU liveness.
> +The core of the detectors is a hrtimer. It serves multiple purposes:
>
> -The period of this hrtimer is 2*watchdog_thresh/5. This means the
> -hrtimer has two or three chances to generate an interrupt before the
> -NMI hardlockup detector kicks in.
> +- schedules watchdog job for the softlockup detector
> +- bumps the interrupt counter for hardlockup detectors (heartbeat)
> +- detects softlockups
> +- detects hardlockups in Buddy mode
> +
> +The period of this hrtimer is 2*watchdog_thresh/5, which is 4 seconds
> +by default. The hrtimer has two or three chances to generate an interrupt
> +(heartbeat) before the hardlockup detector kicks in.
> +
> +Softlockup Detector
> +-------------------
> +
> +The watchdog job is scheduled by the hrtimer and runs in a stop scheduling
> +thread. It updates a timestamp every time it is scheduled. If that timestamp
> +is not updated for 2*watchdog_thresh seconds (the softlockup threshold) the
> +'softlockup detector' (coded inside the hrtimer callback function)
> +will dump useful debug information to the system log, after which it
> +will call panic if it was instructed to do so or resume execution of
> +other kernel code.

I'm happy with Petr's further improvements.

Reviewed-by: Douglas Anderson <dianders@chromium.org>

I think Andrew can just pick it up atop Mayank's. Andrew: If you need
any reposting, please yell.

Petr: thank you very much for your review of these patches! I'm super
happy you found the bug in Mayank's V1 that I missed and I think
things look nice now. :-)

-Doug

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-03-23 22:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
2026-03-13 15:27   ` Doug Anderson
2026-03-23 15:47   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
2026-03-13 15:27   ` Doug Anderson
2026-03-23 15:58   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 3/5] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-03-12 23:22 ` [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-03-23 16:26   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 5/5] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-03-23 17:26   ` Petr Mladek
2026-03-23 22:45     ` Doug Anderson
