linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
       [not found] <https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/>
@ 2025-09-16 14:50 ` Jinchao Wang
  2025-09-17  0:03   ` Ian Rogers
  2025-09-17  6:08   ` Christophe Leroy
  0 siblings, 2 replies; 22+ messages in thread
From: Jinchao Wang @ 2025-09-16 14:50 UTC (permalink / raw)
  To: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx,
	mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users,
	Ian Rogers
  Cc: Jinchao Wang

Currently, the hard lockup detector is selected at compile time via
Kconfig, which requires a kernel rebuild to switch implementations.
This is inflexible, especially on systems where a perf event may not
be available or may be needed for other tasks.

This commit refactors the hard lockup detector to replace a rigid
compile-time choice with a flexible build-time and boot-time solution.
The patch supports building the kernel with either detector
independently, or with both. When both are built, a new boot parameter
`hardlockup_detector="perf|buddy"` allows the selection at boot time.
This is a more robust and user-friendly design.

This patch is a follow-up to the discussion on the kernel mailing list
regarding the preference and future of the hard lockup detectors. It
implements a flexible solution that addresses the community's need to
select an appropriate detector at boot time.

The core changes are:
- The `perf` and `buddy` watchdog implementations are separated into
  distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
- Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
  to serve as a single API for the entire feature.
- A new `hardlockup_detector=` boot parameter is added to allow the
  user to select the desired detector at boot time.
- The Kconfig options are simplified by removing the complex
  `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
  built without mutual exclusion.
- The weak stubs are updated to call the new function pointers,
  centralizing the watchdog logic.

Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/

Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 +++
 include/linux/nmi.h                           |  6 +++
 kernel/watchdog.c                             | 46 ++++++++++++++++++-
 kernel/watchdog_buddy.c                       |  7 +--
 kernel/watchdog_perf.c                        | 10 ++--
 lib/Kconfig.debug                             | 37 +++++++--------
 6 files changed, 85 insertions(+), 28 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5a7a83c411e9..0af214ee566c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1828,6 +1828,13 @@
 			backtraces on all cpus.
 			Format: 0 | 1
 
+	hardlockup_detector=
+			[perf, buddy] Selects the hard lockup detector to use at
+			boot time.
+			Format: <string>
+			- "perf": Use the perf-based detector.
+			- "buddy": Use the buddy-based detector.
+
 	hash_pointers=
 			[KNL,EARLY]
 			By default, when pointers are printed to the console
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index cf3c6ab408aa..9298980ce572 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+void watchdog_perf_hardlockup_enable(unsigned int cpu);
+void watchdog_perf_hardlockup_disable(unsigned int cpu);
+extern int watchdog_perf_hardlockup_probe(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_config_perf_event(const char *str);
@@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
 void lockup_detector_reconfigure(void);
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
+void watchdog_buddy_hardlockup_enable(unsigned int cpu);
+void watchdog_buddy_hardlockup_disable(unsigned int cpu);
+int watchdog_buddy_hardlockup_probe(void);
 void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
 #else
 static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 80b56c002c7f..85451d24a77d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
+/* The global function pointers */
+void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
+void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
+int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
+#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
+void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
+void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
+int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
+#endif
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
+static char *hardlockup_detector_type = "perf"; /* Default to perf */
+static int __init set_hardlockup_detector_type(char *str)
+{
+	if (!strncmp(str, "perf", 4)) {
+		watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
+		watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
+		watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
+	} else if (!strncmp(str, "buddy", 5)) {
+		watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
+		watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
+		watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
+	}
+	return 1;
+}
+
+__setup("hardlockup_detector=", set_hardlockup_detector_type);
+
+#endif
+
 # ifdef CONFIG_SMP
 int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
 # endif /* CONFIG_SMP */
@@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
  * softlockup watchdog start and stop. The detector must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
+void __weak watchdog_hardlockup_enable(unsigned int cpu)
+{
+	if (watchdog_hardlockup_enable_ptr)
+		watchdog_hardlockup_enable_ptr(cpu);
+}
 
-void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
+void __weak watchdog_hardlockup_disable(unsigned int cpu)
+{
+	if (watchdog_hardlockup_disable_ptr)
+		watchdog_hardlockup_disable_ptr(cpu);
+}
 
 /*
  * Watchdog-detector specific API.
@@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
  */
 int __weak __init watchdog_hardlockup_probe(void)
 {
+	if (watchdog_hardlockup_probe_ptr)
+		return watchdog_hardlockup_probe_ptr();
+
 	return -ENODEV;
 }
 
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index ee754d767c21..390d89bfcafa 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
 	return next_cpu;
 }
 
-int __init watchdog_hardlockup_probe(void)
+int __init watchdog_buddy_hardlockup_probe(void)
 {
 	return 0;
 }
 
-void watchdog_hardlockup_enable(unsigned int cpu)
+void watchdog_buddy_hardlockup_enable(unsigned int cpu)
 {
 	unsigned int next_cpu;
 
+	pr_info("ddddd %s\n", __func__);
 	/*
 	 * The new CPU will be marked online before the hrtimer interrupt
 	 * gets a chance to run on it. If another CPU tests for a
@@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
 	cpumask_set_cpu(cpu, &watchdog_cpus);
 }
 
-void watchdog_hardlockup_disable(unsigned int cpu)
+void watchdog_buddy_hardlockup_disable(unsigned int cpu)
 {
 	unsigned int next_cpu = watchdog_next_cpu(cpu);
 
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 9c58f5b4381d..270110e58f20 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
  * watchdog_hardlockup_enable - Enable the local event
  * @cpu: The CPU to enable hard lockup on.
  */
-void watchdog_hardlockup_enable(unsigned int cpu)
+void watchdog_perf_hardlockup_enable(unsigned int cpu)
 {
 	WARN_ON_ONCE(cpu != smp_processor_id());
 
+	pr_info("ddddd %s\n", __func__);
+
 	if (hardlockup_detector_event_create())
 		return;
 
@@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
  * watchdog_hardlockup_disable - Disable the local event
  * @cpu: The CPU to enable hard lockup on.
  */
-void watchdog_hardlockup_disable(unsigned int cpu)
+void watchdog_perf_hardlockup_disable(unsigned int cpu)
 {
 	struct perf_event *event = this_cpu_read(watchdog_ev);
 
@@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
 /**
  * watchdog_hardlockup_probe - Probe whether NMI event is available at all
  */
-int __init watchdog_hardlockup_probe(void)
+int __init watchdog_perf_hardlockup_probe(void)
 {
 	int ret;
 
+	pr_info("ddddd %s\n", __func__);
+
 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dc0e0c6ed075..443353fad1c1 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
 #
 # Note that arch-specific variants are always preferred.
 #
-config HARDLOCKUP_DETECTOR_PREFER_BUDDY
-	bool "Prefer the buddy CPU hardlockup detector"
-	depends on HARDLOCKUP_DETECTOR
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
-	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
-	help
-	  Say Y here to prefer the buddy hardlockup detector over the perf one.
-
-	  With the buddy detector, each CPU uses its softlockup hrtimer
-	  to check that the next CPU is processing hrtimer interrupts by
-	  verifying that a counter is increasing.
-
-	  This hardlockup detector is useful on systems that don't have
-	  an arch-specific hardlockup detector or if resources needed
-	  for the hardlockup detector are better used for other things.
-
 config HARDLOCKUP_DETECTOR_PERF
-	bool
+	bool "Enable perf-based hard lockup detector (preferred)"
 	depends on HARDLOCKUP_DETECTOR
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
 	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+	help
+	  This detector uses a perf event on the CPU to detect when a CPU
+	  has become non-maskable interrupt (NMI) stuck. This is the
+	  preferred method on modern systems as it can detect lockups on
+	  all CPUs at the same time.
 
 config HARDLOCKUP_DETECTOR_BUDDY
-	bool
+	bool "Enable buddy-based hard lockup detector"
 	depends on HARDLOCKUP_DETECTOR
 	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
-	depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+	help
+	  This is an alternative lockup detector that uses a heartbeat
+	  mechanism between CPUs to detect when one has stopped responding.
+	  It is less precise than the perf-based detector and cannot detect
+	  all-CPU lockups, but it does not require a perf counter.
+
+config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
+	bool
+	depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
 
 config HARDLOCKUP_DETECTOR_ARCH
 	bool
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang
@ 2025-09-17  0:03   ` Ian Rogers
  2025-09-17  1:47     ` Jinchao Wang
  2025-09-17  6:08   ` Christophe Leroy
  1 sibling, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-09-17  0:03 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx,
	mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users

On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>
> Currently, the hard lockup detector is selected at compile time via
> Kconfig, which requires a kernel rebuild to switch implementations.
> This is inflexible, especially on systems where a perf event may not
> be available or may be needed for other tasks.
>
> This commit refactors the hard lockup detector to replace a rigid
> compile-time choice with a flexible build-time and boot-time solution.
> The patch supports building the kernel with either detector
> independently, or with both. When both are built, a new boot parameter
> `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> This is a more robust and user-friendly design.
>
> This patch is a follow-up to the discussion on the kernel mailing list
> regarding the preference and future of the hard lockup detectors. It
> implements a flexible solution that addresses the community's need to
> select an appropriate detector at boot time.
>
> The core changes are:
> - The `perf` and `buddy` watchdog implementations are separated into
>   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
>   to serve as a single API for the entire feature.
> - A new `hardlockup_detector=` boot parameter is added to allow the
>   user to select the desired detector at boot time.
> - The Kconfig options are simplified by removing the complex
>   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
>   built without mutual exclusion.
> - The weak stubs are updated to call the new function pointers,
>   centralizing the watchdog logic.

What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
enabling and disabling whatever the boot time choice was? I'm not sure
why this has to be a boot time option given the ability to configure
via /proc/sys/kernel/nmi_watchdog.

> Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
>
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  7 +++
>  include/linux/nmi.h                           |  6 +++
>  kernel/watchdog.c                             | 46 ++++++++++++++++++-
>  kernel/watchdog_buddy.c                       |  7 +--
>  kernel/watchdog_perf.c                        | 10 ++--
>  lib/Kconfig.debug                             | 37 +++++++--------
>  6 files changed, 85 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 5a7a83c411e9..0af214ee566c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1828,6 +1828,13 @@
>                         backtraces on all cpus.
>                         Format: 0 | 1
>
> +       hardlockup_detector=
> +                       [perf, buddy] Selects the hard lockup detector to use at
> +                       boot time.
> +                       Format: <string>
> +                       - "perf": Use the perf-based detector.
> +                       - "buddy": Use the buddy-based detector.
> +
>         hash_pointers=
>                         [KNL,EARLY]
>                         By default, when pointers are printed to the console
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index cf3c6ab408aa..9298980ce572 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
>  #endif
>
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> +extern int watchdog_perf_hardlockup_probe(void);
>  extern void hardlockup_detector_perf_stop(void);
>  extern void hardlockup_detector_perf_restart(void);
>  extern void hardlockup_config_perf_event(const char *str);
> @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
>  void lockup_detector_reconfigure(void);
>
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> +int watchdog_buddy_hardlockup_probe(void);
>  void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
>  #else
>  static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 80b56c002c7f..85451d24a77d 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> +/* The global function pointers */
> +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
> +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> +#endif
> +
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> +static char *hardlockup_detector_type = "perf"; /* Default to perf */
> +static int __init set_hardlockup_detector_type(char *str)
> +{
> +       if (!strncmp(str, "perf", 4)) {
> +               watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
> +               watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
> +               watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
> +       } else if (!strncmp(str, "buddy", 5)) {
> +               watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
> +               watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
> +               watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
> +       }
> +       return 1;
> +}
> +
> +__setup("hardlockup_detector=", set_hardlockup_detector_type);
> +
> +#endif
> +
>  # ifdef CONFIG_SMP
>  int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>  # endif /* CONFIG_SMP */
> @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
>   * softlockup watchdog start and stop. The detector must select the
>   * SOFTLOCKUP_DETECTOR Kconfig.
>   */
> -void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> +void __weak watchdog_hardlockup_enable(unsigned int cpu)
> +{
> +       if (watchdog_hardlockup_enable_ptr)
> +               watchdog_hardlockup_enable_ptr(cpu);
> +}
>
> -void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> +void __weak watchdog_hardlockup_disable(unsigned int cpu)
> +{
> +       if (watchdog_hardlockup_disable_ptr)
> +               watchdog_hardlockup_disable_ptr(cpu);
> +}
>
>  /*
>   * Watchdog-detector specific API.
> @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
>   */
>  int __weak __init watchdog_hardlockup_probe(void)
>  {
> +       if (watchdog_hardlockup_probe_ptr)
> +               return watchdog_hardlockup_probe_ptr();
> +
>         return -ENODEV;
>  }
>
> diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> index ee754d767c21..390d89bfcafa 100644
> --- a/kernel/watchdog_buddy.c
> +++ b/kernel/watchdog_buddy.c
> @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
>         return next_cpu;
>  }
>
> -int __init watchdog_hardlockup_probe(void)
> +int __init watchdog_buddy_hardlockup_probe(void)
>  {
>         return 0;
>  }
>
> -void watchdog_hardlockup_enable(unsigned int cpu)
> +void watchdog_buddy_hardlockup_enable(unsigned int cpu)
>  {
>         unsigned int next_cpu;
>
> +       pr_info("ddddd %s\n", __func__);
>         /*
>          * The new CPU will be marked online before the hrtimer interrupt
>          * gets a chance to run on it. If another CPU tests for a
> @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
>         cpumask_set_cpu(cpu, &watchdog_cpus);
>  }
>
> -void watchdog_hardlockup_disable(unsigned int cpu)
> +void watchdog_buddy_hardlockup_disable(unsigned int cpu)
>  {
>         unsigned int next_cpu = watchdog_next_cpu(cpu);
>
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index 9c58f5b4381d..270110e58f20 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
>   * watchdog_hardlockup_enable - Enable the local event
>   * @cpu: The CPU to enable hard lockup on.
>   */
> -void watchdog_hardlockup_enable(unsigned int cpu)
> +void watchdog_perf_hardlockup_enable(unsigned int cpu)
>  {
>         WARN_ON_ONCE(cpu != smp_processor_id());
>
> +       pr_info("ddddd %s\n", __func__);
> +
>         if (hardlockup_detector_event_create())
>                 return;
>
> @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
>   * watchdog_hardlockup_disable - Disable the local event
>   * @cpu: The CPU to enable hard lockup on.
>   */
> -void watchdog_hardlockup_disable(unsigned int cpu)
> +void watchdog_perf_hardlockup_disable(unsigned int cpu)
>  {
>         struct perf_event *event = this_cpu_read(watchdog_ev);
>
> @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
>  /**
>   * watchdog_hardlockup_probe - Probe whether NMI event is available at all
>   */
> -int __init watchdog_hardlockup_probe(void)
> +int __init watchdog_perf_hardlockup_probe(void)
>  {
>         int ret;
>
> +       pr_info("ddddd %s\n", __func__);
> +
>         if (!arch_perf_nmi_is_available())
>                 return -ENODEV;
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index dc0e0c6ed075..443353fad1c1 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
>  #
>  # Note that arch-specific variants are always preferred.
>  #
> -config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> -       bool "Prefer the buddy CPU hardlockup detector"
> -       depends on HARDLOCKUP_DETECTOR
> -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> -       depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> -       help
> -         Say Y here to prefer the buddy hardlockup detector over the perf one.
> -
> -         With the buddy detector, each CPU uses its softlockup hrtimer
> -         to check that the next CPU is processing hrtimer interrupts by
> -         verifying that a counter is increasing.
> -
> -         This hardlockup detector is useful on systems that don't have
> -         an arch-specific hardlockup detector or if resources needed
> -         for the hardlockup detector are better used for other things.
> -
>  config HARDLOCKUP_DETECTOR_PERF
> -       bool
> +       bool "Enable perf-based hard lockup detector (preferred)"
>         depends on HARDLOCKUP_DETECTOR
> -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF
>         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
>         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +       help
> +         This detector uses a perf event on the CPU to detect when a CPU
> +         has become non-maskable interrupt (NMI) stuck. This is the
> +         preferred method on modern systems as it can detect lockups on
> +         all CPUs at the same time.

I'd say this option should be the default for kernel developers but
shouldn't be used by default to free the perf event and due to the
extra power overhead.

Thanks,
Ian

>  config HARDLOCKUP_DETECTOR_BUDDY
> -       bool
> +       bool "Enable buddy-based hard lockup detector"
>         depends on HARDLOCKUP_DETECTOR
>         depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> -       depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
>         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
>         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +       help
> +         This is an alternative lockup detector that uses a heartbeat
> +         mechanism between CPUs to detect when one has stopped responding.
> +         It is less precise than the perf-based detector and cannot detect
> +         all-CPU lockups, but it does not require a perf counter.
> +
> +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> +       bool
> +       depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
>
>  config HARDLOCKUP_DETECTOR_ARCH
>         bool
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  0:03   ` Ian Rogers
@ 2025-09-17  1:47     ` Jinchao Wang
  2025-09-17  5:13       ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Jinchao Wang @ 2025-09-17  1:47 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx,
	mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users

On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> >
> > Currently, the hard lockup detector is selected at compile time via
> > Kconfig, which requires a kernel rebuild to switch implementations.
> > This is inflexible, especially on systems where a perf event may not
> > be available or may be needed for other tasks.
> >
> > This commit refactors the hard lockup detector to replace a rigid
> > compile-time choice with a flexible build-time and boot-time solution.
> > The patch supports building the kernel with either detector
> > independently, or with both. When both are built, a new boot parameter
> > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > This is a more robust and user-friendly design.
> >
> > This patch is a follow-up to the discussion on the kernel mailing list
> > regarding the preference and future of the hard lockup detectors. It
> > implements a flexible solution that addresses the community's need to
> > select an appropriate detector at boot time.
> >
> > The core changes are:
> > - The `perf` and `buddy` watchdog implementations are separated into
> >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> >   to serve as a single API for the entire feature.
> > - A new `hardlockup_detector=` boot parameter is added to allow the
> >   user to select the desired detector at boot time.
> > - The Kconfig options are simplified by removing the complex
> >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> >   built without mutual exclusion.
> > - The weak stubs are updated to call the new function pointers,
> >   centralizing the watchdog logic.
> 
> What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> enabling and disabling whatever the boot time choice was? I'm not sure
> why this has to be a boot time option given the ability to configure
> via /proc/sys/kernel/nmi_watchdog.
The new hardlockup_detector boot parameter and the existing
/proc/sys/kernel/nmi_watchdog file serve different purposes.

The boot parameter selects the type of hard lockup detector (perf or buddy).
This choice is made once at boot.

 /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
switch for the currently selected detector. It does not change the detector's
type.
> 
> > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
> >
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  7 +++
> >  include/linux/nmi.h                           |  6 +++
> >  kernel/watchdog.c                             | 46 ++++++++++++++++++-
> >  kernel/watchdog_buddy.c                       |  7 +--
> >  kernel/watchdog_perf.c                        | 10 ++--
> >  lib/Kconfig.debug                             | 37 +++++++--------
> >  6 files changed, 85 insertions(+), 28 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 5a7a83c411e9..0af214ee566c 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1828,6 +1828,13 @@
> >                         backtraces on all cpus.
> >                         Format: 0 | 1
> >
> > +       hardlockup_detector=
> > +                       [perf, buddy] Selects the hard lockup detector to use at
> > +                       boot time.
> > +                       Format: <string>
> > +                       - "perf": Use the perf-based detector.
> > +                       - "buddy": Use the buddy-based detector.
> > +
> >         hash_pointers=
> >                         [KNL,EARLY]
> >                         By default, when pointers are printed to the console
> > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > index cf3c6ab408aa..9298980ce572 100644
> > --- a/include/linux/nmi.h
> > +++ b/include/linux/nmi.h
> > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
> >  #endif
> >
> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> > +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> > +extern int watchdog_perf_hardlockup_probe(void);
> >  extern void hardlockup_detector_perf_stop(void);
> >  extern void hardlockup_detector_perf_restart(void);
> >  extern void hardlockup_config_perf_event(const char *str);
> > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
> >  void lockup_detector_reconfigure(void);
> >
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> > +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> > +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> > +int watchdog_buddy_hardlockup_probe(void);
> >  void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
> >  #else
> >  static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 80b56c002c7f..85451d24a77d 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> >
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > +/* The global function pointers */
> > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
> > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> > +#endif
> > +
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > +static char *hardlockup_detector_type = "perf"; /* Default to perf */
> > +static int __init set_hardlockup_detector_type(char *str)
> > +{
> > +       if (!strncmp(str, "perf", 4)) {
> > +               watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
> > +               watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
> > +               watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
> > +       } else if (!strncmp(str, "buddy", 5)) {
> > +               watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
> > +               watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
> > +               watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
> > +       }
> > +       return 1;
> > +}
> > +
> > +__setup("hardlockup_detector=", set_hardlockup_detector_type);
> > +
> > +#endif
> > +
> >  # ifdef CONFIG_SMP
> >  int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
> >  # endif /* CONFIG_SMP */
> > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
> >   * softlockup watchdog start and stop. The detector must select the
> >   * SOFTLOCKUP_DETECTOR Kconfig.
> >   */
> > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> > +void __weak watchdog_hardlockup_enable(unsigned int cpu)
> > +{
> > +       if (watchdog_hardlockup_enable_ptr)
> > +               watchdog_hardlockup_enable_ptr(cpu);
> > +}
> >
> > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> > +void __weak watchdog_hardlockup_disable(unsigned int cpu)
> > +{
> > +       if (watchdog_hardlockup_disable_ptr)
> > +               watchdog_hardlockup_disable_ptr(cpu);
> > +}
> >
> >  /*
> >   * Watchdog-detector specific API.
> > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> >   */
> >  int __weak __init watchdog_hardlockup_probe(void)
> >  {
> > +       if (watchdog_hardlockup_probe_ptr)
> > +               return watchdog_hardlockup_probe_ptr();
> > +
> >         return -ENODEV;
> >  }
> >
> > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> > index ee754d767c21..390d89bfcafa 100644
> > --- a/kernel/watchdog_buddy.c
> > +++ b/kernel/watchdog_buddy.c
> > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
> >         return next_cpu;
> >  }
> >
> > -int __init watchdog_hardlockup_probe(void)
> > +int __init watchdog_buddy_hardlockup_probe(void)
> >  {
> >         return 0;
> >  }
> >
> > -void watchdog_hardlockup_enable(unsigned int cpu)
> > +void watchdog_buddy_hardlockup_enable(unsigned int cpu)
> >  {
> >         unsigned int next_cpu;
> >
> > +       pr_info("ddddd %s\n", __func__);
> >         /*
> >          * The new CPU will be marked online before the hrtimer interrupt
> >          * gets a chance to run on it. If another CPU tests for a
> > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> >         cpumask_set_cpu(cpu, &watchdog_cpus);
> >  }
> >
> > -void watchdog_hardlockup_disable(unsigned int cpu)
> > +void watchdog_buddy_hardlockup_disable(unsigned int cpu)
> >  {
> >         unsigned int next_cpu = watchdog_next_cpu(cpu);
> >
> > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> > index 9c58f5b4381d..270110e58f20 100644
> > --- a/kernel/watchdog_perf.c
> > +++ b/kernel/watchdog_perf.c
> > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
> >   * watchdog_hardlockup_enable - Enable the local event
> >   * @cpu: The CPU to enable hard lockup on.
> >   */
> > -void watchdog_hardlockup_enable(unsigned int cpu)
> > +void watchdog_perf_hardlockup_enable(unsigned int cpu)
> >  {
> >         WARN_ON_ONCE(cpu != smp_processor_id());
> >
> > +       pr_info("ddddd %s\n", __func__);
> > +
> >         if (hardlockup_detector_event_create())
> >                 return;
> >
> > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> >   * watchdog_hardlockup_disable - Disable the local event
> >   * @cpu: The CPU to enable hard lockup on.
> >   */
> > -void watchdog_hardlockup_disable(unsigned int cpu)
> > +void watchdog_perf_hardlockup_disable(unsigned int cpu)
> >  {
> >         struct perf_event *event = this_cpu_read(watchdog_ev);
> >
> > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
> >  /**
> >   * watchdog_hardlockup_probe - Probe whether NMI event is available at all
> >   */
> > -int __init watchdog_hardlockup_probe(void)
> > +int __init watchdog_perf_hardlockup_probe(void)
> >  {
> >         int ret;
> >
> > +       pr_info("ddddd %s\n", __func__);
> > +
> >         if (!arch_perf_nmi_is_available())
> >                 return -ENODEV;
> >
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index dc0e0c6ed075..443353fad1c1 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
> >  #
> >  # Note that arch-specific variants are always preferred.
> >  #
> > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > -       bool "Prefer the buddy CPU hardlockup detector"
> > -       depends on HARDLOCKUP_DETECTOR
> > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > -       depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> > -       help
> > -         Say Y here to prefer the buddy hardlockup detector over the perf one.
> > -
> > -         With the buddy detector, each CPU uses its softlockup hrtimer
> > -         to check that the next CPU is processing hrtimer interrupts by
> > -         verifying that a counter is increasing.
> > -
> > -         This hardlockup detector is useful on systems that don't have
> > -         an arch-specific hardlockup detector or if resources needed
> > -         for the hardlockup detector are better used for other things.
> > -
> >  config HARDLOCKUP_DETECTOR_PERF
> > -       bool
> > +       bool "Enable perf-based hard lockup detector (preferred)"
> >         depends on HARDLOCKUP_DETECTOR
> > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> >         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> >         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +       help
> > +         This detector uses a perf event on the CPU to detect when a CPU
> > +         has become non-maskable interrupt (NMI) stuck. This is the
> > +         preferred method on modern systems as it can detect lockups on
> > +         all CPUs at the same time.
> 
> I'd say this option should be the default for kernel developers but
> shouldn't be used by default to free the perf event and due to the
> extra power overhead.
> 
> Thanks,
> Ian
> 
> >  config HARDLOCKUP_DETECTOR_BUDDY
> > -       bool
> > +       bool "Enable buddy-based hard lockup detector"
> >         depends on HARDLOCKUP_DETECTOR
> >         depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > -       depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> >         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> >         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +       help
> > +         This is an alternative lockup detector that uses a heartbeat
> > +         mechanism between CPUs to detect when one has stopped responding.
> > +         It is less precise than the perf-based detector and cannot detect
> > +         all-CPU lockups, but it does not require a perf counter.
> > +
> > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > +       bool
> > +       depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
> >
> >  config HARDLOCKUP_DETECTOR_ARCH
> >         bool
> > --
> > 2.43.0
> >

-- 
Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  1:47     ` Jinchao Wang
@ 2025-09-17  5:13       ` Ian Rogers
  2025-09-17  5:35         ` Namhyung Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-09-17  5:13 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx,
	mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users

On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>
> On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > >
> > > Currently, the hard lockup detector is selected at compile time via
> > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > This is inflexible, especially on systems where a perf event may not
> > > be available or may be needed for other tasks.
> > >
> > > This commit refactors the hard lockup detector to replace a rigid
> > > compile-time choice with a flexible build-time and boot-time solution.
> > > The patch supports building the kernel with either detector
> > > independently, or with both. When both are built, a new boot parameter
> > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > This is a more robust and user-friendly design.
> > >
> > > This patch is a follow-up to the discussion on the kernel mailing list
> > > regarding the preference and future of the hard lockup detectors. It
> > > implements a flexible solution that addresses the community's need to
> > > select an appropriate detector at boot time.
> > >
> > > The core changes are:
> > > - The `perf` and `buddy` watchdog implementations are separated into
> > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > >   to serve as a single API for the entire feature.
> > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > >   user to select the desired detector at boot time.
> > > - The Kconfig options are simplified by removing the complex
> > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > >   built without mutual exclusion.
> > > - The weak stubs are updated to call the new function pointers,
> > >   centralizing the watchdog logic.
> >
> > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > enabling and disabling whatever the boot time choice was? I'm not sure
> > why this has to be a boot time option given the ability to configure
> > via /proc/sys/kernel/nmi_watchdog.
> The new hardlockup_detector boot parameter and the existing
> /proc/sys/kernel/nmi_watchdog file serve different purposes.
>
> The boot parameter selects the type of hard lockup detector (perf or buddy).
> This choice is made once at boot.
>
>  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> switch for the currently selected detector. It does not change the detector's
> type.

So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
obvious naming reasons but also because we can't differentiate when a
perf event has been taken or not - this impacts perf that is choosing
not to group events in metrics because of it, reducing the metric's
accuracy. We need an equivalent "buddy_watchdog" file to the
"nmi_watchdog" file. If we have such a file then if I did "echo 1 >
/proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
disabled and the perf event one to be enabled. Similarly, if I did
"echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
perf event watchdog to be disabled and the buddy one enabled. If I did
 "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
/proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
enabled. I don't see why choosing the type of watchdog implementation
at boot time is particularly desirable. It seems sensible to default
normal people to using the buddy watchdog (more perf events, power...)
and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
the "nmi_watchdog" file may be assumed to control the buddy watchdog,
perhaps a compatibility option (where the "nmi_watchdog" file controls
the buddy watchdog) is needed so that user code has time to migrate.

Thanks,
Ian

> >
> > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
> > >
> > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> > > ---
> > >  .../admin-guide/kernel-parameters.txt         |  7 +++
> > >  include/linux/nmi.h                           |  6 +++
> > >  kernel/watchdog.c                             | 46 ++++++++++++++++++-
> > >  kernel/watchdog_buddy.c                       |  7 +--
> > >  kernel/watchdog_perf.c                        | 10 ++--
> > >  lib/Kconfig.debug                             | 37 +++++++--------
> > >  6 files changed, 85 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 5a7a83c411e9..0af214ee566c 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1828,6 +1828,13 @@
> > >                         backtraces on all cpus.
> > >                         Format: 0 | 1
> > >
> > > +       hardlockup_detector=
> > > +                       [perf, buddy] Selects the hard lockup detector to use at
> > > +                       boot time.
> > > +                       Format: <string>
> > > +                       - "perf": Use the perf-based detector.
> > > +                       - "buddy": Use the buddy-based detector.
> > > +
> > >         hash_pointers=
> > >                         [KNL,EARLY]
> > >                         By default, when pointers are printed to the console
> > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > > index cf3c6ab408aa..9298980ce572 100644
> > > --- a/include/linux/nmi.h
> > > +++ b/include/linux/nmi.h
> > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
> > >  #endif
> > >
> > >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> > > +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> > > +extern int watchdog_perf_hardlockup_probe(void);
> > >  extern void hardlockup_detector_perf_stop(void);
> > >  extern void hardlockup_detector_perf_restart(void);
> > >  extern void hardlockup_config_perf_event(const char *str);
> > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
> > >  void lockup_detector_reconfigure(void);
> > >
> > >  #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> > > +int watchdog_buddy_hardlockup_probe(void);
> > >  void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
> > >  #else
> > >  static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index 80b56c002c7f..85451d24a77d 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> > >
> > >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > >
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > > +/* The global function pointers */
> > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
> > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> > > +#endif
> > > +
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */
> > > +static int __init set_hardlockup_detector_type(char *str)
> > > +{
> > > +       if (!strncmp(str, "perf", 4)) {
> > > +               watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
> > > +               watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
> > > +               watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
> > > +       } else if (!strncmp(str, "buddy", 5)) {
> > > +               watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
> > > +               watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
> > > +               watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
> > > +       }
> > > +       return 1;
> > > +}
> > > +
> > > +__setup("hardlockup_detector=", set_hardlockup_detector_type);
> > > +
> > > +#endif
> > > +
> > >  # ifdef CONFIG_SMP
> > >  int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
> > >  # endif /* CONFIG_SMP */
> > > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
> > >   * softlockup watchdog start and stop. The detector must select the
> > >   * SOFTLOCKUP_DETECTOR Kconfig.
> > >   */
> > > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> > > +void __weak watchdog_hardlockup_enable(unsigned int cpu)
> > > +{
> > > +       if (watchdog_hardlockup_enable_ptr)
> > > +               watchdog_hardlockup_enable_ptr(cpu);
> > > +}
> > >
> > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> > > +void __weak watchdog_hardlockup_disable(unsigned int cpu)
> > > +{
> > > +       if (watchdog_hardlockup_disable_ptr)
> > > +               watchdog_hardlockup_disable_ptr(cpu);
> > > +}
> > >
> > >  /*
> > >   * Watchdog-detector specific API.
> > > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> > >   */
> > >  int __weak __init watchdog_hardlockup_probe(void)
> > >  {
> > > +       if (watchdog_hardlockup_probe_ptr)
> > > +               return watchdog_hardlockup_probe_ptr();
> > > +
> > >         return -ENODEV;
> > >  }
> > >
> > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> > > index ee754d767c21..390d89bfcafa 100644
> > > --- a/kernel/watchdog_buddy.c
> > > +++ b/kernel/watchdog_buddy.c
> > > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
> > >         return next_cpu;
> > >  }
> > >
> > > -int __init watchdog_hardlockup_probe(void)
> > > +int __init watchdog_buddy_hardlockup_probe(void)
> > >  {
> > >         return 0;
> > >  }
> > >
> > > -void watchdog_hardlockup_enable(unsigned int cpu)
> > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu)
> > >  {
> > >         unsigned int next_cpu;
> > >
> > > +       pr_info("ddddd %s\n", __func__);
> > >         /*
> > >          * The new CPU will be marked online before the hrtimer interrupt
> > >          * gets a chance to run on it. If another CPU tests for a
> > > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> > >         cpumask_set_cpu(cpu, &watchdog_cpus);
> > >  }
> > >
> > > -void watchdog_hardlockup_disable(unsigned int cpu)
> > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu)
> > >  {
> > >         unsigned int next_cpu = watchdog_next_cpu(cpu);
> > >
> > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> > > index 9c58f5b4381d..270110e58f20 100644
> > > --- a/kernel/watchdog_perf.c
> > > +++ b/kernel/watchdog_perf.c
> > > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
> > >   * watchdog_hardlockup_enable - Enable the local event
> > >   * @cpu: The CPU to enable hard lockup on.
> > >   */
> > > -void watchdog_hardlockup_enable(unsigned int cpu)
> > > +void watchdog_perf_hardlockup_enable(unsigned int cpu)
> > >  {
> > >         WARN_ON_ONCE(cpu != smp_processor_id());
> > >
> > > +       pr_info("ddddd %s\n", __func__);
> > > +
> > >         if (hardlockup_detector_event_create())
> > >                 return;
> > >
> > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> > >   * watchdog_hardlockup_disable - Disable the local event
> > >   * @cpu: The CPU to enable hard lockup on.
> > >   */
> > > -void watchdog_hardlockup_disable(unsigned int cpu)
> > > +void watchdog_perf_hardlockup_disable(unsigned int cpu)
> > >  {
> > >         struct perf_event *event = this_cpu_read(watchdog_ev);
> > >
> > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
> > >  /**
> > >   * watchdog_hardlockup_probe - Probe whether NMI event is available at all
> > >   */
> > > -int __init watchdog_hardlockup_probe(void)
> > > +int __init watchdog_perf_hardlockup_probe(void)
> > >  {
> > >         int ret;
> > >
> > > +       pr_info("ddddd %s\n", __func__);
> > > +
> > >         if (!arch_perf_nmi_is_available())
> > >                 return -ENODEV;
> > >
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index dc0e0c6ed075..443353fad1c1 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
> > >  #
> > >  # Note that arch-specific variants are always preferred.
> > >  #
> > > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > -       bool "Prefer the buddy CPU hardlockup detector"
> > > -       depends on HARDLOCKUP_DETECTOR
> > > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > > -       depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > -       help
> > > -         Say Y here to prefer the buddy hardlockup detector over the perf one.
> > > -
> > > -         With the buddy detector, each CPU uses its softlockup hrtimer
> > > -         to check that the next CPU is processing hrtimer interrupts by
> > > -         verifying that a counter is increasing.
> > > -
> > > -         This hardlockup detector is useful on systems that don't have
> > > -         an arch-specific hardlockup detector or if resources needed
> > > -         for the hardlockup detector are better used for other things.
> > > -
> > >  config HARDLOCKUP_DETECTOR_PERF
> > > -       bool
> > > +       bool "Enable perf-based hard lockup detector (preferred)"
> > >         depends on HARDLOCKUP_DETECTOR
> > > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > >         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +       help
> > > +         This detector uses a perf event on the CPU to detect when a CPU
> > > +         has become non-maskable interrupt (NMI) stuck. This is the
> > > +         preferred method on modern systems as it can detect lockups on
> > > +         all CPUs at the same time.
> >
> > I'd say this option should be the default for kernel developers but
> > shouldn't be used by default to free the perf event and due to the
> > extra power overhead.
> >
> > Thanks,
> > Ian
> >
> > >  config HARDLOCKUP_DETECTOR_BUDDY
> > > -       bool
> > > +       bool "Enable buddy-based hard lockup detector"
> > >         depends on HARDLOCKUP_DETECTOR
> > >         depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > > -       depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > >         depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +       help
> > > +         This is an alternative lockup detector that uses a heartbeat
> > > +         mechanism between CPUs to detect when one has stopped responding.
> > > +         It is less precise than the perf-based detector and cannot detect
> > > +         all-CPU lockups, but it does not require a perf counter.
> > > +
> > > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > > +       bool
> > > +       depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
> > >
> > >  config HARDLOCKUP_DETECTOR_ARCH
> > >         bool
> > > --
> > > 2.43.0
> > >
>
> --
> Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  5:13       ` Ian Rogers
@ 2025-09-17  5:35         ` Namhyung Kim
  2025-09-17  6:14           ` Jinchao Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2025-09-17  5:35 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Jinchao Wang, Doug Anderson, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin,
	christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme,
	mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
	kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur,
	ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

Hello,

On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > >
> > > > Currently, the hard lockup detector is selected at compile time via
> > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > This is inflexible, especially on systems where a perf event may not
> > > > be available or may be needed for other tasks.
> > > >
> > > > This commit refactors the hard lockup detector to replace a rigid
> > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > The patch supports building the kernel with either detector
> > > > independently, or with both. When both are built, a new boot parameter
> > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > This is a more robust and user-friendly design.
> > > >
> > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > regarding the preference and future of the hard lockup detectors. It
> > > > implements a flexible solution that addresses the community's need to
> > > > select an appropriate detector at boot time.
> > > >
> > > > The core changes are:
> > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > >   to serve as a single API for the entire feature.
> > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > >   user to select the desired detector at boot time.
> > > > - The Kconfig options are simplified by removing the complex
> > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > >   built without mutual exclusion.
> > > > - The weak stubs are updated to call the new function pointers,
> > > >   centralizing the watchdog logic.
> > >
> > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > why this has to be a boot time option given the ability to configure
> > > via /proc/sys/kernel/nmi_watchdog.
> > The new hardlockup_detector boot parameter and the existing
> > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> >
> > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > This choice is made once at boot.
> >
> >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > switch for the currently selected detector. It does not change the detector's
> > type.
> 
> So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> obvious naming reasons but also because we can't differentiate when a
> perf event has been taken or not - this impacts perf that is choosing
> not to group events in metrics because of it, reducing the metric's
> accuracy. We need an equivalent "buddy_watchdog" file to the
> "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> disabled and the perf event one to be enabled. Similarly, if I did
> "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> perf event watchdog to be disabled and the buddy one enabled. If I did
>  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> enabled. I don't see why choosing the type of watchdog implementation
> at boot time is particularly desirable. It seems sensible to default
> normal people to using the buddy watchdog (more perf events, power...)
> and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> perhaps a compatibility option (where the "nmi_watchdog" file controls
> the buddy watchdog) is needed so that user code has time to migrate.

Sounds good to me.  For perf tools, it'd be great if we can have a run-
time check which watchdog is selected.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang
  2025-09-17  0:03   ` Ian Rogers
@ 2025-09-17  6:08   ` Christophe Leroy
  2025-09-17  6:54     ` Jinchao Wang
  1 sibling, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2025-09-17  6:08 UTC (permalink / raw)
  To: Jinchao Wang, Doug Anderson, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, tglx,
	mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users,
	Ian Rogers



Le 16/09/2025 à 16:50, Jinchao Wang a écrit :
> Currently, the hard lockup detector is selected at compile time via
> Kconfig, which requires a kernel rebuild to switch implementations.
> This is inflexible, especially on systems where a perf event may not
> be available or may be needed for other tasks.
> 
> This commit refactors the hard lockup detector to replace a rigid
> compile-time choice with a flexible build-time and boot-time solution.
> The patch supports building the kernel with either detector
> independently, or with both. When both are built, a new boot parameter
> `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> This is a more robust and user-friendly design.
> 
> This patch is a follow-up to the discussion on the kernel mailing list
> regarding the preference and future of the hard lockup detectors. It
> implements a flexible solution that addresses the community's need to
> select an appropriate detector at boot time.
> 
> The core changes are:
> - The `perf` and `buddy` watchdog implementations are separated into
>    distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
>    to serve as a single API for the entire feature.
> - A new `hardlockup_detector=` boot parameter is added to allow the
>    user to select the desired detector at boot time.
> - The Kconfig options are simplified by removing the complex
>    `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
>    built without mutual exclusion.
> - The weak stubs are updated to call the new function pointers,
>    centralizing the watchdog logic.
> 
> Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
> 
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  7 +++
>   include/linux/nmi.h                           |  6 +++
>   kernel/watchdog.c                             | 46 ++++++++++++++++++-
>   kernel/watchdog_buddy.c                       |  7 +--
>   kernel/watchdog_perf.c                        | 10 ++--
>   lib/Kconfig.debug                             | 37 +++++++--------
>   6 files changed, 85 insertions(+), 28 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 5a7a83c411e9..0af214ee566c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1828,6 +1828,13 @@
>   			backtraces on all cpus.
>   			Format: 0 | 1
>   
> +	hardlockup_detector=
> +			[perf, buddy] Selects the hard lockup detector to use at
> +			boot time.
> +			Format: <string>
> +			- "perf": Use the perf-based detector.
> +			- "buddy": Use the buddy-based detector.
> +
>   	hash_pointers=
>   			[KNL,EARLY]
>   			By default, when pointers are printed to the console
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index cf3c6ab408aa..9298980ce572 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
>   #endif
>   
>   #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> +extern int watchdog_perf_hardlockup_probe(void);

No 'extern' on function prototypes, this is pointless.

>   extern void hardlockup_detector_perf_stop(void);
>   extern void hardlockup_detector_perf_restart(void);
>   extern void hardlockup_config_perf_event(const char *str);
> @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
>   void lockup_detector_reconfigure(void);
>   
>   #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> +int watchdog_buddy_hardlockup_probe(void);
>   void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
>   #else
>   static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 80b56c002c7f..85451d24a77d 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>   
>   #ifdef CONFIG_HARDLOCKUP_DETECTOR
>   
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> +/* The global function pointers */
> +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;

As this is set only once at startup, can we use static_call instead of 
function pointers ?

Also, can it me made __ro_after_init ?

> +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> +#endif
> +
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> +static char *hardlockup_detector_type = "perf"; /* Default to perf */
> +static int __init set_hardlockup_detector_type(char *str)
> +{
> +	if (!strncmp(str, "perf", 4)) {

Why strncmp ?

What if I set 'hardlockup_detector=performance" ?


> +		watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
> +		watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
> +		watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
> +	} else if (!strncmp(str, "buddy", 5)) {
> +		watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
> +		watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
> +		watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
> +	}
> +	return 1;
> +}
> +
> +__setup("hardlockup_detector=", set_hardlockup_detector_type);
> +
> +#endif
> +
>   # ifdef CONFIG_SMP
>   int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>   # endif /* CONFIG_SMP */
> @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
>    * softlockup watchdog start and stop. The detector must select the
>    * SOFTLOCKUP_DETECTOR Kconfig.
>    */
> -void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> +void __weak watchdog_hardlockup_enable(unsigned int cpu)
> +{
> +	if (watchdog_hardlockup_enable_ptr)
> +		watchdog_hardlockup_enable_ptr(cpu);
> +}

This is a weak function so it can be overloaded. What happens then, for 
instance if the sparc architecture version of 
watchdog_hardlockup_enable() is called instead ?

>   
> -void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> +void __weak watchdog_hardlockup_disable(unsigned int cpu)
> +{
> +	if (watchdog_hardlockup_disable_ptr)
> +		watchdog_hardlockup_disable_ptr(cpu);
> +}
>   
>   /*
>    * Watchdog-detector specific API.
> @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
>    */
>   int __weak __init watchdog_hardlockup_probe(void)
>   {
> +	if (watchdog_hardlockup_probe_ptr)
> +		return watchdog_hardlockup_probe_ptr();
> +
>   	return -ENODEV;
>   }
>   
> diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> index ee754d767c21..390d89bfcafa 100644
> --- a/kernel/watchdog_buddy.c
> +++ b/kernel/watchdog_buddy.c
> @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
>   	return next_cpu;
>   }
>   
> -int __init watchdog_hardlockup_probe(void)
> +int __init watchdog_buddy_hardlockup_probe(void)
>   {
>   	return 0;
>   }
>   
> -void watchdog_hardlockup_enable(unsigned int cpu)
> +void watchdog_buddy_hardlockup_enable(unsigned int cpu)
>   {
>   	unsigned int next_cpu;
>   
> +	pr_info("ddddd %s\n", __func__);

Leftover from debuging ?

>   	/*
>   	 * The new CPU will be marked online before the hrtimer interrupt
>   	 * gets a chance to run on it. If another CPU tests for a
> @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
>   	cpumask_set_cpu(cpu, &watchdog_cpus);
>   }
>   
> -void watchdog_hardlockup_disable(unsigned int cpu)
> +void watchdog_buddy_hardlockup_disable(unsigned int cpu)
>   {
>   	unsigned int next_cpu = watchdog_next_cpu(cpu);
>   
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index 9c58f5b4381d..270110e58f20 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
>    * watchdog_hardlockup_enable - Enable the local event
>    * @cpu: The CPU to enable hard lockup on.
>    */
> -void watchdog_hardlockup_enable(unsigned int cpu)
> +void watchdog_perf_hardlockup_enable(unsigned int cpu)
>   {
>   	WARN_ON_ONCE(cpu != smp_processor_id());
>   
> +	pr_info("ddddd %s\n", __func__);
> +
>   	if (hardlockup_detector_event_create())
>   		return;
>   
> @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
>    * watchdog_hardlockup_disable - Disable the local event
>    * @cpu: The CPU to enable hard lockup on.
>    */
> -void watchdog_hardlockup_disable(unsigned int cpu)
> +void watchdog_perf_hardlockup_disable(unsigned int cpu)
>   {
>   	struct perf_event *event = this_cpu_read(watchdog_ev);
>   
> @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
>   /**
>    * watchdog_hardlockup_probe - Probe whether NMI event is available at all
>    */
> -int __init watchdog_hardlockup_probe(void)
> +int __init watchdog_perf_hardlockup_probe(void)
>   {
>   	int ret;
>   
> +	pr_info("ddddd %s\n", __func__);
> +
>   	if (!arch_perf_nmi_is_available())
>   		return -ENODEV;
>   
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index dc0e0c6ed075..443353fad1c1 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
>   #
>   # Note that arch-specific variants are always preferred.
>   #
> -config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> -	bool "Prefer the buddy CPU hardlockup detector"
> -	depends on HARDLOCKUP_DETECTOR
> -	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> -	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> -	help
> -	  Say Y here to prefer the buddy hardlockup detector over the perf one.
> -
> -	  With the buddy detector, each CPU uses its softlockup hrtimer
> -	  to check that the next CPU is processing hrtimer interrupts by
> -	  verifying that a counter is increasing.
> -
> -	  This hardlockup detector is useful on systems that don't have
> -	  an arch-specific hardlockup detector or if resources needed
> -	  for the hardlockup detector are better used for other things.
> -
>   config HARDLOCKUP_DETECTOR_PERF
> -	bool
> +	bool "Enable perf-based hard lockup detector (preferred)"
>   	depends on HARDLOCKUP_DETECTOR
> -	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
>   	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
>   	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +	help
> +	  This detector uses a perf event on the CPU to detect when a CPU
> +	  has become non-maskable interrupt (NMI) stuck. This is the
> +	  preferred method on modern systems as it can detect lockups on
> +	  all CPUs at the same time.
>   
>   config HARDLOCKUP_DETECTOR_BUDDY
> -	bool
> +	bool "Enable buddy-based hard lockup detector"
>   	depends on HARDLOCKUP_DETECTOR
>   	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> -	depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
>   	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
>   	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +	help
> +	  This is an alternative lockup detector that uses a heartbeat
> +	  mechanism between CPUs to detect when one has stopped responding.
> +	  It is less precise than the perf-based detector and cannot detect
> +	  all-CPU lockups, but it does not require a perf counter.
> +
> +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> +	bool
> +	depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
>   
>   config HARDLOCKUP_DETECTOR_ARCH
>   	bool


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  5:35         ` Namhyung Kim
@ 2025-09-17  6:14           ` Jinchao Wang
  2025-10-06 21:29             ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Jinchao Wang @ 2025-09-17  6:14 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Doug Anderson, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin,
	christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme,
	mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
	kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur,
	ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> Hello,
> 
> On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > >
> > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > This is inflexible, especially on systems where a perf event may not
> > > > > be available or may be needed for other tasks.
> > > > >
> > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > The patch supports building the kernel with either detector
> > > > > independently, or with both. When both are built, a new boot parameter
> > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > This is a more robust and user-friendly design.
> > > > >
> > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > implements a flexible solution that addresses the community's need to
> > > > > select an appropriate detector at boot time.
> > > > >
> > > > > The core changes are:
> > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > >   to serve as a single API for the entire feature.
> > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > >   user to select the desired detector at boot time.
> > > > > - The Kconfig options are simplified by removing the complex
> > > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > >   built without mutual exclusion.
> > > > > - The weak stubs are updated to call the new function pointers,
> > > > >   centralizing the watchdog logic.
> > > >
> > > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > why this has to be a boot time option given the ability to configure
> > > > via /proc/sys/kernel/nmi_watchdog.
> > > The new hardlockup_detector boot parameter and the existing
> > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > >
> > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > This choice is made once at boot.
> > >
> > >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > switch for the currently selected detector. It does not change the detector's
> > > type.
> > 
> > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > obvious naming reasons but also because we can't differentiate when a
> > perf event has been taken or not - this impacts perf that is choosing
> > not to group events in metrics because of it, reducing the metric's
> > accuracy. We need an equivalent "buddy_watchdog" file to the
> > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > disabled and the perf event one to be enabled. Similarly, if I did
> > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > perf event watchdog to be disabled and the buddy one enabled. If I did
> >  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > enabled. I don't see why choosing the type of watchdog implementation
> > at boot time is particularly desirable. It seems sensible to default
> > normal people to using the buddy watchdog (more perf events, power...)
> > and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > the buddy watchdog) is needed so that user code has time to migrate.
> 
> Sounds good to me.  For perf tools, it'd be great if we can have a run-
> time check which watchdog is selected.
Considering backward compatibility, I prefer to keep
/proc/sys/kernel/nmi_watchdog and introduce a new file called
/proc/sys/kernel/hardlockup_detector_type, which only shows the default string
or the boot parameter.

The global str pointer hardlockup_detector_type was already introduced in the
patch, so exposing it in a file is straightforward.
> 
> Thanks,
> Namhyung
> 

-- 
Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  6:08   ` Christophe Leroy
@ 2025-09-17  6:54     ` Jinchao Wang
  2025-10-06 20:13       ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Jinchao Wang @ 2025-09-17  6:54 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, tglx, mingo, bp,
	dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users,
	Ian Rogers

On Wed, Sep 17, 2025 at 08:08:57AM +0200, Christophe Leroy wrote:
> 
> 
> Le 16/09/2025 à 16:50, Jinchao Wang a écrit :
> > Currently, the hard lockup detector is selected at compile time via
> > Kconfig, which requires a kernel rebuild to switch implementations.
> > This is inflexible, especially on systems where a perf event may not
> > be available or may be needed for other tasks.
> > 
> > This commit refactors the hard lockup detector to replace a rigid
> > compile-time choice with a flexible build-time and boot-time solution.
> > The patch supports building the kernel with either detector
> > independently, or with both. When both are built, a new boot parameter
> > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > This is a more robust and user-friendly design.
> > 
> > This patch is a follow-up to the discussion on the kernel mailing list
> > regarding the preference and future of the hard lockup detectors. It
> > implements a flexible solution that addresses the community's need to
> > select an appropriate detector at boot time.
> > 
> > The core changes are:
> > - The `perf` and `buddy` watchdog implementations are separated into
> >    distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> >    to serve as a single API for the entire feature.
> > - A new `hardlockup_detector=` boot parameter is added to allow the
> >    user to select the desired detector at boot time.
> > - The Kconfig options are simplified by removing the complex
> >    `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> >    built without mutual exclusion.
> > - The weak stubs are updated to call the new function pointers,
> >    centralizing the watchdog logic.
> > 
> > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
> > 
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> > ---
> >   .../admin-guide/kernel-parameters.txt         |  7 +++
> >   include/linux/nmi.h                           |  6 +++
> >   kernel/watchdog.c                             | 46 ++++++++++++++++++-
> >   kernel/watchdog_buddy.c                       |  7 +--
> >   kernel/watchdog_perf.c                        | 10 ++--
> >   lib/Kconfig.debug                             | 37 +++++++--------
> >   6 files changed, 85 insertions(+), 28 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 5a7a83c411e9..0af214ee566c 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1828,6 +1828,13 @@
> >   			backtraces on all cpus.
> >   			Format: 0 | 1
> > +	hardlockup_detector=
> > +			[perf, buddy] Selects the hard lockup detector to use at
> > +			boot time.
> > +			Format: <string>
> > +			- "perf": Use the perf-based detector.
> > +			- "buddy": Use the buddy-based detector.
> > +
> >   	hash_pointers=
> >   			[KNL,EARLY]
> >   			By default, when pointers are printed to the console
> > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > index cf3c6ab408aa..9298980ce572 100644
> > --- a/include/linux/nmi.h
> > +++ b/include/linux/nmi.h
> > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
> >   #endif
> >   #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> > +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> > +extern int watchdog_perf_hardlockup_probe(void);
> 
> No 'extern' on function prototypes, this is pointless.
Got it.
> 
> >   extern void hardlockup_detector_perf_stop(void);
> >   extern void hardlockup_detector_perf_restart(void);
> >   extern void hardlockup_config_perf_event(const char *str);
> > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
> >   void lockup_detector_reconfigure(void);
> >   #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> > +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> > +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> > +int watchdog_buddy_hardlockup_probe(void);
> >   void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
> >   #else
> >   static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 80b56c002c7f..85451d24a77d 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> >   #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > +/* The global function pointers */
> > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
> 
> As this is set only once at startup, can we use static_call instead of
> function pointers ?
> 
> Also, can it me made __ro_after_init ?
Not really, this is just an RFC patch, and there is no consensus yet.
If it is included in the final consensus, I will handle it in the next version.
> 
> > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> > +#endif
> > +
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > +static char *hardlockup_detector_type = "perf"; /* Default to perf */
> > +static int __init set_hardlockup_detector_type(char *str)
> > +{
> > +	if (!strncmp(str, "perf", 4)) {
> 
> Why strncmp ?
Copy from hardlockup_panic_setup().
> 
> What if I set 'hardlockup_detector=performance" ?
I think that is acceptable in this case.
> 
> 
> > +		watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable;
> > +		watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable;
> > +		watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe;
> > +	} else if (!strncmp(str, "buddy", 5)) {
> > +		watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable;
> > +		watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable;
> > +		watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe;
> > +	}
> > +	return 1;
> > +}
> > +
> > +__setup("hardlockup_detector=", set_hardlockup_detector_type);
> > +
> > +#endif
> > +
> >   # ifdef CONFIG_SMP
> >   int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
> >   # endif /* CONFIG_SMP */
> > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { }
> >    * softlockup watchdog start and stop. The detector must select the
> >    * SOFTLOCKUP_DETECTOR Kconfig.
> >    */
> > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> > +void __weak watchdog_hardlockup_enable(unsigned int cpu)
> > +{
> > +	if (watchdog_hardlockup_enable_ptr)
> > +		watchdog_hardlockup_enable_ptr(cpu);
> > +}
> 
> This is a weak function so it can be overloaded. What happens then, for
> instance if the sparc architecture version of watchdog_hardlockup_enable()
> is called instead ?
It is a historical problem; I prefer using an #if condition instead. 
I had considered sparc arch, if sparc version is called, it is expected.
Because the __weak functions only handle perf & buddy watchdog not the sparc
watchdog.

I think we should first resolve the consensus issue:
- Should we keep both perf and buddy watchdogs? (probably yes already)
- Should the watchdog type be changeable at boot time?
- Should the watchdog type be changeable at runtime?

How we handle these different watchdog types(maybe including sparc type)
depends on the answers to these questions.
What do you think?

> 
> > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> > +void __weak watchdog_hardlockup_disable(unsigned int cpu)
> > +{
> > +	if (watchdog_hardlockup_disable_ptr)
> > +		watchdog_hardlockup_disable_ptr(cpu);
> > +}
> >   /*
> >    * Watchdog-detector specific API.
> > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> >    */
> >   int __weak __init watchdog_hardlockup_probe(void)
> >   {
> > +	if (watchdog_hardlockup_probe_ptr)
> > +		return watchdog_hardlockup_probe_ptr();
> > +
> >   	return -ENODEV;
> >   }
> > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> > index ee754d767c21..390d89bfcafa 100644
> > --- a/kernel/watchdog_buddy.c
> > +++ b/kernel/watchdog_buddy.c
> > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu)
> >   	return next_cpu;
> >   }
> > -int __init watchdog_hardlockup_probe(void)
> > +int __init watchdog_buddy_hardlockup_probe(void)
> >   {
> >   	return 0;
> >   }
> > -void watchdog_hardlockup_enable(unsigned int cpu)
> > +void watchdog_buddy_hardlockup_enable(unsigned int cpu)
> >   {
> >   	unsigned int next_cpu;
> > +	pr_info("ddddd %s\n", __func__);
> 
> Leftover from debuging ?
Forgot to delete the log, will fix if a v2 is needed.
> 
> >   	/*
> >   	 * The new CPU will be marked online before the hrtimer interrupt
> >   	 * gets a chance to run on it. If another CPU tests for a
> > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> >   	cpumask_set_cpu(cpu, &watchdog_cpus);
> >   }
> > -void watchdog_hardlockup_disable(unsigned int cpu)
> > +void watchdog_buddy_hardlockup_disable(unsigned int cpu)
> >   {
> >   	unsigned int next_cpu = watchdog_next_cpu(cpu);
> > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> > index 9c58f5b4381d..270110e58f20 100644
> > --- a/kernel/watchdog_perf.c
> > +++ b/kernel/watchdog_perf.c
> > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void)
> >    * watchdog_hardlockup_enable - Enable the local event
> >    * @cpu: The CPU to enable hard lockup on.
> >    */
> > -void watchdog_hardlockup_enable(unsigned int cpu)
> > +void watchdog_perf_hardlockup_enable(unsigned int cpu)
> >   {
> >   	WARN_ON_ONCE(cpu != smp_processor_id());
> > +	pr_info("ddddd %s\n", __func__);
> > +
> >   	if (hardlockup_detector_event_create())
> >   		return;
> > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu)
> >    * watchdog_hardlockup_disable - Disable the local event
> >    * @cpu: The CPU to enable hard lockup on.
> >    */
> > -void watchdog_hardlockup_disable(unsigned int cpu)
> > +void watchdog_perf_hardlockup_disable(unsigned int cpu)
> >   {
> >   	struct perf_event *event = this_cpu_read(watchdog_ev);
> > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void)
> >   /**
> >    * watchdog_hardlockup_probe - Probe whether NMI event is available at all
> >    */
> > -int __init watchdog_hardlockup_probe(void)
> > +int __init watchdog_perf_hardlockup_probe(void)
> >   {
> >   	int ret;
> > +	pr_info("ddddd %s\n", __func__);
> > +
> >   	if (!arch_perf_nmi_is_available())
> >   		return -ENODEV;
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index dc0e0c6ed075..443353fad1c1 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR
> >   #
> >   # Note that arch-specific variants are always preferred.
> >   #
> > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > -	bool "Prefer the buddy CPU hardlockup detector"
> > -	depends on HARDLOCKUP_DETECTOR
> > -	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > -	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> > -	help
> > -	  Say Y here to prefer the buddy hardlockup detector over the perf one.
> > -
> > -	  With the buddy detector, each CPU uses its softlockup hrtimer
> > -	  to check that the next CPU is processing hrtimer interrupts by
> > -	  verifying that a counter is increasing.
> > -
> > -	  This hardlockup detector is useful on systems that don't have
> > -	  an arch-specific hardlockup detector or if resources needed
> > -	  for the hardlockup detector are better used for other things.
> > -
> >   config HARDLOCKUP_DETECTOR_PERF
> > -	bool
> > +	bool "Enable perf-based hard lockup detector (preferred)"
> >   	depends on HARDLOCKUP_DETECTOR
> > -	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> >   	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> >   	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +	help
> > +	  This detector uses a perf event on the CPU to detect when a CPU
> > +	  has become non-maskable interrupt (NMI) stuck. This is the
> > +	  preferred method on modern systems as it can detect lockups on
> > +	  all CPUs at the same time.
> >   config HARDLOCKUP_DETECTOR_BUDDY
> > -	bool
> > +	bool "Enable buddy-based hard lockup detector"
> >   	depends on HARDLOCKUP_DETECTOR
> >   	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > -	depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> >   	depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
> >   	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +	help
> > +	  This is an alternative lockup detector that uses a heartbeat
> > +	  mechanism between CPUs to detect when one has stopped responding.
> > +	  It is less precise than the perf-based detector and cannot detect
> > +	  all-CPU lockups, but it does not require a perf counter.
> > +
> > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > +	bool
> > +	depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY
> >   config HARDLOCKUP_DETECTOR_ARCH
> >   	bool
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  6:54     ` Jinchao Wang
@ 2025-10-06 20:13       ` Doug Anderson
  0 siblings, 0 replies; 22+ messages in thread
From: Doug Anderson @ 2025-10-06 20:13 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Christophe Leroy, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm,
	catalin.marinas, maddy, mpe, npiggin, tglx, mingo, bp,
	dave.hansen, hpa, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees,
	masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie,
	gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd,
	zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1,
	luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86,
	linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users,
	Ian Rogers

Hi,

On Tue, Sep 16, 2025 at 11:55 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>
> On Wed, Sep 17, 2025 at 08:08:57AM +0200, Christophe Leroy wrote:
> >
> >
> > Le 16/09/2025 à 16:50, Jinchao Wang a écrit :
> > > Currently, the hard lockup detector is selected at compile time via
> > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > This is inflexible, especially on systems where a perf event may not
> > > be available or may be needed for other tasks.
> > >
> > > This commit refactors the hard lockup detector to replace a rigid
> > > compile-time choice with a flexible build-time and boot-time solution.
> > > The patch supports building the kernel with either detector
> > > independently, or with both. When both are built, a new boot parameter
> > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > This is a more robust and user-friendly design.
> > >
> > > This patch is a follow-up to the discussion on the kernel mailing list
> > > regarding the preference and future of the hard lockup detectors. It
> > > implements a flexible solution that addresses the community's need to
> > > select an appropriate detector at boot time.
> > >
> > > The core changes are:
> > > - The `perf` and `buddy` watchdog implementations are separated into
> > >    distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > >    to serve as a single API for the entire feature.
> > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > >    user to select the desired detector at boot time.
> > > - The Kconfig options are simplified by removing the complex
> > >    `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > >    built without mutual exclusion.
> > > - The weak stubs are updated to call the new function pointers,
> > >    centralizing the watchdog logic.
> > >
> > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/
> > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/
> > >
> > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> > > ---
> > >   .../admin-guide/kernel-parameters.txt         |  7 +++
> > >   include/linux/nmi.h                           |  6 +++
> > >   kernel/watchdog.c                             | 46 ++++++++++++++++++-
> > >   kernel/watchdog_buddy.c                       |  7 +--
> > >   kernel/watchdog_perf.c                        | 10 ++--
> > >   lib/Kconfig.debug                             | 37 +++++++--------
> > >   6 files changed, 85 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 5a7a83c411e9..0af214ee566c 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1828,6 +1828,13 @@
> > >                     backtraces on all cpus.
> > >                     Format: 0 | 1
> > > +   hardlockup_detector=
> > > +                   [perf, buddy] Selects the hard lockup detector to use at
> > > +                   boot time.
> > > +                   Format: <string>
> > > +                   - "perf": Use the perf-based detector.
> > > +                   - "buddy": Use the buddy-based detector.
> > > +
> > >     hash_pointers=
> > >                     [KNL,EARLY]
> > >                     By default, when pointers are printed to the console
> > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > > index cf3c6ab408aa..9298980ce572 100644
> > > --- a/include/linux/nmi.h
> > > +++ b/include/linux/nmi.h
> > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
> > >   #endif
> > >   #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > +void watchdog_perf_hardlockup_enable(unsigned int cpu);
> > > +void watchdog_perf_hardlockup_disable(unsigned int cpu);
> > > +extern int watchdog_perf_hardlockup_probe(void);
> >
> > No 'extern' on function prototypes, this is pointless.
> Got it.
> >
> > >   extern void hardlockup_detector_perf_stop(void);
> > >   extern void hardlockup_detector_perf_restart(void);
> > >   extern void hardlockup_config_perf_event(const char *str);
> > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
> > >   void lockup_detector_reconfigure(void);
> > >   #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
> > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu);
> > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu);
> > > +int watchdog_buddy_hardlockup_probe(void);
> > >   void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
> > >   #else
> > >   static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index 80b56c002c7f..85451d24a77d 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> > >   #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> > > +/* The global function pointers */
> > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable;
> > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable;
> > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe;
> >
> > As this is set only once at startup, can we use static_call instead of
> > function pointers ?
> >
> > Also, can it me made __ro_after_init ?
> Not really, this is just an RFC patch, and there is no consensus yet.
> If it is included in the final consensus, I will handle it in the next version.
> >
> > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY)
> > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable;
> > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable;
> > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe;
> > > +#endif
> > > +
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE
> > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */

I'd still at least hope that "buddy" could be the default. While
"perf" can still be useful to catch some things that "buddy" can't,
"buddy" can also catch some things that perf can't so picking the
lower-overhead one as the default is perhaps better?


> I think we should first resolve the consensus issue:
> - Should we keep both perf and buddy watchdogs? (probably yes already)

Sounds like people want to keep perf, so "yes".


> - Should the watchdog type be changeable at boot time?

Seems like a good start.


> - Should the watchdog type be changeable at runtime?

If it was easy/possible then sure, but I don't think it's a giant deal
to have something like this only changeable at boot time, like your
patch does. The fact that your patch is pretty simple is definitely a
big win.


IMO it would be worth sending it out as a real patch.

-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-09-17  6:14           ` Jinchao Wang
@ 2025-10-06 21:29             ` Ian Rogers
  2025-10-06 23:24               ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-06 21:29 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Namhyung Kim, Doug Anderson, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin,
	christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme,
	mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
	kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur,
	ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>
> On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > >
> > > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > > This is inflexible, especially on systems where a perf event may not
> > > > > > be available or may be needed for other tasks.
> > > > > >
> > > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > > The patch supports building the kernel with either detector
> > > > > > independently, or with both. When both are built, a new boot parameter
> > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > > This is a more robust and user-friendly design.
> > > > > >
> > > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > > implements a flexible solution that addresses the community's need to
> > > > > > select an appropriate detector at boot time.
> > > > > >
> > > > > > The core changes are:
> > > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > > >   to serve as a single API for the entire feature.
> > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > > >   user to select the desired detector at boot time.
> > > > > > - The Kconfig options are simplified by removing the complex
> > > > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > > >   built without mutual exclusion.
> > > > > > - The weak stubs are updated to call the new function pointers,
> > > > > >   centralizing the watchdog logic.
> > > > >
> > > > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > > why this has to be a boot time option given the ability to configure
> > > > > via /proc/sys/kernel/nmi_watchdog.
> > > > The new hardlockup_detector boot parameter and the existing
> > > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > > >
> > > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > > This choice is made once at boot.
> > > >
> > > >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > > switch for the currently selected detector. It does not change the detector's
> > > > type.
> > >
> > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > > obvious naming reasons but also because we can't differentiate when a
> > > perf event has been taken or not - this impacts perf that is choosing
> > > not to group events in metrics because of it, reducing the metric's
> > > accuracy. We need an equivalent "buddy_watchdog" file to the
> > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > > disabled and the perf event one to be enabled. Similarly, if I did
> > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > > perf event watchdog to be disabled and the buddy one enabled. If I did
> > >  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > > enabled. I don't see why choosing the type of watchdog implementation
> > > at boot time is particularly desirable. It seems sensible to default
> > > normal people to using the buddy watchdog (more perf events, power...)
> > > and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > > the buddy watchdog) is needed so that user code has time to migrate.
> >
> > Sounds good to me.  For perf tools, it'd be great if we can have a run-
> > time check which watchdog is selected.
> Considering backward compatibility, I prefer to keep
> /proc/sys/kernel/nmi_watchdog and introduce a new file called
> /proc/sys/kernel/hardlockup_detector_type, which only shows the default string
> or the boot parameter.

Is there code using the buddy watchdog that cares about the
/proc/sys/kernel/nmi_watchdog file? My assumption is that everything
except Android is using the NMI watchdog, so a new
/proc/sys/kernel/buddy_watchdog file doesn't impact them. On Android
writing to /proc/sys/kernel/nmi_watchdog would switch from updating
the buddy watchdog enable/disable to the NMI watchdog enable/disable,
but it is a straightforward patch to make anything doing this update
the buddy_watchdog file instead.

If we have to keep "nmi_watchdog" then we should deprecate it and
create an equivalent file with a better name (ie without NMI in it).
It'll be moderately annoying in perf to determine whether the NMI
watchdog is enabled by having to read two files.

Thanks,
Ian

> The global str pointer hardlockup_detector_type was already introduced in the
> patch, so exposing it in a file is straightforward.
> >
> > Thanks,
> > Namhyung
> >
>
> --
> Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-06 21:29             ` Ian Rogers
@ 2025-10-06 23:24               ` Doug Anderson
  2025-10-07  1:00                 ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2025-10-06 23:24 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin,
	christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme,
	mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
	kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur,
	ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

Hi,

On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> > > Hello,
> > >
> > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > >
> > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > > >
> > > > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > > > This is inflexible, especially on systems where a perf event may not
> > > > > > > be available or may be needed for other tasks.
> > > > > > >
> > > > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > > > The patch supports building the kernel with either detector
> > > > > > > independently, or with both. When both are built, a new boot parameter
> > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > > > This is a more robust and user-friendly design.
> > > > > > >
> > > > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > > > implements a flexible solution that addresses the community's need to
> > > > > > > select an appropriate detector at boot time.
> > > > > > >
> > > > > > > The core changes are:
> > > > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > > > >   to serve as a single API for the entire feature.
> > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > > > >   user to select the desired detector at boot time.
> > > > > > > - The Kconfig options are simplified by removing the complex
> > > > > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > > > >   built without mutual exclusion.
> > > > > > > - The weak stubs are updated to call the new function pointers,
> > > > > > >   centralizing the watchdog logic.
> > > > > >
> > > > > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > > > why this has to be a boot time option given the ability to configure
> > > > > > via /proc/sys/kernel/nmi_watchdog.
> > > > > The new hardlockup_detector boot parameter and the existing
> > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > > > >
> > > > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > > > This choice is made once at boot.
> > > > >
> > > > >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > > > switch for the currently selected detector. It does not change the detector's
> > > > > type.
> > > >
> > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > > > obvious naming reasons but also because we can't differentiate when a
> > > > perf event has been taken or not - this impacts perf that is choosing
> > > > not to group events in metrics because of it, reducing the metric's
> > > > accuracy. We need an equivalent "buddy_watchdog" file to the
> > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > > > disabled and the perf event one to be enabled. Similarly, if I did
> > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > > > perf event watchdog to be disabled and the buddy one enabled. If I did
> > > >  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > > > enabled. I don't see why choosing the type of watchdog implementation
> > > > at boot time is particularly desirable. It seems sensible to default
> > > > normal people to using the buddy watchdog (more perf events, power...)
> > > > and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > > > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > > > the buddy watchdog) is needed so that user code has time to migrate.
> > >
> > > Sounds good to me.  For perf tools, it'd be great if we can have a run-
> > > time check which watchdog is selected.
> > Considering backward compatibility, I prefer to keep
> > /proc/sys/kernel/nmi_watchdog and introduce a new file called
> > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string
> > or the boot parameter.
>
> Is there code using the buddy watchdog that cares about the
> /proc/sys/kernel/nmi_watchdog file? My assumption is that everything
> except Android is using the NMI watchdog, so a new
> /proc/sys/kernel/buddy_watchdog file doesn't impact them.

Buddy watchdog has been out there for a few years. At Google, I know
it's used by everything except Android. AKA I believe it is used in
Google's servers and also in ChromeOS. Both of those (presumably)
enable the buddy watchdog via calling it "nmi_watchdog". It's possible
that some Android phones are using the buddy watchdog too but I'm not
aware of it. I don't believe Pixel is using it, though that could
change in the future.

IMO at this point "nmi watchdog" is simply a synonym for the
hardlockup detector. That was what it looked like in the code before I
started messing around and adding the buddy lockup detector and it's
how it is now. While it's unfortunate that there are two names for the
same thing, I don't personally think that should change at this point.
FWIW, even the "buddy" watchdog relies on NMIs to actually get stack
crawls on stuck cores, so NMI still means something even there.

If we want to tell between the perf detector or the buddy detector we
should add a separate file for it.

> On Android
> writing to /proc/sys/kernel/nmi_watchdog would switch from updating
> the buddy watchdog enable/disable to the NMI watchdog enable/disable,
> but it is a straightforward patch to make anything doing this update
> the buddy_watchdog file instead.

Straightforward, but you've got to go find everyone that you break by
doing this. That's not something I want responsibility for. If you
want to convince others this is something worthwhile and you've got
someone signed up to deal with the fallout (if any) then I won't
object, but it's not something I'd be in support of.


> If we have to keep "nmi_watchdog" then we should deprecate it and
> create an equivalent file with a better name (ie without NMI in it).
> It'll be moderately annoying in perf to determine whether the NMI
> watchdog is enabled by having to read two files.

Again, up to you if you want to try to do this, but I'm not really in
support of it. It doesn't seem terribly hard to make a new file that
says which hardlockup detector is in use. If that file exists then
read it. If not then you fallback to what you have today.

-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-06 23:24               ` Doug Anderson
@ 2025-10-07  1:00                 ` Ian Rogers
  2025-10-07 19:54                   ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-07  1:00 UTC (permalink / raw)
  To: Doug Anderson, Greg Kroah-Hartman
  Cc: Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon,
	Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin,
	christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme,
	mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
	kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur,
	ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Mon, Oct 6, 2025 at 4:32 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > > > > This is inflexible, especially on systems where a perf event may not
> > > > > > > > be available or may be needed for other tasks.
> > > > > > > >
> > > > > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > > > > The patch supports building the kernel with either detector
> > > > > > > > independently, or with both. When both are built, a new boot parameter
> > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > > > > This is a more robust and user-friendly design.
> > > > > > > >
> > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > > > > implements a flexible solution that addresses the community's need to
> > > > > > > > select an appropriate detector at boot time.
> > > > > > > >
> > > > > > > > The core changes are:
> > > > > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > > > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > > > > >   to serve as a single API for the entire feature.
> > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > > > > >   user to select the desired detector at boot time.
> > > > > > > > - The Kconfig options are simplified by removing the complex
> > > > > > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > > > > >   built without mutual exclusion.
> > > > > > > > - The weak stubs are updated to call the new function pointers,
> > > > > > > >   centralizing the watchdog logic.
> > > > > > >
> > > > > > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > > > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > > > > why this has to be a boot time option given the ability to configure
> > > > > > > via /proc/sys/kernel/nmi_watchdog.
> > > > > > The new hardlockup_detector boot parameter and the existing
> > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > > > > >
> > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > > > > This choice is made once at boot.
> > > > > >
> > > > > >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > > > > switch for the currently selected detector. It does not change the detector's
> > > > > > type.
> > > > >
> > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > > > > obvious naming reasons but also because we can't differentiate when a
> > > > > perf event has been taken or not - this impacts perf that is choosing
> > > > > not to group events in metrics because of it, reducing the metric's
> > > > > accuracy. We need an equivalent "buddy_watchdog" file to the
> > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > > > > disabled and the perf event one to be enabled. Similarly, if I did
> > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > > > > perf event watchdog to be disabled and the buddy one enabled. If I did
> > > > >  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > > > > enabled. I don't see why choosing the type of watchdog implementation
> > > > > at boot time is particularly desirable. It seems sensible to default
> > > > > normal people to using the buddy watchdog (more perf events, power...)
> > > > > and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > > > > the buddy watchdog) is needed so that user code has time to migrate.
> > > >
> > > > Sounds good to me.  For perf tools, it'd be great if we can have a run-
> > > > time check which watchdog is selected.
> > > Considering backward compatibility, I prefer to keep
> > > /proc/sys/kernel/nmi_watchdog and introduce a new file called
> > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string
> > > or the boot parameter.
> >
> > Is there code using the buddy watchdog that cares about the
> > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything
> > except Android is using the NMI watchdog, so a new
> > /proc/sys/kernel/buddy_watchdog file doesn't impact them.
>
> Buddy watchdog has been out there for a few years. At Google, I know
> it's used by everything except Android. AKA I believe it is used in
> Google's servers and also in ChromeOS. Both of those (presumably)
> enable the buddy watchdog via calling it "nmi_watchdog". It's possible
> that some Android phones are using the buddy watchdog too but I'm not
> aware of it. I don't believe Pixel is using it, though that could
> change in the future.

I thought what motivated the buddy watchdog was patches implementing
this approach on Android for ARM that lacked an NMI based hard lockup
detector?
Anyway, while the buddy watchdog is in use by Google servers the
nmi_watchdog file has an actively detrimental effect on that.
Specifically the nmi_watchdog file having the value of "1" disables
certain event groups for certain metrics in the perf tool as it is
assumed there are too few performance counters due to the NMI watchdog
stealing one. We want groups of events so that events are scheduled
together and metrics are more accurate, we don't want groups that fail
to schedule because of a lack of counters.

> IMO at this point "nmi watchdog" is simply a synonym for the
> hardlockup detector. That was what it looked like in the code before I
> started messing around and adding the buddy lockup detector and it's
> how it is now. While it's unfortunate that there are two names for the
> same thing, I don't personally think that should change at this point.
> FWIW, even the "buddy" watchdog relies on NMIs to actually get stack
> crawls on stuck cores, so NMI still means something even there.

I think this is misguided. Currently the only use of nmi_watchdog I'm
aware of is by perf where the buddy watchdog's use of it causes issues
(as mentioned above).

> If we want to tell between the perf detector or the buddy detector we
> should add a separate file for it.

We could with perf then having to read from two files rather than one.
Presumably the lack of presence of one file will be sufficient to also
avoid a kernel version check.

> > On Android
> > writing to /proc/sys/kernel/nmi_watchdog would switch from updating
> > the buddy watchdog enable/disable to the NMI watchdog enable/disable,
> > but it is a straightforward patch to make anything doing this update
> > the buddy_watchdog file instead.
>
> Straightforward, but you've got to go find everyone that you break by
> doing this. That's not something I want responsibility for. If you
> want to convince others this is something worthwhile and you've got
> someone signed up to deal with the fallout (if any) then I won't
> object, but it's not something I'd be in support of.

Stuff like this happens, check out this thread:
https://lore.kernel.org/lkml/2025020304-chip-trench-4e56@gregkh/
Imo we shouldn't design in using an actively misleading file name and
incurring extra overhead in perf. Having two files nmi_watchdog and
buddy_watchdog is fine as the latter case currently isn't in
mainstream distro use and people shouldn't care. It also maintains and
correct's perf's behavior when the buddy and not nmi watchdog is in
use.

> > If we have to keep "nmi_watchdog" then we should deprecate it and
> > create an equivalent file with a better name (ie without NMI in it).
> > It'll be moderately annoying in perf to determine whether the NMI
> > watchdog is enabled by having to read two files.
>
> Again, up to you if you want to try to do this, but I'm not really in
> support of it. It doesn't seem terribly hard to make a new file that
> says which hardlockup detector is in use. If that file exists then
> read it. If not then you fallback to what you have today.

I don't mind a file that also says what kind of lockup detector is in
use. I'd like the meaning of nmi_watchdog to keep meaning the kernel
stole a perf counter as this is the behavior long assumed by perf
which I think is the majority or only user of the file. I think the
buddy watchdog being controlled by this file is less than intention
revealing.

Thanks,
Ian

> -Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07  1:00                 ` Ian Rogers
@ 2025-10-07 19:54                   ` Doug Anderson
  2025-10-07 20:43                     ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2025-10-07 19:54 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

Hi,

On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Oct 6, 2025 at 4:32 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote:
> > > > > Hello,
> > > > >
> > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote:
> > > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote:
> > > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Currently, the hard lockup detector is selected at compile time via
> > > > > > > > > Kconfig, which requires a kernel rebuild to switch implementations.
> > > > > > > > > This is inflexible, especially on systems where a perf event may not
> > > > > > > > > be available or may be needed for other tasks.
> > > > > > > > >
> > > > > > > > > This commit refactors the hard lockup detector to replace a rigid
> > > > > > > > > compile-time choice with a flexible build-time and boot-time solution.
> > > > > > > > > The patch supports building the kernel with either detector
> > > > > > > > > independently, or with both. When both are built, a new boot parameter
> > > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time.
> > > > > > > > > This is a more robust and user-friendly design.
> > > > > > > > >
> > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list
> > > > > > > > > regarding the preference and future of the hard lockup detectors. It
> > > > > > > > > implements a flexible solution that addresses the community's need to
> > > > > > > > > select an appropriate detector at boot time.
> > > > > > > > >
> > > > > > > > > The core changes are:
> > > > > > > > > - The `perf` and `buddy` watchdog implementations are separated into
> > > > > > > > >   distinct functions (e.g., `watchdog_perf_hardlockup_enable`).
> > > > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`)
> > > > > > > > >   to serve as a single API for the entire feature.
> > > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the
> > > > > > > > >   user to select the desired detector at boot time.
> > > > > > > > > - The Kconfig options are simplified by removing the complex
> > > > > > > > >   `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be
> > > > > > > > >   built without mutual exclusion.
> > > > > > > > > - The weak stubs are updated to call the new function pointers,
> > > > > > > > >   centralizing the watchdog logic.
> > > > > > > >
> > > > > > > > What is the impact on  /proc/sys/kernel/nmi_watchdog ? Is that
> > > > > > > > enabling and disabling whatever the boot time choice was? I'm not sure
> > > > > > > > why this has to be a boot time option given the ability to configure
> > > > > > > > via /proc/sys/kernel/nmi_watchdog.
> > > > > > > The new hardlockup_detector boot parameter and the existing
> > > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes.
> > > > > > >
> > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy).
> > > > > > > This choice is made once at boot.
> > > > > > >
> > > > > > >  /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off
> > > > > > > switch for the currently selected detector. It does not change the detector's
> > > > > > > type.
> > > > > >
> > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly
> > > > > > obvious naming reasons but also because we can't differentiate when a
> > > > > > perf event has been taken or not - this impacts perf that is choosing
> > > > > > not to group events in metrics because of it, reducing the metric's
> > > > > > accuracy. We need an equivalent "buddy_watchdog" file to the
> > > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 >
> > > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be
> > > > > > disabled and the perf event one to be enabled. Similarly, if I did
> > > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the
> > > > > > perf event watchdog to be disabled and the buddy one enabled. If I did
> > > > > >  "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 >
> > > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be
> > > > > > enabled. I don't see why choosing the type of watchdog implementation
> > > > > > at boot time is particularly desirable. It seems sensible to default
> > > > > > normal people to using the buddy watchdog (more perf events, power...)
> > > > > > and  CONFIG_DEBUG_KERNEL type people to using the perf event one. As
> > > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog,
> > > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls
> > > > > > the buddy watchdog) is needed so that user code has time to migrate.
> > > > >
> > > > > Sounds good to me.  For perf tools, it'd be great if we can have a run-
> > > > > time check which watchdog is selected.
> > > > Considering backward compatibility, I prefer to keep
> > > > /proc/sys/kernel/nmi_watchdog and introduce a new file called
> > > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string
> > > > or the boot parameter.
> > >
> > > Is there code using the buddy watchdog that cares about the
> > > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything
> > > except Android is using the NMI watchdog, so a new
> > > /proc/sys/kernel/buddy_watchdog file doesn't impact them.
> >
> > Buddy watchdog has been out there for a few years. At Google, I know
> > it's used by everything except Android. AKA I believe it is used in
> > Google's servers and also in ChromeOS. Both of those (presumably)
> > enable the buddy watchdog via calling it "nmi_watchdog". It's possible
> > that some Android phones are using the buddy watchdog too but I'm not
> > aware of it. I don't believe Pixel is using it, though that could
> > change in the future.
>
> I thought what motivated the buddy watchdog was patches implementing
> this approach on Android for ARM that lacked an NMI based hard lockup
> detector?

It's probably not worth a full rehash of the history, but suffice to
say that the buddy lockup detector did originally come out of Android
but isn't currently used there as far as I can tell.


> Anyway, while the buddy watchdog is in use by Google servers the
> nmi_watchdog file has an actively detrimental effect on that.
> Specifically the nmi_watchdog file having the value of "1" disables
> certain event groups for certain metrics in the perf tool as it is
> assumed there are too few performance counters due to the NMI watchdog
> stealing one. We want groups of events so that events are scheduled
> together and metrics are more accurate, we don't want groups that fail
> to schedule because of a lack of counters.

Fair enough, but that code could also change.


> > IMO at this point "nmi watchdog" is simply a synonym for the
> > hardlockup detector. That was what it looked like in the code before I
> > started messing around and adding the buddy lockup detector and it's
> > how it is now. While it's unfortunate that there are two names for the
> > same thing, I don't personally think that should change at this point.
> > FWIW, even the "buddy" watchdog relies on NMIs to actually get stack
> > crawls on stuck cores, so NMI still means something even there.
>
> I think this is misguided. Currently the only use of nmi_watchdog I'm
> aware of is by perf where the buddy watchdog's use of it causes issues
> (as mentioned above).
>
> > If we want to tell between the perf detector or the buddy detector we
> > should add a separate file for it.
>
> We could with perf then having to read from two files rather than one.
> Presumably the lack of presence of one file will be sufficient to also
> avoid a kernel version check.

This was what I was assuming


> > > On Android
> > > writing to /proc/sys/kernel/nmi_watchdog would switch from updating
> > > the buddy watchdog enable/disable to the NMI watchdog enable/disable,
> > > but it is a straightforward patch to make anything doing this update
> > > the buddy_watchdog file instead.
> >
> > Straightforward, but you've got to go find everyone that you break by
> > doing this. That's not something I want responsibility for. If you
> > want to convince others this is something worthwhile and you've got
> > someone signed up to deal with the fallout (if any) then I won't
> > object, but it's not something I'd be in support of.
>
> Stuff like this happens, check out this thread:
> https://lore.kernel.org/lkml/2025020304-chip-trench-4e56@gregkh/
> Imo we shouldn't design in using an actively misleading file name and
> incurring extra overhead in perf. Having two files nmi_watchdog and
> buddy_watchdog is fine as the latter case currently isn't in
> mainstream distro use and people shouldn't care. It also maintains and
> correct's perf's behavior when the buddy and not nmi watchdog is in
> use.
>
> > > If we have to keep "nmi_watchdog" then we should deprecate it and
> > > create an equivalent file with a better name (ie without NMI in it).
> > > It'll be moderately annoying in perf to determine whether the NMI
> > > watchdog is enabled by having to read two files.
> >
> > Again, up to you if you want to try to do this, but I'm not really in
> > support of it. It doesn't seem terribly hard to make a new file that
> > says which hardlockup detector is in use. If that file exists then
> > read it. If not then you fallback to what you have today.
>
> I don't mind a file that also says what kind of lockup detector is in
> use. I'd like the meaning of nmi_watchdog to keep meaning the kernel
> stole a perf counter as this is the behavior long assumed by perf
> which I think is the majority or only user of the file. I think the
> buddy watchdog being controlled by this file is less than intention
> revealing.

I'm more than happy to be outvoted, but IMO nothing about the name
"nmi_watchdog" says to me that a perf counter was stolen. It just says
that there's a watchdog that NMIs are part of its work.

...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far
as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks
`watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did
commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog"
constants/function") it was checking `watchdog_enabled &
NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think...

...and you can see "PPC_WATCHDOG" depends on
"HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not
perf-backed.

I think this is just the critical question: does/should "nmi_watchdog"
mean perf-backed detector or is "nmi_watchdog" just a synonym for
"hard lockup detector". In your mind it's the former and in my mind
it's the latter. The correct way forward depends on which way we want
to jump, right? In either case some external code is going to need to
change...

The kernel docs are certainly pretty ambiguous here. "kernel.rst"
certainly mentions perf, but it also says that "nmi_watchdog" is only
for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't
say anything about perf and just seems to indicate that "nmi_watchdog"
is another name for the hardlockup detector.  "kernel-parameters.txt"
makes some mention of tweaking which perf event would be used, but
otherwise makes it sound like the nmi_watchdog is just another name
for the hardlockup detector.

My vote would be two files:

* "nmi_watchdog" - simply a synonym for "hardlockup detector". 1
enabled, 0 if not (like today)

* "hardlock_detector" - could be "buddy", "perf", or "arch"

...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog"
is enabled, you could probably guess that a perf event was in use. If
the "hardlockup_detector" file exists then a perf event is only in use
if the value of that file is "perf". It doesn't feel terribly hard to
do this and it doesn't break anyone who was assuming they could turn
on the hardlockup detector with "/proc/nmi_watchdog".

-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07 19:54                   ` Doug Anderson
@ 2025-10-07 20:43                     ` Ian Rogers
  2025-10-07 21:43                       ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-07 20:43 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Oct 7, 2025 at 1:00 PM Doug Anderson <dianders@chromium.org> wrote:
> On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote:
..
> > I don't mind a file that also says what kind of lockup detector is in
> > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel
> > stole a perf counter as this is the behavior long assumed by perf
> > which I think is the majority or only user of the file. I think the
> > buddy watchdog being controlled by this file is less than intention
> > revealing.
>
> I'm more than happy to be outvoted, but IMO nothing about the name
> "nmi_watchdog" says to me that a perf counter was stolen. It just says
> that there's a watchdog that NMIs are part of its work.
>
> ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far
> as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks
> `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did
> commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog"
> constants/function") it was checking `watchdog_enabled &
> NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think...
>
> ...and you can see "PPC_WATCHDOG" depends on
> "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not
> perf-backed.
>
> I think this is just the critical question: does/should "nmi_watchdog"
> mean perf-backed detector or is "nmi_watchdog" just a synonym for
> "hard lockup detector". In your mind it's the former and in my mind
> it's the latter. The correct way forward depends on which way we want
> to jump, right? In either case some external code is going to need to
> change...

So we could say it is one-idea against another, I'm hoping to be
objective and come from the viewpoint of the perf code as part of its
function is to be a demonstration of APIs and the buddy watchdog has
altered this. The perf tool has referred to the nmi_watchdog's
behavior in man pages and code since 2016:
http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org
```
+Globally pinned events can limit the number of counters available for
+other groups. On x86 systems, the NMI watchdog pins a counter by default.
+The nmi watchdog can be disabled as root with
+
+       echo 0 > /proc/sys/kernel/nmi_watchdog
```
http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
```
+void arch_topdown_group_warn(void)
+{
+       fprintf(stderr,
+               "nmi_watchdog enabled with topdown. May give wrong results.\n"
+               "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
+}
```

Probably the most common error message dates back to 2017:
http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic
```
      Some events weren't counted. Try disabling the NMI watchdog:
           echo 0 > /proc/sys/kernel/nmi_watchdog
           perf stat ...
           echo 1 > /proc/sys/kernel/nmi_watchdog
```
and that is saying "NMI watchdog" not "buddy watchdog". Users are
familiar with the idea that the /proc/sys/kernel/nmi_watchdog is
unremarkably controlling the NMI watchdog.

I've not found another use of /proc/sys/kernel/nmi_watchdog outside of
the perf tool.

> The kernel docs are certainly pretty ambiguous here. "kernel.rst"
> certainly mentions perf, but it also says that "nmi_watchdog" is only
> for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't
> say anything about perf and just seems to indicate that "nmi_watchdog"
> is another name for the hardlockup detector.  "kernel-parameters.txt"
> makes some mention of tweaking which perf event would be used, but
> otherwise makes it sound like the nmi_watchdog is just another name
> for the hardlockup detector.
>
> My vote would be two files:
>
> * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1
> enabled, 0 if not (like today)
>
> * "hardlock_detector" - could be "buddy", "perf", or "arch"
>
> ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog"
> is enabled, you could probably guess that a perf event was in use. If
> the "hardlockup_detector" file exists then a perf event is only in use
> if the value of that file is "perf". It doesn't feel terribly hard to
> do this and it doesn't break anyone who was assuming they could turn
> on the hardlockup detector with "/proc/nmi_watchdog".

It is not hard but:
1) it means whenever perf wants to determine the NMI watchdog is
present it needs to read two files rather than the existing 1, which
has some runtime cost;
2) the name nmi_watchdog for controlling the behavior of the buddy
watchdog isn't intention revealing as the buddy mechanisms whole point
is to avoid the NMI;
3) old perf tools with the buddy watchdog have the wrong behavior
(they've regressed).

It is also not hard to have the watchdog files named
<mechanism>_watchdog, such as buddy_watchdog, nmi_watchdog,
arch_watchdog and have the contents be 0 or 1 as appropriate. Such a
scheme would have less overhead for perf, make the name more intention
revealing and not alter old perf tools. I'm really not sure what
problem we're trying to fix by not adopting this approach. I was
surprised the buddy watchdog merged but using the nmi_watchdog file.

Thanks,
Ian

> -Doug
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07 20:43                     ` Ian Rogers
@ 2025-10-07 21:43                       ` Doug Anderson
  2025-10-07 22:45                         ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2025-10-07 21:43 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

Hi,

On Tue, Oct 7, 2025 at 1:43 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Oct 7, 2025 at 1:00 PM Doug Anderson <dianders@chromium.org> wrote:
> > On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote:
> ..
> > > I don't mind a file that also says what kind of lockup detector is in
> > > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel
> > > stole a perf counter as this is the behavior long assumed by perf
> > > which I think is the majority or only user of the file. I think the
> > > buddy watchdog being controlled by this file is less than intention
> > > revealing.
> >
> > I'm more than happy to be outvoted, but IMO nothing about the name
> > "nmi_watchdog" says to me that a perf counter was stolen. It just says
> > that there's a watchdog that NMIs are part of its work.
> >
> > ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far
> > as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks
> > `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did
> > commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog"
> > constants/function") it was checking `watchdog_enabled &
> > NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think...
> >
> > ...and you can see "PPC_WATCHDOG" depends on
> > "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not
> > perf-backed.
> >
> > I think this is just the critical question: does/should "nmi_watchdog"
> > mean perf-backed detector or is "nmi_watchdog" just a synonym for
> > "hard lockup detector". In your mind it's the former and in my mind
> > it's the latter. The correct way forward depends on which way we want
> > to jump, right? In either case some external code is going to need to
> > change...
>
> So we could say it is one-idea against another, I'm hoping to be
> objective and come from the viewpoint of the perf code as part of its
> function is to be a demonstration of APIs and the buddy watchdog has
> altered this. The perf tool has referred to the nmi_watchdog's
> behavior in man pages and code since 2016:
> http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org
> ```
> +Globally pinned events can limit the number of counters available for
> +other groups. On x86 systems, the NMI watchdog pins a counter by default.
> +The nmi watchdog can be disabled as root with
> +
> +       echo 0 > /proc/sys/kernel/nmi_watchdog
> ```
> http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
> ```
> +void arch_topdown_group_warn(void)
> +{
> +       fprintf(stderr,
> +               "nmi_watchdog enabled with topdown. May give wrong results.\n"
> +               "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
> +}
> ```
>
> Probably the most common error message dates back to 2017:
> http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic
> ```
>       Some events weren't counted. Try disabling the NMI watchdog:
>            echo 0 > /proc/sys/kernel/nmi_watchdog
>            perf stat ...
>            echo 1 > /proc/sys/kernel/nmi_watchdog
> ```
> and that is saying "NMI watchdog" not "buddy watchdog". Users are
> familiar with the idea that the /proc/sys/kernel/nmi_watchdog is
> unremarkably controlling the NMI watchdog.
>
> I've not found another use of /proc/sys/kernel/nmi_watchdog outside of
> the perf tool.

Although it's possible I missed it, from my quick look in ChromeOS, I
don't see any use of `/proc/sys/kernel/nmi_watchdog` either. NOTE that
you also need to look for `sysctl` references to
`kernel.nmi_watchdog`, right? That's essentially a way to access the
exact same file...

The other thing you need to think about is the kernel command-line
parameter. Right now we have the kernel command line parameter
`nmi_watchdog=<0,1>` that can be used to turn the hardlockup detector
on or off. On PowerPC I believe it turns on/off the ARCH hardlockup
detector. On systems with the buddy hardlockup detector it turns
on/off the buddy hardlockup detector. How are you proposing to handle
that? Are you going to make the kernel command line parameter still
affect everyone, or require people using the kernel command line
parameter for buddy/powerpc to change?

All of these ways to turn off/on the hardlockup detector can, in
theory, also be things that end users are messing with. People may
have settings in their `/etc/sysctl.d` to tweak things. They may have
messed with the kernel command line in their bootloader to tweak this
setting. I'm not saying it's impossible to change, but anyone who
changes this has to be prepared for people to yell.


> > The kernel docs are certainly pretty ambiguous here. "kernel.rst"
> > certainly mentions perf, but it also says that "nmi_watchdog" is only
> > for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't
> > say anything about perf and just seems to indicate that "nmi_watchdog"
> > is another name for the hardlockup detector.  "kernel-parameters.txt"
> > makes some mention of tweaking which perf event would be used, but
> > otherwise makes it sound like the nmi_watchdog is just another name
> > for the hardlockup detector.
> >
> > My vote would be two files:
> >
> > * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1
> > enabled, 0 if not (like today)
> >
> > * "hardlock_detector" - could be "buddy", "perf", or "arch"
> >
> > ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog"
> > is enabled, you could probably guess that a perf event was in use. If
> > the "hardlockup_detector" file exists then a perf event is only in use
> > if the value of that file is "perf". It doesn't feel terribly hard to
> > do this and it doesn't break anyone who was assuming they could turn
> > on the hardlockup detector with "/proc/nmi_watchdog".
>
> It is not hard but:
> 1) it means whenever perf wants to determine the NMI watchdog is
> present it needs to read two files rather than the existing 1, which
> has some runtime cost;
> 2) the name nmi_watchdog for controlling the behavior of the buddy
> watchdog isn't intention revealing as the buddy mechanisms whole point
> is to avoid the NMI;

The buddy mechanism is to avoid the use of a perf counter, not an NMI
(non maskable interrupt). "NMI" != "perf counter". Many types of
interrupts can be non-maskable (or pseudo-non-maskable). The buddy
lockup detector still uses an NMI (indirectly). Specifically, a
non-maskable IPI (NMI-backed IPI) is used to interrupt the other CPU.

You keep saying that perf wants to determine if the "NMI watchdog" is
present. That's not what it wants to know. It wants to know if the
hard lockup detector (which likely uses NMIs in some way) happens to
use up a perf event. If an NMI-backed hardlockup detector (presumably
like the one on PowerPC) wasn't using a perf counter then perf
wouldn't care.


> 3) old perf tools with the buddy watchdog have the wrong behavior
> (they've regressed).
>
> It is also not hard to have the watchdog files named
> <mechanism>_watchdog, such as buddy_watchdog, nmi_watchdog,
> arch_watchdog and have the contents be 0 or 1 as appropriate. Such a
> scheme would have less overhead for perf, make the name more intention
> revealing and not alter old perf tools. I'm really not sure what
> problem we're trying to fix by not adopting this approach. I was
> surprised the buddy watchdog merged but using the nmi_watchdog file.

The buddy watchdog was pretty much following the conventions that were
already in the code: that the hardlockup detector (whether backed by
perf or not) was essentially called the "nmi watchdog". There were a
number of people that were involved in reviews and I don't believe
suggesting creating a whole different mechanism for enabling /
disabling the buddy watchdog was never suggested.
-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07 21:43                       ` Doug Anderson
@ 2025-10-07 22:45                         ` Ian Rogers
  2025-10-07 22:58                           ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-07 22:45 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
...
> The buddy watchdog was pretty much following the conventions that were
> already in the code: that the hardlockup detector (whether backed by
> perf or not) was essentially called the "nmi watchdog". There were a
> number of people that were involved in reviews and I don't believe
> suggesting creating a whole different mechanism for enabling /
> disabling the buddy watchdog was never suggested.

I suspect they lacked the context that 1 in the nmi_watchdog is taken
to mean there's a perf event in use by the kernel with implications on
how group events behave. This behavior has been user
visible/advertised for 9 years. I don't doubt that there were good
intentions by PowerPC's watchdog and in the buddy watchdog patches in
using the file, that use will lead to spurious warnings and behaviors
by perf.

My points remain:
1) using multiple files regresses perf's performance;
2) the file name by its meaning is wrong;
3) old perf tools on new kernels won't behave as expected wrt warnings
and metrics because the meaning of the file has changed.
Using a separate file for each watchdog resolves this. It seems that
there wasn't enough critical mass for getting this right to have
mattered before, but that doesn't mean we shouldn't get it right now.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07 22:45                         ` Ian Rogers
@ 2025-10-07 22:58                           ` Doug Anderson
  2025-10-08  0:11                             ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2025-10-07 22:58 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

Hi,

On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
> ...
> > The buddy watchdog was pretty much following the conventions that were
> > already in the code: that the hardlockup detector (whether backed by
> > perf or not) was essentially called the "nmi watchdog". There were a
> > number of people that were involved in reviews and I don't believe
> > suggesting creating a whole different mechanism for enabling /
> > disabling the buddy watchdog was never suggested.
>
> I suspect they lacked the context that 1 in the nmi_watchdog is taken
> to mean there's a perf event in use by the kernel with implications on
> how group events behave. This behavior has been user
> visible/advertised for 9 years. I don't doubt that there were good
> intentions by PowerPC's watchdog and in the buddy watchdog patches in
> using the file, that use will lead to spurious warnings and behaviors
> by perf.
>
> My points remain:
> 1) using multiple files regresses perf's performance;
> 2) the file name by its meaning is wrong;
> 3) old perf tools on new kernels won't behave as expected wrt warnings
> and metrics because the meaning of the file has changed.
> Using a separate file for each watchdog resolves this. It seems that
> there wasn't enough critical mass for getting this right to have
> mattered before, but that doesn't mean we shouldn't get it right now.

Presumably your next steps then are to find someone to submit a patch
and try to convince others on the list that this is a good idea. The
issue with perf has been known for a while now and I haven't seen any
patches. As I've said, I won't stand in the way if everyone else
agrees, but given that I'm still not convinced I'm not going to author
any patches for this myself.

-Doug

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-07 22:58                           ` Doug Anderson
@ 2025-10-08  0:11                             ` Ian Rogers
  2025-10-09  6:50                               ` Jinchao Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-08  0:11 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
> > ...
> > > The buddy watchdog was pretty much following the conventions that were
> > > already in the code: that the hardlockup detector (whether backed by
> > > perf or not) was essentially called the "nmi watchdog". There were a
> > > number of people that were involved in reviews and I don't believe
> > > suggesting creating a whole different mechanism for enabling /
> > > disabling the buddy watchdog was never suggested.
> >
> > I suspect they lacked the context that 1 in the nmi_watchdog is taken
> > to mean there's a perf event in use by the kernel with implications on
> > how group events behave. This behavior has been user
> > visible/advertised for 9 years. I don't doubt that there were good
> > intentions by PowerPC's watchdog and in the buddy watchdog patches in
> > using the file, that use will lead to spurious warnings and behaviors
> > by perf.
> >
> > My points remain:
> > 1) using multiple files regresses perf's performance;
> > 2) the file name by its meaning is wrong;
> > 3) old perf tools on new kernels won't behave as expected wrt warnings
> > and metrics because the meaning of the file has changed.
> > Using a separate file for each watchdog resolves this. It seems that
> > there wasn't enough critical mass for getting this right to have
> > mattered before, but that doesn't mean we shouldn't get it right now.
>
> Presumably your next steps then are to find someone to submit a patch
> and try to convince others on the list that this is a good idea. The
> issue with perf has been known for a while now and I haven't seen any
> patches. As I've said, I won't stand in the way if everyone else
> agrees, but given that I'm still not convinced I'm not going to author
> any patches for this myself.

Writing >1 of:
```
static struct ctl_table watchdog_hardlockup_sysctl[] = {
{
.procname       = "nmi_watchdog",
.data = &watchdog_hardlockup_user_enabled,
.maxlen = sizeof(int),
.mode = 0444,
.proc_handler   = proc_nmi_watchdog,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
};
```
is an exercise of copy-and-paste, if you need me to do the copy and
pasting then it is okay.

Thanks,
Ian


> -Doug
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-08  0:11                             ` Ian Rogers
@ 2025-10-09  6:50                               ` Jinchao Wang
  2025-10-09 13:22                                 ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Jinchao Wang @ 2025-10-09  6:50 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote:
> On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
> > > ...
> > > > The buddy watchdog was pretty much following the conventions that were
> > > > already in the code: that the hardlockup detector (whether backed by
> > > > perf or not) was essentially called the "nmi watchdog". There were a
> > > > number of people that were involved in reviews and I don't believe
> > > > suggesting creating a whole different mechanism for enabling /
> > > > disabling the buddy watchdog was never suggested.
> > >
> > > I suspect they lacked the context that 1 in the nmi_watchdog is taken
> > > to mean there's a perf event in use by the kernel with implications on
> > > how group events behave. This behavior has been user
> > > visible/advertised for 9 years. I don't doubt that there were good
> > > intentions by PowerPC's watchdog and in the buddy watchdog patches in
> > > using the file, that use will lead to spurious warnings and behaviors
> > > by perf.
> > >
> > > My points remain:
> > > 1) using multiple files regresses perf's performance;
> > > 2) the file name by its meaning is wrong;
> > > 3) old perf tools on new kernels won't behave as expected wrt warnings
> > > and metrics because the meaning of the file has changed.
> > > Using a separate file for each watchdog resolves this. It seems that
> > > there wasn't enough critical mass for getting this right to have
> > > mattered before, but that doesn't mean we shouldn't get it right now.
> >
> > Presumably your next steps then are to find someone to submit a patch
> > and try to convince others on the list that this is a good idea. The
> > issue with perf has been known for a while now and I haven't seen any
> > patches. As I've said, I won't stand in the way if everyone else
> > agrees, but given that I'm still not convinced I'm not going to author
> > any patches for this myself.
> 
> Writing >1 of:
> ```
> static struct ctl_table watchdog_hardlockup_sysctl[] = {
> {
> .procname       = "nmi_watchdog",
> .data = &watchdog_hardlockup_user_enabled,
> .maxlen = sizeof(int),
> .mode = 0444,
> .proc_handler   = proc_nmi_watchdog,
> .extra1 = SYSCTL_ZERO,
> .extra2 = SYSCTL_ONE,
> },
> };
> ```
> is an exercise of copy-and-paste, if you need me to do the copy and
> pasting then it is okay.
Can we get whether a perf event is already in use directly from the
perf subsystem? There may be (or will be) other kernel users of
perf_event besides the NMI watchdog. Exposing that state from the perf
side would avoid coupling unrelated users through nmi_watchdog and
similar features.

> 
> Thanks,
> Ian
> 
> 
> > -Doug
> >

-- 
Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-09  6:50                               ` Jinchao Wang
@ 2025-10-09 13:22                                 ` Ian Rogers
  2025-10-10 12:54                                   ` Jinchao Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-10-09 13:22 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Wed, Oct 8, 2025 at 11:50 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>
> On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote:
> > On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > ...
> > > > > The buddy watchdog was pretty much following the conventions that were
> > > > > already in the code: that the hardlockup detector (whether backed by
> > > > > perf or not) was essentially called the "nmi watchdog". There were a
> > > > > number of people that were involved in reviews and I don't believe
> > > > > suggesting creating a whole different mechanism for enabling /
> > > > > disabling the buddy watchdog was never suggested.
> > > >
> > > > I suspect they lacked the context that 1 in the nmi_watchdog is taken
> > > > to mean there's a perf event in use by the kernel with implications on
> > > > how group events behave. This behavior has been user
> > > > visible/advertised for 9 years. I don't doubt that there were good
> > > > intentions by PowerPC's watchdog and in the buddy watchdog patches in
> > > > using the file, that use will lead to spurious warnings and behaviors
> > > > by perf.
> > > >
> > > > My points remain:
> > > > 1) using multiple files regresses perf's performance;
> > > > 2) the file name by its meaning is wrong;
> > > > 3) old perf tools on new kernels won't behave as expected wrt warnings
> > > > and metrics because the meaning of the file has changed.
> > > > Using a separate file for each watchdog resolves this. It seems that
> > > > there wasn't enough critical mass for getting this right to have
> > > > mattered before, but that doesn't mean we shouldn't get it right now.
> > >
> > > Presumably your next steps then are to find someone to submit a patch
> > > and try to convince others on the list that this is a good idea. The
> > > issue with perf has been known for a while now and I haven't seen any
> > > patches. As I've said, I won't stand in the way if everyone else
> > > agrees, but given that I'm still not convinced I'm not going to author
> > > any patches for this myself.
> >
> > Writing >1 of:
> > ```
> > static struct ctl_table watchdog_hardlockup_sysctl[] = {
> > {
> > .procname       = "nmi_watchdog",
> > .data = &watchdog_hardlockup_user_enabled,
> > .maxlen = sizeof(int),
> > .mode = 0444,
> > .proc_handler   = proc_nmi_watchdog,
> > .extra1 = SYSCTL_ZERO,
> > .extra2 = SYSCTL_ONE,
> > },
> > };
> > ```
> > is an exercise of copy-and-paste, if you need me to do the copy and
> > pasting then it is okay.
> Can we get whether a perf event is already in use directly from the
> perf subsystem? There may be (or will be) other kernel users of
> perf_event besides the NMI watchdog. Exposing that state from the perf
> side would avoid coupling unrelated users through nmi_watchdog and
> similar features.

For regular processes there is this unmerged proposal:
https://lore.kernel.org/lkml/20250603181634.1362626-1-ctshao@google.com/
it doesn't say whether the counter is pinned - the NMI watchdog's
counter is pinned to be a higher priority that flexible regular
counters that may be multiplexed. I don't believe there is anything to
say whether the kernel has taken a performance counter. In general
something else taking a performance counter is okay as  the kernel
will multiplex the counter or groups of counters.

The particular issue for the NMI watchdog counter is that groups of
events are tested to see if they fit on a PMU, the perf event open
will fail when a group isn't possible and then the events will be
reprogrammed by the perf tool without a group. When the group is
tested the PMU has always assumed that all counters are available,
which of course with the NMI watchdog they are not. This results with
the NMI watchdog causing a group of events to be created that can
never be scheduled.

Thanks,
Ian

> >
> > Thanks,
> > Ian
> >
> >
> > > -Doug
> > >
>
> --
> Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-09 13:22                                 ` Ian Rogers
@ 2025-10-10 12:54                                   ` Jinchao Wang
  2025-10-13 15:22                                     ` Ian Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Jinchao Wang @ 2025-10-10 12:54 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On 10/9/25 21:22, Ian Rogers wrote:
> On Wed, Oct 8, 2025 at 11:50 PM Jinchao Wang <wangjinchao600@gmail.com> wrote:
>>
>> On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote:
>>> On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote:
>>>>>
>>>>> On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote:
>>>>> ...
>>>>>> The buddy watchdog was pretty much following the conventions that were
>>>>>> already in the code: that the hardlockup detector (whether backed by
>>>>>> perf or not) was essentially called the "nmi watchdog". There were a
>>>>>> number of people that were involved in reviews and I don't believe
>>>>>> suggesting creating a whole different mechanism for enabling /
>>>>>> disabling the buddy watchdog was never suggested.
>>>>>
>>>>> I suspect they lacked the context that 1 in the nmi_watchdog is taken
>>>>> to mean there's a perf event in use by the kernel with implications on
>>>>> how group events behave. This behavior has been user
>>>>> visible/advertised for 9 years. I don't doubt that there were good
>>>>> intentions by PowerPC's watchdog and in the buddy watchdog patches in
>>>>> using the file, that use will lead to spurious warnings and behaviors
>>>>> by perf.
>>>>>
>>>>> My points remain:
>>>>> 1) using multiple files regresses perf's performance;
>>>>> 2) the file name by its meaning is wrong;
>>>>> 3) old perf tools on new kernels won't behave as expected wrt warnings
>>>>> and metrics because the meaning of the file has changed.
>>>>> Using a separate file for each watchdog resolves this. It seems that
>>>>> there wasn't enough critical mass for getting this right to have
>>>>> mattered before, but that doesn't mean we shouldn't get it right now.
>>>>
>>>> Presumably your next steps then are to find someone to submit a patch
>>>> and try to convince others on the list that this is a good idea. The
>>>> issue with perf has been known for a while now and I haven't seen any
>>>> patches. As I've said, I won't stand in the way if everyone else
>>>> agrees, but given that I'm still not convinced I'm not going to author
>>>> any patches for this myself.
>>>
>>> Writing >1 of:
>>> ```
>>> static struct ctl_table watchdog_hardlockup_sysctl[] = {
>>> {
>>> .procname       = "nmi_watchdog",
>>> .data = &watchdog_hardlockup_user_enabled,
>>> .maxlen = sizeof(int),
>>> .mode = 0444,
>>> .proc_handler   = proc_nmi_watchdog,
>>> .extra1 = SYSCTL_ZERO,
>>> .extra2 = SYSCTL_ONE,
>>> },
>>> };
>>> ```
>>> is an exercise of copy-and-paste, if you need me to do the copy and
>>> pasting then it is okay.
>> Can we get whether a perf event is already in use directly from the
>> perf subsystem? There may be (or will be) other kernel users of
>> perf_event besides the NMI watchdog. Exposing that state from the perf
>> side would avoid coupling unrelated users through nmi_watchdog and
>> similar features.
> 
> For regular processes there is this unmerged proposal:
> https://lore.kernel.org/lkml/20250603181634.1362626-1-ctshao@google.com/
> it doesn't say whether the counter is pinned - the NMI watchdog's
> counter is pinned to be a higher priority that flexible regular
> counters that may be multiplexed. I don't believe there is anything to
> say whether the kernel has taken a performance counter. In general
> something else taking a performance counter is okay as  the kernel
> will multiplex the counter or groups of counters.
> 
> The particular issue for the NMI watchdog counter is that groups of
> events are tested to see if they fit on a PMU, the perf event open
> will fail when a group isn't possible and then the events will be
> reprogrammed by the perf tool without a group. When the group is
> tested the PMU has always assumed that all counters are available,
> which of course with the NMI watchdog they are not. This results with
> the NMI watchdog causing a group of events to be created that can
> never be scheduled.

Addressing the PMU assumption that all counters are available would
resolve the issue. If perf managed reserved or pinned counters
internally, other users would not need to be aware of that detail.

Alternatively, perf could provide an interface to query whether a
counter is pinned. Having the NMI watchdog supply that information
creates coupling between otherwise independent subsystems.

> 
> Thanks,
> Ian
> 
>>>
>>> Thanks,
>>> Ian
>>>
>>>
>>>> -Doug
>>>>
>>
>> --
>> Jinchao



-- 
Jinchao

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector
  2025-10-10 12:54                                   ` Jinchao Wang
@ 2025-10-13 15:22                                     ` Ian Rogers
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-10-13 15:22 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra,
	Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe,
	npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa,
	acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
	kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh,
	xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong,
	gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe,
	yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov,
	thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev,
	linux-perf-users

On Fri, Oct 10, 2025 at 5:54 AM Jinchao Wang <wangjinchao600@gmail.com> wrote:
> Addressing the PMU assumption that all counters are available would
> resolve the issue. If perf managed reserved or pinned counters
> internally, other users would not need to be aware of that detail.
>
> Alternatively, perf could provide an interface to query whether a
> counter is pinned. Having the NMI watchdog supply that information
> creates coupling between otherwise independent subsystems.

There are lots of ways to redesign the perf event subsystem, counters,
etc. These things are being pushed upon. In general the API is trying
to hide details like the scheduling counters.

I'm in a loop, but the change here is bad because:
1) the use of nmi_watchdog in this way is misleading (outside of the
perf tool) because of the name;
2) it is bad because it is altering the way a kernel API has worked
for close to 10 years meaning old code doesn't work as intended;
3) it incurs extra overhead in tools.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-10-13 15:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/>
2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang
2025-09-17  0:03   ` Ian Rogers
2025-09-17  1:47     ` Jinchao Wang
2025-09-17  5:13       ` Ian Rogers
2025-09-17  5:35         ` Namhyung Kim
2025-09-17  6:14           ` Jinchao Wang
2025-10-06 21:29             ` Ian Rogers
2025-10-06 23:24               ` Doug Anderson
2025-10-07  1:00                 ` Ian Rogers
2025-10-07 19:54                   ` Doug Anderson
2025-10-07 20:43                     ` Ian Rogers
2025-10-07 21:43                       ` Doug Anderson
2025-10-07 22:45                         ` Ian Rogers
2025-10-07 22:58                           ` Doug Anderson
2025-10-08  0:11                             ` Ian Rogers
2025-10-09  6:50                               ` Jinchao Wang
2025-10-09 13:22                                 ` Ian Rogers
2025-10-10 12:54                                   ` Jinchao Wang
2025-10-13 15:22                                     ` Ian Rogers
2025-09-17  6:08   ` Christophe Leroy
2025-09-17  6:54     ` Jinchao Wang
2025-10-06 20:13       ` Doug Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).