* [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector [not found] <https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/> @ 2025-09-16 14:50 ` Jinchao Wang 2025-09-17 0:03 ` Ian Rogers 2025-09-17 6:08 ` Christophe Leroy 0 siblings, 2 replies; 22+ messages in thread From: Jinchao Wang @ 2025-09-16 14:50 UTC (permalink / raw) To: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users, Ian Rogers Cc: Jinchao Wang Currently, the hard lockup detector is selected at compile time via Kconfig, which requires a kernel rebuild to switch implementations. This is inflexible, especially on systems where a perf event may not be available or may be needed for other tasks. This commit refactors the hard lockup detector to replace a rigid compile-time choice with a flexible build-time and boot-time solution. The patch supports building the kernel with either detector independently, or with both. When both are built, a new boot parameter `hardlockup_detector="perf|buddy"` allows the selection at boot time. This is a more robust and user-friendly design. This patch is a follow-up to the discussion on the kernel mailing list regarding the preference and future of the hard lockup detectors. It implements a flexible solution that addresses the community's need to select an appropriate detector at boot time. The core changes are: - The `perf` and `buddy` watchdog implementations are separated into distinct functions (e.g., `watchdog_perf_hardlockup_enable`). - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) to serve as a single API for the entire feature. - A new `hardlockup_detector=` boot parameter is added to allow the user to select the desired detector at boot time. - The Kconfig options are simplified by removing the complex `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be built without mutual exclusion. - The weak stubs are updated to call the new function pointers, centralizing the watchdog logic. Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> --- .../admin-guide/kernel-parameters.txt | 7 +++ include/linux/nmi.h | 6 +++ kernel/watchdog.c | 46 ++++++++++++++++++- kernel/watchdog_buddy.c | 7 +-- kernel/watchdog_perf.c | 10 ++-- lib/Kconfig.debug | 37 +++++++-------- 6 files changed, 85 insertions(+), 28 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 5a7a83c411e9..0af214ee566c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1828,6 +1828,13 @@ backtraces on all cpus. Format: 0 | 1 + hardlockup_detector= + [perf, buddy] Selects the hard lockup detector to use at + boot time. + Format: <string> + - "perf": Use the perf-based detector. + - "buddy": Use the buddy-based detector. + hash_pointers= [KNL,EARLY] By default, when pointers are printed to the console diff --git a/include/linux/nmi.h b/include/linux/nmi.h index cf3c6ab408aa..9298980ce572 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +void watchdog_perf_hardlockup_enable(unsigned int cpu); +void watchdog_perf_hardlockup_disable(unsigned int cpu); +extern int watchdog_perf_hardlockup_probe(void); extern void hardlockup_detector_perf_stop(void); extern void hardlockup_detector_perf_restart(void); extern void hardlockup_config_perf_event(const char *str); @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); void lockup_detector_reconfigure(void); #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY +void watchdog_buddy_hardlockup_enable(unsigned int cpu); +void watchdog_buddy_hardlockup_disable(unsigned int cpu); +int watchdog_buddy_hardlockup_probe(void); void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); #else static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 80b56c002c7f..85451d24a77d 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); #ifdef CONFIG_HARDLOCKUP_DETECTOR +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF +/* The global function pointers */ +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; +#endif + +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE +static char *hardlockup_detector_type = "perf"; /* Default to perf */ +static int __init set_hardlockup_detector_type(char *str) +{ + if (!strncmp(str, "perf", 4)) { + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; + } else if (!strncmp(str, "buddy", 5)) { + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; + } + return 1; +} + +__setup("hardlockup_detector=", set_hardlockup_detector_type); + +#endif + # ifdef CONFIG_SMP int __read_mostly sysctl_hardlockup_all_cpu_backtrace; # endif /* CONFIG_SMP */ @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } * softlockup watchdog start and stop. The detector must select the * SOFTLOCKUP_DETECTOR Kconfig. */ -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } +void __weak watchdog_hardlockup_enable(unsigned int cpu) +{ + if (watchdog_hardlockup_enable_ptr) + watchdog_hardlockup_enable_ptr(cpu); +} -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } +void __weak watchdog_hardlockup_disable(unsigned int cpu) +{ + if (watchdog_hardlockup_disable_ptr) + watchdog_hardlockup_disable_ptr(cpu); +} /* * Watchdog-detector specific API. @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } */ int __weak __init watchdog_hardlockup_probe(void) { + if (watchdog_hardlockup_probe_ptr) + return watchdog_hardlockup_probe_ptr(); + return -ENODEV; } diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index ee754d767c21..390d89bfcafa 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) return next_cpu; } -int __init watchdog_hardlockup_probe(void) +int __init watchdog_buddy_hardlockup_probe(void) { return 0; } -void watchdog_hardlockup_enable(unsigned int cpu) +void watchdog_buddy_hardlockup_enable(unsigned int cpu) { unsigned int next_cpu; + pr_info("ddddd %s\n", __func__); /* * The new CPU will be marked online before the hrtimer interrupt * gets a chance to run on it. If another CPU tests for a @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) cpumask_set_cpu(cpu, &watchdog_cpus); } -void watchdog_hardlockup_disable(unsigned int cpu) +void watchdog_buddy_hardlockup_disable(unsigned int cpu) { unsigned int next_cpu = watchdog_next_cpu(cpu); diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 9c58f5b4381d..270110e58f20 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) * watchdog_hardlockup_enable - Enable the local event * @cpu: The CPU to enable hard lockup on. */ -void watchdog_hardlockup_enable(unsigned int cpu) +void watchdog_perf_hardlockup_enable(unsigned int cpu) { WARN_ON_ONCE(cpu != smp_processor_id()); + pr_info("ddddd %s\n", __func__); + if (hardlockup_detector_event_create()) return; @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) * watchdog_hardlockup_disable - Disable the local event * @cpu: The CPU to enable hard lockup on. */ -void watchdog_hardlockup_disable(unsigned int cpu) +void watchdog_perf_hardlockup_disable(unsigned int cpu) { struct perf_event *event = this_cpu_read(watchdog_ev); @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) /** * watchdog_hardlockup_probe - Probe whether NMI event is available at all */ -int __init watchdog_hardlockup_probe(void) +int __init watchdog_perf_hardlockup_probe(void) { int ret; + pr_info("ddddd %s\n", __func__); + if (!arch_perf_nmi_is_available()) return -ENODEV; diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index dc0e0c6ed075..443353fad1c1 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR # # Note that arch-specific variants are always preferred. # -config HARDLOCKUP_DETECTOR_PREFER_BUDDY - bool "Prefer the buddy CPU hardlockup detector" - depends on HARDLOCKUP_DETECTOR - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH - help - Say Y here to prefer the buddy hardlockup detector over the perf one. - - With the buddy detector, each CPU uses its softlockup hrtimer - to check that the next CPU is processing hrtimer interrupts by - verifying that a counter is increasing. - - This hardlockup detector is useful on systems that don't have - an arch-specific hardlockup detector or if resources needed - for the hardlockup detector are better used for other things. - config HARDLOCKUP_DETECTOR_PERF - bool + bool "Enable perf-based hard lockup detector (preferred)" depends on HARDLOCKUP_DETECTOR - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY + depends on HAVE_HARDLOCKUP_DETECTOR_PERF depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + help + This detector uses a perf event on the CPU to detect when a CPU + has become non-maskable interrupt (NMI) stuck. This is the + preferred method on modern systems as it can detect lockups on + all CPUs at the same time. config HARDLOCKUP_DETECTOR_BUDDY - bool + bool "Enable buddy-based hard lockup detector" depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + help + This is an alternative lockup detector that uses a heartbeat + mechanism between CPUs to detect when one has stopped responding. + It is less precise than the perf-based detector and cannot detect + all-CPU lockups, but it does not require a perf counter. + +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE + bool + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY config HARDLOCKUP_DETECTOR_ARCH bool -- 2.43.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang @ 2025-09-17 0:03 ` Ian Rogers 2025-09-17 1:47 ` Jinchao Wang 2025-09-17 6:08 ` Christophe Leroy 1 sibling, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-09-17 0:03 UTC (permalink / raw) To: Jinchao Wang Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > Currently, the hard lockup detector is selected at compile time via > Kconfig, which requires a kernel rebuild to switch implementations. > This is inflexible, especially on systems where a perf event may not > be available or may be needed for other tasks. > > This commit refactors the hard lockup detector to replace a rigid > compile-time choice with a flexible build-time and boot-time solution. > The patch supports building the kernel with either detector > independently, or with both. When both are built, a new boot parameter > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > This is a more robust and user-friendly design. > > This patch is a follow-up to the discussion on the kernel mailing list > regarding the preference and future of the hard lockup detectors. It > implements a flexible solution that addresses the community's need to > select an appropriate detector at boot time. > > The core changes are: > - The `perf` and `buddy` watchdog implementations are separated into > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > to serve as a single API for the entire feature. > - A new `hardlockup_detector=` boot parameter is added to allow the > user to select the desired detector at boot time. > - The Kconfig options are simplified by removing the complex > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > built without mutual exclusion. > - The weak stubs are updated to call the new function pointers, > centralizing the watchdog logic. What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that enabling and disabling whatever the boot time choice was? I'm not sure why this has to be a boot time option given the ability to configure via /proc/sys/kernel/nmi_watchdog. > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > --- > .../admin-guide/kernel-parameters.txt | 7 +++ > include/linux/nmi.h | 6 +++ > kernel/watchdog.c | 46 ++++++++++++++++++- > kernel/watchdog_buddy.c | 7 +-- > kernel/watchdog_perf.c | 10 ++-- > lib/Kconfig.debug | 37 +++++++-------- > 6 files changed, 85 insertions(+), 28 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 5a7a83c411e9..0af214ee566c 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1828,6 +1828,13 @@ > backtraces on all cpus. > Format: 0 | 1 > > + hardlockup_detector= > + [perf, buddy] Selects the hard lockup detector to use at > + boot time. > + Format: <string> > + - "perf": Use the perf-based detector. > + - "buddy": Use the buddy-based detector. > + > hash_pointers= > [KNL,EARLY] > By default, when pointers are printed to the console > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > index cf3c6ab408aa..9298980ce572 100644 > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > #endif > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > +extern int watchdog_perf_hardlockup_probe(void); > extern void hardlockup_detector_perf_stop(void); > extern void hardlockup_detector_perf_restart(void); > extern void hardlockup_config_perf_event(const char *str); > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > void lockup_detector_reconfigure(void); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > +int watchdog_buddy_hardlockup_probe(void); > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > #else > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 80b56c002c7f..85451d24a77d 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > +/* The global function pointers */ > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > +#endif > + > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ > +static int __init set_hardlockup_detector_type(char *str) > +{ > + if (!strncmp(str, "perf", 4)) { > + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; > + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; > + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; > + } else if (!strncmp(str, "buddy", 5)) { > + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; > + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; > + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; > + } > + return 1; > +} > + > +__setup("hardlockup_detector=", set_hardlockup_detector_type); > + > +#endif > + > # ifdef CONFIG_SMP > int __read_mostly sysctl_hardlockup_all_cpu_backtrace; > # endif /* CONFIG_SMP */ > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } > * softlockup watchdog start and stop. The detector must select the > * SOFTLOCKUP_DETECTOR Kconfig. > */ > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } > +void __weak watchdog_hardlockup_enable(unsigned int cpu) > +{ > + if (watchdog_hardlockup_enable_ptr) > + watchdog_hardlockup_enable_ptr(cpu); > +} > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > +void __weak watchdog_hardlockup_disable(unsigned int cpu) > +{ > + if (watchdog_hardlockup_disable_ptr) > + watchdog_hardlockup_disable_ptr(cpu); > +} > > /* > * Watchdog-detector specific API. > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > */ > int __weak __init watchdog_hardlockup_probe(void) > { > + if (watchdog_hardlockup_probe_ptr) > + return watchdog_hardlockup_probe_ptr(); > + > return -ENODEV; > } > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > index ee754d767c21..390d89bfcafa 100644 > --- a/kernel/watchdog_buddy.c > +++ b/kernel/watchdog_buddy.c > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) > return next_cpu; > } > > -int __init watchdog_hardlockup_probe(void) > +int __init watchdog_buddy_hardlockup_probe(void) > { > return 0; > } > > -void watchdog_hardlockup_enable(unsigned int cpu) > +void watchdog_buddy_hardlockup_enable(unsigned int cpu) > { > unsigned int next_cpu; > > + pr_info("ddddd %s\n", __func__); > /* > * The new CPU will be marked online before the hrtimer interrupt > * gets a chance to run on it. If another CPU tests for a > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > cpumask_set_cpu(cpu, &watchdog_cpus); > } > > -void watchdog_hardlockup_disable(unsigned int cpu) > +void watchdog_buddy_hardlockup_disable(unsigned int cpu) > { > unsigned int next_cpu = watchdog_next_cpu(cpu); > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > index 9c58f5b4381d..270110e58f20 100644 > --- a/kernel/watchdog_perf.c > +++ b/kernel/watchdog_perf.c > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) > * watchdog_hardlockup_enable - Enable the local event > * @cpu: The CPU to enable hard lockup on. > */ > -void watchdog_hardlockup_enable(unsigned int cpu) > +void watchdog_perf_hardlockup_enable(unsigned int cpu) > { > WARN_ON_ONCE(cpu != smp_processor_id()); > > + pr_info("ddddd %s\n", __func__); > + > if (hardlockup_detector_event_create()) > return; > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > * watchdog_hardlockup_disable - Disable the local event > * @cpu: The CPU to enable hard lockup on. > */ > -void watchdog_hardlockup_disable(unsigned int cpu) > +void watchdog_perf_hardlockup_disable(unsigned int cpu) > { > struct perf_event *event = this_cpu_read(watchdog_ev); > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) > /** > * watchdog_hardlockup_probe - Probe whether NMI event is available at all > */ > -int __init watchdog_hardlockup_probe(void) > +int __init watchdog_perf_hardlockup_probe(void) > { > int ret; > > + pr_info("ddddd %s\n", __func__); > + > if (!arch_perf_nmi_is_available()) > return -ENODEV; > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index dc0e0c6ed075..443353fad1c1 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR > # > # Note that arch-specific variants are always preferred. > # > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY > - bool "Prefer the buddy CPU hardlockup detector" > - depends on HARDLOCKUP_DETECTOR > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY > - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > - help > - Say Y here to prefer the buddy hardlockup detector over the perf one. > - > - With the buddy detector, each CPU uses its softlockup hrtimer > - to check that the next CPU is processing hrtimer interrupts by > - verifying that a counter is increasing. > - > - This hardlockup detector is useful on systems that don't have > - an arch-specific hardlockup detector or if resources needed > - for the hardlockup detector are better used for other things. > - > config HARDLOCKUP_DETECTOR_PERF > - bool > + bool "Enable perf-based hard lockup detector (preferred)" > depends on HARDLOCKUP_DETECTOR > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + help > + This detector uses a perf event on the CPU to detect when a CPU > + has become non-maskable interrupt (NMI) stuck. This is the > + preferred method on modern systems as it can detect lockups on > + all CPUs at the same time. I'd say this option should be the default for kernel developers but shouldn't be used by default to free the perf event and due to the extra power overhead. Thanks, Ian > config HARDLOCKUP_DETECTOR_BUDDY > - bool > + bool "Enable buddy-based hard lockup detector" > depends on HARDLOCKUP_DETECTOR > depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY > - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + help > + This is an alternative lockup detector that uses a heartbeat > + mechanism between CPUs to detect when one has stopped responding. > + It is less precise than the perf-based detector and cannot detect > + all-CPU lockups, but it does not require a perf counter. > + > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > + bool > + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY > > config HARDLOCKUP_DETECTOR_ARCH > bool > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 0:03 ` Ian Rogers @ 2025-09-17 1:47 ` Jinchao Wang 2025-09-17 5:13 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Jinchao Wang @ 2025-09-17 1:47 UTC (permalink / raw) To: Ian Rogers Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > Currently, the hard lockup detector is selected at compile time via > > Kconfig, which requires a kernel rebuild to switch implementations. > > This is inflexible, especially on systems where a perf event may not > > be available or may be needed for other tasks. > > > > This commit refactors the hard lockup detector to replace a rigid > > compile-time choice with a flexible build-time and boot-time solution. > > The patch supports building the kernel with either detector > > independently, or with both. When both are built, a new boot parameter > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > This is a more robust and user-friendly design. > > > > This patch is a follow-up to the discussion on the kernel mailing list > > regarding the preference and future of the hard lockup detectors. It > > implements a flexible solution that addresses the community's need to > > select an appropriate detector at boot time. > > > > The core changes are: > > - The `perf` and `buddy` watchdog implementations are separated into > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > to serve as a single API for the entire feature. > > - A new `hardlockup_detector=` boot parameter is added to allow the > > user to select the desired detector at boot time. > > - The Kconfig options are simplified by removing the complex > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > built without mutual exclusion. > > - The weak stubs are updated to call the new function pointers, > > centralizing the watchdog logic. > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > enabling and disabling whatever the boot time choice was? I'm not sure > why this has to be a boot time option given the ability to configure > via /proc/sys/kernel/nmi_watchdog. The new hardlockup_detector boot parameter and the existing /proc/sys/kernel/nmi_watchdog file serve different purposes. The boot parameter selects the type of hard lockup detector (perf or buddy). This choice is made once at boot. /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off switch for the currently selected detector. It does not change the detector's type. > > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > > --- > > .../admin-guide/kernel-parameters.txt | 7 +++ > > include/linux/nmi.h | 6 +++ > > kernel/watchdog.c | 46 ++++++++++++++++++- > > kernel/watchdog_buddy.c | 7 +-- > > kernel/watchdog_perf.c | 10 ++-- > > lib/Kconfig.debug | 37 +++++++-------- > > 6 files changed, 85 insertions(+), 28 deletions(-) > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index 5a7a83c411e9..0af214ee566c 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -1828,6 +1828,13 @@ > > backtraces on all cpus. > > Format: 0 | 1 > > > > + hardlockup_detector= > > + [perf, buddy] Selects the hard lockup detector to use at > > + boot time. > > + Format: <string> > > + - "perf": Use the perf-based detector. > > + - "buddy": Use the buddy-based detector. > > + > > hash_pointers= > > [KNL,EARLY] > > By default, when pointers are printed to the console > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > > index cf3c6ab408aa..9298980ce572 100644 > > --- a/include/linux/nmi.h > > +++ b/include/linux/nmi.h > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > > #endif > > > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > > +extern int watchdog_perf_hardlockup_probe(void); > > extern void hardlockup_detector_perf_stop(void); > > extern void hardlockup_detector_perf_restart(void); > > extern void hardlockup_config_perf_event(const char *str); > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > > void lockup_detector_reconfigure(void); > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > > +int watchdog_buddy_hardlockup_probe(void); > > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > > #else > > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > index 80b56c002c7f..85451d24a77d 100644 > > --- a/kernel/watchdog.c > > +++ b/kernel/watchdog.c > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > > +/* The global function pointers */ > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > > +#endif > > + > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ > > +static int __init set_hardlockup_detector_type(char *str) > > +{ > > + if (!strncmp(str, "perf", 4)) { > > + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; > > + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; > > + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; > > + } else if (!strncmp(str, "buddy", 5)) { > > + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; > > + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; > > + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; > > + } > > + return 1; > > +} > > + > > +__setup("hardlockup_detector=", set_hardlockup_detector_type); > > + > > +#endif > > + > > # ifdef CONFIG_SMP > > int __read_mostly sysctl_hardlockup_all_cpu_backtrace; > > # endif /* CONFIG_SMP */ > > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } > > * softlockup watchdog start and stop. The detector must select the > > * SOFTLOCKUP_DETECTOR Kconfig. > > */ > > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } > > +void __weak watchdog_hardlockup_enable(unsigned int cpu) > > +{ > > + if (watchdog_hardlockup_enable_ptr) > > + watchdog_hardlockup_enable_ptr(cpu); > > +} > > > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > +void __weak watchdog_hardlockup_disable(unsigned int cpu) > > +{ > > + if (watchdog_hardlockup_disable_ptr) > > + watchdog_hardlockup_disable_ptr(cpu); > > +} > > > > /* > > * Watchdog-detector specific API. > > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > */ > > int __weak __init watchdog_hardlockup_probe(void) > > { > > + if (watchdog_hardlockup_probe_ptr) > > + return watchdog_hardlockup_probe_ptr(); > > + > > return -ENODEV; > > } > > > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > > index ee754d767c21..390d89bfcafa 100644 > > --- a/kernel/watchdog_buddy.c > > +++ b/kernel/watchdog_buddy.c > > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) > > return next_cpu; > > } > > > > -int __init watchdog_hardlockup_probe(void) > > +int __init watchdog_buddy_hardlockup_probe(void) > > { > > return 0; > > } > > > > -void watchdog_hardlockup_enable(unsigned int cpu) > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu) > > { > > unsigned int next_cpu; > > > > + pr_info("ddddd %s\n", __func__); > > /* > > * The new CPU will be marked online before the hrtimer interrupt > > * gets a chance to run on it. If another CPU tests for a > > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > cpumask_set_cpu(cpu, &watchdog_cpus); > > } > > > > -void watchdog_hardlockup_disable(unsigned int cpu) > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu) > > { > > unsigned int next_cpu = watchdog_next_cpu(cpu); > > > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > > index 9c58f5b4381d..270110e58f20 100644 > > --- a/kernel/watchdog_perf.c > > +++ b/kernel/watchdog_perf.c > > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) > > * watchdog_hardlockup_enable - Enable the local event > > * @cpu: The CPU to enable hard lockup on. > > */ > > -void watchdog_hardlockup_enable(unsigned int cpu) > > +void watchdog_perf_hardlockup_enable(unsigned int cpu) > > { > > WARN_ON_ONCE(cpu != smp_processor_id()); > > > > + pr_info("ddddd %s\n", __func__); > > + > > if (hardlockup_detector_event_create()) > > return; > > > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > * watchdog_hardlockup_disable - Disable the local event > > * @cpu: The CPU to enable hard lockup on. > > */ > > -void watchdog_hardlockup_disable(unsigned int cpu) > > +void watchdog_perf_hardlockup_disable(unsigned int cpu) > > { > > struct perf_event *event = this_cpu_read(watchdog_ev); > > > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) > > /** > > * watchdog_hardlockup_probe - Probe whether NMI event is available at all > > */ > > -int __init watchdog_hardlockup_probe(void) > > +int __init watchdog_perf_hardlockup_probe(void) > > { > > int ret; > > > > + pr_info("ddddd %s\n", __func__); > > + > > if (!arch_perf_nmi_is_available()) > > return -ENODEV; > > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index dc0e0c6ed075..443353fad1c1 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR > > # > > # Note that arch-specific variants are always preferred. > > # > > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > - bool "Prefer the buddy CPU hardlockup detector" > > - depends on HARDLOCKUP_DETECTOR > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY > > - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > - help > > - Say Y here to prefer the buddy hardlockup detector over the perf one. > > - > > - With the buddy detector, each CPU uses its softlockup hrtimer > > - to check that the next CPU is processing hrtimer interrupts by > > - verifying that a counter is increasing. > > - > > - This hardlockup detector is useful on systems that don't have > > - an arch-specific hardlockup detector or if resources needed > > - for the hardlockup detector are better used for other things. > > - > > config HARDLOCKUP_DETECTOR_PERF > > - bool > > + bool "Enable perf-based hard lockup detector (preferred)" > > depends on HARDLOCKUP_DETECTOR > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + help > > + This detector uses a perf event on the CPU to detect when a CPU > > + has become non-maskable interrupt (NMI) stuck. This is the > > + preferred method on modern systems as it can detect lockups on > > + all CPUs at the same time. > > I'd say this option should be the default for kernel developers but > shouldn't be used by default to free the perf event and due to the > extra power overhead. > > Thanks, > Ian > > > config HARDLOCKUP_DETECTOR_BUDDY > > - bool > > + bool "Enable buddy-based hard lockup detector" > > depends on HARDLOCKUP_DETECTOR > > depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY > > - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + help > > + This is an alternative lockup detector that uses a heartbeat > > + mechanism between CPUs to detect when one has stopped responding. > > + It is less precise than the perf-based detector and cannot detect > > + all-CPU lockups, but it does not require a perf counter. > > + > > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > + bool > > + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY > > > > config HARDLOCKUP_DETECTOR_ARCH > > bool > > -- > > 2.43.0 > > -- Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 1:47 ` Jinchao Wang @ 2025-09-17 5:13 ` Ian Rogers 2025-09-17 5:35 ` Namhyung Kim 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-09-17 5:13 UTC (permalink / raw) To: Jinchao Wang Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > Currently, the hard lockup detector is selected at compile time via > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > This is inflexible, especially on systems where a perf event may not > > > be available or may be needed for other tasks. > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > compile-time choice with a flexible build-time and boot-time solution. > > > The patch supports building the kernel with either detector > > > independently, or with both. When both are built, a new boot parameter > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > This is a more robust and user-friendly design. > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > regarding the preference and future of the hard lockup detectors. It > > > implements a flexible solution that addresses the community's need to > > > select an appropriate detector at boot time. > > > > > > The core changes are: > > > - The `perf` and `buddy` watchdog implementations are separated into > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > to serve as a single API for the entire feature. > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > user to select the desired detector at boot time. > > > - The Kconfig options are simplified by removing the complex > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > built without mutual exclusion. > > > - The weak stubs are updated to call the new function pointers, > > > centralizing the watchdog logic. > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > enabling and disabling whatever the boot time choice was? I'm not sure > > why this has to be a boot time option given the ability to configure > > via /proc/sys/kernel/nmi_watchdog. > The new hardlockup_detector boot parameter and the existing > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > The boot parameter selects the type of hard lockup detector (perf or buddy). > This choice is made once at boot. > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > switch for the currently selected detector. It does not change the detector's > type. So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly obvious naming reasons but also because we can't differentiate when a perf event has been taken or not - this impacts perf that is choosing not to group events in metrics because of it, reducing the metric's accuracy. We need an equivalent "buddy_watchdog" file to the "nmi_watchdog" file. If we have such a file then if I did "echo 1 > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be disabled and the perf event one to be enabled. Similarly, if I did "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the perf event watchdog to be disabled and the buddy one enabled. If I did "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be enabled. I don't see why choosing the type of watchdog implementation at boot time is particularly desirable. It seems sensible to default normal people to using the buddy watchdog (more perf events, power...) and CONFIG_DEBUG_KERNEL type people to using the perf event one. As the "nmi_watchdog" file may be assumed to control the buddy watchdog, perhaps a compatibility option (where the "nmi_watchdog" file controls the buddy watchdog) is needed so that user code has time to migrate. Thanks, Ian > > > > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > > > > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > > > --- > > > .../admin-guide/kernel-parameters.txt | 7 +++ > > > include/linux/nmi.h | 6 +++ > > > kernel/watchdog.c | 46 ++++++++++++++++++- > > > kernel/watchdog_buddy.c | 7 +-- > > > kernel/watchdog_perf.c | 10 ++-- > > > lib/Kconfig.debug | 37 +++++++-------- > > > 6 files changed, 85 insertions(+), 28 deletions(-) > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index 5a7a83c411e9..0af214ee566c 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -1828,6 +1828,13 @@ > > > backtraces on all cpus. > > > Format: 0 | 1 > > > > > > + hardlockup_detector= > > > + [perf, buddy] Selects the hard lockup detector to use at > > > + boot time. > > > + Format: <string> > > > + - "perf": Use the perf-based detector. > > > + - "buddy": Use the buddy-based detector. > > > + > > > hash_pointers= > > > [KNL,EARLY] > > > By default, when pointers are printed to the console > > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > > > index cf3c6ab408aa..9298980ce572 100644 > > > --- a/include/linux/nmi.h > > > +++ b/include/linux/nmi.h > > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > > > #endif > > > > > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > > > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > > > +extern int watchdog_perf_hardlockup_probe(void); > > > extern void hardlockup_detector_perf_stop(void); > > > extern void hardlockup_detector_perf_restart(void); > > > extern void hardlockup_config_perf_event(const char *str); > > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > > > void lockup_detector_reconfigure(void); > > > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > > > +int watchdog_buddy_hardlockup_probe(void); > > > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > > > #else > > > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > > index 80b56c002c7f..85451d24a77d 100644 > > > --- a/kernel/watchdog.c > > > +++ b/kernel/watchdog.c > > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > > > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > > > +/* The global function pointers */ > > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; > > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > > > +#endif > > > + > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ > > > +static int __init set_hardlockup_detector_type(char *str) > > > +{ > > > + if (!strncmp(str, "perf", 4)) { > > > + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; > > > + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; > > > + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; > > > + } else if (!strncmp(str, "buddy", 5)) { > > > + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; > > > + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; > > > + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; > > > + } > > > + return 1; > > > +} > > > + > > > +__setup("hardlockup_detector=", set_hardlockup_detector_type); > > > + > > > +#endif > > > + > > > # ifdef CONFIG_SMP > > > int __read_mostly sysctl_hardlockup_all_cpu_backtrace; > > > # endif /* CONFIG_SMP */ > > > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } > > > * softlockup watchdog start and stop. The detector must select the > > > * SOFTLOCKUP_DETECTOR Kconfig. > > > */ > > > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } > > > +void __weak watchdog_hardlockup_enable(unsigned int cpu) > > > +{ > > > + if (watchdog_hardlockup_enable_ptr) > > > + watchdog_hardlockup_enable_ptr(cpu); > > > +} > > > > > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > > +void __weak watchdog_hardlockup_disable(unsigned int cpu) > > > +{ > > > + if (watchdog_hardlockup_disable_ptr) > > > + watchdog_hardlockup_disable_ptr(cpu); > > > +} > > > > > > /* > > > * Watchdog-detector specific API. > > > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > > */ > > > int __weak __init watchdog_hardlockup_probe(void) > > > { > > > + if (watchdog_hardlockup_probe_ptr) > > > + return watchdog_hardlockup_probe_ptr(); > > > + > > > return -ENODEV; > > > } > > > > > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > > > index ee754d767c21..390d89bfcafa 100644 > > > --- a/kernel/watchdog_buddy.c > > > +++ b/kernel/watchdog_buddy.c > > > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) > > > return next_cpu; > > > } > > > > > > -int __init watchdog_hardlockup_probe(void) > > > +int __init watchdog_buddy_hardlockup_probe(void) > > > { > > > return 0; > > > } > > > > > > -void watchdog_hardlockup_enable(unsigned int cpu) > > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu) > > > { > > > unsigned int next_cpu; > > > > > > + pr_info("ddddd %s\n", __func__); > > > /* > > > * The new CPU will be marked online before the hrtimer interrupt > > > * gets a chance to run on it. If another CPU tests for a > > > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > > cpumask_set_cpu(cpu, &watchdog_cpus); > > > } > > > > > > -void watchdog_hardlockup_disable(unsigned int cpu) > > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu) > > > { > > > unsigned int next_cpu = watchdog_next_cpu(cpu); > > > > > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > > > index 9c58f5b4381d..270110e58f20 100644 > > > --- a/kernel/watchdog_perf.c > > > +++ b/kernel/watchdog_perf.c > > > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) > > > * watchdog_hardlockup_enable - Enable the local event > > > * @cpu: The CPU to enable hard lockup on. > > > */ > > > -void watchdog_hardlockup_enable(unsigned int cpu) > > > +void watchdog_perf_hardlockup_enable(unsigned int cpu) > > > { > > > WARN_ON_ONCE(cpu != smp_processor_id()); > > > > > > + pr_info("ddddd %s\n", __func__); > > > + > > > if (hardlockup_detector_event_create()) > > > return; > > > > > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > > * watchdog_hardlockup_disable - Disable the local event > > > * @cpu: The CPU to enable hard lockup on. > > > */ > > > -void watchdog_hardlockup_disable(unsigned int cpu) > > > +void watchdog_perf_hardlockup_disable(unsigned int cpu) > > > { > > > struct perf_event *event = this_cpu_read(watchdog_ev); > > > > > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) > > > /** > > > * watchdog_hardlockup_probe - Probe whether NMI event is available at all > > > */ > > > -int __init watchdog_hardlockup_probe(void) > > > +int __init watchdog_perf_hardlockup_probe(void) > > > { > > > int ret; > > > > > > + pr_info("ddddd %s\n", __func__); > > > + > > > if (!arch_perf_nmi_is_available()) > > > return -ENODEV; > > > > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > > index dc0e0c6ed075..443353fad1c1 100644 > > > --- a/lib/Kconfig.debug > > > +++ b/lib/Kconfig.debug > > > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR > > > # > > > # Note that arch-specific variants are always preferred. > > > # > > > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > - bool "Prefer the buddy CPU hardlockup detector" > > > - depends on HARDLOCKUP_DETECTOR > > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY > > > - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > > - help > > > - Say Y here to prefer the buddy hardlockup detector over the perf one. > > > - > > > - With the buddy detector, each CPU uses its softlockup hrtimer > > > - to check that the next CPU is processing hrtimer interrupts by > > > - verifying that a counter is increasing. > > > - > > > - This hardlockup detector is useful on systems that don't have > > > - an arch-specific hardlockup detector or if resources needed > > > - for the hardlockup detector are better used for other things. > > > - > > > config HARDLOCKUP_DETECTOR_PERF > > > - bool > > > + bool "Enable perf-based hard lockup detector (preferred)" > > > depends on HARDLOCKUP_DETECTOR > > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + help > > > + This detector uses a perf event on the CPU to detect when a CPU > > > + has become non-maskable interrupt (NMI) stuck. This is the > > > + preferred method on modern systems as it can detect lockups on > > > + all CPUs at the same time. > > > > I'd say this option should be the default for kernel developers but > > shouldn't be used by default to free the perf event and due to the > > extra power overhead. > > > > Thanks, > > Ian > > > > > config HARDLOCKUP_DETECTOR_BUDDY > > > - bool > > > + bool "Enable buddy-based hard lockup detector" > > > depends on HARDLOCKUP_DETECTOR > > > depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY > > > - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + help > > > + This is an alternative lockup detector that uses a heartbeat > > > + mechanism between CPUs to detect when one has stopped responding. > > > + It is less precise than the perf-based detector and cannot detect > > > + all-CPU lockups, but it does not require a perf counter. > > > + > > > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > > + bool > > > + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY > > > > > > config HARDLOCKUP_DETECTOR_ARCH > > > bool > > > -- > > > 2.43.0 > > > > > -- > Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 5:13 ` Ian Rogers @ 2025-09-17 5:35 ` Namhyung Kim 2025-09-17 6:14 ` Jinchao Wang 0 siblings, 1 reply; 22+ messages in thread From: Namhyung Kim @ 2025-09-17 5:35 UTC (permalink / raw) To: Ian Rogers Cc: Jinchao Wang, Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users Hello, On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > This is inflexible, especially on systems where a perf event may not > > > > be available or may be needed for other tasks. > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > The patch supports building the kernel with either detector > > > > independently, or with both. When both are built, a new boot parameter > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > This is a more robust and user-friendly design. > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > regarding the preference and future of the hard lockup detectors. It > > > > implements a flexible solution that addresses the community's need to > > > > select an appropriate detector at boot time. > > > > > > > > The core changes are: > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > to serve as a single API for the entire feature. > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > user to select the desired detector at boot time. > > > > - The Kconfig options are simplified by removing the complex > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > built without mutual exclusion. > > > > - The weak stubs are updated to call the new function pointers, > > > > centralizing the watchdog logic. > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > why this has to be a boot time option given the ability to configure > > > via /proc/sys/kernel/nmi_watchdog. > > The new hardlockup_detector boot parameter and the existing > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > This choice is made once at boot. > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > switch for the currently selected detector. It does not change the detector's > > type. > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > obvious naming reasons but also because we can't differentiate when a > perf event has been taken or not - this impacts perf that is choosing > not to group events in metrics because of it, reducing the metric's > accuracy. We need an equivalent "buddy_watchdog" file to the > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > disabled and the perf event one to be enabled. Similarly, if I did > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > perf event watchdog to be disabled and the buddy one enabled. If I did > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > enabled. I don't see why choosing the type of watchdog implementation > at boot time is particularly desirable. It seems sensible to default > normal people to using the buddy watchdog (more perf events, power...) > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > perhaps a compatibility option (where the "nmi_watchdog" file controls > the buddy watchdog) is needed so that user code has time to migrate. Sounds good to me. For perf tools, it'd be great if we can have a run- time check which watchdog is selected. Thanks, Namhyung ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 5:35 ` Namhyung Kim @ 2025-09-17 6:14 ` Jinchao Wang 2025-10-06 21:29 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Jinchao Wang @ 2025-09-17 6:14 UTC (permalink / raw) To: Namhyung Kim Cc: Ian Rogers, Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > Hello, > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > > This is inflexible, especially on systems where a perf event may not > > > > > be available or may be needed for other tasks. > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > > The patch supports building the kernel with either detector > > > > > independently, or with both. When both are built, a new boot parameter > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > > This is a more robust and user-friendly design. > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > > regarding the preference and future of the hard lockup detectors. It > > > > > implements a flexible solution that addresses the community's need to > > > > > select an appropriate detector at boot time. > > > > > > > > > > The core changes are: > > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > > to serve as a single API for the entire feature. > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > > user to select the desired detector at boot time. > > > > > - The Kconfig options are simplified by removing the complex > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > > built without mutual exclusion. > > > > > - The weak stubs are updated to call the new function pointers, > > > > > centralizing the watchdog logic. > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > > why this has to be a boot time option given the ability to configure > > > > via /proc/sys/kernel/nmi_watchdog. > > > The new hardlockup_detector boot parameter and the existing > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > > This choice is made once at boot. > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > > switch for the currently selected detector. It does not change the detector's > > > type. > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > obvious naming reasons but also because we can't differentiate when a > > perf event has been taken or not - this impacts perf that is choosing > > not to group events in metrics because of it, reducing the metric's > > accuracy. We need an equivalent "buddy_watchdog" file to the > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > disabled and the perf event one to be enabled. Similarly, if I did > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > perf event watchdog to be disabled and the buddy one enabled. If I did > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > enabled. I don't see why choosing the type of watchdog implementation > > at boot time is particularly desirable. It seems sensible to default > > normal people to using the buddy watchdog (more perf events, power...) > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > the buddy watchdog) is needed so that user code has time to migrate. > > Sounds good to me. For perf tools, it'd be great if we can have a run- > time check which watchdog is selected. Considering backward compatibility, I prefer to keep /proc/sys/kernel/nmi_watchdog and introduce a new file called /proc/sys/kernel/hardlockup_detector_type, which only shows the default string or the boot parameter. The global str pointer hardlockup_detector_type was already introduced in the patch, so exposing it in a file is straightforward. > > Thanks, > Namhyung > -- Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 6:14 ` Jinchao Wang @ 2025-10-06 21:29 ` Ian Rogers 2025-10-06 23:24 ` Doug Anderson 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-06 21:29 UTC (permalink / raw) To: Jinchao Wang Cc: Namhyung Kim, Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > > Hello, > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > > > This is inflexible, especially on systems where a perf event may not > > > > > > be available or may be needed for other tasks. > > > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > > > The patch supports building the kernel with either detector > > > > > > independently, or with both. When both are built, a new boot parameter > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > > > This is a more robust and user-friendly design. > > > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > > > regarding the preference and future of the hard lockup detectors. It > > > > > > implements a flexible solution that addresses the community's need to > > > > > > select an appropriate detector at boot time. > > > > > > > > > > > > The core changes are: > > > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > > > to serve as a single API for the entire feature. > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > > > user to select the desired detector at boot time. > > > > > > - The Kconfig options are simplified by removing the complex > > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > > > built without mutual exclusion. > > > > > > - The weak stubs are updated to call the new function pointers, > > > > > > centralizing the watchdog logic. > > > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > > > why this has to be a boot time option given the ability to configure > > > > > via /proc/sys/kernel/nmi_watchdog. > > > > The new hardlockup_detector boot parameter and the existing > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > > > This choice is made once at boot. > > > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > > > switch for the currently selected detector. It does not change the detector's > > > > type. > > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > > obvious naming reasons but also because we can't differentiate when a > > > perf event has been taken or not - this impacts perf that is choosing > > > not to group events in metrics because of it, reducing the metric's > > > accuracy. We need an equivalent "buddy_watchdog" file to the > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > > disabled and the perf event one to be enabled. Similarly, if I did > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > > perf event watchdog to be disabled and the buddy one enabled. If I did > > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > > enabled. I don't see why choosing the type of watchdog implementation > > > at boot time is particularly desirable. It seems sensible to default > > > normal people to using the buddy watchdog (more perf events, power...) > > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > > the buddy watchdog) is needed so that user code has time to migrate. > > > > Sounds good to me. For perf tools, it'd be great if we can have a run- > > time check which watchdog is selected. > Considering backward compatibility, I prefer to keep > /proc/sys/kernel/nmi_watchdog and introduce a new file called > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string > or the boot parameter. Is there code using the buddy watchdog that cares about the /proc/sys/kernel/nmi_watchdog file? My assumption is that everything except Android is using the NMI watchdog, so a new /proc/sys/kernel/buddy_watchdog file doesn't impact them. On Android writing to /proc/sys/kernel/nmi_watchdog would switch from updating the buddy watchdog enable/disable to the NMI watchdog enable/disable, but it is a straightforward patch to make anything doing this update the buddy_watchdog file instead. If we have to keep "nmi_watchdog" then we should deprecate it and create an equivalent file with a better name (ie without NMI in it). It'll be moderately annoying in perf to determine whether the NMI watchdog is enabled by having to read two files. Thanks, Ian > The global str pointer hardlockup_detector_type was already introduced in the > patch, so exposing it in a file is straightforward. > > > > Thanks, > > Namhyung > > > > -- > Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-06 21:29 ` Ian Rogers @ 2025-10-06 23:24 ` Doug Anderson 2025-10-07 1:00 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Doug Anderson @ 2025-10-06 23:24 UTC (permalink / raw) To: Ian Rogers Cc: Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users Hi, On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote: > > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > > > Hello, > > > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > > > > This is inflexible, especially on systems where a perf event may not > > > > > > > be available or may be needed for other tasks. > > > > > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > > > > The patch supports building the kernel with either detector > > > > > > > independently, or with both. When both are built, a new boot parameter > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > > > > This is a more robust and user-friendly design. > > > > > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > > > > regarding the preference and future of the hard lockup detectors. It > > > > > > > implements a flexible solution that addresses the community's need to > > > > > > > select an appropriate detector at boot time. > > > > > > > > > > > > > > The core changes are: > > > > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > > > > to serve as a single API for the entire feature. > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > > > > user to select the desired detector at boot time. > > > > > > > - The Kconfig options are simplified by removing the complex > > > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > > > > built without mutual exclusion. > > > > > > > - The weak stubs are updated to call the new function pointers, > > > > > > > centralizing the watchdog logic. > > > > > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > > > > why this has to be a boot time option given the ability to configure > > > > > > via /proc/sys/kernel/nmi_watchdog. > > > > > The new hardlockup_detector boot parameter and the existing > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > > > > This choice is made once at boot. > > > > > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > > > > switch for the currently selected detector. It does not change the detector's > > > > > type. > > > > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > > > obvious naming reasons but also because we can't differentiate when a > > > > perf event has been taken or not - this impacts perf that is choosing > > > > not to group events in metrics because of it, reducing the metric's > > > > accuracy. We need an equivalent "buddy_watchdog" file to the > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > > > disabled and the perf event one to be enabled. Similarly, if I did > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > > > perf event watchdog to be disabled and the buddy one enabled. If I did > > > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > > > enabled. I don't see why choosing the type of watchdog implementation > > > > at boot time is particularly desirable. It seems sensible to default > > > > normal people to using the buddy watchdog (more perf events, power...) > > > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > > > the buddy watchdog) is needed so that user code has time to migrate. > > > > > > Sounds good to me. For perf tools, it'd be great if we can have a run- > > > time check which watchdog is selected. > > Considering backward compatibility, I prefer to keep > > /proc/sys/kernel/nmi_watchdog and introduce a new file called > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string > > or the boot parameter. > > Is there code using the buddy watchdog that cares about the > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything > except Android is using the NMI watchdog, so a new > /proc/sys/kernel/buddy_watchdog file doesn't impact them. Buddy watchdog has been out there for a few years. At Google, I know it's used by everything except Android. AKA I believe it is used in Google's servers and also in ChromeOS. Both of those (presumably) enable the buddy watchdog via calling it "nmi_watchdog". It's possible that some Android phones are using the buddy watchdog too but I'm not aware of it. I don't believe Pixel is using it, though that could change in the future. IMO at this point "nmi watchdog" is simply a synonym for the hardlockup detector. That was what it looked like in the code before I started messing around and adding the buddy lockup detector and it's how it is now. While it's unfortunate that there are two names for the same thing, I don't personally think that should change at this point. FWIW, even the "buddy" watchdog relies on NMIs to actually get stack crawls on stuck cores, so NMI still means something even there. If we want to tell between the perf detector or the buddy detector we should add a separate file for it. > On Android > writing to /proc/sys/kernel/nmi_watchdog would switch from updating > the buddy watchdog enable/disable to the NMI watchdog enable/disable, > but it is a straightforward patch to make anything doing this update > the buddy_watchdog file instead. Straightforward, but you've got to go find everyone that you break by doing this. That's not something I want responsibility for. If you want to convince others this is something worthwhile and you've got someone signed up to deal with the fallout (if any) then I won't object, but it's not something I'd be in support of. > If we have to keep "nmi_watchdog" then we should deprecate it and > create an equivalent file with a better name (ie without NMI in it). > It'll be moderately annoying in perf to determine whether the NMI > watchdog is enabled by having to read two files. Again, up to you if you want to try to do this, but I'm not really in support of it. It doesn't seem terribly hard to make a new file that says which hardlockup detector is in use. If that file exists then read it. If not then you fallback to what you have today. -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-06 23:24 ` Doug Anderson @ 2025-10-07 1:00 ` Ian Rogers 2025-10-07 19:54 ` Doug Anderson 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-07 1:00 UTC (permalink / raw) To: Doug Anderson, Greg Kroah-Hartman Cc: Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Mon, Oct 6, 2025 at 4:32 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote: > > > > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > > > > Hello, > > > > > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > > > > > This is inflexible, especially on systems where a perf event may not > > > > > > > > be available or may be needed for other tasks. > > > > > > > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > > > > > The patch supports building the kernel with either detector > > > > > > > > independently, or with both. When both are built, a new boot parameter > > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > > > > > This is a more robust and user-friendly design. > > > > > > > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > > > > > regarding the preference and future of the hard lockup detectors. It > > > > > > > > implements a flexible solution that addresses the community's need to > > > > > > > > select an appropriate detector at boot time. > > > > > > > > > > > > > > > > The core changes are: > > > > > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > > > > > to serve as a single API for the entire feature. > > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > > > > > user to select the desired detector at boot time. > > > > > > > > - The Kconfig options are simplified by removing the complex > > > > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > > > > > built without mutual exclusion. > > > > > > > > - The weak stubs are updated to call the new function pointers, > > > > > > > > centralizing the watchdog logic. > > > > > > > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > > > > > why this has to be a boot time option given the ability to configure > > > > > > > via /proc/sys/kernel/nmi_watchdog. > > > > > > The new hardlockup_detector boot parameter and the existing > > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > > > > > This choice is made once at boot. > > > > > > > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > > > > > switch for the currently selected detector. It does not change the detector's > > > > > > type. > > > > > > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > > > > obvious naming reasons but also because we can't differentiate when a > > > > > perf event has been taken or not - this impacts perf that is choosing > > > > > not to group events in metrics because of it, reducing the metric's > > > > > accuracy. We need an equivalent "buddy_watchdog" file to the > > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > > > > disabled and the perf event one to be enabled. Similarly, if I did > > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > > > > perf event watchdog to be disabled and the buddy one enabled. If I did > > > > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > > > > enabled. I don't see why choosing the type of watchdog implementation > > > > > at boot time is particularly desirable. It seems sensible to default > > > > > normal people to using the buddy watchdog (more perf events, power...) > > > > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > > > > the buddy watchdog) is needed so that user code has time to migrate. > > > > > > > > Sounds good to me. For perf tools, it'd be great if we can have a run- > > > > time check which watchdog is selected. > > > Considering backward compatibility, I prefer to keep > > > /proc/sys/kernel/nmi_watchdog and introduce a new file called > > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string > > > or the boot parameter. > > > > Is there code using the buddy watchdog that cares about the > > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything > > except Android is using the NMI watchdog, so a new > > /proc/sys/kernel/buddy_watchdog file doesn't impact them. > > Buddy watchdog has been out there for a few years. At Google, I know > it's used by everything except Android. AKA I believe it is used in > Google's servers and also in ChromeOS. Both of those (presumably) > enable the buddy watchdog via calling it "nmi_watchdog". It's possible > that some Android phones are using the buddy watchdog too but I'm not > aware of it. I don't believe Pixel is using it, though that could > change in the future. I thought what motivated the buddy watchdog was patches implementing this approach on Android for ARM that lacked an NMI based hard lockup detector? Anyway, while the buddy watchdog is in use by Google servers the nmi_watchdog file has an actively detrimental effect on that. Specifically the nmi_watchdog file having the value of "1" disables certain event groups for certain metrics in the perf tool as it is assumed there are too few performance counters due to the NMI watchdog stealing one. We want groups of events so that events are scheduled together and metrics are more accurate, we don't want groups that fail to schedule because of a lack of counters. > IMO at this point "nmi watchdog" is simply a synonym for the > hardlockup detector. That was what it looked like in the code before I > started messing around and adding the buddy lockup detector and it's > how it is now. While it's unfortunate that there are two names for the > same thing, I don't personally think that should change at this point. > FWIW, even the "buddy" watchdog relies on NMIs to actually get stack > crawls on stuck cores, so NMI still means something even there. I think this is misguided. Currently the only use of nmi_watchdog I'm aware of is by perf where the buddy watchdog's use of it causes issues (as mentioned above). > If we want to tell between the perf detector or the buddy detector we > should add a separate file for it. We could with perf then having to read from two files rather than one. Presumably the lack of presence of one file will be sufficient to also avoid a kernel version check. > > On Android > > writing to /proc/sys/kernel/nmi_watchdog would switch from updating > > the buddy watchdog enable/disable to the NMI watchdog enable/disable, > > but it is a straightforward patch to make anything doing this update > > the buddy_watchdog file instead. > > Straightforward, but you've got to go find everyone that you break by > doing this. That's not something I want responsibility for. If you > want to convince others this is something worthwhile and you've got > someone signed up to deal with the fallout (if any) then I won't > object, but it's not something I'd be in support of. Stuff like this happens, check out this thread: https://lore.kernel.org/lkml/2025020304-chip-trench-4e56@gregkh/ Imo we shouldn't design in using an actively misleading file name and incurring extra overhead in perf. Having two files nmi_watchdog and buddy_watchdog is fine as the latter case currently isn't in mainstream distro use and people shouldn't care. It also maintains and correct's perf's behavior when the buddy and not nmi watchdog is in use. > > If we have to keep "nmi_watchdog" then we should deprecate it and > > create an equivalent file with a better name (ie without NMI in it). > > It'll be moderately annoying in perf to determine whether the NMI > > watchdog is enabled by having to read two files. > > Again, up to you if you want to try to do this, but I'm not really in > support of it. It doesn't seem terribly hard to make a new file that > says which hardlockup detector is in use. If that file exists then > read it. If not then you fallback to what you have today. I don't mind a file that also says what kind of lockup detector is in use. I'd like the meaning of nmi_watchdog to keep meaning the kernel stole a perf counter as this is the behavior long assumed by perf which I think is the majority or only user of the file. I think the buddy watchdog being controlled by this file is less than intention revealing. Thanks, Ian > -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 1:00 ` Ian Rogers @ 2025-10-07 19:54 ` Doug Anderson 2025-10-07 20:43 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Doug Anderson @ 2025-10-07 19:54 UTC (permalink / raw) To: Ian Rogers Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users Hi, On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote: > > On Mon, Oct 6, 2025 at 4:32 PM Doug Anderson <dianders@chromium.org> wrote: > > > > Hi, > > > > On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <irogers@google.com> wrote: > > > > > > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > > > > > Hello, > > > > > > > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Currently, the hard lockup detector is selected at compile time via > > > > > > > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > > > > > > > This is inflexible, especially on systems where a perf event may not > > > > > > > > > be available or may be needed for other tasks. > > > > > > > > > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > > > > > compile-time choice with a flexible build-time and boot-time solution. > > > > > > > > > The patch supports building the kernel with either detector > > > > > > > > > independently, or with both. When both are built, a new boot parameter > > > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > > > > > > > This is a more robust and user-friendly design. > > > > > > > > > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > > > > > > > regarding the preference and future of the hard lockup detectors. It > > > > > > > > > implements a flexible solution that addresses the community's need to > > > > > > > > > select an appropriate detector at boot time. > > > > > > > > > > > > > > > > > > The core changes are: > > > > > > > > > - The `perf` and `buddy` watchdog implementations are separated into > > > > > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > > > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > > > > > > > to serve as a single API for the entire feature. > > > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > > > > > > > user to select the desired detector at boot time. > > > > > > > > > - The Kconfig options are simplified by removing the complex > > > > > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > > > > > > > built without mutual exclusion. > > > > > > > > > - The weak stubs are updated to call the new function pointers, > > > > > > > > > centralizing the watchdog logic. > > > > > > > > > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > > > > > enabling and disabling whatever the boot time choice was? I'm not sure > > > > > > > > why this has to be a boot time option given the ability to configure > > > > > > > > via /proc/sys/kernel/nmi_watchdog. > > > > > > > The new hardlockup_detector boot parameter and the existing > > > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > > > > > > > > > The boot parameter selects the type of hard lockup detector (perf or buddy). > > > > > > > This choice is made once at boot. > > > > > > > > > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple on/off > > > > > > > switch for the currently selected detector. It does not change the detector's > > > > > > > type. > > > > > > > > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > > > > > obvious naming reasons but also because we can't differentiate when a > > > > > > perf event has been taken or not - this impacts perf that is choosing > > > > > > not to group events in metrics because of it, reducing the metric's > > > > > > accuracy. We need an equivalent "buddy_watchdog" file to the > > > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > > > > > disabled and the perf event one to be enabled. Similarly, if I did > > > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > > > > > perf event watchdog to be disabled and the buddy one enabled. If I did > > > > > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > > > > > enabled. I don't see why choosing the type of watchdog implementation > > > > > > at boot time is particularly desirable. It seems sensible to default > > > > > > normal people to using the buddy watchdog (more perf events, power...) > > > > > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > > > > > the buddy watchdog) is needed so that user code has time to migrate. > > > > > > > > > > Sounds good to me. For perf tools, it'd be great if we can have a run- > > > > > time check which watchdog is selected. > > > > Considering backward compatibility, I prefer to keep > > > > /proc/sys/kernel/nmi_watchdog and introduce a new file called > > > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default string > > > > or the boot parameter. > > > > > > Is there code using the buddy watchdog that cares about the > > > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything > > > except Android is using the NMI watchdog, so a new > > > /proc/sys/kernel/buddy_watchdog file doesn't impact them. > > > > Buddy watchdog has been out there for a few years. At Google, I know > > it's used by everything except Android. AKA I believe it is used in > > Google's servers and also in ChromeOS. Both of those (presumably) > > enable the buddy watchdog via calling it "nmi_watchdog". It's possible > > that some Android phones are using the buddy watchdog too but I'm not > > aware of it. I don't believe Pixel is using it, though that could > > change in the future. > > I thought what motivated the buddy watchdog was patches implementing > this approach on Android for ARM that lacked an NMI based hard lockup > detector? It's probably not worth a full rehash of the history, but suffice to say that the buddy lockup detector did originally come out of Android but isn't currently used there as far as I can tell. > Anyway, while the buddy watchdog is in use by Google servers the > nmi_watchdog file has an actively detrimental effect on that. > Specifically the nmi_watchdog file having the value of "1" disables > certain event groups for certain metrics in the perf tool as it is > assumed there are too few performance counters due to the NMI watchdog > stealing one. We want groups of events so that events are scheduled > together and metrics are more accurate, we don't want groups that fail > to schedule because of a lack of counters. Fair enough, but that code could also change. > > IMO at this point "nmi watchdog" is simply a synonym for the > > hardlockup detector. That was what it looked like in the code before I > > started messing around and adding the buddy lockup detector and it's > > how it is now. While it's unfortunate that there are two names for the > > same thing, I don't personally think that should change at this point. > > FWIW, even the "buddy" watchdog relies on NMIs to actually get stack > > crawls on stuck cores, so NMI still means something even there. > > I think this is misguided. Currently the only use of nmi_watchdog I'm > aware of is by perf where the buddy watchdog's use of it causes issues > (as mentioned above). > > > If we want to tell between the perf detector or the buddy detector we > > should add a separate file for it. > > We could with perf then having to read from two files rather than one. > Presumably the lack of presence of one file will be sufficient to also > avoid a kernel version check. This was what I was assuming > > > On Android > > > writing to /proc/sys/kernel/nmi_watchdog would switch from updating > > > the buddy watchdog enable/disable to the NMI watchdog enable/disable, > > > but it is a straightforward patch to make anything doing this update > > > the buddy_watchdog file instead. > > > > Straightforward, but you've got to go find everyone that you break by > > doing this. That's not something I want responsibility for. If you > > want to convince others this is something worthwhile and you've got > > someone signed up to deal with the fallout (if any) then I won't > > object, but it's not something I'd be in support of. > > Stuff like this happens, check out this thread: > https://lore.kernel.org/lkml/2025020304-chip-trench-4e56@gregkh/ > Imo we shouldn't design in using an actively misleading file name and > incurring extra overhead in perf. Having two files nmi_watchdog and > buddy_watchdog is fine as the latter case currently isn't in > mainstream distro use and people shouldn't care. It also maintains and > correct's perf's behavior when the buddy and not nmi watchdog is in > use. > > > > If we have to keep "nmi_watchdog" then we should deprecate it and > > > create an equivalent file with a better name (ie without NMI in it). > > > It'll be moderately annoying in perf to determine whether the NMI > > > watchdog is enabled by having to read two files. > > > > Again, up to you if you want to try to do this, but I'm not really in > > support of it. It doesn't seem terribly hard to make a new file that > > says which hardlockup detector is in use. If that file exists then > > read it. If not then you fallback to what you have today. > > I don't mind a file that also says what kind of lockup detector is in > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel > stole a perf counter as this is the behavior long assumed by perf > which I think is the majority or only user of the file. I think the > buddy watchdog being controlled by this file is less than intention > revealing. I'm more than happy to be outvoted, but IMO nothing about the name "nmi_watchdog" says to me that a perf counter was stolen. It just says that there's a watchdog that NMIs are part of its work. ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog" constants/function") it was checking `watchdog_enabled & NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think... ...and you can see "PPC_WATCHDOG" depends on "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not perf-backed. I think this is just the critical question: does/should "nmi_watchdog" mean perf-backed detector or is "nmi_watchdog" just a synonym for "hard lockup detector". In your mind it's the former and in my mind it's the latter. The correct way forward depends on which way we want to jump, right? In either case some external code is going to need to change... The kernel docs are certainly pretty ambiguous here. "kernel.rst" certainly mentions perf, but it also says that "nmi_watchdog" is only for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't say anything about perf and just seems to indicate that "nmi_watchdog" is another name for the hardlockup detector. "kernel-parameters.txt" makes some mention of tweaking which perf event would be used, but otherwise makes it sound like the nmi_watchdog is just another name for the hardlockup detector. My vote would be two files: * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1 enabled, 0 if not (like today) * "hardlock_detector" - could be "buddy", "perf", or "arch" ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog" is enabled, you could probably guess that a perf event was in use. If the "hardlockup_detector" file exists then a perf event is only in use if the value of that file is "perf". It doesn't feel terribly hard to do this and it doesn't break anyone who was assuming they could turn on the hardlockup detector with "/proc/nmi_watchdog". -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 19:54 ` Doug Anderson @ 2025-10-07 20:43 ` Ian Rogers 2025-10-07 21:43 ` Doug Anderson 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-07 20:43 UTC (permalink / raw) To: Doug Anderson Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Oct 7, 2025 at 1:00 PM Doug Anderson <dianders@chromium.org> wrote: > On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote: .. > > I don't mind a file that also says what kind of lockup detector is in > > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel > > stole a perf counter as this is the behavior long assumed by perf > > which I think is the majority or only user of the file. I think the > > buddy watchdog being controlled by this file is less than intention > > revealing. > > I'm more than happy to be outvoted, but IMO nothing about the name > "nmi_watchdog" says to me that a perf counter was stolen. It just says > that there's a watchdog that NMIs are part of its work. > > ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far > as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks > `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did > commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog" > constants/function") it was checking `watchdog_enabled & > NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think... > > ...and you can see "PPC_WATCHDOG" depends on > "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not > perf-backed. > > I think this is just the critical question: does/should "nmi_watchdog" > mean perf-backed detector or is "nmi_watchdog" just a synonym for > "hard lockup detector". In your mind it's the former and in my mind > it's the latter. The correct way forward depends on which way we want > to jump, right? In either case some external code is going to need to > change... So we could say it is one-idea against another, I'm hoping to be objective and come from the viewpoint of the perf code as part of its function is to be a demonstration of APIs and the buddy watchdog has altered this. The perf tool has referred to the nmi_watchdog's behavior in man pages and code since 2016: http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org ``` +Globally pinned events can limit the number of counters available for +other groups. On x86 systems, the NMI watchdog pins a counter by default. +The nmi watchdog can be disabled as root with + + echo 0 > /proc/sys/kernel/nmi_watchdog ``` http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org ``` +void arch_topdown_group_warn(void) +{ + fprintf(stderr, + "nmi_watchdog enabled with topdown. May give wrong results.\n" + "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n"); +} ``` Probably the most common error message dates back to 2017: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic ``` Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog ``` and that is saying "NMI watchdog" not "buddy watchdog". Users are familiar with the idea that the /proc/sys/kernel/nmi_watchdog is unremarkably controlling the NMI watchdog. I've not found another use of /proc/sys/kernel/nmi_watchdog outside of the perf tool. > The kernel docs are certainly pretty ambiguous here. "kernel.rst" > certainly mentions perf, but it also says that "nmi_watchdog" is only > for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't > say anything about perf and just seems to indicate that "nmi_watchdog" > is another name for the hardlockup detector. "kernel-parameters.txt" > makes some mention of tweaking which perf event would be used, but > otherwise makes it sound like the nmi_watchdog is just another name > for the hardlockup detector. > > My vote would be two files: > > * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1 > enabled, 0 if not (like today) > > * "hardlock_detector" - could be "buddy", "perf", or "arch" > > ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog" > is enabled, you could probably guess that a perf event was in use. If > the "hardlockup_detector" file exists then a perf event is only in use > if the value of that file is "perf". It doesn't feel terribly hard to > do this and it doesn't break anyone who was assuming they could turn > on the hardlockup detector with "/proc/nmi_watchdog". It is not hard but: 1) it means whenever perf wants to determine the NMI watchdog is present it needs to read two files rather than the existing 1, which has some runtime cost; 2) the name nmi_watchdog for controlling the behavior of the buddy watchdog isn't intention revealing as the buddy mechanisms whole point is to avoid the NMI; 3) old perf tools with the buddy watchdog have the wrong behavior (they've regressed). It is also not hard to have the watchdog files named <mechanism>_watchdog, such as buddy_watchdog, nmi_watchdog, arch_watchdog and have the contents be 0 or 1 as appropriate. Such a scheme would have less overhead for perf, make the name more intention revealing and not alter old perf tools. I'm really not sure what problem we're trying to fix by not adopting this approach. I was surprised the buddy watchdog merged but using the nmi_watchdog file. Thanks, Ian > -Doug > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 20:43 ` Ian Rogers @ 2025-10-07 21:43 ` Doug Anderson 2025-10-07 22:45 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Doug Anderson @ 2025-10-07 21:43 UTC (permalink / raw) To: Ian Rogers Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users Hi, On Tue, Oct 7, 2025 at 1:43 PM Ian Rogers <irogers@google.com> wrote: > > On Tue, Oct 7, 2025 at 1:00 PM Doug Anderson <dianders@chromium.org> wrote: > > On Mon, Oct 6, 2025 at 6:00 PM Ian Rogers <irogers@google.com> wrote: > .. > > > I don't mind a file that also says what kind of lockup detector is in > > > use. I'd like the meaning of nmi_watchdog to keep meaning the kernel > > > stole a perf counter as this is the behavior long assumed by perf > > > which I think is the majority or only user of the file. I think the > > > buddy watchdog being controlled by this file is less than intention > > > revealing. > > > > I'm more than happy to be outvoted, but IMO nothing about the name > > "nmi_watchdog" says to me that a perf counter was stolen. It just says > > that there's a watchdog that NMIs are part of its work. > > > > ...and, indeed, "nmi_watchdog" doesn't use perf on PPC, right? As far > > as I can tell, from reading `arch/powerpc/kernel/watchdog.c` checks > > `watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED`. ...and before I did > > commit df95d3085caa ("watchdog/hardlockup: rename some "NMI watchdog" > > constants/function") it was checking `watchdog_enabled & > > NMI_WATCHDOG_ENABLED`. That was at least since 2019, I think... > > > > ...and you can see "PPC_WATCHDOG" depends on > > "HARDLOCKUP_DETECTOR_ARCH", not "HARDLOCKUP_DETECTOR_PERF" so it's not > > perf-backed. > > > > I think this is just the critical question: does/should "nmi_watchdog" > > mean perf-backed detector or is "nmi_watchdog" just a synonym for > > "hard lockup detector". In your mind it's the former and in my mind > > it's the latter. The correct way forward depends on which way we want > > to jump, right? In either case some external code is going to need to > > change... > > So we could say it is one-idea against another, I'm hoping to be > objective and come from the viewpoint of the perf code as part of its > function is to be a demonstration of APIs and the buddy watchdog has > altered this. The perf tool has referred to the nmi_watchdog's > behavior in man pages and code since 2016: > http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org > ``` > +Globally pinned events can limit the number of counters available for > +other groups. On x86 systems, the NMI watchdog pins a counter by default. > +The nmi watchdog can be disabled as root with > + > + echo 0 > /proc/sys/kernel/nmi_watchdog > ``` > http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org > ``` > +void arch_topdown_group_warn(void) > +{ > + fprintf(stderr, > + "nmi_watchdog enabled with topdown. May give wrong results.\n" > + "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n"); > +} > ``` > > Probably the most common error message dates back to 2017: > http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic > ``` > Some events weren't counted. Try disabling the NMI watchdog: > echo 0 > /proc/sys/kernel/nmi_watchdog > perf stat ... > echo 1 > /proc/sys/kernel/nmi_watchdog > ``` > and that is saying "NMI watchdog" not "buddy watchdog". Users are > familiar with the idea that the /proc/sys/kernel/nmi_watchdog is > unremarkably controlling the NMI watchdog. > > I've not found another use of /proc/sys/kernel/nmi_watchdog outside of > the perf tool. Although it's possible I missed it, from my quick look in ChromeOS, I don't see any use of `/proc/sys/kernel/nmi_watchdog` either. NOTE that you also need to look for `sysctl` references to `kernel.nmi_watchdog`, right? That's essentially a way to access the exact same file... The other thing you need to think about is the kernel command-line parameter. Right now we have the kernel command line parameter `nmi_watchdog=<0,1>` that can be used to turn the hardlockup detector on or off. On PowerPC I believe it turns on/off the ARCH hardlockup detector. On systems with the buddy hardlockup detector it turns on/off the buddy hardlockup detector. How are you proposing to handle that? Are you going to make the kernel command line parameter still affect everyone, or require people using the kernel command line parameter for buddy/powerpc to change? All of these ways to turn off/on the hardlockup detector can, in theory, also be things that end users are messing with. People may have settings in their `/etc/sysctl.d` to tweak things. They may have messed with the kernel command line in their bootloader to tweak this setting. I'm not saying it's impossible to change, but anyone who changes this has to be prepared for people to yell. > > The kernel docs are certainly pretty ambiguous here. "kernel.rst" > > certainly mentions perf, but it also says that "nmi_watchdog" is only > > for x86 which is certainly not true. "lockup-watchdogs.rst" doesn't > > say anything about perf and just seems to indicate that "nmi_watchdog" > > is another name for the hardlockup detector. "kernel-parameters.txt" > > makes some mention of tweaking which perf event would be used, but > > otherwise makes it sound like the nmi_watchdog is just another name > > for the hardlockup detector. > > > > My vote would be two files: > > > > * "nmi_watchdog" - simply a synonym for "hardlockup detector". 1 > > enabled, 0 if not (like today) > > > > * "hardlock_detector" - could be "buddy", "perf", or "arch" > > > > ...if the "hardlockup_detector" file doesn't exist and "nmi_watchdog" > > is enabled, you could probably guess that a perf event was in use. If > > the "hardlockup_detector" file exists then a perf event is only in use > > if the value of that file is "perf". It doesn't feel terribly hard to > > do this and it doesn't break anyone who was assuming they could turn > > on the hardlockup detector with "/proc/nmi_watchdog". > > It is not hard but: > 1) it means whenever perf wants to determine the NMI watchdog is > present it needs to read two files rather than the existing 1, which > has some runtime cost; > 2) the name nmi_watchdog for controlling the behavior of the buddy > watchdog isn't intention revealing as the buddy mechanisms whole point > is to avoid the NMI; The buddy mechanism is to avoid the use of a perf counter, not an NMI (non maskable interrupt). "NMI" != "perf counter". Many types of interrupts can be non-maskable (or pseudo-non-maskable). The buddy lockup detector still uses an NMI (indirectly). Specifically, a non-maskable IPI (NMI-backed IPI) is used to interrupt the other CPU. You keep saying that perf wants to determine if the "NMI watchdog" is present. That's not what it wants to know. It wants to know if the hard lockup detector (which likely uses NMIs in some way) happens to use up a perf event. If an NMI-backed hardlockup detector (presumably like the one on PowerPC) wasn't using a perf counter then perf wouldn't care. > 3) old perf tools with the buddy watchdog have the wrong behavior > (they've regressed). > > It is also not hard to have the watchdog files named > <mechanism>_watchdog, such as buddy_watchdog, nmi_watchdog, > arch_watchdog and have the contents be 0 or 1 as appropriate. Such a > scheme would have less overhead for perf, make the name more intention > revealing and not alter old perf tools. I'm really not sure what > problem we're trying to fix by not adopting this approach. I was > surprised the buddy watchdog merged but using the nmi_watchdog file. The buddy watchdog was pretty much following the conventions that were already in the code: that the hardlockup detector (whether backed by perf or not) was essentially called the "nmi watchdog". There were a number of people that were involved in reviews and I don't believe suggesting creating a whole different mechanism for enabling / disabling the buddy watchdog was never suggested. -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 21:43 ` Doug Anderson @ 2025-10-07 22:45 ` Ian Rogers 2025-10-07 22:58 ` Doug Anderson 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-07 22:45 UTC (permalink / raw) To: Doug Anderson Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: ... > The buddy watchdog was pretty much following the conventions that were > already in the code: that the hardlockup detector (whether backed by > perf or not) was essentially called the "nmi watchdog". There were a > number of people that were involved in reviews and I don't believe > suggesting creating a whole different mechanism for enabling / > disabling the buddy watchdog was never suggested. I suspect they lacked the context that 1 in the nmi_watchdog is taken to mean there's a perf event in use by the kernel with implications on how group events behave. This behavior has been user visible/advertised for 9 years. I don't doubt that there were good intentions by PowerPC's watchdog and in the buddy watchdog patches in using the file, that use will lead to spurious warnings and behaviors by perf. My points remain: 1) using multiple files regresses perf's performance; 2) the file name by its meaning is wrong; 3) old perf tools on new kernels won't behave as expected wrt warnings and metrics because the meaning of the file has changed. Using a separate file for each watchdog resolves this. It seems that there wasn't enough critical mass for getting this right to have mattered before, but that doesn't mean we shouldn't get it right now. Thanks, Ian ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 22:45 ` Ian Rogers @ 2025-10-07 22:58 ` Doug Anderson 2025-10-08 0:11 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Doug Anderson @ 2025-10-07 22:58 UTC (permalink / raw) To: Ian Rogers Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users Hi, On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote: > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: > ... > > The buddy watchdog was pretty much following the conventions that were > > already in the code: that the hardlockup detector (whether backed by > > perf or not) was essentially called the "nmi watchdog". There were a > > number of people that were involved in reviews and I don't believe > > suggesting creating a whole different mechanism for enabling / > > disabling the buddy watchdog was never suggested. > > I suspect they lacked the context that 1 in the nmi_watchdog is taken > to mean there's a perf event in use by the kernel with implications on > how group events behave. This behavior has been user > visible/advertised for 9 years. I don't doubt that there were good > intentions by PowerPC's watchdog and in the buddy watchdog patches in > using the file, that use will lead to spurious warnings and behaviors > by perf. > > My points remain: > 1) using multiple files regresses perf's performance; > 2) the file name by its meaning is wrong; > 3) old perf tools on new kernels won't behave as expected wrt warnings > and metrics because the meaning of the file has changed. > Using a separate file for each watchdog resolves this. It seems that > there wasn't enough critical mass for getting this right to have > mattered before, but that doesn't mean we shouldn't get it right now. Presumably your next steps then are to find someone to submit a patch and try to convince others on the list that this is a good idea. The issue with perf has been known for a while now and I haven't seen any patches. As I've said, I won't stand in the way if everyone else agrees, but given that I'm still not convinced I'm not going to author any patches for this myself. -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-07 22:58 ` Doug Anderson @ 2025-10-08 0:11 ` Ian Rogers 2025-10-09 6:50 ` Jinchao Wang 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-08 0:11 UTC (permalink / raw) To: Doug Anderson Cc: Greg Kroah-Hartman, Jinchao Wang, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote: > > > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: > > ... > > > The buddy watchdog was pretty much following the conventions that were > > > already in the code: that the hardlockup detector (whether backed by > > > perf or not) was essentially called the "nmi watchdog". There were a > > > number of people that were involved in reviews and I don't believe > > > suggesting creating a whole different mechanism for enabling / > > > disabling the buddy watchdog was never suggested. > > > > I suspect they lacked the context that 1 in the nmi_watchdog is taken > > to mean there's a perf event in use by the kernel with implications on > > how group events behave. This behavior has been user > > visible/advertised for 9 years. I don't doubt that there were good > > intentions by PowerPC's watchdog and in the buddy watchdog patches in > > using the file, that use will lead to spurious warnings and behaviors > > by perf. > > > > My points remain: > > 1) using multiple files regresses perf's performance; > > 2) the file name by its meaning is wrong; > > 3) old perf tools on new kernels won't behave as expected wrt warnings > > and metrics because the meaning of the file has changed. > > Using a separate file for each watchdog resolves this. It seems that > > there wasn't enough critical mass for getting this right to have > > mattered before, but that doesn't mean we shouldn't get it right now. > > Presumably your next steps then are to find someone to submit a patch > and try to convince others on the list that this is a good idea. The > issue with perf has been known for a while now and I haven't seen any > patches. As I've said, I won't stand in the way if everyone else > agrees, but given that I'm still not convinced I'm not going to author > any patches for this myself. Writing >1 of: ``` static struct ctl_table watchdog_hardlockup_sysctl[] = { { .procname = "nmi_watchdog", .data = &watchdog_hardlockup_user_enabled, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_nmi_watchdog, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, }; ``` is an exercise of copy-and-paste, if you need me to do the copy and pasting then it is okay. Thanks, Ian > -Doug > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-08 0:11 ` Ian Rogers @ 2025-10-09 6:50 ` Jinchao Wang 2025-10-09 13:22 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Jinchao Wang @ 2025-10-09 6:50 UTC (permalink / raw) To: Ian Rogers Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote: > On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote: > > > > Hi, > > > > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote: > > > > > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: > > > ... > > > > The buddy watchdog was pretty much following the conventions that were > > > > already in the code: that the hardlockup detector (whether backed by > > > > perf or not) was essentially called the "nmi watchdog". There were a > > > > number of people that were involved in reviews and I don't believe > > > > suggesting creating a whole different mechanism for enabling / > > > > disabling the buddy watchdog was never suggested. > > > > > > I suspect they lacked the context that 1 in the nmi_watchdog is taken > > > to mean there's a perf event in use by the kernel with implications on > > > how group events behave. This behavior has been user > > > visible/advertised for 9 years. I don't doubt that there were good > > > intentions by PowerPC's watchdog and in the buddy watchdog patches in > > > using the file, that use will lead to spurious warnings and behaviors > > > by perf. > > > > > > My points remain: > > > 1) using multiple files regresses perf's performance; > > > 2) the file name by its meaning is wrong; > > > 3) old perf tools on new kernels won't behave as expected wrt warnings > > > and metrics because the meaning of the file has changed. > > > Using a separate file for each watchdog resolves this. It seems that > > > there wasn't enough critical mass for getting this right to have > > > mattered before, but that doesn't mean we shouldn't get it right now. > > > > Presumably your next steps then are to find someone to submit a patch > > and try to convince others on the list that this is a good idea. The > > issue with perf has been known for a while now and I haven't seen any > > patches. As I've said, I won't stand in the way if everyone else > > agrees, but given that I'm still not convinced I'm not going to author > > any patches for this myself. > > Writing >1 of: > ``` > static struct ctl_table watchdog_hardlockup_sysctl[] = { > { > .procname = "nmi_watchdog", > .data = &watchdog_hardlockup_user_enabled, > .maxlen = sizeof(int), > .mode = 0444, > .proc_handler = proc_nmi_watchdog, > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > }; > ``` > is an exercise of copy-and-paste, if you need me to do the copy and > pasting then it is okay. Can we get whether a perf event is already in use directly from the perf subsystem? There may be (or will be) other kernel users of perf_event besides the NMI watchdog. Exposing that state from the perf side would avoid coupling unrelated users through nmi_watchdog and similar features. > > Thanks, > Ian > > > > -Doug > > -- Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-09 6:50 ` Jinchao Wang @ 2025-10-09 13:22 ` Ian Rogers 2025-10-10 12:54 ` Jinchao Wang 0 siblings, 1 reply; 22+ messages in thread From: Ian Rogers @ 2025-10-09 13:22 UTC (permalink / raw) To: Jinchao Wang Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Wed, Oct 8, 2025 at 11:50 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote: > > On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote: > > > > > > Hi, > > > > > > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote: > > > > > > > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: > > > > ... > > > > > The buddy watchdog was pretty much following the conventions that were > > > > > already in the code: that the hardlockup detector (whether backed by > > > > > perf or not) was essentially called the "nmi watchdog". There were a > > > > > number of people that were involved in reviews and I don't believe > > > > > suggesting creating a whole different mechanism for enabling / > > > > > disabling the buddy watchdog was never suggested. > > > > > > > > I suspect they lacked the context that 1 in the nmi_watchdog is taken > > > > to mean there's a perf event in use by the kernel with implications on > > > > how group events behave. This behavior has been user > > > > visible/advertised for 9 years. I don't doubt that there were good > > > > intentions by PowerPC's watchdog and in the buddy watchdog patches in > > > > using the file, that use will lead to spurious warnings and behaviors > > > > by perf. > > > > > > > > My points remain: > > > > 1) using multiple files regresses perf's performance; > > > > 2) the file name by its meaning is wrong; > > > > 3) old perf tools on new kernels won't behave as expected wrt warnings > > > > and metrics because the meaning of the file has changed. > > > > Using a separate file for each watchdog resolves this. It seems that > > > > there wasn't enough critical mass for getting this right to have > > > > mattered before, but that doesn't mean we shouldn't get it right now. > > > > > > Presumably your next steps then are to find someone to submit a patch > > > and try to convince others on the list that this is a good idea. The > > > issue with perf has been known for a while now and I haven't seen any > > > patches. As I've said, I won't stand in the way if everyone else > > > agrees, but given that I'm still not convinced I'm not going to author > > > any patches for this myself. > > > > Writing >1 of: > > ``` > > static struct ctl_table watchdog_hardlockup_sysctl[] = { > > { > > .procname = "nmi_watchdog", > > .data = &watchdog_hardlockup_user_enabled, > > .maxlen = sizeof(int), > > .mode = 0444, > > .proc_handler = proc_nmi_watchdog, > > .extra1 = SYSCTL_ZERO, > > .extra2 = SYSCTL_ONE, > > }, > > }; > > ``` > > is an exercise of copy-and-paste, if you need me to do the copy and > > pasting then it is okay. > Can we get whether a perf event is already in use directly from the > perf subsystem? There may be (or will be) other kernel users of > perf_event besides the NMI watchdog. Exposing that state from the perf > side would avoid coupling unrelated users through nmi_watchdog and > similar features. For regular processes there is this unmerged proposal: https://lore.kernel.org/lkml/20250603181634.1362626-1-ctshao@google.com/ it doesn't say whether the counter is pinned - the NMI watchdog's counter is pinned to be a higher priority that flexible regular counters that may be multiplexed. I don't believe there is anything to say whether the kernel has taken a performance counter. In general something else taking a performance counter is okay as the kernel will multiplex the counter or groups of counters. The particular issue for the NMI watchdog counter is that groups of events are tested to see if they fit on a PMU, the perf event open will fail when a group isn't possible and then the events will be reprogrammed by the perf tool without a group. When the group is tested the PMU has always assumed that all counters are available, which of course with the NMI watchdog they are not. This results with the NMI watchdog causing a group of events to be created that can never be scheduled. Thanks, Ian > > > > Thanks, > > Ian > > > > > > > -Doug > > > > > -- > Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-09 13:22 ` Ian Rogers @ 2025-10-10 12:54 ` Jinchao Wang 2025-10-13 15:22 ` Ian Rogers 0 siblings, 1 reply; 22+ messages in thread From: Jinchao Wang @ 2025-10-10 12:54 UTC (permalink / raw) To: Ian Rogers Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On 10/9/25 21:22, Ian Rogers wrote: > On Wed, Oct 8, 2025 at 11:50 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: >> >> On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote: >>> On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <dianders@chromium.org> wrote: >>>> >>>> Hi, >>>> >>>> On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <irogers@google.com> wrote: >>>>> >>>>> On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <dianders@chromium.org> wrote: >>>>> ... >>>>>> The buddy watchdog was pretty much following the conventions that were >>>>>> already in the code: that the hardlockup detector (whether backed by >>>>>> perf or not) was essentially called the "nmi watchdog". There were a >>>>>> number of people that were involved in reviews and I don't believe >>>>>> suggesting creating a whole different mechanism for enabling / >>>>>> disabling the buddy watchdog was never suggested. >>>>> >>>>> I suspect they lacked the context that 1 in the nmi_watchdog is taken >>>>> to mean there's a perf event in use by the kernel with implications on >>>>> how group events behave. This behavior has been user >>>>> visible/advertised for 9 years. I don't doubt that there were good >>>>> intentions by PowerPC's watchdog and in the buddy watchdog patches in >>>>> using the file, that use will lead to spurious warnings and behaviors >>>>> by perf. >>>>> >>>>> My points remain: >>>>> 1) using multiple files regresses perf's performance; >>>>> 2) the file name by its meaning is wrong; >>>>> 3) old perf tools on new kernels won't behave as expected wrt warnings >>>>> and metrics because the meaning of the file has changed. >>>>> Using a separate file for each watchdog resolves this. It seems that >>>>> there wasn't enough critical mass for getting this right to have >>>>> mattered before, but that doesn't mean we shouldn't get it right now. >>>> >>>> Presumably your next steps then are to find someone to submit a patch >>>> and try to convince others on the list that this is a good idea. The >>>> issue with perf has been known for a while now and I haven't seen any >>>> patches. As I've said, I won't stand in the way if everyone else >>>> agrees, but given that I'm still not convinced I'm not going to author >>>> any patches for this myself. >>> >>> Writing >1 of: >>> ``` >>> static struct ctl_table watchdog_hardlockup_sysctl[] = { >>> { >>> .procname = "nmi_watchdog", >>> .data = &watchdog_hardlockup_user_enabled, >>> .maxlen = sizeof(int), >>> .mode = 0444, >>> .proc_handler = proc_nmi_watchdog, >>> .extra1 = SYSCTL_ZERO, >>> .extra2 = SYSCTL_ONE, >>> }, >>> }; >>> ``` >>> is an exercise of copy-and-paste, if you need me to do the copy and >>> pasting then it is okay. >> Can we get whether a perf event is already in use directly from the >> perf subsystem? There may be (or will be) other kernel users of >> perf_event besides the NMI watchdog. Exposing that state from the perf >> side would avoid coupling unrelated users through nmi_watchdog and >> similar features. > > For regular processes there is this unmerged proposal: > https://lore.kernel.org/lkml/20250603181634.1362626-1-ctshao@google.com/ > it doesn't say whether the counter is pinned - the NMI watchdog's > counter is pinned to be a higher priority that flexible regular > counters that may be multiplexed. I don't believe there is anything to > say whether the kernel has taken a performance counter. In general > something else taking a performance counter is okay as the kernel > will multiplex the counter or groups of counters. > > The particular issue for the NMI watchdog counter is that groups of > events are tested to see if they fit on a PMU, the perf event open > will fail when a group isn't possible and then the events will be > reprogrammed by the perf tool without a group. When the group is > tested the PMU has always assumed that all counters are available, > which of course with the NMI watchdog they are not. This results with > the NMI watchdog causing a group of events to be created that can > never be scheduled. Addressing the PMU assumption that all counters are available would resolve the issue. If perf managed reserved or pinned counters internally, other users would not need to be aware of that detail. Alternatively, perf could provide an interface to query whether a counter is pinned. Having the NMI watchdog supply that information creates coupling between otherwise independent subsystems. > > Thanks, > Ian > >>> >>> Thanks, >>> Ian >>> >>> >>>> -Doug >>>> >> >> -- >> Jinchao -- Jinchao ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-10-10 12:54 ` Jinchao Wang @ 2025-10-13 15:22 ` Ian Rogers 0 siblings, 0 replies; 22+ messages in thread From: Ian Rogers @ 2025-10-13 15:22 UTC (permalink / raw) To: Jinchao Wang Cc: Doug Anderson, Greg Kroah-Hartman, Namhyung Kim, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, christophe.leroy, tglx, mingo, bp, dave.hansen, hpa, acme, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users On Fri, Oct 10, 2025 at 5:54 AM Jinchao Wang <wangjinchao600@gmail.com> wrote: > Addressing the PMU assumption that all counters are available would > resolve the issue. If perf managed reserved or pinned counters > internally, other users would not need to be aware of that detail. > > Alternatively, perf could provide an interface to query whether a > counter is pinned. Having the NMI watchdog supply that information > creates coupling between otherwise independent subsystems. There are lots of ways to redesign the perf event subsystem, counters, etc. These things are being pushed upon. In general the API is trying to hide details like the scheduling counters. I'm in a loop, but the change here is bad because: 1) the use of nmi_watchdog in this way is misleading (outside of the perf tool) because of the name; 2) it is bad because it is altering the way a kernel API has worked for close to 10 years meaning old code doesn't work as intended; 3) it incurs extra overhead in tools. Thanks, Ian ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang 2025-09-17 0:03 ` Ian Rogers @ 2025-09-17 6:08 ` Christophe Leroy 2025-09-17 6:54 ` Jinchao Wang 1 sibling, 1 reply; 22+ messages in thread From: Christophe Leroy @ 2025-09-17 6:08 UTC (permalink / raw) To: Jinchao Wang, Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users, Ian Rogers Le 16/09/2025 à 16:50, Jinchao Wang a écrit : > Currently, the hard lockup detector is selected at compile time via > Kconfig, which requires a kernel rebuild to switch implementations. > This is inflexible, especially on systems where a perf event may not > be available or may be needed for other tasks. > > This commit refactors the hard lockup detector to replace a rigid > compile-time choice with a flexible build-time and boot-time solution. > The patch supports building the kernel with either detector > independently, or with both. When both are built, a new boot parameter > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > This is a more robust and user-friendly design. > > This patch is a follow-up to the discussion on the kernel mailing list > regarding the preference and future of the hard lockup detectors. It > implements a flexible solution that addresses the community's need to > select an appropriate detector at boot time. > > The core changes are: > - The `perf` and `buddy` watchdog implementations are separated into > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > to serve as a single API for the entire feature. > - A new `hardlockup_detector=` boot parameter is added to allow the > user to select the desired detector at boot time. > - The Kconfig options are simplified by removing the complex > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > built without mutual exclusion. > - The weak stubs are updated to call the new function pointers, > centralizing the watchdog logic. > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > --- > .../admin-guide/kernel-parameters.txt | 7 +++ > include/linux/nmi.h | 6 +++ > kernel/watchdog.c | 46 ++++++++++++++++++- > kernel/watchdog_buddy.c | 7 +-- > kernel/watchdog_perf.c | 10 ++-- > lib/Kconfig.debug | 37 +++++++-------- > 6 files changed, 85 insertions(+), 28 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 5a7a83c411e9..0af214ee566c 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1828,6 +1828,13 @@ > backtraces on all cpus. > Format: 0 | 1 > > + hardlockup_detector= > + [perf, buddy] Selects the hard lockup detector to use at > + boot time. > + Format: <string> > + - "perf": Use the perf-based detector. > + - "buddy": Use the buddy-based detector. > + > hash_pointers= > [KNL,EARLY] > By default, when pointers are printed to the console > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > index cf3c6ab408aa..9298980ce572 100644 > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > #endif > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > +extern int watchdog_perf_hardlockup_probe(void); No 'extern' on function prototypes, this is pointless. > extern void hardlockup_detector_perf_stop(void); > extern void hardlockup_detector_perf_restart(void); > extern void hardlockup_config_perf_event(const char *str); > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > void lockup_detector_reconfigure(void); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > +int watchdog_buddy_hardlockup_probe(void); > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > #else > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 80b56c002c7f..85451d24a77d 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > +/* The global function pointers */ > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; As this is set only once at startup, can we use static_call instead of function pointers ? Also, can it me made __ro_after_init ? > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > +#endif > + > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ > +static int __init set_hardlockup_detector_type(char *str) > +{ > + if (!strncmp(str, "perf", 4)) { Why strncmp ? What if I set 'hardlockup_detector=performance" ? > + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; > + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; > + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; > + } else if (!strncmp(str, "buddy", 5)) { > + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; > + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; > + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; > + } > + return 1; > +} > + > +__setup("hardlockup_detector=", set_hardlockup_detector_type); > + > +#endif > + > # ifdef CONFIG_SMP > int __read_mostly sysctl_hardlockup_all_cpu_backtrace; > # endif /* CONFIG_SMP */ > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } > * softlockup watchdog start and stop. The detector must select the > * SOFTLOCKUP_DETECTOR Kconfig. > */ > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } > +void __weak watchdog_hardlockup_enable(unsigned int cpu) > +{ > + if (watchdog_hardlockup_enable_ptr) > + watchdog_hardlockup_enable_ptr(cpu); > +} This is a weak function so it can be overloaded. What happens then, for instance if the sparc architecture version of watchdog_hardlockup_enable() is called instead ? > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > +void __weak watchdog_hardlockup_disable(unsigned int cpu) > +{ > + if (watchdog_hardlockup_disable_ptr) > + watchdog_hardlockup_disable_ptr(cpu); > +} > > /* > * Watchdog-detector specific API. > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > */ > int __weak __init watchdog_hardlockup_probe(void) > { > + if (watchdog_hardlockup_probe_ptr) > + return watchdog_hardlockup_probe_ptr(); > + > return -ENODEV; > } > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > index ee754d767c21..390d89bfcafa 100644 > --- a/kernel/watchdog_buddy.c > +++ b/kernel/watchdog_buddy.c > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) > return next_cpu; > } > > -int __init watchdog_hardlockup_probe(void) > +int __init watchdog_buddy_hardlockup_probe(void) > { > return 0; > } > > -void watchdog_hardlockup_enable(unsigned int cpu) > +void watchdog_buddy_hardlockup_enable(unsigned int cpu) > { > unsigned int next_cpu; > > + pr_info("ddddd %s\n", __func__); Leftover from debuging ? > /* > * The new CPU will be marked online before the hrtimer interrupt > * gets a chance to run on it. If another CPU tests for a > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > cpumask_set_cpu(cpu, &watchdog_cpus); > } > > -void watchdog_hardlockup_disable(unsigned int cpu) > +void watchdog_buddy_hardlockup_disable(unsigned int cpu) > { > unsigned int next_cpu = watchdog_next_cpu(cpu); > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > index 9c58f5b4381d..270110e58f20 100644 > --- a/kernel/watchdog_perf.c > +++ b/kernel/watchdog_perf.c > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) > * watchdog_hardlockup_enable - Enable the local event > * @cpu: The CPU to enable hard lockup on. > */ > -void watchdog_hardlockup_enable(unsigned int cpu) > +void watchdog_perf_hardlockup_enable(unsigned int cpu) > { > WARN_ON_ONCE(cpu != smp_processor_id()); > > + pr_info("ddddd %s\n", __func__); > + > if (hardlockup_detector_event_create()) > return; > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > * watchdog_hardlockup_disable - Disable the local event > * @cpu: The CPU to enable hard lockup on. > */ > -void watchdog_hardlockup_disable(unsigned int cpu) > +void watchdog_perf_hardlockup_disable(unsigned int cpu) > { > struct perf_event *event = this_cpu_read(watchdog_ev); > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) > /** > * watchdog_hardlockup_probe - Probe whether NMI event is available at all > */ > -int __init watchdog_hardlockup_probe(void) > +int __init watchdog_perf_hardlockup_probe(void) > { > int ret; > > + pr_info("ddddd %s\n", __func__); > + > if (!arch_perf_nmi_is_available()) > return -ENODEV; > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index dc0e0c6ed075..443353fad1c1 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR > # > # Note that arch-specific variants are always preferred. > # > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY > - bool "Prefer the buddy CPU hardlockup detector" > - depends on HARDLOCKUP_DETECTOR > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY > - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > - help > - Say Y here to prefer the buddy hardlockup detector over the perf one. > - > - With the buddy detector, each CPU uses its softlockup hrtimer > - to check that the next CPU is processing hrtimer interrupts by > - verifying that a counter is increasing. > - > - This hardlockup detector is useful on systems that don't have > - an arch-specific hardlockup detector or if resources needed > - for the hardlockup detector are better used for other things. > - > config HARDLOCKUP_DETECTOR_PERF > - bool > + bool "Enable perf-based hard lockup detector (preferred)" > depends on HARDLOCKUP_DETECTOR > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + help > + This detector uses a perf event on the CPU to detect when a CPU > + has become non-maskable interrupt (NMI) stuck. This is the > + preferred method on modern systems as it can detect lockups on > + all CPUs at the same time. > > config HARDLOCKUP_DETECTOR_BUDDY > - bool > + bool "Enable buddy-based hard lockup detector" > depends on HARDLOCKUP_DETECTOR > depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY > - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + help > + This is an alternative lockup detector that uses a heartbeat > + mechanism between CPUs to detect when one has stopped responding. > + It is less precise than the perf-based detector and cannot detect > + all-CPU lockups, but it does not require a perf counter. > + > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > + bool > + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY > > config HARDLOCKUP_DETECTOR_ARCH > bool ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 6:08 ` Christophe Leroy @ 2025-09-17 6:54 ` Jinchao Wang 2025-10-06 20:13 ` Doug Anderson 0 siblings, 1 reply; 22+ messages in thread From: Jinchao Wang @ 2025-09-17 6:54 UTC (permalink / raw) To: Christophe Leroy Cc: Doug Anderson, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users, Ian Rogers On Wed, Sep 17, 2025 at 08:08:57AM +0200, Christophe Leroy wrote: > > > Le 16/09/2025 à 16:50, Jinchao Wang a écrit : > > Currently, the hard lockup detector is selected at compile time via > > Kconfig, which requires a kernel rebuild to switch implementations. > > This is inflexible, especially on systems where a perf event may not > > be available or may be needed for other tasks. > > > > This commit refactors the hard lockup detector to replace a rigid > > compile-time choice with a flexible build-time and boot-time solution. > > The patch supports building the kernel with either detector > > independently, or with both. When both are built, a new boot parameter > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > This is a more robust and user-friendly design. > > > > This patch is a follow-up to the discussion on the kernel mailing list > > regarding the preference and future of the hard lockup detectors. It > > implements a flexible solution that addresses the community's need to > > select an appropriate detector at boot time. > > > > The core changes are: > > - The `perf` and `buddy` watchdog implementations are separated into > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > to serve as a single API for the entire feature. > > - A new `hardlockup_detector=` boot parameter is added to allow the > > user to select the desired detector at boot time. > > - The Kconfig options are simplified by removing the complex > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > built without mutual exclusion. > > - The weak stubs are updated to call the new function pointers, > > centralizing the watchdog logic. > > > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > > --- > > .../admin-guide/kernel-parameters.txt | 7 +++ > > include/linux/nmi.h | 6 +++ > > kernel/watchdog.c | 46 ++++++++++++++++++- > > kernel/watchdog_buddy.c | 7 +-- > > kernel/watchdog_perf.c | 10 ++-- > > lib/Kconfig.debug | 37 +++++++-------- > > 6 files changed, 85 insertions(+), 28 deletions(-) > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index 5a7a83c411e9..0af214ee566c 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -1828,6 +1828,13 @@ > > backtraces on all cpus. > > Format: 0 | 1 > > + hardlockup_detector= > > + [perf, buddy] Selects the hard lockup detector to use at > > + boot time. > > + Format: <string> > > + - "perf": Use the perf-based detector. > > + - "buddy": Use the buddy-based detector. > > + > > hash_pointers= > > [KNL,EARLY] > > By default, when pointers are printed to the console > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > > index cf3c6ab408aa..9298980ce572 100644 > > --- a/include/linux/nmi.h > > +++ b/include/linux/nmi.h > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > > #endif > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > > +extern int watchdog_perf_hardlockup_probe(void); > > No 'extern' on function prototypes, this is pointless. Got it. > > > extern void hardlockup_detector_perf_stop(void); > > extern void hardlockup_detector_perf_restart(void); > > extern void hardlockup_config_perf_event(const char *str); > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > > void lockup_detector_reconfigure(void); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > > +int watchdog_buddy_hardlockup_probe(void); > > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > > #else > > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > index 80b56c002c7f..85451d24a77d 100644 > > --- a/kernel/watchdog.c > > +++ b/kernel/watchdog.c > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > > +/* The global function pointers */ > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; > > As this is set only once at startup, can we use static_call instead of > function pointers ? > > Also, can it me made __ro_after_init ? Not really, this is just an RFC patch, and there is no consensus yet. If it is included in the final consensus, I will handle it in the next version. > > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > > +#endif > > + > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ > > +static int __init set_hardlockup_detector_type(char *str) > > +{ > > + if (!strncmp(str, "perf", 4)) { > > Why strncmp ? Copy from hardlockup_panic_setup(). > > What if I set 'hardlockup_detector=performance" ? I think that is acceptable in this case. > > > > + watchdog_hardlockup_enable_ptr = watchdog_perf_hardlockup_enable; > > + watchdog_hardlockup_disable_ptr = watchdog_perf_hardlockup_disable; > > + watchdog_hardlockup_probe_ptr = watchdog_perf_hardlockup_probe; > > + } else if (!strncmp(str, "buddy", 5)) { > > + watchdog_hardlockup_enable_ptr = watchdog_buddy_hardlockup_enable; > > + watchdog_hardlockup_disable_ptr = watchdog_buddy_hardlockup_disable; > > + watchdog_hardlockup_probe_ptr = watchdog_buddy_hardlockup_probe; > > + } > > + return 1; > > +} > > + > > +__setup("hardlockup_detector=", set_hardlockup_detector_type); > > + > > +#endif > > + > > # ifdef CONFIG_SMP > > int __read_mostly sysctl_hardlockup_all_cpu_backtrace; > > # endif /* CONFIG_SMP */ > > @@ -262,9 +293,17 @@ static inline void watchdog_hardlockup_kick(void) { } > > * softlockup watchdog start and stop. The detector must select the > > * SOFTLOCKUP_DETECTOR Kconfig. > > */ > > -void __weak watchdog_hardlockup_enable(unsigned int cpu) { } > > +void __weak watchdog_hardlockup_enable(unsigned int cpu) > > +{ > > + if (watchdog_hardlockup_enable_ptr) > > + watchdog_hardlockup_enable_ptr(cpu); > > +} > > This is a weak function so it can be overloaded. What happens then, for > instance if the sparc architecture version of watchdog_hardlockup_enable() > is called instead ? It is a historical problem; I prefer using an #if condition instead. I had considered sparc arch, if sparc version is called, it is expected. Because the __weak functions only handle perf & buddy watchdog not the sparc watchdog. I think we should first resolve the consensus issue: - Should we keep both perf and buddy watchdogs? (probably yes already) - Should the watchdog type be changeable at boot time? - Should the watchdog type be changeable at runtime? How we handle these different watchdog types(maybe including sparc type) depends on the answers to these questions. What do you think? > > > -void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > +void __weak watchdog_hardlockup_disable(unsigned int cpu) > > +{ > > + if (watchdog_hardlockup_disable_ptr) > > + watchdog_hardlockup_disable_ptr(cpu); > > +} > > /* > > * Watchdog-detector specific API. > > @@ -275,6 +314,9 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } > > */ > > int __weak __init watchdog_hardlockup_probe(void) > > { > > + if (watchdog_hardlockup_probe_ptr) > > + return watchdog_hardlockup_probe_ptr(); > > + > > return -ENODEV; > > } > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > > index ee754d767c21..390d89bfcafa 100644 > > --- a/kernel/watchdog_buddy.c > > +++ b/kernel/watchdog_buddy.c > > @@ -19,15 +19,16 @@ static unsigned int watchdog_next_cpu(unsigned int cpu) > > return next_cpu; > > } > > -int __init watchdog_hardlockup_probe(void) > > +int __init watchdog_buddy_hardlockup_probe(void) > > { > > return 0; > > } > > -void watchdog_hardlockup_enable(unsigned int cpu) > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu) > > { > > unsigned int next_cpu; > > + pr_info("ddddd %s\n", __func__); > > Leftover from debuging ? Forgot to delete the log, will fix if a v2 is needed. > > > /* > > * The new CPU will be marked online before the hrtimer interrupt > > * gets a chance to run on it. If another CPU tests for a > > @@ -58,7 +59,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > cpumask_set_cpu(cpu, &watchdog_cpus); > > } > > -void watchdog_hardlockup_disable(unsigned int cpu) > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu) > > { > > unsigned int next_cpu = watchdog_next_cpu(cpu); > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > > index 9c58f5b4381d..270110e58f20 100644 > > --- a/kernel/watchdog_perf.c > > +++ b/kernel/watchdog_perf.c > > @@ -153,10 +153,12 @@ static int hardlockup_detector_event_create(void) > > * watchdog_hardlockup_enable - Enable the local event > > * @cpu: The CPU to enable hard lockup on. > > */ > > -void watchdog_hardlockup_enable(unsigned int cpu) > > +void watchdog_perf_hardlockup_enable(unsigned int cpu) > > { > > WARN_ON_ONCE(cpu != smp_processor_id()); > > + pr_info("ddddd %s\n", __func__); > > + > > if (hardlockup_detector_event_create()) > > return; > > @@ -172,7 +174,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) > > * watchdog_hardlockup_disable - Disable the local event > > * @cpu: The CPU to enable hard lockup on. > > */ > > -void watchdog_hardlockup_disable(unsigned int cpu) > > +void watchdog_perf_hardlockup_disable(unsigned int cpu) > > { > > struct perf_event *event = this_cpu_read(watchdog_ev); > > @@ -257,10 +259,12 @@ bool __weak __init arch_perf_nmi_is_available(void) > > /** > > * watchdog_hardlockup_probe - Probe whether NMI event is available at all > > */ > > -int __init watchdog_hardlockup_probe(void) > > +int __init watchdog_perf_hardlockup_probe(void) > > { > > int ret; > > + pr_info("ddddd %s\n", __func__); > > + > > if (!arch_perf_nmi_is_available()) > > return -ENODEV; > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index dc0e0c6ed075..443353fad1c1 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1167,36 +1167,33 @@ config HARDLOCKUP_DETECTOR > > # > > # Note that arch-specific variants are always preferred. > > # > > -config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > - bool "Prefer the buddy CPU hardlockup detector" > > - depends on HARDLOCKUP_DETECTOR > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY > > - depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > - help > > - Say Y here to prefer the buddy hardlockup detector over the perf one. > > - > > - With the buddy detector, each CPU uses its softlockup hrtimer > > - to check that the next CPU is processing hrtimer interrupts by > > - verifying that a counter is increasing. > > - > > - This hardlockup detector is useful on systems that don't have > > - an arch-specific hardlockup detector or if resources needed > > - for the hardlockup detector are better used for other things. > > - > > config HARDLOCKUP_DETECTOR_PERF > > - bool > > + bool "Enable perf-based hard lockup detector (preferred)" > > depends on HARDLOCKUP_DETECTOR > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + help > > + This detector uses a perf event on the CPU to detect when a CPU > > + has become non-maskable interrupt (NMI) stuck. This is the > > + preferred method on modern systems as it can detect lockups on > > + all CPUs at the same time. > > config HARDLOCKUP_DETECTOR_BUDDY > > - bool > > + bool "Enable buddy-based hard lockup detector" > > depends on HARDLOCKUP_DETECTOR > > depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY > > - depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH > > select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + help > > + This is an alternative lockup detector that uses a heartbeat > > + mechanism between CPUs to detect when one has stopped responding. > > + It is less precise than the perf-based detector and cannot detect > > + all-CPU lockups, but it does not require a perf counter. > > + > > +config CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > + bool > > + depends on HARDLOCKUP_DETECTOR_PERF && HARDLOCKUP_DETECTOR_BUDDY > > config HARDLOCKUP_DETECTOR_ARCH > > bool > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector 2025-09-17 6:54 ` Jinchao Wang @ 2025-10-06 20:13 ` Doug Anderson 0 siblings, 0 replies; 22+ messages in thread From: Doug Anderson @ 2025-10-06 20:13 UTC (permalink / raw) To: Jinchao Wang Cc: Christophe Leroy, Peter Zijlstra, Will Deacon, Yunhui Cui, akpm, catalin.marinas, maddy, mpe, npiggin, tglx, mingo, bp, dave.hansen, hpa, acme, namhyung, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, kees, masahiroy, aliceryhl, ojeda, thomas.weissschuh, xur, ruanjinjie, gshan, maz, suzuki.poulose, zhanjie9, yangyicong, gautam, arnd, zhao.xichao, rppt, lihuafei1, coxu, jpoimboe, yaozhenguo1, luogengkun, max.kellermann, tj, yury.norov, thorsten.blum, x86, linux-kernel, linux-arm-kernel, linuxppc-dev, linux-perf-users, Ian Rogers Hi, On Tue, Sep 16, 2025 at 11:55 PM Jinchao Wang <wangjinchao600@gmail.com> wrote: > > On Wed, Sep 17, 2025 at 08:08:57AM +0200, Christophe Leroy wrote: > > > > > > Le 16/09/2025 à 16:50, Jinchao Wang a écrit : > > > Currently, the hard lockup detector is selected at compile time via > > > Kconfig, which requires a kernel rebuild to switch implementations. > > > This is inflexible, especially on systems where a perf event may not > > > be available or may be needed for other tasks. > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > compile-time choice with a flexible build-time and boot-time solution. > > > The patch supports building the kernel with either detector > > > independently, or with both. When both are built, a new boot parameter > > > `hardlockup_detector="perf|buddy"` allows the selection at boot time. > > > This is a more robust and user-friendly design. > > > > > > This patch is a follow-up to the discussion on the kernel mailing list > > > regarding the preference and future of the hard lockup detectors. It > > > implements a flexible solution that addresses the community's need to > > > select an appropriate detector at boot time. > > > > > > The core changes are: > > > - The `perf` and `buddy` watchdog implementations are separated into > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > - Global function pointers are introduced (`watchdog_hardlockup_enable_ptr`) > > > to serve as a single API for the entire feature. > > > - A new `hardlockup_detector=` boot parameter is added to allow the > > > user to select the desired detector at boot time. > > > - The Kconfig options are simplified by removing the complex > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors to be > > > built without mutual exclusion. > > > - The weak stubs are updated to call the new function pointers, > > > centralizing the watchdog logic. > > > > > > Link: https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/ > > > Link: https://lore.kernel.org/all/CAD=FV=WWUiCi6bZCs_gseFpDDWNkuJMoL6XCftEo6W7q6jRCkg@mail.gmail.com/ > > > > > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > > > --- > > > .../admin-guide/kernel-parameters.txt | 7 +++ > > > include/linux/nmi.h | 6 +++ > > > kernel/watchdog.c | 46 ++++++++++++++++++- > > > kernel/watchdog_buddy.c | 7 +-- > > > kernel/watchdog_perf.c | 10 ++-- > > > lib/Kconfig.debug | 37 +++++++-------- > > > 6 files changed, 85 insertions(+), 28 deletions(-) > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index 5a7a83c411e9..0af214ee566c 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -1828,6 +1828,13 @@ > > > backtraces on all cpus. > > > Format: 0 | 1 > > > + hardlockup_detector= > > > + [perf, buddy] Selects the hard lockup detector to use at > > > + boot time. > > > + Format: <string> > > > + - "perf": Use the perf-based detector. > > > + - "buddy": Use the buddy-based detector. > > > + > > > hash_pointers= > > > [KNL,EARLY] > > > By default, when pointers are printed to the console > > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > > > index cf3c6ab408aa..9298980ce572 100644 > > > --- a/include/linux/nmi.h > > > +++ b/include/linux/nmi.h > > > @@ -100,6 +100,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); > > > #endif > > > #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > > +void watchdog_perf_hardlockup_enable(unsigned int cpu); > > > +void watchdog_perf_hardlockup_disable(unsigned int cpu); > > > +extern int watchdog_perf_hardlockup_probe(void); > > > > No 'extern' on function prototypes, this is pointless. > Got it. > > > > > extern void hardlockup_detector_perf_stop(void); > > > extern void hardlockup_detector_perf_restart(void); > > > extern void hardlockup_config_perf_event(const char *str); > > > @@ -120,6 +123,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); > > > void lockup_detector_reconfigure(void); > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY > > > +void watchdog_buddy_hardlockup_enable(unsigned int cpu); > > > +void watchdog_buddy_hardlockup_disable(unsigned int cpu); > > > +int watchdog_buddy_hardlockup_probe(void); > > > void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); > > > #else > > > static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > > index 80b56c002c7f..85451d24a77d 100644 > > > --- a/kernel/watchdog.c > > > +++ b/kernel/watchdog.c > > > @@ -55,6 +55,37 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF > > > +/* The global function pointers */ > > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_enable; > > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_perf_hardlockup_disable; > > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_perf_hardlockup_probe; > > > > As this is set only once at startup, can we use static_call instead of > > function pointers ? > > > > Also, can it me made __ro_after_init ? > Not really, this is just an RFC patch, and there is no consensus yet. > If it is included in the final consensus, I will handle it in the next version. > > > > > +#elif defined(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) > > > +void (*watchdog_hardlockup_enable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_enable; > > > +void (*watchdog_hardlockup_disable_ptr)(unsigned int cpu) = watchdog_buddy_hardlockup_disable; > > > +int (*watchdog_hardlockup_probe_ptr)(void) = watchdog_buddy_hardlockup_probe; > > > +#endif > > > + > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR_MULTIPLE > > > +static char *hardlockup_detector_type = "perf"; /* Default to perf */ I'd still at least hope that "buddy" could be the default. While "perf" can still be useful to catch some things that "buddy" can't, "buddy" can also catch some things that perf can't so picking the lower-overhead one as the default is perhaps better? > I think we should first resolve the consensus issue: > - Should we keep both perf and buddy watchdogs? (probably yes already) Sounds like people want to keep perf, so "yes". > - Should the watchdog type be changeable at boot time? Seems like a good start. > - Should the watchdog type be changeable at runtime? If it was easy/possible then sure, but I don't think it's a giant deal to have something like this only changeable at boot time, like your patch does. The fact that your patch is pretty simple is definitely a big win. IMO it would be worth sending it out as a real patch. -Doug ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-10-13 15:22 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <https://lore.kernel.org/all/20250915035355.10846-1-cuiyunhui@bytedance.com/>
2025-09-16 14:50 ` [RFC PATCH V1] watchdog: Add boot-time selection for hard lockup detector Jinchao Wang
2025-09-17 0:03 ` Ian Rogers
2025-09-17 1:47 ` Jinchao Wang
2025-09-17 5:13 ` Ian Rogers
2025-09-17 5:35 ` Namhyung Kim
2025-09-17 6:14 ` Jinchao Wang
2025-10-06 21:29 ` Ian Rogers
2025-10-06 23:24 ` Doug Anderson
2025-10-07 1:00 ` Ian Rogers
2025-10-07 19:54 ` Doug Anderson
2025-10-07 20:43 ` Ian Rogers
2025-10-07 21:43 ` Doug Anderson
2025-10-07 22:45 ` Ian Rogers
2025-10-07 22:58 ` Doug Anderson
2025-10-08 0:11 ` Ian Rogers
2025-10-09 6:50 ` Jinchao Wang
2025-10-09 13:22 ` Ian Rogers
2025-10-10 12:54 ` Jinchao Wang
2025-10-13 15:22 ` Ian Rogers
2025-09-17 6:08 ` Christophe Leroy
2025-09-17 6:54 ` Jinchao Wang
2025-10-06 20:13 ` Doug Anderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).