* [PATCH 0/2] Add HPET NMI Watchdog support
@ 2026-02-02 17:43 Alexander Graf
2026-02-02 17:49 ` Alexander Graf
0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2026-02-02 17:43 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Thomas Gleixner, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr
The current NMI watchdog relies on performance counters and consistently
occupies one on each CPU. When running virtual machines, we want to pass
performance counters to virtual machines so they can make use of them.
In addition the host system wants to use performance counters to check
the system to identify when anything looks abnormal, such as split
locks.
That makes PMCs a precious resource. So any PMC we can free up is a PMC
we can use for something useful. That made me look at the NMI watchdog.
The PMC based NMI watchdog implementation does not actually need any
performance counting. It just needs a per-CPU NMI timer source. X86
systems can make anything that emits an interrupt descriptor (IOAPIC,
MSI(-X), etc.) become an NMI source. So any timer goes. Including the
HPET. And while they can't really operate per-CPU, in almost all cases
you only really want the NMI on *all* CPUs, rather than per-CPU.
So I took a stab at building an HPET based NMI watchdog. In my (QEMU
based) testing, it's fully functional and can successfully detect when
CPUs get stuck. It even survives suspend/resume cycles.
For now, its enablement is a config time option because the hardlockup
framework does not support dynamic switching of multiple detectors.
That's ok for our use case. But maybe something for the interested
reader to tackle eventually :).
You can enable the HPET watchdog by default by setting
CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT=y
or passing "hpet=watchdog" to the kernel command line. When active, it
will emit a kernel log message to indicate it works:
[ 0.179176] hpet: HPET watchdog initialized on timer 0, GSI 2
The HPET can only be in either watchdog or generic mode. I am a bit
worried about IO-APIC pin allocation logic, so I opted to reuse the
generic timer pin. And that means I'm effectively breaking the normal
interrupt delivery path. So the easy way out was to say when watchdog is
active, PIT and HPET are not available as timer sources. Which is ok on
modern systems. There are way too many (unreliable) timer sources on x86
already. Trimming a few surely won't hurt.
I'm open to inputs on how to make the HPET multi-purpose though, in case
anyone feels strongly about it.
Alex
Alexander Graf (2):
x86/ioapic: Add NMI delivery configuration helper
hpet: Add HPET-based NMI watchdog support
.../admin-guide/kernel-parameters.txt | 5 +-
arch/x86/Kconfig | 19 ++
arch/x86/include/asm/io_apic.h | 2 +
arch/x86/kernel/apic/io_apic.c | 32 ++++
arch/x86/kernel/hpet.c | 172 ++++++++++++++++++
arch/x86/kernel/i8253.c | 9 +
drivers/char/hpet.c | 3 +
include/linux/hpet.h | 14 ++
8 files changed, 255 insertions(+), 1 deletion(-)
--
2.47.1
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
* [PATCH 0/2] Add HPET NMI Watchdog support
@ 2026-02-02 17:48 Alexander Graf
2026-02-02 17:48 ` [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper Alexander Graf
2026-02-02 17:48 ` [PATCH 2/2] hpet: Add HPET-based NMI watchdog support Alexander Graf
0 siblings, 2 replies; 20+ messages in thread
From: Alexander Graf @ 2026-02-02 17:48 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Thomas Gleixner, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr
The current NMI watchdog relies on performance counters and consistently
occupies one on each CPU. When running virtual machines, we want to pass
performance counters to virtual machines so they can make use of them.
In addition the host system wants to use performance counters to check
the system to identify when anything looks abnormal, such as split
locks.
That makes PMCs a precious resource. So any PMC we can free up is a PMC
we can use for something useful. That made me look at the NMI watchdog.
The PMC based NMI watchdog implementation does not actually need any
performance counting. It just needs a per-CPU NMI timer source. X86
systems can make anything that emits an interrupt descriptor (IOAPIC,
MSI(-X), etc.) become an NMI source. So any timer goes. Including the
HPET. And while they can't really operate per-CPU, in almost all cases
you only really want the NMI on *all* CPUs, rather than per-CPU.
So I took a stab at building an HPET based NMI watchdog. In my (QEMU
based) testing, it's fully functional and can successfully detect when
CPUs get stuck. It even survives suspend/resume cycles.
For now, its enablement is a config time option because the hardlockup
framework does not support dynamic switching of multiple detectors.
That's ok for our use case. But maybe something for the interested
reader to tackle eventually :).
You can enable the HPET watchdog by default by setting
CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT=y
or passing "hpet=watchdog" to the kernel command line. When active, it
will emit a kernel log message to indicate it works:
[ 0.179176] hpet: HPET watchdog initialized on timer 0, GSI 2
The HPET can only be in either watchdog or generic mode. I am a bit
worried about IO-APIC pin allocation logic, so I opted to reuse the
generic timer pin. And that means I'm effectively breaking the normal
interrupt delivery path. So the easy way out was to say when watchdog is
active, PIT and HPET are not available as timer sources. Which is ok on
modern systems. There are way too many (unreliable) timer sources on x86
already. Trimming a few surely won't hurt.
I'm open to inputs on how to make the HPET multi-purpose though, in case
anyone feels strongly about it.
Alex
Alexander Graf (2):
x86/ioapic: Add NMI delivery configuration helper
hpet: Add HPET-based NMI watchdog support
.../admin-guide/kernel-parameters.txt | 5 +-
arch/x86/Kconfig | 19 ++
arch/x86/include/asm/io_apic.h | 2 +
arch/x86/kernel/apic/io_apic.c | 32 ++++
arch/x86/kernel/hpet.c | 172 ++++++++++++++++++
arch/x86/kernel/i8253.c | 9 +
drivers/char/hpet.c | 3 +
include/linux/hpet.h | 14 ++
8 files changed, 255 insertions(+), 1 deletion(-)
--
2.47.1
* [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper
2026-02-02 17:48 [PATCH 0/2] Add HPET NMI Watchdog support Alexander Graf
@ 2026-02-02 17:48 ` Alexander Graf
2026-02-03 10:08 ` Thomas Gleixner
2026-02-02 17:48 ` [PATCH 2/2] hpet: Add HPET-based NMI watchdog support Alexander Graf
1 sibling, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2026-02-02 17:48 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Thomas Gleixner, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr
To implement an HPET based NMI watchdog, the HPET code will need to
reconfigure an IOAPIC pin to NMI mode. Add a function that allows driver
code to configure an IOAPIC pin for NMI delivery mode.
The caller can choose whether to invoke NMIs on the BSP or broadcast on
all CPUs in the system.
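To make the two destinations concrete, here is a minimal user-space sketch of the 64-bit redirection table entry such a helper would end up writing. It assumes the classic 82093AA layout (delivery mode in bits 8-10, destination mode in bit 11, mask in bit 16, destination in bits 56-63). The names rte_nmi() and RTE_* are hypothetical and only mirror the fields the patch sets; this is not the kernel's struct IO_APIC_route_entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions assumed from the 82093AA I/O APIC datasheet. */
#define RTE_DELIVERY_SHIFT	8
#define RTE_DELIVERY_NMI	0x4ULL		/* 100b selects NMI delivery */
#define RTE_DEST_LOGICAL	(1ULL << 11)	/* left clear: physical mode */
#define RTE_MASKED		(1ULL << 16)	/* left clear: pin unmasked */
#define RTE_DEST_SHIFT		56

/*
 * Hypothetical helper: build an unmasked, physical-mode NMI entry that
 * targets either one APIC ID or the 0xFF broadcast destination.
 */
uint64_t rte_nmi(uint8_t bsp_apicid, bool broadcast)
{
	uint64_t dest = broadcast ? 0xFF : bsp_apicid;

	return (RTE_DELIVERY_NMI << RTE_DELIVERY_SHIFT) |
	       (dest << RTE_DEST_SHIFT);
}
```

In this model rte_nmi(0, false) encodes an NMI for APIC ID 0 only, while rte_nmi(0, true) differs solely in the destination byte.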
(Disclaimer: Some of this code was written with the help of Kiro, an AI
coding assistant)
Signed-off-by: Alexander Graf <graf@amazon.com>
---
arch/x86/include/asm/io_apic.h | 2 ++
arch/x86/kernel/apic/io_apic.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 0d806513c4b3..58cfb338bf39 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -158,6 +158,8 @@ extern void mp_save_irq(struct mpc_intsrc *m);
extern void disable_ioapic_support(void);
+extern int ioapic_set_nmi(u32 gsi, bool broadcast);
+
extern void __init io_apic_init_mappings(void);
extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
extern void native_restore_boot_irq_mode(void);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 28f934f05a85..5b303e5d2f3f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2951,6 +2951,38 @@ int mp_irqdomain_ioapic_idx(struct irq_domain *domain)
return (int)(long)domain->host_data;
}
+/**
+ * ioapic_set_nmi - Configure an IOAPIC pin for NMI delivery
+ * @gsi: Global System Interrupt number
+ * @broadcast: true to broadcast to all CPUs, false to send to CPU 0 only
+ *
+ * Configures the specified GSI for NMI delivery mode.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int ioapic_set_nmi(u32 gsi, bool broadcast)
+{
+ struct IO_APIC_route_entry entry = { };
+ int ioapic_idx, pin;
+
+ ioapic_idx = mp_find_ioapic(gsi);
+ if (ioapic_idx < 0)
+ return -ENODEV;
+
+ pin = mp_find_ioapic_pin(ioapic_idx, gsi);
+ if (pin < 0)
+ return -ENODEV;
+
+ entry.delivery_mode = APIC_DELIVERY_MODE_NMI;
+ entry.destid_0_7 = broadcast ? 0xFF : boot_cpu_physical_apicid;
+ entry.dest_mode_logical = 0;
+ entry.masked = 0;
+
+ ioapic_write_entry(ioapic_idx, pin, entry);
+
+ return 0;
+}
+
const struct irq_domain_ops mp_ioapic_irqdomain_ops = {
.alloc = mp_irqdomain_alloc,
.free = mp_irqdomain_free,
--
2.47.1
* [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-02 17:48 [PATCH 0/2] Add HPET NMI Watchdog support Alexander Graf
2026-02-02 17:48 ` [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper Alexander Graf
@ 2026-02-02 17:48 ` Alexander Graf
2026-02-03 10:32 ` Thomas Gleixner
1 sibling, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2026-02-02 17:48 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Thomas Gleixner, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr
The traditional NMI watchdog timer uses performance counters to trigger
periodic NMIs. But performance counters are a scarce resource that is
best used for actual performance counting. However, the HPET is another
timer source on most modern x86 systems that can inject NMIs.
Add support for using an HPET timer as the NMI watchdog source instead of
performance counters. This frees up a PMC for profiling use.
Unlike with the PMU based watchdog where we trigger a per-CPU NMI, APIC
based interrupt descriptors can only target either a specific CPU or
perform a broadcast on all CPUs. To not run into races and allow for
CPU hotplug, the NMI watchdog switches between CPU 0 and broadcast modes
based on whether all CPUs are up or not.
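The switching rule described above can be modelled in a few lines. This is a simplified sketch of the idea in this commit message, not the patch code itself: struct wd_route and both helpers are hypothetical names, and the model assumes the offline path drops back to the boot CPU before the CPU count changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the routing decision; not kernel code. */
struct wd_route {
	unsigned int present;	/* CPUs that could come online */
	unsigned int online;	/* CPUs currently online */
	bool broadcast;		/* current IO-APIC destination */
};

/* A CPU finished coming online: broadcast only once nobody is missing. */
void wd_cpu_online(struct wd_route *r)
{
	assert(r->online < r->present);
	r->online++;
	if (r->online == r->present)
		r->broadcast = true;
}

/* A CPU is about to go offline: fall back to the boot CPU first. */
void wd_cpu_offline(struct wd_route *r)
{
	assert(r->online > 1);
	r->broadcast = false;
	r->online--;
}
```

With four present CPUs, the model stays in CPU 0 mode until the fourth one is online and leaves broadcast mode as soon as any CPU is taken down again.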
The HPET watchdog always uses IO-APIC line 2. This line is
architecturally defined as the PIT source on PCs and hence always
available as long as we disable the PIT (which we do). We could in
theory try to find a vacant GSI line, but in practice that would create
a big dependency chain on ACPI which I would rather avoid for now.
The implementation uses the standard HARDLOCKUP_DETECTOR_ARCH
infrastructure, following the same pattern as powerpc's arch-specific
hardlockup detector.
With this watchdog present, I can successfully capture system lockups,
verified by adding "local_irq_disable(); while(1) {}" into
mount_root_generic().
(Disclaimer: Some of this code was written with the help of Kiro, an AI
coding assistant)
Signed-off-by: Alexander Graf <graf@amazon.com>
---
.../admin-guide/kernel-parameters.txt | 5 +-
arch/x86/Kconfig | 19 ++
arch/x86/kernel/hpet.c | 208 ++++++++++++++++++
arch/x86/kernel/i8253.c | 9 +
drivers/char/hpet.c | 3 +
include/linux/hpet.h | 14 ++
6 files changed, 257 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1058f2a6d6a8..c6a98812a896 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2045,11 +2045,14 @@ Kernel parameters
hpet= [X86-32,HPET] option to control HPET usage
Format: { enable (default) | disable | force |
- verbose }
+ verbose | watchdog }
disable: disable HPET and use PIT instead
force: allow force enabled of undocumented chips (ICH4,
VIA, nVidia)
verbose: show contents of HPET registers during setup
+ watchdog: use HPET timer as NMI watchdog source instead
+ of performance counters. Use nmi_watchdog=1 to enable
+ or nmi_watchdog=panic to panic on hard lockup detection.
hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET
registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859..e8873218a803 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -948,6 +948,25 @@ config HPET_EMULATE_RTC
def_bool y
depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
+config HARDLOCKUP_DETECTOR_HPET
+ bool "Use HPET for NMI watchdog"
+ depends on HPET_TIMER
+ select HAVE_HARDLOCKUP_DETECTOR_ARCH
+ select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+ help
+ Use HPET timer as NMI source instead of performance counters.
+ This frees up a performance counter for profiling.
+ Enable with hpet=watchdog kernel parameter.
+
+config HARDLOCKUP_DETECTOR_HPET_DEFAULT
+ bool "Enable HPET watchdog by default"
+ depends on HARDLOCKUP_DETECTOR_HPET
+ help
+ Say Y here to enable HPET-based NMI watchdog by default without
+ requiring the hpet=watchdog kernel parameter.
+
+ If unsure, say N.
+
# Mark as expert because too many people got it wrong.
# The code disables itself when not needed.
config DMI
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index d6387dde3ff9..c9114997c383 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -6,9 +6,12 @@
#include <linux/hpet.h>
#include <linux/cpu.h>
#include <linux/irq.h>
+#include <linux/nmi.h>
+#include <linux/syscore_ops.h>
#include <asm/cpuid/api.h>
#include <asm/irq_remapping.h>
+#include <asm/io_apic.h>
#include <asm/hpet.h>
#include <asm/time.h>
#include <asm/mwait.h>
@@ -100,6 +103,8 @@ static inline void hpet_clear_mapping(void)
/*
* HPET command line enable / disable
*/
+static bool hpet_watchdog_mode = IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT);
+
static int __init hpet_setup(char *str)
{
while (str) {
@@ -113,6 +118,8 @@ static int __init hpet_setup(char *str)
hpet_force_user = true;
if (!strncmp("verbose", str, 7))
hpet_verbose = true;
+ if (!strncmp("watchdog", str, 8))
+ hpet_watchdog_mode = true;
str = next;
}
return 1;
@@ -985,6 +992,200 @@ static bool __init hpet_is_pc10_damaged(void)
return true;
}
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+/*
+ * HPET watchdog uses timer 0 routed to GSI 2 (legacy PIT IRQ line).
+ * When using HPET as watchdog, we repurpose this line for NMI delivery.
+ */
+#define HPET_WD_TIMER 0
+#define HPET_WD_GSI 2
+
+bool hpet_watchdog_initialized;
+static bool hpet_watchdog_ioapic_configured;
+static DEFINE_PER_CPU(u32, hpet_watchdog_next_tick);
+
+static int hpet_nmi_handler(unsigned int cmd, struct pt_regs *regs)
+{
+ u32 now, next, delta;
+
+ if (panic_in_progress())
+ return NMI_HANDLED;
+
+ /* Check if this NMI is from our HPET timer by comparing counter value */
+ now = hpet_readl(HPET_COUNTER);
+ next = __this_cpu_read(hpet_watchdog_next_tick);
+ delta = hpet_freq * watchdog_thresh;
+
+ /*
+ * If we have a next tick set and counter hasn't reached it yet,
+ * this NMI is not from our timer. Allow some tolerance for timing.
+ */
+ if (next && (s32)(now - next) < -(s32)(delta / 4))
+ return NMI_DONE;
+
+ /* Update next expected tick */
+ __this_cpu_write(hpet_watchdog_next_tick, now + delta);
+
+ watchdog_hardlockup_check(smp_processor_id(), regs);
+
+ return NMI_HANDLED;
+}
+
+/*
+ * On suspend, clear the configured flag so that the first CPU to come
+ * online after resume will reconfigure the HPET timer and IO-APIC.
+ *
+ * We don't need to explicitly disable the watchdog here because:
+ * 1. The HPET registers are reset by the hibernation/suspend process anyway
+ * 2. The IO-APIC state is saved/restored by ioapic_syscore_ops, but we
+ * need to reconfigure it for NMI delivery after resume
+ * 3. Secondary CPUs are offlined before suspend, so we can't broadcast
+ * NMIs until they're back online - the enable callback handles this
+ */
+static int hpet_watchdog_suspend(void *data)
+{
+ hpet_watchdog_ioapic_configured = false;
+ return 0;
+}
+
+static const struct syscore_ops hpet_watchdog_syscore_ops = {
+ .suspend = hpet_watchdog_suspend,
+};
+
+static struct syscore hpet_watchdog_syscore = {
+ .ops = &hpet_watchdog_syscore_ops,
+};
+
+static int __init hpet_watchdog_init(u32 channels)
+{
+ u32 cfg, i, route_cap;
+
+ if (channels <= HPET_WD_TIMER)
+ return 0;
+
+ /* Verify GSI 2 is available in the route capability bitmap */
+ route_cap = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER) + 4);
+ if (!(route_cap & (1 << HPET_WD_GSI))) {
+ pr_info("HPET timer 0 cannot route to GSI %d\n", HPET_WD_GSI);
+ return 0;
+ }
+
+ /* Deactivate all timers */
+ for (i = 0; i < channels; i++) {
+ cfg = hpet_readl(HPET_Tn_CFG(i));
+ cfg &= ~(HPET_TN_ENABLE | HPET_TN_LEVEL | HPET_TN_FSB);
+ hpet_writel(cfg, HPET_Tn_CFG(i));
+ }
+
+ /* Configure HPET timer for periodic mode */
+ cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
+ cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB);
+ cfg |= HPET_TN_PERIODIC | HPET_TN_32BIT | HPET_TN_SETVAL | HPET_TN_LEVEL;
+ hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
+
+ /* Route HPET timer to the GSI */
+ cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
+ cfg &= ~(Tn_INT_ROUTE_CNF_MASK | HPET_CFG_ENABLE);
+ cfg |= (HPET_WD_GSI << Tn_INT_ROUTE_CNF_SHIFT) & Tn_INT_ROUTE_CNF_MASK;
+ hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
+
+ if (register_nmi_handler(NMI_LOCAL, hpet_nmi_handler, 0, "hpet_watchdog")) {
+ pr_err("Failed to register NMI_LOCAL handler\n");
+ return 0;
+ }
+ if (register_nmi_handler(NMI_UNKNOWN, hpet_nmi_handler, 0, "hpet_watchdog")) {
+ unregister_nmi_handler(NMI_LOCAL, "hpet_watchdog");
+ pr_err("Failed to register NMI_UNKNOWN handler\n");
+ return 0;
+ }
+
+ hpet_start_counter();
+
+ hpet_watchdog_initialized = true;
+
+ register_syscore(&hpet_watchdog_syscore);
+
+ pr_info("HPET watchdog initialized on timer %d, GSI %d\n", HPET_WD_TIMER, HPET_WD_GSI);
+
+ return 0;
+}
+
+void watchdog_hardlockup_stop(void)
+{
+ u32 cfg;
+
+ if (!hpet_watchdog_initialized)
+ return;
+
+ cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
+ cfg &= ~HPET_TN_ENABLE;
+ hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
+}
+
+void watchdog_hardlockup_start(void)
+{
+ u32 cfg, delta;
+
+ if (!hpet_watchdog_initialized)
+ return;
+
+ if (!hpet_watchdog_ioapic_configured) {
+ if (ioapic_set_nmi(HPET_WD_GSI, false)) {
+ pr_err("Unable to configure IO-APIC for NMI\n");
+ return;
+ }
+ hpet_watchdog_ioapic_configured = true;
+ }
+
+ delta = hpet_freq * watchdog_thresh;
+
+ cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
+ cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB | HPET_TN_LEVEL);
+ cfg |= HPET_TN_PERIODIC | HPET_TN_32BIT | HPET_TN_SETVAL;
+ hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
+
+ /* Write twice for AMD 81xx with buggy HPET */
+ hpet_writel(delta, HPET_Tn_CMP(HPET_WD_TIMER));
+ hpet_writel(delta, HPET_Tn_CMP(HPET_WD_TIMER));
+
+ cfg |= HPET_TN_ENABLE;
+ hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
+}
+
+void watchdog_hardlockup_enable(unsigned int cpu)
+{
+ if (!hpet_watchdog_ioapic_configured) {
+ /*
+ * First CPU online after resume - reconfigure HPET timer.
+ * This also sets hpet_watchdog_ioapic_configured = true.
+ */
+ watchdog_hardlockup_start();
+ }
+
+ if (num_online_cpus() == num_present_cpus()) {
+ ioapic_set_nmi(HPET_WD_GSI, true);
+ pr_info("switched to broadcast mode (all %d CPUs online)\n",
+ num_online_cpus());
+ }
+}
+
+void watchdog_hardlockup_disable(unsigned int cpu)
+{
+ if (num_online_cpus() < num_present_cpus()) {
+ ioapic_set_nmi(HPET_WD_GSI, false);
+ pr_info("switched to CPU 0 only (%d CPUs online)\n",
+ num_online_cpus() - 1);
+ }
+}
+
+int __init watchdog_hardlockup_probe(void)
+{
+ return hpet_watchdog_mode ? 0 : -ENODEV;
+}
+#else
+static inline int hpet_watchdog_init(u32 channels) { return 0; }
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
+
/**
* hpet_enable - Try to setup the HPET timer. Returns 1 on success.
*/
@@ -1031,6 +1232,10 @@ int __init hpet_enable(void)
/* This is the HPET channel number which is zero based */
channels = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
+ /* If watchdog mode, hand off to watchdog driver */
+ if (hpet_watchdog_mode)
+ return hpet_watchdog_init(channels);
+
/*
* The legacy routing mode needs at least two channels, tick timer
* and the rtc emulation channel.
@@ -1122,6 +1327,9 @@ static __init int hpet_late_init(void)
{
int ret;
+ if (hpet_is_watchdog())
+ return -ENODEV;
+
if (!hpet_address) {
if (!force_hpet_address)
return -ENODEV;
diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index cb9852ad6098..36dd948371a4 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -7,6 +7,7 @@
#include <linux/init.h>
#include <linux/timex.h>
#include <linux/i8253.h>
+#include <linux/hpet.h>
#include <asm/hypervisor.h>
#include <asm/apic.h>
@@ -31,6 +32,14 @@ struct clock_event_device *global_clock_event;
*/
static bool __init use_pit(void)
{
+ if (hpet_is_watchdog()) {
+ /*
+ * The PIT overlaps the HPET IRQ line which we configure to
+ * NMI in watchdog mode, rendering the PIT non functional.
+ */
+ return false;
+ }
+
if (!IS_ENABLED(CONFIG_X86_TSC) || !boot_cpu_has(X86_FEATURE_TSC))
return true;
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 4f5ccd3a1f56..9d9e4d22ab7f 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -977,6 +977,9 @@ static int hpet_acpi_add(struct acpi_device *device)
acpi_status result;
struct hpet_data data;
+ if (hpet_is_watchdog())
+ return -ENODEV;
+
memset(&data, 0, sizeof(data));
result =
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 21e69eaf7a36..408b440163cc 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -108,4 +108,18 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
int hpet_alloc(struct hpet_data *);
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+extern bool hpet_watchdog_initialized;
+
+static inline bool hpet_is_watchdog(void)
+{
+ return hpet_watchdog_initialized;
+}
+#else
+static inline bool hpet_is_watchdog(void)
+{
+ return false;
+}
+#endif
+
#endif /* !__HPET__ */
--
2.47.1
* Re: [PATCH 0/2] Add HPET NMI Watchdog support
2026-02-02 17:43 [PATCH 0/2] Add HPET NMI Watchdog support Alexander Graf
@ 2026-02-02 17:49 ` Alexander Graf
0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2026-02-02 17:49 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Thomas Gleixner, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr
On 02.02.26 18:43, Alexander Graf wrote:
> The current NMI watchdog relies on performance counters and consistently
> occupies one on each CPU. When running virtual machines, we want to pass
> performance counters to virtual machines so they can make use of them.
> In addition the host system wants to use performance counters to check
> the system to identify when anything looks abnormal, such as split
> locks.
>
> That makes PMCs a precious resource. So any PMC we can free up is a PMC
> we can use for something useful. That made me look at the NMI watchdog.
>
> The PMC based NMI watchdog implementation does not actually need any
> performance counting. It just needs a per-CPU NMI timer source. X86
> systems can make anything that emits an interrupt descriptor (IOAPIC,
> MSI(-X), etc) become an NMI source. So any time goes. Including the
> HPET. And while they can't really operate per-CPU, in almost all cases
> you only really want the NMI on *all* CPUs, rather than per-CPU.
>
> So I took a stab at building an HPET based NMI watchdog. In my (QEMU
> based) testing, it's fully functional and can successfully detect when
> CPUs get stuck. It even survives suspend/resume cycles.
>
> For now, its enablement is a config time option because the hardlockup
> framework does not support dynamic switching of multiple detectors.
> That's ok for our use case. But maybe something for the interested
> reader to tackle eventually :).
>
> You can enable the HPET watchdog by default by setting
>
> CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT=y
>
> or passing "hpet=watchdog" to the kernel command line. When active, it
> will emit a kernel log message to indicate it works:
>
> [ 0.179176] hpet: HPET watchdog initialized on timer 0, GSI 2
>
> The HPET can only be in either watchdog or generic mode. I am a bit
> worried about IO-APIC pin allocation logic, so I opted to reuse the
> generic timer pin. And that means I'm effectively breaking the normal
> interrupt delivery path. so the easy way out was to say when watchdog is
> active, PIT and HPET are not available as timer sources. Which is ok on
> modern systems. There are way too many (unreliable) timer sources on x86
> already. Trimming a few surely won't hurt.
>
> I'm open to inputs on how to make the HPET multi-purpose though, in case
> anyone feels strongly about it.
Sorry for the resend. I caught an issue while sending out the series,
hit ctrl-c before thinking and suddenly had a half sent series. Discard
this one. Happy review on the real, full one :)
Alex
* Re: [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper
2026-02-02 17:48 ` [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper Alexander Graf
@ 2026-02-03 10:08 ` Thomas Gleixner
2026-02-03 10:44 ` Alexander Graf
2026-02-03 10:45 ` David Woodhouse
0 siblings, 2 replies; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-03 10:08 UTC (permalink / raw)
To: Alexander Graf, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
David Woodhouse, Jan Schönherr
On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
> To implement an HPET based NMI watchdog, the HPET code will need to
> reconfigure an IOAPIC pin to NMI mode. Add a function that allows driver
> code to configure an IOAPIC pin for NMI delivery mode.
A function which violates all layering of the interrupt hierarchy...
> +/**
> + * ioapic_set_nmi - Configure an IOAPIC pin for NMI delivery
> + * @gsi: Global System Interrupt number
> + * @broadcast: true to broadcast to all CPUs, false to send to CPU 0 only
> + *
> + * Configures the specified GSI for NMI delivery mode.
> + *
> + * Returns 0 on success, negative error code on failure.
> + */
> +int ioapic_set_nmi(u32 gsi, bool broadcast)
> +{
> + struct IO_APIC_route_entry entry = { };
> + int ioapic_idx, pin;
> +
> + ioapic_idx = mp_find_ioapic(gsi);
> + if (ioapic_idx < 0)
> + return -ENODEV;
> +
> + pin = mp_find_ioapic_pin(ioapic_idx, gsi);
> + if (pin < 0)
> + return -ENODEV;
> +
> + entry.delivery_mode = APIC_DELIVERY_MODE_NMI;
> + entry.destid_0_7 = broadcast ? 0xFF : boot_cpu_physical_apicid;
> + entry.dest_mode_logical = 0;
> + entry.masked = 0;
> +
> + ioapic_write_entry(ioapic_idx, pin, entry);
Q: How is that supposed to work with interrupt remapping?
A: Not at all.
Thanks,
tglx
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-02 17:48 ` [PATCH 2/2] hpet: Add HPET-based NMI watchdog support Alexander Graf
@ 2026-02-03 10:32 ` Thomas Gleixner
2026-02-03 12:36 ` Alexander Graf
0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-03 10:32 UTC (permalink / raw)
To: Alexander Graf, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
David Woodhouse, Jan Schönherr
On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
> (Disclaimer: Some of this code was written with the help of Kiro, an AI
> coding assistant)
You could have sent your change log through AI too so it conforms with
the change log rules ...
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
> +/*
> + * HPET watchdog uses timer 0 routed to GSI 2 (legacy PIT IRQ line).
> + * When using HPET as watchdog, we repurpose this line for NMI delivery.
> + */
> +#define HPET_WD_TIMER 0
> +#define HPET_WD_GSI 2
> +
> +bool hpet_watchdog_initialized;
> +static bool hpet_watchdog_ioapic_configured;
> +static DEFINE_PER_CPU(u32, hpet_watchdog_next_tick);
> +
> +static int hpet_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> +{
> + u32 now, next, delta;
> +
> + if (panic_in_progress())
> + return NMI_HANDLED;
> +
> + /* Check if this NMI is from our HPET timer by comparing counter value */
> + now = hpet_readl(HPET_COUNTER);
And both you and your AI assistant failed to read through the previous
discussions on that topic and the 10+ failed attempts to make it work
correctly. Otherwise you would have figured out that reading HPET in
the NMI handler is a patently bad idea.
I'm not reiterating any of it as it's well documented in the LKML archive.
> +/*
> + * On suspend, clear the configured flag so that the first CPU to come
> + * online after resume will reconfigure the HPET timer and IO-APIC.
> + *
> + * We don't need to explicitly disable the watchdog here because:
> + * 1. The HPET registers are reset by the hibernation/suspend process anyway
> + * 2. The IO-APIC state is saved/restored by ioapic_syscore_ops, but we
> + * need to reconfigure it for NMI delivery after resume
If it's saved/restored then what needs to be reconfigured?
> +static int __init hpet_watchdog_init(u32 channels)
> +{
> + u32 cfg, i, route_cap;
> +
> + if (channels <= HPET_WD_TIMER)
> + return 0;
> +
> + /* Verify GSI 2 is available in the route capability bitmap */
The legacy channels are always routed to GSIs. Why do you need GSI2?
But why do you need to hijack the legacy 0 channel in the first place?
As discussed before this can nicely use one of the extra channels (>2)
which are available on any modern HPET implementation.
> + route_cap = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER) + 4);
> + if (!(route_cap & (1 << HPET_WD_GSI))) {
> + pr_info("HPET timer 0 cannot route to GSI %d\n", HPET_WD_GSI);
> + return 0;
> + }
> +
> + /* Deactivate all timers */
> + for (i = 0; i < channels; i++) {
> + cfg = hpet_readl(HPET_Tn_CFG(i));
> + cfg &= ~(HPET_TN_ENABLE | HPET_TN_LEVEL | HPET_TN_FSB);
> + hpet_writel(cfg, HPET_Tn_CFG(i));
> + }
> +
> + /* Configure HPET timer for periodic mode */
> + cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
> + cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB);
> + cfg |= HPET_TN_PERIODIC | HPET_TN_32BIT | HPET_TN_SETVAL | HPET_TN_LEVEL;
The HPET specification says about HPET_TN_LEVEL:
"The timer interrupt is level triggered. This means that a level-
triggered interrupt is generated. The interrupt will be held active until
it is cleared by writing to the bit in the General Interrupt Status
Register."
This clearly has seen a lot of testing on real hardware.
> + hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
> +
> + /* Route HPET timer to the GSI */
> + cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
> + cfg &= ~(Tn_INT_ROUTE_CNF_MASK | HPET_CFG_ENABLE);
> + cfg |= (HPET_WD_GSI << Tn_INT_ROUTE_CNF_SHIFT) & Tn_INT_ROUTE_CNF_MASK;
> + hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
You need all of this muck because you did a shortcut in hpet_enable()
which takes care of most things already. The previous attempts on this
clearly took some effort to integrate this cleanly w/o duplicating code
and introducing new bugs all over the place.
> +void watchdog_hardlockup_enable(unsigned int cpu)
> +{
> + if (!hpet_watchdog_ioapic_configured) {
> + /*
> + * First CPU online after resume - reconfigure HPET timer.
> + * This also sets hpet_watchdog_ioapic_configured = true.
> + */
> + watchdog_hardlockup_start();
> + }
> +
> + if (num_online_cpus() == num_present_cpus()) {
> + ioapic_set_nmi(HPET_WD_GSI, true);
> + pr_info("switched to broadcast mode (all %d CPUs online)\n",
> + num_online_cpus());
> + }
> +}
> +
> +void watchdog_hardlockup_disable(unsigned int cpu)
> +{
> + if (num_online_cpus() < num_present_cpus()) {
> + ioapic_set_nmi(HPET_WD_GSI, false);
> + pr_info("switched to CPU 0 only (%d CPUs online)\n",
> + num_online_cpus() - 1);
That's a truly useful lockup detector, which only runs on
CPU0. Seriously?
> + }
> +}
> +
> +int __init watchdog_hardlockup_probe(void)
> +{
> + return hpet_watchdog_mode ? 0 : -ENODEV;
> +}
> +#else
> +static inline int hpet_watchdog_init(u32 channels) { return 0; }
> +#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
> +
> /**
> * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
> */
> @@ -1031,6 +1232,10 @@ int __init hpet_enable(void)
> /* This is the HPET channel number which is zero based */
> channels = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
>
> + /* If watchdog mode, hand off to watchdog driver */
> + if (hpet_watchdog_mode)
> + return hpet_watchdog_init(channels);
And if that initialization fails for whatever reason the HPET is
dysfunctional, but then all your hpet_is_watchdog() checks are false too
and e.g. hpet_late_init() will fall flat on its nose.
> /*
> * The legacy routing mode needs at least two channels, tick timer
> * and the rtc emulation channel.
> @@ -1122,6 +1327,9 @@ static __init int hpet_late_init(void)
> {
> int ret;
>
> + if (hpet_is_watchdog())
> + return -ENODEV;
> +
> #include <asm/hypervisor.h>
> #include <asm/apic.h>
> @@ -31,6 +32,14 @@ struct clock_event_device *global_clock_event;
> */
> static bool __init use_pit(void)
> {
> + if (hpet_is_watchdog()) {
> + /*
> + * The PIT overlaps the HPET IRQ line which we configure to
> + * NMI in watchdog mode, rendering the PIT non functional.
> + */
> + return false;
> + }
So your approach of enabling the HPET watchdog brute force on the
command line ends up here because hpet_enable() returns 0. So now if
apic_needs_pit() is true, then this unconditional enable results in a
full boot fail.
This clearly has been made "work" by the throw enough stuff at the wall
and see what sticks approach.
As it had been discussed before:
1) There is no reason to hijack channel 0 as this can be made to work
nicely with the extra channels above channel 2 and MSI delivery
2) HPET read in the NMI handler is not going to happen and can be
solved by other means. A mostly working implementation exists
already in the mail archive.
3) Restricting it to CPU0 when not all CPUs are online is a
nonstarter. Think smt=off. Again, solutions for this have been
discussed and implemented.
4) Side channels into the interrupt configuration are not an option.
That has been properly integrated before...
I'm definitely not impressed by this AI slop...
Thanks,
tglx
* Re: [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper
2026-02-03 10:08 ` Thomas Gleixner
@ 2026-02-03 10:44 ` Alexander Graf
2026-02-03 10:45 ` David Woodhouse
1 sibling, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2026-02-03 10:44 UTC (permalink / raw)
To: Thomas Gleixner, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
David Woodhouse, Jan Schönherr
On 03.02.26 11:08, Thomas Gleixner wrote:
> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
>> To implement an HPET based NMI watchdog, the HPET code will need to
>> reconfigure an IOAPIC pin to NMI mode. Add a function that allows driver
>> code to configure an IOAPIC pin for NMI delivery mode.
> A function which violates all layering of the interrupt hierarchy...
Yes, just like the device itself :). The HPET is magical.
Let me try and see whether I can just make the HPET logic require MSI
(FSB) mode, so it can generate the NMI MSI message itself and post it
without going through the IOAPIC in the first place. That's probably
cleaner, more self-contained and hence creates fewer layering violations
and less complexity in the long run.
>
>> +/**
>> + * ioapic_set_nmi - Configure an IOAPIC pin for NMI delivery
>> + * @gsi: Global System Interrupt number
>> + * @broadcast: true to broadcast to all CPUs, false to send to CPU 0 only
>> + *
>> + * Configures the specified GSI for NMI delivery mode.
>> + *
>> + * Returns 0 on success, negative error code on failure.
>> + */
>> +int ioapic_set_nmi(u32 gsi, bool broadcast)
>> +{
>> + struct IO_APIC_route_entry entry = { };
>> + int ioapic_idx, pin;
>> +
>> + ioapic_idx = mp_find_ioapic(gsi);
>> + if (ioapic_idx < 0)
>> + return -ENODEV;
>> +
>> + pin = mp_find_ioapic_pin(ioapic_idx, gsi);
>> + if (pin < 0)
>> + return -ENODEV;
>> +
>> + entry.delivery_mode = APIC_DELIVERY_MODE_NMI;
>> + entry.destid_0_7 = broadcast ? 0xFF : boot_cpu_physical_apicid;
>> + entry.dest_mode_logical = 0;
>> + entry.masked = 0;
>> +
>> + ioapic_write_entry(ioapic_idx, pin, entry);
> Q: How is that supposed to work with interrupt remapping?
> A: Not at all.
... and yes, hopefully also gets us support for INTR if I manage to find
the right abstraction.
Thanks a lot for the review!
Alex
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
* Re: [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper
2026-02-03 10:08 ` Thomas Gleixner
2026-02-03 10:44 ` Alexander Graf
@ 2026-02-03 10:45 ` David Woodhouse
1 sibling, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2026-02-03 10:45 UTC (permalink / raw)
To: Thomas Gleixner, Alexander Graf, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
Jan Schönherr
On Tue, 2026-02-03 at 11:08 +0100, Thomas Gleixner wrote:
> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
> > To implement an HPET based NMI watchdog, the HPET code will need to
> > reconfigure an IOAPIC pin to NMI mode. Add a function that allows driver
> > code to configure an IOAPIC pin for NMI delivery mode.
>
> A function which violates all layering of the interrupt hierarchy...
I think you mean that this should be done by composing an MSI message
accordingly, and letting ioapic_setup_msg_from_msi() convert that into
an RTE for the I/O APIC without messing with the content? None of this
part should be specific to the I/O APIC?
And of course, if you're generating the MSI message you could just have
the HPET raise that directly instead of using a line interrupt, right?
> > +/**
> > + * ioapic_set_nmi - Configure an IOAPIC pin for NMI delivery
> > + * @gsi: Global System Interrupt number
> > + * @broadcast: true to broadcast to all CPUs, false to send to CPU 0 only
> > + *
> > + * Configures the specified GSI for NMI delivery mode.
> > + *
> > + * Returns 0 on success, negative error code on failure.
> > + */
> > +int ioapic_set_nmi(u32 gsi, bool broadcast)
> > +{
> > + struct IO_APIC_route_entry entry = { };
> > + int ioapic_idx, pin;
> > +
> > + ioapic_idx = mp_find_ioapic(gsi);
> > + if (ioapic_idx < 0)
> > + return -ENODEV;
> > +
> > + pin = mp_find_ioapic_pin(ioapic_idx, gsi);
> > + if (pin < 0)
> > + return -ENODEV;
> > +
> > + entry.delivery_mode = APIC_DELIVERY_MODE_NMI;
> > + entry.destid_0_7 = broadcast ? 0xFF : boot_cpu_physical_apicid;
How does that work in x2apic mode? Broadcast isn't 0xff there, is it?
And for systems with 15-bit MSI support you would also want to fill in
the extra 7 bits? But the MSI message composition function should
handle that for you anyway, right?
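To make the question concrete, here is a rough sketch of what an NMI MSI message looks like when composed by hand. It is only an illustration, not the kernel's MSI composition code: the struct, the function and the macro names are made up, and the bit positions follow the x86 MSI format as I understand it (destination ID bits 7:0 in address bits 19:12, the extra 7 bits of the 15-bit extension in address bits 11:5, delivery mode in data bits 10:8).

```c
#include <stdint.h>

/* Illustrative names only; none of these are kernel identifiers. */
#define MSI_ADDR_BASE		0xFEE00000u
#define MSI_ADDR_DESTID_SHIFT	12		/* address bits 19:12: dest ID bits 7:0 */
#define MSI_ADDR_EXT_SHIFT	5		/* address bits 11:5: dest ID bits 14:8 */
#define MSI_DATA_DM_NMI		(4u << 8)	/* data bits 10:8: delivery mode 100b (NMI) */

struct msi_sketch {
	uint32_t addr_lo;
	uint32_t data;
};

/*
 * Compose a physical-mode, edge-triggered NMI message for one destination.
 * ext_destid selects whether the 15-bit destination ID extension is used.
 */
static struct msi_sketch compose_nmi_msi(uint32_t dest_id, int ext_destid)
{
	struct msi_sketch m;

	m.addr_lo = MSI_ADDR_BASE | ((dest_id & 0xFFu) << MSI_ADDR_DESTID_SHIFT);
	if (ext_destid)
		m.addr_lo |= ((dest_id >> 8) & 0x7Fu) << MSI_ADDR_EXT_SHIFT;

	/* The vector field is left at 0; it is ignored for NMI delivery. */
	m.data = MSI_DATA_DM_NMI;
	return m;
}
```

With ext_destid set, a destination ID above 255 ends up split across both address fields, which is exactly the part a hand-written 8-bit destid_0_7 assignment loses.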
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 10:32 ` Thomas Gleixner
@ 2026-02-03 12:36 ` Alexander Graf
2026-02-03 15:28 ` Thomas Gleixner
2026-02-03 16:24 ` Sasha Levin
0 siblings, 2 replies; 20+ messages in thread
From: Alexander Graf @ 2026-02-03 12:36 UTC (permalink / raw)
To: Thomas Gleixner, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
David Woodhouse, Jan Schönherr, ricardo.neri-calderon,
Sasha Levin
On 03.02.26 11:32, Thomas Gleixner wrote:
> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
>> (Disclaimer: Some of this code was written with the help of Kiro, an AI
>> coding assistant)
> You could have sent your change log through AI too so it conforms with
> the change log rules ...
Maybe we should introduce an AGENTS.md file in Linux that tells the AI
tool to do that automatically? These tools usually don't read README
files. :)
Looks like - similar to the HPET watchdog - that never concluded though:
https://lore.kernel.org/lkml/20250813203647.06e49600@gandalf.local.home/
Sasha, are you going to resend your @README commit with a single
AGENTS.md? FWIW that is pretty much what everything standardized on by now.
>
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
>> +/*
>> + * HPET watchdog uses timer 0 routed to GSI 2 (legacy PIT IRQ line).
>> + * When using HPET as watchdog, we repurpose this line for NMI delivery.
>> + */
>> +#define HPET_WD_TIMER 0
>> +#define HPET_WD_GSI 2
>> +
>> +bool hpet_watchdog_initialized;
>> +static bool hpet_watchdog_ioapic_configured;
>> +static DEFINE_PER_CPU(u32, hpet_watchdog_next_tick);
>> +
>> +static int hpet_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>> +{
>> + u32 now, next, delta;
>> +
>> + if (panic_in_progress())
>> + return NMI_HANDLED;
>> +
>> + /* Check if this NMI is from our HPET timer by comparing counter value */
>> + now = hpet_readl(HPET_COUNTER);
> And both you and your AI assistant failed to read through the previous
> discussions on that topic and the 10+ failed attempts to make it work
> correctly. Otherwise you would have figured out that reading HPET in
> the NMI handler is a patently bad idea.
>
> I'm not reiterating any of it as it's well documented in the LKML archive.
Thanks a bunch for the pointer. I had indeed missed the previous patch
set submissions on the same topic. Those look a lot more sophisticated
than the quick hacky version I built. Nice! Oh well, at least I
(re)learned a few things about the HPET along the way.
Looking at the latest submission [1] (v7), I see patches but no reviews,
no acks and no merges. Those patches also seem to address most of your
concerns (obviously, since you reviewed them before :)). Reading the
side conversation about it [2], it sounds like the buddy hardlockup
detector is trying to fill the same gap as the HPET one and hence after
that got merged, interest faded?
Let me reply to the other comments below regardless. Feel free to
ignore - the conversation should move towards either the buddy or
Ricardo's patch set.
[1]
https://lore.kernel.org/lkml/20230413035844.GA31620@ranerica-svr.sc.intel.com/
[2] https://lore.kernel.org/lkml/ZFfb%2FbTi22RQwaol@tassilo/
>
>> +/*
>> + * On suspend, clear the configured flag so that the first CPU to come
>> + * online after resume will reconfigure the HPET timer and IO-APIC.
>> + *
>> + * We don't need to explicitly disable the watchdog here because:
>> + * 1. The HPET registers are reset by the hibernation/suspend process anyway
>> + * 2. The IO-APIC state is saved/restored by ioapic_syscore_ops, but we
>> + * need to reconfigure it for NMI delivery after resume
> If it's saved/restored then what needs to be reconfigured?
I wasn't sure how much of the register state really gets saved/restored,
especially in the HPET in both S3 and S4. So I figured I'd go the safe
route and reprogram on resume always.
>
>> +static int __init hpet_watchdog_init(u32 channels)
>> +{
>> + u32 cfg, i, route_cap;
>> +
>> + if (channels <= HPET_WD_TIMER)
>> + return 0;
>> +
>> + /* Verify GSI 2 is available in the route capability bitmap */
> The legacy channels are always routed to GSIs. Why do you need GSI2?
2 because it's the usual HPET destination GSI, so I don't need to try
and find an empty GSI.
> But why do you need to hijack the legacy 0 channel in the first place?
> As discussed before this can nicely use one of the extra channels (>2)
> which are available on any modern HPET implementation.
Mostly laziness. I did not want to have to worry about the implications
of multiple components and subsystems (some of which expose bits to user
space) messing with the HPET at the same time, so I wanted it dedicated
to the watchdog. But of course, we can absolutely share it if done
cautiously. And then use a higher timer.
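A minimal sketch of what picking a higher timer could look like, assuming the per-timer config registers have already been read into an array. The helper name and HPET_FIRST_EXTRA_CHANNEL are hypothetical; HPET_TN_FSB_CAP mirrors the FSB delivery capability bit (bit 15) of the timer config register.

```c
#include <stdint.h>

#define HPET_TN_FSB_CAP			0x8000u	/* timer can deliver via FSB/MSI */
#define HPET_FIRST_EXTRA_CHANNEL	3	/* hypothetical: first channel above 2 */

/*
 * cfg[] holds the lower 32 bits of each timer's config register.
 * Returns the first extra channel that supports FSB delivery, or -1
 * if there is none, in which case the watchdog would stay disabled.
 */
static int hpet_pick_wd_channel(const uint32_t *cfg, unsigned int channels)
{
	unsigned int i;

	for (i = HPET_FIRST_EXTRA_CHANNEL; i < channels; i++) {
		if (cfg[i] & HPET_TN_FSB_CAP)
			return (int)i;
	}
	return -1;
}
```

That leaves the legacy channels alone, so the tick timer and the RTC emulation keep working as before.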
>
>> + route_cap = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER) + 4);
>> + if (!(route_cap & (1 << HPET_WD_GSI))) {
>> + pr_info("HPET timer 0 cannot route to GSI %d\n", HPET_WD_GSI);
>> + return 0;
>> + }
>> +
>> + /* Deactivate all timers */
>> + for (i = 0; i < channels; i++) {
>> + cfg = hpet_readl(HPET_Tn_CFG(i));
>> + cfg &= ~(HPET_TN_ENABLE | HPET_TN_LEVEL | HPET_TN_FSB);
>> + hpet_writel(cfg, HPET_Tn_CFG(i));
>> + }
>> +
>> + /* Configure HPET timer for periodic mode */
>> + cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
>> + cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB);
>> + cfg |= HPET_TN_PERIODIC | HPET_TN_32BIT | HPET_TN_SETVAL | HPET_TN_LEVEL;
> The HPET specification says about HPET_TN_LEVEL:
>
> "The timer interrupt is level triggered. This means that a level-
> triggered interrupt is generated. The interrupt will be held active until
> it is cleared by writing to the bit in the General Interrupt Status
> Register."
>
> This clearly has seen a lot of testing on real hardware.
Yikes, the TN_LEVEL slipped in last minute and I apparently did not
properly revert it. This obviously needs to be edge triggered.
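For reference, a sketch of what the edge-triggered variant of that config computation would look like. The helper name is hypothetical and the bit values are the ones I believe the HPET header uses; the only functional difference from the posted patch is that HPET_TN_LEVEL is cleared instead of set.

```c
#include <stdint.h>

/* Timer N config register bits (values as in the kernel's hpet.h, to my knowledge) */
#define HPET_TN_LEVEL		0x0002u
#define HPET_TN_ENABLE		0x0004u
#define HPET_TN_PERIODIC	0x0008u
#define HPET_TN_SETVAL		0x0040u
#define HPET_TN_32BIT		0x0100u
#define HPET_TN_FSB		0x4000u

/*
 * Take the current config value and return one for a disabled,
 * edge-triggered, 32-bit periodic timer. Capability bits are preserved.
 */
static uint32_t hpet_wd_periodic_cfg(uint32_t cfg)
{
	cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB | HPET_TN_LEVEL);
	cfg |= HPET_TN_PERIODIC | HPET_TN_32BIT | HPET_TN_SETVAL;
	return cfg;
}
```

With edge triggering nothing has to be acknowledged in the General Interrupt Status Register, which is what the NMI path needs.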
>
>> + hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
>> +
>> + /* Route HPET timer to the GSI */
>> + cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER));
>> + cfg &= ~(Tn_INT_ROUTE_CNF_MASK | HPET_CFG_ENABLE);
>> + cfg |= (HPET_WD_GSI << Tn_INT_ROUTE_CNF_SHIFT) & Tn_INT_ROUTE_CNF_MASK;
>> + hpet_writel(cfg, HPET_Tn_CFG(HPET_WD_TIMER));
> You need all of this muck because you did a shortcut in hpet_enable()
> which takes care of most things already. The previous attempts on this
> clearly took some effort to integrate this cleanly w/o duplicating code
> and introducing new bugs all over the place.
>
>> +void watchdog_hardlockup_enable(unsigned int cpu)
>> +{
>> + if (!hpet_watchdog_ioapic_configured) {
>> + /*
>> + * First CPU online after resume - reconfigure HPET timer.
>> + * This also sets hpet_watchdog_ioapic_configured = true.
>> + */
>> + watchdog_hardlockup_start();
>> + }
>> +
>> + if (num_online_cpus() == num_present_cpus()) {
>> + ioapic_set_nmi(HPET_WD_GSI, true);
>> + pr_info("switched to broadcast mode (all %d CPUs online)\n",
>> + num_online_cpus());
>> + }
>> +}
>> +
>> +void watchdog_hardlockup_disable(unsigned int cpu)
>> +{
>> + if (num_online_cpus() < num_present_cpus()) {
>> + ioapic_set_nmi(HPET_WD_GSI, false);
>> + pr_info("switched to CPU 0 only (%d CPUs online)\n",
>> + num_online_cpus() - 1);
> That's a truly useful lockup detector, which only runs on
> CPU0. Seriously?
I wanted to have a fully functional one with broadcast in the
all-CPUs-online case. I was considering anything where not everything is
online as more of a transitional phase. Now, I see your argument on
SMT=off. But if the other HPET patch set is not dead, maybe we could
combine approaches and move to a broadcast mode when all CPUs are
online, instead of the round robin? Not sure it's really a significant
improvement though.
>
>> + }
>> +}
>> +
>> +int __init watchdog_hardlockup_probe(void)
>> +{
>> + return hpet_watchdog_mode ? 0 : -ENODEV;
>> +}
>> +#else
>> +static inline int hpet_watchdog_init(u32 channels) { return 0; }
>> +#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
>> +
>> /**
>> * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
>> */
>> @@ -1031,6 +1232,10 @@ int __init hpet_enable(void)
>> /* This is the HPET channel number which is zero based */
>> channels = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
>>
>> + /* If watchdog mode, hand off to watchdog driver */
>> + if (hpet_watchdog_mode)
>> + return hpet_watchdog_init(channels);
> And if that initialization fails for whatever reason the HPET is
> dysfunctional, but then all your hpet_is_watchdog() checks are false too
> and e.g. hpet_late_init() will fall flat on its nose.
>
>> /*
>> * The legacy routing mode needs at least two channels, tick timer
>> * and the rtc emulation channel.
>> @@ -1122,6 +1327,9 @@ static __init int hpet_late_init(void)
>> {
>> int ret;
>>
>> + if (hpet_is_watchdog())
>> + return -ENODEV;
>> +
>> #include <asm/hypervisor.h>
>> #include <asm/apic.h>
>> @@ -31,6 +32,14 @@ struct clock_event_device *global_clock_event;
>> */
>> static bool __init use_pit(void)
>> {
>> + if (hpet_is_watchdog()) {
>> + /*
>> + * The PIT overlaps the HPET IRQ line which we configure to
>> + * NMI in watchdog mode, rendering the PIT non functional.
>> + */
>> + return false;
>> + }
> So your approach of enabling the HPET watchdog brute force on the
> command line ends up here because hpet_enable() returns 0. So now if
> apic_needs_pit() is true, then this unconditional enable results in a
> full boot fail.
> This clearly has been made "work" by the throw enough stuff at the wall
> and see what sticks approach.
>
> As it had been discussed before:
>
> 1) There is no reason to hijack channel 0 as this can be made to work
> nicely with the extra channels above channel 2 and MSI delivery
>
> 2) HPET read in the NMI handler is not going to happen and can be
> solved by other means. A mostly working implementation exists
> already in the mail archive.
>
> 3) Restricting it to CPU0 when not all CPUs are online is a
> nonstarter. Think smt=off. Again, solutions for this have been
> discussed and implemented.
>
> 4) Side channels into the interrupt configuration are not an option.
> That has been properly integrated before...
>
> I'm definitely not impressed by this AI slop...
Like with any tool, the AI is only as good as its puppeteer :). Thanks
for the insights! Super helpful. The most important one was the pointer
to the existing patch set that I had completely missed.
At the end of the day, the motivation is to get that one PMC back.
Anything to make that happen works. I'll have a look at the buddy
detector as well.
Thanks!
Alex
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 12:36 ` Alexander Graf
@ 2026-02-03 15:28 ` Thomas Gleixner
2026-02-03 19:44 ` Ricardo Neri
2026-02-03 16:24 ` Sasha Levin
1 sibling, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-03 15:28 UTC (permalink / raw)
To: Alexander Graf, x86
Cc: linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
David Woodhouse, Jan Schönherr, ricardo.neri-calderon,
Sasha Levin
On Tue, Feb 03 2026 at 13:36, Alexander Graf wrote:
> On 03.02.26 11:32, Thomas Gleixner wrote:
>> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
>>> (Disclaimer: Some of this code was written with the help of Kiro, an AI
>>> coding assistant)
>> You could have sent your change log through AI too so it conforms with
>> the change log rules ...
>
> Maybe we should introduce an AGENTS.md file in Linux that tells the AI
> tool to do that automatically? These tools usually don't read README
> files. :)
I don't care what tools do, but I very much care about what the people
who use the tools do.
>>> + if (panic_in_progress())
>>> + return NMI_HANDLED;
>>> +
>>> + /* Check if this NMI is from our HPET timer by comparing counter value */
>>> + now = hpet_readl(HPET_COUNTER);
>> And both you and your AI assistant failed to read through the previous
>> discussions on that topic and the 10+ failed attempts to make it work
>> correctly. Otherwise you would have figured out that reading HPET in
>> the NMI handler is a patently bad idea.
>>
>> I'm not reiterating any of it as it's well documented in the LKML archive.
>
>
> Thanks a bunch for the pointer. I had indeed missed the previous patch
> set submissions on the same topic. Those look a lot more sophisticated
> than the quick hacky version I built. Nice! Oh well, at least I
> (re)learned a few things about the HPET along the way.
>
> Looking at the latest submission [1] (v7), I see patches but no reviews,
> no acks and no merges. Those patches also seem to address most of your
> concerns (obviously, since you reviewed them before :)). Reading the
> side conversation about it [2], it sounds like the buddy hardlockup
> detector is trying to fill the same gap as the HPET one and hence after
> that got merged, interest faded?
I don't remember. That thing clearly fell through the cracks. Let me
find it again and reply to that.
As time has advanced there are probably a few things which need to be
addressed.
Thanks,
tglx
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 12:36 ` Alexander Graf
2026-02-03 15:28 ` Thomas Gleixner
@ 2026-02-03 16:24 ` Sasha Levin
2026-02-03 17:19 ` Alexander Graf
1 sibling, 1 reply; 20+ messages in thread
From: Sasha Levin @ 2026-02-03 16:24 UTC (permalink / raw)
To: Alexander Graf
Cc: Thomas Gleixner, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr,
ricardo.neri-calderon
On Tue, Feb 03, 2026 at 01:36:30PM +0100, Alexander Graf wrote:
>
>On 03.02.26 11:32, Thomas Gleixner wrote:
>>On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
>>>(Disclaimer: Some of this code was written with the help of Kiro, an AI
>>>coding assistant)
>>You could have sent your change log through AI too so it conforms with
>>the change log rules ...
>
>
>Maybe we should introduce an AGENTS.md file in Linux that tells the AI
>tool to do that automatically? These tools usually don't read README
>files. :)
>
>Looks like - similar to the HPET watchdog - that never concluded though:
>
>https://lore.kernel.org/lkml/20250813203647.06e49600@gandalf.local.home/
>
>Sasha, are you going to resend your @README commit with a single
>AGENTS.md? FWIW that is pretty much what everything standardized on by
>now.
Out of curiosity, can you test your coding assistant on a tree with
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/Documentation?id=78d979db6cef557c171d6059cbce06c3db89c7ee
applied on top?
From my previous testing, the coding assistants I tried it with went to the
README and DTRT. If that's not the case I'm happy to respin the AGENTS.md idea,
even if it just explicitly points to the README.
--
Thanks,
Sasha
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 16:24 ` Sasha Levin
@ 2026-02-03 17:19 ` Alexander Graf
2026-02-03 17:43 ` David Woodhouse
0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2026-02-03 17:19 UTC (permalink / raw)
To: Sasha Levin
Cc: Thomas Gleixner, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr,
ricardo.neri-calderon
On 03.02.26 17:24, Sasha Levin wrote:
> On Tue, Feb 03, 2026 at 01:36:30PM +0100, Alexander Graf wrote:
>>
>> On 03.02.26 11:32, Thomas Gleixner wrote:
>>> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
>>>> (Disclaimer: Some of this code was written with the help of Kiro,
>>>> an AI
>>>> coding assistant)
>>> You could have sent your change log through AI too so it conforms with
>>> the change log rules ...
>>
>>
>> Maybe we should introduce an AGENTS.md file in Linux that tells the AI
>> tool to do that automatically? These tools usually don't read README
>> files. :)
>>
>> Looks like - similar to the HPET watchdog - that never concluded though:
>>
>> https://lore.kernel.org/lkml/20250813203647.06e49600@gandalf.local.home/
>>
>> Sasha, are you going to resend your @README commit with a single
>> AGENTS.md? FWIW that is pretty much what everything standardized on by
>> now.
>
> Out of curiosity, can you test your coding assistant on a tree with
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/Documentation?id=78d979db6cef557c171d6059cbce06c3db89c7ee
>
> applied on top?
>
> From my previous testing, the coding assistants I tried it with went
> to the
> README and DTRT. If that's not the case I'm happy to respin the
> AGENTS.md idea,
> even if it just explicitly points to the README.
Kiro does not seem to read README automatically. I spun up kiro-cli and
gave it this prompt: "Write "Hello World" before invoking the init
process. Then create a descriptive git commit for the change.". No
Assisted-by: tag, so it did not properly read the README.
I tried the same with an AGENTS.md file present that contains "@README"
and it gave me effectively the same result. Same for a symlink from
AGENTS.md to README.
I think it just never really jumped to the conclusion that it should
read further than just the AGENTS.md file and also ingest the rst,
effectively ignoring the section's instructions. Or maybe it actually
reads the .rst and ignores its contents? At least it does read it
according to strace, even without an AGENTS.md file.
Let me file a bug report with Kiro.
Alex
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 17:19 ` Alexander Graf
@ 2026-02-03 17:43 ` David Woodhouse
2026-02-03 20:46 ` Thomas Gleixner
0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2026-02-03 17:43 UTC (permalink / raw)
To: Alexander Graf, Sasha Levin
Cc: Thomas Gleixner, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, Jan Schönherr, ricardo.neri-calderon
On Tue, 2026-02-03 at 18:19 +0100, Alexander Graf wrote:
>
> On 03.02.26 17:24, Sasha Levin wrote:
> > On Tue, Feb 03, 2026 at 01:36:30PM +0100, Alexander Graf wrote:
> > >
> > > On 03.02.26 11:32, Thomas Gleixner wrote:
> > > > On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
> > > > > (Disclaimer: Some of this code was written with the help of Kiro,
> > > > > an AI
> > > > > coding assistant)
> > > > You could have sent your change log through AI too so it conforms with
> > > > the change log rules ...
> > >
> > >
> > > Maybe we should introduce an AGENTS.md file in Linux that tells the AI
> > > tool to do that automatically? These tools usually don't read README
> > > files. :)
> > >
> > > Looks like - similar to the HPET watchdog - that never concluded though:
> > >
> > > https://lore.kernel.org/lkml/20250813203647.06e49600@gandalf.local.home/
> > >
> > > Sasha, are you going to resend your @README commit with a single
> > > AGENTS.md? FWIW that is pretty much what everything standardized on by
> > > now.
> >
> > Out of curiosity, can you test your coding assistant on a tree with
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/Documentation?id=78d979db6cef557c171d6059cbce06c3db89c7ee
> >
> > applied on top?
> >
> > From my previous testing, the coding assistants I tried it with went
> > to the
> > README and DTRT. If that's not the case I'm happy to respin the
> > AGENTS.md idea,
> > even if it just explicitly points to the README.
>
>
> Kiro does not seem to read README automatically. I spun up kiro-cli and
> gave it this prompt: "Write "Hello World" before invoking the init
> process. Then create a descriptive git commit for the change.". No
> Assisted-by: tag, so it did not properly read the README.
>
> I tried the same with an AGENTS.md file present that contains "@README"
> and it gave me effectively the same result. Same for a symlink from
> AGENTS.md to README.
>
> I think it just never really jumped to the conclusion that it should
> read further than just the AGENTS.md file and also ingest the rst,
> effectively ignoring the section's instructions. Or maybe it actually
> reads the .rst and ignores its contents? At least it does read it
> according to strace, even without an AGENTS.md file.
>
> Let me file a bug report with Kiro.
Honestly, even when I've explicitly told Kiro three times *not* to do
something, *and* implemented a git commit hook to catch it out, it has
a tendency just to automatically override the commit hook!
If it was made of meat, I'd have stabbed it by now.
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 15:28 ` Thomas Gleixner
@ 2026-02-03 19:44 ` Ricardo Neri
2026-02-03 20:49 ` Thomas Gleixner
0 siblings, 1 reply; 20+ messages in thread
From: Ricardo Neri @ 2026-02-03 19:44 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Alexander Graf, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr, Sasha Levin
On Tue, Feb 03, 2026 at 04:28:11PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 03 2026 at 13:36, Alexander Graf wrote:
> > On 03.02.26 11:32, Thomas Gleixner wrote:
> >> On Mon, Feb 02 2026 at 17:48, Alexander Graf wrote:
> >>> (Disclaimer: Some of this code was written with the help of Kiro, an AI
> >>> coding assistant)
> >> You could have sent your change log through AI too so it conforms with
> >> the change log rules ...
> >
> > Maybe we should introduce an AGENTS.md file in Linux that tells the AI
> > tool to do that automatically? These tools usually don't read README
> > files. :)
>
> I don't care what tools do, but I very much care about what the people
> who use the tools do.
>
> >>> + if (panic_in_progress())
> >>> + return NMI_HANDLED;
> >>> +
> >>> + /* Check if this NMI is from our HPET timer by comparing counter value */
> >>> + now = hpet_readl(HPET_COUNTER);
> >> And both you and your AI assistant failed to read through the previous
> >> discussions on that topic and the 10+ failed attempts to make it work
> >> correctly. Otherwise you would have figured out that reading HPET in
> >> the NMI handler is a patently bad idea.
> >>
> >> I'm not reiterating any of it as it's well documented in the LKML archive.
> >
> >
> > Thanks a bunch for the pointer. I had indeed missed the previous patch
> > set submissions on the same topic. Those look a lot more sophisticated
> > than the quick hacky version I built. Nice! Oh well, at least I
> > (re)learned a few things about the HPET along the way.
> >
> > Looking at the latest submission [1] (v7), I see patches but no reviews,
> > no acks and no merges. Those patches also seem to address most of your
> > concerns (obviously, since you reviewed them before :)). Reading the
> > side conversation about it [2], it sounds like the buddy hardlockup
> > detector is trying to fill the same gap as the HPET one and hence after
> > that got merged, interest faded?
>
> I don't remember. That thing clearly fell through the cracks.
My impression at the time was that the buddy hardlockup detector met the
goal of freeing the PMU counter and there was little interest in using the
HPET.
> Let me find it again and reply to that.
Does this mean that there is renewed interest in this?
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 17:43 ` David Woodhouse
@ 2026-02-03 20:46 ` Thomas Gleixner
2026-02-03 23:13 ` David Woodhouse
0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-03 20:46 UTC (permalink / raw)
To: David Woodhouse, Alexander Graf, Sasha Levin
Cc: x86, linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
Jan Schönherr, ricardo.neri-calderon
On Tue, Feb 03 2026 at 17:43, David Woodhouse wrote:
> Honestly, even when I've explicitly told Kiro three times *not* to do
> something, *and* implemented a git commit hook to catch it out, it has
> a tendency just to automatically override the commit hook!
Anarchic Intelligence :)
> If it was made of meat, I'd have stabbed it by now.
rm -rf solves that problem too once and forever.
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 19:44 ` Ricardo Neri
@ 2026-02-03 20:49 ` Thomas Gleixner
2026-02-04 5:05 ` Ricardo Neri
0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-03 20:49 UTC (permalink / raw)
To: Ricardo Neri
Cc: Alexander Graf, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr, Sasha Levin
On Tue, Feb 03 2026 at 11:44, Ricardo Neri wrote:
> On Tue, Feb 03, 2026 at 04:28:11PM +0100, Thomas Gleixner wrote:
>> I don't remember. That thing clearly fell through the cracks.
>
> My impression at the time was that the buddy hardlockup detector met the
> goal of freeing the PMU counter and there was little interest in using the
> HPET.
>
>> Let me find it again and reply to that.
>
> Does this mean that there is renewed interest in this?
It seems Alex is interested and the code minus the rejects and my
suggestion from today looks palatable.
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 20:46 ` Thomas Gleixner
@ 2026-02-03 23:13 ` David Woodhouse
2026-02-04 10:34 ` Thomas Gleixner
0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2026-02-03 23:13 UTC (permalink / raw)
To: Thomas Gleixner, Alexander Graf, Sasha Levin
Cc: x86, linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
Jan Schönherr, ricardo.neri-calderon
On Tue, 2026-02-03 at 21:46 +0100, Thomas Gleixner wrote:
> On Tue, Feb 03 2026 at 17:43, David Woodhouse wrote:
> > Honestly, even when I've explicitly told Kiro three times *not* to do
> > something, *and* implemented a git commit hook to catch it out, it has
> > a tendency just to automatically override the commit hook!
>
> Anarchic Intelligence :)
>
> > If it was made of meat, I'd have stabbed it by now.
>
> rm -rf solves that problem too once and forever.
There *are* cases where it's actually an accelerating function,
especially where there's a bunch of boilerplate/infrastructure code to
be generated. But by $DEITY you have to keep a close eye on it. It has
absolutely no taste whatsoever.
And I've watched it spend quarter of an hour failing to use its own
file read/write tools to edit C files, falling back to sed and then
python scripts to make the simple changes it wanted to make. Sometimes
needing to be prompted because it thought its sed script had worked
when in fact it hadn't. It's... impressive :)
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 20:49 ` Thomas Gleixner
@ 2026-02-04 5:05 ` Ricardo Neri
0 siblings, 0 replies; 20+ messages in thread
From: Ricardo Neri @ 2026-02-04 5:05 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Alexander Graf, x86, linux-kernel, linux-doc, Clemens Ladisch,
Arnd Bergmann, Greg Kroah-Hartman, Dave Hansen, Borislav Petkov,
Ingo Molnar, Jonathan Corbet, Paolo Bonzini, Pasha Tatashin,
nh-open-source, Nicolas Saenz Julienne, Hendrik Borghorst,
Filippo Sironi, David Woodhouse, Jan Schönherr, Sasha Levin
On Tue, Feb 03, 2026 at 09:49:26PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 03 2026 at 11:44, Ricardo Neri wrote:
> > On Tue, Feb 03, 2026 at 04:28:11PM +0100, Thomas Gleixner wrote:
> >> I don't remember. That thing clearly fell through the cracks.
> >
> > My impression at the time was that the buddy hardlockup detector met the
> > goal of freeing the PMU counter and there was little interest in using the
> > HPET.
> >
> >> Let me find it again and reply to that.
> >
> > Does this mean that there is renewed interest in this?
>
> It seems Alex is interested and the code minus the rejects and my
> suggestion from today looks palatable.
Great! I will update the series and post a new version.
* Re: [PATCH 2/2] hpet: Add HPET-based NMI watchdog support
2026-02-03 23:13 ` David Woodhouse
@ 2026-02-04 10:34 ` Thomas Gleixner
0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2026-02-04 10:34 UTC (permalink / raw)
To: David Woodhouse, Alexander Graf, Sasha Levin
Cc: x86, linux-kernel, linux-doc, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, Dave Hansen, Borislav Petkov, Ingo Molnar,
Jonathan Corbet, Paolo Bonzini, Pasha Tatashin, nh-open-source,
Nicolas Saenz Julienne, Hendrik Borghorst, Filippo Sironi,
Jan Schönherr, ricardo.neri-calderon
On Tue, Feb 03 2026 at 23:13, David Woodhouse wrote:
> On Tue, 2026-02-03 at 21:46 +0100, Thomas Gleixner wrote:
>> On Tue, Feb 03 2026 at 17:43, David Woodhouse wrote:
>> > Honestly, even when I've explicitly told Kiro three times *not* to do
>> > something, *and* implemented a git commit hook to catch it out, it has
>> > a tendency just to automatically override the commit hook!
>>
>> Anarchic Intelligence :)
>>
>> > If it was made of meat, I'd have stabbed it by now.
>>
>> rm -rf solves that problem too once and forever.
>
> There *are* cases where it's actually an accelerating function,
> especially where there's a bunch of boilerplate/infrastructure code to
> be generated. But by $DEITY you have to keep a close eye on it. It has
> absolutely no taste whatsoever.
>
> And I've watched it spend quarter of an hour failing to use its own
> file read/write tools to edit C files, falling back to sed and then
> python scripts to make the simple changes it wanted to make. Sometimes
> needing to be prompted because it thought its sed script had worked
> when in fact it hadn't. It's... impressive :)
You clearly proved the point that this is accelerating the time and
energy waste.
end of thread, other threads:[~2026-02-04 10:34 UTC | newest]
Thread overview: 20+ messages
2026-02-02 17:48 [PATCH 0/2] Add HPET NMI Watchdog support Alexander Graf
2026-02-02 17:48 ` [PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper Alexander Graf
2026-02-03 10:08 ` Thomas Gleixner
2026-02-03 10:44 ` Alexander Graf
2026-02-03 10:45 ` David Woodhouse
2026-02-02 17:48 ` [PATCH 2/2] hpet: Add HPET-based NMI watchdog support Alexander Graf
2026-02-03 10:32 ` Thomas Gleixner
2026-02-03 12:36 ` Alexander Graf
2026-02-03 15:28 ` Thomas Gleixner
2026-02-03 19:44 ` Ricardo Neri
2026-02-03 20:49 ` Thomas Gleixner
2026-02-04 5:05 ` Ricardo Neri
2026-02-03 16:24 ` Sasha Levin
2026-02-03 17:19 ` Alexander Graf
2026-02-03 17:43 ` David Woodhouse
2026-02-03 20:46 ` Thomas Gleixner
2026-02-03 23:13 ` David Woodhouse
2026-02-04 10:34 ` Thomas Gleixner
-- strict thread matches above, loose matches on Subject: below --
2026-02-02 17:43 [PATCH 0/2] Add HPET NMI Watchdog support Alexander Graf
2026-02-02 17:49 ` Alexander Graf