From: Sean Christopherson <seanjc@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>,
Jim Mattson <jmattson@google.com>,
Binbin Wu <binbin.wu@linux.intel.com>,
Vishal L Verma <vishal.l.verma@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
Binbin Wu <binbin.wu@intel.com>,
"x86@kernel.org" <x86@kernel.org>,
Paolo Bonzini <bonzini@redhat.com>
Subject: Re: CPU Lockups in KVM with deferred hrtimer rearming
Date: Tue, 21 Apr 2026 20:57:24 +0000
Message-ID: <aefktNLiS3ur10yD@google.com>
In-Reply-To: <20260421200620.GK3126523@noisy.programming.kicks-ass.net>
On Tue, Apr 21, 2026, Peter Zijlstra wrote:
> On Tue, Apr 21, 2026 at 11:55:33AM -0700, Sean Christopherson wrote:
>
> > Pulling in an earlier idea:
> >
> > : Now for VMX, that hrtimer_rearm_deferred() call should really go into
> > : handle_external_interrupt_irqoff(), which in turn requires to export
> > : __hrtimer_rearm_deferred().
> >
>
> > Actually, even better would be to bury the FRED vs. not-FRED details in entry
> > code. E.g. on the KVM invocation side, we could get to something like the below,
> > and I'm pretty sure _reduce_ the number of for-KVM exports in the process.
>
> Something like so then?
Yep!
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> new file mode 100644
> index 000000000000..4b0171abb083
> --- /dev/null
> +++ b/arch/x86/entry/common.c
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/kvm_types.h>
> +#include <linux/hrtimer_rearm.h>
For CONFIG_X86_FRED=n, which is possible on x86-64 if CONFIG_KVM_INTEL=n, this
#include <linux/sched/task_stack.h>
is needed so that task_pt_regs() can find task_stack_page() (and including
task_stack.h in processor.h would create cyclical includes).
> +#include <asm/entry-common.h>
> +#include <asm/fred.h>
> +#include <asm/desc.h>
> +
Related to CONFIG_X86_FRED=n, I vote to wrap this API with #if IS_ENABLED(CONFIG_KVM_INTEL)
and then delete the fred_entry_from_kvm() stub so that a goof results in a build
failure. That'd also be a good place for a comment to explain some of the usage.
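To illustrate the "no stub, so a goof fails the build" idea, here is a minimal
userspace sketch. It is not the kernel code: DEMO_KVM_INTEL is a stand-in for
IS_ENABLED(CONFIG_KVM_INTEL), and every demo_* name, value, and return code is
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace illustration only. DEMO_KVM_INTEL stands in for
 * IS_ENABLED(CONFIG_KVM_INTEL); all demo_* names are hypothetical.
 */
#ifndef DEMO_KVM_INTEL
#define DEMO_KVM_INTEL 1
#endif

/* Hypothetical event types, not the kernel's definitions. */
enum demo_event_type {
	DEMO_EVENT_IRQ = 0,
	DEMO_EVENT_NMI = 2,
};

#if DEMO_KVM_INTEL
/*
 * Only built when the (pretend) VMX user is enabled. There is deliberately
 * no #else stub, so a caller in a DEMO_KVM_INTEL=0 build does not silently
 * become a no-op.
 */
static int demo_entry_from_kvm(enum demo_event_type type, unsigned int vector)
{
	if (type == DEMO_EVENT_NMI)
		return -1;		/* pretend: route to the NMI handler */

	return (int)vector;		/* pretend: route to the IRQ handler */
}
#endif
```

With DEMO_KVM_INTEL defined to 0, any remaining caller of demo_entry_from_kvm()
fails to build (at compile or link time, depending on the compiler), whereas an
empty static inline stub would have compiled cleanly and done nothing.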
> +noinstr void x86_entry_from_kvm(unsigned int event_type, unsigned int vector)
> +{
> +#ifdef CONFIG_X86_64
> + fred_entry_from_kvm(event_type, vector);
> +#else
> + idt_entry_from_kvm(vector);
> +#endif
...
> +SYM_FUNC_START(idt_do_interrupt_irqoff)
> + IDT_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
> +SYM_FUNC_END(idt_do_interrupt_irqoff)
> +
> +SYM_FUNC_START(idt_do_nmi_irqoff)
> + IDT_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
> +SYM_FUNC_END(idt_do_nmi_irqoff)
These need to be declared, and the KVM declarations can be deleted.
> static void __init idt_map_in_cea(void)
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 8a481dae9cae..ff1f254a0ef4 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -31,38 +31,6 @@
> #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE
> #endif
>
> -.macro VMX_DO_EVENT_IRQOFF call_insn call_target
> - /*
> - * Unconditionally create a stack frame, getting the correct RSP on the
> - * stack (for x86-64) would take two instructions anyways, and RBP can
> - * be used to restore RSP to make objtool happy (see below).
> - */
> - push %_ASM_BP
> - mov %_ASM_SP, %_ASM_BP
> -
> -#ifdef CONFIG_X86_64
> - /*
> - * Align RSP to a 16-byte boundary (to emulate CPU behavior) before
> - * creating the synthetic interrupt stack frame for the IRQ/NMI.
> - */
> - and $-16, %rsp
> - push $__KERNEL_DS
> - push %rbp
> -#endif
For anyone else having an -ENOCOFFEE moment, this has been dead code since commit
28d11e4548b7 ("x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y").
This as delta? (I had typed this all up before Peter posted a new version, so
dammit I'm sending it!)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 4b0171abb083..b039276bede9 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -2,10 +2,20 @@
#include <linux/kvm_types.h>
#include <linux/hrtimer_rearm.h>
+#include <linux/sched/task_stack.h>
#include <asm/entry-common.h>
#include <asm/fred.h>
#include <asm/desc.h>
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * On VMX, NMIs and IRQs (as configured by KVM) are acknowledged by hardware as
+ * part of the VM-Exit, i.e. the event itself is consumed as part of the VM-Exit.
+ * x86_entry_from_kvm() is invoked by KVM to effectively forward NMIs and IRQs
+ * to the kernel for servicing. On SVM, a.k.a. AMD, the NMI/IRQ VM-Exit is
+ * purely a signal that an NMI/IRQ is pending, i.e. the event that triggered
+ * the VM-Exit is held pending until it's unblocked in the host.
+ */
noinstr void x86_entry_from_kvm(unsigned int event_type, unsigned int vector)
{
#ifdef CONFIG_X86_64
@@ -20,3 +30,4 @@ noinstr void x86_entry_from_kvm(unsigned int event_type, unsigned int vector)
}
}
EXPORT_SYMBOL_FOR_KVM(x86_entry_from_kvm);
+#endif
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index eca24b5e07f4..2421b1edf77e 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -98,5 +98,7 @@ static __always_inline void arch_exit_to_user_mode(void)
#define arch_exit_to_user_mode arch_exit_to_user_mode
extern void x86_entry_from_kvm(unsigned int entry_type, unsigned int vector);
+extern void idt_do_interrupt_irqoff(unsigned long entry);
+extern void idt_do_nmi_irqoff(void);
#endif
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2bb65677c079..18a2f811c358 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -110,7 +110,6 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs) { ret
static inline void cpu_init_fred_exceptions(void) { }
static inline void cpu_init_fred_rsps(void) { }
static inline void fred_complete_exception_setup(void) { }
-static inline void fred_entry_from_kvm(unsigned int type, unsigned int vector) { }
static inline void fred_sync_rsp0(unsigned long rsp0) { }
static inline void fred_update_rsp0(void) { }
#endif /* CONFIG_X86_FRED */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 3d239ed12744..52a3afb1b79e 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -614,7 +614,6 @@ DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx)
{
exc_nmi(regs);
}
-EXPORT_SYMBOL_FOR_KVM(asm_exc_nmi_kvm_vmx);
#endif
#ifdef CONFIG_NMI_CHECK_CPU
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f6f5c124ed3b..753f0dbb9cf8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7083,9 +7083,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
}
-void vmx_do_interrupt_irqoff(unsigned long entry);
-void vmx_do_nmi_irqoff(void);
-
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
/*