From: Sean Christopherson <seanjc@google.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: Jim Mattson <jmattson@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Binbin Wu <binbin.wu@linux.intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	 "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	 Binbin Wu <binbin.wu@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	Paolo Bonzini <bonzini@redhat.com>
Subject: Re: CPU Lockups in KVM with deferred hrtimer rearming
Date: Tue, 21 Apr 2026 11:55:33 -0700
Message-ID: <aefIJR_FcEeP-fcS@google.com>
In-Reply-To: <878qagb20x.ffs@tglx>

On Tue, Apr 21, 2026, Thomas Gleixner wrote:
> On Tue, Apr 21 2026 at 10:20, Jim Mattson wrote:
> > On Tue, Apr 21, 2026 at 10:14 AM Thomas Gleixner <tglx@kernel.org> wrote:
> >>
> >> On Tue, Apr 21 2026 at 13:49, Peter Zijlstra wrote:
> >> > On Tue, Apr 21, 2026 at 01:34:07PM +0200, Peter Zijlstra wrote:
> >> >> > > > KVM invokes regular interrupts with pt_regs which have interrupts
> >> >> > > > disabled. That's correct from the KVM point of view, but completely
> >> >> > > > violates the obviously correct expectations of the interrupt entry/exit
> >> >> > > > code.
> >> >> > >
> >> >> > > Mooo :-(
> >> >>
> >> >> Also, is this a x86/KVM 'special' or is this true for all arch/KVM that
> >> >> use GENERIC_ENTRY?
> >> >
> >> > Should we not make asm_fred_entry_from_kvm()/VMX_DO_EVENT_IRQOFF fix IF
> >> > on the fake frame instead? We know it will enable IRQs after doing
> >> > handle_exit_irqoff() in vcpu_enter_guest().
> >>
> >> Doesn't work :)
> >>
> >> > SVM does not seem affected with this particular insanity.
> >>
> >> Looks like. It will take the interrupt after local_irq_enable().
> >
> > FWIW, VMX should work just like SVM if we clear VM_EXIT_ACK_INTR_ON_EXIT.

Hell no.

> I know. What's the point of that VM_EXIT_ACK_INTR_ON_EXIT exercise? Is
> there any performance benefit or is it just used because it's there?

There are performance benefits, and it preserves ordering: the first IRQ that's
serviced by the host is guaranteed to be _the_ IRQ that triggered the VM-Exit.
E.g. with AMD's approach, any IRQs that arrive between the VM-Exit and STI (which
is a pretty big swath of code) could be serviced before the IRQ that triggered
the exit, depending on priority.

VM_EXIT_ACK_INTR_ON_EXIT also provides symmetry with Intel's handling of NMIs, as
NMIs are unconditionally "acked" on VM-Exit.

Even if performance is "fine", changing decades of fundamental KVM behavior is
terrifying.

Pulling in an earlier idea:

 : Now for VMX, that hrtimer_rearm_deferred() call should really go into
 : handle_external_interrupt_irqoff(), which in turn requires to export
 : __hrtimer_rearm_deferred().

IMO, that's the way to go.  But instead of exporting __hrtimer_rearm_deferred(),
move vmx_do_nmi_irqoff() and vmx_do_interrupt_irqoff() into core kernel entry code
(along with the assembly glue), and then EXPORT_SYMBOL_FOR_KVM those.  It'd mean
some extra surgery, e.g. to provide an equivalent to KVM's IDT lookup:

	gate_offset((gate_desc *)host_idt_base + vector)

But I suspect it would be a big net positive in the end.  E.g. the entry code
would *know* it's dealing with a direct call from KVM, and thus shouldn't need
to play pt_regs games.
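
For illustration, the IDT lookup quoted above amounts to indexing the IDT by
vector and reassembling the handler address from the three offset fields of a
64-bit gate descriptor.  A minimal userspace sketch, assuming the x86-64 gate
layout; struct gate_model, gate_offset_model() and handler_for_vector() are
hypothetical names for this example, not the kernel's gate_desc or gate_offset():

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of a 16-byte x86-64 IDT gate (hypothetical type name). */
struct gate_model {
	uint16_t offset_low;	/* handler address bits 15:0 */
	uint16_t segment;	/* code segment selector */
	uint16_t bits;		/* IST index, gate type, DPL, present */
	uint16_t offset_middle;	/* handler address bits 31:16 */
	uint32_t offset_high;	/* handler address bits 63:32 */
	uint32_t reserved;
};

/* Reassemble the 64-bit handler address from the split offset fields. */
static uint64_t gate_offset_model(const struct gate_model *g)
{
	return (uint64_t)g->offset_low |
	       ((uint64_t)g->offset_middle << 16) |
	       ((uint64_t)g->offset_high << 32);
}

/* Same shape as the lookup above: index the table by vector. */
static uint64_t handler_for_vector(const struct gate_model *idt_base,
				   unsigned int vector)
{
	assert(vector < 256);	/* an IDT holds at most 256 gates */
	return gate_offset_model(idt_base + vector);
}
```

If the lookup moved into entry code, KVM would presumably pass only the vector
and never see the table base.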

Actually, even better would be to bury the FRED vs. not-FRED details in entry
code.  E.g. on the KVM invocation side, we could get to something like the below,
and I'm pretty sure _reduce_ the number of for-KVM exports in the process.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..f6f5c124ed3b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7127,17 +7127,9 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
            "unexpected VM-Exit interrupt info: 0x%x", intr_info))
                return;
 
-       /*
-        * Invoke the kernel's IRQ handler for the vector.  Use the FRED path
-        * when it's available even if FRED isn't fully enabled, e.g. even if
-        * FRED isn't supported in hardware, in order to avoid the indirect
-        * CALL in the non-FRED path.
-        */
+       /* Forward the IRQ to the core kernel for processing. */
        kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-       if (IS_ENABLED(CONFIG_X86_FRED))
-               fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
-       else
-               vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
+       x86_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
        kvm_after_interrupt(vcpu);
 
        vcpu->arch.at_instruction_boundary = true;
@@ -7447,10 +7439,7 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
                return;
 
        kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-       if (cpu_feature_enabled(X86_FEATURE_FRED))
-               fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
-       else
-               vmx_do_nmi_irqoff();
+       x86_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
        kvm_after_interrupt(vcpu);
 }
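
As a rough illustration of what burying the FRED vs. not-FRED choice in entry
code could look like, here is a userspace model of a single dispatcher.  It
is only a sketch: it assumes the wrapper keeps the selection logic of the
lines removed above (IRQs take the FRED path whenever FRED is built in, NMIs
only when FRED is enabled), and every name in it (x86_entry_from_kvm_sketch(),
the MODEL_* constants, the stub functions and the two flags) is hypothetical,
standing in for the real entry points.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's event types and NMI vector. */
enum model_event { MODEL_EVENT_EXTINT, MODEL_EVENT_NMI };
#define MODEL_NMI_VECTOR 2

enum model_path { PATH_NONE, PATH_FRED, PATH_IDT_IRQ, PATH_IDT_NMI };

/* Models IS_ENABLED(CONFIG_X86_FRED) and cpu_feature_enabled(X86_FEATURE_FRED). */
static bool fred_built_in;
static bool fred_hw_enabled;

/* Recorded by the stubs so a caller can observe which path was taken. */
static enum model_path last_path = PATH_NONE;
static unsigned int last_vector;

static void fred_entry_stub(enum model_event type, unsigned int vector)
{
	(void)type;
	last_path = PATH_FRED;
	last_vector = vector;
}

static void idt_irq_stub(unsigned int vector)
{
	last_path = PATH_IDT_IRQ;
	last_vector = vector;
}

static void idt_nmi_stub(void)
{
	last_path = PATH_IDT_NMI;
	last_vector = MODEL_NMI_VECTOR;
}

/* Single entry point: the caller no longer knows about FRED or the IDT. */
static void x86_entry_from_kvm_sketch(enum model_event type, unsigned int vector)
{
	/* NMIs always arrive on the NMI vector. */
	assert(type != MODEL_EVENT_NMI || vector == MODEL_NMI_VECTOR);

	if (type == MODEL_EVENT_NMI) {
		if (fred_hw_enabled)
			fred_entry_stub(type, vector);
		else
			idt_nmi_stub();
		return;
	}

	/* IRQs: prefer the FRED path whenever it is compiled in. */
	if (fred_built_in)
		fred_entry_stub(type, vector);
	else
		idt_irq_stub(vector);
}
```

With both flags false, an EXTINT goes through the IDT stub and an NMI through
the NMI stub; setting only fred_built_in moves IRQs, but not NMIs, to the FRED
stub.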
