Re: [PATCH] KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs
Date: Thu, 25 Jun 2026 08:33:19 -0700	[thread overview]
Message-ID: <aj1KP2h49GzfNHFz@google.com> (raw)
In-Reply-To: <c4ce0b3ebc40384afef7895909df3d00370b3e1d.camel@intel.com>

On Thu, Jun 25, 2026, Kai Huang wrote:
> On Wed, 2026-06-24 at 15:05 -0700, Sean Christopherson wrote:
> > Ignore KVM's internal "service pending PV EOI" request if the vCPU has
> > disabled PV EOIs since the request was made.  Asserting that PV EOIs are
> > enabled can fail if reading guest memory in pv_eoi_get_user() fails, i.e.
> > if pv_eoi_test_and_clr_pending() bails early, *and* the vCPU also disables
> > PV EOIs.
> > 
> >   kernel BUG at arch/x86/kvm/lapic.c:3338!
> >   Oops: invalid opcode: 0000 [#1] SMP
> >   CPU: 4 UID: 1000 PID: 890 Comm: pv_eoi_test Not tainted 7.0.0-d585aa5894d8-vm #337 PREEMPT
> >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> >   RIP: 0010:kvm_lapic_sync_from_vapic+0x12b/0x140 [kvm]
> >   Call Trace:
> >    <TASK>
> >    kvm_arch_vcpu_ioctl_run+0x1075/0x1c30 [kvm]
> >    kvm_vcpu_ioctl+0x2d5/0x980 [kvm]
> >    __x64_sys_ioctl+0x8a/0xd0
> >    do_syscall_64+0xb5/0xb40
> >    entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >    </TASK>
> >   Modules linked in: kvm_intel kvm irqbypass
> >   ---[ end trace 0000000000000000 ]---
> > 
> > Fixes: ae7a2a3fb6f8 ("KVM: host side for eoi optimization")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/lapic.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 6f30bbdddb5a..38bba9a1114c 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -3371,6 +3371,12 @@ static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
> >  					struct kvm_lapic *apic)
> >  {
> >  	int vector;
> > +
> > +	if (unlikely(!pv_eoi_enabled(vcpu))) {
> > +		__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> > +		return;
> > +	}
> > +
> 
> Do you think whether it's better to do this in kvm_lapic_set_pv_eoi(), i.e.,
> when guest disables PV_EOI for this vCPU?

Yes, if we were doing the initial implementation.  I opted not to go that route
because it would require writing to guest memory to put the shared flag back to
KVM_PV_EOI_DISABLED, at which point this becomes a guest-visible change.  It's
probably fine?  But very, very, theoretically, a guest could toggle the feature
on/off and expect memory to remain unchanged.

And I want to get rid of the BUG_ON() no matter what, so I opted to leave the
WRMSR path alone and go with the minimal (albeit ugly) fix.

> If we do so, it seems we can save the unnecessary pv_eoi_set_pending() (which
> writes guest memory) called from apic_sync_pv_eoi_to_guest(), or it's not
> possible to happen?

No, the whole point of apic_sync_pv_eoi_to_guest() is to do pv_eoi_set_pending().
The naming of the flags and the way the code is written obfuscates the logic to
some extent, but what it's doing in pseudocode is:

    IF vCPU has in-service IRQ and no pending IRQs; THEN
        vCPU.pv_eoi = ACTIVE
    FI

where vCPU.pv_eoi is the shared memory.  The guest side, in its APIC EOI path:

  static notrace __maybe_unused void kvm_guest_apic_eoi_write(void)
  {
	/**
	 * This relies on __test_and_clear_bit to modify the memory
	 * in a way that is atomic with respect to the local CPU.
	 * The hypervisor only accesses this memory from the local CPU so
	 * there's no need for lock or memory barriers.
	 * An optimization barrier is implied in apic write.
	 */
	if (__test_and_clear_bit(KVM_PV_EOI_BIT, this_cpu_ptr(&kvm_apic_eoi)))
		return;
	apic_native_eoi();
  }

then does:

    IF vCPU.pv_eoi != ACTIVE; THEN
        do APIC EOI
    FI

next prev parent reply	other threads:[~2026-06-25 15:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 22:05 [PATCH] KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs Sean Christopherson
2026-06-24 22:17 ` sashiko-bot
2026-06-24 22:27   ` Sean Christopherson
2026-06-25  7:30 ` Huang, Kai
2026-06-25 15:33   ` Sean Christopherson [this message]
2026-06-25 23:44     ` Huang, Kai
2026-06-26 17:44       ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aj1KP2h49GzfNHFz@google.com \
    --to=seanjc@google.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.