From: Sean Christopherson <seanjc@google.com>
To: "Markku Ahvenjärvi" <mankku@gmail.com>
Cc: bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	 janne.karhunen@gmail.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, x86@kernel.org
Subject: Re: [PATCH 1/1] KVM: nVMX: update VPPR on vmlaunch/vmresume
Date: Wed, 2 Oct 2024 08:52:54 -0700
Message-ID: <Zv1gbzT1KTYpNgY1@google.com>
In-Reply-To: <20241002124324.14360-1-mankku@gmail.com>

On Wed, Oct 02, 2024, Markku Ahvenjärvi wrote:
> Hi Sean,
> 
> > On Fri, Sep 20, 2024, Markku Ahvenjärvi wrote:
> > > Running certain hypervisors under KVM on VMX suffered L1 hangs after
> > > launching a nested guest. The external interrupts were not processed on
> > > vmlaunch/vmresume due to stale VPPR, and L2 guest would resume without
> > > allowing L1 hypervisor to process the events.
> > > 
> > > The patch ensures VPPR to be updated when checking for pending
> > > interrupts.
> >
> > This is architecturally incorrect, PPR isn't refreshed at VM-Enter.
> 
> I looked into this and found the following from Intel manual:
> 
> "30.1.3 PPR Virtualization
> 
> The processor performs PPR virtualization in response to the following
> operations: (1) VM entry; (2) TPR virtualization; and (3) EOI virtualization.
> 
> ..."
> 
> The section "27.3.2.5 Updating Non-Register State" further explains the VM
> enter:
> 
> "If the “virtual-interrupt delivery” VM-execution control is 1, VM entry loads
> the values of RVI and SVI from the guest interrupt-status field in the VMCS
> (see Section 25.4.2). After doing so, the logical processor first causes PPR
> virtualization (Section 30.1.3) and then evaluates pending virtual interrupts
> (Section 30.2.1). If a virtual interrupt is recognized, it may be delivered in
> VMX non-root operation immediately after VM entry (including any specified
> event injection) completes; ..."
> 
> According to that, PPR is supposed to be refreshed at VM-Enter, or am I
> missing something here?

Huh, I missed that.  It makes sense I guess; VM-Enter processes pending virtual
interrupts, so it stands to reason that VM-Enter would refresh PPR as well.
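
For reference, here is a minimal sketch of the two steps the quoted SDM text
describes, PPR virtualization followed by evaluation of pending virtual
interrupts.  It is a standalone model, not KVM code: virt_ppr() and
virt_intr_recognized() are hypothetical names invented for illustration, and
the rules are my reading of the SDM's pseudocode (VPPR takes VTPR if its
priority class is at least SVI's, otherwise SVI's class; RVI is recognized only
if its class is strictly above VPPR's and interrupt-window exiting is clear).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model only.  These helpers are hypothetical and are not KVM
 * or hardware interfaces.
 */

/* PPR virtualization: VPPR takes the higher priority class of VTPR and SVI. */
static uint8_t virt_ppr(uint8_t vtpr, uint8_t svi)
{
	if ((vtpr & 0xf0) >= (svi & 0xf0))
		return vtpr;
	return svi & 0xf0;
}

/*
 * Evaluation of pending virtual interrupts: RVI is recognized only if its
 * priority class is strictly above VPPR's and interrupt-window exiting is 0.
 */
static bool virt_intr_recognized(uint8_t rvi, uint8_t vppr,
				 bool intr_window_exiting)
{
	return !intr_window_exiting && (rvi & 0xf0) > (vppr & 0xf0);
}
```

In this model, a stale VPPR of 0x50 masks a pending vector 0x41, whereas
recomputing VPPR from VTPR=0 and SVI=0 yields 0 and the same vector is
recognized.  That matches the symptom described in the original report, though
I haven't confirmed it's exactly what happens in your setup.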

Ugh, and looking again, KVM refreshes PPR every time it checks for a pending
interrupt, including the VM-Enter case (via kvm_apic_has_interrupt()) when nested
posted interrupts are in use:

	/* Emulate processing of posted interrupts on VM-Enter. */
	if (nested_cpu_has_posted_intr(vmcs12) &&
	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
		vmx->nested.pi_pending = true;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
	}

I'm still curious as to what's different about your setup, but certainly not
curious enough to hold up a fix.

Anyways, back to the code, I think we can and should shoot for a more complete
cleanup (on top of a minimal fix).  As Chao suggested[*], the above nested posted
interrupt code shouldn't exist, as KVM should handle nested posted interrupts as
part of vmx_check_nested_events(), which honors event priority.  And I see a way,
albeit a bit of an ugly way, to avoid regressing performance when there's a
pending nested posted interrupt at VM-Enter.

The other aspect of this code is that I don't think we need to limit the check
to APICv, i.e. KVM can simply check kvm_apic_has_interrupt() after VM-Enter
succeeds (the funky pre-check is necessary to read RVI from vmcs01, with the
event request deferred until KVM knows VM-Enter will be successful).

Arguably, that's probably more correct, as PPR virtualization should only occur
if VM-Enter is successful (or at least gets past the VM-Fail checks).

So, for an immediate fix, I _think_ we can do:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..784b61c9810b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3593,7 +3593,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
         * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
         * unconditionally.
         */
-       if (unlikely(evaluate_pending_interrupts))
+       if (unlikely(evaluate_pending_interrupts) ||
+           kvm_apic_has_interrupt(vcpu))
                kvm_make_request(KVM_REQ_EVENT, vcpu);
 
        /*

and then eventually make nested_vmx_enter_non_root_mode() look like the below.

Can you verify that the above fixes your setup?  If it does, I'll put together a
small series with that change and the cleanups I have in mind.

Thanks much!

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..77f0695784d8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3483,7 +3483,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
        enum vm_entry_failure_code entry_failure_code;
-       bool evaluate_pending_interrupts;
        union vmx_exit_reason exit_reason = {
                .basic = EXIT_REASON_INVALID_STATE,
                .failed_vmentry = 1,
@@ -3502,13 +3501,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 
        kvm_service_local_tlb_flush_requests(vcpu);
 
-       evaluate_pending_interrupts = exec_controls_get(vmx) &
-               (CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
-       if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
-               evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);
-       if (!evaluate_pending_interrupts)
-               evaluate_pending_interrupts |= kvm_apic_has_pending_init_or_sipi(vcpu);
-
        if (!vmx->nested.nested_run_pending ||
            !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
                vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -3591,9 +3583,13 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
         * Re-evaluate pending events if L1 had a pending IRQ/NMI/INIT/SIPI
         * when it executed VMLAUNCH/VMRESUME, as entering non-root mode can
         * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
-        * unconditionally.
+        * unconditionally.  Take care to pull data from vmcs01 as appropriate,
+        * e.g. when checking for interrupt windows, as vmcs02 is now loaded.
         */
-       if (unlikely(evaluate_pending_interrupts))
+       if ((__exec_controls_get(&vmx->vmcs01) & (CPU_BASED_INTR_WINDOW_EXITING |
+                                                 CPU_BASED_NMI_WINDOW_EXITING)) ||
+           kvm_apic_has_pending_init_or_sipi(vcpu) ||
+           kvm_apic_has_interrupt(vcpu))
                kvm_make_request(KVM_REQ_EVENT, vcpu);
 
        /*


[*] https://lore.kernel.org/all/Zp%2FC5IlwfzC5DCsl@chao-email