public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: "Markku Ahvenjärvi" <mankku@gmail.com>
Cc: bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	 janne.karhunen@gmail.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, x86@kernel.org
Subject: Re: [PATCH 1/1] KVM: nVMX: update VPPR on vmlaunch/vmresume
Date: Wed, 2 Oct 2024 08:52:54 -0700	[thread overview]
Message-ID: <Zv1gbzT1KTYpNgY1@google.com> (raw)
In-Reply-To: <20241002124324.14360-1-mankku@gmail.com>

On Wed, Oct 02, 2024, Markku Ahvenjärvi wrote:
> Hi Sean,
> 
> > On Fri, Sep 20, 2024, Markku Ahvenjärvi wrote:
> > > Certain hypervisors running under KVM on VMX suffered L1 hangs after
> > > launching a nested guest: external interrupts were not processed on
> > > vmlaunch/vmresume due to a stale VPPR, and the L2 guest would resume
> > > without allowing the L1 hypervisor to process the events.
> > > 
> > > The patch ensures the VPPR is updated when checking for pending
> > > interrupts.
> >
> > This is architecturally incorrect, PPR isn't refreshed at VM-Enter.
> 
> I looked into this and found the following from Intel manual:
> 
> "30.1.3 PPR Virtualization
> 
> The processor performs PPR virtualization in response to the following
> operations: (1) VM entry; (2) TPR virtualization; and (3) EOI virtualization.
> 
> ..."
> 
> The section "27.3.2.5 Updating Non-Register State" further explains the VM
> enter:
> 
> "If the “virtual-interrupt delivery” VM-execution control is 1, VM entry loads
> the values of RVI and SVI from the guest interrupt-status field in the VMCS
> (see Section 25.4.2). After doing so, the logical processor first causes PPR
> virtualization (Section 30.1.3) and then evaluates pending virtual interrupts
> (Section 30.2.1). If a virtual interrupt is recognized, it may be delivered in
> VMX non-root operation immediately after VM entry (including any specified
> event injection) completes; ..."
> 
> According to that, PPR is supposed to be refreshed at VM-Enter, or am I
> missing something here?

Huh, I missed that.  It makes sense, I guess; VM-Enter processes pending virtual
interrupts, so it stands to reason that VM-Enter would refresh PPR as well.

Ugh, and looking again, KVM refreshes PPR every time it checks for a pending
interrupt, including the VM-Enter case (via kvm_apic_has_interrupt()) when nested
posted interrupts are in use:

	/* Emulate processing of posted interrupts on VM-Enter. */
	if (nested_cpu_has_posted_intr(vmcs12) &&
	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
		vmx->nested.pi_pending = true;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
	}

I'm still curious as to what's different about your setup, but certainly not
curious enough to hold up a fix.

Anyways, back to the code, I think we can and should shoot for a more complete
cleanup (on top of a minimal fix).  As Chao suggested[*], the above nested posted
interrupt code shouldn't exist, as KVM should handle nested posted interrupts as
part of vmx_check_nested_events(), which honors event priority.  And I see a way,
albeit a bit of an ugly way, to avoid regressing performance when there's a
pending nested posted interrupt at VM-Enter.

The other aspect of this code is that I don't think we need to limit the check
to APICv, i.e. KVM can simply check kvm_apic_has_interrupt() after VM-Enter
succeeds (the funky pre-check is necessary to read RVI from vmcs01, with the
event request deferred until KVM knows VM-Enter will be successful).

Arguably, that's probably more correct, as PPR virtualization should only occur
if VM-Enter is successful (or at least gets past the VM-Fail checks).

So, for an immediate fix, I _think_ we can do:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..784b61c9810b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3593,7 +3593,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
         * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
         * unconditionally.
         */
-       if (unlikely(evaluate_pending_interrupts))
+       if (unlikely(evaluate_pending_interrupts) ||
+           kvm_apic_has_interrupt(vcpu))
                kvm_make_request(KVM_REQ_EVENT, vcpu);
 
        /*

and then eventually make nested_vmx_enter_non_root_mode() look like the below.

Can you verify that the above fixes your setup?  If it does, I'll put together a
small series with that change and the cleanups I have in mind.

Thanks much!

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..77f0695784d8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3483,7 +3483,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
        enum vm_entry_failure_code entry_failure_code;
-       bool evaluate_pending_interrupts;
        union vmx_exit_reason exit_reason = {
                .basic = EXIT_REASON_INVALID_STATE,
                .failed_vmentry = 1,
@@ -3502,13 +3501,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 
        kvm_service_local_tlb_flush_requests(vcpu);
 
-       evaluate_pending_interrupts = exec_controls_get(vmx) &
-               (CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
-       if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
-               evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);
-       if (!evaluate_pending_interrupts)
-               evaluate_pending_interrupts |= kvm_apic_has_pending_init_or_sipi(vcpu);
-
        if (!vmx->nested.nested_run_pending ||
            !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
                vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -3591,9 +3583,13 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
         * Re-evaluate pending events if L1 had a pending IRQ/NMI/INIT/SIPI
         * when it executed VMLAUNCH/VMRESUME, as entering non-root mode can
         * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
-        * unconditionally.
+        * unconditionally.  Take care to pull data from vmcs01 as appropriate,
+        * e.g. when checking for interrupt windows, as vmcs02 is now loaded.
         */
-       if (unlikely(evaluate_pending_interrupts))
+       if ((__exec_controls_get(&vmx->vmcs01) & (CPU_BASED_INTR_WINDOW_EXITING |
+                                                 CPU_BASED_NMI_WINDOW_EXITING)) ||
+           kvm_apic_has_pending_init_or_sipi(vcpu) ||
+           kvm_apic_has_interrupt(vcpu))
                kvm_make_request(KVM_REQ_EVENT, vcpu);
 
        /*


[*] https://lore.kernel.org/all/Zp%2FC5IlwfzC5DCsl@chao-email


Thread overview: 14+ messages
2024-09-20  7:59 [PATCH 0/1] KVM: nVMX: update VPPR on vmlaunch/vmresume Markku Ahvenjärvi
2024-09-20  7:59 ` [PATCH 1/1] " Markku Ahvenjärvi
2024-09-20  8:18   ` Sean Christopherson
2024-09-20 12:40     ` Markku Ahvenjärvi
2024-10-02 12:42     ` Markku Ahvenjärvi
2024-10-02 15:52       ` Sean Christopherson [this message]
2024-10-02 16:49         ` Sean Christopherson
2024-10-02 17:20           ` Sean Christopherson
2024-10-03 11:29             ` Markku Ahvenjärvi
2024-10-10 11:00             ` Chao Gao
2024-10-14 10:57               ` Markku Ahvenjärvi
2024-10-16 18:54               ` Sean Christopherson
2024-10-17 13:27                 ` Chao Gao
2024-10-17 16:05                   ` Sean Christopherson
