From: Sean Christopherson <seanjc@google.com>
To: Chao Gao <chao.gao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Dongli Zhang <dongli.zhang@oracle.com>
Subject: Re: [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active
Date: Tue, 30 Dec 2025 13:03:44 -0800 [thread overview]
Message-ID: <aVQ-MNmUa1fb83zH@google.com> (raw)
In-Reply-To: <aUz2J/cK2PN/n0of@intel.com>
On Thu, Dec 25, 2025, Chao Gao wrote:
> On Fri, Dec 05, 2025 at 03:19:09PM -0800, Sean Christopherson wrote:
> >@@ -6963,21 +6963,16 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> > u16 status;
> > u8 old;
> >
> >- /*
> >- * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
> >- * is only relevant for if and only if Virtual Interrupt Delivery is
> >- * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
> >- * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested
> >- * VM-Exit, otherwise L1 with run with a stale SVI.
> >- */
> >- if (is_guest_mode(vcpu)) {
> >- to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
> >- return;
> >- }
> >-
> > if (max_isr == -1)
> > max_isr = 0;
> >
> >+ /*
> >+ * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
> >+ * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
> >+ * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
> >+ */
> >+ guard(vmx_vmcs01)(vcpu);
>
> KVM calls this function when virtualizing EOI for L2, and in a previous
> discussion, you mentioned that the overhead of switching to VMCS01 is
> "non-trivial and unnecessary" (see [1]).
>
> My testing shows that guard(vmx_vmcs01) takes about 140-250 cycles. I think
> this overhead is acceptable for nested scenarios, since it only affects
> EOI-induced VM-exits in specific/suboptimal configurations.
>
> But I'm wondering whether KVM should update SVI on every VM-entry instead of
> doing it on-demand (i.e., when vISR gets changed). We've encountered two
> SVI-related bugs [1][2] that were difficult to debug. Preventing these issues
> entirely seems worthwhile, and the overhead of always updating SVI during
> VM-entry should be minimal since KVM already updates RVI (RVI and SVI are in
> the the same VMCS field) in vmx_sync_irr_to_pir() when APICv is enabled.
Hmm. At first glance, I _really_ like this idea, but I'm leaning fairly strongly
towards keeping .hwapic_isr_update().
While small (~28 cycles on EMR), the runtime cost isn't zero, and it affects the
fastpath. And number of useful updates is comically small. E.g. without a nested
VM, AFAICT they basically never happen post-boot. Even when running nested VMs,
the number of useful update when running L1 hovers around ~0.001%.
More importantly, KVM will carry most of the complexity related to vISR updates
regardless of how KVM handles SVI because of the ISR caching for non-APICv
systems. So while I acknowledge that we've had some nasty bugs and 100% agree
that squashing them entirely is _very_ enticing, I think those bugs were due to
what were effectively two systemic flaws in KVM: (1) not aligning SVI with KVM's
ISR caching code, and (2) the whole "defer updates to nested VM-Exit" mess.
At the end of this series, both (1) and (2) are "solved". Huh. And now that I
look at (1) again, the last patch is wrong (benignly wrong, but still wrong).
The changelog says this:
First, it adds a call during kvm_lapic_reset(), but that's a glorified nop as
the ISR has already been zeroed.
but that's simply not true. There's already a call in kvm_lapic_reset(). So
that patch can be amended with:
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7be4d759884c..55a7a2be3a2e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2907,10 +2907,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
vcpu->arch.pv_eoi.msr_val = 0;
apic_update_ppr(apic);
- if (apic->apicv_active) {
+ if (apic->apicv_active)
kvm_x86_call(apicv_post_state_restore)(vcpu);
- kvm_x86_call(hwapic_isr_update)(vcpu, -1);
- }
vcpu->arch.apic_arb_prio = 0;
vcpu->arch.apic_attention = 0;
At which point updates to highest_isr_cache and .hwapic_isr_update() are fully
symmetrical (ignoring that KVM simply invalidates highest_isr_cache instead of
scanning the vISR on EOI and APICv changes).
So yeah, the more I look at all of this, the more I'm in favor of keeping
.hwapic_isr_update(), e.g. if only to let it serve as a canary for finding issues
related to highest_isr_cache and/or isr_count.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ef8d29c677b9..e7883bf7665f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6957,45 +6957,20 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> read_unlock(&vcpu->kvm->mmu_lock);
> }
>
> -void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> +static void vmx_set_rvi_svi(int rvi, int svi)
If this ever goes anywhere, my vote would be to call this vmx_sync_guest_intr_status(),
and pass in only @rvi, e.g.
static void vmx_sync_guest_intr_status(struct kvm_vcpu *vcpu, int rvi)
{
int svi = kvm_lapic_find_highest_isr(vcpu);
u16 status, new;
...
}
> status = vmcs_read16(GUEST_INTR_STATUS);
> + new = (rvi & 0xff) | ((u8)svi << 8);
I think this is technically undefined behavior? Due to a shift larger than type
(casting to an 8-bit value and then shifting by 8). svi[31:8] should always be
'0', but to be paranoid we could do:
new = (rvi & 0xff) | ((svi & 0xff) << 8);
> + if (new != status)
> + vmcs_write16(GUEST_INTR_STATUS, new);
> }
next prev parent reply other threads:[~2025-12-30 21:03 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-05 23:19 [PATCH v3 00/10] KVM: VMX: Fix APICv activation bugs Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 01/10] KVM: VMX: Update SVI during runtime APICv activation Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 02/10] KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 03/10] KVM: selftests: Add a test to verify APICv updates (while L2 is active) Sean Christopherson
2025-12-12 3:24 ` Chao Gao
2025-12-12 18:01 ` Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 04/10] KVM: nVMX: Switch to vmcs01 to update PML controls on-demand if L2 is active Sean Christopherson
2025-12-24 7:53 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 05/10] KVM: nVMX: Switch to vmcs01 to update TPR threshold " Sean Christopherson
2025-12-25 6:38 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI " Sean Christopherson
2025-12-25 8:30 ` Chao Gao
2025-12-30 21:03 ` Sean Christopherson [this message]
2025-12-31 2:17 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 07/10] KVM: nVMX: Switch to vmcs01 to refresh APICv controls " Sean Christopherson
2025-12-26 1:45 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 08/10] KVM: nVMX: Switch to vmcs01 to update APIC page " Sean Christopherson
2025-12-26 2:01 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 09/10] KVM: nVMX: Switch to vmcs01 to set virtual APICv mode " Sean Christopherson
2025-12-26 5:16 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 10/10] KVM: x86: Update APICv ISR (a.k.a. SVI) as part of kvm_apic_update_apicv() Sean Christopherson
2025-12-26 5:16 ` Chao Gao
2025-12-10 0:25 ` [PATCH v3 00/10] KVM: VMX: Fix APICv activation bugs Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aVQ-MNmUa1fb83zH@google.com \
--to=seanjc@google.com \
--cc=chao.gao@intel.com \
--cc=dongli.zhang@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.