From: Chao Gao <chao.gao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
Dongli Zhang <dongli.zhang@oracle.com>
Subject: Re: [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active
Date: Thu, 25 Dec 2025 16:30:31 +0800 [thread overview]
Message-ID: <aUz2J/cK2PN/n0of@intel.com> (raw)
In-Reply-To: <20251205231913.441872-7-seanjc@google.com>
On Fri, Dec 05, 2025 at 03:19:09PM -0800, Sean Christopherson wrote:
>If APICv is activated while L2 is running and triggers an SVI update,
>temporarily load vmcs01 and immediately update SVI instead of deferring
>the update until the next nested VM-Exit. This will eventually allow
>killing off kvm_apic_update_hwapic_isr(), and all of nVMX's deferred
>APICv updates.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
A side topic below:
<snip>
>@@ -6963,21 +6963,16 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> u16 status;
> u8 old;
>
>- /*
>- * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
>- * is only relevant for if and only if Virtual Interrupt Delivery is
>- * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
>- * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested
>- * VM-Exit, otherwise L1 with run with a stale SVI.
>- */
>- if (is_guest_mode(vcpu)) {
>- to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
>- return;
>- }
>-
> if (max_isr == -1)
> max_isr = 0;
>
>+ /*
>+ * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
>+ * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
>+ * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
>+ */
>+ guard(vmx_vmcs01)(vcpu);
KVM calls this function when virtualizing EOI for L2, and in a previous
discussion, you mentioned that the overhead of switching to VMCS01 is
"non-trivial and unnecessary" (see [1]).
My testing shows that guard(vmx_vmcs01) takes about 140-250 cycles. I think
this overhead is acceptable for nested scenarios, since it only affects
EOI-induced VM-exits in specific/suboptimal configurations.
But I'm wondering whether KVM should update SVI on every VM-entry instead of
doing it on-demand (i.e., when vISR gets changed). We've encountered two
SVI-related bugs [1][2] that were difficult to debug. Preventing these issues
entirely seems worthwhile, and the overhead of always updating SVI during
VM-entry should be minimal since KVM already updates RVI (RVI and SVI are in
the the same VMCS field) in vmx_sync_irr_to_pir() when APICv is enabled.
[1]: https://lore.kernel.org/kvm/ZxAL6thxEH67CpW7@google.com/
[2]: https://lore.kernel.org/kvm/20251205231913.441872-2-seanjc@google.com/
e.g.,
---
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/lapic.c | 24 ++++++++--------
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/vmx/main.c | 9 ------
arch/x86/kvm/vmx/vmx.c | 45 +++++++-----------------------
arch/x86/kvm/vmx/x86_ops.h | 1 -
7 files changed, 22 insertions(+), 60 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index de709fb5bd76..69aa64aa9910 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -84,7 +84,6 @@ KVM_X86_OP(enable_nmi_window)
KVM_X86_OP(enable_irq_window)
KVM_X86_OP_OPTIONAL(update_cr8_intercept)
KVM_X86_OP(refresh_apicv_exec_ctrl)
-KVM_X86_OP_OPTIONAL(hwapic_isr_update)
KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..de57a90dac2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1824,7 +1824,6 @@ struct kvm_x86_ops {
const unsigned long required_apicv_inhibits;
bool allow_apicv_in_x2apic_without_x2apic_virtualization;
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
- void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7be4d759884c..b0c87fa68f2a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -715,9 +715,7 @@ static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
* because the processor can modify ISR under the hood. Instead
* just set SVI.
*/
- if (unlikely(apic->apicv_active))
- kvm_x86_call(hwapic_isr_update)(apic->vcpu, vec);
- else {
+ if (likely(!apic->apicv_active)) {
++apic->isr_count;
BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
/*
@@ -761,9 +759,7 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
* on the other hand isr_count and highest_isr_cache are unused
* and must be left alone.
*/
- if (unlikely(apic->apicv_active))
- kvm_x86_call(hwapic_isr_update)(apic->vcpu, apic_find_highest_isr(apic));
- else {
+ if (likely(!apic->apicv_active)) {
--apic->isr_count;
BUG_ON(apic->isr_count < 0);
apic->highest_isr_cache = -1;
@@ -781,6 +777,12 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_irr);
+int kvm_lapic_find_highest_isr(struct kvm_vcpu *vcpu)
+{
+ return apic_find_highest_isr(vcpu->arch.apic);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_isr);
+
static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int vector, int level, int trig_mode,
struct dest_map *dest_map);
@@ -2774,12 +2776,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
*/
apic->irr_pending = true;
- if (apic->apicv_active) {
+ if (apic->apicv_active)
apic->isr_count = 1;
- kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
- } else {
+ else
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
- }
apic->highest_isr_cache = -1;
}
@@ -2907,10 +2907,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
vcpu->arch.pv_eoi.msr_val = 0;
apic_update_ppr(apic);
- if (apic->apicv_active) {
+ if (apic->apicv_active)
kvm_x86_call(apicv_post_state_restore)(vcpu);
- kvm_x86_call(hwapic_isr_update)(vcpu, -1);
- }
vcpu->arch.apic_arb_prio = 0;
vcpu->arch.apic_attention = 0;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index aa0a9b55dbb7..6242c8c7a682 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -127,6 +127,7 @@ int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
+int kvm_lapic_find_highest_isr(struct kvm_vcpu *vcpu);
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a46ccd670785..73dd74efcf9a 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -282,14 +282,6 @@ static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
return vmx_set_virtual_apic_mode(vcpu);
}
-static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
-{
- if (is_td_vcpu(vcpu))
- return;
-
- return vmx_hwapic_isr_update(vcpu, max_isr);
-}
-
static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
if (is_td_vcpu(vcpu))
@@ -953,7 +945,6 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.load_eoi_exitmap = vt_op(load_eoi_exitmap),
.apicv_pre_state_restore = pi_apicv_pre_state_restore,
.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
- .hwapic_isr_update = vt_op(hwapic_isr_update),
.sync_pir_to_irr = vt_op(sync_pir_to_irr),
.deliver_interrupt = vt_op(deliver_interrupt),
.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef8d29c677b9..e7883bf7665f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6957,45 +6957,20 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
read_unlock(&vcpu->kvm->mmu_lock);
}
-void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+static void vmx_set_rvi_svi(int rvi, int svi)
{
u16 status;
- u8 old;
+ u16 new;
- if (max_isr == -1)
- max_isr = 0;
-
- /*
- * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
- * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
- * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
- */
- guard(vmx_vmcs01)(vcpu);
+ if (rvi == -1)
+ rvi = 0;
+ if (svi == -1)
+ svi = 0;
status = vmcs_read16(GUEST_INTR_STATUS);
- old = status >> 8;
- if (max_isr != old) {
- status &= 0xff;
- status |= max_isr << 8;
- vmcs_write16(GUEST_INTR_STATUS, status);
- }
-}
-
-static void vmx_set_rvi(int vector)
-{
- u16 status;
- u8 old;
-
- if (vector == -1)
- vector = 0;
-
- status = vmcs_read16(GUEST_INTR_STATUS);
- old = (u8)status & 0xff;
- if ((u8)vector != old) {
- status &= ~0xff;
- status |= (u8)vector;
- vmcs_write16(GUEST_INTR_STATUS, status);
- }
+ new = (rvi & 0xff) | ((u8)svi << 8);
+ if (new != status)
+ vmcs_write16(GUEST_INTR_STATUS, new);
}
int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
@@ -7037,7 +7012,7 @@ int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
* a VM-Exit and the subsequent entry will call sync_pir_to_irr.
*/
if (!is_guest_mode(vcpu) && kvm_vcpu_apicv_active(vcpu))
- vmx_set_rvi(max_irr);
+ vmx_set_rvi_svi(max_irr, kvm_lapic_find_highest_isr(vcpu));
else if (got_posted_interrupt)
kvm_make_request(KVM_REQ_EVENT, vcpu);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d09abeac2b56..ab7349e67809 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -46,7 +46,6 @@ int vmx_check_intercept(struct kvm_vcpu *vcpu,
bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
void vmx_migrate_timers(struct kvm_vcpu *vcpu);
void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
-void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu);
void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
int trig_mode, int vector);
--
2.47.3
next prev parent reply other threads:[~2025-12-25 8:30 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-05 23:19 [PATCH v3 00/10] KVM: VMX: Fix APICv activation bugs Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 01/10] KVM: VMX: Update SVI during runtime APICv activation Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 02/10] KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 03/10] KVM: selftests: Add a test to verify APICv updates (while L2 is active) Sean Christopherson
2025-12-12 3:24 ` Chao Gao
2025-12-12 18:01 ` Sean Christopherson
2025-12-05 23:19 ` [PATCH v3 04/10] KVM: nVMX: Switch to vmcs01 to update PML controls on-demand if L2 is active Sean Christopherson
2025-12-24 7:53 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 05/10] KVM: nVMX: Switch to vmcs01 to update TPR threshold " Sean Christopherson
2025-12-25 6:38 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI " Sean Christopherson
2025-12-25 8:30 ` Chao Gao [this message]
2025-12-30 21:03 ` Sean Christopherson
2025-12-31 2:17 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 07/10] KVM: nVMX: Switch to vmcs01 to refresh APICv controls " Sean Christopherson
2025-12-26 1:45 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 08/10] KVM: nVMX: Switch to vmcs01 to update APIC page " Sean Christopherson
2025-12-26 2:01 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 09/10] KVM: nVMX: Switch to vmcs01 to set virtual APICv mode " Sean Christopherson
2025-12-26 5:16 ` Chao Gao
2025-12-05 23:19 ` [PATCH v3 10/10] KVM: x86: Update APICv ISR (a.k.a. SVI) as part of kvm_apic_update_apicv() Sean Christopherson
2025-12-26 5:16 ` Chao Gao
2025-12-10 0:25 ` [PATCH v3 00/10] KVM: VMX: Fix APICv activation bugs Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUz2J/cK2PN/n0of@intel.com \
--to=chao.gao@intel.com \
--cc=dongli.zhang@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.