* [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues
@ 2026-05-06 18:47 Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Fix a variety of bugs in SVM's handling of x2APIC MSR passthrough for x2AVIC,
where KVM disables interception for MSR accesses that aren't accelerated by
hardware (pointless and suboptimal), and also does NOT disable interception
for practically any of the "range of vectors" MSRs, i.e. IRR, ISR, and TMR.
Found by inspection when reviewing a TDX patch to fix a bug where KVM botched
the "range of vectors"[*] (I was curious how other KVM code handled the ranges;
wasn't expecting this...).
Note, I tagged all of this for stable, but I could be convinced these fixes
shouldn't be sent to LTS trees. Patch 3 in particular doesn't truly fix
anything, though I definitely don't like relying on poorly documented behavior.
Note #2, the diff stats are misleading due to the hacks, the "real" stats are:
arch/x86/kvm/svm/avic.c | 51 ++++++++++++++++-----------------------------------
1 file changed, 16 insertions(+), 35 deletions(-)
[*] https://lore.kernel.org/all/20260318190111.1041924-1-dmaluka@chromium.org
v2:
- Actually iterate over the mask of readable regs. [Naveen]
- Rewrite the changelog for patch 3 to more accurately capture what happens,
and to avoid conflating "unaccelerated" with "fault-like". [Naveen]
- Massage the changelog for patch 1 to describe the observed behavior of

DFR and ICR2.
- Test the #VMEXIT (or not) behavior with hacks (patches 4 and 5).
v1: https://lore.kernel.org/all/20260409222449.2013847-1-seanjc@google.com
Sean Christopherson (5):
KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually
supports
KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are
accelerated
*** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced
exits (for testing)
*** DO NOT MERGE *** KVM: selftests: Add hacky test to verify x2APIC
MSR interception
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/svm/avic.c | 51 ++--
arch/x86/kvm/svm/svm.c | 81 +++++++
arch/x86/kvm/vmx/vmx.c | 79 +++++++
arch/x86/kvm/x86.c | 2 +
.../testing/selftests/kvm/include/x86/apic.h | 84 ++++++-
.../selftests/kvm/x86/fix_hypercall_test.c | 2 +-
.../selftests/kvm/x86/xapic_ipi_test.c | 4 +-
.../selftests/kvm/x86/xapic_state_test.c | 217 ++++++++++++++++++
9 files changed, 476 insertions(+), 46 deletions(-)
base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
@ 2026-05-06 18:47 ` Sean Christopherson
2026-05-07 13:56 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count) Sean Christopherson
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Fix multiple (classes of) bugs with one stone by using KVM's mask of
readable local APIC registers to determine which x2APIC MSRs to pass
through (or not) when toggling x2AVIC on/off. The existing hand-coded
list of MSRs is wrong on multiple fronts:
- ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
is fault-like; disabling interception is nonsensical and suboptimal as
the access generates a #VMEXIT that requires decoding the instruction.
- DFR and ICR2 aren't supported by x2APIC and so don't need their
intercepts disabled for performance reasons. While the #GP due to
x2APIC being abled has higher priority than the trap-like #VMEXIT,
disabling interception of unsupported MSRs is confusing and unnecessary.
- RRR is completely unsupported.
- AVIC currently fails to pass through the "range of vectors" registers,
IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and
thus only disables intercept for vectors 31:0 (which are the *least*
interesting registers).
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index adf211860949..4f203e503e8e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -122,6 +122,9 @@ static u32 x2avic_max_physical_id;
static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
bool intercept)
{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ u64 x2apic_readable_mask;
+
static const u32 x2avic_passthrough_msrs[] = {
X2APIC_MSR(APIC_ID),
X2APIC_MSR(APIC_LVR),
@@ -162,9 +165,16 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!x2avic_enabled)
return;
+ x2apic_readable_mask = kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+
+ for_each_set_bit(i, (unsigned long *)&x2apic_readable_mask,
+ BITS_PER_TYPE(x2apic_readable_mask))
+ svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
+ MSR_TYPE_R, intercept);
+
for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_RW, intercept);
+ svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
+ MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
@ 2026-05-06 18:47 ` Sean Christopherson
2026-05-07 14:19 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Explicitly intercept RDMSR for TMCCT, a.k.a. the current APIC timer count,
when x2AVIC is enabled, as TMCCT reads aren't accelerated by hardware.
Disabling interception is suboptimal as the RDMSR generates an
AVIC_UNACCELERATED_ACCESS fault #VMEXIT, which forces KVM to decode the
instruction to figure out what the guest was trying to access.
Note, the only reason this isn't a fatal bug is that the AVIC architecture
had the foresight to guard against buggy hypervisors. E.g. if hardware
simply read from the virtual APIC page, the guest would get garbage.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4f203e503e8e..d693c9ff9f18 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -172,6 +172,9 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
MSR_TYPE_R, intercept);
+ if (!intercept)
+ svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
+
for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
MSR_TYPE_W, intercept);
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count) Sean Christopherson
@ 2026-05-06 18:47 ` Sean Christopherson
2026-05-08 16:59 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing) Sean Christopherson
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
When x2AVIC is enabled, disable WRMSR interception only for MSRs that are
actually accelerated by hardware. Disabling interception for MSRs that
aren't accelerated is functionally "fine", and in some cases a weird "win"
for performance, but only for cases that should never be triggered by a
well-behaved VM (writes to read-only registers; the #GP will typically
occur in the guest without taking a #VMEXIT, even for fault-like exits).
But overall, disabling interception for MSRs that aren't accelerated is at
best confusing and unintuitive, and at worst introduces avoidable risk, as
the effective guest-visible behavior depends on the whims of the CPU (the
behavior of x2APIC MSR writes on at least Zen4 doesn't match the behavior
documented in the table in "15.29.3.1 Virtual APIC Register Accesses" of
the APM).
Note, the set of MSRs that are passed through for write is identical to
VMX's set when IPI virtualization is enabled. This is not a coincidence,
and is another motivating factor for cleaning up the intercepts, as x2AVIC
is functionally equivalent to APICv+IPIv.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 40 ++++------------------------------------
1 file changed, 4 insertions(+), 36 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index d693c9ff9f18..c5d46c0d2403 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -124,39 +124,6 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
{
struct kvm_vcpu *vcpu = &svm->vcpu;
u64 x2apic_readable_mask;
-
- static const u32 x2avic_passthrough_msrs[] = {
- X2APIC_MSR(APIC_ID),
- X2APIC_MSR(APIC_LVR),
- X2APIC_MSR(APIC_TASKPRI),
- X2APIC_MSR(APIC_ARBPRI),
- X2APIC_MSR(APIC_PROCPRI),
- X2APIC_MSR(APIC_EOI),
- X2APIC_MSR(APIC_RRR),
- X2APIC_MSR(APIC_LDR),
- X2APIC_MSR(APIC_DFR),
- X2APIC_MSR(APIC_SPIV),
- X2APIC_MSR(APIC_ISR),
- X2APIC_MSR(APIC_TMR),
- X2APIC_MSR(APIC_IRR),
- X2APIC_MSR(APIC_ESR),
- X2APIC_MSR(APIC_ICR),
- X2APIC_MSR(APIC_ICR2),
-
- /*
- * Note! Always intercept LVTT, as TSC-deadline timer mode
- * isn't virtualized by hardware, and the CPU will generate a
- * #GP instead of a #VMEXIT.
- */
- X2APIC_MSR(APIC_LVTTHMR),
- X2APIC_MSR(APIC_LVTPC),
- X2APIC_MSR(APIC_LVT0),
- X2APIC_MSR(APIC_LVT1),
- X2APIC_MSR(APIC_LVTERR),
- X2APIC_MSR(APIC_TMICT),
- X2APIC_MSR(APIC_TMCCT),
- X2APIC_MSR(APIC_TDCR),
- };
int i;
if (intercept == svm->x2avic_msrs_intercepted)
@@ -175,9 +142,10 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!intercept)
svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
- for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing)
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
` (2 preceding siblings ...)
2026-05-06 18:47 ` [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
@ 2026-05-06 18:47 ` Sean Christopherson
2026-05-08 17:14 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 5/5] *** DO NOT MERGE *** KVM: selftests: Add hacky test to verify x2APIC MSR interception Sean Christopherson
2026-05-09 5:10 ` [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Naveen N Rao
5 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Not-signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/svm/svm.c | 81 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 79 ++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 +
4 files changed, 164 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..bff534bd00dc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1703,6 +1703,8 @@ struct kvm_vcpu_stat {
u64 invlpg;
u64 exits;
+ u64 guest_induced_exits;
+ u64 msr_exits;
u64 io_exits;
u64 mmio_exits;
u64 signal_exits;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..7886bd1ad8f2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4378,6 +4378,81 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
return EXIT_FASTPATH_NONE;
}
+static bool is_guest_induced_exit(u64 exit_code)
+{
+ switch (exit_code) {
+ case SVM_EXIT_READ_CR0:
+ case SVM_EXIT_READ_CR3:
+ case SVM_EXIT_READ_CR4:
+ case SVM_EXIT_READ_CR8:
+ case SVM_EXIT_CR0_SEL_WRITE:
+ case SVM_EXIT_WRITE_CR0:
+ case SVM_EXIT_WRITE_CR3:
+ case SVM_EXIT_WRITE_CR4:
+ case SVM_EXIT_WRITE_CR8:
+ case SVM_EXIT_READ_DR0:
+ case SVM_EXIT_READ_DR1:
+ case SVM_EXIT_READ_DR2:
+ case SVM_EXIT_READ_DR3:
+ case SVM_EXIT_READ_DR4:
+ case SVM_EXIT_READ_DR5:
+ case SVM_EXIT_READ_DR6:
+ case SVM_EXIT_READ_DR7:
+ case SVM_EXIT_WRITE_DR0:
+ case SVM_EXIT_WRITE_DR1:
+ case SVM_EXIT_WRITE_DR2:
+ case SVM_EXIT_WRITE_DR3:
+ case SVM_EXIT_WRITE_DR4:
+ case SVM_EXIT_WRITE_DR5:
+ case SVM_EXIT_WRITE_DR6:
+ case SVM_EXIT_WRITE_DR7:
+ case SVM_EXIT_EXCP_BASE + DB_VECTOR:
+ case SVM_EXIT_EXCP_BASE + BP_VECTOR:
+ case SVM_EXIT_EXCP_BASE + UD_VECTOR:
+ case SVM_EXIT_EXCP_BASE + PF_VECTOR:
+ case SVM_EXIT_EXCP_BASE + AC_VECTOR:
+ case SVM_EXIT_EXCP_BASE + GP_VECTOR:
+ case SVM_EXIT_RDPMC:
+ case SVM_EXIT_CPUID:
+ case SVM_EXIT_IRET:
+ case SVM_EXIT_INVD:
+ case SVM_EXIT_PAUSE:
+ case SVM_EXIT_HLT:
+ case SVM_EXIT_INVLPG:
+ case SVM_EXIT_INVLPGA:
+ case SVM_EXIT_IOIO:
+ case SVM_EXIT_MSR:
+ case SVM_EXIT_TASK_SWITCH:
+ case SVM_EXIT_SHUTDOWN:
+ case SVM_EXIT_VMRUN:
+ case SVM_EXIT_VMMCALL:
+ case SVM_EXIT_VMLOAD:
+ case SVM_EXIT_VMSAVE:
+ case SVM_EXIT_STGI:
+ case SVM_EXIT_CLGI:
+ case SVM_EXIT_SKINIT:
+ case SVM_EXIT_RDTSCP:
+ case SVM_EXIT_WBINVD:
+ case SVM_EXIT_MONITOR:
+ case SVM_EXIT_MWAIT:
+ case SVM_EXIT_XSETBV:
+ case SVM_EXIT_RDPRU:
+ case SVM_EXIT_EFER_WRITE_TRAP:
+ case SVM_EXIT_CR0_WRITE_TRAP:
+ case SVM_EXIT_CR4_WRITE_TRAP:
+ case SVM_EXIT_CR8_WRITE_TRAP:
+ case SVM_EXIT_INVPCID:
+ case SVM_EXIT_IDLE_HLT:
+ case SVM_EXIT_RSM:
+ case SVM_EXIT_AVIC_INCOMPLETE_IPI:
+ case SVM_EXIT_AVIC_UNACCELERATED_ACCESS:
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
{
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
@@ -4573,6 +4648,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (is_guest_mode(vcpu))
svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
+ if (is_guest_induced_exit(svm->vmcb->control.exit_code))
+ ++vcpu->stat.guest_induced_exits;
+
+ if (svm->vmcb->control.exit_code == SVM_EXIT_MSR)
+ ++vcpu->stat.msr_exits;
+
return svm_exit_handlers_fastpath(vcpu);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5c2c33a5f7dc..859f4bc01445 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7478,6 +7478,79 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
kvm_after_interrupt(vcpu);
}
+static bool is_guest_induced_exit(struct kvm_vcpu *vcpu)
+{
+ switch (vmx_get_exit_reason(vcpu).basic) {
+ case EXIT_REASON_EXCEPTION_NMI:
+ if (is_nmi(vmx_get_intr_info(vcpu)))
+ return false;
+ return true;
+ case EXIT_REASON_TRIPLE_FAULT:
+ case EXIT_REASON_IO_INSTRUCTION:
+ case EXIT_REASON_CR_ACCESS:
+ case EXIT_REASON_DR_ACCESS:
+ case EXIT_REASON_CPUID:
+ case EXIT_REASON_MSR_READ:
+ case EXIT_REASON_MSR_WRITE:
+ case EXIT_REASON_HLT:
+ case EXIT_REASON_INVD:
+ case EXIT_REASON_INVLPG:
+ case EXIT_REASON_RDPMC:
+ case EXIT_REASON_VMCALL:
+ case EXIT_REASON_VMCLEAR:
+ case EXIT_REASON_VMLAUNCH:
+ case EXIT_REASON_VMPTRLD:
+ case EXIT_REASON_VMPTRST:
+ case EXIT_REASON_VMREAD:
+ case EXIT_REASON_VMRESUME:
+ case EXIT_REASON_VMWRITE:
+ case EXIT_REASON_VMOFF:
+ case EXIT_REASON_VMON:
+ case EXIT_REASON_TPR_BELOW_THRESHOLD:
+ case EXIT_REASON_APIC_ACCESS:
+ case EXIT_REASON_APIC_WRITE:
+ case EXIT_REASON_EOI_INDUCED:
+ case EXIT_REASON_WBINVD:
+ case EXIT_REASON_XSETBV:
+ case EXIT_REASON_TASK_SWITCH:
+ case EXIT_REASON_MCE_DURING_VMENTRY:
+ case EXIT_REASON_GDTR_IDTR:
+ case EXIT_REASON_LDTR_TR:
+ case EXIT_REASON_PAUSE_INSTRUCTION:
+ case EXIT_REASON_MWAIT_INSTRUCTION:
+ case EXIT_REASON_MONITOR_INSTRUCTION:
+ case EXIT_REASON_INVEPT:
+ case EXIT_REASON_INVVPID:
+ case EXIT_REASON_RDRAND:
+ case EXIT_REASON_RDSEED:
+ case EXIT_REASON_INVPCID:
+ case EXIT_REASON_VMFUNC:
+ case EXIT_REASON_ENCLS:
+ case EXIT_REASON_SEAMCALL:
+ case EXIT_REASON_TDCALL:
+ case EXIT_REASON_MSR_READ_IMM:
+ case EXIT_REASON_MSR_WRITE_IMM:
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
+static bool is_msr_exit(struct kvm_vcpu *vcpu)
+{
+ switch (vmx_get_exit_reason(vcpu).basic) {
+ case EXIT_REASON_MSR_READ:
+ case EXIT_REASON_MSR_WRITE:
+ case EXIT_REASON_MSR_READ_IMM:
+ case EXIT_REASON_MSR_WRITE_IMM:
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
unsigned int flags)
{
@@ -7667,6 +7740,12 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
vmx_recover_nmi_blocking(vmx);
vmx_complete_interrupts(vmx);
+ if (is_guest_induced_exit(vcpu))
+ ++vcpu->stat.guest_induced_exits;
+
+ if (is_msr_exit(vcpu))
+ ++vcpu->stat.msr_exits;
+
return vmx_exit_handlers_fastpath(vcpu, force_immediate_exit);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..dc69b8cebe0b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -283,6 +283,8 @@ const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, tlb_flush),
STATS_DESC_COUNTER(VCPU, invlpg),
STATS_DESC_COUNTER(VCPU, exits),
+ STATS_DESC_COUNTER(VCPU, guest_induced_exits),
+ STATS_DESC_COUNTER(VCPU, msr_exits),
STATS_DESC_COUNTER(VCPU, io_exits),
STATS_DESC_COUNTER(VCPU, mmio_exits),
STATS_DESC_COUNTER(VCPU, signal_exits),
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 5/5] *** DO NOT MERGE *** KVM: selftests: Add hacky test to verify x2APIC MSR interception
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
` (3 preceding siblings ...)
2026-05-06 18:47 ` [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing) Sean Christopherson
@ 2026-05-06 18:47 ` Sean Christopherson
2026-05-09 5:10 ` [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Naveen N Rao
5 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2026-05-06 18:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Not-signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../testing/selftests/kvm/include/x86/apic.h | 84 ++++++-
.../selftests/kvm/x86/fix_hypercall_test.c | 2 +-
.../selftests/kvm/x86/xapic_ipi_test.c | 4 +-
.../selftests/kvm/x86/xapic_state_test.c | 217 ++++++++++++++++++
4 files changed, 296 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/apic.h b/tools/testing/selftests/kvm/include/x86/apic.h
index 31887bdc3d6c..3c3e362f2b98 100644
--- a/tools/testing/selftests/kvm/include/x86/apic.h
+++ b/tools/testing/selftests/kvm/include/x86/apic.h
@@ -23,21 +23,64 @@
#define APIC_BASE_MSR 0x800
#define X2APIC_ENABLE (1UL << 10)
+
+#define APIC_DELIVERY_MODE_FIXED 0
+#define APIC_DELIVERY_MODE_LOWESTPRIO 1
+#define APIC_DELIVERY_MODE_SMI 2
+#define APIC_DELIVERY_MODE_NMI 4
+#define APIC_DELIVERY_MODE_INIT 5
+#define APIC_DELIVERY_MODE_EXTINT 7
+
#define APIC_ID 0x20
+
#define APIC_LVR 0x30
-#define GET_APIC_ID_FIELD(x) (((x) >> 24) & 0xFF)
+#define APIC_LVR_MASK 0xFF00FF
+#define APIC_LVR_DIRECTED_EOI (1 << 24)
+#define GET_APIC_VERSION(x) ((x) & 0xFFu)
+#define GET_APIC_MAXLVT(x) (((x) >> 16) & 0xFFu)
+#ifdef CONFIG_X86_32
+# define APIC_INTEGRATED(x) ((x) & 0xF0u)
+#else
+# define APIC_INTEGRATED(x) (1)
+#endif
+#define APIC_XAPIC(x) ((x) >= 0x14)
+#define APIC_EXT_SPACE(x) ((x) & 0x80000000)
#define APIC_TASKPRI 0x80
+#define APIC_TPRI_MASK 0xFFu
+#define APIC_ARBPRI 0x90
+#define APIC_ARBPRI_MASK 0xFFu
#define APIC_PROCPRI 0xA0
-#define GET_APIC_PRI(x) (((x) & GENMASK(7, 4)) >> 4)
-#define SET_APIC_PRI(x, y) (((x) & ~GENMASK(7, 4)) | (y << 4))
+#define GET_APIC_PRI(x) (((x) & GENMASK(7, 4)) >> 4)
+#define SET_APIC_PRI(x, y) (((x) & ~GENMASK(7, 4)) | (y << 4))
#define APIC_EOI 0xB0
+#define APIC_EOI_ACK 0x0 /* Docs say 0 for future compat. */
+#define APIC_RRR 0xC0
+#define APIC_LDR 0xD0
+#define APIC_LDR_MASK (0xFFu << 24)
+#define GET_APIC_LOGICAL_ID(x) (((x) >> 24) & 0xFFu)
+#define SET_APIC_LOGICAL_ID(x) (((x) << 24))
+#define APIC_ALL_CPUS 0xFFu
+#define APIC_DFR 0xE0
+#define APIC_DFR_CLUSTER 0x0FFFFFFFul
+#define APIC_DFR_FLAT 0xFFFFFFFFul
#define APIC_SPIV 0xF0
+#define APIC_SPIV_DIRECTED_EOI (1 << 12)
#define APIC_SPIV_FOCUS_DISABLED (1 << 9)
#define APIC_SPIV_APIC_ENABLED (1 << 8)
#define APIC_ISR 0x100
-#define APIC_IRR 0x200
+#define APIC_ISR_NR 0x8 /* Number of 32 bit ISR registers. */
+#define APIC_TMR 0x180
+#define APIC_IRR 0x200
+#define APIC_ESR 0x280
+#define APIC_ESR_SEND_CS 0x00001
+#define APIC_ESR_RECV_CS 0x00002
+#define APIC_ESR_SEND_ACC 0x00004
+#define APIC_ESR_RECV_ACC 0x00008
+#define APIC_ESR_SENDILL 0x00020
+#define APIC_ESR_RECVILL 0x00040
+#define APIC_ESR_ILLREGA 0x00080
+#define APIC_LVTCMCI 0x2f0
#define APIC_ICR 0x300
-#define APIC_LVTCMCI 0x2f0
#define APIC_DEST_SELF 0x40000
#define APIC_DEST_ALLINC 0x80000
#define APIC_DEST_ALLBUT 0xC0000
@@ -61,16 +104,41 @@
#define APIC_DM_EXTINT 0x00700
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
-#define SET_APIC_DEST_FIELD(x) ((x) << 24)
-#define APIC_LVTT 0x320
+#define GET_XAPIC_DEST_FIELD(x) (((x) >> 24) & 0xFF)
+#define SET_XAPIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVTT 0x320
+#define APIC_LVTTHMR 0x330
+#define APIC_LVTPC 0x340
+#define APIC_LVT0 0x350
#define APIC_LVT_TIMER_ONESHOT (0 << 17)
#define APIC_LVT_TIMER_PERIODIC (1 << 17)
#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
#define APIC_LVT_MASKED (1 << 16)
+#define APIC_LVT_LEVEL_TRIGGER (1 << 15)
+#define APIC_LVT_REMOTE_IRR (1 << 14)
+#define APIC_INPUT_POLARITY (1 << 13)
+#define APIC_SEND_PENDING (1 << 12)
+#define APIC_MODE_MASK 0x700
+#define GET_APIC_DELIVERY_MODE(x) (((x) >> 8) & 0x7)
+#define SET_APIC_DELIVERY_MODE(x, y) (((x) & ~0x700) | ((y) << 8))
+#define APIC_MODE_FIXED 0x0
+#define APIC_MODE_NMI 0x4
+#define APIC_MODE_EXTINT 0x7
+#define APIC_LVT1 0x360
+#define APIC_LVTERR 0x370
#define APIC_TMICT 0x380
#define APIC_TMCCT 0x390
#define APIC_TDCR 0x3E0
-#define APIC_SELF_IPI 0x3F0
+#define APIC_SELF_IPI 0x3F0
+#define APIC_TDR_DIV_TMBASE (1 << 2)
+#define APIC_TDR_DIV_1 0xB
+#define APIC_TDR_DIV_2 0x0
+#define APIC_TDR_DIV_4 0x1
+#define APIC_TDR_DIV_8 0x2
+#define APIC_TDR_DIV_16 0x3
+#define APIC_TDR_DIV_32 0x8
+#define APIC_TDR_DIV_64 0x9
+#define APIC_TDR_DIV_128 0xA
#define APIC_VECTOR_TO_BIT_NUMBER(v) ((unsigned int)(v) % 32)
#define APIC_VECTOR_TO_REG_OFFSET(v) ((unsigned int)(v) / 32 * 0x10)
diff --git a/tools/testing/selftests/kvm/x86/fix_hypercall_test.c b/tools/testing/selftests/kvm/x86/fix_hypercall_test.c
index 753a0e730ea8..ad61da99ee4c 100644
--- a/tools/testing/selftests/kvm/x86/fix_hypercall_test.c
+++ b/tools/testing/selftests/kvm/x86/fix_hypercall_test.c
@@ -63,7 +63,7 @@ static void guest_main(void)
memcpy(hypercall_insn, other_hypercall_insn, HYPERCALL_INSN_SIZE);
- ret = do_sched_yield(GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID)));
+ ret = do_sched_yield(GET_XAPIC_DEST_FIELD(xapic_read_reg(APIC_ID)));
/*
* If the quirk is disabled, verify that guest_ud_handler() "returned"
diff --git a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
index 39ce9a9369f5..75b87f850abc 100644
--- a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
@@ -91,7 +91,7 @@ static void halter_guest_code(struct test_data_page *data)
verify_apic_base_addr();
xapic_enable();
- data->halter_apic_id = GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID));
+ data->halter_apic_id = GET_XAPIC_DEST_FIELD(xapic_read_reg(APIC_ID));
data->halter_lvr = xapic_read_reg(APIC_LVR);
/*
@@ -147,7 +147,7 @@ static void sender_guest_code(struct test_data_page *data)
* set data->halter_apic_id.
*/
icr_val = (APIC_DEST_PHYSICAL | APIC_DM_FIXED | IPI_VECTOR);
- icr2_val = SET_APIC_DEST_FIELD(data->halter_apic_id);
+ icr2_val = SET_XAPIC_DEST_FIELD(data->halter_apic_id);
data->icr = icr_val;
data->icr2 = icr2_val;
diff --git a/tools/testing/selftests/kvm/x86/xapic_state_test.c b/tools/testing/selftests/kvm/x86/xapic_state_test.c
index 637bb90c1d93..3c7c6a5485e4 100644
--- a/tools/testing/selftests/kvm/x86/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_state_test.c
@@ -222,6 +222,221 @@ static void test_x2apic_id(void)
kvm_vm_free(vm);
}
+#define X2APIC_MSR(r) (0x800 + ((r) >> 4))
+
+static bool is_x2apic_mode = true;
+
+static bool is_ro_only_reg(int reg)
+{
+ switch (reg) {
+ case APIC_ID:
+ case APIC_LVR:
+ case APIC_PROCPRI:
+ case APIC_LDR:
+ case APIC_ARBPRI:
+ case APIC_ISR:
+ case APIC_TMR:
+ case APIC_IRR:
+ case APIC_TMCCT:
+ return true;
+ }
+ return false;
+}
+
+static bool is_xapic_only_reg(int reg)
+{
+ return reg == APIC_ARBPRI || reg == APIC_DFR || reg == APIC_ICR2;
+}
+
+static bool is_accelerated_reg(int reg, bool write)
+{
+ if (!write)
+ return reg != APIC_TMCCT;
+
+ switch (reg) {
+ case APIC_TASKPRI:
+ case APIC_EOI:
+ case APIC_SELF_IPI:
+ case APIC_ICR:
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
+static void x2apic_msr_guest_code(void)
+{
+ const u32 xapic_regs[] = {
+ APIC_ID,
+ APIC_LVR,
+ APIC_TASKPRI,
+ APIC_PROCPRI,
+ APIC_LDR,
+ APIC_SPIV,
+ APIC_ISR,
+ APIC_TMR,
+ APIC_IRR,
+ APIC_ESR,
+ APIC_LVTT,
+ APIC_LVTTHMR,
+ APIC_LVTPC,
+ APIC_LVT0,
+ APIC_LVT1,
+ APIC_LVTERR,
+ APIC_TMICT,
+ APIC_TMCCT,
+ APIC_TDCR,
+
+ // APIC_EOI,
+ // APIC_SELF_IPI,
+ // APIC_ICR,
+
+ APIC_ARBPRI,
+ APIC_DFR,
+ APIC_ICR2,
+ };
+ int i, j;
+ u64 val;
+ u32 msr;
+ u8 vec;
+
+ cli();
+
+ if (is_x2apic_mode)
+ x2apic_enable();
+
+ GUEST_SYNC(0xbeef);
+
+ for (i = 0; i < ARRAY_SIZE(xapic_regs); i++) {
+ int nr_regs;
+ u8 rd, wr;
+
+ if (!is_x2apic_mode || is_xapic_only_reg(xapic_regs[i])) {
+ rd = wr = GP_VECTOR;
+ } else {
+ rd = 0;
+ wr = is_ro_only_reg(xapic_regs[i]) ? GP_VECTOR : 0;
+ }
+
+ if (xapic_regs[i] == APIC_IRR ||
+ xapic_regs[i] == APIC_ISR ||
+ xapic_regs[i] == APIC_TMR)
+ nr_regs = APIC_ISR_NR;
+ else
+ nr_regs = 1;
+
+ for (j = 0; j < nr_regs; j++) {
+ msr = X2APIC_MSR(xapic_regs[i] + j * 0x10);
+
+ vec = rdmsr_safe(msr, &val);
+ __GUEST_ASSERT(vec == rd,
+ "Wanted %s on RDMSR(%x), got %s",
+ ex_str(rd), msr, ex_str(vec));
+ GUEST_SYNC3(xapic_regs[i], false, vec);
+
+ vec = wrmsr_safe(msr, 0);
+ __GUEST_ASSERT(vec == wr,
+ "Wanted %s on WRMSR(%x), got %s",
+ ex_str(wr), msr, ex_str(vec));
+
+ GUEST_SYNC3(xapic_regs[i], true, vec);
+ }
+ }
+ GUEST_DONE();
+}
+
+static void test_x2apic_msr_intercepts(void)
+{
+ u64 last_guest, last_io, last_msr, guest_exits, io_exits, msr_exits;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ vm = vm_create_with_one_vcpu(&vcpu, x2apic_msr_guest_code);
+
+ TEST_ASSERT_EQ(vcpu_get_stat(vcpu, guest_induced_exits), 0);
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_SYNC);
+
+ for ( ;; ) {
+ last_guest = vcpu_get_stat(vcpu, guest_induced_exits);
+ last_io = vcpu_get_stat(vcpu, io_exits);
+ last_msr = vcpu_get_stat(vcpu, msr_exits);
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ guest_exits = vcpu_get_stat(vcpu, guest_induced_exits);
+ io_exits = vcpu_get_stat(vcpu, io_exits);
+ msr_exits = vcpu_get_stat(vcpu, msr_exits);
+
+ TEST_ASSERT_EQ(io_exits, last_io + 1);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC: {
+ int vector = uc.args[2];
+ bool write = uc.args[1];
+ u32 reg = uc.args[0];
+
+ printf("reg = %x, write = %u, fault = %u\n", reg, write, vector);
+ if (vector || !is_accelerated_reg(reg, write)) {
+ TEST_ASSERT_EQ(msr_exits, last_msr + 1);
+ TEST_ASSERT_EQ(guest_exits - last_guest,
+ io_exits - last_io + msr_exits - last_msr);
+ } else {
+ TEST_ASSERT_EQ(msr_exits, last_msr);
+ TEST_ASSERT_EQ(guest_exits - last_guest,
+ io_exits - last_io);
+ }
+ // printf("On msr = %x\n", msr);
+ break;
+ }
+ case UCALL_DONE:
+ goto test_xapic;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ }
+ }
+
+test_xapic:
+ kvm_vm_free(vm);
+
+ vm = vm_create_with_one_vcpu(&vcpu, x2apic_msr_guest_code);
+ vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_X2APIC);
+ is_x2apic_mode = false;
+ sync_global_to_guest(vm, is_x2apic_mode);
+
+ for ( ;; ) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ // vcpu_get_stat(vcpu, io_exits);
+ // vcpu_get_stat(vcpu, msr_exits);
+ // vcpu_get_stat(vcpu, guest_induced_exits);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ // u32 msr = uc.args[1];
+
+ // printf("On msr = %x\n", msr);
+ break;
+ case UCALL_DONE:
+ goto done;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ }
+ }
+done:
+}
+
int main(int argc, char *argv[])
{
struct xapic_vcpu x = {
@@ -230,6 +445,8 @@ int main(int argc, char *argv[])
};
struct kvm_vm *vm;
+ test_x2apic_msr_intercepts();
+
vm = vm_create_with_one_vcpu(&x.vcpu, x2apic_guest_code);
test_icr(&x);
kvm_vm_free(vm);
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-06 18:47 ` [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
@ 2026-05-07 13:56 ` Naveen N Rao
2026-05-07 14:27 ` Sean Christopherson
0 siblings, 1 reply; 19+ messages in thread
From: Naveen N Rao @ 2026-05-07 13:56 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Wed, May 06, 2026 at 11:47:42AM -0700, Sean Christopherson wrote:
> Fix multiple (classes of) bugs with one stone by using KVM's mask of
> readable local APIC registers to determine which x2APIC MSRs to pass
> through (or not) when toggling x2AVIC on/off. The existing hand-coded
> list of MSRs is wrong on multiple fronts:
>
> - ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
^^^^^^^^^
access/exit?
> is fault-like; disabling interception is nonsensical and suboptimal as
> the access generates a #VMEXIT that requires decoding the instruction.
As far as I can tell, it looks like ARBPRI is actually "supported" in
x2APIC mode on AMD processors. APM lists this in the x2APIC register
list (Section 16.11.1 x2APIC Register Address Space Table 16-6. x2APIC
Register), as well as in the AVIC chapter (15.29.3.1, table 15-22).
This is probably not relevant though, since it looks like KVM has never
supported this.
>
> - DFR and ICR2 aren't supported by x2APIC and so don't need their
> intercepts disabled for performance reasons. While the #GP due to
> x2APIC being abled has higher priority than the trap-like #VMEXIT,
^^^^^ enabled
> disabling interception of unsupported MSRs is confusing and
> unnecessary.
>
> - RRR is completely unsupported.
Would be good to also call out the change to EOI and LVTT handling. LVTT
reads will now be allowed and should be returned from the backing page.
I'm guessing this is fine and that the hardware won't validate it as
LVTT may have TSC Deadline enabled (for emulation).
- Naveen
* Re: [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-06 18:47 ` [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count) Sean Christopherson
@ 2026-05-07 14:19 ` Naveen N Rao
2026-05-07 15:44 ` Sean Christopherson
0 siblings, 1 reply; 19+ messages in thread
From: Naveen N Rao @ 2026-05-07 14:19 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Wed, May 06, 2026 at 11:47:43AM -0700, Sean Christopherson wrote:
> Explicitly intercept RDMSR for TMMCT, a.k.a. the current APIC timer count,
> when x2AVIC is enabled, as TMMCT reads aren't accelerated by hardware.
s/TMMCT/TMCCT for the above two lines.
> Disabling interception is suboptimal as the RDMSR generates an
> AVIC_UNACCELERATED_ACCESS fault #VMEXIT, which forces KVM to decode the
> instruction to figure out what the guest was trying to access.
>
> Note, the only reason this isn't a fatal bug is that the AVIC architecture
> had the foresight to guard against buggy hypervisors. E.g. if hardware
> simply read from the virtual APIC page, the guest would get garbage.
>
> Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
> Cc: stable@vger.kernel.org
> Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/svm/avic.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 4f203e503e8e..d693c9ff9f18 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -172,6 +172,9 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
> svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
> MSR_TYPE_R, intercept);
>
> + if (!intercept)
> + svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
> +
Nit: I'm thinking it might be better to roll this into the previous
loop. That way, all MSR_TYPE_R intercepts are set up in one place and we
don't need to parse the if (!intercept) condition.
Something like this?
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index c5d46c0d2403..f292cba45e07 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -136,11 +136,9 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
for_each_set_bit(i, (unsigned long *)&x2apic_readable_mask,
BITS_PER_TYPE(x2apic_readable_mask))
- svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
- MSR_TYPE_R, intercept);
-
- if (!intercept)
- svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
+ if (APIC_BASE_MSR + i != X2APIC_MSR(APIC_TMCCT))
+ svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
+ MSR_TYPE_R, intercept);
- Naveen
* Re: [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-07 13:56 ` Naveen N Rao
@ 2026-05-07 14:27 ` Sean Christopherson
2026-05-08 16:35 ` Naveen N Rao
0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-07 14:27 UTC (permalink / raw)
To: Naveen N Rao; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 07, 2026, Naveen N Rao wrote:
> On Wed, May 06, 2026 at 11:47:42AM -0700, Sean Christopherson wrote:
> > Fix multiple (classes of) bugs with one stone by using KVM's mask of
> > readable local APIC registers to determine which x2APIC MSRs to pass
> > through (or not) when toggling x2AVIC on/off. The existing hand-coded
> > list of MSRs is wrong on multiple fronts:
> >
> > - ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
> ^^^^^^^^^
> access/exit?
Ya, #VMEXIT is a better description here.
> > is fault-like; disabling interception is nonsensical and suboptimal as
> > the access generates a #VMEXIT that requires decoding the instruction.
>
> As far as I can tell, it looks like ARBPRI is actually "supported" in
> x2APIC mode on AMD processors. APM lists this in the x2APIC register
> list (Section 16.11.1 x2APIC Register Address Space Table 16-6. x2APIC
> Register), as well as in the AVIC chapter (15.29.3.1, table 15-22).
Yeah, agreed. I missed Table 16-6 (so many things to cross-reference, blech).
> This is probably not relevant though, since it looks like KVM has never
> supported this.
Definitely worth getting it right in the changelog though.
> > - DFR and ICR2 aren't supported by x2APIC and so don't need their
> > intercepts disabled for performance reasons. While the #GP due to
> > x2APIC being abled has higher priority than the trap-like #VMEXIT,
> ^^^^^ enabled
>
> > disabling interception of unsupported MSRs is confusing and
> > unnecessary.
> >
> > - RRR is completely unsupported.
>
> Would be good to also call out the change to EOI and LVTT handling.
+1. I either totally missed or forgot that this also impacts LVTT reads, and
I definitely missed that KVM was allowing EOI reads.
> LVTT reads will now be allowed and should be returned from the backing page.
> I'm guessing this is fine and that the hardware won't validate it as
> LVTT may have TSC Deadline enabled (for emulation).
Ya, confirmed via the KUT test:
diff --git x86/apic.c x86/apic.c
index 0a52e9a4..b91e8500 100644
--- x86/apic.c
+++ x86/apic.c
@@ -569,6 +569,9 @@ static inline void apic_change_mode(unsigned long new_mode)
lvtt = apic_read(APIC_LVTT);
apic_write(APIC_LVTT, (lvtt & ~APIC_LVT_TIMER_MASK) | new_mode);
+
+ lvtt = apic_read(APIC_LVTT);
+ report((lvtt & APIC_LVT_TIMER_MASK) == new_mode, "LVTT mode switch");
}
static void test_apic_change_mode(void)
And given that AVIC (!x2APIC mode) says that reads are allowed, I don't see how
hardware could do anything differently.
* Re: [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-07 14:19 ` Naveen N Rao
@ 2026-05-07 15:44 ` Sean Christopherson
2026-05-07 18:26 ` Sean Christopherson
0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-07 15:44 UTC (permalink / raw)
To: Naveen N Rao; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 07, 2026, Naveen N Rao wrote:
> On Wed, May 06, 2026 at 11:47:43AM -0700, Sean Christopherson wrote:
> > Explicitly intercept RDMSR for TMMCT, a.k.a. the current APIC timer count,
> > when x2AVIC is enabled, as TMMCT reads aren't accelerated by hardware.
>
> s/TMMCT/TMCCT for the above two lines.
>
> > Disabling interception is suboptimal as the RDMSR generates an
> > AVIC_UNACCELERATED_ACCESS fault #VMEXIT, which forces KVM to decode the
> > instruction to figure out what the guest was trying to access.
> >
> > Note, the only reason this isn't a fatal bug is that the AVIC architecture
> > had the foresight to guard against buggy hypervisors. E.g. if hardware
> > simply read from the virtual APIC page, the guest would get garbage.
> >
> > Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
> > Cc: stable@vger.kernel.org
> > Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/svm/avic.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 4f203e503e8e..d693c9ff9f18 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -172,6 +172,9 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
> > svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
> > MSR_TYPE_R, intercept);
> >
> > + if (!intercept)
> > + svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
> > +
>
> Nit: I'm thinking it might be better to roll this into the previous
> loop. That way, all MSR_TYPE_R intercepts are set up in one place and we
> don't need to parse the if (!intercept) condition.
>
> Something like this?
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index c5d46c0d2403..f292cba45e07 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -136,11 +136,9 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
>
> for_each_set_bit(i, (unsigned long *)&x2apic_readable_mask,
> BITS_PER_TYPE(x2apic_readable_mask))
> - svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
> - MSR_TYPE_R, intercept);
> -
> - if (!intercept)
> - svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
> + if (APIC_BASE_MSR + i != X2APIC_MSR(APIC_TMCCT))
> + svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
> + MSR_TYPE_R, intercept);
Hmm, I don't love burying the check in the for-loop, as it makes it a bit hard to
see that KVM *never* toggles the intercept for TMCCT. And in the unlikely scenario
we need to exempt more MSRs, this will be annoying to maintain.
What about adjusting the mask straightaway?
diff --git arch/x86/kvm/lapic.h arch/x86/kvm/lapic.h
index 274885af4ebc..2842a7b70d74 100644
--- arch/x86/kvm/lapic.h
+++ arch/x86/kvm/lapic.h
@@ -156,6 +156,8 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
void kvm_lapic_exit(void);
+#define APIC_REG_MASK(reg) (1ull << ((reg) >> 4))
+
u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
diff --git arch/x86/kvm/svm/avic.c arch/x86/kvm/svm/avic.c
index c5d46c0d2403..e51f468477a4 100644
--- arch/x86/kvm/svm/avic.c
+++ arch/x86/kvm/svm/avic.c
@@ -132,16 +132,14 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!x2avic_enabled)
return;
- x2apic_readable_mask = kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+ x2apic_readable_mask = kvm_lapic_readable_reg_mask(vcpu->arch.apic) &
+ ~APIC_REG_MASK(APIC_TMCCT);
for_each_set_bit(i, (unsigned long *)&x2apic_readable_mask,
BITS_PER_TYPE(x2apic_readable_mask))
svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
MSR_TYPE_R, intercept);
- if (!intercept)
- svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
-
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept);
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept);
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept);
Oh! Actually, even better! This is a great opportunity to dedup Intel vs. AMD
(and we can/should do the same for writes).
diff --git arch/x86/kvm/lapic.c arch/x86/kvm/lapic.c
index 4078e624ca66..ac57d6dbe032 100644
--- arch/x86/kvm/lapic.c
+++ arch/x86/kvm/lapic.c
@@ -1730,7 +1730,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
#define APIC_REGS_MASK(first, count) \
(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
+static u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
{
/* Leave bits '0' for reserved and write-only registers. */
u64 valid_reg_mask =
@@ -1766,7 +1766,22 @@ u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
return valid_reg_mask;
}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_readable_reg_mask);
+
+u64 kvm_x2apic_disable_intercept_reg_mask(struct kvm_vcpu *vcpu)
+{
+ if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)))
+ return 0;
+
+ /*
+ * Reads of TMCCT, a.k.a. the current APIC timer count, aren't
+ * accelerated by hardware (Intel or AMD), and handling the fault-like
+ * APIC-access VM-Exit is more expensive than handling an RDMSR VM-Exit
+ * (because the APIC-access path requires slow emulation of the code
+ * stream).
+ */
+ return kvm_lapic_readable_reg_mask(vcpu->arch.apic) &
+ ~APIC_REG_MASK(APIC_TMCCT);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x2apic_disable_intercept_reg_mask);
static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
void *data)
diff --git arch/x86/kvm/lapic.h arch/x86/kvm/lapic.h
index 274885af4ebc..7b61167e1b5c 100644
--- arch/x86/kvm/lapic.h
+++ arch/x86/kvm/lapic.h
@@ -156,7 +156,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
void kvm_lapic_exit(void);
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
+u64 kvm_x2apic_disable_intercept_reg_mask(struct kvm_vcpu *vcpu);
static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
{
diff --git arch/x86/kvm/svm/avic.c arch/x86/kvm/svm/avic.c
index c5d46c0d2403..1532bfff5686 100644
--- arch/x86/kvm/svm/avic.c
+++ arch/x86/kvm/svm/avic.c
@@ -132,16 +132,13 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!x2avic_enabled)
return;
- x2apic_readable_mask = kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+ x2apic_readable_mask = kvm_x2apic_disable_intercept_reg_mask(vcpu);
for_each_set_bit(i, (unsigned long *)&x2apic_readable_mask,
BITS_PER_TYPE(x2apic_readable_mask))
svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
MSR_TYPE_R, intercept);
- if (!intercept)
- svm_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
-
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept);
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept);
svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept);
diff --git arch/x86/kvm/vmx/vmx.c arch/x86/kvm/vmx/vmx.c
index 859f4bc01445..40dff50e800a 100644
--- arch/x86/kvm/vmx/vmx.c
+++ arch/x86/kvm/vmx/vmx.c
@@ -4138,7 +4138,7 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
* mode, only the current timer count needs on-demand emulation by KVM.
*/
if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
- msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+ msr_bitmap[read_idx] = ~kvm_x2apic_disable_intercept_reg_mask(vcpu);
else
msr_bitmap[read_idx] = ~0ull;
msr_bitmap[write_idx] = ~0ull;
@@ -4151,7 +4151,6 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
!(mode & MSR_BITMAP_MODE_X2APIC));
if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
- vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
if (enable_ipiv)
* Re: [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-07 15:44 ` Sean Christopherson
@ 2026-05-07 18:26 ` Sean Christopherson
2026-05-08 16:41 ` Naveen N Rao
0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-07 18:26 UTC (permalink / raw)
To: Naveen N Rao; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 07, 2026, Sean Christopherson wrote:
> Oh! Actually, even better! This is a great opportunity to dedup Intel vs. AMD
> (and we can/should do the same for writes).
Scratch the writes idea, the behavior of Intel x2APIC virtualization and AMD x2AVIC
are too different. Intel doesn't trap writes when x2APIC virtualization is
enabled, and instead redirects the raw value to the APIC backing page. Which I
guess makes sense since WRMSR interception is about the same overall cost, and
it allows the host to safely and fully disable interception for registers it
doesn't want/need to interpose on.
AMD on the other hand more or less follows the xAPIC (AVIC) behavior, where regs
without "fancy" acceleration generate traps.
Side topic, handling a trap-like unaccelerated AVIC #VMEXIT is ~10 cycles faster
than handling an intercepted WRMSR (out of ~1770+ cycles for a super simple reg
like LVT0). I.e. we _could_ deliberately disable interception of x2AVIC MSRs that
get trap-like behavior, but for me, being perfectly consistent between Intel and
AMD is more valuable than shaving a few cycles for paths that should rarely be hit
(most of the trap-like registers are "configure once and forget about them").
The only reg that's at all hot is Timer Initial Count Register, and (a) it's a
moot point with TSC Deadline mode, and (b) the cost to program hrtimers is so high
that shaving ~10 cycles is completely meaningless.
* Re: [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-07 14:27 ` Sean Christopherson
@ 2026-05-08 16:35 ` Naveen N Rao
0 siblings, 0 replies; 19+ messages in thread
From: Naveen N Rao @ 2026-05-08 16:35 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 07, 2026 at 07:27:11AM -0700, Sean Christopherson wrote:
> On Thu, May 07, 2026, Naveen N Rao wrote:
> > On Wed, May 06, 2026 at 11:47:42AM -0700, Sean Christopherson wrote:
>
> > LVTT reads will now be allowed and should be returned from the backing page.
> > I'm guessing this is fine and that the hardware won't validate it as
> > LVTT may have TSC Deadline enabled (for emulation).
>
> Ya, confirmed via the KUT test:
>
> diff --git x86/apic.c x86/apic.c
> index 0a52e9a4..b91e8500 100644
> --- x86/apic.c
> +++ x86/apic.c
> @@ -569,6 +569,9 @@ static inline void apic_change_mode(unsigned long new_mode)
>
> lvtt = apic_read(APIC_LVTT);
> apic_write(APIC_LVTT, (lvtt & ~APIC_LVT_TIMER_MASK) | new_mode);
> +
> + lvtt = apic_read(APIC_LVTT);
> + report((lvtt & APIC_LVT_TIMER_MASK) == new_mode, "LVTT mode switch");
> }
>
> static void test_apic_change_mode(void)
>
> And given that AVIC (!x2APIC mode) says that reads are allowed, I don't see how
> hardware could do anything differently.
Indeed, I additionally did:
diff --git a/x86/apic.c b/x86/apic.c
index b45fc9c1..b0902b2d 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -42,11 +42,13 @@ static void __test_tsc_deadline_timer(void)
static int enable_tsc_deadline_timer(void)
{
- uint32_t lvtt;
+ uint32_t lvtt, new_mode;
if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
lvtt = APIC_LVT_TIMER_TSCDEADLINE | TSC_DEADLINE_TIMER_VECTOR;
apic_write(APIC_LVTT, lvtt);
+ new_mode = apic_read(APIC_LVTT);
+ report((new_mode & APIC_LVT_TIMER_MASK) == (lvtt & APIC_LVT_TIMER_MASK), "LVTT TSC Deadline mode");
return 1;
} else {
return 0;
... and that works fine.
Thanks,
Naveen
* Re: [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-07 18:26 ` Sean Christopherson
@ 2026-05-08 16:41 ` Naveen N Rao
2026-05-08 16:56 ` Sean Christopherson
0 siblings, 1 reply; 19+ messages in thread
From: Naveen N Rao @ 2026-05-08 16:41 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 07, 2026 at 11:26:15AM -0700, Sean Christopherson wrote:
> On Thu, May 07, 2026, Sean Christopherson wrote:
> > Oh! Actually, even better! This is a great opportunity to dedup Intel vs. AMD
> > (and we can/should do the same for writes).
>
> Scratch the writes idea, the behavior of Intel x2APIC virtualization and AMD x2AVIC
> are too different. Intel doesn't trap writes when x2APIC virtualization is
> enabled, and instead redirects the raw value to the APIC backing page. Which I
> guess makes sense since WRMSR interception is about the same overall cost, and
> it allows the host to safely and fully disable interception for registers it
> doesn't want/need to interpose on.
>
> AMD on the other hand more or less follows the xAPIC (AVIC) behavior, where regs
> without "fancy" acceleration generate traps.
Sure, your earlier plan to update the readable registers mask is fine.
>
> Side topic, handling a trap-like unaccelerated AVIC #VMEXIT is ~10 cycles faster
> than handling an intercepted WRMSR (out of ~1770+ cycles for a super simple reg
> like LVT0). I.e. we _could_ deliberately disable interception of x2AVIC MSRs that
> get trap-like behavior, but for me, being perfectly consistent between Intel and
> AMD is more valuable than shaving a few cycles for paths that should rarely be hit
> (most of the trap-like registers are "configure once and forget about them").
>
> The only reg that's at all hot is Timer Initial Count Register, and (a) it's a
> moot point with TSC Deadline mode, and (b) the cost to program hrtimers is so high
> that shaving ~10 cycles is completely meaningless.
Thanks for checking this - this was something I wanted to verify. And
I agree with your assessment. None of those registers look to be
commonly written to, and ~10 cycles is almost in the noise. If we ever
come across a performance issue, it should be fairly simple to pass
additional registers through (with good reason, of course).
On a side note, how did you measure this? My naive attempt showed a lot
of variation between runs.
- Naveen
* Re: [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count)
2026-05-08 16:41 ` Naveen N Rao
@ 2026-05-08 16:56 ` Sean Christopherson
0 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2026-05-08 16:56 UTC (permalink / raw)
To: Naveen N Rao; +Cc: Paolo Bonzini, kvm, linux-kernel
On Fri, May 08, 2026, Naveen N Rao wrote:
> On Thu, May 07, 2026 at 11:26:15AM -0700, Sean Christopherson wrote:
> > The only reg that's at all hot is Timer Initial Count Register, and (a) it's a
> > moot point with TSC Deadline mode, and (b) the cost to program hrtimers is so high
> > that shaving ~10 cycles is completely meaningless.
>
> Thanks for the checking this - this was something I wanted to check. And
> I agree with your assessment. None of those registers look to be
> commonly written to, and ~10 cycles is almost in the noise. If we ever
> come across a performance issue, it should be fairly simple to pass
> additional registers through (with good reason, of course).
>
> On a side note, how did you measure this? My naive attempt showed a lot
> of variation between runs.
I hacked the x86/vmexit.c test in KUT, and then ran it with and without x2APIC:
./x86/run x86/vmexit.flat -smp 2 -cpu qemu64,+x2apic -append apic_wr_lvt0
./x86/run x86/vmexit.flat -smp 2 -cpu qemu64,-x2apic -append apic_wr_lvt0
That test is super useful for micro-benchmarking single instructions and/or short
sequences. Even without pinning vCPUs, it does a decent job of generating stable
results.
diff --git a/x86/vmexit.c b/x86/vmexit.c
index 5296ed38..1749cbd8 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -22,6 +22,7 @@ struct test {
static int nr_cpus;
static u64 cr4_shadow;
+static u32 lvt0;
static void cpuid_test(void)
{
@@ -447,6 +448,11 @@ static void tscdeadline(void)
while (x == 0) barrier();
}
+static void apic_wr_lvt0(void)
+{
+ apic_write(APIC_LVT0, lvt0);
+}
+
static void wr_tsx_ctrl_msr(void)
{
wrmsr(MSR_IA32_TSX_CTRL, 0);
@@ -501,6 +507,7 @@ static struct test tests[] = {
{ mov_from_cr8, "mov_from_cr8", .parallel = 1, },
{ mov_to_cr8, "mov_to_cr8" , .parallel = 1, },
#endif
+ { apic_wr_lvt0, "apic_wr_lvt0", .parallel = 1, },
{ inl_pmtimer, "inl_from_pmtimer", .parallel = 1, },
{ inl_nop_qemu, "inl_from_qemu", .parallel = 1 },
{ inl_nop_kernel, "inl_from_kernel", .parallel = 1 },
@@ -618,6 +625,7 @@ int main(int ac, char **av)
setup_vm();
cr4_shadow = read_cr4();
+ lvt0 = apic_read(APIC_LVT0);
handle_irq(IPI_TEST_VECTOR, self_ipi_isr);
nr_cpus = cpu_count();
* Re: [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated
2026-05-06 18:47 ` [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
@ 2026-05-08 16:59 ` Naveen N Rao
0 siblings, 0 replies; 19+ messages in thread
From: Naveen N Rao @ 2026-05-08 16:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Wed, May 06, 2026 at 11:47:44AM -0700, Sean Christopherson wrote:
> When x2AVIC is enabled, disable WRMSR interception only for MSRs that are
> actually accelerated by hardware. Disabling interception for MSRs that
> aren't accelerated is functionally "fine", and in some cases a weird "win"
> for performance, but only for cases that should never be triggered by a
> well-behaved VM (writes to read-only registers; the #GP will typically
> occur in the guest without taking a #VMEXIT, even for fault-like exits).
Doesn't have to be part of this series, but I think we can now also clean
up avic_unaccelerated_access_interception() and some of the other
functions it calls for updating LDR/DFR. With this change, I believe the
only reason we can ever see AVIC_UNACCELERATED_ACCESS when x2AVIC is
enabled will be for APIC_EOI writes for level-triggered interrupts.
Probably worth a comment/assert in that function.
>
> But overall, disabling interception for MSRs that aren't accelerated is at
> best confusing and unintuitive, and at worst introduces avoidable risk, as
> the effective guest-visible behavior depends on the whims of the CPU (the
> behavior of x2APIC MSR writes on at least Zen4 doesn't match the behavior
> documented in the table in "15.29.3.1 Virtual APIC Register Accesses" of
> the APM).
FWIW, I tested the current behavior (with most MSRs passed-through) and
the new behavior with your changes, and (had AI) put together a table to
capture all of this. It also serves to document what x2AVIC does (except
for a few MSRs that are currently intercepted).
It is in line with my expectations, no surprises here:
+--------------+---------------+---------------+---------------+---------------+---------------+
| MSR | Register | Current RDMSR | New RDMSR | Current WRMSR | New WRMSR |
+--------------+---------------+---------------+---------------+---------------+---------------+
| 0x802 | APIC_ID | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x803 | APIC_LVR | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x808 | APIC_TPR | HW | HW | HW | HW |
| 0x809 | APIC_ARBPRI | UAA(f):#GP | * MSR_INT:#GP | #GP-direct | * MSR_INT:#GP |
| 0x80A | APIC_PPR | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x80B | APIC_EOI | #GP-direct | * MSR_INT:#GP | HW | HW |
| 0x80C | APIC_RRR | #GP-direct | * MSR_INT:#GP | #GP-direct | * MSR_INT:#GP |
| 0x80D | APIC_LDR | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x80E | APIC_DFR | #GP-direct | * MSR_INT:#GP | #GP-direct | * MSR_INT:#GP |
| 0x80F | APIC_SPIV | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x810 | APIC_ISR0 | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x811..0x817 | APIC_ISR1..7 | MSR_INT:ok | * HW | MSR_INT:#GP | MSR_INT:#GP |
| 0x818 | APIC_TMR0 | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x819..0x81F | APIC_TMR1..7 | MSR_INT:ok | * HW | MSR_INT:#GP | MSR_INT:#GP |
| 0x820 | APIC_IRR0 | HW | HW | #GP-direct | * MSR_INT:#GP |
| 0x821..0x827 | APIC_IRR1..7 | MSR_INT:ok | * HW | MSR_INT:#GP | MSR_INT:#GP |
| 0x828 | APIC_ESR | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x830 | APIC_ICR | HW | HW | INC_IPI | HW / INC_IPI |
| 0x831 | APIC_ICR2 [1] | #GP-direct | * MSR_INT:#GP | #GP-direct | * MSR_INT:#GP |
| 0x832 | APIC_LVTT | MSR_INT:ok | * HW | MSR_INT:ok | MSR_INT:ok |
| 0x833 | APIC_LVTTHMR | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x834 | APIC_LVTPC | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x835 | APIC_LVT0 | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x836 | APIC_LVT1 | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x837 | APIC_LVTERR | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x838 | APIC_TMICT | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x839 | APIC_TMCCT | UAA(f):0 | * MSR_INT:0 | #GP-direct | * MSR_INT:#GP |
| 0x83E | APIC_TDCR | HW | HW | UAA(t) | * MSR_INT:ok |
| 0x83F | APIC_SELF_IPI | MSR_INT:#GP | MSR_INT:#GP | MSR_INT:ok | * HW / INC_IPI|
+--------------+---------------+---------------+---------------+---------------+---------------+
Legend:
HW HW-accelerated; no #VMEXIT
#GP-direct CPU delivers #GP from microcode; no #VMEXIT
UAA(f):X AVIC_UNACCEL_ACCESS exit, fault flavor; KVM emulates, guest sees X
UAA(t) AVIC_UNACCEL_ACCESS exit, trap flavor; write completed in vAPIC page, KVM post-processes
MSR_INT:X MSR_INTERCEPT (MSR-bitmap) exit; KVM emulates, guest sees X
INC_IPI AVIC_INCOMPLETE_IPI exit; KVM emulates IPI delivery
* cell value differs from corresponding existing-behavior cell
- Naveen
* Re: [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing)
2026-05-06 18:47 ` [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing) Sean Christopherson
@ 2026-05-08 17:14 ` Naveen N Rao
2026-05-08 17:49 ` Sean Christopherson
0 siblings, 1 reply; 19+ messages in thread
From: Naveen N Rao @ 2026-05-08 17:14 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Wed, May 06, 2026 at 11:47:45AM -0700, Sean Christopherson wrote:
> Not-signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/svm/svm.c | 81 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 79 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 2 +
> 4 files changed, 164 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c470e40a00aa..bff534bd00dc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1703,6 +1703,8 @@ struct kvm_vcpu_stat {
> u64 invlpg;
>
> u64 exits;
> + u64 guest_induced_exits;
> + u64 msr_exits;
> u64 io_exits;
> u64 mmio_exits;
> u64 signal_exits;
This looks promising. I'm assuming 'hack' in the title is only meant to
indicate the PoC nature of this?
Taking this forward, introducing a similar bucket for all AVIC/APICv
related exits might help with a few tests.
- Naveen
* Re: [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing)
2026-05-08 17:14 ` Naveen N Rao
@ 2026-05-08 17:49 ` Sean Christopherson
2026-05-09 5:08 ` Naveen N Rao
0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2026-05-08 17:49 UTC (permalink / raw)
To: Naveen N Rao; +Cc: Paolo Bonzini, kvm, linux-kernel
On Fri, May 08, 2026, Naveen N Rao wrote:
> On Wed, May 06, 2026 at 11:47:45AM -0700, Sean Christopherson wrote:
> > Not-signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/svm/svm.c | 81 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/vmx/vmx.c | 79 ++++++++++++++++++++++++++++++++
> > arch/x86/kvm/x86.c | 2 +
> > 4 files changed, 164 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index c470e40a00aa..bff534bd00dc 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1703,6 +1703,8 @@ struct kvm_vcpu_stat {
> > u64 invlpg;
> >
> > u64 exits;
> > + u64 guest_induced_exits;
> > + u64 msr_exits;
> > u64 io_exits;
> > u64 mmio_exits;
> > u64 signal_exits;
>
> This looks promising. I'm assuming 'hack' in the title is only meant to
> indicate the PoC nature of this?
More that I don't think I'll ever propose merging anything like this.
> Taking this forward, introducing a similar bucket for all AVIC/APICv
> related exits might help with a few tests.
I don't have any plans to take this forward. guest_induced_exits alone simply
isn't useful enough, even for tests. Outside of tests, I don't think it has any
usefulness, at all.
For tests and for real-world usage, we really do need per-exit tracking for it
to be useful. Maybe with some "bundling" allowed for exception vectors? We can
hack in one-off things like MSR exits, but either we'll have to be super hypocritical
in choosing which use cases are justified and which are not, or we'll have created
a slippery slope by adding a per-exit stat, i.e. we'd just be delaying the inevitable.
For selftests, which is really the only test framework that can utilize stats in
this way, BPF is probably a better answer, at least for the kernel, and probably
for selftests in the long-run as well. E.g. if we can make it easy-ish to use BPF
in selftests (which is a tall order), then we can write tests that do *very* fancy
validation of KVM behavior, e.g. by peeking at other vCPU state in the context of
each and every exit.
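
[Archive note, not part of the original message: as a rough illustration of the BPF approach discussed above, per-reason exit counting can already be approximated from userspace by attaching to the existing kvm:kvm_exit tracepoint, e.g. with bpftrace. The one-liner below is only a sketch; it assumes bpftrace is installed, requires root, and the exit_reason field name should be verified against the tracepoint format on the running kernel.]

```shell
# Sketch only: count KVM exits per exit reason via the existing
# kvm:kvm_exit tracepoint. Requires root and bpftrace; verify the
# field name against
#   /sys/kernel/debug/tracing/events/kvm/kvm_exit/format
sudo bpftrace -e 'tracepoint:kvm:kvm_exit { @exits[args->exit_reason] = count(); }'
```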
* Re: [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing)
2026-05-08 17:49 ` Sean Christopherson
@ 2026-05-09 5:08 ` Naveen N Rao
0 siblings, 0 replies; 19+ messages in thread
From: Naveen N Rao @ 2026-05-09 5:08 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Fri, May 08, 2026 at 10:49:42AM -0700, Sean Christopherson wrote:
> On Fri, May 08, 2026, Naveen N Rao wrote:
> > On Wed, May 06, 2026 at 11:47:45AM -0700, Sean Christopherson wrote:
> > > Not-signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 2 +
> > > arch/x86/kvm/svm/svm.c | 81 +++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/vmx/vmx.c | 79 ++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/x86.c | 2 +
> > > 4 files changed, 164 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index c470e40a00aa..bff534bd00dc 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1703,6 +1703,8 @@ struct kvm_vcpu_stat {
> > > u64 invlpg;
> > >
> > > u64 exits;
> > > + u64 guest_induced_exits;
> > > + u64 msr_exits;
> > > u64 io_exits;
> > > u64 mmio_exits;
> > > u64 signal_exits;
> >
> > This looks promising. I'm assuming 'hack' in the title is only meant to
> > indicate the PoC nature of this?
>
> More that I don't think I'll ever propose merging anything like this.
>
> > Taking this forward, introducing a similar bucket for all AVIC/APICv
> > related exits might help with a few tests.
>
> I don't have any plans to take this forward. guest_induced_exits alone simply
> isn't useful enough, even for tests. Outside of tests, I don't think it has any
> usefulness, at all.
>
> For tests and for real-world usage, we really do need per-exit tracking for it
> to be useful. Maybe with some "bundling" allowed for exception vectors? We can
> hack in one-off things like MSR exits, but either we'll have to be super hypocritical
> in choosing which use cases are justified and which are not, or we'll have created
> a slippery slope by adding a per-exit stat, i.e. we'd just be delaying the inevitable.
Ack.
>
> For selftests, which is really the only test framework that can utilize stats in
> this way, BPF is probably a better answer, at least for the kernel, and probably
> for selftests in the long-run as well. E.g. if we can make it easy-ish to use BPF
> in selftests (which is a tall order), then we can write tests that do *very* fancy
> validation of KVM behavior, e.g. by peeking at other vCPU state in the context of
> each and every exit.
That's good to hear - tracing was the alternative I had in mind, so it's
good to know that's an option we can look at. Just hooking into the
tracepoints should enable a lot of tests to begin with.
Thanks,
Naveen
* Re: [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
` (4 preceding siblings ...)
2026-05-06 18:47 ` [PATCH v2 5/5] *** DO NOT MERGE *** KVM: selftests: Add hacky test to verify x2APIC MSR interception Sean Christopherson
@ 2026-05-09 5:10 ` Naveen N Rao
5 siblings, 0 replies; 19+ messages in thread
From: Naveen N Rao @ 2026-05-09 5:10 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Wed, May 06, 2026 at 11:47:41AM -0700, Sean Christopherson wrote:
> Fix a variety of bugs in SVM's handling of x2APIC MSR passthrough for x2AVIC,
> where KVM disables interception for MSR accesses that aren't accelerated by
> hardware (pointless and suboptimal), and also does NOT disable interception
> for practically any of the "range of vectors" MSRs, i.e. IRR, ISR, and TMR.
>
> Found by inspection when reviewing a TDX patch to fix a bug where KVM botched
> the "range of vectors"[*] (I was curious how other KVM code handled the ranges;
> wasn't expecting this...).
>
> Note, I tagged all of this for stable, but I could be convinced these fixes
> shouldn't be sent to LTS trees. Patch 3 in particular doesn't truly fix
> anything, though I definitely don't like relying on poorly documented behavior.
>
> Note #2, the diff stats are misleading due to the hacks, the "real" stats are:
>
> arch/x86/kvm/svm/avic.c | 51 ++++++++++++++++-----------------------------------
> 1 file changed, 16 insertions(+), 35 deletions(-)
For the series (except the selftests), with the minor changes we
discussed:
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
- Naveen
end of thread, other threads: [~2026-05-09 5:13 UTC | newest]
Thread overview: 19+ messages
2026-05-06 18:47 [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 1/5] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-05-07 13:56 ` Naveen N Rao
2026-05-07 14:27 ` Sean Christopherson
2026-05-08 16:35 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 2/5] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count) Sean Christopherson
2026-05-07 14:19 ` Naveen N Rao
2026-05-07 15:44 ` Sean Christopherson
2026-05-07 18:26 ` Sean Christopherson
2026-05-08 16:41 ` Naveen N Rao
2026-05-08 16:56 ` Sean Christopherson
2026-05-06 18:47 ` [PATCH v2 3/5] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2026-05-08 16:59 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 4/5] *** DO NOT MERGE *** KVM: x86: Hack in a stat to track guest-induced exits (for testing) Sean Christopherson
2026-05-08 17:14 ` Naveen N Rao
2026-05-08 17:49 ` Sean Christopherson
2026-05-09 5:08 ` Naveen N Rao
2026-05-06 18:47 ` [PATCH v2 5/5] *** DO NOT MERGE *** KVM: selftests: Add hacky test to verify x2APIC MSR interception Sean Christopherson
2026-05-09 5:10 ` [PATCH v2 0/5] KVM: SVM: Fix x2AVIC MSR interception issues Naveen N Rao