* [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2026-05-15 14:18 ` Naveen N Rao
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Add a dedicated local APIC API, kvm_x2apic_disable_read_intercept_reg_mask(),
to provide the mask of x2APIC registers whose MSRs can and should be passed
through to the guest when x2APIC virtualization is enabled, and use it in
lieu of the open-coded equivalent VMX logic. Providing a common helper
will allow sharing the logic with SVM (x2AVIC), and as a bonus eliminates
the somewhat confusing code where KVM enables interception for MSR_TYPE_RW,
even though only the READ case actually needs to be updated.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 21 +++++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 3 +--
3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4078e624ca66..4e34f75e705d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1730,7 +1730,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
#define APIC_REGS_MASK(first, count) \
(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
+static u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
{
/* Leave bits '0' for reserved and write-only registers. */
u64 valid_reg_mask =
@@ -1766,7 +1766,24 @@ u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
return valid_reg_mask;
}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_readable_reg_mask);
+
+u64 kvm_x2apic_disable_read_intercept_reg_mask(struct kvm_vcpu *vcpu)
+{
+ if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)))
+ return 0;
+
+ /*
+	 * TMCCT, a.k.a. the current APIC timer count, reads aren't accelerated
+ * by hardware (Intel or AMD) as the timer is emulated in software (by
+ * KVM), i.e. reads from the virtual APIC page would return garbage.
+ * Intercept RDMSR, as handling the fault-like APIC-access VM-Exit is
+ * more expensive than handling a RDMSR VM-Exit (the APIC-access exit
+ * requires slow emulation of the code stream).
+ */
+ return kvm_lapic_readable_reg_mask(vcpu->arch.apic) &
+ ~APIC_REG_MASK(APIC_TMCCT);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x2apic_disable_read_intercept_reg_mask);
static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
void *data)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 274885af4ebc..f763cd29a508 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -156,7 +156,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
void kvm_lapic_exit(void);
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
+u64 kvm_x2apic_disable_read_intercept_reg_mask(struct kvm_vcpu *vcpu);
static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b02d176800f8..a23a144eef13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4156,7 +4156,7 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
* mode, only the current timer count needs on-demand emulation by KVM.
*/
if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
- msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+ msr_bitmap[read_idx] = ~kvm_x2apic_disable_read_intercept_reg_mask(vcpu);
else
msr_bitmap[read_idx] = ~0ull;
msr_bitmap[write_idx] = ~0ull;
@@ -4169,7 +4169,6 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
!(mode & MSR_BITMAP_MODE_X2APIC));
if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
- vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
if (enable_ipiv)
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
@ 2026-05-15 14:18 ` Naveen N Rao
0 siblings, 0 replies; 6+ messages in thread
From: Naveen N Rao @ 2026-05-15 14:18 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 14, 2026 at 02:31:13PM -0700, Sean Christopherson wrote:
> Add a dedicated local APIC API, kvm_x2apic_disable_read_intercept_reg_mask(),
> to provide the mask of x2APIC registers whose MSRs can and should be passed
> through to the guest when x2APIC virtualization is enabled, and use it in
> lieu of the open-coded equivalent VMX logic. Providing a common helper
> will allow sharing the logic with SVM (x2AVIC), and as a bonus eliminates
> the somewhat confusing code where KVM enables interception for MSR_TYPE_RW,
> even though only the READ case actually needs to be updated.
I thought you discarded this patch based on your response to v2. I now
realize you were only referring to not unifying the WRMSR interception
paths (d'uh!)
This LGTM.
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Thanks,
Naveen
* [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2026-05-15 14:19 ` Naveen N Rao
2026-05-14 21:31 ` [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
When toggling x2AVIC on/off, use KVM's curated mask of x2APIC MSRs that
can/should be passed through to the guest (or not) when x2AVIC is enabled.
Using the effective list provided by the local APIC emulation fixes
multiple (classes of) bugs, as the existing hand-coded list of MSRs is
wrong on multiple fronts:
- ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
is fault-like; disabling interception is nonsensical and suboptimal as
the access generates a #VMEXIT that requires decoding the instruction.
- DFR and ICR2 aren't supported by x2APIC and so don't need their
intercepts disabled for performance reasons. While the #GP due to
x2APIC being enabled has higher priority than the trap-like #VMEXIT,
disabling interception of unsupported MSRs is confusing and unnecessary.
- RRR is completely unsupported.
- AVIC currently fails to pass through the "range of vectors" registers,
IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and
thus only disables intercept for vectors 31:0 (which are the *least*
interesting registers).
- TMCCT (the current APIC timer count) isn't accelerated by hardware, and
generates a fault-like AVIC_UNACCELERATED_ACCESS #VMEXIT, i.e. requires
KVM to decode the instruction to figure out what the guest was trying to
access. Note, the only reason this isn't a fatal bug is that the AVIC
architecture had the foresight to guard against buggy hypervisors. E.g.
if hardware simply read from the virtual APIC page, the guest would get
garbage (because the timer is emulated in software).
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index adf211860949..8e4926c7b8dc 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -122,6 +122,9 @@ static u32 x2avic_max_physical_id;
static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
bool intercept)
{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ u64 rd_regs;
+
static const u32 x2avic_passthrough_msrs[] = {
X2APIC_MSR(APIC_ID),
X2APIC_MSR(APIC_LVR),
@@ -162,9 +165,15 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!x2avic_enabled)
return;
+ rd_regs = kvm_x2apic_disable_read_intercept_reg_mask(vcpu);
+
+ for_each_set_bit(i, (unsigned long *)&rd_regs, BITS_PER_TYPE(rd_regs))
+ svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
+ MSR_TYPE_R, intercept);
+
for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_RW, intercept);
+ svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
+ MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.563.g4f69b47b94-goog
* Re: [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
@ 2026-05-15 14:19 ` Naveen N Rao
0 siblings, 0 replies; 6+ messages in thread
From: Naveen N Rao @ 2026-05-15 14:19 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, May 14, 2026 at 02:31:14PM -0700, Sean Christopherson wrote:
> When toggling x2AVIC on/off, use KVM's curated mask of x2APIC MSRs that
> can/should be passed through to the guest (or not) when x2AVIC is enabled.
> Using the effective list provided by the local APIC emulation fixes
> multiple (classes of) bugs, as the existing hand-coded list of MSRs is
> wrong on multiple fronts:
>
> - ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
> is fault-like; disabling interception is nonsensical and suboptimal as
> the access generates a #VMEXIT that requires decoding the instruction.
Nit: unless you decided against incorporating the changes we discussed
in v2, please update this part while applying.
- Naveen
* [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
When x2AVIC is enabled, disable WRMSR interception only for MSRs that are
actually accelerated by hardware. Disabling interception for MSRs that
aren't accelerated is functionally "fine", and in some cases a weird "win"
for performance, but only for cases that should never be triggered by a
well-behaved VM (writes to read-only registers; the #GP will typically
occur in the guest without taking a #VMEXIT, even for fault-like exits).
But overall, disabling interception for MSRs that aren't accelerated is at
best confusing and unintuitive, and at worst introduces avoidable risk, as
the APM's documentation is imperfect and contradictory. The table in
"15.29.3.1 Virtual APIC Register Accesses" simply states that such
writes generate exits, whereas "Section 15.29.10 x2AVIC" says:
x2APIC MSR intercept checks and access checks have higher priority than
AVIC access permission checks.
CPU behavior follows the latter (which makes perfect sense), but all in
all there's simply no reason to disable interception just to make a #GP
faster.
Note, the set of MSRs that are passed through for write is identical to
VMX's set when IPI virtualization is enabled. This is not a coincidence,
and is another motivating factor for cleaning up the intercepts, as x2AVIC
is functionally equivalent to APICv+IPIv.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 40 ++++------------------------------------
1 file changed, 4 insertions(+), 36 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e4926c7b8dc..724a45c2aa23 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -124,39 +124,6 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
{
struct kvm_vcpu *vcpu = &svm->vcpu;
u64 rd_regs;
-
- static const u32 x2avic_passthrough_msrs[] = {
- X2APIC_MSR(APIC_ID),
- X2APIC_MSR(APIC_LVR),
- X2APIC_MSR(APIC_TASKPRI),
- X2APIC_MSR(APIC_ARBPRI),
- X2APIC_MSR(APIC_PROCPRI),
- X2APIC_MSR(APIC_EOI),
- X2APIC_MSR(APIC_RRR),
- X2APIC_MSR(APIC_LDR),
- X2APIC_MSR(APIC_DFR),
- X2APIC_MSR(APIC_SPIV),
- X2APIC_MSR(APIC_ISR),
- X2APIC_MSR(APIC_TMR),
- X2APIC_MSR(APIC_IRR),
- X2APIC_MSR(APIC_ESR),
- X2APIC_MSR(APIC_ICR),
- X2APIC_MSR(APIC_ICR2),
-
- /*
- * Note! Always intercept LVTT, as TSC-deadline timer mode
- * isn't virtualized by hardware, and the CPU will generate a
- * #GP instead of a #VMEXIT.
- */
- X2APIC_MSR(APIC_LVTTHMR),
- X2APIC_MSR(APIC_LVTPC),
- X2APIC_MSR(APIC_LVT0),
- X2APIC_MSR(APIC_LVT1),
- X2APIC_MSR(APIC_LVTERR),
- X2APIC_MSR(APIC_TMICT),
- X2APIC_MSR(APIC_TMCCT),
- X2APIC_MSR(APIC_TDCR),
- };
int i;
if (intercept == svm->x2avic_msrs_intercepted)
@@ -171,9 +138,10 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
MSR_TYPE_R, intercept);
- for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.563.g4f69b47b94-goog