* [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues
@ 2026-05-14 21:31 Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Fix a variety of bugs in SVM's handling of x2APIC MSR passthrough for x2AVIC,
where KVM disables interception for MSR accesses that aren't accelerated by
hardware (pointless and suboptimal), and also does NOT disable interception
for practically any of the "range of vectors" MSRs, i.e. IRR, ISR, and TMR.
Note, I tagged all of this for stable, but I could be convinced these fixes
shouldn't be sent to LTS trees, as there are no functional bugs being fixed.
v3:
- Consolidate list generation for APICv and x2AVIC RDMSR passthrough (and
avoid the wonky post-iteration fixup in the process). [Naveen]
- Collect reviews. [Naveen]
- Drop the hacky selftest (it's still available in v2).
- Massage the changelog for patch 3 to call out that at least one section
of the APM does document that #GP has priority over the AVIC checks.
[Naveen]
- Document the impact on TMCCT in patch 2. [Naveen]
v2:
- https://lore.kernel.org/all/20260506184746.2719880-1-seanjc@google.com
- Actually iterate over the mask of readable regs. [Naveen]
- Rewrite the changelog for patch 3 to more accurately capture what happens,
and to avoid conflating "unaccelerated" with "fault-like". [Naveen]
- Massage the changlog for patch 1 to describe the observed behavior of
DFR and ICR2.
- Test the #VMEXIT (or not) behavior with hacks (patches 4 and 5).
v1: https://lore.kernel.org/all/20260409222449.2013847-1-seanjc@google.com
Sean Christopherson (3):
KVM: x86: Add dedicated API for getting mask of accelerated x2APIC
MSRs
KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually
supports
KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are
accelerated
arch/x86/kvm/lapic.c | 21 ++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/svm/avic.c | 47 +++++++++++------------------------------
arch/x86/kvm/vmx/vmx.c | 3 +--
4 files changed, 33 insertions(+), 40 deletions(-)
base-commit: a9512a611bd030088f13477258d1f8103cceaa40
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
Add a dedicated local APIC API, kvm_x2apic_disable_intercept_reg_mask(),
to provide the mask of x2APIC registers whose MSRs can and should be passed
through to the guest when x2APIC virtualization is enable, and use it in
lieu of the open-coded equivalent VMX logic. Providing a common helper
will allow sharing the logic with SVM (x2AVIC), and as a bonus eliminates
the somewhat confusing code where KVM enables interception for MSR_TYPE_RW,
even though only the READ case actually needs to be updated.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 21 +++++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 3 +--
3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4078e624ca66..4e34f75e705d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1730,7 +1730,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
#define APIC_REGS_MASK(first, count) \
(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
+static u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
{
/* Leave bits '0' for reserved and write-only registers. */
u64 valid_reg_mask =
@@ -1766,7 +1766,24 @@ u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
return valid_reg_mask;
}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_readable_reg_mask);
+
+u64 kvm_x2apic_disable_read_intercept_reg_mask(struct kvm_vcpu *vcpu)
+{
+ if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)))
+ return 0;
+
+ /*
+ * TMMCT, a.k.a. the current APIC timer count, reads aren't accelerated
+ * by hardware (Intel or AMD) as the timer is emulated in software (by
+ * KVM), i.e. reads from the virtual APIC page would return garbage.
+ * Intercept RDMSR, as handling the fault-like APIC-access VM-Exit is
+ * more expensive than handling a RDMSR VM-Exit (the APIC-access exit
+ * requires slow emulation of the code stream).
+ */
+ return kvm_lapic_readable_reg_mask(vcpu->arch.apic) &
+ ~APIC_REG_MASK(APIC_TMCCT);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x2apic_disable_read_intercept_reg_mask);
static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
void *data)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 274885af4ebc..f763cd29a508 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -156,7 +156,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
void kvm_lapic_exit(void);
-u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic);
+u64 kvm_x2apic_disable_read_intercept_reg_mask(struct kvm_vcpu *vcpu);
static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b02d176800f8..a23a144eef13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4156,7 +4156,7 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
* mode, only the current timer count needs on-demand emulation by KVM.
*/
if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
- msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic);
+ msr_bitmap[read_idx] = ~kvm_x2apic_disable_read_intercept_reg_mask(vcpu);
else
msr_bitmap[read_idx] = ~0ull;
msr_bitmap[write_idx] = ~0ull;
@@ -4169,7 +4169,6 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
!(mode & MSR_BITMAP_MODE_X2APIC));
if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
- vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
if (enable_ipiv)
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
When toggling x2AVIC on/off, use KVM's curated mask of x2APIC MSRs that
can/should be passed through to the guest (or not) when 2AVIC is enabled.
Using the effective list provided by the local APIC emulation fixes
multiple (classes of) bugs, as the existing hand-coded list of MSRs is
wrong on multiple fronts:
- ARBPRI isn't supported by x2APIC, but its unaccelerated AVIC intercept
is fault-like; disabling interception is nonsensical and suboptimal as
the access generates a #VMEXIT that requires decoding the instruction.
- DFR and ICR2 aren't supported by x2APIC and so don't need their
intercepts disabled for performance reasons. While the #GP due to
x2APIC being abled has higher priority than the trap-like #VMEXIT,
disabling interception of unsupported MSRs is confusing and unnecessary.
- RRR is completely unsupported.
- AVIC currently fails to pass through the "range of vectors" registers,
IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and
thus only disables intercept for vectors 31:0 (which are the *least*
interesting registers).
- TMCCT (the current APIC timer count) isn't accelerated by hardware, and
generates a fault-like AVIC_UNACCELERATED_ACCESS #VMEXIT, i.e. requires
KVM to decode the instruction to figure out what the guest was trying to
access. Note, the only reason this isn't a fatal bug is that the AVIC
architecture had the foresight to guard against buggy hypervisors. E.g.
if hardware simply read from the virtual APIC page, the guest would get
garbage (because the timer is emulated in software).
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index adf211860949..8e4926c7b8dc 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -122,6 +122,9 @@ static u32 x2avic_max_physical_id;
static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
bool intercept)
{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ u64 rd_regs;
+
static const u32 x2avic_passthrough_msrs[] = {
X2APIC_MSR(APIC_ID),
X2APIC_MSR(APIC_LVR),
@@ -162,9 +165,15 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
if (!x2avic_enabled)
return;
+ rd_regs = kvm_x2apic_disable_read_intercept_reg_mask(vcpu);
+
+ for_each_set_bit(i, (unsigned long *)&rd_regs, BITS_PER_TYPE(rd_regs))
+ svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
+ MSR_TYPE_R, intercept);
+
for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_RW, intercept);
+ svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
+ MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
@ 2026-05-14 21:31 ` Sean Christopherson
2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Naveen N Rao
When x2AVIC is enabled, disable WRMSR interception only for MSRs that are
actually accelerated by hardware. Disabling interception for MSRs that
aren't accelerated is functionally "fine", and in some cases a weird "win"
for performance, but only for cases that should never be triggered by a
well-behaved VM (writes to read-only registers; the #GP will typically
occur in the guest without taking a #VMEXIT, even for fault-like exits).
But overall, disabling interception for MSRs that aren't accelerated is at
best confusing and unintuitive, and at worst introduces avoidable risk, as
the APM's documentation is imperfect and contradictory. The table in
"15.29.3.1 Virtual APIC Register Accesses" of simply states that such
writes generate exits, where as "Section 15.29.10 x2AVIC" says:
x2APIC MSR intercept checks and access checks have higher priority than
AVIC access permission checks.
CPU behavior follows the latter (which makes perfect sense), but all in
all there's simply no reason to disable interception just to make a #GP
faster.
Note, the set of MSRs that are passed through for write is identical to
VMX's set when IPI virtualization is enabled. This is not a coincidence,
and is another motiviating factor for cleaning up the intercepts, as x2AVIC
is functionally equivalent to APICv+IPIv.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/avic.c | 40 ++++------------------------------------
1 file changed, 4 insertions(+), 36 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e4926c7b8dc..724a45c2aa23 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -124,39 +124,6 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
{
struct kvm_vcpu *vcpu = &svm->vcpu;
u64 rd_regs;
-
- static const u32 x2avic_passthrough_msrs[] = {
- X2APIC_MSR(APIC_ID),
- X2APIC_MSR(APIC_LVR),
- X2APIC_MSR(APIC_TASKPRI),
- X2APIC_MSR(APIC_ARBPRI),
- X2APIC_MSR(APIC_PROCPRI),
- X2APIC_MSR(APIC_EOI),
- X2APIC_MSR(APIC_RRR),
- X2APIC_MSR(APIC_LDR),
- X2APIC_MSR(APIC_DFR),
- X2APIC_MSR(APIC_SPIV),
- X2APIC_MSR(APIC_ISR),
- X2APIC_MSR(APIC_TMR),
- X2APIC_MSR(APIC_IRR),
- X2APIC_MSR(APIC_ESR),
- X2APIC_MSR(APIC_ICR),
- X2APIC_MSR(APIC_ICR2),
-
- /*
- * Note! Always intercept LVTT, as TSC-deadline timer mode
- * isn't virtualized by hardware, and the CPU will generate a
- * #GP instead of a #VMEXIT.
- */
- X2APIC_MSR(APIC_LVTTHMR),
- X2APIC_MSR(APIC_LVTPC),
- X2APIC_MSR(APIC_LVT0),
- X2APIC_MSR(APIC_LVT1),
- X2APIC_MSR(APIC_LVTERR),
- X2APIC_MSR(APIC_TMICT),
- X2APIC_MSR(APIC_TMCCT),
- X2APIC_MSR(APIC_TDCR),
- };
int i;
if (intercept == svm->x2avic_msrs_intercepted)
@@ -171,9 +138,10 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
MSR_TYPE_R, intercept);
- for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
- svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i],
- MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept);
+ svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_W, intercept);
svm->x2avic_msrs_intercepted = intercept;
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-05-14 21:31 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14 21:31 [PATCH v3 0/3] KVM: SVM: Fix x2AVIC MSR interception issues Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 1/3] KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 2/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-05-14 21:31 ` [PATCH v3 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.