Kernel KVM virtualization development
* [PATCH v2 00/32] KVM: x86: Clean up MSR interception code
@ 2025-06-10 22:57 Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest Sean Christopherson
                   ` (33 more replies)
  0 siblings, 34 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Clean up KVM's MSR interception code (especially the SVM code, which is all
kinds of ugly).  The main goals are to:

 - Make the SVM and VMX APIs consistent (and sane; the current SVM APIs have
   inverted polarity).

 - Eliminate the shadow bitmaps that are used to determine intercepts on
   userspace MSR filter update.

v2:
 - Add a patch to set MSR_IA32_SPEC_CTRL interception as appropriate. [Chao]
 - Add a patch to clean up {svm,vmx}_disable_intercept_for_msr() once the
   dust has settled. [Dapeng]
 - Return -ENOSPC if msrpm_offsets[] is full. [Chao]
 - Free iopm_pages directly instead of bouncing through iopm_base. [Chao]
 - Check for "offset == MSR_INVALID" before using offset. [Chao]
 - Temporarily keep MSR_IA32_DEBUGCTLMSR in the nested list. [Chao]
 - Add a comment to explain nested_svm_msrpm_merge_offsets. [Chao]
 - Add a patch to shift the IOPM allocation to avoid having to unwind it.
 - Init nested_svm_msrpm_merge_offsets iff nested=1. [Chao]
 - Add a helper to dedup alloc+init of MSRPM and IOPM.
 - Tag merge_msrs as "static" and "__initconst". [Paolo]
 - Rework helpers to use fewer macros. [Paolo]
 - Account for each MSRPM byte covering 4 MSRs. [Paolo]
 - Opportunistically use cpu_feature_enabled(). [Xin]
 - Fully remove MAX_DIRECT_ACCESS_MSRS, MSRPM_OFFSETS, and msrpm_offsets.
   [Francesco]
 - Fix typos. [Dapeng, Chao]
 - Collect reviews. [Chao, Dapeng, Xin]

v1: https://lore.kernel.org/all/20250529234013.3826933-1-seanjc@google.com

v0: https://lore.kernel.org/kvm/20241127201929.4005605-1-aaronlewis@google.com

Sean Christopherson (32):
  KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the
    guest
  KVM: SVM: Allocate IOPM pages after initial setup in
    svm_hardware_setup()
  KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  KVM: SVM: Tag MSR bitmap initialization helpers with __init
  KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
  KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  KVM: SVM: Massage name and param of helper that merges vmcb01 and
    vmcb12 MSRPMs
  KVM: SVM: Clean up macros related to architectural MSRPM definitions
  KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1
    bitmaps
  KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap
    merge
  KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always
    passthrough"
  KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on
    offsets
  KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
  KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
  KVM: x86: Move definition of X2APIC_MSR() to lapic.h
  KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter
    change
  KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter
    change
  KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts
    specific
  KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
  KVM: SVM: Merge "after set CPUID" intercept recalc helpers
  KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES
    accesses
  KVM: SVM: Move svm_msrpm_offset() to nested.c
  KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
  KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1
    bitmaps
  KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range
    MSR
  KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
  KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
  KVM: x86: Simplify userspace filter logic when disabling MSR
    interception
  KVM: selftests: Verify KVM disable interception (for userspace) on
    filter change

 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm_host.h               |   2 +-
 arch/x86/kvm/lapic.h                          |   2 +
 arch/x86/kvm/svm/nested.c                     | 126 +++--
 arch/x86/kvm/svm/sev.c                        |  29 +-
 arch/x86/kvm/svm/svm.c                        | 490 ++++++------------
 arch/x86/kvm/svm/svm.h                        | 102 +++-
 arch/x86/kvm/vmx/main.c                       |   6 +-
 arch/x86/kvm/vmx/vmx.c                        | 202 ++------
 arch/x86/kvm/vmx/vmx.h                        |   9 -
 arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
 arch/x86/kvm/x86.c                            |   8 +-
 .../kvm/x86/userspace_msr_exit_test.c         |   8 +
 13 files changed, 426 insertions(+), 562 deletions(-)


base-commit: 61374cc145f4a56377eaf87c7409a97ec7a34041
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  4:38   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 02/32] KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup() Sean Christopherson
                   ` (32 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Disable interception of SPEC_CTRL when the CPU virtualizes (i.e. context
switches) SPEC_CTRL if and only if the MSR exists according to the vCPU's
CPUID model.  Letting the guest access SPEC_CTRL is generally benign, but
the guest would see inconsistent behavior if KVM happened to emulate an
access to the MSR.
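
For illustration only, the init-time policy described above can be written
as a small truth table.  This is a stand-alone model, not KVM code:
spec_ctrl_passthrough_at_init() and its two parameters are hypothetical
stand-ins for boot_cpu_has(X86_FEATURE_V_SPEC_CTRL) and
guest_has_spec_ctrl_msr(vcpu), and the model assumes the MSR simply stays
intercepted at init when the CPU doesn't virtualize it.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the decision made in init_vmcb().  Returns true
 * if SPEC_CTRL accesses are passed through to the guest immediately.
 */
bool spec_ctrl_passthrough_at_init(bool cpu_virtualizes_spec_ctrl,
				   bool guest_has_spec_ctrl)
{
	/*
	 * Assumption: without hardware virtualization of the MSR, nothing
	 * is configured at init and the MSR remains intercepted.
	 */
	if (!cpu_virtualizes_spec_ctrl)
		return false;

	/* Pass through if and only if the MSR exists in guest CPUID. */
	return guest_has_spec_ctrl;
}
```

The only case that changes relative to the old code is "CPU virtualizes
the MSR, guest CPUID lacks it", which previously passed the MSR through.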

Fixes: d00b99c514b3 ("KVM: SVM: Add support for Virtual SPEC_CTRL")
Reported-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0ad1a6d4fb6d..21e745acebc3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1362,11 +1362,14 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/*
-	 * If the host supports V_SPEC_CTRL then disable the interception
-	 * of MSR_IA32_SPEC_CTRL.
+	 * If the CPU virtualizes MSR_IA32_SPEC_CTRL, i.e. KVM doesn't need to
+	 * manually context switch the MSR, immediately configure interception
+	 * of SPEC_CTRL, without waiting for the guest to access the MSR.
 	 */
 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
+				     guest_has_spec_ctrl_msr(vcpu),
+				     guest_has_spec_ctrl_msr(vcpu));
 
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 02/32] KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup()
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 03/32] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Allocate pages for the IOPM after initial setup has been completed in
svm_hardware_setup(), so that sanity checks can be added in the setup flow
without needing to free the IOPM pages.  The IOPM is only referenced (via
iopm_base) in init_vmcb() and svm_hardware_unsetup(), so there's no need
to allocate it early on.

No functional change intended (beyond the obvious ordering differences,
e.g. if the allocation fails).
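
A rough userspace sketch of the ordering argument, assuming nothing about
the real setup flow: hardware_setup(), its sanity_checks_pass parameter,
and IOPM_BYTES are hypothetical, and malloc() stands in for alloc_pages().
The point is only that checks which run before the allocation can fail
without any unwinding.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define IOPM_BYTES (3 * 4096)	/* hypothetical size, assumes 4 KiB pages */

unsigned char *iopm;		/* stays NULL until setup succeeds */

int hardware_setup(bool sanity_checks_pass)
{
	/* Early failure: nothing has been allocated, so nothing to free. */
	if (!sanity_checks_pass)
		return -EIO;

	/* Allocate last, after every earlier step can no longer fail. */
	iopm = malloc(IOPM_BYTES);
	if (!iopm)
		return -ENOMEM;

	/* All bits set, mirroring the 0xff fill of the IOPM in the patch. */
	memset(iopm, 0xff, IOPM_BYTES);
	return 0;
}
```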

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 21e745acebc3..262eae46a396 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5505,15 +5505,6 @@ static __init int svm_hardware_setup(void)
 	}
 	kvm_enable_efer_bits(EFER_NX);
 
-	iopm_pages = alloc_pages(GFP_KERNEL, order);
-
-	if (!iopm_pages)
-		return -ENOMEM;
-
-	iopm_va = page_address(iopm_pages);
-	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
-	iopm_base = __sme_page_pa(iopm_pages);
-
 	init_msrpm_offsets();
 
 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
@@ -5580,6 +5571,15 @@ static __init int svm_hardware_setup(void)
 		else
 			pr_info("LBR virtualization supported\n");
 	}
+
+	iopm_pages = alloc_pages(GFP_KERNEL, order);
+	if (!iopm_pages)
+		return -ENOMEM;
+
+	iopm_va = page_address(iopm_pages);
+	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
+	iopm_base = __sme_page_pa(iopm_pages);
+
 	/*
 	 * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which
 	 * may be modified by svm_adjust_mmio_mask()), as well as nrips.
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 03/32] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 02/32] KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup() Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 04/32] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

WARN and reject module loading if there is a problem with KVM's MSR
interception bitmaps.  Panicking the host in this situation is inexcusable
since it is trivially easy to propagate the error up the stack.
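
As a minimal stand-alone sketch of the "return an error instead of BUG()"
pattern (not the kernel code itself): NR_OFFSETS, OFFSET_FREE, and
init_offsets() are hypothetical, and unlike the patch, which WARNs and
returns -EIO from the caller, this sketch just propagates the -ENOSPC.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_OFFSETS	4		/* hypothetical table capacity */
#define OFFSET_FREE	0xffffffffu	/* hypothetical "unused slot" marker */

uint32_t offsets[NR_OFFSETS];

/* Record an offset once; report a full table instead of crashing. */
int add_offset(uint32_t offset)
{
	size_t i;

	for (i = 0; i < NR_OFFSETS; i++) {
		/* Offset already in list? */
		if (offsets[i] == offset)
			return 0;

		/* Slot used by another offset? */
		if (offsets[i] != OFFSET_FREE)
			continue;

		offsets[i] = offset;
		return 0;
	}

	return -ENOSPC;
}

/* Fill the table, handing the first error back to the caller. */
int init_offsets(const uint32_t *wanted, size_t nr)
{
	size_t i;
	int r;

	/* 0xff fill marks every slot as OFFSET_FREE. */
	memset(offsets, 0xff, sizeof(offsets));

	for (i = 0; i < nr; i++) {
		r = add_offset(wanted[i]);
		if (r)
			return r;
	}
	return 0;
}
```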

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 262eae46a396..f70211780880 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void add_msr_offset(u32 offset)
+static int add_msr_offset(u32 offset)
 {
 	int i;
 
@@ -953,7 +953,7 @@ static void add_msr_offset(u32 offset)
 
 		/* Offset already in list? */
 		if (msrpm_offsets[i] == offset)
-			return;
+			return 0;
 
 		/* Slot used by another offset? */
 		if (msrpm_offsets[i] != MSR_INVALID)
@@ -962,17 +962,13 @@ static void add_msr_offset(u32 offset)
 		/* Add offset to list */
 		msrpm_offsets[i] = offset;
 
-		return;
+		return 0;
 	}
 
-	/*
-	 * If this BUG triggers the msrpm_offsets table has an overflow. Just
-	 * increase MSRPM_OFFSETS in this case.
-	 */
-	BUG();
+	return -ENOSPC;
 }
 
-static void init_msrpm_offsets(void)
+static int init_msrpm_offsets(void)
 {
 	int i;
 
@@ -982,10 +978,13 @@ static void init_msrpm_offsets(void)
 		u32 offset;
 
 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
-		BUG_ON(offset == MSR_INVALID);
+		if (WARN_ON(offset == MSR_INVALID))
+			return -EIO;
 
-		add_msr_offset(offset);
+		if (WARN_ON_ONCE(add_msr_offset(offset)))
+			return -EIO;
 	}
+	return 0;
 }
 
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -5505,7 +5504,9 @@ static __init int svm_hardware_setup(void)
 	}
 	kvm_enable_efer_bits(EFER_NX);
 
-	init_msrpm_offsets();
+	r = init_msrpm_offsets();
+	if (r)
+		return r;
 
 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
 				     XFEATURE_MASK_BNDCSR);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 04/32] KVM: SVM: Tag MSR bitmap initialization helpers with __init
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 03/32] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 05/32] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Tag init_msrpm_offsets() and add_msr_offset() with __init, as they're used
only during hardware setup to map potential passthrough MSRs to offsets in
the bitmap.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f70211780880..0c71efc99208 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	}
 }
 
-static int add_msr_offset(u32 offset)
+static __init int add_msr_offset(u32 offset)
 {
 	int i;
 
@@ -968,7 +968,7 @@ static int add_msr_offset(u32 offset)
 	return -ENOSPC;
 }
 
-static int init_msrpm_offsets(void)
+static __init int init_msrpm_offsets(void)
 {
 	int i;
 
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 05/32] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 04/32] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Drop the unnecessary and dangerous value-terminated behavior of
direct_access_msrs, and simply iterate over the actual size of the array.
The use in svm_set_x2apic_msr_interception() is especially sketchy, as it
relies on unused capacity being zero-initialized, and '0' being outside
the range of x2APIC MSRs.

To ensure the array and shadow_msr_intercept stay synchronized, simply
assert that their sizes are identical (note the six 64-bit-only MSRs).

Note, direct_access_msrs will soon be removed entirely; keeping the assert
synchronized with the array isn't expected to be a long-term maintenance
burden.
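
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of size-based iteration plus a compile-time size check.  The
three-entry table, EXPECTED_NR_MSRS, and passthrough_msr_slot() are
hypothetical and only illustrate the idea; ARRAY_SIZE() is redefined
locally because the kernel macro isn't available outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)		(sizeof(a) / sizeof((a)[0]))
#define EXPECTED_NR_MSRS	3	/* hypothetical, sized like a shadow bitmap */

/* No sentinel entry: the array's real size bounds every walk. */
static const uint32_t passthrough_msrs[] = {
	0xc0000081,	/* MSR_STAR */
	0x00000174,	/* MSR_IA32_SYSENTER_CS */
	0x00000176,	/* MSR_IA32_SYSENTER_EIP */
};

/* Break the build if the table and its expected size ever diverge. */
static_assert(ARRAY_SIZE(passthrough_msrs) == EXPECTED_NR_MSRS,
	      "passthrough_msrs out of sync with EXPECTED_NR_MSRS");

int passthrough_msr_slot(uint32_t msr)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(passthrough_msrs); i++) {
		if (passthrough_msrs[i] == msr)
			return (int)i;
	}

	return -ENOENT;
}
```

With no terminator, a lookup for MSR 0 can never match a stray
zero-initialized slot, which is the hazard called out above.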

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++++++-------
 arch/x86/kvm/svm/svm.h |  2 +-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0c71efc99208..c75977ca600b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -86,7 +86,7 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);
 static const struct svm_direct_access_msrs {
 	u32 index;   /* Index of the MSR */
 	bool always; /* True if intercept is initially cleared */
-} direct_access_msrs[MAX_DIRECT_ACCESS_MSRS] = {
+} direct_access_msrs[] = {
 	{ .index = MSR_STAR,				.always = true  },
 	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
 	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
@@ -144,9 +144,12 @@ static const struct svm_direct_access_msrs {
 	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
-	{ .index = MSR_INVALID,				.always = false },
 };
 
+static_assert(ARRAY_SIZE(direct_access_msrs) ==
+	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
+#undef MAX_DIRECT_ACCESS_MSRS
+
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * pause_filter_count: On processors that support Pause filtering(indicated
@@ -767,9 +770,10 @@ static int direct_access_msr_slot(u32 msr)
 {
 	u32 i;
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++)
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (direct_access_msrs[i].index == msr)
 			return i;
+	}
 
 	return -ENOENT;
 }
@@ -891,7 +895,7 @@ void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 {
 	int i;
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (!direct_access_msrs[i].always)
 			continue;
 		set_msr_interception(vcpu, msrpm, direct_access_msrs[i].index, 1, 1);
@@ -908,7 +912,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	if (!x2avic_enabled)
 		return;
 
-	for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		int index = direct_access_msrs[i].index;
 
 		if ((index < APIC_BASE_MSR) ||
@@ -936,7 +940,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	 * will automatically get filtered through the MSR filter, so we are
 	 * back in sync after this.
 	 */
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 msr = direct_access_msrs[i].index;
 		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
 		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
@@ -974,7 +978,7 @@ static __init int init_msrpm_offsets(void)
 
 	memset(msrpm_offsets, 0xff, sizeof(msrpm_offsets));
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 offset;
 
 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e6f3c6a153a0..f1e466a10219 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -44,7 +44,7 @@ static inline struct page *__sme_pa_to_page(unsigned long pa)
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	48
+#define MAX_DIRECT_ACCESS_MSRS	47
 #define MSRPM_OFFSETS	32
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 05/32] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  2:16   ` Mi, Dapeng
  2025-06-10 22:57 ` [PATCH v2 07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
                   ` (27 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

WARN and kill the VM instead of panicking the host if KVM attempts to set
or query MSR interception for an unsupported MSR.  Accessing the MSR
interception bitmaps only meaningfully affects post-VMRUN behavior, and
KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN,
i.e. there is no need to panic the entire host.

Opportunistically move the sanity checks above the point where the offset
is used to index into the MSRPM, e.g. so that bugs only WARN and terminate
the VM, as opposed to doing that _and_ generating an out-of-bounds load.
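
The check-before-index ordering can be sketched in plain C as follows.
This is an assumption-laden model rather than the KVM helper:
OFFSET_INVALID stands in for MSR_INVALID, and the vm_bugged out-parameter
is a hypothetical substitute for KVM_BUG_ON() marking the VM as dead.  The
bit math (two bits per MSR, write bit is the odd one) follows the diff
below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_INVALID	0xffffffffu	/* hypothetical stand-in for MSR_INVALID */

bool write_intercepted(const uint32_t *msrpm, uint32_t offset, uint32_t msr,
		       bool *vm_bugged)
{
	uint32_t bit_write;

	/* Validate first, so a bad offset never reaches the array access. */
	if (offset == OFFSET_INVALID) {
		*vm_bugged = true;
		return false;
	}

	/* 16 MSRs per u32: bit 2n is read, bit 2n + 1 is write. */
	bit_write = 2 * (msr & 0x0f) + 1;

	return (msrpm[offset] >> bit_write) & 1;
}
```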

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c75977ca600b..7e39b9df61f1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -824,11 +824,12 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 				      to_svm(vcpu)->msrpm;
 
 	offset    = svm_msrpm_offset(msr);
+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
+		return false;
+
 	bit_write = 2 * (msr & 0x0f) + 1;
 	tmp       = msrpm[offset];
 
-	BUG_ON(offset == MSR_INVALID);
-
 	return test_bit(bit_write, &tmp);
 }
 
@@ -854,12 +855,13 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 		write = 0;
 
 	offset    = svm_msrpm_offset(msr);
+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
+		return;
+
 	bit_read  = 2 * (msr & 0x0f);
 	bit_write = 2 * (msr & 0x0f) + 1;
 	tmp       = msrpm[offset];
 
-	BUG_ON(offset == MSR_INVALID);
-
 	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
 	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
 
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  6:38   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
                   ` (26 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Manipulate the MSR bitmaps using non-atomic bit ops APIs (two underscores),
as the bitmaps are per-vCPU and are only ever accessed while vcpu->mutex is
held.
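
To make the distinction concrete, a non-atomic bit op is just a plain
read-modify-write with no locked instruction or barrier.  The helpers
below are a userspace approximation with hypothetical names, not the
kernel's __set_bit()/__clear_bit() implementations; they are safe only
under the same condition as above, i.e. when something else (here,
vcpu->mutex) serializes all access to the bitmap.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

/* Plain load, OR, store: racy unless the caller serializes access. */
void set_bit_nonatomic(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

/* Plain load, AND, store: same caveat as above. */
void clear_bit_nonatomic(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_ULONG] &= ~(1UL << (nr % BITS_PER_ULONG));
}

bool test_bit_plain(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL;
}
```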

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 12 ++++++------
 arch/x86/kvm/vmx/vmx.c |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7e39b9df61f1..ec97ea1d7b38 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -789,14 +789,14 @@ static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
 
 	/* Set the shadow bitmaps to the desired intercept states */
 	if (read)
-		set_bit(slot, svm->shadow_msr_intercept.read);
+		__set_bit(slot, svm->shadow_msr_intercept.read);
 	else
-		clear_bit(slot, svm->shadow_msr_intercept.read);
+		__clear_bit(slot, svm->shadow_msr_intercept.read);
 
 	if (write)
-		set_bit(slot, svm->shadow_msr_intercept.write);
+		__set_bit(slot, svm->shadow_msr_intercept.write);
 	else
-		clear_bit(slot, svm->shadow_msr_intercept.write);
+		__clear_bit(slot, svm->shadow_msr_intercept.write);
 }
 
 static bool valid_msr_intercept(u32 index)
@@ -862,8 +862,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	bit_write = 2 * (msr & 0x0f) + 1;
 	tmp       = msrpm[offset];
 
-	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
-	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
+	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
+	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
 
 	msrpm[offset] = tmp;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9ff00ae9f05a..8f7fe04a1998 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4029,9 +4029,9 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	idx = vmx_get_passthrough_msr_slot(msr);
 	if (idx >= 0) {
 		if (type & MSR_TYPE_R)
-			clear_bit(idx, vmx->shadow_msr_intercept.read);
+			__clear_bit(idx, vmx->shadow_msr_intercept.read);
 		if (type & MSR_TYPE_W)
-			clear_bit(idx, vmx->shadow_msr_intercept.write);
+			__clear_bit(idx, vmx->shadow_msr_intercept.write);
 	}
 
 	if ((type & MSR_TYPE_R) &&
@@ -4071,9 +4071,9 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	idx = vmx_get_passthrough_msr_slot(msr);
 	if (idx >= 0) {
 		if (type & MSR_TYPE_R)
-			set_bit(idx, vmx->shadow_msr_intercept.read);
+			__set_bit(idx, vmx->shadow_msr_intercept.read);
 		if (type & MSR_TYPE_W)
-			set_bit(idx, vmx->shadow_msr_intercept.write);
+			__set_bit(idx, vmx->shadow_msr_intercept.write);
 	}
 
 	if (type & MSR_TYPE_R)
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  2:22   ` Mi, Dapeng
  2025-06-10 22:57 ` [PATCH v2 09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
                   ` (25 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Rename nested_svm_vmrun_msrpm() to nested_svm_merge_msrpm() to better
capture its role, and opportunistically feed it @vcpu instead of @svm, as
grabbing "svm" only to turn around and grab svm->vcpu is rather silly.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 15 +++++++--------
 arch/x86/kvm/svm/svm.c    |  2 +-
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8427a48b8b7a..89a77f0f1cc8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -189,8 +189,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
  * is optimized in that it only merges the parts where KVM MSR permission bitmap
  * may contain zero bits.
  */
-static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
+static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	int i;
 
 	/*
@@ -205,7 +206,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 	if (!svm->nested.force_msr_bitmap_recalc) {
 		struct hv_vmcb_enlightenments *hve = &svm->nested.ctl.hv_enlightenments;
 
-		if (kvm_hv_hypercall_enabled(&svm->vcpu) &&
+		if (kvm_hv_hypercall_enabled(vcpu) &&
 		    hve->hv_enlightenments_control.msr_bitmap &&
 		    (svm->nested.ctl.clean & BIT(HV_VMCB_NESTED_ENLIGHTENMENTS)))
 			goto set_msrpm_base_pa;
@@ -230,7 +231,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
 		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
 
-		if (kvm_vcpu_read_guest(&svm->vcpu, offset, &value, 4))
+		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
 			return false;
 
 		svm->nested.msrpm[p] = svm->msrpm[p] | value;
@@ -937,7 +938,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
 		goto out_exit_err;
 
-	if (nested_svm_vmrun_msrpm(svm))
+	if (nested_svm_merge_msrpm(vcpu))
 		goto out;
 
 out_exit_err:
@@ -1819,13 +1820,11 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
 	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
+	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
 		/*
 		 * Reload the guest's PDPTRs since after a migration
 		 * the guest CR3 might be restored prior to setting the nested
@@ -1834,7 +1833,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
 			return false;
 
-	if (!nested_svm_vmrun_msrpm(svm)) {
+	if (!nested_svm_merge_msrpm(vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_EMULATION;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ec97ea1d7b38..854904a80b7e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3137,7 +3137,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 *
 		 * For nested:
 		 * The handling of the MSR bitmap for L2 guests is done in
-		 * nested_svm_vmrun_msrpm.
+		 * nested_svm_merge_msrpm().
 		 * We update the L1 MSR bit as well since it will end up
 		 * touching the MSR anyway now.
 		 */
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  6:09   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 10/32] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
                   ` (24 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Move SVM's MSR Permissions Map macros to svm.h in anticipation of adding
helpers that are available to SVM code, and opportunistically replace a
variety of open-coded literals with (hopefully) informative macros.

Opportunistically open code ARRAY_SIZE(msrpm_ranges) instead of wrapping
it as NUM_MSR_MAPS, which is an ambiguous name even if it were qualified
with "SVM_MSRPM".

Deliberately leave the ranges as open coded literals, as using macros to
define the ranges actually introduces more potential failure points, since
both the definitions and the usage have to be careful to use the correct
index.  The lack of clear intent behind the ranges will be addressed in
future patches.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 12 ++++--------
 arch/x86/kvm/svm/svm.h | 13 ++++++++++++-
 2 files changed, 16 insertions(+), 9 deletions(-)
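
For readers who want to check the arithmetic behind the new macros, here is a
minimal userspace sketch of the offset calculation.  It is an illustration
only, not part of the patch: msrpm_u32_offset() is a hypothetical stand-in for
svm_msrpm_offset(), the trailing "return MSR_INVALID" is assumed from the
unchanged tail of that function (not visible in the hunk), and BITS_PER_BYTE
is defined locally instead of coming from kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BYTE			8
#define SVM_MSRPM_BYTES_PER_RANGE	2048
#define SVM_BITS_PER_MSR		2
#define SVM_MSRS_PER_BYTE		(BITS_PER_BYTE / SVM_BITS_PER_MSR)
#define SVM_MSRS_PER_RANGE		(SVM_MSRPM_BYTES_PER_RANGE * SVM_MSRS_PER_BYTE)
#define MSR_INVALID			0xffffffffU

static_assert(SVM_MSRS_PER_RANGE == 8192, "a 2KiB range covers 8192 MSRs");

static const uint32_t msrpm_ranges[] = { 0, 0xc0000000, 0xc0010000 };

/* Userspace mirror of svm_msrpm_offset(); returns a u32 index, not a byte offset. */
static uint32_t msrpm_u32_offset(uint32_t msr)
{
	size_t i;

	for (i = 0; i < sizeof(msrpm_ranges) / sizeof(msrpm_ranges[0]); i++) {
		uint32_t offset;

		if (msr < msrpm_ranges[i] ||
		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
			continue;

		/* Byte within the range, then byte within the whole bitmap. */
		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
		offset += (uint32_t)i * SVM_MSRPM_BYTES_PER_RANGE;

		/* Four bytes per u32 chunk, i.e. 16 MSRs per chunk. */
		return offset / 4;
	}

	return MSR_INVALID;
}
```

With these definitions, MSR_EFER (0xc0000080) lands in u32 chunk 520 and
MSR_IA32_SPEC_CTRL (0x48) in chunk 4, while an MSR outside all three ranges,
e.g. 0x2000, yields MSR_INVALID.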

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 854904a80b7e..a683602cae22 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -268,22 +268,18 @@ static int tsc_aux_uret_slot __read_mostly = -1;
 
 static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 
-#define NUM_MSR_MAPS ARRAY_SIZE(msrpm_ranges)
-#define MSRS_RANGE_SIZE 2048
-#define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
-
 u32 svm_msrpm_offset(u32 msr)
 {
 	u32 offset;
 	int i;
 
-	for (i = 0; i < NUM_MSR_MAPS; i++) {
+	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
 		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + MSRS_IN_RANGE)
+		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
 			continue;
 
-		offset  = (msr - msrpm_ranges[i]) / 4; /* 4 msrs per u8 */
-		offset += (i * MSRS_RANGE_SIZE);       /* add range offset */
+		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
+		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
 
 		/* Now we have the u8 offset - but need the u32 offset */
 		return offset / 4;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f1e466a10219..086a8c8aae86 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -613,11 +613,22 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
 	svm_vmgexit_set_return_code(svm, GHCB_HV_RESP_NO_ACTION, data);
 }
 
-/* svm.c */
+/*
+ * The MSRPM is 8KiB in size, divided into four 2KiB ranges (the fourth range
+ * is reserved).  Each MSR within a range is covered by two bits, one each for
+ * read (bit 0) and write (bit 1), where a bit value of '1' means intercepted.
+ */
+#define SVM_MSRPM_BYTES_PER_RANGE 2048
+#define SVM_BITS_PER_MSR 2
+#define SVM_MSRS_PER_BYTE (BITS_PER_BYTE / SVM_BITS_PER_MSR)
+#define SVM_MSRS_PER_RANGE (SVM_MSRPM_BYTES_PER_RANGE * SVM_MSRS_PER_BYTE)
+static_assert(SVM_MSRS_PER_RANGE == 8192);
+
 #define MSR_INVALID				0xffffffffU
 
 #define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
 
+/* svm.c */
 extern bool dump_invalid_vmcb;
 
 u32 svm_msrpm_offset(u32 msr);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 10/32] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 11/32] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to
merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap.  This will eventually
allow for the removal of direct_access_msrs, as the only path where
tracking the offsets is truly justified is the merge for nested SVM, where
merging in chunks is an easy way to batch uaccess reads/writes.

Opportunistically omit the x2APIC MSRs from the merge-specific array
instead of filtering them out at runtime.

Note, disabling interception of DEBUGCTL, XSS, EFER, PAT, GHCB, and
TSC_AUX is mutually exclusive with nested virtualization, as KVM passes
through those MSRs only for SEV-ES guests, and KVM doesn't support nested
virtualization for SEV+ guests.  Defer removing those MSRs to a future
cleanup in order to make this refactoring as benign as possible.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 83 +++++++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.c    |  4 ++
 arch/x86/kvm/svm/svm.h    |  2 +
 3 files changed, 78 insertions(+), 11 deletions(-)
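
As a rough illustration of why there are fewer offsets than MSRs, the sketch
below reproduces the dedup loop in userspace.  The names chunk_of() and
collect_merge_offsets() are hypothetical, chunk_of() is a simplified stand-in
for svm_msrpm_offset(), and errors are reported as -1 rather than -EIO to keep
the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_INVALID		0xffffffffU
#define MSRS_PER_RANGE		8192
#define MSRS_PER_CHUNK		16	/* one u32 holds 2 bits for each of 16 MSRs */
#define CHUNKS_PER_RANGE	(MSRS_PER_RANGE / MSRS_PER_CHUNK)

static_assert(CHUNKS_PER_RANGE == 512, "2KiB per range, 4 bytes per chunk");

/* Hypothetical stand-in for svm_msrpm_offset(). */
static uint32_t chunk_of(uint32_t msr)
{
	static const uint32_t bases[] = { 0, 0xc0000000, 0xc0010000 };
	size_t i;

	for (i = 0; i < sizeof(bases) / sizeof(bases[0]); i++) {
		if (msr >= bases[i] && msr - bases[i] < MSRS_PER_RANGE)
			return (uint32_t)i * CHUNKS_PER_RANGE +
			       (msr - bases[i]) / MSRS_PER_CHUNK;
	}
	return MSR_INVALID;
}

/*
 * Record each distinct chunk touched by msrs[] in out[].  Returns the number
 * of distinct chunks, or -1 if an MSR is not covered or out[] is too small.
 */
static int collect_merge_offsets(const uint32_t *msrs, size_t nr_msrs,
				 uint32_t *out, size_t capacity)
{
	size_t nr = 0, i, j;

	for (i = 0; i < nr_msrs; i++) {
		uint32_t offset = chunk_of(msrs[i]);

		if (offset == MSR_INVALID)
			return -1;

		for (j = 0; j < nr; j++) {
			if (out[j] == offset)
				break;
		}
		if (j < nr)
			continue;	/* chunk already tracked */

		if (nr >= capacity)
			return -1;

		out[nr++] = offset;
	}
	return (int)nr;
}
```

For example, feeding it STAR, LSTAR, the three SYSENTER MSRs, SPEC_CTRL and
PRED_CMD (seven MSRs) produces only three chunks, because neighbouring MSRs
share a u32.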

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 89a77f0f1cc8..666469e11602 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -184,6 +184,75 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	}
 }
 
+/*
+ * This array (and its actual size) holds the set of offsets (indexing by chunk
+ * size) to process when merging vmcb12's MSRPM with vmcb01's MSRPM.  Note, the
+ * set of MSRs for which interception is disabled in vmcb01 is per-vCPU, e.g.
+ * based on CPUID features.  This array only tracks MSRs that *might* be passed
+ * through to the guest.
+ *
+ * Hardcode the capacity of the array based on the maximum number of _offsets_.
+ * MSRs are batched together, so there are fewer offsets than MSRs.
+ */
+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
+
+int __init nested_svm_init_msrpm_merge_offsets(void)
+{
+	static const u32 merge_msrs[] __initconst = {
+		MSR_STAR,
+		MSR_IA32_SYSENTER_CS,
+		MSR_IA32_SYSENTER_EIP,
+		MSR_IA32_SYSENTER_ESP,
+	#ifdef CONFIG_X86_64
+		MSR_GS_BASE,
+		MSR_FS_BASE,
+		MSR_KERNEL_GS_BASE,
+		MSR_LSTAR,
+		MSR_CSTAR,
+		MSR_SYSCALL_MASK,
+	#endif
+		MSR_IA32_SPEC_CTRL,
+		MSR_IA32_PRED_CMD,
+		MSR_IA32_FLUSH_CMD,
+		MSR_IA32_LASTBRANCHFROMIP,
+		MSR_IA32_LASTBRANCHTOIP,
+		MSR_IA32_LASTINTFROMIP,
+		MSR_IA32_LASTINTTOIP,
+
+		MSR_IA32_DEBUGCTLMSR,
+		MSR_IA32_XSS,
+		MSR_EFER,
+		MSR_IA32_CR_PAT,
+		MSR_AMD64_SEV_ES_GHCB,
+		MSR_TSC_AUX,
+	};
+	int i, j;
+
+	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
+		u32 offset = svm_msrpm_offset(merge_msrs[i]);
+
+		if (WARN_ON(offset == MSR_INVALID))
+			return -EIO;
+
+		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
+			if (nested_svm_msrpm_merge_offsets[j] == offset)
+				break;
+		}
+
+		if (j < nested_svm_nr_msrpm_merge_offsets)
+			continue;
+
+		if (WARN_ON(j >= ARRAY_SIZE(nested_svm_msrpm_merge_offsets)))
+			return -EIO;
+
+		nested_svm_msrpm_merge_offsets[j] = offset;
+		nested_svm_nr_msrpm_merge_offsets++;
+	}
+
+	return 0;
+}
+
 /*
  * Merge L0's (KVM) and L1's (Nested VMCB) MSR permission bitmaps. The function
  * is optimized in that it only merges the parts where KVM MSR permission bitmap
@@ -216,19 +285,11 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return true;
 
-	for (i = 0; i < MSRPM_OFFSETS; i++) {
-		u32 value, p;
+	for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
+		const int p = nested_svm_msrpm_merge_offsets[i];
+		u32 value;
 		u64 offset;
 
-		if (msrpm_offsets[i] == 0xffffffff)
-			break;
-
-		p      = msrpm_offsets[i];
-
-		/* x2apic msrs are intercepted always for the nested guest */
-		if (is_x2apic_msrpm_offset(p))
-			continue;
-
 		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
 
 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a683602cae22..1ee936b8a6d0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5543,6 +5543,10 @@ static __init int svm_hardware_setup(void)
 	if (nested) {
 		pr_info("Nested Virtualization enabled\n");
 		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
+
+		r = nested_svm_init_msrpm_merge_offsets();
+		if (r)
+			return r;
 	}
 
 	/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 086a8c8aae86..9f750b2399e9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -682,6 +682,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
 }
 
+int __init nested_svm_init_msrpm_merge_offsets(void);
+
 int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
 			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
 void svm_leave_nested(struct kvm_vcpu *vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 11/32] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (9 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 10/32] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 12/32] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Don't merge bitmaps on nested VMRUN for MSRs that KVM passes through only
for SEV-ES guests.  KVM doesn't support nested virtualization for SEV-ES,
and likely never will.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 666469e11602..360dbd80a728 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -194,7 +194,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
  * Hardcode the capacity of the array based on the maximum number of _offsets_.
  * MSRs are batched together, so there are fewer offsets than MSRs.
  */
-static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
+static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
 int __init nested_svm_init_msrpm_merge_offsets(void)
@@ -219,13 +219,6 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 		MSR_IA32_LASTBRANCHTOIP,
 		MSR_IA32_LASTINTFROMIP,
 		MSR_IA32_LASTINTTOIP,
-
-		MSR_IA32_DEBUGCTLMSR,
-		MSR_IA32_XSS,
-		MSR_EFER,
-		MSR_IA32_CR_PAT,
-		MSR_AMD64_SEV_ES_GHCB,
-		MSR_TSC_AUX,
 	};
 	int i, j;
 
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 12/32] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (10 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 11/32] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 13/32] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Don't initialize vmcb02's MSRPM with KVM's set of "always passthrough"
MSRs, as KVM always needs to consult L1's intercepts, i.e. needs to merge
vmcb01 with vmcb12 and write the result to vmcb02.  This will eventually
allow for the removal of svm_vcpu_init_msrpm().

Note, the bitmaps are truly initialized by svm_vcpu_alloc_msrpm() (default
to intercepting all MSRs), e.g. if there is a bug lurking elsewhere, the
worst case scenario from dropping the call to svm_vcpu_init_msrpm() should
be that KVM would fail to pass through MSRs to L2.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 1 -
 arch/x86/kvm/svm/svm.c    | 5 +++--
 arch/x86/kvm/svm/svm.h    | 1 -
 3 files changed, 3 insertions(+), 4 deletions(-)
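
To make the "worst case" argument concrete, here is a small userspace sketch
of the merge under simplifying assumptions: both source bitmaps are plain
arrays (the real code reads L1's bitmap from guest memory), and the helper
names are hypothetical.  A chunk in the vmcb02 bitmap only ever becomes less
restrictive through the merge, so anything the merge skips stays at the
intercept-everything default.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSRPM_U32S	(8192 / 4)	/* 8KiB bitmap viewed as u32 chunks */

static_assert(MSRPM_U32S == 2048, "8KiB bitmap is 2048 u32 chunks");

/* Default state: every bit set, i.e. intercept reads and writes of every MSR. */
static void msrpm_init_intercept_all(uint32_t *msrpm)
{
	memset(msrpm, 0xff, MSRPM_U32S * sizeof(*msrpm));
}

/*
 * For each tracked chunk, L2 accesses are intercepted if either L0 (msrpm01)
 * or L1 (msrpm12) wants the intercept, hence the bitwise OR.
 */
static void msrpm_merge(uint32_t *msrpm02, const uint32_t *msrpm01,
			const uint32_t *msrpm12,
			const uint32_t *offsets, int nr_offsets)
{
	int i;

	for (i = 0; i < nr_offsets; i++) {
		uint32_t p = offsets[i];

		msrpm02[p] = msrpm01[p] | msrpm12[p];
	}
}
```

If L0 clears both bits for an MSR but L1 clears only the read bit, the merged
chunk keeps the write intercept; chunks absent from offsets[] are never
written and remain all ones.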

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 360dbd80a728..cf148f7db887 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1285,7 +1285,6 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->nested.msrpm)
 		goto err_free_vmcb02;
-	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
 
 	svm->nested.initialized = true;
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1ee936b8a6d0..798d33a76796 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -889,8 +889,9 @@ u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
-void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
+static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
+	u32 *msrpm = to_svm(vcpu)->msrpm;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
@@ -1402,7 +1403,7 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
+	svm_vcpu_init_msrpm(vcpu);
 
 	svm_init_osvw(vcpu);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9f750b2399e9..bce66afafa11 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -633,7 +633,6 @@ extern bool dump_invalid_vmcb;
 
 u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
-void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
 void svm_vcpu_free_msrpm(u32 *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 13/32] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 12/32] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Add macro-built helpers for testing, setting, and clearing MSRPM entries
without relying on precomputed offsets.  This sets the stage for eventually
removing general KVM use of precomputed offsets, which are quite confusing
and rather inefficient for the vast majority of KVM's usage.

Outside of merging L0 and L1 bitmaps for nested SVM, using u32-indexed
offsets and accesses is at best unnecessary, and at worst introduces extra
operations to retrieve the individual bit from within the offset u32 value.
And simply calling them "offsets" is very confusing, as the "unit" of the
offset isn't immediately obvious.

Use the new helpers in set_msr_interception_bitmap() and
msr_write_intercepted() to verify the math and operations, but keep the
existing offset-based logic in set_msr_interception_bitmap() to sanity
check the "clear" and "set" operations.  Manipulating MSR interceptions
isn't a hot path and no kernel release is ever expected to contain this
specific version of set_msr_interception_bitmap() (it will be removed
entirely in the near future).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 30 ++++++++++++++--------------
 arch/x86/kvm/svm/svm.h | 44 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 16 deletions(-)
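
The claim that the new bit-number helpers agree with the old offset-based
math can be checked in userspace with something like the sketch below.  This
is an illustration under assumptions, not code from the patch:
msrpm_range_nr() and legacy_bit_nr() are hypothetical names, and treating
"u32 index * 32 + bit within the u32" as an absolute bit number assumes a
little-endian layout, which holds on x86.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE			8
#define SVM_MSRPM_BYTES_PER_RANGE	2048
#define SVM_BITS_PER_MSR		2
#define SVM_MSRS_PER_RANGE		8192
#define SVM_MSRPM_OFFSET_MASK		(SVM_MSRS_PER_RANGE - 1)
#define MSR_INVALID			0xffffffffU

static_assert((SVM_MSRS_PER_RANGE & SVM_MSRPM_OFFSET_MASK) == 0,
	      "range size must be a power of two for the mask to work");

/* Returns 0..2 for the three architectural ranges, or -1 if uncovered. */
static int msrpm_range_nr(uint32_t msr)
{
	switch (msr & ~(uint32_t)SVM_MSRPM_OFFSET_MASK) {
	case 0:			return 0;
	case 0xc0000000:	return 1;
	case 0xc0010000:	return 2;
	default:		return -1;
	}
}

/* New scheme: absolute bit number of the MSR's read bit (write is +1). */
static uint32_t msrpm_bit_nr(uint32_t msr)
{
	int range_nr = msrpm_range_nr(msr);

	if (range_nr < 0)
		return MSR_INVALID;

	return (uint32_t)range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +
	       (msr & SVM_MSRPM_OFFSET_MASK) * SVM_BITS_PER_MSR;
}

/* Old scheme: u32 chunk index, then 2 * (msr & 0xf) within that chunk. */
static uint32_t legacy_bit_nr(uint32_t msr)
{
	int range_nr = msrpm_range_nr(msr);
	uint32_t chunk;

	if (range_nr < 0)
		return MSR_INVALID;

	chunk = (uint32_t)range_nr * (SVM_MSRPM_BYTES_PER_RANGE / 4) +
		(msr & SVM_MSRPM_OFFSET_MASK) / 16;

	return chunk * 32 + 2 * (msr & 0x0f);
}
```

Both functions return 16640 for MSR_EFER (0xc0000080) and 746 for
MSR_IA32_SYSENTER_ESP (0x175), which is the equivalence the WARN_ON_ONCE() in
set_msr_interception_bitmap() sanity checks at runtime.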

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 798d33a76796..cd1e0ca964b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -802,11 +802,6 @@ static bool valid_msr_intercept(u32 index)
 
 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 {
-	u8 bit_write;
-	unsigned long tmp;
-	u32 offset;
-	u32 *msrpm;
-
 	/*
 	 * For non-nested case:
 	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
@@ -816,17 +811,10 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
 	 * save it.
 	 */
-	msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
-				      to_svm(vcpu)->msrpm;
+	void *msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm :
+					    to_svm(vcpu)->msrpm;
 
-	offset    = svm_msrpm_offset(msr);
-	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
-		return false;
-
-	bit_write = 2 * (msr & 0x0f) + 1;
-	tmp       = msrpm[offset];
-
-	return test_bit(bit_write, &tmp);
+	return svm_test_msr_bitmap_write(msrpm, msr);
 }
 
 static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
@@ -861,7 +849,17 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
 	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
 
-	msrpm[offset] = tmp;
+	if (read)
+		svm_clear_msr_bitmap_read((void *)msrpm, msr);
+	else
+		svm_set_msr_bitmap_read((void *)msrpm, msr);
+
+	if (write)
+		svm_clear_msr_bitmap_write((void *)msrpm, msr);
+	else
+		svm_set_msr_bitmap_write((void *)msrpm, msr);
+
+	WARN_ON_ONCE(msrpm[offset] != (u32)tmp);
 
 	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
 	svm->nested.force_msr_bitmap_recalc = true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bce66afafa11..a2be18579e09 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -623,9 +623,53 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
 #define SVM_MSRS_PER_BYTE (BITS_PER_BYTE / SVM_BITS_PER_MSR)
 #define SVM_MSRS_PER_RANGE (SVM_MSRPM_BYTES_PER_RANGE * SVM_MSRS_PER_BYTE)
 static_assert(SVM_MSRS_PER_RANGE == 8192);
+#define SVM_MSRPM_OFFSET_MASK (SVM_MSRS_PER_RANGE - 1)
 
 #define MSR_INVALID				0xffffffffU
 
+static __always_inline u32 svm_msrpm_bit_nr(u32 msr)
+{
+	int range_nr;
+
+	switch (msr & ~SVM_MSRPM_OFFSET_MASK) {
+	case 0:
+		range_nr = 0;
+		break;
+	case 0xc0000000:
+		range_nr = 1;
+		break;
+	case 0xc0010000:
+		range_nr = 2;
+		break;
+	default:
+		return MSR_INVALID;
+	}
+
+	return range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +
+	       (msr & SVM_MSRPM_OFFSET_MASK) * SVM_BITS_PER_MSR;
+}
+
+#define __BUILD_SVM_MSR_BITMAP_HELPER(rtype, action, bitop, access, bit_rw)	\
+static inline rtype svm_##action##_msr_bitmap_##access(unsigned long *bitmap,	\
+						       u32 msr)			\
+{										\
+	u32 bit_nr;								\
+										\
+	bit_nr = svm_msrpm_bit_nr(msr);						\
+	if (bit_nr == MSR_INVALID)								\
+		return (rtype)true;						\
+										\
+	return bitop##_bit(bit_nr + bit_rw, bitmap);				\
+}
+
+#define BUILD_SVM_MSR_BITMAP_HELPERS(ret_type, action, bitop)			\
+	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0)	\
+	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 1)
+
+BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
+BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
+BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
+
 #define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
 
 /* svm.c */
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (12 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 13/32] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  7:31   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 15/32] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
                   ` (19 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Add and use SVM MSR interception APIs (in most paths) to match VMX's
APIs and nomenclature.  Specifically, add SVM variants of:

        vmx_disable_intercept_for_msr(vcpu, msr, type)
        vmx_enable_intercept_for_msr(vcpu, msr, type)
        vmx_set_intercept_for_msr(vcpu, msr, type, intercept)

to eventually replace SVM's single helper:

        set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write)

which is awkward to use (in all cases, KVM either applies the same logic
for both reads and writes, or intercepts one of read or write), and is
unintuitive due to using '0' to indicate interception should be *set*.

Keep the guts of the old API for the moment to avoid churning the MSR
filter code, as that mess will be overhauled in the near future.  Leave
behind a temporary comment to call out that the shadow bitmaps have
inverted polarity relative to the bitmaps consumed by hardware.

No functional change intended.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 18 ++++----
 arch/x86/kvm/svm/svm.c | 99 +++++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h | 12 +++++
 3 files changed, 90 insertions(+), 39 deletions(-)
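
The polarity flip is easy to get wrong when converting call sites, so here is
a toy model of the mapping between the old and new styles.  Everything in it
is hypothetical: struct toy_msr and the toy_*() helpers exist only for this
sketch, the MSR_TYPE_* values are assumed to be the usual 1/2/3 encoding, and
the MSR filter and shadow bitmap handling are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_TYPE_R	1
#define MSR_TYPE_W	2
#define MSR_TYPE_RW	3

static_assert(MSR_TYPE_RW == (MSR_TYPE_R | MSR_TYPE_W), "RW is R and W combined");

/* Toy per-MSR state: bit 0 = read intercepted, bit 1 = write intercepted. */
struct toy_msr {
	uint8_t intercept;
};

static void toy_enable_intercept(struct toy_msr *m, int type)
{
	m->intercept = (uint8_t)(m->intercept | (type & MSR_TYPE_RW));
}

static void toy_disable_intercept(struct toy_msr *m, int type)
{
	m->intercept = (uint8_t)(m->intercept & ~(type & MSR_TYPE_RW));
}

static void toy_set_intercept(struct toy_msr *m, int type, bool enable_intercept)
{
	if (enable_intercept)
		toy_enable_intercept(m, type);
	else
		toy_disable_intercept(m, type);
}

/* Old-style call: '1' means "allow", i.e. do NOT intercept. */
static void toy_legacy_set(struct toy_msr *m, int allow_read, int allow_write)
{
	toy_set_intercept(m, MSR_TYPE_R, !allow_read);
	toy_set_intercept(m, MSR_TYPE_W, !allow_write);
}
```

In this model an old-style (0, 1) call leaves only the read intercept set,
which is why converted call sites negate the guest-capability check when they
switch to the "intercept" argument.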

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c2f840a0171..74dab69fb69e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4351,12 +4351,10 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		bool v_tsc_aux = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
-				 guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
-
-		set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
-	}
+	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX))
+		svm_set_intercept_for_msr(vcpu, MSR_TSC_AUX, MSR_TYPE_RW,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID));
 
 	/*
 	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
@@ -4372,9 +4370,9 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	 */
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_XSS, MSR_TYPE_RW);
 	else
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_XSS, MSR_TYPE_RW);
 }
 
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
@@ -4451,8 +4449,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
 	/* Clear intercepts on selected MSRs */
-	set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
+	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd1e0ca964b0..93d66109f495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -865,11 +865,53 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	svm->nested.force_msr_bitmap_recalc = true;
 }
 
-void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
-			  int read, int write)
+void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
-	set_shadow_msr_intercept(vcpu, msr, read, write);
-	set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *msrpm = svm->msrpm;
+
+	/* Note, the shadow intercept bitmaps have inverted polarity. */
+	set_shadow_msr_intercept(vcpu, msr, type & MSR_TYPE_R, type & MSR_TYPE_W);
+
+	/* Don't disable interception for MSRs userspace wants to handle. */
+	if ((type & MSR_TYPE_R) &&
+	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
+		svm_set_msr_bitmap_read(msrpm, msr);
+		type &= ~MSR_TYPE_R;
+	}
+
+	if ((type & MSR_TYPE_W) &&
+	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
+		svm_set_msr_bitmap_write(msrpm, msr);
+		type &= ~MSR_TYPE_W;
+	}
+
+	if (type & MSR_TYPE_R)
+		svm_clear_msr_bitmap_read(msrpm, msr);
+
+	if (type & MSR_TYPE_W)
+		svm_clear_msr_bitmap_write(msrpm, msr);
+
+	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+	svm->nested.force_msr_bitmap_recalc = true;
+}
+
+void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *msrpm = svm->msrpm;
+
+	set_shadow_msr_intercept(vcpu, msr,
+				 !(type & MSR_TYPE_R), !(type & MSR_TYPE_W));
+
+	if (type & MSR_TYPE_R)
+		svm_set_msr_bitmap_read(msrpm, msr);
+
+	if (type & MSR_TYPE_W)
+		svm_set_msr_bitmap_write(msrpm, msr);
+
+	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+	svm->nested.force_msr_bitmap_recalc = true;
 }
 
 u32 *svm_vcpu_alloc_msrpm(void)
@@ -889,13 +931,13 @@ u32 *svm_vcpu_alloc_msrpm(void)
 
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
-	u32 *msrpm = to_svm(vcpu)->msrpm;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (!direct_access_msrs[i].always)
 			continue;
-		set_msr_interception(vcpu, msrpm, direct_access_msrs[i].index, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, direct_access_msrs[i].index,
+					      MSR_TYPE_RW);
 	}
 }
 
@@ -915,8 +957,8 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		if ((index < APIC_BASE_MSR) ||
 		    (index > APIC_BASE_MSR + 0xff))
 			continue;
-		set_msr_interception(&svm->vcpu, svm->msrpm, index,
-				     !intercept, !intercept);
+
+		svm_set_intercept_for_msr(&svm->vcpu, index, MSR_TYPE_RW, intercept);
 	}
 
 	svm->x2avic_msrs_intercepted = intercept;
@@ -1004,13 +1046,13 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
 
 	if (sev_es_guest(vcpu->kvm))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
@@ -1024,10 +1066,10 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -1219,8 +1261,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_VMSAVE);
 		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -1232,8 +1274,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 		/* No need to intercept these MSRs */
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	}
 }
 
@@ -1367,9 +1409,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	 * of SPEC_CTRL, without waiting for the guest to access the MSR.
 	 */
 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
-				     guest_has_spec_ctrl_msr(vcpu),
-				     guest_has_spec_ctrl_msr(vcpu));
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
+					  !guest_has_spec_ctrl_msr(vcpu));
 
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
@@ -3136,7 +3177,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * We update the L1 MSR bit as well since it will end up
 		 * touching the MSR anyway now.
 		 */
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (!msr->host_initiated &&
@@ -4640,12 +4681,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	if (boot_cpu_has(X86_FEATURE_IBPB))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
-				     !!guest_has_pred_cmd_msr(vcpu));
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
 
 	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
-				     !!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a2be18579e09..5d5805ab59a7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -697,6 +697,18 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
 
+void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
+void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
+
+static inline void svm_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
+					     int type, bool enable_intercept)
+{
+	if (enable_intercept)
+		svm_enable_intercept_for_msr(vcpu, msr, type);
+	else
+		svm_disable_intercept_for_msr(vcpu, msr, type);
+}
+
 /* nested.c */
 
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
-- 
2.50.0.rc0.642.g800a2b2222-goog
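
The conversions above flip the polarity of the final argument: the old set_msr_interception() took read/write flags where 1 meant "pass through", while svm_set_intercept_for_msr() takes a bool where true means "intercept".  A minimal userspace sketch of that mapping, assuming a toy two-bit state in place of the real MSRPM (every toy_* name is hypothetical and is not KVM code):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the MSR_TYPE_* idea: one bit for reads, one for writes. */
enum { TOY_TYPE_R = 1, TOY_TYPE_W = 2, TOY_TYPE_RW = 3 };

/* Toy per-MSR state: a set bit means that access type is intercepted. */
struct toy_msr {
	unsigned int intercepted;
};

static void toy_enable_intercept(struct toy_msr *m, int type)
{
	m->intercepted |= (unsigned int)type;
}

static void toy_disable_intercept(struct toy_msr *m, int type)
{
	m->intercepted &= ~(unsigned int)type;
}

/* New-style helper: "true" means intercept. */
static void toy_set_intercept(struct toy_msr *m, int type,
			      bool enable_intercept)
{
	assert((type & ~TOY_TYPE_RW) == 0);

	if (enable_intercept)
		toy_enable_intercept(m, type);
	else
		toy_disable_intercept(m, type);
}

/* Old-style helper with inverted polarity: 1 means pass through. */
static void toy_set_msr_interception(struct toy_msr *m, int read, int write)
{
	toy_set_intercept(m, TOY_TYPE_R, !read);
	toy_set_intercept(m, TOY_TYPE_W, !write);
}
```

In this model, toy_set_msr_interception(m, 1, 1) and toy_set_intercept(m, TOY_TYPE_RW, false) leave the MSR in the same state, which is why each converted call site gains a "!" (or loses a "!!") in front of its guest-capability check.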



* [PATCH v2 15/32] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (13 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 16/32] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Disable interception of the GHCB MSR if and only if the VM is an SEV-ES
guest.  While the exact behavior is completely undocumented in the APM,
common sense and testing on SEV-ES capable CPUs say that accesses to the
GHCB MSR from non-SEV-ES guests will #GP.  I.e. from the guest's
perspective, no functional change intended.

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 3 ++-
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 74dab69fb69e..a020aa755a7e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4448,7 +4448,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
-	/* Clear intercepts on selected MSRs */
+	/* Clear intercepts on MSRs that are context switched by hardware. */
+	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
 	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
 	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 93d66109f495..7747f9bc3e9d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -110,7 +110,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_XSS,			.always = false },
 	{ .index = MSR_EFER,				.always = false },
 	{ .index = MSR_IA32_CR_PAT,			.always = false },
-	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = true  },
+	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = false },
 	{ .index = MSR_TSC_AUX,				.always = false },
 	{ .index = X2APIC_MSR(APIC_ID),			.always = false },
 	{ .index = X2APIC_MSR(APIC_LVR),		.always = false },
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 16/32] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (14 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 15/32] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Drop the "always" flag from the array of possible passthrough MSRs, and
instead manually initialize the permissions for the handful of MSRs that
KVM passes through by default.  In addition to cutting down on boilerplate
copy+paste code and eliminating a misleading flag (the MSRs aren't always
passed through, e.g. thanks to MSR filters), this will allow for removing
the direct_access_msrs array entirely.
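
As a rough illustration of the change, the sketch below models both initialization styles in userspace, assuming a three-entry table and a toy vCPU with one intercept flag per slot.  The toy_* names are hypothetical; only the MSR indices and the true/false values of the three sample entries are taken from real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR indices, used here only as sample keys. */
#define MSR_IA32_SYSENTER_CS	0x00000174
#define MSR_IA32_SYSENTER_ESP	0x00000175
#define MSR_STAR		0xc0000081

#define TOY_NR_MSRS	3

/* Old shape: every entry carries an "always" flag. */
struct toy_direct_access_msr {
	uint32_t index;
	bool always;
};

static const struct toy_direct_access_msr toy_flagged[TOY_NR_MSRS] = {
	{ .index = MSR_STAR,              .always = true  },
	{ .index = MSR_IA32_SYSENTER_CS,  .always = true  },
	{ .index = MSR_IA32_SYSENTER_ESP, .always = false },
};

/* Toy vCPU: one intercept flag per table slot. */
struct toy_vcpu {
	bool intercepted[TOY_NR_MSRS];
};

/* Start from "intercept everything", as a freshly filled bitmap would. */
static void toy_vcpu_reset(struct toy_vcpu *v)
{
	for (size_t i = 0; i < TOY_NR_MSRS; i++)
		v->intercepted[i] = true;
}

static void toy_disable_intercept(struct toy_vcpu *v, uint32_t msr)
{
	for (size_t i = 0; i < TOY_NR_MSRS; i++) {
		if (toy_flagged[i].index == msr) {
			v->intercepted[i] = false;
			return;
		}
	}
	assert(0 && "unknown MSR");
}

/* Old init: walk the table and honor the flag. */
static void toy_init_from_flags(struct toy_vcpu *v)
{
	toy_vcpu_reset(v);
	for (size_t i = 0; i < TOY_NR_MSRS; i++) {
		if (toy_flagged[i].always)
			toy_disable_intercept(v, toy_flagged[i].index);
	}
}

/* New init: name the default passthrough MSRs directly. */
static void toy_init_explicit(struct toy_vcpu *v)
{
	toy_vcpu_reset(v);
	toy_disable_intercept(v, MSR_STAR);
	toy_disable_intercept(v, MSR_IA32_SYSENTER_CS);
}
```

Both functions leave the toy vCPU in the same state, so the flag carries no information that the explicit calls do not, and the table is reduced to a plain list of indices.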

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 124 ++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7747f9bc3e9d..4ee92e444dde 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -83,51 +83,48 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
 #define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
 
-static const struct svm_direct_access_msrs {
-	u32 index;   /* Index of the MSR */
-	bool always; /* True if intercept is initially cleared */
-} direct_access_msrs[] = {
-	{ .index = MSR_STAR,				.always = true  },
-	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
-	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
-	{ .index = MSR_IA32_SYSENTER_ESP,		.always = false },
+static const u32 direct_access_msrs[] = {
+	MSR_STAR,
+	MSR_IA32_SYSENTER_CS,
+	MSR_IA32_SYSENTER_EIP,
+	MSR_IA32_SYSENTER_ESP,
 #ifdef CONFIG_X86_64
-	{ .index = MSR_GS_BASE,				.always = true  },
-	{ .index = MSR_FS_BASE,				.always = true  },
-	{ .index = MSR_KERNEL_GS_BASE,			.always = true  },
-	{ .index = MSR_LSTAR,				.always = true  },
-	{ .index = MSR_CSTAR,				.always = true  },
-	{ .index = MSR_SYSCALL_MASK,			.always = true  },
+	MSR_GS_BASE,
+	MSR_FS_BASE,
+	MSR_KERNEL_GS_BASE,
+	MSR_LSTAR,
+	MSR_CSTAR,
+	MSR_SYSCALL_MASK,
 #endif
-	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
-	{ .index = MSR_IA32_PRED_CMD,			.always = false },
-	{ .index = MSR_IA32_FLUSH_CMD,			.always = false },
-	{ .index = MSR_IA32_DEBUGCTLMSR,		.always = false },
-	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
-	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
-	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
-	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
-	{ .index = MSR_IA32_XSS,			.always = false },
-	{ .index = MSR_EFER,				.always = false },
-	{ .index = MSR_IA32_CR_PAT,			.always = false },
-	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = false },
-	{ .index = MSR_TSC_AUX,				.always = false },
-	{ .index = X2APIC_MSR(APIC_ID),			.always = false },
-	{ .index = X2APIC_MSR(APIC_LVR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TASKPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ARBPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_PROCPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_EOI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_RRR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LDR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_DFR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_SPIV),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ISR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_IRR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ESR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ICR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ICR2),		.always = false },
+	MSR_IA32_SPEC_CTRL,
+	MSR_IA32_PRED_CMD,
+	MSR_IA32_FLUSH_CMD,
+	MSR_IA32_DEBUGCTLMSR,
+	MSR_IA32_LASTBRANCHFROMIP,
+	MSR_IA32_LASTBRANCHTOIP,
+	MSR_IA32_LASTINTFROMIP,
+	MSR_IA32_LASTINTTOIP,
+	MSR_IA32_XSS,
+	MSR_EFER,
+	MSR_IA32_CR_PAT,
+	MSR_AMD64_SEV_ES_GHCB,
+	MSR_TSC_AUX,
+	X2APIC_MSR(APIC_ID),
+	X2APIC_MSR(APIC_LVR),
+	X2APIC_MSR(APIC_TASKPRI),
+	X2APIC_MSR(APIC_ARBPRI),
+	X2APIC_MSR(APIC_PROCPRI),
+	X2APIC_MSR(APIC_EOI),
+	X2APIC_MSR(APIC_RRR),
+	X2APIC_MSR(APIC_LDR),
+	X2APIC_MSR(APIC_DFR),
+	X2APIC_MSR(APIC_SPIV),
+	X2APIC_MSR(APIC_ISR),
+	X2APIC_MSR(APIC_TMR),
+	X2APIC_MSR(APIC_IRR),
+	X2APIC_MSR(APIC_ESR),
+	X2APIC_MSR(APIC_ICR),
+	X2APIC_MSR(APIC_ICR2),
 
 	/*
 	 * Note:
@@ -136,14 +133,14 @@ static const struct svm_direct_access_msrs {
 	 * the AVIC hardware would generate GP fault. Therefore, always
 	 * intercept the MSR 0x832, and do not setup direct_access_msr.
 	 */
-	{ .index = X2APIC_MSR(APIC_LVTTHMR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVTPC),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVT0),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVT1),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVTERR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
+	X2APIC_MSR(APIC_LVTTHMR),
+	X2APIC_MSR(APIC_LVTPC),
+	X2APIC_MSR(APIC_LVT0),
+	X2APIC_MSR(APIC_LVT1),
+	X2APIC_MSR(APIC_LVTERR),
+	X2APIC_MSR(APIC_TMICT),
+	X2APIC_MSR(APIC_TMCCT),
+	X2APIC_MSR(APIC_TDCR),
 };
 
 static_assert(ARRAY_SIZE(direct_access_msrs) ==
@@ -767,7 +764,7 @@ static int direct_access_msr_slot(u32 msr)
 	u32 i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (direct_access_msrs[i].index == msr)
+		if (direct_access_msrs[i] == msr)
 			return i;
 	}
 
@@ -931,14 +928,17 @@ u32 *svm_vcpu_alloc_msrpm(void)
 
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
-	int i;
+	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
 
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (!direct_access_msrs[i].always)
-			continue;
-		svm_disable_intercept_for_msr(vcpu, direct_access_msrs[i].index,
-					      MSR_TYPE_RW);
-	}
+#ifdef CONFIG_X86_64
+	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
+#endif
 }
 
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
@@ -952,7 +952,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		return;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		int index = direct_access_msrs[i].index;
+		int index = direct_access_msrs[i];
 
 		if ((index < APIC_BASE_MSR) ||
 		    (index > APIC_BASE_MSR + 0xff))
@@ -980,7 +980,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	 * back in sync after this.
 	 */
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 msr = direct_access_msrs[i].index;
+		u32 msr = direct_access_msrs[i];
 		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
 		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
 
@@ -1020,7 +1020,7 @@ static __init int init_msrpm_offsets(void)
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 offset;
 
-		offset = svm_msrpm_offset(direct_access_msrs[i].index);
+		offset = svm_msrpm_offset(direct_access_msrs[i]);
 		if (WARN_ON(offset == MSR_INVALID))
 			return -EIO;
 
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (15 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 16/32] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  2:29   ` Mi, Dapeng
  2025-06-10 22:57 ` [PATCH v2 18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
                   ` (16 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Dedup the definition of X2APIC_MSR and put it in the local APIC code
where it belongs.

No functional change intended.
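
The two copies being merged differ in one detail: the svm.c copy does not parenthesize its argument, while the vmx.h copy (the form that lands in lapic.h) does.  The sketch below shows where the two forms diverge.  The conditional-expression argument is a hypothetical caller, chosen only because "?:" binds more loosely than ">>"; the callers visible in this series pass plain constants, for which both forms agree.

```c
#include <assert.h>

/* Values as defined in the kernel's apicdef.h. */
#define APIC_BASE_MSR	0x800
#define APIC_ID		0x20
#define APIC_TASKPRI	0x80

/* Form removed from svm.c: the argument is not parenthesized. */
#define X2APIC_MSR_UNPAREN(x)	(APIC_BASE_MSR + (x >> 4))

/* Form kept in lapic.h. */
#define X2APIC_MSR(r)		(APIC_BASE_MSR + ((r) >> 4))

/* With plain constants, the two forms are interchangeable. */
static_assert(X2APIC_MSR(APIC_TASKPRI) == 0x808, "TPR is MSR 0x808");
static_assert(X2APIC_MSR_UNPAREN(APIC_TASKPRI) == 0x808, "same result");

static unsigned int msr_unparen(int want_tpr)
{
	/* The shift applies only to the APIC_ID arm of the conditional. */
	return X2APIC_MSR_UNPAREN(want_tpr ? APIC_TASKPRI : APIC_ID);
}

static unsigned int msr_paren(int want_tpr)
{
	/* The shift applies to whichever register offset is selected. */
	return X2APIC_MSR(want_tpr ? APIC_TASKPRI : APIC_ID);
}
```

Here msr_paren(1) yields 0x808, whereas msr_unparen(1) yields 0x880 because the unshifted offset 0x80 is added to the base.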

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.h   | 2 ++
 arch/x86/kvm/svm/svm.c | 2 --
 arch/x86/kvm/vmx/vmx.h | 2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4ce30db65828..4518b4e0552f 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -21,6 +21,8 @@
 #define APIC_BROADCAST			0xFF
 #define X2APIC_BROADCAST		0xFFFFFFFFul
 
+#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
+
 enum lapic_mode {
 	LAPIC_MODE_DISABLED = 0,
 	LAPIC_MODE_INVALID = X2APIC_ENABLE,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4ee92e444dde..900a1303e0e7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -81,8 +81,6 @@ static uint64_t osvw_len = 4, osvw_status;
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
-#define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
-
 static const u32 direct_access_msrs[] = {
 	MSR_STAR,
 	MSR_IA32_SYSENTER_CS,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..0afe97e3478f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -19,8 +19,6 @@
 #include "../mmu.h"
 #include "common.h"
 
-#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
-
 #ifdef CONFIG_X86_64
 #define MAX_NR_USER_RETURN_MSRS	7
 #else
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (16 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  6:52   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 19/32] KVM: SVM: " Sean Christopherson
                   ` (15 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

On a userspace MSR filter change, recalculate all MSR intercepts using the
filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
desired intercepts.  The shadow bitmaps add yet another point of failure,
are confusing (e.g. what does "handled specially" mean!?!?), and are both
an eyesore and a maintenance burden.

Given that KVM *must* be able to recalculate the correct intercepts at any
given time, and that MSR filter updates are not hot paths, there is zero
benefit to maintaining the shadow bitmaps.

Opportunistically switch from boot_cpu_has() to cpu_feature_enabled() as
appropriate.
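
The idea can be sketched in userspace as follows, assuming a toy vCPU with two MSRs, a per-MSR "filter denies" flag standing in for the userspace filter, and a single boolean standing in for guest feature state.  All names are hypothetical, and the condition used for the second MSR is only a placeholder for the real per-MSR checks.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_MSR_TSC, TOY_MSR_SPEC_CTRL, TOY_NR_MSRS };

struct toy_vcpu {
	bool feature_exposed;			/* stand-in for guest feature state */
	bool filter_denies[TOY_NR_MSRS];	/* stand-in for the userspace filter */
	bool intercepted[TOY_NR_MSRS];		/* stand-in for the MSR bitmap */
};

static void toy_enable_intercept(struct toy_vcpu *v, int msr)
{
	assert(msr >= 0 && msr < TOY_NR_MSRS);
	v->intercepted[msr] = true;
}

/* Passthrough is only a request: a filtered MSR stays intercepted. */
static void toy_disable_intercept(struct toy_vcpu *v, int msr)
{
	assert(msr >= 0 && msr < TOY_NR_MSRS);
	v->intercepted[msr] = v->filter_denies[msr];
}

/*
 * Recompute every intercept from current state.  No copy of the "desired"
 * state is kept, because this function is the desired state.
 */
static void toy_recalc_msr_intercepts(struct toy_vcpu *v)
{
	toy_disable_intercept(v, TOY_MSR_TSC);

	if (v->feature_exposed)
		toy_disable_intercept(v, TOY_MSR_SPEC_CTRL);
	else
		toy_enable_intercept(v, TOY_MSR_SPEC_CTRL);
}
```

Calling toy_recalc_msr_intercepts() after any change to the filter or to the feature state brings the bitmap back in sync, including when a previously denied MSR becomes allowed again.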

Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 183 +++++++++++------------------------------
 arch/x86/kvm/vmx/vmx.h |   7 --
 2 files changed, 46 insertions(+), 144 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8f7fe04a1998..ce7a1c07e402 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -166,31 +166,6 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
 	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
 	RTIT_STATUS_BYTECNT))
 
-/*
- * List of MSRs that can be directly passed to the guest.
- * In addition to these x2apic, PT and LBR MSRs are handled specially.
- */
-static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
-	MSR_IA32_SPEC_CTRL,
-	MSR_IA32_PRED_CMD,
-	MSR_IA32_FLUSH_CMD,
-	MSR_IA32_TSC,
-#ifdef CONFIG_X86_64
-	MSR_FS_BASE,
-	MSR_GS_BASE,
-	MSR_KERNEL_GS_BASE,
-	MSR_IA32_XFD,
-	MSR_IA32_XFD_ERR,
-#endif
-	MSR_IA32_SYSENTER_CS,
-	MSR_IA32_SYSENTER_ESP,
-	MSR_IA32_SYSENTER_EIP,
-	MSR_CORE_C1_RES,
-	MSR_CORE_C3_RESIDENCY,
-	MSR_CORE_C6_RESIDENCY,
-	MSR_CORE_C7_RESIDENCY,
-};
-
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -672,40 +647,6 @@ static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu)
 	return flexpriority_enabled && lapic_in_kernel(vcpu);
 }
 
-static int vmx_get_passthrough_msr_slot(u32 msr)
-{
-	int i;
-
-	switch (msr) {
-	case 0x800 ... 0x8ff:
-		/* x2APIC MSRs. These are handled in vmx_update_msr_bitmap_x2apic() */
-		return -ENOENT;
-	case MSR_IA32_RTIT_STATUS:
-	case MSR_IA32_RTIT_OUTPUT_BASE:
-	case MSR_IA32_RTIT_OUTPUT_MASK:
-	case MSR_IA32_RTIT_CR3_MATCH:
-	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
-		/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
-	case MSR_LBR_SELECT:
-	case MSR_LBR_TOS:
-	case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
-	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
-	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
-	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
-	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
-		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
-		return -ENOENT;
-	}
-
-	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
-		if (vmx_possible_passthrough_msrs[i] == msr)
-			return i;
-	}
-
-	WARN(1, "Invalid MSR %x, please adapt vmx_possible_passthrough_msrs[]", msr);
-	return -ENOENT;
-}
-
 struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -4015,25 +3956,12 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	int idx;
 
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
 	vmx_msr_bitmap_l01_changed(vmx);
 
-	/*
-	 * Mark the desired intercept state in shadow bitmap, this is needed
-	 * for resync when the MSR filters change.
-	 */
-	idx = vmx_get_passthrough_msr_slot(msr);
-	if (idx >= 0) {
-		if (type & MSR_TYPE_R)
-			__clear_bit(idx, vmx->shadow_msr_intercept.read);
-		if (type & MSR_TYPE_W)
-			__clear_bit(idx, vmx->shadow_msr_intercept.write);
-	}
-
 	if ((type & MSR_TYPE_R) &&
 	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
 		vmx_set_msr_bitmap_read(msr_bitmap, msr);
@@ -4057,25 +3985,12 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	int idx;
 
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
 	vmx_msr_bitmap_l01_changed(vmx);
 
-	/*
-	 * Mark the desired intercept state in shadow bitmap, this is needed
-	 * for resync when the MSR filter changes.
-	 */
-	idx = vmx_get_passthrough_msr_slot(msr);
-	if (idx >= 0) {
-		if (type & MSR_TYPE_R)
-			__set_bit(idx, vmx->shadow_msr_intercept.read);
-		if (type & MSR_TYPE_W)
-			__set_bit(idx, vmx->shadow_msr_intercept.write);
-	}
-
 	if (type & MSR_TYPE_R)
 		vmx_set_msr_bitmap_read(msr_bitmap, msr);
 
@@ -4159,35 +4074,58 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 i;
-
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
-	/*
-	 * Redo intercept permissions for MSRs that KVM is passing through to
-	 * the guest.  Disabling interception will check the new MSR filter and
-	 * ensure that KVM enables interception if usersepace wants to filter
-	 * the MSR.  MSRs that KVM is already intercepting don't need to be
-	 * refreshed since KVM is going to intercept them regardless of what
-	 * userspace wants.
-	 */
-	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
-		u32 msr = vmx_possible_passthrough_msrs[i];
-
-		if (!test_bit(i, vmx->shadow_msr_intercept.read))
-			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_R);
-
-		if (!test_bit(i, vmx->shadow_msr_intercept.write))
-			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_W);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
+#ifdef CONFIG_X86_64
+	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+#endif
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	if (kvm_cstate_in_guest(vcpu->kvm)) {
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
 	}
 
 	/* PT MSRs can be passed through iff PT is exposed to the guest. */
 	if (vmx_pt_mode_is_host_guest())
 		pt_update_intercept_for_msr(vcpu);
+
+	if (vcpu->arch.xfd_no_write_intercept)
+		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
+				  !to_vmx(vcpu)->spec_ctrl);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
+
+	if (cpu_feature_enabled(X86_FEATURE_IBPB))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
+
+	if (cpu_feature_enabled(X86_FEATURE_FLUSH_L1D))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
+	/*
+	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
+	 * filtered by userspace.
+	 */
+}
+
+void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+{
+	vmx_recalc_msr_intercepts(vcpu);
 }
 
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
@@ -7537,26 +7475,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 		evmcs->hv_enlightenments_control.msr_bitmap = 1;
 	}
 
-	/* The MSR bitmap starts with all ones */
-	bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-	bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
-#ifdef CONFIG_X86_64
-	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-#endif
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-	if (kvm_cstate_in_guest(vcpu->kvm)) {
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
-	}
-
 	vmx->loaded_vmcs = &vmx->vmcs01;
 
 	if (cpu_need_virtualize_apic_accesses(vcpu)) {
@@ -7842,18 +7760,6 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
-
-	if (boot_cpu_has(X86_FEATURE_IBPB))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
-					  !guest_has_pred_cmd_msr(vcpu));
-
-	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
-
 	set_cr4_guest_host_mask(vmx);
 
 	vmx_write_encls_bitmap(vcpu, NULL);
@@ -7869,6 +7775,9 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;
 
+	/* Recalc MSR interception to account for feature changes. */
+	vmx_recalc_msr_intercepts(vcpu);
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0afe97e3478f..a26fe3d9e1d2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -294,13 +294,6 @@ struct vcpu_vmx {
 	struct pt_desc pt_desc;
 	struct lbr_desc lbr_desc;
 
-	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
-	struct {
-		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-	} shadow_msr_intercept;
-
 	/* ve_info must be page aligned. */
 	struct vmx_ve_information *ve_info;
 };
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 19/32] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (17 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

On a userspace MSR filter change, recalculate all MSR intercepts using the
filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
desired intercepts.  The shadow bitmaps add yet another point of failure,
are confusing (e.g. what does "handled specially" mean!?!?), and are both
an eyesore and a maintenance burden.

Given that KVM *must* be able to recalculate the correct intercepts at any
given time, and that MSR filter updates are not hot paths, there is zero
benefit to maintaining the shadow bitmaps.

Opportunistically switch from boot_cpu_has() to cpu_feature_enabled() as
appropriate.
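
For readers unfamiliar with the SVM bitmap layout, the set_msr_interception_bitmap() helper removed by this patch computes the read bit as 2 * (msr & 0x0f) and the write bit as one above that, within a 32-bit MSRPM word.  A minimal sketch of just that arithmetic, assuming a single standalone word (locating the word for a given MSR is not modeled, and the toy_* names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two bits per MSR, so one 32-bit word covers 16 consecutive MSRs. */
static unsigned int toy_msrpm_read_bit(uint32_t msr)
{
	return 2 * (msr & 0x0f);
}

static unsigned int toy_msrpm_write_bit(uint32_t msr)
{
	return 2 * (msr & 0x0f) + 1;
}

/* A set bit means the access is intercepted; a clear bit passes it through. */
static void toy_msrpm_update(uint32_t *word, uint32_t msr,
			     bool intercept_read, bool intercept_write)
{
	uint32_t r = UINT32_C(1) << toy_msrpm_read_bit(msr);
	uint32_t w = UINT32_C(1) << toy_msrpm_write_bit(msr);

	assert(word);

	*word = intercept_read ? (*word | r) : (*word & ~r);
	*word = intercept_write ? (*word | w) : (*word & ~w);
}
```

For example, MSR 0x175 has a low nibble of 5, so its read and write intercepts live in bits 10 and 11 of its word.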

Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
Cc: Francesco Lavra <francescolavra.fl@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c |  16 +-
 arch/x86/kvm/svm/svm.c | 373 +++++++++++------------------------------
 arch/x86/kvm/svm/svm.h |  10 +-
 3 files changed, 108 insertions(+), 291 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a020aa755a7e..6282c2930cda 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4347,9 +4347,12 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 				    count, in);
 }
 
-static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = &svm->vcpu;
+	/* Clear intercepts on MSRs that are context switched by hardware. */
+	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 
 	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX))
 		svm_set_intercept_for_msr(vcpu, MSR_TSC_AUX, MSR_TYPE_RW,
@@ -4384,16 +4387,12 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
 	if (best)
 		vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
-
-	if (sev_es_guest(svm->vcpu.kvm))
-		sev_es_vcpu_after_set_cpuid(svm);
 }
 
 static void sev_es_init_vmcb(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
-	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
 
@@ -4447,11 +4446,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
-
-	/* Clear intercepts on MSRs that are context switched by hardware. */
-	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 900a1303e0e7..de3d59c71229 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -71,8 +71,6 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 
 static bool erratum_383_found __read_mostly;
 
-u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
-
 /*
  * Set osvw_len to higher value when updated Revision Guides
  * are published and we know what the new status bits are
@@ -81,70 +79,6 @@ static uint64_t osvw_len = 4, osvw_status;
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
-static const u32 direct_access_msrs[] = {
-	MSR_STAR,
-	MSR_IA32_SYSENTER_CS,
-	MSR_IA32_SYSENTER_EIP,
-	MSR_IA32_SYSENTER_ESP,
-#ifdef CONFIG_X86_64
-	MSR_GS_BASE,
-	MSR_FS_BASE,
-	MSR_KERNEL_GS_BASE,
-	MSR_LSTAR,
-	MSR_CSTAR,
-	MSR_SYSCALL_MASK,
-#endif
-	MSR_IA32_SPEC_CTRL,
-	MSR_IA32_PRED_CMD,
-	MSR_IA32_FLUSH_CMD,
-	MSR_IA32_DEBUGCTLMSR,
-	MSR_IA32_LASTBRANCHFROMIP,
-	MSR_IA32_LASTBRANCHTOIP,
-	MSR_IA32_LASTINTFROMIP,
-	MSR_IA32_LASTINTTOIP,
-	MSR_IA32_XSS,
-	MSR_EFER,
-	MSR_IA32_CR_PAT,
-	MSR_AMD64_SEV_ES_GHCB,
-	MSR_TSC_AUX,
-	X2APIC_MSR(APIC_ID),
-	X2APIC_MSR(APIC_LVR),
-	X2APIC_MSR(APIC_TASKPRI),
-	X2APIC_MSR(APIC_ARBPRI),
-	X2APIC_MSR(APIC_PROCPRI),
-	X2APIC_MSR(APIC_EOI),
-	X2APIC_MSR(APIC_RRR),
-	X2APIC_MSR(APIC_LDR),
-	X2APIC_MSR(APIC_DFR),
-	X2APIC_MSR(APIC_SPIV),
-	X2APIC_MSR(APIC_ISR),
-	X2APIC_MSR(APIC_TMR),
-	X2APIC_MSR(APIC_IRR),
-	X2APIC_MSR(APIC_ESR),
-	X2APIC_MSR(APIC_ICR),
-	X2APIC_MSR(APIC_ICR2),
-
-	/*
-	 * Note:
-	 * AMD does not virtualize APIC TSC-deadline timer mode, but it is
-	 * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
-	 * the AVIC hardware would generate GP fault. Therefore, always
-	 * intercept the MSR 0x832, and do not setup direct_access_msr.
-	 */
-	X2APIC_MSR(APIC_LVTTHMR),
-	X2APIC_MSR(APIC_LVTPC),
-	X2APIC_MSR(APIC_LVT0),
-	X2APIC_MSR(APIC_LVT1),
-	X2APIC_MSR(APIC_LVTERR),
-	X2APIC_MSR(APIC_TMICT),
-	X2APIC_MSR(APIC_TMCCT),
-	X2APIC_MSR(APIC_TDCR),
-};
-
-static_assert(ARRAY_SIZE(direct_access_msrs) ==
-	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
-#undef MAX_DIRECT_ACCESS_MSRS
-
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * pause_filter_count: On processors that support Pause filtering(indicated
@@ -757,44 +691,6 @@ static void clr_dr_intercepts(struct vcpu_svm *svm)
 	recalc_intercepts(svm);
 }
 
-static int direct_access_msr_slot(u32 msr)
-{
-	u32 i;
-
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (direct_access_msrs[i] == msr)
-			return i;
-	}
-
-	return -ENOENT;
-}
-
-static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
-				     int write)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	int slot = direct_access_msr_slot(msr);
-
-	if (slot == -ENOENT)
-		return;
-
-	/* Set the shadow bitmaps to the desired intercept states */
-	if (read)
-		__set_bit(slot, svm->shadow_msr_intercept.read);
-	else
-		__clear_bit(slot, svm->shadow_msr_intercept.read);
-
-	if (write)
-		__set_bit(slot, svm->shadow_msr_intercept.write);
-	else
-		__clear_bit(slot, svm->shadow_msr_intercept.write);
-}
-
-static bool valid_msr_intercept(u32 index)
-{
-	return direct_access_msr_slot(index) != -ENOENT;
-}
-
 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 {
 	/*
@@ -812,62 +708,11 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	return svm_test_msr_bitmap_write(msrpm, msr);
 }
 
-static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
-					u32 msr, int read, int write)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u8 bit_read, bit_write;
-	unsigned long tmp;
-	u32 offset;
-
-	/*
-	 * If this warning triggers extend the direct_access_msrs list at the
-	 * beginning of the file
-	 */
-	WARN_ON(!valid_msr_intercept(msr));
-
-	/* Enforce non allowed MSRs to trap */
-	if (read && !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
-		read = 0;
-
-	if (write && !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
-		write = 0;
-
-	offset    = svm_msrpm_offset(msr);
-	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
-		return;
-
-	bit_read  = 2 * (msr & 0x0f);
-	bit_write = 2 * (msr & 0x0f) + 1;
-	tmp       = msrpm[offset];
-
-	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
-	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
-
-	if (read)
-		svm_clear_msr_bitmap_read((void *)msrpm, msr);
-	else
-		svm_set_msr_bitmap_read((void *)msrpm, msr);
-
-	if (write)
-		svm_clear_msr_bitmap_write((void *)msrpm, msr);
-	else
-		svm_set_msr_bitmap_write((void *)msrpm, msr);
-
-	WARN_ON_ONCE(msrpm[offset] != (u32)tmp);
-
-	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
-	svm->nested.force_msr_bitmap_recalc = true;
-}
-
 void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
 
-	/* Note, the shadow intercept bitmaps have inverted polarity. */
-	set_shadow_msr_intercept(vcpu, msr, type & MSR_TYPE_R, type & MSR_TYPE_W);
-
 	/* Don't disable interception for MSRs userspace wants to handle. */
 	if ((type & MSR_TYPE_R) &&
 	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
@@ -896,9 +741,6 @@ void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
 
-	set_shadow_msr_intercept(vcpu, msr,
-				 !(type & MSR_TYPE_R), !(type & MSR_TYPE_W));
-
 	if (type & MSR_TYPE_R)
 		svm_set_msr_bitmap_read(msrpm, msr);
 
@@ -924,6 +766,19 @@ u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
+static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool intercept = !(to_svm(vcpu)->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK);
+
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW, intercept);
+
+	if (sev_es_guest(vcpu->kvm))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW, intercept);
+}
+
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
 	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
@@ -941,6 +796,38 @@ static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 {
+	static const u32 x2avic_passthrough_msrs[] = {
+		X2APIC_MSR(APIC_ID),
+		X2APIC_MSR(APIC_LVR),
+		X2APIC_MSR(APIC_TASKPRI),
+		X2APIC_MSR(APIC_ARBPRI),
+		X2APIC_MSR(APIC_PROCPRI),
+		X2APIC_MSR(APIC_EOI),
+		X2APIC_MSR(APIC_RRR),
+		X2APIC_MSR(APIC_LDR),
+		X2APIC_MSR(APIC_DFR),
+		X2APIC_MSR(APIC_SPIV),
+		X2APIC_MSR(APIC_ISR),
+		X2APIC_MSR(APIC_TMR),
+		X2APIC_MSR(APIC_IRR),
+		X2APIC_MSR(APIC_ESR),
+		X2APIC_MSR(APIC_ICR),
+		X2APIC_MSR(APIC_ICR2),
+
+		/*
+		 * Note!  Always intercept LVTT, as TSC-deadline timer mode
+		 * isn't virtualized by hardware, and the CPU will generate a
+		 * #GP instead of a #VMEXIT.
+		 */
+		X2APIC_MSR(APIC_LVTTHMR),
+		X2APIC_MSR(APIC_LVTPC),
+		X2APIC_MSR(APIC_LVT0),
+		X2APIC_MSR(APIC_LVT1),
+		X2APIC_MSR(APIC_LVTERR),
+		X2APIC_MSR(APIC_TMICT),
+		X2APIC_MSR(APIC_TMCCT),
+		X2APIC_MSR(APIC_TDCR),
+	};
 	int i;
 
 	if (intercept == svm->x2avic_msrs_intercepted)
@@ -949,15 +836,9 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	if (!x2avic_enabled)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		int index = direct_access_msrs[i];
-
-		if ((index < APIC_BASE_MSR) ||
-		    (index > APIC_BASE_MSR + 0xff))
-			continue;
-
-		svm_set_intercept_for_msr(&svm->vcpu, index, MSR_TYPE_RW, intercept);
-	}
+	for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
+		svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
+					  MSR_TYPE_RW, intercept);
 
 	svm->x2avic_msrs_intercepted = intercept;
 }
@@ -967,65 +848,57 @@ void svm_vcpu_free_msrpm(u32 *msrpm)
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
 
+static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm_vcpu_init_msrpm(vcpu);
+
+	if (lbrv)
+		svm_recalc_lbr_msr_intercepts(vcpu);
+
+	if (cpu_feature_enabled(X86_FEATURE_IBPB))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
+
+	if (cpu_feature_enabled(X86_FEATURE_FLUSH_L1D))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
+	/*
+	 * Disable interception of SPEC_CTRL if KVM doesn't need to manually
+	 * context switch the MSR (SPEC_CTRL is virtualized by the CPU), or if
+	 * the guest has a non-zero SPEC_CTRL value, i.e. is likely actively
+	 * using SPEC_CTRL.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_V_SPEC_CTRL))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
+					  !guest_has_spec_ctrl_msr(vcpu));
+	else
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
+					  !svm->spec_ctrl);
+
+	/*
+	 * Intercept SYSENTER_EIP and SYSENTER_ESP when emulating an Intel CPU,
+	 * as AMD hardware only stores 32 bits, whereas Intel CPUs track 64 bits.
+	 */
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW,
+				  guest_cpuid_is_intel_compatible(vcpu));
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW,
+				  guest_cpuid_is_intel_compatible(vcpu));
+
+	if (sev_es_guest(vcpu->kvm))
+		sev_es_recalc_msr_intercepts(vcpu);
+
+	/*
+	 * x2APIC intercepts are modified on-demand and cannot be filtered by
+	 * userspace.
+	 */
+}
+
 static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 i;
-
-	/*
-	 * Set intercept permissions for all direct access MSRs again. They
-	 * will automatically get filtered through the MSR filter, so we are
-	 * back in sync after this.
-	 */
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 msr = direct_access_msrs[i];
-		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
-		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
-
-		set_msr_interception_bitmap(vcpu, svm->msrpm, msr, read, write);
-	}
-}
-
-static __init int add_msr_offset(u32 offset)
-{
-	int i;
-
-	for (i = 0; i < MSRPM_OFFSETS; ++i) {
-
-		/* Offset already in list? */
-		if (msrpm_offsets[i] == offset)
-			return 0;
-
-		/* Slot used by another offset? */
-		if (msrpm_offsets[i] != MSR_INVALID)
-			continue;
-
-		/* Add offset to list */
-		msrpm_offsets[i] = offset;
-
-		return 0;
-	}
-
-	return -ENOSPC;
-}
-
-static __init int init_msrpm_offsets(void)
-{
-	int i;
-
-	memset(msrpm_offsets, 0xff, sizeof(msrpm_offsets));
-
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 offset;
-
-		offset = svm_msrpm_offset(direct_access_msrs[i]);
-		if (WARN_ON(offset == MSR_INVALID))
-			return -EIO;
-
-		if (WARN_ON_ONCE(add_msr_offset(offset)))
-			return -EIO;
-	}
-	return 0;
+	svm_recalc_msr_intercepts(vcpu);
 }
 
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -1044,13 +917,7 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
-
-	if (sev_es_guest(vcpu->kvm))
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW);
+	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
@@ -1064,10 +931,7 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
+	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -1250,17 +1114,9 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (guest_cpuid_is_intel_compatible(vcpu)) {
-		/*
-		 * We must intercept SYSENTER_EIP and SYSENTER_ESP
-		 * accesses because the processor only stores 32 bits.
-		 * For the same reason we cannot use virtual VMLOAD/VMSAVE.
-		 */
 		svm_set_intercept(svm, INTERCEPT_VMLOAD);
 		svm_set_intercept(svm, INTERCEPT_VMSAVE);
 		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
-
-		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -1271,10 +1127,9 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
-		/* No need to intercept these MSRs */
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	}
+
+	svm_recalc_msr_intercepts(vcpu);
 }
 
 static void init_vmcb(struct kvm_vcpu *vcpu)
@@ -1401,15 +1256,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
-	/*
-	 * If the CPU virtualizes MSR_IA32_SPEC_CTRL, i.e. KVM doesn't need to
-	 * manually context switch the MSR, immediately configure interception
-	 * of SPEC_CTRL, without waiting for the guest to access the MSR.
-	 */
-	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
-		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
-					  !guest_has_spec_ctrl_msr(vcpu));
-
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -1440,8 +1286,6 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu);
-
 	svm_init_osvw(vcpu);
 
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_STUFF_FEATURE_MSRS))
@@ -3241,8 +3085,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 		/*
 		 * TSC_AUX is usually changed only during boot and never read
-		 * directly.  Intercept TSC_AUX instead of exposing it to the
-		 * guest via direct_access_msrs, and switch it via user return.
+		 * directly.  Intercept TSC_AUX and switch it via user return.
 		 */
 		preempt_disable();
 		ret = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
@@ -4678,14 +4521,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
-	if (boot_cpu_has(X86_FEATURE_IBPB))
-		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
-					  !guest_has_pred_cmd_msr(vcpu));
-
-	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
-
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
@@ -5544,10 +5379,6 @@ static __init int svm_hardware_setup(void)
 	}
 	kvm_enable_efer_bits(EFER_NX);
 
-	r = init_msrpm_offsets();
-	if (r)
-		return r;
-
 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
 				     XFEATURE_MASK_BNDCSR);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d5805ab59a7..91c4eb2232e0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -44,9 +44,6 @@ static inline struct page *__sme_pa_to_page(unsigned long pa)
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	47
-#define MSRPM_OFFSETS	32
-extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int nrips;
 extern int vgif;
@@ -318,12 +315,6 @@ struct vcpu_svm {
 	struct list_head ir_list;
 	spinlock_t ir_list_lock;
 
-	/* Save desired MSR intercept (read: pass-through) state */
-	struct {
-		DECLARE_BITMAP(read, MAX_DIRECT_ACCESS_MSRS);
-		DECLARE_BITMAP(write, MAX_DIRECT_ACCESS_MSRS);
-	} shadow_msr_intercept;
-
 	struct vcpu_sev_es_state sev_es;
 
 	bool guest_state_loaded;
@@ -820,6 +811,7 @@ void sev_init_vmcb(struct vcpu_svm *svm);
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_es_vcpu_reset(struct vcpu_svm *svm);
+void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (18 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 19/32] KVM: SVM: " Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  7:09   ` Binbin Wu
  2025-06-10 22:57 ` [PATCH v2 21/32] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
                   ` (13 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Rename msr_filter_changed() to recalc_msr_intercepts() and drop the
trampoline wrapper now that both SVM and VMX use a filter-agnostic recalc
helper to react to the new userspace filter.
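
As a rough illustration of why the wrapper adds nothing (a standalone
model, not KVM code; every model_* name and the two-entry MSR index space
are invented for this sketch): the recalc helper recomputes the desired
pass-through state purely from vCPU state, and the set-intercept helper
consults the userspace filter, so reacting to a new filter is just another
recalc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, tiny MSR index space used only by this model. */
enum model_msr { MODEL_MSR_STAR, MODEL_MSR_LBR_FROM, MODEL_NR_MSRS };

struct model_vcpu {
	bool lbrv_enabled;                       /* stand-in for LBR virtualization state */
	bool filter_denies_read[MODEL_NR_MSRS];  /* stand-in for the userspace MSR filter */
	bool filter_denies_write[MODEL_NR_MSRS];
	bool intercept_read[MODEL_NR_MSRS];      /* stand-in for the permission bitmap */
	bool intercept_write[MODEL_NR_MSRS];
};

/* Pass an access through only if the caller wants to and the filter allows it. */
static void model_set_intercept(struct model_vcpu *vcpu, enum model_msr msr,
				bool intercept)
{
	assert(vcpu != NULL && msr < MODEL_NR_MSRS);

	vcpu->intercept_read[msr]  = intercept || vcpu->filter_denies_read[msr];
	vcpu->intercept_write[msr] = intercept || vcpu->filter_denies_write[msr];
}

/* Recompute everything from scratch; no record of past requests is needed. */
static void model_recalc_msr_intercepts(struct model_vcpu *vcpu)
{
	model_set_intercept(vcpu, MODEL_MSR_STAR, false);
	model_set_intercept(vcpu, MODEL_MSR_LBR_FROM, !vcpu->lbrv_enabled);
}
```

In this model, a filter update and a CPUID or LBR state change both reduce
to calling model_recalc_msr_intercepts() again, which is the property that
lets a single vendor callback serve both cases.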

No functional change intended.

Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm.c             | 8 +-------
 arch/x86/kvm/vmx/main.c            | 6 +++---
 arch/x86/kvm/vmx/vmx.c             | 7 +------
 arch/x86/kvm/vmx/x86_ops.h         | 2 +-
 arch/x86/kvm/x86.c                 | 8 +++++++-
 7 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8d50e3e0a19b..19a6735d6dd8 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -139,7 +139,7 @@ KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
-KVM_X86_OP(msr_filter_changed)
+KVM_X86_OP(recalc_msr_intercepts)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..89a626e5b80f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1885,7 +1885,7 @@ struct kvm_x86_ops {
 	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
-	void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
+	void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu);
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index de3d59c71229..710bc5f965dc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -896,11 +896,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
-static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
-{
-	svm_recalc_msr_intercepts(vcpu);
-}
-
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 {
 	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
@@ -929,7 +924,6 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
-
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
 	svm_recalc_lbr_msr_intercepts(vcpu);
 
@@ -5227,7 +5221,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
 
-	.msr_filter_changed = svm_msr_filter_changed,
+	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..b3c58731a2f5 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -220,7 +220,7 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return vmx_get_msr(vcpu, msr_info);
 }
 
-static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
+static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * TDX doesn't allow VMM to configure interception of MSR accesses.
@@ -231,7 +231,7 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
 	if (is_td_vcpu(vcpu))
 		return;
 
-	vmx_msr_filter_changed(vcpu);
+	vmx_recalc_msr_intercepts(vcpu);
 }
 
 static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
@@ -1034,7 +1034,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,
 
-	.msr_filter_changed = vt_op(msr_filter_changed),
+	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
 	.complete_emulated_msr = vt_op(complete_emulated_msr),
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ce7a1c07e402..bdff81f8288d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4074,7 +4074,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
@@ -4123,11 +4123,6 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
-{
-	vmx_recalc_msr_intercepts(vcpu);
-}
-
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 						int vector)
 {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b4596f651232..34c6e683e321 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -52,7 +52,7 @@ void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
 void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
+void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dd34a2ec854c..cc9a01b6dbc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10926,8 +10926,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_update_apicv(vcpu);
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
+
+		/*
+		 * Recalc MSR intercepts as userspace may want to intercept
+		 * accesses to MSRs that KVM would otherwise pass through to
+		 * the guest.
+		 */
 		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
-			kvm_x86_call(msr_filter_changed)(vcpu);
+			kvm_x86_call(recalc_msr_intercepts)(vcpu);
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			kvm_x86_call(update_cpu_dirty_logging)(vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 21/32] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (19 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 22/32] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Rename init_vmcb_after_set_cpuid() to svm_recalc_intercepts_after_set_cpuid()
to more precisely describe its role.  Strictly speaking, the name isn't
perfect as toggling virtual VM{LOAD,SAVE} is arguably not recalculating an
intercept, but practically speaking it's close enough.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 710bc5f965dc..1e3250ed2954 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1103,7 +1103,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 	}
 }
 
-static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
+static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -1269,7 +1269,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		sev_init_vmcb(svm);
 
 	svm_hv_init_vmcb(vmcb);
-	init_vmcb_after_set_cpuid(vcpu);
+
+	svm_recalc_intercepts_after_set_cpuid(vcpu);
 
 	vmcb_mark_all_dirty(vmcb);
 
@@ -4518,7 +4519,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
-	init_vmcb_after_set_cpuid(vcpu);
+	svm_recalc_intercepts_after_set_cpuid(vcpu);
 }
 
 static bool svm_has_wbinvd_exit(void)
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 22/32] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (20 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 21/32] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 23/32] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Fold svm_vcpu_init_msrpm() into svm_recalc_msr_intercepts() now that there
is only the one caller (and because the "init" misnomer is even more
misleading than it was in the past).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1e3250ed2954..be2e6914e9d9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -779,21 +779,6 @@ static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
 		svm_set_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW, intercept);
 }
 
-static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
-{
-	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
-
-#ifdef CONFIG_X86_64
-	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
-#endif
-}
-
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 {
 	static const u32 x2avic_passthrough_msrs[] = {
@@ -852,7 +837,17 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu);
+	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+
+#ifdef CONFIG_X86_64
+	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
+#endif
 
 	if (lbrv)
 		svm_recalc_lbr_msr_intercepts(vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 23/32] KVM: SVM: Merge "after set CPUID" intercept recalc helpers
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (21 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 22/32] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 24/32] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Merge svm_recalc_intercepts_after_set_cpuid() and
svm_recalc_instruction_intercepts() such that the "after set CPUID" helper
simply invokes the type-specific helpers (MSRs vs. instructions), i.e.
make svm_recalc_intercepts_after_set_cpuid() a single entry point for all
intercept updates that need to be performed after a CPUID change.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index be2e6914e9d9..59088f68c557 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1075,9 +1075,10 @@ void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu)
 }
 
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
-static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
-					      struct vcpu_svm *svm)
+static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/*
 	 * Intercept INVPCID if shadow paging is enabled to sync/free shadow
 	 * roots, or if INVPCID is disabled in the guest to inject #UD.
@@ -1096,11 +1097,6 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 		else
 			svm_set_intercept(svm, INTERCEPT_RDTSCP);
 	}
-}
-
-static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (guest_cpuid_is_intel_compatible(vcpu)) {
 		svm_set_intercept(svm, INTERCEPT_VMLOAD);
@@ -1117,7 +1113,11 @@ static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 	}
+}
 
+static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	svm_recalc_instruction_intercepts(vcpu);
 	svm_recalc_msr_intercepts(vcpu);
 }
 
@@ -1243,8 +1243,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_clr_intercept(svm, INTERCEPT_PAUSE);
 	}
 
-	svm_recalc_instruction_intercepts(vcpu, svm);
-
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -4509,8 +4507,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (guest_cpuid_is_intel_compatible(vcpu))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
-	svm_recalc_instruction_intercepts(vcpu, svm);
-
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
-- 
2.50.0.rc0.642.g800a2b2222-goog



* [PATCH v2 24/32] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (22 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 23/32] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 25/32] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Now that msr_write_intercepted() defaults to true, i.e. accurately reflects
hardware behavior for out-of-range MSRs, and doesn't WARN (or BUG) on an
out-of-range MSR, drop sev_es_prevent_msr_access()'s svm_msrpm_offset()
check that guarded against calling msr_write_intercepted() with a "bad"
index.
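
For reference, a minimal standalone model of the "default to intercepted"
behavior (not the KVM helpers; all model_* names are hypothetical).  It
assumes the MSRPM layout as I understand it from the APM: three ranges of
0x2000 MSRs starting at 0x0, 0xc0000000 and 0xc0010000, with two bits per
MSR (read intercept, then write intercept).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_RANGES		3
#define MODEL_RANGE_MSRS	0x2000u
/* 2 bits per MSR => 0x2000 MSRs occupy 2048 bytes per range. */
#define MODEL_MSRPM_BYTES	(MODEL_NR_RANGES * 2048)

static const uint32_t model_range_base[MODEL_NR_RANGES] = {
	0x00000000u, 0xc0000000u, 0xc0010000u,
};

/* Return the bit index of the MSR's read-intercept bit, or -1 if uncovered. */
static long model_msr_read_bit(uint32_t msr)
{
	for (int i = 0; i < MODEL_NR_RANGES; i++) {
		uint32_t base = model_range_base[i];

		if (msr >= base && msr - base < MODEL_RANGE_MSRS)
			return ((long)i * MODEL_RANGE_MSRS + (msr - base)) * 2;
	}
	return -1;
}

/* MSRs outside the covered ranges are reported as intercepted. */
static bool model_msr_write_intercepted(const uint8_t *msrpm, uint32_t msr)
{
	long bit = model_msr_read_bit(msr);

	assert(msrpm != NULL);

	if (bit < 0)
		return true;

	bit += 1;	/* the write bit follows the read bit */
	return (msrpm[bit / 8] >> (bit % 8)) & 1;
}
```

With an all-zeroes bitmap, model_msr_write_intercepted() is false for an
in-range MSR such as 0xc0000081 and true for an uncovered index such as
0x40000000, so a caller does not need a separate "is the offset valid"
check before asking.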

Opportunistically clean up the helper's formatting.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 59088f68c557..9e4d08dba5f8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2767,12 +2767,11 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
 	return 0;
 }
 
-static bool
-sev_es_prevent_msr_access(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
 {
 	return sev_es_guest(vcpu->kvm) &&
 	       vcpu->arch.guest_state_protected &&
-	       svm_msrpm_offset(msr_info->index) != MSR_INVALID &&
 	       !msr_write_intercepted(vcpu, msr_info->index);
 }
 
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 25/32] KVM: SVM: Move svm_msrpm_offset() to nested.c
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (23 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 24/32] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 26/32] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Move svm_msrpm_offset() from svm.c to nested.c now that all usage of the
u32-index offsets is nested virtualization specific.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
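Note for readers (not part of the commit): a standalone userspace sketch
of the offset math being moved, for poking at it outside the kernel.  The
"sketch_"/"SKETCH_" names and the fixed-width types are stand-ins, not
kernel code, and the constants are assumptions derived from the series
(8192 MSRs per range, 4 MSRs per MSRPM byte, hence 2048 bytes per range).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 2 bits per MSR, i.e. 4 MSRs per byte, 8192 MSRs per range. */
#define SKETCH_MSRS_PER_BYTE	4u
#define SKETCH_MSRS_PER_RANGE	8192u
#define SKETCH_BYTES_PER_RANGE	(SKETCH_MSRS_PER_RANGE / SKETCH_MSRS_PER_BYTE)
#define SKETCH_MSR_INVALID	0xffffffffu

static_assert(SKETCH_BYTES_PER_RANGE == 2048, "each range spans 2KiB");

static const uint32_t sketch_msrpm_ranges[] = { 0, 0xc0000000, 0xc0010000 };
#define SKETCH_NR_RANGES \
	(sizeof(sketch_msrpm_ranges) / sizeof(sketch_msrpm_ranges[0]))

/* Index of the u32 that holds @msr's two bits, or SKETCH_MSR_INVALID. */
static uint32_t sketch_msrpm_offset(uint32_t msr)
{
	uint32_t offset;
	size_t i;

	for (i = 0; i < SKETCH_NR_RANGES; i++) {
		uint32_t base = sketch_msrpm_ranges[i];

		if (msr < base || msr >= base + SKETCH_MSRS_PER_RANGE)
			continue;

		offset  = (msr - base) / SKETCH_MSRS_PER_BYTE;	/* byte in range */
		offset += i * SKETCH_BYTES_PER_RANGE;		/* byte in MSRPM */

		/* Convert the u8 index into a u32 index. */
		return offset / 4;
	}

	/* MSR not covered by any range. */
	return SKETCH_MSR_INVALID;
}
```

With these assumptions, sketch_msrpm_offset(0xc0000080) lands in the
second range: byte 32 + 2048 = 2080, i.e. u32 index 520.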
 arch/x86/kvm/svm/nested.c | 23 +++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c    | 23 -----------------------
 arch/x86/kvm/svm/svm.h    |  1 -
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cf148f7db887..13de4f63a9c2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -197,6 +197,29 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
+static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
+
+static u32 svm_msrpm_offset(u32 msr)
+{
+	u32 offset;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
+		if (msr < msrpm_ranges[i] ||
+		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
+			continue;
+
+		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
+		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
+
+		/* Now we have the u8 offset - but need the u32 offset */
+		return offset / 4;
+	}
+
+	/* MSR not in any range */
+	return MSR_INVALID;
+}
+
 int __init nested_svm_init_msrpm_merge_offsets(void)
 {
 	static const u32 merge_msrs[] __initconst = {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9e4d08dba5f8..5008e929b1a5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -195,29 +195,6 @@ static DEFINE_MUTEX(vmcb_dump_mutex);
  */
 static int tsc_aux_uret_slot __read_mostly = -1;
 
-static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
-
-u32 svm_msrpm_offset(u32 msr)
-{
-	u32 offset;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
-		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
-			continue;
-
-		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
-		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
-
-		/* Now we have the u8 offset - but need the u32 offset */
-		return offset / 4;
-	}
-
-	/* MSR not in any range */
-	return MSR_INVALID;
-}
-
 static int get_npt_level(void)
 {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 91c4eb2232e0..a0c14256cc56 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -666,7 +666,6 @@ BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 /* svm.c */
 extern bool dump_invalid_vmcb;
 
-u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
 void svm_vcpu_free_msrpm(u32 *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 26/32] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (24 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 25/32] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 27/32] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Store KVM's MSRPM pointers as "void *" instead of "u32 *" to guard against
directly accessing the bitmaps outside of code that is explicitly written
to access the bitmaps with a specific type.

Opportunistically use svm_vcpu_free_msrpm() in svm_vcpu_free() instead of
open coding an equivalent.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c |  4 +++-
 arch/x86/kvm/svm/svm.c    |  8 ++++----
 arch/x86/kvm/svm/svm.h    | 13 ++++++++-----
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 13de4f63a9c2..f9bda148273e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -277,6 +277,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 *msrpm02 = svm->nested.msrpm;
+	u32 *msrpm01 = svm->msrpm;
 	int i;
 
 	/*
@@ -311,7 +313,7 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
 			return false;
 
-		svm->nested.msrpm[p] = svm->msrpm[p] | value;
+		msrpm02[p] = msrpm01[p] | value;
 	}
 
 	svm->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5008e929b1a5..fc41ec70b6de 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -728,11 +728,11 @@ void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	svm->nested.force_msr_bitmap_recalc = true;
 }
 
-u32 *svm_vcpu_alloc_msrpm(void)
+void *svm_vcpu_alloc_msrpm(void)
 {
 	unsigned int order = get_order(MSRPM_SIZE);
 	struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, order);
-	u32 *msrpm;
+	void *msrpm;
 
 	if (!pages)
 		return NULL;
@@ -805,7 +805,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	svm->x2avic_msrs_intercepted = intercept;
 }
 
-void svm_vcpu_free_msrpm(u32 *msrpm)
+void svm_vcpu_free_msrpm(void *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
@@ -1353,7 +1353,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
 	sev_free_vcpu(vcpu);
 
 	__free_page(__sme_pa_to_page(svm->vmcb01.pa));
-	__free_pages(virt_to_page(svm->msrpm), get_order(MSRPM_SIZE));
+	svm_vcpu_free_msrpm(svm->msrpm);
 }
 
 #ifdef CONFIG_CPU_MITIGATIONS
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a0c14256cc56..e078df15f1d8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -186,8 +186,11 @@ struct svm_nested_state {
 	u64 vmcb12_gpa;
 	u64 last_vmcb12_gpa;
 
-	/* These are the merged vectors */
-	u32 *msrpm;
+	/*
+	 * The MSR permissions map used for vmcb02, which is the merge result
+	 * of vmcb01 and vmcb12
+	 */
+	void *msrpm;
 
 	/* A VMRUN has started but has not yet been performed, so
 	 * we cannot inject a nested vmexit yet.  */
@@ -268,7 +271,7 @@ struct vcpu_svm {
 	 */
 	u64 virt_spec_ctrl;
 
-	u32 *msrpm;
+	void *msrpm;
 
 	ulong nmi_iret_rip;
 
@@ -666,8 +669,8 @@ BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 /* svm.c */
 extern bool dump_invalid_vmcb;
 
-u32 *svm_vcpu_alloc_msrpm(void);
-void svm_vcpu_free_msrpm(u32 *msrpm);
+void *svm_vcpu_alloc_msrpm(void);
+void svm_vcpu_free_msrpm(void *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
 void svm_update_lbrv(struct kvm_vcpu *vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 27/32] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (25 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 26/32] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 28/32] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Access the MSRPM using u32/4-byte chunks (and appropriately adjusted
offsets) only when merging L0 and L1 bitmaps as part of emulating VMRUN.
The only reason to batch accesses to MSRPMs is to avoid the overhead of
uaccess operations (e.g. STAC/CLAC and bounds checks) when reading L1's
bitmap pointed at by vmcb12.  For all other uses, either per-bit accesses
are more than fast enough (no uaccess), or KVM is only accessing a single
bit (nested_svm_exit_handled_msr()) and so there's nothing to batch.

In addition to (hopefully) documenting the uniqueness of the merging code,
restricting chunked access to _just_ the merging code will allow for
increasing the chunk size (to unsigned long) with minimal risk.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
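Note for readers (not part of the commit): a rough userspace sketch of
why the single-byte lookup in nested_svm_exit_handled_msr() should match
the old 4-byte lookup.  Everything named "sketch_" is hypothetical, the
bit layout (three 8192-MSR ranges, read bit then write bit per MSR) is
an assumption taken from the series, and -1 is used as a simplified
"out of range" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: three 8192-MSR ranges, 2 bits per MSR (read, then write). */
#define SKETCH_MSRS_PER_RANGE	8192u
#define SKETCH_BITS_PER_MSR	2u
#define SKETCH_BITS_PER_BYTE	8u
#define SKETCH_MSRPM_BYTES \
	(3u * SKETCH_MSRS_PER_RANGE * SKETCH_BITS_PER_MSR / SKETCH_BITS_PER_BYTE)

static const uint32_t sketch_range_base[] = { 0, 0xc0000000, 0xc0010000 };

/* Bit number of @msr's read bit, or -1 if no range covers it. */
static int sketch_msrpm_bit_nr(uint32_t msr)
{
	uint32_t base = msr & ~(SKETCH_MSRS_PER_RANGE - 1);
	size_t i;

	for (i = 0; i < 3; i++) {
		if (base == sketch_range_base[i])
			return (int)((i * SKETCH_MSRS_PER_RANGE +
				      (msr & (SKETCH_MSRS_PER_RANGE - 1))) *
				     SKETCH_BITS_PER_MSR);
	}
	return -1;
}

/* Old flow: fetch the u32 covering 16 MSRs, test bit 2 * (msr & 0xf) + write. */
static bool sketch_old_intercepted(const uint8_t *msrpm, uint32_t msr,
				   unsigned int write)
{
	int bit_nr = sketch_msrpm_bit_nr(msr);
	uint32_t offset, value = 0;
	unsigned int i;

	assert(write <= 1);
	if (bit_nr < 0)
		return true;

	offset = bit_nr / SKETCH_BITS_PER_BYTE / 4 * 4;	/* byte offset of the u32 */
	for (i = 0; i < 4; i++)				/* little-endian load */
		value |= (uint32_t)msrpm[offset + i] << (8 * i);

	return value & (1u << (2 * (msr & 0xf) + write));
}

/* New flow: fetch one byte, shift the read/write bit to its spot in the byte. */
static bool sketch_new_intercepted(const uint8_t *msrpm, uint32_t msr,
				   unsigned int write)
{
	int bit_nr = sketch_msrpm_bit_nr(msr);
	uint8_t value, mask;

	assert(write <= 1);
	if (bit_nr < 0)
		return true;

	value = msrpm[bit_nr / SKETCH_BITS_PER_BYTE];
	mask  = (uint8_t)((1u << write) << (bit_nr & (SKETCH_BITS_PER_BYTE - 1)));
	return value & mask;
}

/* Fill a map with an arbitrary pattern, count disagreements between flows. */
static unsigned int sketch_count_mismatches(void)
{
	static uint8_t msrpm[SKETCH_MSRPM_BYTES];
	unsigned int r, w, mismatches = 0;
	uint32_t i;

	for (i = 0; i < SKETCH_MSRPM_BYTES; i++)
		msrpm[i] = (uint8_t)(i * 37u + 11u);

	for (r = 0; r < 3; r++)
		for (i = 0; i < SKETCH_MSRS_PER_RANGE; i++)
			for (w = 0; w <= 1; w++)
				mismatches +=
					sketch_old_intercepted(msrpm, sketch_range_base[r] + i, w) !=
					sketch_new_intercepted(msrpm, sketch_range_base[r] + i, w);
	return mismatches;
}
```

The two flows agree because each u32 covers exactly 16 MSRs and every
range starts on a 32-bit boundary, so bit_nr modulo 32 is always
2 * (msr & 0xf).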
 arch/x86/kvm/svm/nested.c | 52 ++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f9bda148273e..fb0ac87df00a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -197,29 +197,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
-static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
-
-static u32 svm_msrpm_offset(u32 msr)
-{
-	u32 offset;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
-		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
-			continue;
-
-		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
-		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
-
-		/* Now we have the u8 offset - but need the u32 offset */
-		return offset / 4;
-	}
-
-	/* MSR not in any range */
-	return MSR_INVALID;
-}
-
 int __init nested_svm_init_msrpm_merge_offsets(void)
 {
 	static const u32 merge_msrs[] __initconst = {
@@ -246,11 +223,18 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 	int i, j;
 
 	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
-		u32 offset = svm_msrpm_offset(merge_msrs[i]);
+		u32 bit_nr = svm_msrpm_bit_nr(merge_msrs[i]);
+		u32 offset;
 
-		if (WARN_ON(offset == MSR_INVALID))
+		if (WARN_ON(bit_nr == MSR_INVALID))
 			return -EIO;
 
+		/*
+		 * Merging is done in 32-bit chunks to reduce the number of
+		 * accesses to L1's bitmap.
+		 */
+		offset = bit_nr / BITS_PER_BYTE / sizeof(u32);
+
 		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
 			if (nested_svm_msrpm_merge_offsets[j] == offset)
 				break;
@@ -1369,26 +1353,26 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 {
-	u32 offset, msr, value;
-	int write, mask;
+	gpa_t base = svm->nested.ctl.msrpm_base_pa;
+	u32 msr, bit_nr;
+	u8 value, mask;
+	int write;
 
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
 
 	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
-	offset = svm_msrpm_offset(msr);
+	bit_nr = svm_msrpm_bit_nr(msr);
 	write  = svm->vmcb->control.exit_info_1 & 1;
-	mask   = 1 << ((2 * (msr & 0xf)) + write);
 
-	if (offset == MSR_INVALID)
+	if (bit_nr == MSR_INVALID)
 		return NESTED_EXIT_DONE;
 
-	/* Offset is in 32 bit units but need in 8 bit units */
-	offset *= 4;
-
-	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset, &value, 4))
+	if (kvm_vcpu_read_guest(&svm->vcpu, base + bit_nr / BITS_PER_BYTE,
+				&value, sizeof(value)))
 		return NESTED_EXIT_DONE;
 
+	mask = BIT(write) << (bit_nr & (BITS_PER_BYTE - 1));
 	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
 }
 
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 28/32] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (26 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 27/32] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 29/32] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Return -EINVAL instead of MSR_INVALID from svm_msrpm_bit_nr() to indicate
that the MSR isn't covered by one of the (currently) three MSRPM ranges,
and delete the MSR_INVALID macro now that all users are gone.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
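Note for readers (not part of the commit): a minimal userspace sketch of
the "negative errno instead of a magic u32" convention.  All "sketch_"
names are hypothetical and the bit layout is assumed from the series;
the point is only that every valid bit number is a small non-negative
int, so "< 0" is an unambiguous out-of-range check.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSRS_PER_RANGE	8192u
#define SKETCH_BITS_PER_MSR	2u
#define SKETCH_MSRPM_BITS	(3u * SKETCH_MSRS_PER_RANGE * SKETCH_BITS_PER_MSR)
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MSRPM_LONGS	(SKETCH_MSRPM_BITS / SKETCH_BITS_PER_LONG)

/* Valid bit numbers never collide with a negative error code. */
static_assert(SKETCH_MSRPM_BITS <= INT_MAX, "bit numbers must fit in an int");

/* Bit number of @msr's read bit, or -EINVAL if no range covers the MSR. */
static int sketch_msrpm_bit_nr(uint32_t msr)
{
	int range_nr;

	switch (msr & ~(SKETCH_MSRS_PER_RANGE - 1)) {
	case 0x00000000:
		range_nr = 0;
		break;
	case 0xc0000000:
		range_nr = 1;
		break;
	case 0xc0010000:
		range_nr = 2;
		break;
	default:
		return -EINVAL;
	}

	return (int)(range_nr * SKETCH_MSRS_PER_RANGE * SKETCH_BITS_PER_MSR +
		     (msr & (SKETCH_MSRS_PER_RANGE - 1)) * SKETCH_BITS_PER_MSR);
}

/* @bit_rw: 0 == read bit, 1 == write bit.  Out-of-range reads as intercepted. */
static bool sketch_test_msr_bit(const unsigned long *bitmap, uint32_t msr,
				unsigned int bit_rw)
{
	int bit_nr = sketch_msrpm_bit_nr(msr);

	if (bit_nr < 0)
		return true;

	bit_nr += bit_rw;
	return (bitmap[bit_nr / SKETCH_BITS_PER_LONG] >>
		(bit_nr % SKETCH_BITS_PER_LONG)) & 1ul;
}

static void sketch_clear_msr_bit(unsigned long *bitmap, uint32_t msr,
				 unsigned int bit_rw)
{
	int bit_nr = sketch_msrpm_bit_nr(msr);

	if (bit_nr < 0)
		return;		/* the MSR has no bits to clear */

	bit_nr += bit_rw;
	bitmap[bit_nr / SKETCH_BITS_PER_LONG] &=
		~(1ul << (bit_nr % SKETCH_BITS_PER_LONG));
}
```

A caller would start from an all-ones bitmap, clear individual bits to
pass an MSR through, and treat any MSR that yields a negative bit number
as always intercepted.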
 arch/x86/kvm/svm/nested.c | 10 +++++-----
 arch/x86/kvm/svm/svm.h    | 10 ++++------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb0ac87df00a..7ca45361ced3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -223,10 +223,10 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 	int i, j;
 
 	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
-		u32 bit_nr = svm_msrpm_bit_nr(merge_msrs[i]);
+		int bit_nr = svm_msrpm_bit_nr(merge_msrs[i]);
 		u32 offset;
 
-		if (WARN_ON(bit_nr == MSR_INVALID))
+		if (WARN_ON(bit_nr < 0))
 			return -EIO;
 
 		/*
@@ -1354,9 +1354,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 {
 	gpa_t base = svm->nested.ctl.msrpm_base_pa;
-	u32 msr, bit_nr;
+	int write, bit_nr;
 	u8 value, mask;
-	int write;
+	u32 msr;
 
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
@@ -1365,7 +1365,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	bit_nr = svm_msrpm_bit_nr(msr);
 	write  = svm->vmcb->control.exit_info_1 & 1;
 
-	if (bit_nr == MSR_INVALID)
+	if (bit_nr < 0)
 		return NESTED_EXIT_DONE;
 
 	if (kvm_vcpu_read_guest(&svm->vcpu, base + bit_nr / BITS_PER_BYTE,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e078df15f1d8..489adc2ca3f5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -619,9 +619,7 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
 static_assert(SVM_MSRS_PER_RANGE == 8192);
 #define SVM_MSRPM_OFFSET_MASK (SVM_MSRS_PER_RANGE - 1)
 
-#define MSR_INVALID				0xffffffffU
-
-static __always_inline u32 svm_msrpm_bit_nr(u32 msr)
+static __always_inline int svm_msrpm_bit_nr(u32 msr)
 {
 	int range_nr;
 
@@ -636,7 +634,7 @@ static __always_inline u32 svm_msrpm_bit_nr(u32 msr)
 		range_nr = 2;
 		break;
 	default:
-		return MSR_INVALID;
+		return -EINVAL;
 	}
 
 	return range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +
@@ -647,10 +645,10 @@ static __always_inline u32 svm_msrpm_bit_nr(u32 msr)
 static inline rtype svm_##action##_msr_bitmap_##access(unsigned long *bitmap,	\
 						       u32 msr)			\
 {										\
-	u32 bit_nr;								\
+	int bit_nr;								\
 										\
 	bit_nr = svm_msrpm_bit_nr(msr);						\
-	if (bit_nr == MSR_INVALID)								\
+	if (bit_nr < 0)								\
 		return (rtype)true;						\
 										\
 	return bitop##_bit(bit_nr + bit_rw, bitmap);				\
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 29/32] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (27 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 28/32] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 30/32] KVM: SVM: Add a helper to allocate and initialize permissions bitmaps Sean Christopherson
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

When merging L0 and L1 MSRPMs as part of nested VMRUN emulation, access
the bitmaps using "unsigned long" chunks, i.e. use 8-byte access for
64-bit kernels instead of arbitrarily working on 4-byte chunks.

Opportunistically rename local variables in nested_svm_merge_msrpm() to
more precisely/accurately reflect their purpose ("offset" in particular is
extremely ambiguous).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
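Note for readers (not part of the commit): a small userspace sketch of
merging in "unsigned long" chunks.  The "sketch_" names are hypothetical,
and a plain byte buffer plus memcpy() stands in for L1's bitmap and the
guest read; only the chunk indexing and the OR are meant to mirror the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 8 bytes on LP64 builds, 4 bytes on ILP32 builds. */
typedef unsigned long sketch_merge_t;

#define SKETCH_BITS_PER_BYTE	8u

/* Chunk index covering @bit_nr: bit -> byte -> chunk. */
static size_t sketch_chunk_index(unsigned int bit_nr)
{
	return bit_nr / SKETCH_BITS_PER_BYTE / sizeof(sketch_merge_t);
}

/*
 * For each listed chunk index, OR L1's chunk into the L0 chunk and store
 * the result in the merged map.  Chunks that aren't listed are left alone.
 */
static void sketch_merge(sketch_merge_t *msrpm02, const sketch_merge_t *msrpm01,
			 const unsigned char *l1_bitmap,
			 const size_t *offsets, size_t nr_offsets)
{
	size_t i;

	for (i = 0; i < nr_offsets; i++) {
		const size_t p = offsets[i];
		sketch_merge_t l1_val;

		/* Stand-in for reading sizeof(l1_val) bytes of guest memory. */
		memcpy(&l1_val, l1_bitmap + p * sizeof(l1_val), sizeof(l1_val));

		msrpm02[p] = msrpm01[p] | l1_val;
	}
}
```

Because the byte offset is always computed as p * sizeof(l1_val), the
same code works unchanged whether the chunk type is 4 or 8 bytes wide.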
 arch/x86/kvm/svm/nested.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7ca45361ced3..749f7b866ac8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -196,6 +196,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
  */
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
+typedef unsigned long nsvm_msrpm_merge_t;
 
 int __init nested_svm_init_msrpm_merge_offsets(void)
 {
@@ -230,10 +231,10 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 			return -EIO;
 
 		/*
-		 * Merging is done in 32-bit chunks to reduce the number of
-		 * accesses to L1's bitmap.
+		 * Merging is done in chunks to reduce the number of accesses
+		 * to L1's bitmap.
 		 */
-		offset = bit_nr / BITS_PER_BYTE / sizeof(u32);
+		offset = bit_nr / BITS_PER_BYTE / sizeof(nsvm_msrpm_merge_t);
 
 		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
 			if (nested_svm_msrpm_merge_offsets[j] == offset)
@@ -261,8 +262,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 *msrpm02 = svm->nested.msrpm;
-	u32 *msrpm01 = svm->msrpm;
+	nsvm_msrpm_merge_t *msrpm02 = svm->nested.msrpm;
+	nsvm_msrpm_merge_t *msrpm01 = svm->msrpm;
 	int i;
 
 	/*
@@ -289,15 +290,15 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
 		const int p = nested_svm_msrpm_merge_offsets[i];
-		u32 value;
-		u64 offset;
+		nsvm_msrpm_merge_t l1_val;
+		gpa_t gpa;
 
-		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
+		gpa = svm->nested.ctl.msrpm_base_pa + (p * sizeof(l1_val));
 
-		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
+		if (kvm_vcpu_read_guest(vcpu, gpa, &l1_val, sizeof(l1_val)))
 			return false;
 
-		msrpm02[p] = msrpm01[p] | value;
+		msrpm02[p] = msrpm01[p] | l1_val;
 	}
 
 	svm->nested.force_msr_bitmap_recalc = false;
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 30/32] KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (28 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 29/32] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-10 22:57 ` [PATCH v2 31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception Sean Christopherson
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Add a helper to allocate and initialize an MSR or I/O permissions map, as
the logic is identical between the two map types; the only difference is
the size of the bitmap.  Opportunistically add a comment to explain why
the bitmaps are initialized with 0xff, e.g. instead of the more common
zero-initialized behavior, which is the main motivation for deduplicating
the code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
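Note for readers (not part of the commit): a userspace sketch of the
"allocate, then fill with 0xff" pattern.  The "sketch_" names and the
4KiB page size are assumptions, and malloc() stands in for the page
allocator, so alignment and accounting are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	4096u

/* Round @size up to a power-of-two number of pages, like an order-N allocation. */
static size_t sketch_alloc_size(size_t size)
{
	size_t bytes = SKETCH_PAGE_SIZE;

	while (bytes < size)
		bytes <<= 1;

	return bytes;
}

/*
 * Allocate a permissions map with every bit set, so that all accesses are
 * intercepted until a bit is explicitly cleared.  Returns NULL on failure.
 */
static void *sketch_alloc_permissions_map(size_t size)
{
	size_t bytes = sketch_alloc_size(size);
	void *pm = malloc(bytes);

	if (!pm)
		return NULL;

	memset(pm, 0xff, bytes);
	return pm;
}
```

Note that the fill covers the whole rounded-up allocation, not just the
requested size, e.g. a 3-page request fills all 4 pages of the order-2
block.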
 arch/x86/kvm/svm/svm.c | 31 +++++++++++++++----------------
 arch/x86/kvm/svm/svm.h |  8 +++++++-
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fc41ec70b6de..e3c49c763225 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -728,19 +728,23 @@ void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	svm->nested.force_msr_bitmap_recalc = true;
 }
 
-void *svm_vcpu_alloc_msrpm(void)
+void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
 {
-	unsigned int order = get_order(MSRPM_SIZE);
-	struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, order);
-	void *msrpm;
+	unsigned int order = get_order(size);
+	struct page *pages = alloc_pages(gfp_mask, order);
+	void *pm;
 
 	if (!pages)
 		return NULL;
 
-	msrpm = page_address(pages);
-	memset(msrpm, 0xff, PAGE_SIZE * (1 << order));
+	/*
+	 * Set all bits in the permissions map so that all MSR and I/O accesses
+	 * are intercepted by default.
+	 */
+	pm = page_address(pages);
+	memset(pm, 0xff, PAGE_SIZE * (1 << order));
 
-	return msrpm;
+	return pm;
 }
 
 static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
@@ -5325,11 +5329,8 @@ static __init void svm_set_cpu_caps(void)
 
 static __init int svm_hardware_setup(void)
 {
-	int cpu;
-	struct page *iopm_pages;
 	void *iopm_va;
-	int r;
-	unsigned int order = get_order(IOPM_SIZE);
+	int cpu, r;
 
 	/*
 	 * NX is required for shadow paging and for NPT if the NX huge pages
@@ -5410,13 +5411,11 @@ static __init int svm_hardware_setup(void)
 			pr_info("LBR virtualization supported\n");
 	}
 
-	iopm_pages = alloc_pages(GFP_KERNEL, order);
-	if (!iopm_pages)
+	iopm_va = svm_alloc_permissions_map(IOPM_SIZE, GFP_KERNEL);
+	if (!iopm_va)
 		return -ENOMEM;
 
-	iopm_va = page_address(iopm_pages);
-	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
-	iopm_base = __sme_page_pa(iopm_pages);
+	iopm_base = __sme_set(__pa(iopm_va));
 
 	/*
 	 * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 489adc2ca3f5..8d3279563261 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -667,7 +667,13 @@ BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 /* svm.c */
 extern bool dump_invalid_vmcb;
 
-void *svm_vcpu_alloc_msrpm(void);
+void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask);
+
+static inline void *svm_vcpu_alloc_msrpm(void)
+{
+	return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
+}
+
 void svm_vcpu_free_msrpm(void *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (29 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 30/32] KVM: SVM: Add a helper to allocate and initialize permissions bitmaps Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-11  2:35   ` Mi, Dapeng
  2025-06-10 22:57 ` [PATCH v2 32/32] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
                   ` (2 subsequent siblings)
  33 siblings, 1 reply; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Refactor {svm,vmx}_disable_intercept_for_msr() to simplify the handling of
userspace filters that disallow access to an MSR.  The more complicated
logic is no longer needed or justified now that KVM recalculates all MSR
intercepts on a userspace MSR filter change, i.e. now that KVM doesn't
need to also update shadow bitmaps.

No functional change intended.

Suggested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
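Note for readers (not part of the commit): a tiny model of the old and
new flows, to sanity check the "no functional change" claim.  Two bools
stand in for an MSR's read/write intercept bits, the allow_r/allow_w
parameters stand in for kvm_msr_allowed(), and the "sketch_" names and
flag values are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values for this sketch. */
#define SKETCH_MSR_TYPE_R	1
#define SKETCH_MSR_TYPE_W	2

/* true == access is intercepted. */
struct sketch_msr_bits {
	bool read;
	bool write;
};

/* Old flow: set bits for filtered types, strip those types, clear the rest. */
static void sketch_disable_old(struct sketch_msr_bits *b, int type,
			       bool allow_r, bool allow_w)
{
	if ((type & SKETCH_MSR_TYPE_R) && !allow_r) {
		b->read = true;
		type &= ~SKETCH_MSR_TYPE_R;
	}

	if ((type & SKETCH_MSR_TYPE_W) && !allow_w) {
		b->write = true;
		type &= ~SKETCH_MSR_TYPE_W;
	}

	if (type & SKETCH_MSR_TYPE_R)
		b->read = false;

	if (type & SKETCH_MSR_TYPE_W)
		b->write = false;
}

/* New flow: for each requested type, clear if allowed, otherwise set. */
static void sketch_disable_new(struct sketch_msr_bits *b, int type,
			       bool allow_r, bool allow_w)
{
	if (type & SKETCH_MSR_TYPE_R) {
		if (allow_r)
			b->read = false;
		else
			b->read = true;
	}

	if (type & SKETCH_MSR_TYPE_W) {
		if (allow_w)
			b->write = false;
		else
			b->write = true;
	}
}

/* Walk every type, initial state, and filter result; count disagreements. */
static unsigned int sketch_count_mismatches(void)
{
	unsigned int mismatches = 0;
	int type, init, allow;

	for (type = 0; type <= 3; type++) {
		for (init = 0; init <= 3; init++) {
			for (allow = 0; allow <= 3; allow++) {
				struct sketch_msr_bits o = { init & 1, init & 2 };
				struct sketch_msr_bits n = o;

				sketch_disable_old(&o, type, allow & 1, allow & 2);
				sketch_disable_new(&n, type, allow & 1, allow & 2);
				mismatches += (o.read != n.read) ||
					      (o.write != n.write);
			}
		}
	}

	return mismatches;
}
```

The model only covers the bitmap updates; it says nothing about the
dirty tracking or nested recalc that follows in the real helpers.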
 arch/x86/kvm/svm/svm.c | 24 ++++++++++--------------
 arch/x86/kvm/vmx/vmx.c | 24 ++++++++++--------------
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e3c49c763225..5453478d1ca3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -691,24 +691,20 @@ void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	void *msrpm = svm->msrpm;
 
 	/* Don't disable interception for MSRs userspace wants to handle. */
-	if ((type & MSR_TYPE_R) &&
-	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
-		svm_set_msr_bitmap_read(msrpm, msr);
-		type &= ~MSR_TYPE_R;
+	if (type & MSR_TYPE_R) {
+		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
+			svm_clear_msr_bitmap_read(msrpm, msr);
+		else
+			svm_set_msr_bitmap_read(msrpm, msr);
 	}
 
-	if ((type & MSR_TYPE_W) &&
-	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
-		svm_set_msr_bitmap_write(msrpm, msr);
-		type &= ~MSR_TYPE_W;
+	if (type & MSR_TYPE_W) {
+		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+			svm_clear_msr_bitmap_write(msrpm, msr);
+		else
+			svm_set_msr_bitmap_write(msrpm, msr);
 	}
 
-	if (type & MSR_TYPE_R)
-		svm_clear_msr_bitmap_read(msrpm, msr);
-
-	if (type & MSR_TYPE_W)
-		svm_clear_msr_bitmap_write(msrpm, msr);
-
 	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
 	svm->nested.force_msr_bitmap_recalc = true;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bdff81f8288d..277c6b5b5d5f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3962,23 +3962,19 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 
 	vmx_msr_bitmap_l01_changed(vmx);
 
-	if ((type & MSR_TYPE_R) &&
-	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
-		vmx_set_msr_bitmap_read(msr_bitmap, msr);
-		type &= ~MSR_TYPE_R;
+	if (type & MSR_TYPE_R) {
+		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
+			vmx_clear_msr_bitmap_read(msr_bitmap, msr);
+		else
+			vmx_set_msr_bitmap_read(msr_bitmap, msr);
 	}
 
-	if ((type & MSR_TYPE_W) &&
-	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
-		vmx_set_msr_bitmap_write(msr_bitmap, msr);
-		type &= ~MSR_TYPE_W;
+	if (type & MSR_TYPE_W) {
+		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+			vmx_clear_msr_bitmap_write(msr_bitmap, msr);
+		else
+			vmx_set_msr_bitmap_write(msr_bitmap, msr);
 	}
-
-	if (type & MSR_TYPE_R)
-		vmx_clear_msr_bitmap_read(msr_bitmap, msr);
-
-	if (type & MSR_TYPE_W)
-		vmx_clear_msr_bitmap_write(msr_bitmap, msr);
 }
 
 void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
-- 
2.50.0.rc0.642.g800a2b2222-goog


* [PATCH v2 32/32] KVM: selftests: Verify KVM disable interception (for userspace) on filter change
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (30 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception Sean Christopherson
@ 2025-06-10 22:57 ` Sean Christopherson
  2025-06-24 19:38 ` [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
  2025-06-25 12:03 ` Manali Shukla
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-10 22:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

Re-read MSR_{FS,GS}_BASE after restoring the "allow everything" userspace
MSR filter to verify that KVM stops forwarding exits to userspace.  This
can also be used in conjunction with manual verification (e.g. printk) to
ensure KVM is correctly updating the MSR bitmaps consumed by hardware.

Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
index 32b2794b78fe..8463a9956410 100644
--- a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
@@ -343,6 +343,12 @@ static void guest_code_permission_bitmap(void)
 	data = test_rdmsr(MSR_GS_BASE);
 	GUEST_ASSERT(data == MSR_GS_BASE);
 
+	/* Access the MSRs again to ensure KVM has disabled interception. */
+	data = test_rdmsr(MSR_FS_BASE);
+	GUEST_ASSERT(data != MSR_FS_BASE);
+	data = test_rdmsr(MSR_GS_BASE);
+	GUEST_ASSERT(data != MSR_GS_BASE);
+
 	GUEST_DONE();
 }
 
@@ -682,6 +688,8 @@ KVM_ONE_VCPU_TEST(user_msr, msr_permission_bitmap, guest_code_permission_bitmap)
 		    "Expected ucall state to be UCALL_SYNC.");
 	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_gs);
 	run_guest_then_process_rdmsr(vcpu, MSR_GS_BASE);
+
+	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_allow);
 	run_guest_then_process_ucall_done(vcpu);
 }
 
-- 
2.50.0.rc0.642.g800a2b2222-goog


* Re: [PATCH v2 06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  2025-06-10 22:57 ` [PATCH v2 06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
@ 2025-06-11  2:16   ` Mi, Dapeng
  0 siblings, 0 replies; 46+ messages in thread
From: Mi, Dapeng @ 2025-06-11  2:16 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li,
	Francesco Lavra, Manali Shukla


On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> WARN and kill the VM instead of panicking the host if KVM attempts to set
> or query MSR interception for an unsupported MSR.  Accessing the MSR
> interception bitmaps only meaningfully affects post-VMRUN behavior, and
> KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN,
> i.e. there is no need to panic the entire host.
>
> Opportunistically move the sanity checks above their use to index into the
> MSRPM, e.g. so that bugs only WARN and terminate the VM, as opposed to
> doing that _and_ generating an out-of-bounds load.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index c75977ca600b..7e39b9df61f1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -824,11 +824,12 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
>  				      to_svm(vcpu)->msrpm;
>  
>  	offset    = svm_msrpm_offset(msr);
> +	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
> +		return false;
> +
>  	bit_write = 2 * (msr & 0x0f) + 1;
>  	tmp       = msrpm[offset];
>  
> -	BUG_ON(offset == MSR_INVALID);
> -
>  	return test_bit(bit_write, &tmp);
>  }
>  
> @@ -854,12 +855,13 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
>  		write = 0;
>  
>  	offset    = svm_msrpm_offset(msr);
> +	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
> +		return;
> +
>  	bit_read  = 2 * (msr & 0x0f);
>  	bit_write = 2 * (msr & 0x0f) + 1;
>  	tmp       = msrpm[offset];
>  
> -	BUG_ON(offset == MSR_INVALID);
> -
>  	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
>  	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
>  

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
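
For readers skimming the thread, the ordering fix quoted above (validate the
offset before it is ever used as an index) can be sketched in standalone
userspace C.  This is only an illustration under stated assumptions: the
sentinel value, the table size, and every demo_* name are hypothetical and
are not KVM's definitions; the "bugged" flag merely stands in for what
KVM_BUG_ON() does to the VM.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MSR_INVALID  0xffffffffu  /* assumed sentinel value */
#define DEMO_MSRPM_WORDS  8u           /* hypothetical bitmap size, in u32s */

struct demo_vm {
	bool bugged;                       /* stand-in for KVM_BUG_ON() marking the VM dead */
	uint32_t msrpm[DEMO_MSRPM_WORDS];
};

/* Hypothetical lookup: 16 MSRs per u32, only MSRs 0..127 are covered. */
static uint32_t demo_msrpm_offset(uint32_t msr)
{
	return msr < DEMO_MSRPM_WORDS * 16 ? msr / 16 : DEMO_MSR_INVALID;
}

static bool demo_msr_write_intercepted(struct demo_vm *vm, uint32_t msr)
{
	uint32_t offset = demo_msrpm_offset(msr);
	uint32_t bit_write;

	/* Check first: an invalid offset must never be used as an index. */
	if (offset == DEMO_MSR_INVALID) {
		vm->bugged = true;
		return false;
	}

	/* Same bit layout as the quoted code: write bit is the odd bit. */
	bit_write = 2 * (msr & 0x0f) + 1;
	return (vm->msrpm[offset] >> bit_write) & 1u;
}
```

With the check placed after the load, as in the removed BUG_ON() lines, the
out-of-bounds read would already have happened by the time the bug is
detected.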




* Re: [PATCH v2 08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
  2025-06-10 22:57 ` [PATCH v2 08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
@ 2025-06-11  2:22   ` Mi, Dapeng
  0 siblings, 0 replies; 46+ messages in thread
From: Mi, Dapeng @ 2025-06-11  2:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li,
	Francesco Lavra, Manali Shukla


On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Rename nested_svm_vmrun_msrpm() to nested_svm_merge_msrpm() to better
> capture its role, and opportunistically feed it @vcpu instead of @svm, as
> grabbing "svm" only to turn around and grab svm->vcpu is rather silly.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/nested.c | 15 +++++++--------
>  arch/x86/kvm/svm/svm.c    |  2 +-
>  2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 8427a48b8b7a..89a77f0f1cc8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -189,8 +189,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   * is optimized in that it only merges the parts where KVM MSR permission bitmap
>   * may contain zero bits.
>   */
> -static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
> +static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
>  	int i;
>  
>  	/*
> @@ -205,7 +206,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  	if (!svm->nested.force_msr_bitmap_recalc) {
>  		struct hv_vmcb_enlightenments *hve = &svm->nested.ctl.hv_enlightenments;
>  
> -		if (kvm_hv_hypercall_enabled(&svm->vcpu) &&
> +		if (kvm_hv_hypercall_enabled(vcpu) &&
>  		    hve->hv_enlightenments_control.msr_bitmap &&
>  		    (svm->nested.ctl.clean & BIT(HV_VMCB_NESTED_ENLIGHTENMENTS)))
>  			goto set_msrpm_base_pa;
> @@ -230,7 +231,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  
>  		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
>  
> -		if (kvm_vcpu_read_guest(&svm->vcpu, offset, &value, 4))
> +		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
>  			return false;
>  
>  		svm->nested.msrpm[p] = svm->msrpm[p] | value;
> @@ -937,7 +938,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>  	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
>  		goto out_exit_err;
>  
> -	if (nested_svm_vmrun_msrpm(svm))
> +	if (nested_svm_merge_msrpm(vcpu))
>  		goto out;
>  
>  out_exit_err:
> @@ -1819,13 +1820,11 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>  
>  static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
>  {
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
>  	if (WARN_ON(!is_guest_mode(vcpu)))
>  		return true;
>  
>  	if (!vcpu->arch.pdptrs_from_userspace &&
> -	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
> +	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
>  		/*
>  		 * Reload the guest's PDPTRs since after a migration
>  		 * the guest CR3 might be restored prior to setting the nested
> @@ -1834,7 +1833,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
>  		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
>  			return false;
>  
> -	if (!nested_svm_vmrun_msrpm(svm)) {
> +	if (!nested_svm_merge_msrpm(vcpu)) {
>  		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  		vcpu->run->internal.suberror =
>  			KVM_INTERNAL_ERROR_EMULATION;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ec97ea1d7b38..854904a80b7e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3137,7 +3137,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		 *
>  		 * For nested:
>  		 * The handling of the MSR bitmap for L2 guests is done in
> -		 * nested_svm_vmrun_msrpm.
> +		 * nested_svm_merge_msrpm().
>  		 * We update the L1 MSR bit as well since it will end up
>  		 * touching the MSR anyway now.
>  		 */

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>




* Re: [PATCH v2 17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h
  2025-06-10 22:57 ` [PATCH v2 17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
@ 2025-06-11  2:29   ` Mi, Dapeng
  0 siblings, 0 replies; 46+ messages in thread
From: Mi, Dapeng @ 2025-06-11  2:29 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li,
	Francesco Lavra, Manali Shukla


On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Dedup the definition of X2APIC_MSR and put it in the local APIC code
> where it belongs.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/lapic.h   | 2 ++
>  arch/x86/kvm/svm/svm.c | 2 --
>  arch/x86/kvm/vmx/vmx.h | 2 --
>  3 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 4ce30db65828..4518b4e0552f 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -21,6 +21,8 @@
>  #define APIC_BROADCAST			0xFF
>  #define X2APIC_BROADCAST		0xFFFFFFFFul
>  
> +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
> +
>  enum lapic_mode {
>  	LAPIC_MODE_DISABLED = 0,
>  	LAPIC_MODE_INVALID = X2APIC_ENABLE,
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 4ee92e444dde..900a1303e0e7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -81,8 +81,6 @@ static uint64_t osvw_len = 4, osvw_status;
>  
>  static DEFINE_PER_CPU(u64, current_tsc_ratio);
>  
> -#define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
> -
>  static const u32 direct_access_msrs[] = {
>  	MSR_STAR,
>  	MSR_IA32_SYSENTER_CS,
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index b5758c33c60f..0afe97e3478f 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -19,8 +19,6 @@
>  #include "../mmu.h"
>  #include "common.h"
>  
> -#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
> -
>  #ifdef CONFIG_X86_64
>  #define MAX_NR_USER_RETURN_MSRS	7
>  #else

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
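
A side note on the two macro bodies in the quoted diff: the removed SVM copy
does not parenthesize its argument, while the VMX copy (the one that
survives in lapic.h) does.  The sketch below shows where the two could
diverge.  It is only an illustration: the DEMO_* names are hypothetical, the
0x800 base and the 0x80/0x300 register offsets are used as plain example
values, and the conditional-expression argument is contrived.  With a plain
constant as the argument both forms expand to the same value.

```c
#include <stdint.h>

#define DEMO_APIC_BASE_MSR 0x800u

/* Argument parenthesized, as in the definition that moves to lapic.h. */
#define DEMO_X2APIC_MSR(r)        (DEMO_APIC_BASE_MSR + ((r) >> 4))
/* Argument not parenthesized, as in the copy removed from svm.c. */
#define DEMO_X2APIC_MSR_LOOSE(x)  (DEMO_APIC_BASE_MSR + (x >> 4))

static uint32_t demo_strict(int pick_low)
{
	return DEMO_X2APIC_MSR(pick_low ? 0x80 : 0x300);
}

static uint32_t demo_loose(int pick_low)
{
	/* Expands to "pick_low ? 0x80 : 0x300 >> 4": only 0x300 is shifted. */
	return DEMO_X2APIC_MSR_LOOSE(pick_low ? 0x80 : 0x300);
}
```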




* Re: [PATCH v2 31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception
  2025-06-10 22:57 ` [PATCH v2 31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception Sean Christopherson
@ 2025-06-11  2:35   ` Mi, Dapeng
  0 siblings, 0 replies; 46+ messages in thread
From: Mi, Dapeng @ 2025-06-11  2:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li,
	Francesco Lavra, Manali Shukla


On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Refactor {svm,vmx}_disable_intercept_for_msr() to simplify the handling of
> userspace filters that disallow access to an MSR.  The more complicated
> logic is no longer needed or justified now that KVM recalculates all MSR
> intercepts on a userspace MSR filter change, i.e. now that KVM doesn't
> need to also update shadow bitmaps.
>
> No functional change intended.
>
> Suggested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 24 ++++++++++--------------
>  arch/x86/kvm/vmx/vmx.c | 24 ++++++++++--------------
>  2 files changed, 20 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e3c49c763225..5453478d1ca3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -691,24 +691,20 @@ void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  	void *msrpm = svm->msrpm;
>  
>  	/* Don't disable interception for MSRs userspace wants to handle. */
> -	if ((type & MSR_TYPE_R) &&
> -	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
> -		svm_set_msr_bitmap_read(msrpm, msr);
> -		type &= ~MSR_TYPE_R;
> +	if (type & MSR_TYPE_R) {
> +		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
> +			svm_clear_msr_bitmap_read(msrpm, msr);
> +		else
> +			svm_set_msr_bitmap_read(msrpm, msr);
>  	}
>  
> -	if ((type & MSR_TYPE_W) &&
> -	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
> -		svm_set_msr_bitmap_write(msrpm, msr);
> -		type &= ~MSR_TYPE_W;
> +	if (type & MSR_TYPE_W) {
> +		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
> +			svm_clear_msr_bitmap_write(msrpm, msr);
> +		else
> +			svm_set_msr_bitmap_write(msrpm, msr);
>  	}
>  
> -	if (type & MSR_TYPE_R)
> -		svm_clear_msr_bitmap_read(msrpm, msr);
> -
> -	if (type & MSR_TYPE_W)
> -		svm_clear_msr_bitmap_write(msrpm, msr);
> -
>  	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
>  	svm->nested.force_msr_bitmap_recalc = true;
>  }
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bdff81f8288d..277c6b5b5d5f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3962,23 +3962,19 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  
>  	vmx_msr_bitmap_l01_changed(vmx);
>  
> -	if ((type & MSR_TYPE_R) &&
> -	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
> -		vmx_set_msr_bitmap_read(msr_bitmap, msr);
> -		type &= ~MSR_TYPE_R;
> +	if (type & MSR_TYPE_R) {
> +		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
> +			vmx_clear_msr_bitmap_read(msr_bitmap, msr);
> +		else
> +			vmx_set_msr_bitmap_read(msr_bitmap, msr);
>  	}
>  
> -	if ((type & MSR_TYPE_W) &&
> -	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
> -		vmx_set_msr_bitmap_write(msr_bitmap, msr);
> -		type &= ~MSR_TYPE_W;
> +	if (type & MSR_TYPE_W) {
> +		if (kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
> +			vmx_clear_msr_bitmap_write(msr_bitmap, msr);
> +		else
> +			vmx_set_msr_bitmap_write(msr_bitmap, msr);
>  	}
> -
> -	if (type & MSR_TYPE_R)
> -		vmx_clear_msr_bitmap_read(msr_bitmap, msr);
> -
> -	if (type & MSR_TYPE_W)
> -		vmx_clear_msr_bitmap_write(msr_bitmap, msr);
>  }
>  
>  void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
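
As a sanity check on the "No functional change intended" claim, the old and
new control flow can be modelled in a few lines of userspace C and compared
exhaustively.  This is a sketch, not the kernel code: the two bools stand in
for the read/write intercept bits of one MSR, allow_r/allow_w stand in for
the kvm_msr_allowed() results, and all demo_* names and DEMO_TYPE_* values
are hypothetical.

```c
#include <stdbool.h>

#define DEMO_TYPE_R 1
#define DEMO_TYPE_W 2

struct demo_bits {
	bool intercept_read;
	bool intercept_write;
};

/* Shape of the removed logic: set on deny, strip the type, then clear. */
static void demo_disable_old(struct demo_bits *b, int type,
			     bool allow_r, bool allow_w)
{
	if ((type & DEMO_TYPE_R) && !allow_r) {
		b->intercept_read = true;
		type &= ~DEMO_TYPE_R;
	}
	if ((type & DEMO_TYPE_W) && !allow_w) {
		b->intercept_write = true;
		type &= ~DEMO_TYPE_W;
	}
	if (type & DEMO_TYPE_R)
		b->intercept_read = false;
	if (type & DEMO_TYPE_W)
		b->intercept_write = false;
}

/* Shape of the added logic: one if/else per access type. */
static void demo_disable_new(struct demo_bits *b, int type,
			     bool allow_r, bool allow_w)
{
	if (type & DEMO_TYPE_R) {
		if (allow_r)
			b->intercept_read = false;
		else
			b->intercept_read = true;
	}
	if (type & DEMO_TYPE_W) {
		if (allow_w)
			b->intercept_write = false;
		else
			b->intercept_write = true;
	}
}

/* Try every type, filter result, and starting state (64 combinations). */
static bool demo_old_and_new_agree(void)
{
	for (int i = 0; i < 64; i++) {
		struct demo_bits a = { .intercept_read = i & 16,
				       .intercept_write = i & 32 };
		struct demo_bits b = a;

		demo_disable_old(&a, i & 3, i & 4, i & 8);
		demo_disable_new(&b, i & 3, i & 4, i & 8);
		if (a.intercept_read != b.intercept_read ||
		    a.intercept_write != b.intercept_write)
			return false;
	}
	return true;
}
```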




* Re: [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
  2025-06-10 22:57 ` [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest Sean Christopherson
@ 2025-06-11  4:38   ` Binbin Wu
  2025-06-11  7:14     ` Binbin Wu
  0 siblings, 1 reply; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  4:38 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Disable interception of SPEC_CTRL when the CPU virtualizes (i.e. context
> switches) SPEC_CTRL if and only if the MSR exists according to the vCPU's
> CPUID model.  Letting the guest access SPEC_CTRL is generally benign, but
> the guest would see inconsistent behavior if KVM happened to emulate an
> access to the MSR.
>
> Fixes: d00b99c514b3 ("KVM: SVM: Add support for Virtual SPEC_CTRL")
> Reported-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/svm.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0ad1a6d4fb6d..21e745acebc3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1362,11 +1362,14 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>   	svm_recalc_instruction_intercepts(vcpu, svm);
>   
>   	/*
> -	 * If the host supports V_SPEC_CTRL then disable the interception
> -	 * of MSR_IA32_SPEC_CTRL.
> +	 * If the CPU virtualizes MSR_IA32_SPEC_CTRL, i.e. KVM doesn't need to
> +	 * manually context switch the MSR, immediately configure interception
> +	 * of SPEC_CTRL, without waiting for the guest to access the MSR.
>   	 */
>   	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> -		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
> +				     guest_has_spec_ctrl_msr(vcpu),
> +				     guest_has_spec_ctrl_msr(vcpu));
Side topic, not directly related to this patch.

Passing 1 to set_msr_interception() disables interception, which makes the
name of the function a bit counterintuitive to me.
Maybe a comment describing the parameters would help people who aren't
familiar with the SVM code, so they don't have to check the implementation?
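
To make the polarity concrete, here is a small userspace sketch that follows
the bit layout visible in the quoted set_msr_interception_bitmap() code from
patch 06 (two bits per MSR, read bit then write bit, a set bit meaning
"intercept").  The demo_* names are hypothetical and the function operates
on a single 32-bit word rather than the real MSRPM.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * read/write == 1 means "let the guest access the MSR directly", i.e. the
 * intercept bit is CLEARED; 0 means the intercept bit is SET.
 */
static uint32_t demo_set_msr_interception(uint32_t word, uint32_t msr,
					  int read, int write)
{
	uint32_t bit_read  = 2 * (msr & 0x0f);
	uint32_t bit_write = 2 * (msr & 0x0f) + 1;

	word = read  ? word & ~(1u << bit_read)  : word | (1u << bit_read);
	word = write ? word & ~(1u << bit_write) : word | (1u << bit_write);
	return word;
}

static bool demo_read_intercepted(uint32_t word, uint32_t msr)
{
	return (word >> (2 * (msr & 0x0f))) & 1u;
}

static bool demo_write_intercepted(uint32_t word, uint32_t msr)
{
	return (word >> (2 * (msr & 0x0f) + 1)) & 1u;
}
```

Starting from an all-ones word (everything intercepted), passing 1, 1 for a
given MSR clears both of its bits, which is the opposite of what the name
"set interception" suggests.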


>   
>   	if (kvm_vcpu_apicv_active(vcpu))
>   		avic_init_vmcb(svm, vmcb);



* Re: [PATCH v2 09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions
  2025-06-10 22:57 ` [PATCH v2 09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
@ 2025-06-11  6:09   ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  6:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Move SVM's MSR Permissions Map macros to svm.h in antipication of adding

antipication -> anticipation?


> helpers that are available to SVM code, and opportunistically replace a
> variety of open-coded literals with (hopefully) informative macros.
>
> Opportunistically open code ARRAY_SIZE(msrpm_ranges) instead of wrapping
> it as NUM_MSR_MAPS, which is an ambiguous name even if it were qualified
> with "SVM_MSRPM".
>
> Deliberately leave the ranges as open coded literals, as using macros to
> define the ranges actually introduces more potential failure points, since
> both the definitions and the usage have to be careful to use the correct
> index.  The lack of clear intent behind the ranges will be addressed in
> future patches.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
[...]


* Re: [PATCH v2 07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  2025-06-10 22:57 ` [PATCH v2 07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
@ 2025-06-11  6:38   ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  6:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Manipulate the MSR bitmaps using non-atomic bit ops APIs (two underscores),
> as the bitmaps are per-vCPU and are only ever accessed while vcpu->mutex is
> held.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
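
For anyone less familiar with the distinction the changelog relies on, a
userspace analogy using C11 atomics is sketched below.  It is not the kernel
implementation: demo_set_bit_atomic() uses atomic_fetch_or() to model an
atomic read-modify-write like set_bit(), and demo_set_bit_plain() is an
ordinary read-modify-write like __set_bit().  All demo_* names are
hypothetical.

```c
#include <limits.h>
#include <stdatomic.h>

#define DEMO_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Atomic variant: safe even if another thread updates the same word. */
static void demo_set_bit_atomic(unsigned int nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or(&addr[nr / DEMO_BITS_PER_LONG],
			1UL << (nr % DEMO_BITS_PER_LONG));
}

/* Plain variant: only correct when the caller already serializes access. */
static void demo_set_bit_plain(unsigned int nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}
```

Both produce the same result when there is a single writer, which is the
situation the changelog describes (per-vCPU bitmaps accessed only with
vcpu->mutex held), so the plain form is sufficient there.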

> ---
>   arch/x86/kvm/svm/svm.c | 12 ++++++------
>   arch/x86/kvm/vmx/vmx.c |  8 ++++----
>   2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7e39b9df61f1..ec97ea1d7b38 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -789,14 +789,14 @@ static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
>   
>   	/* Set the shadow bitmaps to the desired intercept states */
>   	if (read)
> -		set_bit(slot, svm->shadow_msr_intercept.read);
> +		__set_bit(slot, svm->shadow_msr_intercept.read);
>   	else
> -		clear_bit(slot, svm->shadow_msr_intercept.read);
> +		__clear_bit(slot, svm->shadow_msr_intercept.read);
>   
>   	if (write)
> -		set_bit(slot, svm->shadow_msr_intercept.write);
> +		__set_bit(slot, svm->shadow_msr_intercept.write);
>   	else
> -		clear_bit(slot, svm->shadow_msr_intercept.write);
> +		__clear_bit(slot, svm->shadow_msr_intercept.write);
>   }
>   
>   static bool valid_msr_intercept(u32 index)
> @@ -862,8 +862,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
>   	bit_write = 2 * (msr & 0x0f) + 1;
>   	tmp       = msrpm[offset];
>   
> -	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> -	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> +	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
> +	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
>   
>   	msrpm[offset] = tmp;
>   
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9ff00ae9f05a..8f7fe04a1998 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4029,9 +4029,9 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>   	idx = vmx_get_passthrough_msr_slot(msr);
>   	if (idx >= 0) {
>   		if (type & MSR_TYPE_R)
> -			clear_bit(idx, vmx->shadow_msr_intercept.read);
> +			__clear_bit(idx, vmx->shadow_msr_intercept.read);
>   		if (type & MSR_TYPE_W)
> -			clear_bit(idx, vmx->shadow_msr_intercept.write);
> +			__clear_bit(idx, vmx->shadow_msr_intercept.write);
>   	}
>   
>   	if ((type & MSR_TYPE_R) &&
> @@ -4071,9 +4071,9 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>   	idx = vmx_get_passthrough_msr_slot(msr);
>   	if (idx >= 0) {
>   		if (type & MSR_TYPE_R)
> -			set_bit(idx, vmx->shadow_msr_intercept.read);
> +			__set_bit(idx, vmx->shadow_msr_intercept.read);
>   		if (type & MSR_TYPE_W)
> -			set_bit(idx, vmx->shadow_msr_intercept.write);
> +			__set_bit(idx, vmx->shadow_msr_intercept.write);
>   	}
>   
>   	if (type & MSR_TYPE_R)



* Re: [PATCH v2 18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-06-10 22:57 ` [PATCH v2 18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
@ 2025-06-11  6:52   ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  6:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> On a userspace MSR filter change, recalculate all MSR intercepts using the
> filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
> desired intercepts.  The shadow bitmaps add yet another point of failure,
> are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
> and a maintenance burden.
>
> Given that KVM *must* be able to recalculate the correct intercepts at any
> given time, and that MSR filter updates are not hot paths, there is zero
> benefit to maintaining the shadow bitmaps.
>
> Opportunistically switch from boot_cpu_has() to cpu_feature_enabled() as
> appropriate.
>
> Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
> Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
> Cc: Borislav Petkov <bp@alien8.de>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Xin Li (Intel) <xin@zytor.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
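
The design argument in the changelog (recompute from the inputs instead of
keeping a second copy of the desired state) can be sketched as follows.
This is a deliberately tiny model with one MSR and two inputs; the struct,
the field names, and the demo_* helpers are all hypothetical and only
illustrate the shape of the approach.

```c
#include <stdbool.h>

struct demo_vcpu {
	bool guest_has_feature;  /* input: derived from the guest CPUID model */
	bool filter_allows;      /* input: result of the userspace MSR filter */
	bool intercepted;        /* output: always derived, never shadowed */
};

/* Single source of truth: the result depends only on the current inputs. */
static void demo_recalc_intercept(struct demo_vcpu *v)
{
	bool want_passthrough = v->guest_has_feature;

	v->intercepted = !(want_passthrough && v->filter_allows);
}

/* Both kinds of update funnel into the same recalculation. */
static void demo_set_filter(struct demo_vcpu *v, bool allows)
{
	v->filter_allows = allows;
	demo_recalc_intercept(v);
}

static void demo_set_cpuid(struct demo_vcpu *v, bool has_feature)
{
	v->guest_has_feature = has_feature;
	demo_recalc_intercept(v);
}
```

Because the recalculation is idempotent, there is no extra state that can
drift out of sync with the inputs, which is the failure mode the shadow
bitmaps introduced.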

> ---
>   arch/x86/kvm/vmx/vmx.c | 183 +++++++++++------------------------------
>   arch/x86/kvm/vmx/vmx.h |   7 --
>   2 files changed, 46 insertions(+), 144 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8f7fe04a1998..ce7a1c07e402 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -166,31 +166,6 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
>   	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
>   	RTIT_STATUS_BYTECNT))
>   
> -/*
> - * List of MSRs that can be directly passed to the guest.
> - * In addition to these x2apic, PT and LBR MSRs are handled specially.
> - */
> -static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
> -	MSR_IA32_SPEC_CTRL,
> -	MSR_IA32_PRED_CMD,
> -	MSR_IA32_FLUSH_CMD,
> -	MSR_IA32_TSC,
> -#ifdef CONFIG_X86_64
> -	MSR_FS_BASE,
> -	MSR_GS_BASE,
> -	MSR_KERNEL_GS_BASE,
> -	MSR_IA32_XFD,
> -	MSR_IA32_XFD_ERR,
> -#endif
> -	MSR_IA32_SYSENTER_CS,
> -	MSR_IA32_SYSENTER_ESP,
> -	MSR_IA32_SYSENTER_EIP,
> -	MSR_CORE_C1_RES,
> -	MSR_CORE_C3_RESIDENCY,
> -	MSR_CORE_C6_RESIDENCY,
> -	MSR_CORE_C7_RESIDENCY,
> -};
> -
>   /*
>    * These 2 parameters are used to config the controls for Pause-Loop Exiting:
>    * ple_gap:    upper bound on the amount of time between two successive
> @@ -672,40 +647,6 @@ static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu)
>   	return flexpriority_enabled && lapic_in_kernel(vcpu);
>   }
>   
> -static int vmx_get_passthrough_msr_slot(u32 msr)
> -{
> -	int i;
> -
> -	switch (msr) {
> -	case 0x800 ... 0x8ff:
> -		/* x2APIC MSRs. These are handled in vmx_update_msr_bitmap_x2apic() */
> -		return -ENOENT;
> -	case MSR_IA32_RTIT_STATUS:
> -	case MSR_IA32_RTIT_OUTPUT_BASE:
> -	case MSR_IA32_RTIT_OUTPUT_MASK:
> -	case MSR_IA32_RTIT_CR3_MATCH:
> -	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
> -		/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
> -	case MSR_LBR_SELECT:
> -	case MSR_LBR_TOS:
> -	case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
> -	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
> -	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
> -	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
> -	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
> -		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
> -		return -ENOENT;
> -	}
> -
> -	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
> -		if (vmx_possible_passthrough_msrs[i] == msr)
> -			return i;
> -	}
> -
> -	WARN(1, "Invalid MSR %x, please adapt vmx_possible_passthrough_msrs[]", msr);
> -	return -ENOENT;
> -}
> -
>   struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
>   {
>   	int i;
> @@ -4015,25 +3956,12 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>   {
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>   	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> -	int idx;
>   
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
>   	vmx_msr_bitmap_l01_changed(vmx);
>   
> -	/*
> -	 * Mark the desired intercept state in shadow bitmap, this is needed
> -	 * for resync when the MSR filters change.
> -	 */
> -	idx = vmx_get_passthrough_msr_slot(msr);
> -	if (idx >= 0) {
> -		if (type & MSR_TYPE_R)
> -			__clear_bit(idx, vmx->shadow_msr_intercept.read);
> -		if (type & MSR_TYPE_W)
> -			__clear_bit(idx, vmx->shadow_msr_intercept.write);
> -	}
> -
>   	if ((type & MSR_TYPE_R) &&
>   	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
>   		vmx_set_msr_bitmap_read(msr_bitmap, msr);
> @@ -4057,25 +3985,12 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>   {
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>   	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> -	int idx;
>   
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
>   	vmx_msr_bitmap_l01_changed(vmx);
>   
> -	/*
> -	 * Mark the desired intercept state in shadow bitmap, this is needed
> -	 * for resync when the MSR filter changes.
> -	 */
> -	idx = vmx_get_passthrough_msr_slot(msr);
> -	if (idx >= 0) {
> -		if (type & MSR_TYPE_R)
> -			__set_bit(idx, vmx->shadow_msr_intercept.read);
> -		if (type & MSR_TYPE_W)
> -			__set_bit(idx, vmx->shadow_msr_intercept.write);
> -	}
> -
>   	if (type & MSR_TYPE_R)
>   		vmx_set_msr_bitmap_read(msr_bitmap, msr);
>   
> @@ -4159,35 +4074,58 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   	}
>   }
>   
> -void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> +static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	u32 i;
> -
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
> -	/*
> -	 * Redo intercept permissions for MSRs that KVM is passing through to
> -	 * the guest.  Disabling interception will check the new MSR filter and
> -	 * ensure that KVM enables interception if usersepace wants to filter
> -	 * the MSR.  MSRs that KVM is already intercepting don't need to be
> -	 * refreshed since KVM is going to intercept them regardless of what
> -	 * userspace wants.
> -	 */
> -	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
> -		u32 msr = vmx_possible_passthrough_msrs[i];
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.read))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_R);
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.write))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_W);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> +#ifdef CONFIG_X86_64
> +	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> +#endif
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_cstate_in_guest(vcpu->kvm)) {
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>   	}
>   
>   	/* PT MSRs can be passed through iff PT is exposed to the guest. */
>   	if (vmx_pt_mode_is_host_guest())
>   		pt_update_intercept_for_msr(vcpu);
> +
> +	if (vcpu->arch.xfd_no_write_intercept)
> +		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
> +
> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
> +				  !to_vmx(vcpu)->spec_ctrl);
> +
> +	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> +
> +	if (cpu_feature_enabled(X86_FEATURE_IBPB))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> +					  !guest_has_pred_cmd_msr(vcpu));
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FLUSH_L1D))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +
> +	/*
> +	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
> +	 * filtered by userspace.
> +	 */
> +}
> +
> +void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> +{
> +	vmx_recalc_msr_intercepts(vcpu);
>   }
>   
>   static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
> @@ -7537,26 +7475,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>   		evmcs->hv_enlightenments_control.msr_bitmap = 1;
>   	}
>   
> -	/* The MSR bitmap starts with all ones */
> -	bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -	bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> -#ifdef CONFIG_X86_64
> -	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> -#endif
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> -	if (kvm_cstate_in_guest(vcpu->kvm)) {
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
> -	}
> -
>   	vmx->loaded_vmcs = &vmx->vmcs01;
>   
>   	if (cpu_need_virtualize_apic_accesses(vcpu)) {
> @@ -7842,18 +7760,6 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		}
>   	}
>   
> -	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> -					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> -
> -	if (boot_cpu_has(X86_FEATURE_IBPB))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> -					  !guest_has_pred_cmd_msr(vcpu));
> -
> -	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> -					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> -
>   	set_cr4_guest_host_mask(vmx);
>   
>   	vmx_write_encls_bitmap(vcpu, NULL);
> @@ -7869,6 +7775,9 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		vmx->msr_ia32_feature_control_valid_bits &=
>   			~FEAT_CTL_SGX_LC_ENABLED;
>   
> +	/* Recalc MSR interception to account for feature changes. */
> +	vmx_recalc_msr_intercepts(vcpu);
> +
>   	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>   	vmx_update_exception_bitmap(vcpu);
>   }
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 0afe97e3478f..a26fe3d9e1d2 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -294,13 +294,6 @@ struct vcpu_vmx {
>   	struct pt_desc pt_desc;
>   	struct lbr_desc lbr_desc;
>   
> -	/* Save desired MSR intercept (read: pass-through) state */
> -#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
> -	struct {
> -		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -	} shadow_msr_intercept;
> -
>   	/* ve_info must be page aligned. */
>   	struct vmx_ve_information *ve_info;
>   };



* Re: [PATCH v2 20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  2025-06-10 22:57 ` [PATCH v2 20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
@ 2025-06-11  7:09   ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  7:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Rename msr_filter_changed() to recalc_msr_intercepts() and drop the
> trampoline wrapper now that both SVM and VMX use a filter-agnostic recalc
> helper to react to the new userspace filter.
>
> No functional change intended.
>
> Reviewed-by: Xin Li (Intel) <xin@zytor.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/include/asm/kvm-x86-ops.h | 2 +-
>   arch/x86/include/asm/kvm_host.h    | 2 +-
>   arch/x86/kvm/svm/svm.c             | 8 +-------
>   arch/x86/kvm/vmx/main.c            | 6 +++---
>   arch/x86/kvm/vmx/vmx.c             | 7 +------
>   arch/x86/kvm/vmx/x86_ops.h         | 2 +-
>   arch/x86/kvm/x86.c                 | 8 +++++++-
>   7 files changed, 15 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 8d50e3e0a19b..19a6735d6dd8 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -139,7 +139,7 @@ KVM_X86_OP(check_emulate_instruction)
>   KVM_X86_OP(apic_init_signal_blocked)
>   KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
>   KVM_X86_OP_OPTIONAL(migrate_timers)
> -KVM_X86_OP(msr_filter_changed)
> +KVM_X86_OP(recalc_msr_intercepts)
>   KVM_X86_OP(complete_emulated_msr)
>   KVM_X86_OP(vcpu_deliver_sipi_vector)
>   KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 330cdcbed1a6..89a626e5b80f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1885,7 +1885,7 @@ struct kvm_x86_ops {
>   	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
>   
>   	void (*migrate_timers)(struct kvm_vcpu *vcpu);
> -	void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
> +	void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu);
>   	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>   
>   	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index de3d59c71229..710bc5f965dc 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -896,11 +896,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   	 */
>   }
>   
> -static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
> -{
> -	svm_recalc_msr_intercepts(vcpu);
> -}
> -
>   void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
>   {
>   	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
> @@ -929,7 +924,6 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
>   	struct vcpu_svm *svm = to_svm(vcpu);
>   
>   	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
> -
>   	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
>   	svm_recalc_lbr_msr_intercepts(vcpu);
>   
> @@ -5227,7 +5221,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>   
>   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>   
> -	.msr_filter_changed = svm_msr_filter_changed,
> +	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
>   	.complete_emulated_msr = svm_complete_emulated_msr,
>   
>   	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index d1e02e567b57..b3c58731a2f5 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -220,7 +220,7 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	return vmx_get_msr(vcpu, msr_info);
>   }
>   
> -static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
> +static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
>   	/*
>   	 * TDX doesn't allow VMM to configure interception of MSR accesses.
> @@ -231,7 +231,7 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
>   	if (is_td_vcpu(vcpu))
>   		return;
>   
> -	vmx_msr_filter_changed(vcpu);
> +	vmx_recalc_msr_intercepts(vcpu);
>   }
>   
>   static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
> @@ -1034,7 +1034,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>   	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
>   	.migrate_timers = vmx_migrate_timers,
>   
> -	.msr_filter_changed = vt_op(msr_filter_changed),
> +	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
>   	.complete_emulated_msr = vt_op(complete_emulated_msr),
>   
>   	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ce7a1c07e402..bdff81f8288d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4074,7 +4074,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   	}
>   }
>   
> -static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> +void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
> @@ -4123,11 +4123,6 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   	 */
>   }
>   
> -void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> -{
> -	vmx_recalc_msr_intercepts(vcpu);
> -}
> -
>   static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
>   						int vector)
>   {
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index b4596f651232..34c6e683e321 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -52,7 +52,7 @@ void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
>   			   int trig_mode, int vector);
>   void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
>   bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
> -void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
> +void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
>   void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
>   void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
>   int vmx_get_feature_msr(u32 msr, u64 *data);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index dd34a2ec854c..cc9a01b6dbc8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10926,8 +10926,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   			kvm_vcpu_update_apicv(vcpu);
>   		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
>   			kvm_check_async_pf_completion(vcpu);
> +
> +		/*
> +		 * Recalc MSR intercepts as userspace may want to intercept
> +		 * accesses to MSRs that KVM would otherwise pass through to
> +		 * the guest.
> +		 */
>   		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
> -			kvm_x86_call(msr_filter_changed)(vcpu);
> +			kvm_x86_call(recalc_msr_intercepts)(vcpu);
>   
>   		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
>   			kvm_x86_call(update_cpu_dirty_logging)(vcpu);
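
For readers skimming the diff above, the resulting flow can be sketched
as a standalone userspace toy (not kernel code): the pending
filter-change request is tested and cleared, then the vendor's recalc
helper is called straight through the ops table, with no per-vendor
trampoline in between.  All toy_* names are hypothetical stand-ins for
kvm_check_request(), kvm_x86_call() and the vendor callbacks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request bit, standing in for KVM_REQ_MSR_FILTER_CHANGED. */
#define TOY_REQ_MSR_FILTER_CHANGED (1u << 0)

struct toy_vcpu;

/* Hypothetical ops table with a single hook, named as in the patch. */
struct toy_x86_ops {
	void (*recalc_msr_intercepts)(struct toy_vcpu *vcpu);
};

struct toy_vcpu {
	uint32_t requests;		/* pending request bits */
	const struct toy_x86_ops *ops;	/* vendor callbacks */
	int recalc_count;		/* how often the hook has run */
};

/* Test-and-clear a pending request; returns true if it was pending. */
static bool toy_check_request(struct toy_vcpu *vcpu, uint32_t req)
{
	if (!(vcpu->requests & req))
		return false;

	vcpu->requests &= ~req;
	return true;
}

/* Stand-in for a vendor helper; a real one would rewrite the MSR bitmap. */
static void toy_vendor_recalc_msr_intercepts(struct toy_vcpu *vcpu)
{
	vcpu->recalc_count++;
}

/* The vendor helper is wired in directly, with no wrapper function. */
static const struct toy_x86_ops toy_ops = {
	.recalc_msr_intercepts = toy_vendor_recalc_msr_intercepts,
};

/* Toy version of the request handling on the guest-entry path. */
static void toy_enter_guest(struct toy_vcpu *vcpu)
{
	if (toy_check_request(vcpu, TOY_REQ_MSR_FILTER_CHANGED))
		vcpu->ops->recalc_msr_intercepts(vcpu);
}
```

Calling toy_enter_guest() twice after a single filter update runs the
hook only once, because the request is cleared when it is consumed.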



* Re: [PATCH v2 01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
  2025-06-11  4:38   ` Binbin Wu
@ 2025-06-11  7:14     ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  7:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 12:38 PM, Binbin Wu wrote:
>
>
> On 6/11/2025 6:57 AM, Sean Christopherson wrote:
>> Disable interception of SPEC_CTRL when the CPU virtualizes (i.e. context
>> switches) SPEC_CTRL if and only if the MSR exists according to the vCPU's
>> CPUID model.  Letting the guest access SPEC_CTRL is generally benign, but
>> the guest would see inconsistent behavior if KVM happened to emulate an
>> access to the MSR.
>>
>> Fixes: d00b99c514b3 ("KVM: SVM: Add support for Virtual SPEC_CTRL")
>> Reported-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>   arch/x86/kvm/svm/svm.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 0ad1a6d4fb6d..21e745acebc3 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1362,11 +1362,14 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>>       svm_recalc_instruction_intercepts(vcpu, svm);
>>         /*
>> -     * If the host supports V_SPEC_CTRL then disable the interception
>> -     * of MSR_IA32_SPEC_CTRL.
>> +     * If the CPU virtualizes MSR_IA32_SPEC_CTRL, i.e. KVM doesn't need to
>> +     * manually context switch the MSR, immediately configure interception
>> +     * of SPEC_CTRL, without waiting for the guest to access the MSR.
>>        */
>>       if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>> -        set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
>> +        set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL,
>> +                     guest_has_spec_ctrl_msr(vcpu),
>> +                     guest_has_spec_ctrl_msr(vcpu));
> Side topic, not related to this patch directly.
>
> Setting to 1 for set_msr_interception() means to disable interception.
> The name of the function seems a bit counterintuitive to me.
> Maybe some description for the function can help people not familiar with
> SVM code without further checking the implementation?

Oh, please ignore it.

A later patch in this patch set has handled it.

>
>
>>         if (kvm_vcpu_apicv_active(vcpu))
>>           avic_init_vmcb(svm, vmcb);
>
>



* Re: [PATCH v2 14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  2025-06-10 22:57 ` [PATCH v2 14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
@ 2025-06-11  7:31   ` Binbin Wu
  0 siblings, 0 replies; 46+ messages in thread
From: Binbin Wu @ 2025-06-11  7:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Borislav Petkov,
	Xin Li, Dapeng Mi, Francesco Lavra, Manali Shukla



On 6/11/2025 6:57 AM, Sean Christopherson wrote:
> Add and use SVM MSR interception APIs (in most paths) to match VMX's
> APIs and nomenclature.  Specifically, add SVM variants of:
>
>          vmx_disable_intercept_for_msr(vcpu, msr, type)
>          vmx_enable_intercept_for_msr(vcpu, msr, type)
>          vmx_set_intercept_for_msr(vcpu, msr, type, intercept)
>
> to eventually replace SVM's single helper:
>
>          set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write)
>
> which is awkward to use (in all cases, KVM either applies the same logic
> for both reads and writes, or intercepts one of read or write), and is
> unintuitive due to using '0' to indicate interception should be *set*.
>
> Keep the guts of the old API for the moment to avoid churning the MSR
> filter code, as that mess will be overhauled in the near future.  Leave
> behind a temporary comment to call out that the shadow bitmaps have
> inverted polarity relative to the bitmaps consumed by hardware.
>
> No functional change intended.
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

[...]
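
The polarity point in the changelog can be illustrated with a small
standalone sketch.  It is not the SVM implementation: the toy_* names,
the 32-entry MSR space and the per-vCPU bitmaps are assumptions made
purely for illustration.  The VMX-style helpers take an "intercept"
flag (true means trap), while the legacy-style helper takes "allow"
flags (true means pass through), so the latter has to invert its
arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy access types, mirroring the read/write split in the changelog. */
#define TOY_MSR_TYPE_R	1
#define TOY_MSR_TYPE_W	2
#define TOY_MSR_TYPE_RW	(TOY_MSR_TYPE_R | TOY_MSR_TYPE_W)

#define TOY_NR_MSRS	32

/* Hypothetical per-vCPU state: a set bit means "intercept this MSR". */
struct toy_vcpu {
	uint32_t intercept_read;
	uint32_t intercept_write;
};

/* VMX-style API: 'intercept' says what should happen, no inversion. */
static void toy_set_intercept_for_msr(struct toy_vcpu *vcpu, unsigned int msr,
				      int type, bool intercept)
{
	uint32_t bit;

	if (msr >= TOY_NR_MSRS)
		return;

	bit = 1u << msr;

	if (type & TOY_MSR_TYPE_R) {
		if (intercept)
			vcpu->intercept_read |= bit;
		else
			vcpu->intercept_read &= ~bit;
	}

	if (type & TOY_MSR_TYPE_W) {
		if (intercept)
			vcpu->intercept_write |= bit;
		else
			vcpu->intercept_write &= ~bit;
	}
}

static void toy_disable_intercept_for_msr(struct toy_vcpu *vcpu,
					  unsigned int msr, int type)
{
	toy_set_intercept_for_msr(vcpu, msr, type, false);
}

static void toy_enable_intercept_for_msr(struct toy_vcpu *vcpu,
					 unsigned int msr, int type)
{
	toy_set_intercept_for_msr(vcpu, msr, type, true);
}

/* Legacy-style API: '1' means allow, i.e. do NOT intercept. */
static void toy_legacy_set_msr_interception(struct toy_vcpu *vcpu,
					    unsigned int msr,
					    bool allow_read, bool allow_write)
{
	toy_set_intercept_for_msr(vcpu, msr, TOY_MSR_TYPE_R, !allow_read);
	toy_set_intercept_for_msr(vcpu, msr, TOY_MSR_TYPE_W, !allow_write);
}
```

With the type-based helpers, a caller that wants to change only writes
passes TOY_MSR_TYPE_W and leaves the read state untouched, instead of
having to supply a value for both directions.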


* Re: [PATCH v2 00/32] KVM: x86: Clean up MSR interception code
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (31 preceding siblings ...)
  2025-06-10 22:57 ` [PATCH v2 32/32] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
@ 2025-06-24 19:38 ` Sean Christopherson
  2025-06-25 12:03 ` Manali Shukla
  33 siblings, 0 replies; 46+ messages in thread
From: Sean Christopherson @ 2025-06-24 19:38 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra, Manali Shukla

On Tue, 10 Jun 2025 15:57:05 -0700, Sean Christopherson wrote:
> Clean up KVM's MSR interception code (especially the SVM code, which is all
> kinds of ugly).  The main goals are to:
> 
>  - Make the SVM and VMX APIs consistent (and sane; the current SVM APIs have
>    inverted polarity).
> 
>  - Eliminate the shadow bitmaps that are used to determine intercepts on
>    userspace MSR filter update.
> 
> [...]

Applied to kvm-x86 misc, thanks!

[01/32] KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
        https://github.com/kvm-x86/linux/commit/674ffc650351
[02/32] KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup()
        https://github.com/kvm-x86/linux/commit/fb96d5cf0fda
[03/32] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
        https://github.com/kvm-x86/linux/commit/5ebd73730832
[04/32] KVM: SVM: Tag MSR bitmap initialization helpers with __init
        https://github.com/kvm-x86/linux/commit/f886515f9ba2
[05/32] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
        https://github.com/kvm-x86/linux/commit/b241c50c4e30
[06/32] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
        https://github.com/kvm-x86/linux/commit/6353cd685c69
[07/32] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
        https://github.com/kvm-x86/linux/commit/b1bccf788390
[08/32] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
        https://github.com/kvm-x86/linux/commit/925149b6d054
[09/32] KVM: SVM: Clean up macros related to architectural MSRPM definitions
        https://github.com/kvm-x86/linux/commit/16e9584cc0a8
[10/32] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
        https://github.com/kvm-x86/linux/commit/9b72c3d59f42
[11/32] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
        https://github.com/kvm-x86/linux/commit/f21ff2c8c997
[12/32] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
        https://github.com/kvm-x86/linux/commit/4879dc9469e6
[13/32] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
        https://github.com/kvm-x86/linux/commit/c38595ad69ce
[14/32] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
        https://github.com/kvm-x86/linux/commit/6b7315fe54ce
[15/32] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
        https://github.com/kvm-x86/linux/commit/3a0f09b361e1
[16/32] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
        https://github.com/kvm-x86/linux/commit/cb53d079484c
[17/32] KVM: x86: Move definition of X2APIC_MSR() to lapic.h
        https://github.com/kvm-x86/linux/commit/405a63d4d386
[18/32] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
        https://github.com/kvm-x86/linux/commit/8a056ece45d2
[19/32] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
        https://github.com/kvm-x86/linux/commit/160f143cc131
[20/32] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
        https://github.com/kvm-x86/linux/commit/4ceca57e3f20
[21/32] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
        https://github.com/kvm-x86/linux/commit/049dff172b6d
[22/32] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
        https://github.com/kvm-x86/linux/commit/40ba80e4b043
[23/32] KVM: SVM: Merge "after set CPUID" intercept recalc helpers
        https://github.com/kvm-x86/linux/commit/4880919aaf8d
[24/32] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
        https://github.com/kvm-x86/linux/commit/2f89888434bc
[25/32] KVM: SVM: Move svm_msrpm_offset() to nested.c
        https://github.com/kvm-x86/linux/commit/5c9c08476363
[26/32] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
        https://github.com/kvm-x86/linux/commit/7fe057804118
[27/32] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
        https://github.com/kvm-x86/linux/commit/52f82177429e
[28/32] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
        https://github.com/kvm-x86/linux/commit/5904ba517246
[29/32] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
        https://github.com/kvm-x86/linux/commit/54f1c770611b
[30/32] KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
        https://github.com/kvm-x86/linux/commit/73be81b3bb7c
[31/32] KVM: x86: Simplify userspace filter logic when disabling MSR interception
        https://github.com/kvm-x86/linux/commit/bea44d199240
[32/32] KVM: selftests: Verify KVM disable interception (for userspace) on filter change
        https://github.com/kvm-x86/linux/commit/0792c71c1c94

--
https://github.com/kvm-x86/kvm-unit-tests/tree/next


* Re: [PATCH v2 00/32] KVM: x86: Clean up MSR interception code
  2025-06-10 22:57 [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (32 preceding siblings ...)
  2025-06-24 19:38 ` [PATCH v2 00/32] KVM: x86: Clean up MSR interception code Sean Christopherson
@ 2025-06-25 12:03 ` Manali Shukla
  33 siblings, 0 replies; 46+ messages in thread
From: Manali Shukla @ 2025-06-25 12:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Chao Gao, Borislav Petkov, Xin Li, Dapeng Mi,
	Francesco Lavra

Hi Sean,

On 6/11/2025 4:27 AM, Sean Christopherson wrote:
> Clean up KVM's MSR interception code (especially the SVM code, which is all
> kinds of ugly).  The main goals are to:
> 
>  - Make the SVM and VMX APIs consistent (and sane; the current SVM APIs have
>    inverted polarity).
> 
>  - Eliminate the shadow bitmaps that are used to determine intercepts on
>    userspace MSR filter update.
> 
> v2:
>  - Add a patch to set MSR_IA32_SPEC_CTRL interception as appropriate. [Chao]
>  - Add a patch to cleanup {svm,vmx}_disable_intercept_for_msr() once the
>    dust has settled. [Dapeng]
>  - Return -ENOSPC if msrpm_offsets[] is full. [Chao]
>  - Free iopm_pages directly instead of bouncing through iopm_base. [Chao]
>  - Check for "offset == MSR_INVALID" before using offset. [Chao]
>  - Temporarily keep MSR_IA32_DEBUGCTLMSR in the nested list. [Chao]
>  - Add a comment to explain nested_svm_msrpm_merge_offsets. [Chao]
>  - Add a patch to shift the IOPM allocation to avoid having to unwind it.
>  - Init nested_svm_msrpm_merge_offsets iff nested=1. [Chao]
>  - Add a helper to dedup alloc+init of MSRPM and IOPM.
>  - Tag merge_msrs as "static" and "__initconst". [Paolo]
>  - Rework helpers to use fewer macros. [Paolo]
>  - Account for each MSRPM byte covering 4 MSRs. [Paolo]
>  - Opportunistically use cpu_feature_enabled(). [Xin]
>  - Fully remove MAX_DIRECT_ACCESS_MSRS, MSRPM_OFFSETS, and msrpm_offsets.
>    [Francesco]
>  - Fix typos. [Dapeng, Chao]
>  - Collect reviews. [Chao, Dapeng, Xin]
> 
> v1: https://lore.kernel.org/all/20250529234013.3826933-1-seanjc@google.com
> 
> v0: https://lore.kernel.org/kvm/20241127201929.4005605-1-aaronlewis@google.com
> 
> Sean Christopherson (32):
>   KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the
>     guest
>   KVM: SVM: Allocate IOPM pages after initial setup in
>     svm_hardware_setup()
>   KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
>   KVM: SVM: Tag MSR bitmap initialization helpers with __init
>   KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
>   KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
>   KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
>   KVM: SVM: Massage name and param of helper that merges vmcb01 and
>     vmcb12 MSRPMs
>   KVM: SVM: Clean up macros related to architectural MSRPM definitions
>   KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1
>     bitmaps
>   KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap
>     merge
>   KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always
>     passthrough"
>   KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on
>     offsets
>   KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
>   KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
>   KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
>   KVM: x86: Move definition of X2APIC_MSR() to lapic.h
>   KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter
>     change
>   KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter
>     change
>   KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
>   KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts
>     specific
>   KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
>   KVM: SVM: Merge "after set CPUID" intercept recalc helpers
>   KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES
>     accesses
>   KVM: SVM: Move svm_msrpm_offset() to nested.c
>   KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
>   KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1
>     bitmaps
>   KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range
>     MSR
>   KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
>   KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
>   KVM: x86: Simplify userspace filter logic when disabling MSR
>     interception
>   KVM: selftests: Verify KVM disable interception (for userspace) on
>     filter change
> 
>  arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
>  arch/x86/include/asm/kvm_host.h               |   2 +-
>  arch/x86/kvm/lapic.h                          |   2 +
>  arch/x86/kvm/svm/nested.c                     | 126 +++--
>  arch/x86/kvm/svm/sev.c                        |  29 +-
>  arch/x86/kvm/svm/svm.c                        | 490 ++++++------------
>  arch/x86/kvm/svm/svm.h                        | 102 +++-
>  arch/x86/kvm/vmx/main.c                       |   6 +-
>  arch/x86/kvm/vmx/vmx.c                        | 202 ++------
>  arch/x86/kvm/vmx/vmx.h                        |   9 -
>  arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
>  arch/x86/kvm/x86.c                            |   8 +-
>  .../kvm/x86/userspace_msr_exit_test.c         |   8 +
>  13 files changed, 426 insertions(+), 562 deletions(-)
> 
> 
> base-commit: 61374cc145f4a56377eaf87c7409a97ec7a34041
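
As background for the "each MSRPM byte covering 4 MSRs" item in the v2
changelog quoted above: as I understand the architectural layout, the
SVM MSR permissions map uses two bits per MSR (read intercept, then
write intercept) and covers three ranges of 8192 MSRs each.  The sketch
below is standalone userspace C under that assumption; toy_msrpm_read_bit()
and the TOY_* constants are hypothetical names, not KVM's helpers.  It
returns -EINVAL for an out-of-range MSR, in the spirit of patch 28.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSRS_PER_RANGE	0x2000u	/* 8192 MSRs in each covered range */
#define TOY_BITS_PER_MSR	2u	/* bit 0: read, bit 1: write */
#define TOY_BITS_PER_RANGE	(TOY_MSRS_PER_RANGE * TOY_BITS_PER_MSR)

/* Base MSR index of each range covered by the permissions map. */
static const uint32_t toy_range_base[] = {
	0x00000000u,
	0xc0000000u,
	0xc0010000u,
};

/*
 * Return the bit number of the read-intercept bit for 'msr' within the
 * map, or -EINVAL if the MSR is outside all covered ranges.  The
 * write-intercept bit is the next bit up.
 */
static int toy_msrpm_read_bit(uint32_t msr)
{
	size_t i;

	for (i = 0; i < sizeof(toy_range_base) / sizeof(toy_range_base[0]); i++) {
		uint32_t base = toy_range_base[i];

		if (msr >= base && msr - base < TOY_MSRS_PER_RANGE)
			return (int)(i * TOY_BITS_PER_RANGE +
				     (msr - base) * TOY_BITS_PER_MSR);
	}

	return -EINVAL;
}
```

Dividing the returned bit number by 8 gives the byte offset into the
map, so MSRs 0 through 3 share byte 0 and MSR 4 starts byte 1.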


I’ve tested this patch series using the `msr` tests from kvm-unit-tests and didn’t
observe any unexpected results.

Additionally, I rebased the mediated PMU v4 patches on top of this series and ran
PMU-related tests from kvm-unit-tests with the following configurations:

  -cpu host
  -cpu host,-perfctr-core
  -cpu host,-perfmon-v2

I didn't see any unexpected results there either.
Testing was performed on a Turin machine (AMD EPYC 9745 128-Core Processor).

I understand the patches are already merged, but just wanted to share this for reference.

Feel free to add:
Tested-by: Manali Shukla <Manali.Shukla@amd.com>

-Manali



