kvm.vger.kernel.org archive mirror
* [PATCH 00/28] KVM: x86: Clean up MSR interception code
@ 2025-05-29 23:39 Sean Christopherson
  2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
                   ` (27 more replies)
  0 siblings, 28 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Clean up KVM's MSR interception code (especially the SVM code, which is all
kinds of ugly).  The main goals are to:

 - Make the SVM and VMX APIs consistent (and sane; the current SVM APIs have
   inverted polarity, as illustrated in the sketch after this list).

 - Eliminate the shadow bitmaps that are used to determine intercepts on
   userspace MSR filter update.
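
To make the "inverted polarity" point concrete, here's an illustration-only
side-by-side (MSR_EFER is used purely as an example) of how the two vendors'
current helpers express "pass this MSR through to the guest":

  /* SVM today: '1' means "allow", i.e. do NOT intercept. */
  set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);

  /* VMX today: the action is spelled out in the helper's name. */
  vmx_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);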

For the folks explicitly Cc'd: my plan/hope is to apply this in advance
of landing the CET virtualization and mediated PMU series, so that we don't
need to deal with extending the shadow bitmaps.  Any reviews/testing you can
provide to help make that happen would be greatly appreciated.

Note, this is a spiritual successor to the "Unify MSR intercepts in x86"
series that was posted last year[*], but I started the versioning back at
v1 as very, very little of the code actually survived, and there's obviously
no true unification in this series.  That series also had several bugs (that
were never pointed out on list), so I wanted to make a clean break.

FWIW, I still like the _idea_ of unified code, but with the shadow bitmaps
gone, it's not actually that much code, and the logic isn't all that complex.
In the end, I couldn't convince myself that unifying that small amount of
logic was worth taking on the complexity of generating and passing around bit
numbers and bitmap pointers to common code (or adding 4 more kvm_x86_ops hooks).

[*] https://lore.kernel.org/kvm/20241127201929.4005605-1-aaronlewis@google.com

Sean Christopherson (28):
  KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  KVM: SVM: Tag MSR bitmap initialization helpers with __init
  KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
  KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  KVM: SVM: Massage name and param of helper that merges vmcb01 and
    vmcb12 MSRPMs
  KVM: SVM: Clean up macros related to architectural MSRPM definitions
  KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1
    bitmaps
  KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap
    merge
  KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always
    passthrough"
  KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on
    offsets
  KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
  KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
  KVM: x86: Move definition of X2APIC_MSR() to lapic.h
  KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter
    change
  KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter
    change
  KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts
    specific
  KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
  KVM: SVM: Merge "after set CPUID" intercept recalc helpers
  KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES
    accesses
  KVM: SVM: Move svm_msrpm_offset() to nested.c
  KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
  KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1
    bitmaps
  KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range
    MSR
  KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
  KVM: selftests: Verify KVM disable interception (for userspace) on
    filter change

 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm_host.h               |   2 +-
 arch/x86/kvm/lapic.h                          |   2 +
 arch/x86/kvm/svm/nested.c                     | 128 +++--
 arch/x86/kvm/svm/sev.c                        |  29 +-
 arch/x86/kvm/svm/svm.c                        | 449 ++++++------------
 arch/x86/kvm/svm/svm.h                        | 107 ++++-
 arch/x86/kvm/vmx/main.c                       |   6 +-
 arch/x86/kvm/vmx/vmx.c                        | 179 ++-----
 arch/x86/kvm/vmx/vmx.h                        |   9 -
 arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
 arch/x86/kvm/x86.c                            |   8 +-
 .../kvm/x86/userspace_msr_exit_test.c         |   8 +
 13 files changed, 408 insertions(+), 523 deletions(-)


base-commit: 3f7b307757ecffc1c18ede9ee3cf9ce8101f3cc9
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  7:17   ` Chao Gao
  2025-05-29 23:39 ` [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

WARN and reject module loading if there is a problem with KVM's MSR
interception bitmaps.  Panicking the host in this situation is inexcusable
since it is trivially easy to propagate the error up the stack.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0ad1a6d4fb6d..bd75ff8e4f20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void add_msr_offset(u32 offset)
+static int add_msr_offset(u32 offset)
 {
 	int i;
 
@@ -953,7 +953,7 @@ static void add_msr_offset(u32 offset)
 
 		/* Offset already in list? */
 		if (msrpm_offsets[i] == offset)
-			return;
+			return 0;
 
 		/* Slot used by another offset? */
 		if (msrpm_offsets[i] != MSR_INVALID)
@@ -962,17 +962,13 @@ static void add_msr_offset(u32 offset)
 		/* Add offset to list */
 		msrpm_offsets[i] = offset;
 
-		return;
+		return 0;
 	}
 
-	/*
-	 * If this BUG triggers the msrpm_offsets table has an overflow. Just
-	 * increase MSRPM_OFFSETS in this case.
-	 */
-	BUG();
+	return -EIO;
 }
 
-static void init_msrpm_offsets(void)
+static int init_msrpm_offsets(void)
 {
 	int i;
 
@@ -982,10 +978,13 @@ static void init_msrpm_offsets(void)
 		u32 offset;
 
 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
-		BUG_ON(offset == MSR_INVALID);
+		if (WARN_ON(offset == MSR_INVALID))
+			return -EIO;
 
-		add_msr_offset(offset);
+		if (WARN_ON_ONCE(add_msr_offset(offset)))
+			return -EIO;
 	}
+	return 0;
 }
 
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -5511,7 +5510,11 @@ static __init int svm_hardware_setup(void)
 	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
 	iopm_base = __sme_page_pa(iopm_pages);
 
-	init_msrpm_offsets();
+	r = init_msrpm_offsets();
+	if (r) {
+		__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
+		return r;
+	}
 
 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
 				     XFEATURE_MASK_BNDCSR);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
  2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  7:18   ` Chao Gao
  2025-05-29 23:39 ` [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Tag init_msrpm_offsets() and add_msr_offset() with __init, as they're used
only during hardware setup to map potential passthrough MSRs to offsets in
the bitmap.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bd75ff8e4f20..25165d57f1e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	}
 }
 
-static int add_msr_offset(u32 offset)
+static __init int add_msr_offset(u32 offset)
 {
 	int i;
 
@@ -968,7 +968,7 @@ static int add_msr_offset(u32 offset)
 	return -EIO;
 }
 
-static int init_msrpm_offsets(void)
+static __init int init_msrpm_offsets(void)
 {
 	int i;
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
  2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
  2025-05-29 23:39 ` [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  7:57   ` Chao Gao
  2025-05-29 23:39 ` [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Drop the unnecessary and dangerous value-terminated behavior of
direct_access_msrs, and simply iterate over the actual size of the array.
The use in svm_set_x2apic_msr_interception() is especially sketchy, as it
relies on unused capacity being zero-initialized, and '0' being outside
the range of x2APIC MSRs.

To ensure the array and shadow_msr_intercept stay synchronized, simply
assert that their sizes are identical (on 32-bit kernels the array is six
entries smaller, as the 64-bit-only MSRs are compiled out).

Note, direct_access_msrs will soon be removed entirely; keeping the assert
synchronized with the array isn't expected to be a long-term maintenance
burden.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++++++-------
 arch/x86/kvm/svm/svm.h |  2 +-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 25165d57f1e5..36a99b87a47f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -86,7 +86,7 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);
 static const struct svm_direct_access_msrs {
 	u32 index;   /* Index of the MSR */
 	bool always; /* True if intercept is initially cleared */
-} direct_access_msrs[MAX_DIRECT_ACCESS_MSRS] = {
+} direct_access_msrs[] = {
 	{ .index = MSR_STAR,				.always = true  },
 	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
 	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
@@ -144,9 +144,12 @@ static const struct svm_direct_access_msrs {
 	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
-	{ .index = MSR_INVALID,				.always = false },
 };
 
+static_assert(ARRAY_SIZE(direct_access_msrs) ==
+	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
+#undef MAX_DIRECT_ACCESS_MSRS
+
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * pause_filter_count: On processors that support Pause filtering(indicated
@@ -767,9 +770,10 @@ static int direct_access_msr_slot(u32 msr)
 {
 	u32 i;
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++)
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (direct_access_msrs[i].index == msr)
 			return i;
+	}
 
 	return -ENOENT;
 }
@@ -891,7 +895,7 @@ void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 {
 	int i;
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (!direct_access_msrs[i].always)
 			continue;
 		set_msr_interception(vcpu, msrpm, direct_access_msrs[i].index, 1, 1);
@@ -908,7 +912,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	if (!x2avic_enabled)
 		return;
 
-	for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		int index = direct_access_msrs[i].index;
 
 		if ((index < APIC_BASE_MSR) ||
@@ -936,7 +940,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	 * will automatically get filtered through the MSR filter, so we are
 	 * back in sync after this.
 	 */
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 msr = direct_access_msrs[i].index;
 		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
 		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
@@ -974,7 +978,7 @@ static __init int init_msrpm_offsets(void)
 
 	memset(msrpm_offsets, 0xff, sizeof(msrpm_offsets));
 
-	for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
+	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 offset;
 
 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e6f3c6a153a0..f1e466a10219 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -44,7 +44,7 @@ static inline struct page *__sme_pa_to_page(unsigned long pa)
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	48
+#define MAX_DIRECT_ACCESS_MSRS	47
 #define MSRPM_OFFSETS	32
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  8:06   ` Chao Gao
  2025-05-29 23:39 ` [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

WARN and kill the VM instead of panicking the host if KVM attempts to set
or query MSR interception for an unsupported MSR.  Accessing the MSR
interception bitmaps only meaningfully affects post-VMRUN behavior, and
KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN,
i.e. there is no need to panic the entire host.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 36a99b87a47f..d5d11cb0c987 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -827,7 +827,8 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	bit_write = 2 * (msr & 0x0f) + 1;
 	tmp       = msrpm[offset];
 
-	BUG_ON(offset == MSR_INVALID);
+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
+		return false;
 
 	return test_bit(bit_write, &tmp);
 }
@@ -858,7 +859,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	bit_write = 2 * (msr & 0x0f) + 1;
 	tmp       = msrpm[offset];
 
-	BUG_ON(offset == MSR_INVALID);
+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
+		return;
 
 	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
 	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  2:53   ` Mi, Dapeng
  2025-05-29 23:39 ` [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Manipulate the MSR bitmaps using non-atomic bit ops APIs (two underscores),
as the bitmaps are per-vCPU and are only ever accessed while vcpu->mutex is
held.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 12 ++++++------
 arch/x86/kvm/vmx/vmx.c |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d5d11cb0c987..b55a60e79a73 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -789,14 +789,14 @@ static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
 
 	/* Set the shadow bitmaps to the desired intercept states */
 	if (read)
-		set_bit(slot, svm->shadow_msr_intercept.read);
+		__set_bit(slot, svm->shadow_msr_intercept.read);
 	else
-		clear_bit(slot, svm->shadow_msr_intercept.read);
+		__clear_bit(slot, svm->shadow_msr_intercept.read);
 
 	if (write)
-		set_bit(slot, svm->shadow_msr_intercept.write);
+		__set_bit(slot, svm->shadow_msr_intercept.write);
 	else
-		clear_bit(slot, svm->shadow_msr_intercept.write);
+		__clear_bit(slot, svm->shadow_msr_intercept.write);
 }
 
 static bool valid_msr_intercept(u32 index)
@@ -862,8 +862,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
 		return;
 
-	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
-	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
+	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
+	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
 
 	msrpm[offset] = tmp;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9ff00ae9f05a..8f7fe04a1998 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4029,9 +4029,9 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	idx = vmx_get_passthrough_msr_slot(msr);
 	if (idx >= 0) {
 		if (type & MSR_TYPE_R)
-			clear_bit(idx, vmx->shadow_msr_intercept.read);
+			__clear_bit(idx, vmx->shadow_msr_intercept.read);
 		if (type & MSR_TYPE_W)
-			clear_bit(idx, vmx->shadow_msr_intercept.write);
+			__clear_bit(idx, vmx->shadow_msr_intercept.write);
 	}
 
 	if ((type & MSR_TYPE_R) &&
@@ -4071,9 +4071,9 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	idx = vmx_get_passthrough_msr_slot(msr);
 	if (idx >= 0) {
 		if (type & MSR_TYPE_R)
-			set_bit(idx, vmx->shadow_msr_intercept.read);
+			__set_bit(idx, vmx->shadow_msr_intercept.read);
 		if (type & MSR_TYPE_W)
-			set_bit(idx, vmx->shadow_msr_intercept.write);
+			__set_bit(idx, vmx->shadow_msr_intercept.write);
 	}
 
 	if (type & MSR_TYPE_R)
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-03  2:32   ` Mi, Dapeng
  2025-05-29 23:39 ` [PATCH 07/28] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Rename nested_svm_vmrun_msrpm() to nested_svm_merge_msrpm() to better
capture its role, and opportunistically feed it @vcpu instead of @svm, as
grabbing "svm" only to turn around and grab svm->vcpu is rather silly.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 15 +++++++--------
 arch/x86/kvm/svm/svm.c    |  2 +-
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8427a48b8b7a..89a77f0f1cc8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -189,8 +189,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
  * is optimized in that it only merges the parts where KVM MSR permission bitmap
  * may contain zero bits.
  */
-static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
+static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	int i;
 
 	/*
@@ -205,7 +206,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 	if (!svm->nested.force_msr_bitmap_recalc) {
 		struct hv_vmcb_enlightenments *hve = &svm->nested.ctl.hv_enlightenments;
 
-		if (kvm_hv_hypercall_enabled(&svm->vcpu) &&
+		if (kvm_hv_hypercall_enabled(vcpu) &&
 		    hve->hv_enlightenments_control.msr_bitmap &&
 		    (svm->nested.ctl.clean & BIT(HV_VMCB_NESTED_ENLIGHTENMENTS)))
 			goto set_msrpm_base_pa;
@@ -230,7 +231,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
 		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
 
-		if (kvm_vcpu_read_guest(&svm->vcpu, offset, &value, 4))
+		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
 			return false;
 
 		svm->nested.msrpm[p] = svm->msrpm[p] | value;
@@ -937,7 +938,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
 		goto out_exit_err;
 
-	if (nested_svm_vmrun_msrpm(svm))
+	if (nested_svm_merge_msrpm(vcpu))
 		goto out;
 
 out_exit_err:
@@ -1819,13 +1820,11 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
 	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
+	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
 		/*
 		 * Reload the guest's PDPTRs since after a migration
 		 * the guest CR3 might be restored prior to setting the nested
@@ -1834,7 +1833,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
 			return false;
 
-	if (!nested_svm_vmrun_msrpm(svm)) {
+	if (!nested_svm_merge_msrpm(vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_EMULATION;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b55a60e79a73..2085259644b6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3134,7 +3134,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 *
 		 * For nested:
 		 * The handling of the MSR bitmap for L2 guests is done in
-		 * nested_svm_vmrun_msrpm.
+		 * nested_svm_merge_msrpm().
 		 * We update the L1 MSR bit as well since it will end up
 		 * touching the MSR anyway now.
 		 */
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/28] KVM: SVM: Clean up macros related to architectural MSRPM definitions
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Move SVM's MSR Permissions Map macros to svm.h in anticipation of adding
helpers that are available to SVM code, and opportunistically replace a
variety of open-coded literals with (hopefully) informative macros.

Opportunistically open code ARRAY_SIZE(msrpm_ranges) instead of wrapping
it as NUM_MSR_MAPS, which is an ambiguous name even if it were qualified
with "SVM_MSRPM".

No functional change intended.
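
Purely for illustration (not part of the patch), the offset math with the
new names works out as follows for MSR_LSTAR (0xc0000082), which lands in
range 1.  The standalone userspace sketch below mirrors svm_msrpm_offset()
and prints 520, i.e. the u32 index of the chunk covering MSR_LSTAR:

  #include <stdint.h>
  #include <stdio.h>

  /* Mirrors the SVM_MSRPM_* constants this patch adds to svm.h. */
  #define SVM_MSRPM_BYTES_PER_RANGE 2048u
  #define SVM_BITS_PER_MSR          2u
  #define SVM_MSRS_PER_BYTE         (8u / SVM_BITS_PER_MSR)   /* 4 */
  #define SVM_MSRS_PER_RANGE        (SVM_MSRPM_BYTES_PER_RANGE * SVM_MSRS_PER_BYTE)

  static const uint32_t bases[] = { 0x0, 0xc0000000, 0xc0010000 };

  /* Same arithmetic as svm_msrpm_offset(): returns the u32 index, or -1. */
  static long msrpm_u32_offset(uint32_t msr)
  {
          for (unsigned int i = 0; i < sizeof(bases) / sizeof(bases[0]); i++) {
                  if (msr < bases[i] || msr - bases[i] >= SVM_MSRS_PER_RANGE)
                          continue;

                  /* Byte offset into the 8KiB bitmap, then the u32 index. */
                  uint32_t byte = (msr - bases[i]) / SVM_MSRS_PER_BYTE +
                                  i * SVM_MSRPM_BYTES_PER_RANGE;
                  return byte / 4;
          }
          return -1;
  }

  int main(void)
  {
          /* MSR_LSTAR = 0xc0000082: ((0x82 / 4) + 2048) / 4 = 520 */
          printf("%ld\n", msrpm_u32_offset(0xc0000082));
          return 0;
  }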

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++++---------
 arch/x86/kvm/svm/svm.h | 17 ++++++++++++++++-
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2085259644b6..1c70293400bc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -266,24 +266,24 @@ static DEFINE_MUTEX(vmcb_dump_mutex);
  */
 static int tsc_aux_uret_slot __read_mostly = -1;
 
-static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
-
-#define NUM_MSR_MAPS ARRAY_SIZE(msrpm_ranges)
-#define MSRS_RANGE_SIZE 2048
-#define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
+static const u32 msrpm_ranges[] = {
+	SVM_MSRPM_RANGE_0_BASE_MSR,
+	SVM_MSRPM_RANGE_1_BASE_MSR,
+	SVM_MSRPM_RANGE_2_BASE_MSR
+};
 
 u32 svm_msrpm_offset(u32 msr)
 {
 	u32 offset;
 	int i;
 
-	for (i = 0; i < NUM_MSR_MAPS; i++) {
+	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
 		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + MSRS_IN_RANGE)
+		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
 			continue;
 
-		offset  = (msr - msrpm_ranges[i]) / 4; /* 4 msrs per u8 */
-		offset += (i * MSRS_RANGE_SIZE);       /* add range offset */
+		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
+		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
 
 		/* Now we have the u8 offset - but need the u32 offset */
 		return offset / 4;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f1e466a10219..909b9af6b3c1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -613,11 +613,26 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
 	svm_vmgexit_set_return_code(svm, GHCB_HV_RESP_NO_ACTION, data);
 }
 
-/* svm.c */
+/*
+ * The MSRPM is 8KiB in size, divided into four 2KiB ranges (the fourth range
+ * is reserved).  Each MSR within a range is covered by two bits, one each for
+ * read (bit 0) and write (bit 1), where a bit value of '1' means intercepted.
+ */
+#define SVM_MSRPM_BYTES_PER_RANGE 2048
+#define SVM_BITS_PER_MSR 2
+#define SVM_MSRS_PER_BYTE (BITS_PER_BYTE / SVM_BITS_PER_MSR)
+#define SVM_MSRS_PER_RANGE (SVM_MSRPM_BYTES_PER_RANGE * SVM_MSRS_PER_BYTE)
+static_assert(SVM_MSRS_PER_RANGE == 8192);
+
+#define SVM_MSRPM_RANGE_0_BASE_MSR	0
+#define SVM_MSRPM_RANGE_1_BASE_MSR	0xc0000000
+#define SVM_MSRPM_RANGE_2_BASE_MSR	0xc0010000
+
 #define MSR_INVALID				0xffffffffU
 
 #define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
 
+/* svm.c */
 extern bool dump_invalid_vmcb;
 
 u32 svm_msrpm_offset(u32 msr);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 07/28] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-04  5:43   ` Chao Gao
                     ` (2 more replies)
  2025-05-29 23:39 ` [PATCH 09/28] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
                   ` (19 subsequent siblings)
  27 siblings, 3 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to
merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap.  This will eventually
allow for the removal of direct_access_msrs, as the only path where
tracking the offsets is truly justified is the merge for nested SVM, where
merging in chunks is an easy way to batch uaccess reads/writes.

Opportunistically omit the x2APIC MSRs from the merge-specific array
instead of filtering them out at runtime.

Note, disabling interception of XSS, EFER, PAT, GHCB, and TSC_AUX is
mutually exclusive with nested virtualization, as KVM passes through the
MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization
for SEV+ guests.  Defer removing those MSRs to a future cleanup in order
to make this refactoring as benign as possible.
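
For reference, a simplified sketch of how the merge consumes the dedicated
offsets (this mirrors nested_svm_merge_msrpm() in the diff below); since a
set bit in the MSRPM means "intercept", OR-ing the vmcb01 and vmcb12 chunks
yields the union of KVM's and L1's intercepts:

  for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
          const int p = nested_svm_msrpm_merge_offsets[i];
          u32 value;

          /* Read 4 bytes of L1's (vmcb12) bitmap from guest memory... */
          if (kvm_vcpu_read_guest(vcpu, svm->nested.ctl.msrpm_base_pa + p * 4,
                                  &value, 4))
                  return false;

          /* ...and OR in KVM's (vmcb01) intercepts: '1' == intercepted. */
          svm->nested.msrpm[p] = svm->msrpm[p] | value;
  }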

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 72 +++++++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.c    |  4 +++
 arch/x86/kvm/svm/svm.h    |  2 ++
 3 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 89a77f0f1cc8..e53020939e60 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	}
 }
 
+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
+
+int __init nested_svm_init_msrpm_merge_offsets(void)
+{
+	const u32 merge_msrs[] = {
+		MSR_STAR,
+		MSR_IA32_SYSENTER_CS,
+		MSR_IA32_SYSENTER_EIP,
+		MSR_IA32_SYSENTER_ESP,
+	#ifdef CONFIG_X86_64
+		MSR_GS_BASE,
+		MSR_FS_BASE,
+		MSR_KERNEL_GS_BASE,
+		MSR_LSTAR,
+		MSR_CSTAR,
+		MSR_SYSCALL_MASK,
+	#endif
+		MSR_IA32_SPEC_CTRL,
+		MSR_IA32_PRED_CMD,
+		MSR_IA32_FLUSH_CMD,
+		MSR_IA32_LASTBRANCHFROMIP,
+		MSR_IA32_LASTBRANCHTOIP,
+		MSR_IA32_LASTINTFROMIP,
+		MSR_IA32_LASTINTTOIP,
+
+		MSR_IA32_XSS,
+		MSR_EFER,
+		MSR_IA32_CR_PAT,
+		MSR_AMD64_SEV_ES_GHCB,
+		MSR_TSC_AUX,
+	};
+	int i, j;
+
+	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
+		u32 offset = svm_msrpm_offset(merge_msrs[i]);
+
+		if (WARN_ON(offset == MSR_INVALID))
+			return -EIO;
+
+		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
+			if (nested_svm_msrpm_merge_offsets[j] == offset)
+				break;
+		}
+
+		if (j < nested_svm_nr_msrpm_merge_offsets)
+			continue;
+
+		if (WARN_ON(j >= ARRAY_SIZE(nested_svm_msrpm_merge_offsets)))
+			return -EIO;
+
+		nested_svm_msrpm_merge_offsets[j] = offset;
+		nested_svm_nr_msrpm_merge_offsets++;
+	}
+
+	return 0;
+}
+
 /*
  * Merge L0's (KVM) and L1's (Nested VMCB) MSR permission bitmaps. The function
  * is optimized in that it only merges the parts where KVM MSR permission bitmap
@@ -216,19 +274,11 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return true;
 
-	for (i = 0; i < MSRPM_OFFSETS; i++) {
-		u32 value, p;
+	for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
+		const int p = nested_svm_msrpm_merge_offsets[i];
+		u32 value;
 		u64 offset;
 
-		if (msrpm_offsets[i] == 0xffffffff)
-			break;
-
-		p      = msrpm_offsets[i];
-
-		/* x2apic msrs are intercepted always for the nested guest */
-		if (is_x2apic_msrpm_offset(p))
-			continue;
-
 		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
 
 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1c70293400bc..84dd1f220986 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5689,6 +5689,10 @@ static int __init svm_init(void)
 	if (!kvm_is_svm_supported())
 		return -EOPNOTSUPP;
 
+	r = nested_svm_init_msrpm_merge_offsets();
+	if (r)
+		return r;
+
 	r = kvm_x86_vendor_init(&svm_init_ops);
 	if (r)
 		return r;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 909b9af6b3c1..0a8041d70994 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
 }
 
+int __init nested_svm_init_msrpm_merge_offsets(void);
+
 int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
 			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
 void svm_leave_nested(struct kvm_vcpu *vcpu);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/28] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-05-29 23:39 ` [PATCH 10/28] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Don't merge bitmaps on nested VMRUN for MSRs that KVM passes through only
for SEV-ES guests.  KVM doesn't support nested virtualization for SEV-ES,
and likely never will.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e53020939e60..e4a079ea4b27 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -184,7 +184,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	}
 }
 
-static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
+static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
 int __init nested_svm_init_msrpm_merge_offsets(void)
@@ -209,12 +209,6 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 		MSR_IA32_LASTBRANCHTOIP,
 		MSR_IA32_LASTINTFROMIP,
 		MSR_IA32_LASTINTTOIP,
-
-		MSR_IA32_XSS,
-		MSR_EFER,
-		MSR_IA32_CR_PAT,
-		MSR_AMD64_SEV_ES_GHCB,
-		MSR_TSC_AUX,
 	};
 	int i, j;
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/28] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 09/28] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-05-29 23:39 ` [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Don't initialize vmcb02's MSRPM with KVM's set of "always passthrough"
MSRs, as KVM always needs to consult L1's intercepts, i.e. needs to merge
vmcb01 with vmcb12 and write the result to vmcb02.  This will eventually
allow for the removal of svm_vcpu_init_msrpm().

Note, the bitmaps are truly initialized by svm_vcpu_alloc_msrpm(), which
defaults to intercepting all MSRs, e.g. if there is a bug lurking elsewhere,
the worst case scenario from dropping the call to svm_vcpu_init_msrpm()
should be that KVM would fail to pass MSRs through to L2.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 1 -
 arch/x86/kvm/svm/svm.c    | 5 +++--
 arch/x86/kvm/svm/svm.h    | 1 -
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e4a079ea4b27..0026d2adb809 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1275,7 +1275,6 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->nested.msrpm)
 		goto err_free_vmcb02;
-	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
 
 	svm->nested.initialized = true;
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 84dd1f220986..d97711bdbfc9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -893,8 +893,9 @@ u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
-void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
+static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
+	u32 *msrpm = to_svm(vcpu)->msrpm;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
@@ -1403,7 +1404,7 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
+	svm_vcpu_init_msrpm(vcpu);
 
 	svm_init_osvw(vcpu);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0a8041d70994..47a36a9a7fe5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -637,7 +637,6 @@ extern bool dump_invalid_vmcb;
 
 u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
-void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
 void svm_vcpu_free_msrpm(u32 *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (9 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 10/28] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-04 16:11   ` Paolo Bonzini
  2025-05-29 23:39 ` [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Add macro-built helpers for testing, setting, and clearing MSRPM entries
without relying on precomputed offsets.  This sets the stage for eventually
removing general KVM use of precomputed offsets, which are quite confusing
and rather inefficient for the vast majority of KVM's usage.

Outside of merging L0 and L1 bitmaps for nested SVM, using u32-indexed
offsets and accesses is at best unnecessary, and at worst introduces extra
operations to retrieve the individual bit from within the offset u32 value.
And simply calling them "offsets" is very confusing, as the "unit" of the
offset isn't immediately obvious.

Use the new helpers in set_msr_interception_bitmap() and
msr_write_intercepted() to verify the math and operations, but keep the
existing offset-based logic in set_msr_interception_bitmap() to sanity check
the "clear" and "set" operations.  Manipulating MSR interceptions isn't a
hot path and no kernel release is ever expected to contain this specific
version of set_msr_interception_bitmap() (it will be removed entirely in
the near future).

Add compile-time asserts to verify the bit number calculations, and also
to provide a simple demonstration of the layout (SVM and VMX use the same
concept of a bitmap, but with different layouts).
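
As a purely illustrative cross-check (not part of the patch, and the macro
below takes the range base as an explicit parameter rather than token-pasting
it like SVM_MSRPM_BIT_NR() does), the new bit numbering agrees with the old
"u32 offset + bit within u32" arithmetic, e.g. for MSR_LSTAR (0xc0000082) in
range 1:

  #include <assert.h>
  #include <stdint.h>

  #define BYTES_PER_RANGE 2048u
  #define BITS_PER_MSR    2u

  /* Simplified stand-in for SVM_MSRPM_BIT_NR(), base passed explicitly. */
  #define MSRPM_BIT_NR(range_nr, base, msr) \
          ((range_nr) * BYTES_PER_RANGE * 8 + ((msr) - (base)) * BITS_PER_MSR)

  int main(void)
  {
          const uint32_t msr  = 0xc0000082;       /* MSR_LSTAR, as an example */
          const uint32_t base = 0xc0000000;       /* range 1 */

          /* New scheme: absolute bit number of the MSR's "read" bit. */
          uint32_t bit_nr = MSRPM_BIT_NR(1, base, msr);           /* 16644 */

          /* Old scheme: u32 offset plus the bit position within that u32. */
          uint32_t offset   = ((msr - base) / 4 + 1 * 2048) / 4;  /* 520 */
          uint32_t bit_read = 2 * (msr & 0x0f);                   /* 4 */

          /* On x86, bit N of u32 word W is bit W * 32 + N of the bitmap. */
          assert(bit_nr == offset * 32 + bit_read);
          return 0;
  }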

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 30 ++++++++++++++--------------
 arch/x86/kvm/svm/svm.h | 44 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d97711bdbfc9..76d074440bcc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -806,11 +806,6 @@ static bool valid_msr_intercept(u32 index)
 
 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 {
-	u8 bit_write;
-	unsigned long tmp;
-	u32 offset;
-	u32 *msrpm;
-
 	/*
 	 * For non-nested case:
 	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
@@ -820,17 +815,10 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
 	 * save it.
 	 */
-	msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
-				      to_svm(vcpu)->msrpm;
+	void *msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
+					    to_svm(vcpu)->msrpm;
 
-	offset    = svm_msrpm_offset(msr);
-	bit_write = 2 * (msr & 0x0f) + 1;
-	tmp       = msrpm[offset];
-
-	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
-		return false;
-
-	return test_bit(bit_write, &tmp);
+	return svm_test_msr_bitmap_write(msrpm, msr);
 }
 
 static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
@@ -865,7 +853,17 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
 	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
 
-	msrpm[offset] = tmp;
+	if (read)
+		svm_clear_msr_bitmap_read((void *)msrpm, msr);
+	else
+		svm_set_msr_bitmap_read((void *)msrpm, msr);
+
+	if (write)
+		svm_clear_msr_bitmap_write((void *)msrpm, msr);
+	else
+		svm_set_msr_bitmap_write((void *)msrpm, msr);
+
+	WARN_ON_ONCE(msrpm[offset] != (u32)tmp);
 
 	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
 	svm->nested.force_msr_bitmap_recalc = true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 47a36a9a7fe5..e432cd7a7889 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -628,6 +628,50 @@ static_assert(SVM_MSRS_PER_RANGE == 8192);
 #define SVM_MSRPM_RANGE_1_BASE_MSR	0xc0000000
 #define SVM_MSRPM_RANGE_2_BASE_MSR	0xc0010000
 
+#define SVM_MSRPM_FIRST_MSR(range_nr)	\
+	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR)
+#define SVM_MSRPM_LAST_MSR(range_nr)	\
+	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR + SVM_MSRS_PER_RANGE - 1)
+
+#define SVM_MSRPM_BIT_NR(range_nr, msr)						\
+	(range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +			\
+	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR)
+
+#define SVM_MSRPM_SANITY_CHECK_BITS(range_nr)					\
+static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
+	      range_nr * 2048 * 8 + 2);						\
+static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
+	      range_nr * 2048 * 8 + 14);
+
+SVM_MSRPM_SANITY_CHECK_BITS(0);
+SVM_MSRPM_SANITY_CHECK_BITS(1);
+SVM_MSRPM_SANITY_CHECK_BITS(2);
+
+#define SVM_BUILD_MSR_BITMAP_CASE(bitmap, range_nr, msr, bitop, bit_rw)		\
+	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
+		return bitop##_bit(SVM_MSRPM_BIT_NR(range_nr, msr) + bit_rw, bitmap);
+
+#define __BUILD_SVM_MSR_BITMAP_HELPER(rtype, action, bitop, access, bit_rw)	\
+static inline rtype svm_##action##_msr_bitmap_##access(unsigned long *bitmap,	\
+						       u32 msr)			\
+{										\
+	switch (msr) {								\
+	SVM_BUILD_MSR_BITMAP_CASE(bitmap, 0, msr, bitop, bit_rw)		\
+	SVM_BUILD_MSR_BITMAP_CASE(bitmap, 1, msr, bitop, bit_rw)		\
+	SVM_BUILD_MSR_BITMAP_CASE(bitmap, 2, msr, bitop, bit_rw)		\
+	default:								\
+		return (rtype)true;						\
+	}									\
+										\
+}
+#define BUILD_SVM_MSR_BITMAP_HELPERS(ret_type, action, bitop)			\
+	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0)	\
+	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 1)
+
+BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
+BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
+BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
+
 #define MSR_INVALID				0xffffffffU
 
 #define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (10 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-06-05  8:19   ` Chao Gao
  2025-05-29 23:39 ` [PATCH 13/28] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Add and use SVM MSR interception APIs (in most paths) to match VMX's
APIs and nomenclature.  Specifically, add SVM variants of:

        vmx_disable_intercept_for_msr(vcpu, msr, type)
        vmx_enable_intercept_for_msr(vcpu, msr, type)
        vmx_set_intercept_for_msr(vcpu, msr, type, intercept)

to eventually replace SVM's single helper:

        set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write)

which is awkward to use (in all cases, KVM either applies the same logic
for both reads and writes, or intercepts one of read or write), and is
unintuitive due to using '0' to indicate interception should be *set*.

Keep the guts of the old API for the moment to avoid churning the MSR
filter code, as that mess will be overhauled in the near future.  Leave
behind a temporary comment to call out that the shadow bitmaps have
inverted polarity relative to the bitmaps consumed by hardware.

No functional change intended.
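
For readers mapping the old helper to the new ones, a rough, illustration-only
translation (MSR_IA32_SPEC_CTRL is just an example), along with a reminder of
the polarity difference called out above:

  /* Old API: '1' == allow, i.e. clear the intercept; '0' == intercept. */
  set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);

  /* New API: the action is explicit in the helper's name. */
  svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);

  /*
   * Polarity reminder: a set bit in the hardware MSRPM means "intercept",
   * whereas a set bit in the shadow read/write bitmaps means "passed
   * through", i.e. the shadow bitmaps are inverted relative to hardware.
   */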

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c |  18 ++++----
 arch/x86/kvm/svm/svm.c | 100 ++++++++++++++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h |  12 +++++
 3 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 978a0088a3f1..bb0ec029b3d4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4415,12 +4415,10 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		bool v_tsc_aux = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
-				 guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
-
-		set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
-	}
+	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX))
+		svm_set_intercept_for_msr(vcpu, MSR_TSC_AUX, MSR_TYPE_RW,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID));
 
 	/*
 	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
@@ -4436,9 +4434,9 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	 */
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_XSS, MSR_TYPE_RW);
 	else
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_XSS, MSR_TYPE_RW);
 }
 
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
@@ -4515,8 +4513,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
 	/* Clear intercepts on selected MSRs */
-	set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
+	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 76d074440bcc..56460413eca6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -869,11 +869,57 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
 	svm->nested.force_msr_bitmap_recalc = true;
 }
 
-void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
-			  int read, int write)
+void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
-	set_shadow_msr_intercept(vcpu, msr, read, write);
-	set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *msrpm = svm->msrpm;
+
+	/* Note, the shadow intercept bitmaps have inverted polarity. */
+	set_shadow_msr_intercept(vcpu, msr, type & MSR_TYPE_R, type & MSR_TYPE_W);
+
+	/*
+	 * Don't disable interception for the MSR if userspace wants to
+	 * handle it.
+	 */
+	if ((type & MSR_TYPE_R) &&
+	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
+		svm_set_msr_bitmap_read(msrpm, msr);
+		type &= ~MSR_TYPE_R;
+	}
+
+	if ((type & MSR_TYPE_W) &&
+	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
+		svm_set_msr_bitmap_write(msrpm, msr);
+		type &= ~MSR_TYPE_W;
+	}
+
+	if (type & MSR_TYPE_R)
+		svm_clear_msr_bitmap_read(msrpm, msr);
+
+	if (type & MSR_TYPE_W)
+		svm_clear_msr_bitmap_write(msrpm, msr);
+
+	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+	svm->nested.force_msr_bitmap_recalc = true;
+}
+
+void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *msrpm = svm->msrpm;
+
+
+	set_shadow_msr_intercept(vcpu, msr,
+				 !(type & MSR_TYPE_R), !(type & MSR_TYPE_W));
+
+	if (type & MSR_TYPE_R)
+		svm_set_msr_bitmap_read(msrpm, msr);
+
+	if (type & MSR_TYPE_W)
+		svm_set_msr_bitmap_write(msrpm, msr);
+
+	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
+	svm->nested.force_msr_bitmap_recalc = true;
 }
 
 u32 *svm_vcpu_alloc_msrpm(void)
@@ -893,13 +939,13 @@ u32 *svm_vcpu_alloc_msrpm(void)
 
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
-	u32 *msrpm = to_svm(vcpu)->msrpm;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		if (!direct_access_msrs[i].always)
 			continue;
-		set_msr_interception(vcpu, msrpm, direct_access_msrs[i].index, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, direct_access_msrs[i].index,
+					      MSR_TYPE_RW);
 	}
 }
 
@@ -919,8 +965,8 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		if ((index < APIC_BASE_MSR) ||
 		    (index > APIC_BASE_MSR + 0xff))
 			continue;
-		set_msr_interception(&svm->vcpu, svm->msrpm, index,
-				     !intercept, !intercept);
+
+		svm_set_intercept_for_msr(&svm->vcpu, index, MSR_TYPE_RW, intercept);
 	}
 
 	svm->x2avic_msrs_intercepted = intercept;
@@ -1008,13 +1054,13 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
 
 	if (sev_es_guest(vcpu->kvm))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
@@ -1028,10 +1074,10 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
-	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
+	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -1223,8 +1269,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_VMSAVE);
 		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -1236,8 +1282,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 		/* No need to intercept these MSRs */
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	}
 }
 
@@ -1370,7 +1416,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	 * of MSR_IA32_SPEC_CTRL.
 	 */
 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
 
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
@@ -3137,7 +3183,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * We update the L1 MSR bit as well since it will end up
 		 * touching the MSR anyway now.
 		 */
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (!msr->host_initiated &&
@@ -4641,12 +4687,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	if (boot_cpu_has(X86_FEATURE_IBPB))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
-				     !!guest_has_pred_cmd_msr(vcpu));
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
 
 	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
-				     !!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e432cd7a7889..32bb1e536dce 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -701,6 +701,18 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
 
+void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
+void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
+
+static inline void svm_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
+					     int type, bool enable_intercept)
+{
+	if (enable_intercept)
+		svm_enable_intercept_for_msr(vcpu, msr, type);
+	else
+		svm_disable_intercept_for_msr(vcpu, msr, type);
+}
+
 /* nested.c */
 
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/28] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-05-29 23:39 ` [PATCH 14/28] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Disable interception of the GHCB MSR if and only if the VM is an SEV-ES
guest.  While the exact behavior is completely undocumented in the APM,
common sense and testing on SEV-ES-capable CPUs say that accesses to the
GHCB from non-SEV-ES guests will #GP.  I.e. from the guest's perspective,
no functional change intended.
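
The mechanics, for clarity: flipping the .always flag to false stops the
common MSRPM init from clearing the intercept for every VM, and the disable
call added to sev_es_init_vmcb() restores the passthrough only on the SEV-ES
path.  A minimal sketch (the real bodies are in the diff below):

	/* sev_es_init_vmcb() runs only for SEV-ES guests. */
	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);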

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 3 ++-
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bb0ec029b3d4..694d38a2327c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4512,7 +4512,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
-	/* Clear intercepts on selected MSRs */
+	/* Clear intercepts on MSRs that are context switched by hardware. */
+	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
 	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
 	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 56460413eca6..fa1a1b9b2d59 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -110,7 +110,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_XSS,			.always = false },
 	{ .index = MSR_EFER,				.always = false },
 	{ .index = MSR_IA32_CR_PAT,			.always = false },
-	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = true  },
+	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = false },
 	{ .index = MSR_TSC_AUX,				.always = false },
 	{ .index = X2APIC_MSR(APIC_ID),			.always = false },
 	{ .index = X2APIC_MSR(APIC_LVR),		.always = false },
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/28] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (12 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 13/28] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
@ 2025-05-29 23:39 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 15/28] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Drop the "always" flag from the array of possible passthrough MSRs, and
instead manually initialize the permissions for the handful of MSRs that
KVM passes through by default.  In addition to cutting down on boilerplate
copy+paste code and eliminating a misleading flag (the MSRs aren't always
passed through, e.g. thanks to MSR filters), this will allow for removing
the direct_access_msrs array entirely.
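
Concretely, each table entry shrinks from a two-field struct to a bare MSR
index, and the former ".always = true" entries are instead passed through
explicitly when the per-vCPU bitmap is initialized.  A one-entry sketch, for
illustration only (see svm_vcpu_init_msrpm() in the diff for the full set):

	/* before */	{ .index = MSR_STAR,	.always = true  },
	/* after  */	MSR_STAR,	/* passthrough set up in svm_vcpu_init_msrpm() */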

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 124 ++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fa1a1b9b2d59..e0fedd23e150 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -83,51 +83,48 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
 #define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
 
-static const struct svm_direct_access_msrs {
-	u32 index;   /* Index of the MSR */
-	bool always; /* True if intercept is initially cleared */
-} direct_access_msrs[] = {
-	{ .index = MSR_STAR,				.always = true  },
-	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
-	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
-	{ .index = MSR_IA32_SYSENTER_ESP,		.always = false },
+static const u32 direct_access_msrs[] = {
+	MSR_STAR,
+	MSR_IA32_SYSENTER_CS,
+	MSR_IA32_SYSENTER_EIP,
+	MSR_IA32_SYSENTER_ESP,
 #ifdef CONFIG_X86_64
-	{ .index = MSR_GS_BASE,				.always = true  },
-	{ .index = MSR_FS_BASE,				.always = true  },
-	{ .index = MSR_KERNEL_GS_BASE,			.always = true  },
-	{ .index = MSR_LSTAR,				.always = true  },
-	{ .index = MSR_CSTAR,				.always = true  },
-	{ .index = MSR_SYSCALL_MASK,			.always = true  },
+	MSR_GS_BASE,
+	MSR_FS_BASE,
+	MSR_KERNEL_GS_BASE,
+	MSR_LSTAR,
+	MSR_CSTAR,
+	MSR_SYSCALL_MASK,
 #endif
-	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
-	{ .index = MSR_IA32_PRED_CMD,			.always = false },
-	{ .index = MSR_IA32_FLUSH_CMD,			.always = false },
-	{ .index = MSR_IA32_DEBUGCTLMSR,		.always = false },
-	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
-	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
-	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
-	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
-	{ .index = MSR_IA32_XSS,			.always = false },
-	{ .index = MSR_EFER,				.always = false },
-	{ .index = MSR_IA32_CR_PAT,			.always = false },
-	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = false },
-	{ .index = MSR_TSC_AUX,				.always = false },
-	{ .index = X2APIC_MSR(APIC_ID),			.always = false },
-	{ .index = X2APIC_MSR(APIC_LVR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TASKPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ARBPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_PROCPRI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_EOI),		.always = false },
-	{ .index = X2APIC_MSR(APIC_RRR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LDR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_DFR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_SPIV),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ISR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_IRR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ESR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ICR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_ICR2),		.always = false },
+	MSR_IA32_SPEC_CTRL,
+	MSR_IA32_PRED_CMD,
+	MSR_IA32_FLUSH_CMD,
+	MSR_IA32_DEBUGCTLMSR,
+	MSR_IA32_LASTBRANCHFROMIP,
+	MSR_IA32_LASTBRANCHTOIP,
+	MSR_IA32_LASTINTFROMIP,
+	MSR_IA32_LASTINTTOIP,
+	MSR_IA32_XSS,
+	MSR_EFER,
+	MSR_IA32_CR_PAT,
+	MSR_AMD64_SEV_ES_GHCB,
+	MSR_TSC_AUX,
+	X2APIC_MSR(APIC_ID),
+	X2APIC_MSR(APIC_LVR),
+	X2APIC_MSR(APIC_TASKPRI),
+	X2APIC_MSR(APIC_ARBPRI),
+	X2APIC_MSR(APIC_PROCPRI),
+	X2APIC_MSR(APIC_EOI),
+	X2APIC_MSR(APIC_RRR),
+	X2APIC_MSR(APIC_LDR),
+	X2APIC_MSR(APIC_DFR),
+	X2APIC_MSR(APIC_SPIV),
+	X2APIC_MSR(APIC_ISR),
+	X2APIC_MSR(APIC_TMR),
+	X2APIC_MSR(APIC_IRR),
+	X2APIC_MSR(APIC_ESR),
+	X2APIC_MSR(APIC_ICR),
+	X2APIC_MSR(APIC_ICR2),
 
 	/*
 	 * Note:
@@ -136,14 +133,14 @@ static const struct svm_direct_access_msrs {
 	 * the AVIC hardware would generate GP fault. Therefore, always
 	 * intercept the MSR 0x832, and do not setup direct_access_msr.
 	 */
-	{ .index = X2APIC_MSR(APIC_LVTTHMR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVTPC),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVT0),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVT1),		.always = false },
-	{ .index = X2APIC_MSR(APIC_LVTERR),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
-	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
+	X2APIC_MSR(APIC_LVTTHMR),
+	X2APIC_MSR(APIC_LVTPC),
+	X2APIC_MSR(APIC_LVT0),
+	X2APIC_MSR(APIC_LVT1),
+	X2APIC_MSR(APIC_LVTERR),
+	X2APIC_MSR(APIC_TMICT),
+	X2APIC_MSR(APIC_TMCCT),
+	X2APIC_MSR(APIC_TDCR),
 };
 
 static_assert(ARRAY_SIZE(direct_access_msrs) ==
@@ -771,7 +768,7 @@ static int direct_access_msr_slot(u32 msr)
 	u32 i;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (direct_access_msrs[i].index == msr)
+		if (direct_access_msrs[i] == msr)
 			return i;
 	}
 
@@ -939,14 +936,17 @@ u32 *svm_vcpu_alloc_msrpm(void)
 
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
-	int i;
+	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
 
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (!direct_access_msrs[i].always)
-			continue;
-		svm_disable_intercept_for_msr(vcpu, direct_access_msrs[i].index,
-					      MSR_TYPE_RW);
-	}
+#ifdef CONFIG_X86_64
+	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
+#endif
 }
 
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
@@ -960,7 +960,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 		return;
 
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		int index = direct_access_msrs[i].index;
+		int index = direct_access_msrs[i];
 
 		if ((index < APIC_BASE_MSR) ||
 		    (index > APIC_BASE_MSR + 0xff))
@@ -988,7 +988,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 	 * back in sync after this.
 	 */
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 msr = direct_access_msrs[i].index;
+		u32 msr = direct_access_msrs[i];
 		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
 		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
 
@@ -1028,7 +1028,7 @@ static __init int init_msrpm_offsets(void)
 	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
 		u32 offset;
 
-		offset = svm_msrpm_offset(direct_access_msrs[i].index);
+		offset = svm_msrpm_offset(direct_access_msrs[i]);
 		if (WARN_ON(offset == MSR_INVALID))
 			return -EIO;
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/28] KVM: x86: Move definition of X2APIC_MSR() to lapic.h
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (13 preceding siblings ...)
  2025-05-29 23:39 ` [PATCH 14/28] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Dedup the definition of X2APIC_MSR and put it in the local APIC code
where it belongs.

No functional change intended.
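
For reference, the macro translates an xAPIC MMIO register offset into its
x2APIC MSR index (registers are 16 bytes apart in MMIO space but one MSR
apart in x2APIC space).  A worked example, using values from apicdef.h:

	X2APIC_MSR(APIC_TASKPRI) == APIC_BASE_MSR + (0x80 >> 4)
				 == 0x800 + 0x8 == 0x808	/* x2APIC TPR */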

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.h   | 2 ++
 arch/x86/kvm/svm/svm.c | 2 --
 arch/x86/kvm/vmx/vmx.h | 2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4ce30db65828..4518b4e0552f 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -21,6 +21,8 @@
 #define APIC_BROADCAST			0xFF
 #define X2APIC_BROADCAST		0xFFFFFFFFul
 
+#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
+
 enum lapic_mode {
 	LAPIC_MODE_DISABLED = 0,
 	LAPIC_MODE_INVALID = X2APIC_ENABLE,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e0fedd23e150..c01eda772997 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -81,8 +81,6 @@ static uint64_t osvw_len = 4, osvw_status;
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
-#define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
-
 static const u32 direct_access_msrs[] = {
 	MSR_STAR,
 	MSR_IA32_SYSENTER_CS,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..0afe97e3478f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -19,8 +19,6 @@
 #include "../mmu.h"
 #include "common.h"
 
-#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
-
 #ifdef CONFIG_X86_64
 #define MAX_NR_USER_RETURN_MSRS	7
 #else
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (14 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 15/28] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-30 23:38   ` Xin Li
                     ` (2 more replies)
  2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
                   ` (11 subsequent siblings)
  27 siblings, 3 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On a userspace MSR filter change, recalculate all MSR intercepts using the
filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
desired intercepts.  The shadow bitmaps add yet another point of failure,
are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
and a maintenance burden.

Given that KVM *must* be able to recalculate the correct intercepts at any
given time, and that MSR filter updates are not hot paths, there is zero
benefit to maintaining the shadow bitmaps.
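
The key observation is that the filter is already folded in whenever KVM asks
for passthrough: the disable helper itself consults the filter and keeps (or
sets) interception if userspace wants the MSR.  Abridged from the existing
vmx_disable_intercept_for_msr() (see the hunk below; shown only to illustrate
why a separate shadow copy buys nothing):

	if ((type & MSR_TYPE_R) &&
	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
		/* Userspace wants reads intercepted: set the bit even though
		 * KVM itself asked for passthrough. */
		vmx_set_msr_bitmap_read(msr_bitmap, msr);
		...
	}

So "recalculating" simply means re-running the normal filter-agnostic
passthrough logic after each filter update.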

Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
Cc: Borislav Petkov <bp@alien8.de>
Cc: Xin Li <xin@zytor.com>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 184 +++++++++++------------------------------
 arch/x86/kvm/vmx/vmx.h |   7 --
 2 files changed, 47 insertions(+), 144 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8f7fe04a1998..6ffa2b2b85ce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -166,31 +166,6 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
 	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
 	RTIT_STATUS_BYTECNT))
 
-/*
- * List of MSRs that can be directly passed to the guest.
- * In addition to these x2apic, PT and LBR MSRs are handled specially.
- */
-static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
-	MSR_IA32_SPEC_CTRL,
-	MSR_IA32_PRED_CMD,
-	MSR_IA32_FLUSH_CMD,
-	MSR_IA32_TSC,
-#ifdef CONFIG_X86_64
-	MSR_FS_BASE,
-	MSR_GS_BASE,
-	MSR_KERNEL_GS_BASE,
-	MSR_IA32_XFD,
-	MSR_IA32_XFD_ERR,
-#endif
-	MSR_IA32_SYSENTER_CS,
-	MSR_IA32_SYSENTER_ESP,
-	MSR_IA32_SYSENTER_EIP,
-	MSR_CORE_C1_RES,
-	MSR_CORE_C3_RESIDENCY,
-	MSR_CORE_C6_RESIDENCY,
-	MSR_CORE_C7_RESIDENCY,
-};
-
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -672,40 +647,6 @@ static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu)
 	return flexpriority_enabled && lapic_in_kernel(vcpu);
 }
 
-static int vmx_get_passthrough_msr_slot(u32 msr)
-{
-	int i;
-
-	switch (msr) {
-	case 0x800 ... 0x8ff:
-		/* x2APIC MSRs. These are handled in vmx_update_msr_bitmap_x2apic() */
-		return -ENOENT;
-	case MSR_IA32_RTIT_STATUS:
-	case MSR_IA32_RTIT_OUTPUT_BASE:
-	case MSR_IA32_RTIT_OUTPUT_MASK:
-	case MSR_IA32_RTIT_CR3_MATCH:
-	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
-		/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
-	case MSR_LBR_SELECT:
-	case MSR_LBR_TOS:
-	case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
-	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
-	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
-	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
-	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
-		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
-		return -ENOENT;
-	}
-
-	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
-		if (vmx_possible_passthrough_msrs[i] == msr)
-			return i;
-	}
-
-	WARN(1, "Invalid MSR %x, please adapt vmx_possible_passthrough_msrs[]", msr);
-	return -ENOENT;
-}
-
 struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -4015,25 +3956,12 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	int idx;
 
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
 	vmx_msr_bitmap_l01_changed(vmx);
 
-	/*
-	 * Mark the desired intercept state in shadow bitmap, this is needed
-	 * for resync when the MSR filters change.
-	 */
-	idx = vmx_get_passthrough_msr_slot(msr);
-	if (idx >= 0) {
-		if (type & MSR_TYPE_R)
-			__clear_bit(idx, vmx->shadow_msr_intercept.read);
-		if (type & MSR_TYPE_W)
-			__clear_bit(idx, vmx->shadow_msr_intercept.write);
-	}
-
 	if ((type & MSR_TYPE_R) &&
 	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
 		vmx_set_msr_bitmap_read(msr_bitmap, msr);
@@ -4057,25 +3985,12 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	int idx;
 
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
 	vmx_msr_bitmap_l01_changed(vmx);
 
-	/*
-	 * Mark the desired intercept state in shadow bitmap, this is needed
-	 * for resync when the MSR filter changes.
-	 */
-	idx = vmx_get_passthrough_msr_slot(msr);
-	if (idx >= 0) {
-		if (type & MSR_TYPE_R)
-			__set_bit(idx, vmx->shadow_msr_intercept.read);
-		if (type & MSR_TYPE_W)
-			__set_bit(idx, vmx->shadow_msr_intercept.write);
-	}
-
 	if (type & MSR_TYPE_R)
 		vmx_set_msr_bitmap_read(msr_bitmap, msr);
 
@@ -4159,35 +4074,59 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 i;
-
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
-	/*
-	 * Redo intercept permissions for MSRs that KVM is passing through to
-	 * the guest.  Disabling interception will check the new MSR filter and
-	 * ensure that KVM enables interception if usersepace wants to filter
-	 * the MSR.  MSRs that KVM is already intercepting don't need to be
-	 * refreshed since KVM is going to intercept them regardless of what
-	 * userspace wants.
-	 */
-	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
-		u32 msr = vmx_possible_passthrough_msrs[i];
-
-		if (!test_bit(i, vmx->shadow_msr_intercept.read))
-			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_R);
-
-		if (!test_bit(i, vmx->shadow_msr_intercept.write))
-			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_W);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
+#ifdef CONFIG_X86_64
+	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+#endif
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	if (kvm_cstate_in_guest(vcpu->kvm)) {
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
 	}
 
 	/* PT MSRs can be passed through iff PT is exposed to the guest. */
 	if (vmx_pt_mode_is_host_guest())
 		pt_update_intercept_for_msr(vcpu);
+
+	if (vcpu->arch.xfd_no_write_intercept)
+		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
+
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
+				  !to_vmx(vcpu)->spec_ctrl);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
+
+	if (boot_cpu_has(X86_FEATURE_IBPB))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
+
+	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
+	/*
+	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
+	 * filtered by userspace.
+	 */
+}
+
+void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+{
+	vmx_recalc_msr_intercepts(vcpu);
 }
 
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
@@ -7537,26 +7476,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 		evmcs->hv_enlightenments_control.msr_bitmap = 1;
 	}
 
-	/* The MSR bitmap starts with all ones */
-	bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-	bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
-#ifdef CONFIG_X86_64
-	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-#endif
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-	if (kvm_cstate_in_guest(vcpu->kvm)) {
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
-	}
-
 	vmx->loaded_vmcs = &vmx->vmcs01;
 
 	if (cpu_need_virtualize_apic_accesses(vcpu)) {
@@ -7842,18 +7761,6 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
-
-	if (boot_cpu_has(X86_FEATURE_IBPB))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
-					  !guest_has_pred_cmd_msr(vcpu));
-
-	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
-
 	set_cr4_guest_host_mask(vmx);
 
 	vmx_write_encls_bitmap(vcpu, NULL);
@@ -7869,6 +7776,9 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;
 
+	/* Recalc MSR interception to account for feature changes. */
+	vmx_recalc_msr_intercepts(vcpu);
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0afe97e3478f..a26fe3d9e1d2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -294,13 +294,6 @@ struct vcpu_vmx {
 	struct pt_desc pt_desc;
 	struct lbr_desc lbr_desc;
 
-	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
-	struct {
-		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
-	} shadow_msr_intercept;
-
 	/* ve_info must be page aligned. */
 	struct vmx_ve_information *ve_info;
 };
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/28] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (15 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-31 18:01   ` Francesco Lavra
  2025-06-05  6:24   ` Chao Gao
  2025-05-29 23:40 ` [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
                   ` (10 subsequent siblings)
  27 siblings, 2 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On a userspace MSR filter change, recalculate all MSR intercepts using the
filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
desired intercepts.  The shadow bitmaps add yet another point of failure,
are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
and a maintenance burden.

Given that KVM *must* be able to recalculate the correct intercepts at any
given time, and that MSR filter updates are not hot paths, there is zero
benefit to maintaining the shadow bitmaps.
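
For background on the svm_{set,clear,test}_msr_bitmap_*() helpers the recalc
path relies on: the MSR permissions map dedicates two bits to each MSR (even
bit = intercept reads, odd bit = intercept writes, bit set means intercept),
spread over three architectural MSR ranges.  The math below mirrors the
deleted set_msr_interception_bitmap() further down and is shown only as a
reminder of the layout:

	/* 16 MSRs per u32 chunk of the MSRPM; chunk index comes from
	 * svm_msrpm_offset(msr). */
	bit_read  = 2 * (msr & 0x0f);		/* set => intercept RDMSR */
	bit_write = 2 * (msr & 0x0f) + 1;	/* set => intercept WRMSR */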

Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c |  16 +-
 arch/x86/kvm/svm/svm.c | 371 +++++++++++------------------------------
 arch/x86/kvm/svm/svm.h |   7 +-
 3 files changed, 105 insertions(+), 289 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 694d38a2327c..800ece58b84c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4411,9 +4411,12 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 				    count, in);
 }
 
-static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = &svm->vcpu;
+	/* Clear intercepts on MSRs that are context switched by hardware. */
+	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 
 	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX))
 		svm_set_intercept_for_msr(vcpu, MSR_TSC_AUX, MSR_TYPE_RW,
@@ -4448,16 +4451,12 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
 	if (best)
 		vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
-
-	if (sev_es_guest(svm->vcpu.kvm))
-		sev_es_vcpu_after_set_cpuid(svm);
 }
 
 static void sev_es_init_vmcb(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
-	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
 
@@ -4511,11 +4510,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
-
-	/* Clear intercepts on MSRs that are context switched by hardware. */
-	svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c01eda772997..685d9fd4a4e1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -71,8 +71,6 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 
 static bool erratum_383_found __read_mostly;
 
-u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
-
 /*
  * Set osvw_len to higher value when updated Revision Guides
  * are published and we know what the new status bits are
@@ -81,70 +79,6 @@ static uint64_t osvw_len = 4, osvw_status;
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
-static const u32 direct_access_msrs[] = {
-	MSR_STAR,
-	MSR_IA32_SYSENTER_CS,
-	MSR_IA32_SYSENTER_EIP,
-	MSR_IA32_SYSENTER_ESP,
-#ifdef CONFIG_X86_64
-	MSR_GS_BASE,
-	MSR_FS_BASE,
-	MSR_KERNEL_GS_BASE,
-	MSR_LSTAR,
-	MSR_CSTAR,
-	MSR_SYSCALL_MASK,
-#endif
-	MSR_IA32_SPEC_CTRL,
-	MSR_IA32_PRED_CMD,
-	MSR_IA32_FLUSH_CMD,
-	MSR_IA32_DEBUGCTLMSR,
-	MSR_IA32_LASTBRANCHFROMIP,
-	MSR_IA32_LASTBRANCHTOIP,
-	MSR_IA32_LASTINTFROMIP,
-	MSR_IA32_LASTINTTOIP,
-	MSR_IA32_XSS,
-	MSR_EFER,
-	MSR_IA32_CR_PAT,
-	MSR_AMD64_SEV_ES_GHCB,
-	MSR_TSC_AUX,
-	X2APIC_MSR(APIC_ID),
-	X2APIC_MSR(APIC_LVR),
-	X2APIC_MSR(APIC_TASKPRI),
-	X2APIC_MSR(APIC_ARBPRI),
-	X2APIC_MSR(APIC_PROCPRI),
-	X2APIC_MSR(APIC_EOI),
-	X2APIC_MSR(APIC_RRR),
-	X2APIC_MSR(APIC_LDR),
-	X2APIC_MSR(APIC_DFR),
-	X2APIC_MSR(APIC_SPIV),
-	X2APIC_MSR(APIC_ISR),
-	X2APIC_MSR(APIC_TMR),
-	X2APIC_MSR(APIC_IRR),
-	X2APIC_MSR(APIC_ESR),
-	X2APIC_MSR(APIC_ICR),
-	X2APIC_MSR(APIC_ICR2),
-
-	/*
-	 * Note:
-	 * AMD does not virtualize APIC TSC-deadline timer mode, but it is
-	 * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
-	 * the AVIC hardware would generate GP fault. Therefore, always
-	 * intercept the MSR 0x832, and do not setup direct_access_msr.
-	 */
-	X2APIC_MSR(APIC_LVTTHMR),
-	X2APIC_MSR(APIC_LVTPC),
-	X2APIC_MSR(APIC_LVT0),
-	X2APIC_MSR(APIC_LVT1),
-	X2APIC_MSR(APIC_LVTERR),
-	X2APIC_MSR(APIC_TMICT),
-	X2APIC_MSR(APIC_TMCCT),
-	X2APIC_MSR(APIC_TDCR),
-};
-
-static_assert(ARRAY_SIZE(direct_access_msrs) ==
-	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
-#undef MAX_DIRECT_ACCESS_MSRS
-
 /*
  * These 2 parameters are used to config the controls for Pause-Loop Exiting:
  * pause_filter_count: On processors that support Pause filtering(indicated
@@ -761,44 +695,6 @@ static void clr_dr_intercepts(struct vcpu_svm *svm)
 	recalc_intercepts(svm);
 }
 
-static int direct_access_msr_slot(u32 msr)
-{
-	u32 i;
-
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		if (direct_access_msrs[i] == msr)
-			return i;
-	}
-
-	return -ENOENT;
-}
-
-static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
-				     int write)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	int slot = direct_access_msr_slot(msr);
-
-	if (slot == -ENOENT)
-		return;
-
-	/* Set the shadow bitmaps to the desired intercept states */
-	if (read)
-		__set_bit(slot, svm->shadow_msr_intercept.read);
-	else
-		__clear_bit(slot, svm->shadow_msr_intercept.read);
-
-	if (write)
-		__set_bit(slot, svm->shadow_msr_intercept.write);
-	else
-		__clear_bit(slot, svm->shadow_msr_intercept.write);
-}
-
-static bool valid_msr_intercept(u32 index)
-{
-	return direct_access_msr_slot(index) != -ENOENT;
-}
-
 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 {
 	/*
@@ -816,62 +712,11 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	return svm_test_msr_bitmap_write(msrpm, msr);
 }
 
-static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
-					u32 msr, int read, int write)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u8 bit_read, bit_write;
-	unsigned long tmp;
-	u32 offset;
-
-	/*
-	 * If this warning triggers extend the direct_access_msrs list at the
-	 * beginning of the file
-	 */
-	WARN_ON(!valid_msr_intercept(msr));
-
-	/* Enforce non allowed MSRs to trap */
-	if (read && !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
-		read = 0;
-
-	if (write && !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
-		write = 0;
-
-	offset    = svm_msrpm_offset(msr);
-	bit_read  = 2 * (msr & 0x0f);
-	bit_write = 2 * (msr & 0x0f) + 1;
-	tmp       = msrpm[offset];
-
-	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
-		return;
-
-	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
-	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
-
-	if (read)
-		svm_clear_msr_bitmap_read((void *)msrpm, msr);
-	else
-		svm_set_msr_bitmap_read((void *)msrpm, msr);
-
-	if (write)
-		svm_clear_msr_bitmap_write((void *)msrpm, msr);
-	else
-		svm_set_msr_bitmap_write((void *)msrpm, msr);
-
-	WARN_ON_ONCE(msrpm[offset] != (u32)tmp);
-
-	svm_hv_vmcb_dirty_nested_enlightenments(vcpu);
-	svm->nested.force_msr_bitmap_recalc = true;
-}
-
 void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
 
-	/* Note, the shadow intercept bitmaps have inverted polarity. */
-	set_shadow_msr_intercept(vcpu, msr, type & MSR_TYPE_R, type & MSR_TYPE_W);
-
 	/*
 	 * Don't disabled interception for the MSR if userspace wants to
 	 * handle it.
@@ -903,10 +748,6 @@ void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	void *msrpm = svm->msrpm;
 
-
-	set_shadow_msr_intercept(vcpu, msr,
-				 !(type & MSR_TYPE_R), !(type & MSR_TYPE_W));
-
 	if (type & MSR_TYPE_R)
 		svm_set_msr_bitmap_read(msrpm, msr);
 
@@ -932,6 +773,20 @@ u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
+static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool intercept = !(to_svm(vcpu)->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK);
+
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW, intercept);
+
+	if (sev_es_guest(vcpu->kvm))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW, intercept);
+
+}
+
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 {
 	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
@@ -949,6 +804,38 @@ static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
 
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 {
+	static const u32 x2avic_passthrough_msrs[] = {
+		X2APIC_MSR(APIC_ID),
+		X2APIC_MSR(APIC_LVR),
+		X2APIC_MSR(APIC_TASKPRI),
+		X2APIC_MSR(APIC_ARBPRI),
+		X2APIC_MSR(APIC_PROCPRI),
+		X2APIC_MSR(APIC_EOI),
+		X2APIC_MSR(APIC_RRR),
+		X2APIC_MSR(APIC_LDR),
+		X2APIC_MSR(APIC_DFR),
+		X2APIC_MSR(APIC_SPIV),
+		X2APIC_MSR(APIC_ISR),
+		X2APIC_MSR(APIC_TMR),
+		X2APIC_MSR(APIC_IRR),
+		X2APIC_MSR(APIC_ESR),
+		X2APIC_MSR(APIC_ICR),
+		X2APIC_MSR(APIC_ICR2),
+
+		/*
+		 * Note!  Always intercept LVTT, as TSC-deadline timer mode
+		 * isn't virtualized by hardware, and the CPU will generate a
+		 * #GP instead of a #VMEXIT.
+		 */
+		X2APIC_MSR(APIC_LVTTHMR),
+		X2APIC_MSR(APIC_LVTPC),
+		X2APIC_MSR(APIC_LVT0),
+		X2APIC_MSR(APIC_LVT1),
+		X2APIC_MSR(APIC_LVTERR),
+		X2APIC_MSR(APIC_TMICT),
+		X2APIC_MSR(APIC_TMCCT),
+		X2APIC_MSR(APIC_TDCR),
+	};
 	int i;
 
 	if (intercept == svm->x2avic_msrs_intercepted)
@@ -957,15 +844,9 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	if (!x2avic_enabled)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		int index = direct_access_msrs[i];
-
-		if ((index < APIC_BASE_MSR) ||
-		    (index > APIC_BASE_MSR + 0xff))
-			continue;
-
-		svm_set_intercept_for_msr(&svm->vcpu, index, MSR_TYPE_RW, intercept);
-	}
+	for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
+		svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
+					  MSR_TYPE_RW, intercept);
 
 	svm->x2avic_msrs_intercepted = intercept;
 }
@@ -975,65 +856,53 @@ void svm_vcpu_free_msrpm(u32 *msrpm)
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
 
+static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm_vcpu_init_msrpm(vcpu);
+
+	if (lbrv)
+		svm_recalc_lbr_msr_intercepts(vcpu);
+
+	if (boot_cpu_has(X86_FEATURE_IBPB))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+					  !guest_has_pred_cmd_msr(vcpu));
+
+	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
+	/*
+	 * Unconditionally disable interception of SPEC_CTRL if V_SPEC_CTRL is
+	 * supported, i.e. if VMRUN/#VMEXIT context switch MSR_IA32_SPEC_CTRL.
+	 */
+	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
+	else
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW, !svm->spec_ctrl);
+
+	/*
+	 * Intercept SYSENTER_EIP and SYSENTER_ESP when emulating an Intel CPU,
+	 * as AMD hardware only stores 32 bits, whereas Intel CPUs track 64 bits.
+	 */
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW,
+				  guest_cpuid_is_intel_compatible(vcpu));
+	svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW,
+				  guest_cpuid_is_intel_compatible(vcpu));
+
+	if (sev_es_guest(vcpu->kvm))
+		sev_es_recalc_msr_intercepts(vcpu);
+
+	/*
+	 * x2APIC intercepts are modified on-demand and cannot be filtered by
+	 * userspace.
+	 */
+}
+
 static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 i;
-
-	/*
-	 * Set intercept permissions for all direct access MSRs again. They
-	 * will automatically get filtered through the MSR filter, so we are
-	 * back in sync after this.
-	 */
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 msr = direct_access_msrs[i];
-		u32 read = test_bit(i, svm->shadow_msr_intercept.read);
-		u32 write = test_bit(i, svm->shadow_msr_intercept.write);
-
-		set_msr_interception_bitmap(vcpu, svm->msrpm, msr, read, write);
-	}
-}
-
-static __init int add_msr_offset(u32 offset)
-{
-	int i;
-
-	for (i = 0; i < MSRPM_OFFSETS; ++i) {
-
-		/* Offset already in list? */
-		if (msrpm_offsets[i] == offset)
-			return 0;
-
-		/* Slot used by another offset? */
-		if (msrpm_offsets[i] != MSR_INVALID)
-			continue;
-
-		/* Add offset to list */
-		msrpm_offsets[i] = offset;
-
-		return 0;
-	}
-
-	return -EIO;
-}
-
-static __init int init_msrpm_offsets(void)
-{
-	int i;
-
-	memset(msrpm_offsets, 0xff, sizeof(msrpm_offsets));
-
-	for (i = 0; i < ARRAY_SIZE(direct_access_msrs); i++) {
-		u32 offset;
-
-		offset = svm_msrpm_offset(direct_access_msrs[i]);
-		if (WARN_ON(offset == MSR_INVALID))
-			return -EIO;
-
-		if (WARN_ON_ONCE(add_msr_offset(offset)))
-			return -EIO;
-	}
-	return 0;
+	svm_recalc_msr_intercepts(vcpu);
 }
 
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -1052,13 +921,7 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
-
-	if (sev_es_guest(vcpu->kvm))
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW);
+	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
 	if (is_guest_mode(vcpu))
@@ -1072,10 +935,7 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTFROMIP, MSR_TYPE_RW);
-	svm_enable_intercept_for_msr(vcpu, MSR_IA32_LASTINTTOIP, MSR_TYPE_RW);
+	svm_recalc_lbr_msr_intercepts(vcpu);
 
 	/*
 	 * Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -1258,17 +1118,9 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (guest_cpuid_is_intel_compatible(vcpu)) {
-		/*
-		 * We must intercept SYSENTER_EIP and SYSENTER_ESP
-		 * accesses because the processor only stores 32 bits.
-		 * For the same reason we cannot use virtual VMLOAD/VMSAVE.
-		 */
 		svm_set_intercept(svm, INTERCEPT_VMLOAD);
 		svm_set_intercept(svm, INTERCEPT_VMSAVE);
 		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
-
-		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-		svm_enable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -1279,10 +1131,9 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
-		/* No need to intercept these MSRs */
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	}
+
+	svm_recalc_msr_intercepts(vcpu);
 }
 
 static void init_vmcb(struct kvm_vcpu *vcpu)
@@ -1409,13 +1260,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
-	/*
-	 * If the host supports V_SPEC_CTRL then disable the interception
-	 * of MSR_IA32_SPEC_CTRL.
-	 */
-	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
-		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
-
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -1446,8 +1290,6 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu);
-
 	svm_init_osvw(vcpu);
 
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_STUFF_FEATURE_MSRS))
@@ -3247,8 +3089,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 		/*
 		 * TSC_AUX is usually changed only during boot and never read
-		 * directly.  Intercept TSC_AUX instead of exposing it to the
-		 * guest via direct_access_msrs, and switch it via user return.
+		 * directly.  Intercept TSC_AUX and switch it via user return.
 		 */
 		preempt_disable();
 		ret = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
@@ -4684,14 +4525,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
-	if (boot_cpu_has(X86_FEATURE_IBPB))
-		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
-					  !guest_has_pred_cmd_msr(vcpu));
-
-	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
-		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
-					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
-
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
@@ -5559,12 +5392,6 @@ static __init int svm_hardware_setup(void)
 	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
 	iopm_base = __sme_page_pa(iopm_pages);
 
-	r = init_msrpm_offsets();
-	if (r) {
-		__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
-		return r;
-	}
-
 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
 				     XFEATURE_MASK_BNDCSR);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 32bb1e536dce..23e1e3ae30b0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -318,12 +318,6 @@ struct vcpu_svm {
 	struct list_head ir_list;
 	spinlock_t ir_list_lock;
 
-	/* Save desired MSR intercept (read: pass-through) state */
-	struct {
-		DECLARE_BITMAP(read, MAX_DIRECT_ACCESS_MSRS);
-		DECLARE_BITMAP(write, MAX_DIRECT_ACCESS_MSRS);
-	} shadow_msr_intercept;
-
 	struct vcpu_sev_es_state sev_es;
 
 	bool guest_state_loaded;
@@ -824,6 +818,7 @@ void sev_init_vmcb(struct vcpu_svm *svm);
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_es_vcpu_reset(struct vcpu_svm *svm);
+void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (16 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-30 23:47   ` Xin Li
  2025-05-29 23:40 ` [PATCH 19/28] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Rename msr_filter_changed() to recalc_msr_intercepts() and drop the
trampoline wrapper now that both SVM and VMX use a filter-agnostic recalc
helper to react to the new userspace filter.

No functional change intended.
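
For context (existing flow, not changed by this patch): userspace installs a
filter via KVM_X86_SET_MSR_FILTER, which kicks every vCPU with
KVM_REQ_MSR_FILTER_CHANGED; the request is serviced on the next entry, at the
call site renamed below.  A rough sketch, eliding the surrounding requests:

	/* vcpu_enter_guest(), after this patch: */
	if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
		kvm_x86_call(recalc_msr_intercepts)(vcpu);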

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm.c             | 8 +-------
 arch/x86/kvm/vmx/main.c            | 6 +++---
 arch/x86/kvm/vmx/vmx.c             | 7 +------
 arch/x86/kvm/vmx/x86_ops.h         | 2 +-
 arch/x86/kvm/x86.c                 | 8 +++++++-
 7 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8d50e3e0a19b..19a6735d6dd8 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -139,7 +139,7 @@ KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
-KVM_X86_OP(msr_filter_changed)
+KVM_X86_OP(recalc_msr_intercepts)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..89a626e5b80f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1885,7 +1885,7 @@ struct kvm_x86_ops {
 	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
-	void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
+	void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu);
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 685d9fd4a4e1..a9a801bcc6d0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -900,11 +900,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
-static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
-{
-	svm_recalc_msr_intercepts(vcpu);
-}
-
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 {
 	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
@@ -933,7 +928,6 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
-
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
 	svm_recalc_lbr_msr_intercepts(vcpu);
 
@@ -5231,7 +5225,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
 
-	.msr_filter_changed = svm_msr_filter_changed,
+	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..b3c58731a2f5 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -220,7 +220,7 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return vmx_get_msr(vcpu, msr_info);
 }
 
-static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
+static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * TDX doesn't allow VMM to configure interception of MSR accesses.
@@ -231,7 +231,7 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
 	if (is_td_vcpu(vcpu))
 		return;
 
-	vmx_msr_filter_changed(vcpu);
+	vmx_recalc_msr_intercepts(vcpu);
 }
 
 static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
@@ -1034,7 +1034,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,
 
-	.msr_filter_changed = vt_op(msr_filter_changed),
+	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
 	.complete_emulated_msr = vt_op(complete_emulated_msr),
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6ffa2b2b85ce..826510a0b5bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4074,7 +4074,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
@@ -4124,11 +4124,6 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
-{
-	vmx_recalc_msr_intercepts(vcpu);
-}
-
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 						int vector)
 {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b4596f651232..34c6e683e321 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -52,7 +52,7 @@ void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
 void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
-void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
+void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f9f798f286ce..6da6be8ff5fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10924,8 +10924,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_update_apicv(vcpu);
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
+
+		/*
+		 * Recalc MSR intercepts as userspace may want to intercept
+		 * accesses to MSRs that KVM would otherwise pass through to
+		 * the guest.
+		 */
 		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
-			kvm_x86_call(msr_filter_changed)(vcpu);
+			kvm_x86_call(recalc_msr_intercepts)(vcpu);
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			kvm_x86_call(update_cpu_dirty_logging)(vcpu);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 19/28] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (17 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 20/28] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Rename init_vmcb_after_set_cpuid() to svm_recalc_intercepts_after_set_cpuid()
to more precisely describe its role.  Strictly speaking, the name isn't
perfect as toggling virtual VM{LOAD,SAVE} is arguably not recalculating an
intercept, but practically speaking it's close enough.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9a801bcc6d0..bbd1d89d9a3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1107,7 +1107,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 	}
 }
 
-static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
+static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -1273,7 +1273,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		sev_init_vmcb(svm);
 
 	svm_hv_init_vmcb(vmcb);
-	init_vmcb_after_set_cpuid(vcpu);
+
+	svm_recalc_intercepts_after_set_cpuid(vcpu);
 
 	vmcb_mark_all_dirty(vmcb);
 
@@ -4522,7 +4523,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
-	init_vmcb_after_set_cpuid(vcpu);
+	svm_recalc_intercepts_after_set_cpuid(vcpu);
 }
 
 static bool svm_has_wbinvd_exit(void)
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 20/28] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (18 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 19/28] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 21/28] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Fold svm_vcpu_init_msrpm() into svm_recalc_msr_intercepts() now that there
is only the one caller (and because the "init" misnomer is even more
misleading than it was in the past).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bbd1d89d9a3b..12fbfbf9acad 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -787,21 +787,6 @@ static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
 
 }
 
-static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu)
-{
-	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
-
-#ifdef CONFIG_X86_64
-	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
-	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
-#endif
-}
-
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 {
 	static const u32 x2avic_passthrough_msrs[] = {
@@ -860,7 +845,17 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_vcpu_init_msrpm(vcpu);
+	svm_disable_intercept_for_msr(vcpu, MSR_STAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+
+#ifdef CONFIG_X86_64
+	svm_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_LSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_CSTAR, MSR_TYPE_RW);
+	svm_disable_intercept_for_msr(vcpu, MSR_SYSCALL_MASK, MSR_TYPE_RW);
+#endif
 
 	if (lbrv)
 		svm_recalc_lbr_msr_intercepts(vcpu);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 21/28] KVM: SVM: Merge "after set CPUID" intercept recalc helpers
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (19 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 20/28] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 22/28] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Merge svm_recalc_intercepts_after_set_cpuid() and
svm_recalc_instruction_intercepts() such that the "after set CPUID" helper
simply invokes the type-specific helpers (MSRs vs. instructions), i.e.
make svm_recalc_intercepts_after_set_cpuid() a single entry point for all
intercept updates that need to be performed after a CPUID change.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 12fbfbf9acad..2ebac30a337a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1079,9 +1079,10 @@ void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu)
 }
 
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
-static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
-					      struct vcpu_svm *svm)
+static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/*
 	 * Intercept INVPCID if shadow paging is enabled to sync/free shadow
 	 * roots, or if INVPCID is disabled in the guest to inject #UD.
@@ -1100,11 +1101,6 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 		else
 			svm_set_intercept(svm, INTERCEPT_RDTSCP);
 	}
-}
-
-static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (guest_cpuid_is_intel_compatible(vcpu)) {
 		svm_set_intercept(svm, INTERCEPT_VMLOAD);
@@ -1121,7 +1117,11 @@ static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 	}
+}
 
+static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	svm_recalc_instruction_intercepts(vcpu);
 	svm_recalc_msr_intercepts(vcpu);
 }
 
@@ -1247,8 +1247,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_clr_intercept(svm, INTERCEPT_PAUSE);
 	}
 
-	svm_recalc_instruction_intercepts(vcpu, svm);
-
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -4513,8 +4511,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (guest_cpuid_is_intel_compatible(vcpu))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
-	svm_recalc_instruction_intercepts(vcpu, svm);
-
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 22/28] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (20 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 21/28] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 23/28] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Now that msr_write_intercepted() defaults to true, i.e. accurately reflects
hardware behavior for out-of-range MSRs, and doesn't WARN (or BUG) on an
out-of-range MSR, drop sev_es_prevent_msr_access()'s svm_msrpm_offset()
check that guarded against calling msr_write_intercepted() with a "bad"
index.

Opportunistically clean up the helper's formatting.
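
As a rough sketch (simplified, not the exact KVM helper), the invariant
being relied on looks like this, where svm_msrpm_bit_nr() is a stand-in
for whatever helper maps an MSR to its read-intercept bit number, or a
negative value for out-of-range MSRs:

	static bool msr_write_intercepted_sketch(unsigned long *msrpm, u32 msr)
	{
		int bit = svm_msrpm_bit_nr(msr);

		if (bit < 0)
			return true;	/* out-of-range MSRs are always intercepted */

		return test_bit(bit + 1, msrpm);	/* bit + 1 == the write bit of the pair */
	}

With "not covered by the MSRPM" reported as "intercepted", a range check
at the call site is pure redundancy.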

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2ebac30a337a..9d01776d82d4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2771,12 +2771,11 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
 	return 0;
 }
 
-static bool
-sev_es_prevent_msr_access(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
 {
 	return sev_es_guest(vcpu->kvm) &&
 	       vcpu->arch.guest_state_protected &&
-	       svm_msrpm_offset(msr_info->index) != MSR_INVALID &&
 	       !msr_write_intercepted(vcpu, msr_info->index);
 }
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 23/28] KVM: SVM: Move svm_msrpm_offset() to nested.c
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (21 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 22/28] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 24/28] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Move svm_msrpm_offset() from svm.c to nested.c now that all usage of the
u32-index offsets is nested virtualization specific.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c    | 27 ---------------------------
 arch/x86/kvm/svm/svm.h    |  1 -
 3 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0026d2adb809..5d6525627681 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -187,6 +187,33 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
+static const u32 msrpm_ranges[] = {
+	SVM_MSRPM_RANGE_0_BASE_MSR,
+	SVM_MSRPM_RANGE_1_BASE_MSR,
+	SVM_MSRPM_RANGE_2_BASE_MSR
+};
+
+static u32 svm_msrpm_offset(u32 msr)
+{
+	u32 offset;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
+		if (msr < msrpm_ranges[i] ||
+		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
+			continue;
+
+		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
+		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
+
+		/* Now we have the u8 offset - but need the u32 offset */
+		return offset / 4;
+	}
+
+	/* MSR not in any range */
+	return MSR_INVALID;
+}
+
 int __init nested_svm_init_msrpm_merge_offsets(void)
 {
 	const u32 merge_msrs[] = {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9d01776d82d4..fa2df1c869db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -195,33 +195,6 @@ static DEFINE_MUTEX(vmcb_dump_mutex);
  */
 static int tsc_aux_uret_slot __read_mostly = -1;
 
-static const u32 msrpm_ranges[] = {
-	SVM_MSRPM_RANGE_0_BASE_MSR,
-	SVM_MSRPM_RANGE_1_BASE_MSR,
-	SVM_MSRPM_RANGE_2_BASE_MSR
-};
-
-u32 svm_msrpm_offset(u32 msr)
-{
-	u32 offset;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
-		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
-			continue;
-
-		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
-		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
-
-		/* Now we have the u8 offset - but need the u32 offset */
-		return offset / 4;
-	}
-
-	/* MSR not in any range */
-	return MSR_INVALID;
-}
-
 static int get_npt_level(void)
 {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 23e1e3ae30b0..d146c35b9bd2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -673,7 +673,6 @@ BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 /* svm.c */
 extern bool dump_invalid_vmcb;
 
-u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
 void svm_vcpu_free_msrpm(u32 *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 24/28] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (22 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 23/28] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Store KVM's MSRPM pointers as "void *" instead of "u32 *" to guard against
directly accessing the bitmaps outside of code that is explicitly written
to access the bitmaps with a specific type.

Opportunistically use svm_vcpu_free_msrpm() in svm_vcpu_free() instead of
open coding an equivalent.
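
A minimal sketch of the effect (illustrative only, assuming the allocator
from this patch):

	void *msrpm = svm_vcpu_alloc_msrpm();
	unsigned long *chunks;

	/* msrpm[0] |= BIT(0);	<-- no longer compiles, void * can't be dereferenced */

	chunks = msrpm;			/* deliberate, typed access */
	__set_bit(0, chunks);

i.e. any code that wants to poke the bitmap directly has to pick an
explicit chunk type, which makes such accesses easy to spot in review.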

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c |  4 +++-
 arch/x86/kvm/svm/svm.c    |  8 ++++----
 arch/x86/kvm/svm/svm.h    | 13 ++++++++-----
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5d6525627681..e07e10fb52a5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -271,6 +271,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 *msrpm02 = svm->nested.msrpm;
+	u32 *msrpm01 = svm->msrpm;
 	int i;
 
 	/*
@@ -305,7 +307,7 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
 			return false;
 
-		svm->nested.msrpm[p] = svm->msrpm[p] | value;
+		msrpm02[p] = msrpm01[p] | value;
 	}
 
 	svm->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fa2df1c869db..6f99031c2926 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -731,11 +731,11 @@ void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 	svm->nested.force_msr_bitmap_recalc = true;
 }
 
-u32 *svm_vcpu_alloc_msrpm(void)
+void *svm_vcpu_alloc_msrpm(void)
 {
 	unsigned int order = get_order(MSRPM_SIZE);
 	struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, order);
-	u32 *msrpm;
+	void *msrpm;
 
 	if (!pages)
 		return NULL;
@@ -809,7 +809,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
 	svm->x2avic_msrs_intercepted = intercept;
 }
 
-void svm_vcpu_free_msrpm(u32 *msrpm)
+void svm_vcpu_free_msrpm(void *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
@@ -1353,7 +1353,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
 	sev_free_vcpu(vcpu);
 
 	__free_page(__sme_pa_to_page(svm->vmcb01.pa));
-	__free_pages(virt_to_page(svm->msrpm), get_order(MSRPM_SIZE));
+	svm_vcpu_free_msrpm(svm->msrpm);
 }
 
 #ifdef CONFIG_CPU_MITIGATIONS
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d146c35b9bd2..77287c870967 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -189,8 +189,11 @@ struct svm_nested_state {
 	u64 vmcb12_gpa;
 	u64 last_vmcb12_gpa;
 
-	/* These are the merged vectors */
-	u32 *msrpm;
+	/*
+	 * The MSR permissions map used for vmcb02, which is the merge result
+	 * of vmcb01 and vmcb12
+	 */
+	void *msrpm;
 
 	/* A VMRUN has started but has not yet been performed, so
 	 * we cannot inject a nested vmexit yet.  */
@@ -271,7 +274,7 @@ struct vcpu_svm {
 	 */
 	u64 virt_spec_ctrl;
 
-	u32 *msrpm;
+	void *msrpm;
 
 	ulong nmi_iret_rip;
 
@@ -673,8 +676,8 @@ BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 /* svm.c */
 extern bool dump_invalid_vmcb;
 
-u32 *svm_vcpu_alloc_msrpm(void);
-void svm_vcpu_free_msrpm(u32 *msrpm);
+void *svm_vcpu_alloc_msrpm(void);
+void svm_vcpu_free_msrpm(void *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
 void svm_update_lbrv(struct kvm_vcpu *vcpu);
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (23 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 24/28] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-06-04 16:19   ` Paolo Bonzini
  2025-05-29 23:40 ` [PATCH 26/28] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Access the MSRPM using u32/4-byte chunks (and appropriately adjusted
offsets) only when merging L0 and L1 bitmaps as part of emulating VMRUN.
The only reason to batch accesses to MSRPMs is to avoid the overhead of
uaccess operations (e.g. STAC/CLAC and bounds checks) when reading L1's
bitmap pointed at by vmcb12.  For all other uses, either per-bit accesses
are more than fast enough (no uaccess), or KVM is only accessing a single
bit (nested_svm_exit_handled_msr()) and so there's nothing to batch.

In addition to (hopefully) documenting the uniqueness of the merging code,
restricting chunked access to _just_ the merging code will allow for
increasing the chunk size (to unsigned long) with minimal risk.
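
Roughly, the two kinds of access look like this (sketch only; names are
illustrative, not the literal diff):

	/* Merge path: each uaccess read of L1's bitmap covers a whole chunk. */
	if (kvm_vcpu_read_guest(vcpu, gpa, &l1_chunk, sizeof(l1_chunk)))
		return false;
	msrpm02[i] = msrpm01[i] | l1_chunk;

	/* Host-only paths: no uaccess, so per-bit/per-byte accesses are fine. */
	__clear_bit(bit_nr, msrpm01);

kvm_vcpu_read_guest() is where the STAC/CLAC and bounds-check overhead
lives, so batching only buys anything on that path.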

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 50 ++++++++++++++++-----------------------
 arch/x86/kvm/svm/svm.h    | 18 ++++++++++----
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e07e10fb52a5..a4e98ada732b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -187,31 +187,19 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
-static const u32 msrpm_ranges[] = {
-	SVM_MSRPM_RANGE_0_BASE_MSR,
-	SVM_MSRPM_RANGE_1_BASE_MSR,
-	SVM_MSRPM_RANGE_2_BASE_MSR
-};
+#define SVM_BUILD_MSR_BYTE_NR_CASE(range_nr, msr)				\
+	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
+		return SVM_MSRPM_BYTE_NR(range_nr, msr);
 
 static u32 svm_msrpm_offset(u32 msr)
 {
-	u32 offset;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
-		if (msr < msrpm_ranges[i] ||
-		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
-			continue;
-
-		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
-		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
-
-		/* Now we have the u8 offset - but need the u32 offset */
-		return offset / 4;
+	switch (msr) {
+	SVM_BUILD_MSR_BYTE_NR_CASE(0, msr)
+	SVM_BUILD_MSR_BYTE_NR_CASE(1, msr)
+	SVM_BUILD_MSR_BYTE_NR_CASE(2, msr)
+	default:
+		return MSR_INVALID;
 	}
-
-	/* MSR not in any range */
-	return MSR_INVALID;
 }
 
 int __init nested_svm_init_msrpm_merge_offsets(void)
@@ -245,6 +233,12 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 		if (WARN_ON(offset == MSR_INVALID))
 			return -EIO;
 
+		/*
+		 * Merging is done in 32-bit chunks to reduce the number of
+		 * accesses to L1's bitmap.
+		 */
+		offset /= sizeof(u32);
+
 		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
 			if (nested_svm_msrpm_merge_offsets[j] == offset)
 				break;
@@ -1363,8 +1357,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 {
-	u32 offset, msr, value;
-	int write, mask;
+	u32 offset, msr;
+	int write;
+	u8 value;
 
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
@@ -1372,18 +1367,15 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
 	offset = svm_msrpm_offset(msr);
 	write  = svm->vmcb->control.exit_info_1 & 1;
-	mask   = 1 << ((2 * (msr & 0xf)) + write);
 
 	if (offset == MSR_INVALID)
 		return NESTED_EXIT_DONE;
 
-	/* Offset is in 32 bit units but need in 8 bit units */
-	offset *= 4;
-
-	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset, &value, 4))
+	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset,
+				&value, sizeof(value)))
 		return NESTED_EXIT_DONE;
 
-	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
+	return (value & BIT(write)) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
 }
 
 static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 77287c870967..155b6089fcd2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -634,15 +634,23 @@ static_assert(SVM_MSRS_PER_RANGE == 8192);
 	(range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +			\
 	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR)
 
-#define SVM_MSRPM_SANITY_CHECK_BITS(range_nr)					\
+#define SVM_MSRPM_BYTE_NR(range_nr, msr)					\
+	(range_nr * SVM_MSRPM_BYTES_PER_RANGE +					\
+	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) / SVM_MSRS_PER_BYTE)
+
+#define SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(range_nr)				\
 static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
 	      range_nr * 2048 * 8 + 2);						\
 static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
-	      range_nr * 2048 * 8 + 14);
+	      range_nr * 2048 * 8 + 14);					\
+static_assert(SVM_MSRPM_BYTE_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
+	      range_nr * 2048);							\
+static_assert(SVM_MSRPM_BYTE_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
+	      range_nr * 2048 + 1);
 
-SVM_MSRPM_SANITY_CHECK_BITS(0);
-SVM_MSRPM_SANITY_CHECK_BITS(1);
-SVM_MSRPM_SANITY_CHECK_BITS(2);
+SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(0);
+SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(1);
+SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(2);
 
 #define SVM_BUILD_MSR_BITMAP_CASE(bitmap, range_nr, msr, bitop, bit_rw)		\
 	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 26/28] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (24 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 27/28] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
  2025-05-29 23:40 ` [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Return -EINVAL instead of MSR_INVALID from svm_msrpm_offset() to indicate
that the MSR isn't covered by one of the (currently) three MSRPM ranges,
and delete the MSR_INVALID macro now that all users are gone.
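
The resulting caller pattern (as in the hunks below) becomes the standard
kernel "negative errno" idiom:

	int offset = svm_msrpm_offset(msr);

	if (offset < 0)		/* was: offset == MSR_INVALID */
		return NESTED_EXIT_DONE;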

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 14 +++++++-------
 arch/x86/kvm/svm/svm.h    |  2 --
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a4e98ada732b..60f62cddd291 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -191,14 +191,14 @@ static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
 		return SVM_MSRPM_BYTE_NR(range_nr, msr);
 
-static u32 svm_msrpm_offset(u32 msr)
+static int svm_msrpm_offset(u32 msr)
 {
 	switch (msr) {
 	SVM_BUILD_MSR_BYTE_NR_CASE(0, msr)
 	SVM_BUILD_MSR_BYTE_NR_CASE(1, msr)
 	SVM_BUILD_MSR_BYTE_NR_CASE(2, msr)
 	default:
-		return MSR_INVALID;
+		return -EINVAL;
 	}
 }
 
@@ -228,9 +228,9 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 	int i, j;
 
 	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
-		u32 offset = svm_msrpm_offset(merge_msrs[i]);
+		int offset = svm_msrpm_offset(merge_msrs[i]);
 
-		if (WARN_ON(offset == MSR_INVALID))
+		if (WARN_ON(offset < 0))
 			return -EIO;
 
 		/*
@@ -1357,9 +1357,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 {
-	u32 offset, msr;
-	int write;
+	int offset, write;
 	u8 value;
+	u32 msr;
 
 	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
@@ -1368,7 +1368,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	offset = svm_msrpm_offset(msr);
 	write  = svm->vmcb->control.exit_info_1 & 1;
 
-	if (offset == MSR_INVALID)
+	if (offset < 0)
 		return NESTED_EXIT_DONE;
 
 	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 155b6089fcd2..27c722fd766e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -677,8 +677,6 @@ BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
 BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
 BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
 
-#define MSR_INVALID				0xffffffffU
-
 #define DEBUGCTL_RESERVED_BITS (~DEBUGCTLMSR_LBR)
 
 /* svm.c */
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 27/28] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (25 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 26/28] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-05-29 23:40 ` [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
  27 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

When merging L0 and L1 MSRPMs as part of nested VMRUN emulation, access
the bitmaps using "unsigned long" chunks, i.e. use 8-byte access for
64-bit kernels instead of arbitrarily working on 4-byte chunks.

Opportunistically rename local variables in nested_svm_merge_msrpm() to
more precisely/accurately reflect their purpose ("offset" in particular is
extremely ambiguous).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 60f62cddd291..fb4808cf4711 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -184,6 +184,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	}
 }
 
+typedef unsigned long nsvm_msrpm_merge_t;
 static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
@@ -234,10 +235,10 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 			return -EIO;
 
 		/*
-		 * Merging is done in 32-bit chunks to reduce the number of
-		 * accesses to L1's bitmap.
+		 * Merging is done in chunks to reduce the number of accesses
+		 * to L1's bitmap.
 		 */
-		offset /= sizeof(u32);
+		offset /= sizeof(nsvm_msrpm_merge_t);
 
 		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
 			if (nested_svm_msrpm_merge_offsets[j] == offset)
@@ -265,8 +266,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 *msrpm02 = svm->nested.msrpm;
-	u32 *msrpm01 = svm->msrpm;
+	nsvm_msrpm_merge_t *msrpm02 = svm->nested.msrpm;
+	nsvm_msrpm_merge_t *msrpm01 = svm->msrpm;
 	int i;
 
 	/*
@@ -293,15 +294,15 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
 		const int p = nested_svm_msrpm_merge_offsets[i];
-		u32 value;
-		u64 offset;
+		nsvm_msrpm_merge_t l1_val;
+		gpa_t gpa;
 
-		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
+		gpa = svm->nested.ctl.msrpm_base_pa + (p * sizeof(l1_val));
 
-		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
+		if (kvm_vcpu_read_guest(vcpu, gpa, &l1_val, sizeof(l1_val)))
 			return false;
 
-		msrpm02[p] = msrpm01[p] | value;
+		msrpm02[p] = msrpm01[p] | l1_val;
 	}
 
 	svm->nested.force_msr_bitmap_recalc = false;
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change
  2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
                   ` (26 preceding siblings ...)
  2025-05-29 23:40 ` [PATCH 27/28] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
@ 2025-05-29 23:40 ` Sean Christopherson
  2025-06-03  5:47   ` Mi, Dapeng
  27 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

Re-read MSR_{FS,GS}_BASE after restoring the "allow everything" userspace
MSR filter to verify that KVM stops forwarding exits to userspace.  This
can also be used in conjunction with manual verification (e.g. printk) to
ensure KVM is correctly updating the MSR bitmaps consumed by hardware.
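
For reference, the "allow everything" filter restored by the test is
assumed to look roughly like the selftest's existing filter_allow (not
new code added here):

	struct kvm_msr_filter filter_allow = {
		.flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
	};

	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_allow);

Restoring it should both stop the forwarded exits and let KVM re-disable
interception in the hardware bitmaps.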

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
index 32b2794b78fe..8463a9956410 100644
--- a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
@@ -343,6 +343,12 @@ static void guest_code_permission_bitmap(void)
 	data = test_rdmsr(MSR_GS_BASE);
 	GUEST_ASSERT(data == MSR_GS_BASE);
 
+	/* Access the MSRs again to ensure KVM has disabled interception. */
+	data = test_rdmsr(MSR_FS_BASE);
+	GUEST_ASSERT(data != MSR_FS_BASE);
+	data = test_rdmsr(MSR_GS_BASE);
+	GUEST_ASSERT(data != MSR_GS_BASE);
+
 	GUEST_DONE();
 }
 
@@ -682,6 +688,8 @@ KVM_ONE_VCPU_TEST(user_msr, msr_permission_bitmap, guest_code_permission_bitmap)
 		    "Expected ucall state to be UCALL_SYNC.");
 	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_gs);
 	run_guest_then_process_rdmsr(vcpu, MSR_GS_BASE);
+
+	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_allow);
 	run_guest_then_process_ucall_done(vcpu);
 }
 
-- 
2.49.0.1204.g71687c7c1d-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
@ 2025-05-30 23:38   ` Xin Li
  2025-06-02 17:10     ` Sean Christopherson
  2025-06-03  3:52   ` Mi, Dapeng
  2025-06-05  6:59   ` Chao Gao
  2 siblings, 1 reply; 60+ messages in thread
From: Xin Li @ 2025-05-30 23:38 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Chao Gao, Dapeng Mi

On 5/29/2025 4:40 PM, Sean Christopherson wrote:
> On a userspace MSR filter change, recalculate all MSR intercepts using the
> filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
> desired intercepts.  The shadow bitmaps add yet another point of failure,
> are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
> and a maintenance burden.
> 
> Given that KVM *must* be able to recalculate the correct intercepts at any
> given time, and that MSR filter updates are not hot paths, there is zero
> benefit to maintaining the shadow bitmaps.

+1

To me, this patch does simplify the logic by removing the bitmap state 
management.


Just one very minor comment below — other than that:

Reviewed-by: Xin Li (Intel) <xin@zytor.com>

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8f7fe04a1998..6ffa2b2b85ce 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4159,35 +4074,59 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   	}
>   }
>   
> -void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> +static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	u32 i;
> -
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
> -	/*
> -	 * Redo intercept permissions for MSRs that KVM is passing through to
> -	 * the guest.  Disabling interception will check the new MSR filter and
> -	 * ensure that KVM enables interception if usersepace wants to filter
> -	 * the MSR.  MSRs that KVM is already intercepting don't need to be
> -	 * refreshed since KVM is going to intercept them regardless of what
> -	 * userspace wants.
> -	 */
> -	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
> -		u32 msr = vmx_possible_passthrough_msrs[i];
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.read))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_R);
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.write))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_W);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> +#ifdef CONFIG_X86_64
> +	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> +#endif
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_cstate_in_guest(vcpu->kvm)) {
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>   	}
>   
>   	/* PT MSRs can be passed through iff PT is exposed to the guest. */
>   	if (vmx_pt_mode_is_host_guest())
>   		pt_update_intercept_for_msr(vcpu);
> +
> +	if (vcpu->arch.xfd_no_write_intercept)
> +		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
> +
> +
> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
> +				  !to_vmx(vcpu)->spec_ctrl);
> +
> +	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> +
> +	if (boot_cpu_has(X86_FEATURE_IBPB))

I think Boris prefers using cpu_feature_enabled() instead — maybe this
is a good opportunity to update this occurrence?
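
E.g., a sketch of the suggested form (same call site as the quoted hunk):

	if (cpu_feature_enabled(X86_FEATURE_IBPB))
		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
					  !guest_has_pred_cmd_msr(vcpu));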

> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> +					  !guest_has_pred_cmd_msr(vcpu));
> +
> +	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))

Ditto.

> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +
> +	/*
> +	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
> +	 * filtered by userspace.
> +	 */
> +}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
  2025-05-29 23:40 ` [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
@ 2025-05-30 23:47   ` Xin Li
  0 siblings, 0 replies; 60+ messages in thread
From: Xin Li @ 2025-05-30 23:47 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Chao Gao, Dapeng Mi

On 5/29/2025 4:40 PM, Sean Christopherson wrote:
> Rename msr_filter_changed() to recalc_msr_intercepts() and drop the
> trampoline wrapper now that both SVM and VMX use a filter-agnostic recalc
> helper to react to the new userspace filter.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson<seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm-x86-ops.h | 2 +-
>   arch/x86/include/asm/kvm_host.h    | 2 +-
>   arch/x86/kvm/svm/svm.c             | 8 +-------
>   arch/x86/kvm/vmx/main.c            | 6 +++---
>   arch/x86/kvm/vmx/vmx.c             | 7 +------
>   arch/x86/kvm/vmx/x86_ops.h         | 2 +-
>   arch/x86/kvm/x86.c                 | 8 +++++++-
>   7 files changed, 15 insertions(+), 20 deletions(-)

Reviewed-by: Xin Li (Intel) <xin@zytor.com>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/28] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
@ 2025-05-31 18:01   ` Francesco Lavra
  2025-06-02 17:08     ` Sean Christopherson
  2025-06-05  6:24   ` Chao Gao
  1 sibling, 1 reply; 60+ messages in thread
From: Francesco Lavra @ 2025-05-31 18:01 UTC (permalink / raw)
  To: seanjc; +Cc: bp, chao.gao, dapeng1.mi, kvm, linux-kernel, pbonzini, xin

On 2025-05-29 at 23:40, Sean Christopherson wrote:
> @@ -81,70 +79,6 @@ static uint64_t osvw_len = 4, osvw_status;
>  
>  static DEFINE_PER_CPU(u64, current_tsc_ratio);
>  
> -static const u32 direct_access_msrs[] = {
> -	MSR_STAR,
> -	MSR_IA32_SYSENTER_CS,
> -	MSR_IA32_SYSENTER_EIP,
> -	MSR_IA32_SYSENTER_ESP,
> -#ifdef CONFIG_X86_64
> -	MSR_GS_BASE,
> -	MSR_FS_BASE,
> -	MSR_KERNEL_GS_BASE,
> -	MSR_LSTAR,
> -	MSR_CSTAR,
> -	MSR_SYSCALL_MASK,
> -#endif
> -	MSR_IA32_SPEC_CTRL,
> -	MSR_IA32_PRED_CMD,
> -	MSR_IA32_FLUSH_CMD,
> -	MSR_IA32_DEBUGCTLMSR,
> -	MSR_IA32_LASTBRANCHFROMIP,
> -	MSR_IA32_LASTBRANCHTOIP,
> -	MSR_IA32_LASTINTFROMIP,
> -	MSR_IA32_LASTINTTOIP,
> -	MSR_IA32_XSS,
> -	MSR_EFER,
> -	MSR_IA32_CR_PAT,
> -	MSR_AMD64_SEV_ES_GHCB,
> -	MSR_TSC_AUX,
> -	X2APIC_MSR(APIC_ID),
> -	X2APIC_MSR(APIC_LVR),
> -	X2APIC_MSR(APIC_TASKPRI),
> -	X2APIC_MSR(APIC_ARBPRI),
> -	X2APIC_MSR(APIC_PROCPRI),
> -	X2APIC_MSR(APIC_EOI),
> -	X2APIC_MSR(APIC_RRR),
> -	X2APIC_MSR(APIC_LDR),
> -	X2APIC_MSR(APIC_DFR),
> -	X2APIC_MSR(APIC_SPIV),
> -	X2APIC_MSR(APIC_ISR),
> -	X2APIC_MSR(APIC_TMR),
> -	X2APIC_MSR(APIC_IRR),
> -	X2APIC_MSR(APIC_ESR),
> -	X2APIC_MSR(APIC_ICR),
> -	X2APIC_MSR(APIC_ICR2),
> -
> -	/*
> -	 * Note:
> -	 * AMD does not virtualize APIC TSC-deadline timer mode, but it
> is
> -	 * emulated by KVM. When setting APIC LVTT (0x832) register bit
> 18,
> -	 * the AVIC hardware would generate GP fault. Therefore, always
> -	 * intercept the MSR 0x832, and do not setup direct_access_msr.
> -	 */
> -	X2APIC_MSR(APIC_LVTTHMR),
> -	X2APIC_MSR(APIC_LVTPC),
> -	X2APIC_MSR(APIC_LVT0),
> -	X2APIC_MSR(APIC_LVT1),
> -	X2APIC_MSR(APIC_LVTERR),
> -	X2APIC_MSR(APIC_TMICT),
> -	X2APIC_MSR(APIC_TMCCT),
> -	X2APIC_MSR(APIC_TDCR),
> -};
> -
> -static_assert(ARRAY_SIZE(direct_access_msrs) ==
> -	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
> -#undef MAX_DIRECT_ACCESS_MSRS

The MAX_DIRECT_ACCESS_MSRS define should now be removed from
arch/x86/kvm/svm/svm.h, since it's no longer used.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/28] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-31 18:01   ` Francesco Lavra
@ 2025-06-02 17:08     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-02 17:08 UTC (permalink / raw)
  To: Francesco Lavra
  Cc: bp, chao.gao, dapeng1.mi, kvm, linux-kernel, pbonzini, xin

On Sat, May 31, 2025, Francesco Lavra wrote:
> On 2025-05-29 at 23:40, Sean Christopherson wrote:
> > -static_assert(ARRAY_SIZE(direct_access_msrs) ==
> > -	      MAX_DIRECT_ACCESS_MSRS - 6 * !IS_ENABLED(CONFIG_X86_64));
> > -#undef MAX_DIRECT_ACCESS_MSRS
> 
> The MAX_DIRECT_ACCESS_MSRS define should now be removed from
> arch/x86/kvm/svm/svm.harch/x86/kvm/svm/svm.h, since it's no longer used.

/facepalm

All that work to get rid of the darn thing, and then I forget to actually get rid
of it.

I also failed to remove the msrpm_offsets declaration (and its size macro)

  #define MSRPM_OFFSETS	32
  extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;

which are dead code as of "KVM: SVM: Manually recalc all MSR intercepts on userspace
MSR filter change".

Nice catch, and thank you!

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-30 23:38   ` Xin Li
@ 2025-06-02 17:10     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-02 17:10 UTC (permalink / raw)
  To: Xin Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Chao Gao,
	Dapeng Mi

On Fri, May 30, 2025, Xin Li wrote:
> > +
> > +	if (vcpu->arch.xfd_no_write_intercept)
> > +		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
> > +
> > +
> > +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
> > +				  !to_vmx(vcpu)->spec_ctrl);
> > +
> > +	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> > +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> > +
> > +	if (boot_cpu_has(X86_FEATURE_IBPB))
> 
> I think Boris prefers using cpu_feature_enabled() instead — maybe this
> is a good opportunity to update this occurrence?

Yeah, I'm comfortable squeezing in that change.

> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> > +					  !guest_has_pred_cmd_msr(vcpu));
> > +
> > +	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> 
> Ditto.
> 
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> > +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> > +
> > +	/*
> > +	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
> > +	 * filtered by userspace.
> > +	 */
> > +}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
  2025-05-29 23:39 ` [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
@ 2025-06-03  2:32   ` Mi, Dapeng
  0 siblings, 0 replies; 60+ messages in thread
From: Mi, Dapeng @ 2025-06-03  2:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao

On 5/30/2025 7:39 AM, Sean Christopherson wrote:
> Renam nested_svm_vmrun_msrpm() to nested_svm_merge_msrpm() to better

"Renam" -> "Rename".


> capture its role, and opportunistically feed it @vcpu instead of @svm, as
> grabbing "svm" only to turn around and grab svm->vcpu is rather silly.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/nested.c | 15 +++++++--------
>  arch/x86/kvm/svm/svm.c    |  2 +-
>  2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 8427a48b8b7a..89a77f0f1cc8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -189,8 +189,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   * is optimized in that it only merges the parts where KVM MSR permission bitmap
>   * may contain zero bits.
>   */
> -static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
> +static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
>  	int i;
>  
>  	/*
> @@ -205,7 +206,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  	if (!svm->nested.force_msr_bitmap_recalc) {
>  		struct hv_vmcb_enlightenments *hve = &svm->nested.ctl.hv_enlightenments;
>  
> -		if (kvm_hv_hypercall_enabled(&svm->vcpu) &&
> +		if (kvm_hv_hypercall_enabled(vcpu) &&
>  		    hve->hv_enlightenments_control.msr_bitmap &&
>  		    (svm->nested.ctl.clean & BIT(HV_VMCB_NESTED_ENLIGHTENMENTS)))
>  			goto set_msrpm_base_pa;
> @@ -230,7 +231,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  
>  		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
>  
> -		if (kvm_vcpu_read_guest(&svm->vcpu, offset, &value, 4))
> +		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
>  			return false;
>  
>  		svm->nested.msrpm[p] = svm->msrpm[p] | value;
> @@ -937,7 +938,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>  	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
>  		goto out_exit_err;
>  
> -	if (nested_svm_vmrun_msrpm(svm))
> +	if (nested_svm_merge_msrpm(vcpu))
>  		goto out;
>  
>  out_exit_err:
> @@ -1819,13 +1820,11 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>  
>  static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
>  {
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
>  	if (WARN_ON(!is_guest_mode(vcpu)))
>  		return true;
>  
>  	if (!vcpu->arch.pdptrs_from_userspace &&
> -	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
> +	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
>  		/*
>  		 * Reload the guest's PDPTRs since after a migration
>  		 * the guest CR3 might be restored prior to setting the nested
> @@ -1834,7 +1833,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
>  		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
>  			return false;
>  
> -	if (!nested_svm_vmrun_msrpm(svm)) {
> +	if (!nested_svm_merge_msrpm(vcpu)) {
>  		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  		vcpu->run->internal.suberror =
>  			KVM_INTERNAL_ERROR_EMULATION;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b55a60e79a73..2085259644b6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3134,7 +3134,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		 *
>  		 * For nested:
>  		 * The handling of the MSR bitmap for L2 guests is done in
> -		 * nested_svm_vmrun_msrpm.
> +		 * nested_svm_merge_msrpm().
>  		 * We update the L1 MSR bit as well since it will end up
>  		 * touching the MSR anyway now.
>  		 */

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
  2025-05-29 23:39 ` [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
@ 2025-06-03  2:53   ` Mi, Dapeng
  0 siblings, 0 replies; 60+ messages in thread
From: Mi, Dapeng @ 2025-06-03  2:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao


On 5/30/2025 7:39 AM, Sean Christopherson wrote:
> Manipulate the MSR bitmaps using non-atomic bit ops APIs (two underscores),
> as the bitmaps are per-vCPU and are only ever accessed while vcpu->mutex is
> held.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 12 ++++++------
>  arch/x86/kvm/vmx/vmx.c |  8 ++++----
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d5d11cb0c987..b55a60e79a73 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -789,14 +789,14 @@ static void set_shadow_msr_intercept(struct kvm_vcpu *vcpu, u32 msr, int read,
>  
>  	/* Set the shadow bitmaps to the desired intercept states */
>  	if (read)
> -		set_bit(slot, svm->shadow_msr_intercept.read);
> +		__set_bit(slot, svm->shadow_msr_intercept.read);
>  	else
> -		clear_bit(slot, svm->shadow_msr_intercept.read);
> +		__clear_bit(slot, svm->shadow_msr_intercept.read);
>  
>  	if (write)
> -		set_bit(slot, svm->shadow_msr_intercept.write);
> +		__set_bit(slot, svm->shadow_msr_intercept.write);
>  	else
> -		clear_bit(slot, svm->shadow_msr_intercept.write);
> +		__clear_bit(slot, svm->shadow_msr_intercept.write);
>  }
>  
>  static bool valid_msr_intercept(u32 index)
> @@ -862,8 +862,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
>  	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
>  		return;
>  
> -	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> -	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> +	read  ? __clear_bit(bit_read,  &tmp) : __set_bit(bit_read,  &tmp);
> +	write ? __clear_bit(bit_write, &tmp) : __set_bit(bit_write, &tmp);
>  
>  	msrpm[offset] = tmp;
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9ff00ae9f05a..8f7fe04a1998 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4029,9 +4029,9 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  	idx = vmx_get_passthrough_msr_slot(msr);
>  	if (idx >= 0) {
>  		if (type & MSR_TYPE_R)
> -			clear_bit(idx, vmx->shadow_msr_intercept.read);
> +			__clear_bit(idx, vmx->shadow_msr_intercept.read);
>  		if (type & MSR_TYPE_W)
> -			clear_bit(idx, vmx->shadow_msr_intercept.write);
> +			__clear_bit(idx, vmx->shadow_msr_intercept.write);
>  	}
>  
>  	if ((type & MSR_TYPE_R) &&
> @@ -4071,9 +4071,9 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  	idx = vmx_get_passthrough_msr_slot(msr);
>  	if (idx >= 0) {
>  		if (type & MSR_TYPE_R)
> -			set_bit(idx, vmx->shadow_msr_intercept.read);
> +			__set_bit(idx, vmx->shadow_msr_intercept.read);
>  		if (type & MSR_TYPE_W)
> -			set_bit(idx, vmx->shadow_msr_intercept.write);
> +			__set_bit(idx, vmx->shadow_msr_intercept.write);
>  	}
>  
>  	if (type & MSR_TYPE_R)

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
  2025-05-30 23:38   ` Xin Li
@ 2025-06-03  3:52   ` Mi, Dapeng
  2025-06-05  6:59   ` Chao Gao
  2 siblings, 0 replies; 60+ messages in thread
From: Mi, Dapeng @ 2025-06-03  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao


On 5/30/2025 7:40 AM, Sean Christopherson wrote:
> On a userspace MSR filter change, recalculate all MSR intercepts using the
> filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
> desired intercepts.  The shadow bitmaps add yet another point of failure,
> are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
> and a maintenance burden.
>
> Given that KVM *must* be able to recalculate the correct intercepts at any
> given time, and that MSR filter updates are not hot paths, there is zero
> benefit to maintaining the shadow bitmaps.
>
> Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
> Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Xin Li <xin@zytor.com>
> Cc: Chao Gao <chao.gao@intel.com>
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 184 +++++++++++------------------------------
>  arch/x86/kvm/vmx/vmx.h |   7 --
>  2 files changed, 47 insertions(+), 144 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8f7fe04a1998..6ffa2b2b85ce 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -166,31 +166,6 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
>  	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
>  	RTIT_STATUS_BYTECNT))
>  
> -/*
> - * List of MSRs that can be directly passed to the guest.
> - * In addition to these x2apic, PT and LBR MSRs are handled specially.
> - */
> -static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
> -	MSR_IA32_SPEC_CTRL,
> -	MSR_IA32_PRED_CMD,
> -	MSR_IA32_FLUSH_CMD,
> -	MSR_IA32_TSC,
> -#ifdef CONFIG_X86_64
> -	MSR_FS_BASE,
> -	MSR_GS_BASE,
> -	MSR_KERNEL_GS_BASE,
> -	MSR_IA32_XFD,
> -	MSR_IA32_XFD_ERR,
> -#endif
> -	MSR_IA32_SYSENTER_CS,
> -	MSR_IA32_SYSENTER_ESP,
> -	MSR_IA32_SYSENTER_EIP,
> -	MSR_CORE_C1_RES,
> -	MSR_CORE_C3_RESIDENCY,
> -	MSR_CORE_C6_RESIDENCY,
> -	MSR_CORE_C7_RESIDENCY,
> -};
> -
>  /*
>   * These 2 parameters are used to config the controls for Pause-Loop Exiting:
>   * ple_gap:    upper bound on the amount of time between two successive
> @@ -672,40 +647,6 @@ static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu)
>  	return flexpriority_enabled && lapic_in_kernel(vcpu);
>  }
>  
> -static int vmx_get_passthrough_msr_slot(u32 msr)
> -{
> -	int i;
> -
> -	switch (msr) {
> -	case 0x800 ... 0x8ff:
> -		/* x2APIC MSRs. These are handled in vmx_update_msr_bitmap_x2apic() */
> -		return -ENOENT;
> -	case MSR_IA32_RTIT_STATUS:
> -	case MSR_IA32_RTIT_OUTPUT_BASE:
> -	case MSR_IA32_RTIT_OUTPUT_MASK:
> -	case MSR_IA32_RTIT_CR3_MATCH:
> -	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
> -		/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
> -	case MSR_LBR_SELECT:
> -	case MSR_LBR_TOS:
> -	case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
> -	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
> -	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
> -	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
> -	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
> -		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
> -		return -ENOENT;
> -	}
> -
> -	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
> -		if (vmx_possible_passthrough_msrs[i] == msr)
> -			return i;
> -	}
> -
> -	WARN(1, "Invalid MSR %x, please adapt vmx_possible_passthrough_msrs[]", msr);
> -	return -ENOENT;
> -}
> -
>  struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
>  {
>  	int i;
> @@ -4015,25 +3956,12 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> -	int idx;
>  
>  	if (!cpu_has_vmx_msr_bitmap())
>  		return;
>  
>  	vmx_msr_bitmap_l01_changed(vmx);
>  
> -	/*
> -	 * Mark the desired intercept state in shadow bitmap, this is needed
> -	 * for resync when the MSR filters change.
> -	 */
> -	idx = vmx_get_passthrough_msr_slot(msr);
> -	if (idx >= 0) {
> -		if (type & MSR_TYPE_R)
> -			__clear_bit(idx, vmx->shadow_msr_intercept.read);
> -		if (type & MSR_TYPE_W)
> -			__clear_bit(idx, vmx->shadow_msr_intercept.write);
> -	}
> -

The patch looks good to me. Just one minor suggested refinement to
vmx_disable_intercept_for_msr():

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9a83d5b174c8..e898ff296fd5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4041,23 +4041,19 @@ void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
                        clear_bit(idx, vmx->shadow_msr_intercept.write);
        }

-       if ((type & MSR_TYPE_R) &&
-           !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
-               vmx_set_msr_bitmap_read(msr_bitmap, msr);
-               type &= ~MSR_TYPE_R;
+       if (type & MSR_TYPE_R) {
+               if (!kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ))
+                       vmx_set_msr_bitmap_read(msr_bitmap, msr);
+               else
+                       vmx_clear_msr_bitmap_read(msr_bitmap, msr);
        }

-       if ((type & MSR_TYPE_W) &&
-           !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) {
-               vmx_set_msr_bitmap_write(msr_bitmap, msr);
-               type &= ~MSR_TYPE_W;
+       if (type & MSR_TYPE_W) {
+               if (!kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+                       vmx_set_msr_bitmap_write(msr_bitmap, msr);
+               else
+                       vmx_clear_msr_bitmap_write(msr_bitmap, msr);
        }
-
-       if (type & MSR_TYPE_R)
-               vmx_clear_msr_bitmap_read(msr_bitmap, msr);
-
-       if (type & MSR_TYPE_W)
-               vmx_clear_msr_bitmap_write(msr_bitmap, msr);
 }

It looks simpler and is easier to understand.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>



>  	if ((type & MSR_TYPE_R) &&
>  	    !kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) {
>  		vmx_set_msr_bitmap_read(msr_bitmap, msr);
> @@ -4057,25 +3985,12 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> -	int idx;
>  
>  	if (!cpu_has_vmx_msr_bitmap())
>  		return;
>  
>  	vmx_msr_bitmap_l01_changed(vmx);
>  
> -	/*
> -	 * Mark the desired intercept state in shadow bitmap, this is needed
> -	 * for resync when the MSR filter changes.
> -	 */
> -	idx = vmx_get_passthrough_msr_slot(msr);
> -	if (idx >= 0) {
> -		if (type & MSR_TYPE_R)
> -			__set_bit(idx, vmx->shadow_msr_intercept.read);
> -		if (type & MSR_TYPE_W)
> -			__set_bit(idx, vmx->shadow_msr_intercept.write);
> -	}
> -
>  	if (type & MSR_TYPE_R)
>  		vmx_set_msr_bitmap_read(msr_bitmap, msr);
>  
> @@ -4159,35 +4074,59 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> -void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> +static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>  {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	u32 i;
> -
>  	if (!cpu_has_vmx_msr_bitmap())
>  		return;
>  
> -	/*
> -	 * Redo intercept permissions for MSRs that KVM is passing through to
> -	 * the guest.  Disabling interception will check the new MSR filter and
> -	 * ensure that KVM enables interception if usersepace wants to filter
> -	 * the MSR.  MSRs that KVM is already intercepting don't need to be
> -	 * refreshed since KVM is going to intercept them regardless of what
> -	 * userspace wants.
> -	 */
> -	for (i = 0; i < ARRAY_SIZE(vmx_possible_passthrough_msrs); i++) {
> -		u32 msr = vmx_possible_passthrough_msrs[i];
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.read))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_R);
> -
> -		if (!test_bit(i, vmx->shadow_msr_intercept.write))
> -			vmx_disable_intercept_for_msr(vcpu, msr, MSR_TYPE_W);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> +#ifdef CONFIG_X86_64
> +	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> +#endif
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_cstate_in_guest(vcpu->kvm)) {
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>  	}
>  
>  	/* PT MSRs can be passed through iff PT is exposed to the guest. */
>  	if (vmx_pt_mode_is_host_guest())
>  		pt_update_intercept_for_msr(vcpu);
> +
> +	if (vcpu->arch.xfd_no_write_intercept)
> +		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
> +
> +
> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
> +				  !to_vmx(vcpu)->spec_ctrl);
> +
> +	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> +
> +	if (boot_cpu_has(X86_FEATURE_IBPB))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> +					  !guest_has_pred_cmd_msr(vcpu));
> +
> +	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +
> +	/*
> +	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
> +	 * filtered by userspace.
> +	 */
> +}
> +
> +void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
> +{
> +	vmx_recalc_msr_intercepts(vcpu);
>  }
>  
>  static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
> @@ -7537,26 +7476,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>  		evmcs->hv_enlightenments_control.msr_bitmap = 1;
>  	}
>  
> -	/* The MSR bitmap starts with all ones */
> -	bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -	bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> -#ifdef CONFIG_X86_64
> -	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> -#endif
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> -	if (kvm_cstate_in_guest(vcpu->kvm)) {
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
> -	}
> -
>  	vmx->loaded_vmcs = &vmx->vmcs01;
>  
>  	if (cpu_need_virtualize_apic_accesses(vcpu)) {
> @@ -7842,18 +7761,6 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> -					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
> -
> -	if (boot_cpu_has(X86_FEATURE_IBPB))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> -					  !guest_has_pred_cmd_msr(vcpu));
> -
> -	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> -		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> -					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> -
>  	set_cr4_guest_host_mask(vmx);
>  
>  	vmx_write_encls_bitmap(vcpu, NULL);
> @@ -7869,6 +7776,9 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		vmx->msr_ia32_feature_control_valid_bits &=
>  			~FEAT_CTL_SGX_LC_ENABLED;
>  
> +	/* Recalc MSR interception to account for feature changes. */
> +	vmx_recalc_msr_intercepts(vcpu);
> +
>  	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>  	vmx_update_exception_bitmap(vcpu);
>  }
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 0afe97e3478f..a26fe3d9e1d2 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -294,13 +294,6 @@ struct vcpu_vmx {
>  	struct pt_desc pt_desc;
>  	struct lbr_desc lbr_desc;
>  
> -	/* Save desired MSR intercept (read: pass-through) state */
> -#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
> -	struct {
> -		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> -	} shadow_msr_intercept;
> -
>  	/* ve_info must be page aligned. */
>  	struct vmx_ve_information *ve_info;
>  };

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change
  2025-05-29 23:40 ` [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
@ 2025-06-03  5:47   ` Mi, Dapeng
  2025-06-04  4:43     ` Manali Shukla
  0 siblings, 1 reply; 60+ messages in thread
From: Mi, Dapeng @ 2025-06-03  5:47 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao


On 5/30/2025 7:40 AM, Sean Christopherson wrote:
> Re-read MSR_{FS,GS}_BASE after restoring the "allow everything" userspace
> MSR filter to verify that KVM stops forwarding exits to userspace.  This
> can also be used in conjunction with manual verification (e.g. printk) to
> ensure KVM is correctly updating the MSR bitmaps consumed by hardware.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
> index 32b2794b78fe..8463a9956410 100644
> --- a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
> +++ b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
> @@ -343,6 +343,12 @@ static void guest_code_permission_bitmap(void)
>  	data = test_rdmsr(MSR_GS_BASE);
>  	GUEST_ASSERT(data == MSR_GS_BASE);
>  
> +	/* Access the MSRs again to ensure KVM has disabled interception.*/
> +	data = test_rdmsr(MSR_FS_BASE);
> +	GUEST_ASSERT(data != MSR_FS_BASE);
> +	data = test_rdmsr(MSR_GS_BASE);
> +	GUEST_ASSERT(data != MSR_GS_BASE);
> +
>  	GUEST_DONE();
>  }
>  
> @@ -682,6 +688,8 @@ KVM_ONE_VCPU_TEST(user_msr, msr_permission_bitmap, guest_code_permission_bitmap)
>  		    "Expected ucall state to be UCALL_SYNC.");
>  	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_gs);
>  	run_guest_then_process_rdmsr(vcpu, MSR_GS_BASE);
> +
> +	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_allow);
>  	run_guest_then_process_ucall_done(vcpu);
>  }
>  

Test passes on Intel platform (Sapphire Rapids).

Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
@ 2025-06-03  7:17   ` Chao Gao
  2025-06-03 15:28     ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Chao Gao @ 2025-06-03  7:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:46PM -0700, Sean Christopherson wrote:
>WARN and reject module loading if there is a problem with KVM's MSR
>interception bitmaps.  Panicking the host in this situation is inexcusable
>since it is trivially easy to propagate the error up the stack.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> arch/x86/kvm/svm/svm.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
>diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>index 0ad1a6d4fb6d..bd75ff8e4f20 100644
>--- a/arch/x86/kvm/svm/svm.c
>+++ b/arch/x86/kvm/svm/svm.c
>@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
> 	}
> }
> 
>-static void add_msr_offset(u32 offset)
>+static int add_msr_offset(u32 offset)
> {
> 	int i;
> 
>@@ -953,7 +953,7 @@ static void add_msr_offset(u32 offset)
> 
> 		/* Offset already in list? */
> 		if (msrpm_offsets[i] == offset)
>-			return;
>+			return 0;
> 
> 		/* Slot used by another offset? */
> 		if (msrpm_offsets[i] != MSR_INVALID)
>@@ -962,17 +962,13 @@ static void add_msr_offset(u32 offset)
> 		/* Add offset to list */
> 		msrpm_offsets[i] = offset;
> 
>-		return;
>+		return 0;
> 	}
> 
>-	/*
>-	 * If this BUG triggers the msrpm_offsets table has an overflow. Just
>-	 * increase MSRPM_OFFSETS in this case.
>-	 */
>-	BUG();
>+	return -EIO;

Would -ENOSPC be more appropriate here? And, instead of returning an integer,
using a boolean might be better since the error code isn't propagated upwards.

> }
> 
>-static void init_msrpm_offsets(void)
>+static int init_msrpm_offsets(void)
> {
> 	int i;
> 
>@@ -982,10 +978,13 @@ static void init_msrpm_offsets(void)
> 		u32 offset;
> 
> 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
>-		BUG_ON(offset == MSR_INVALID);
>+		if (WARN_ON(offset == MSR_INVALID))
>+			return -EIO;
> 
>-		add_msr_offset(offset);
>+		if (WARN_ON_ONCE(add_msr_offset(offset)))
>+			return -EIO;
> 	}
>+	return 0;
> }
> 
> void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
>@@ -5511,7 +5510,11 @@ static __init int svm_hardware_setup(void)
> 	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
> 	iopm_base = __sme_page_pa(iopm_pages);
> 
>-	init_msrpm_offsets();
>+	r = init_msrpm_offsets();
>+	if (r) {
>+		__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));

__free_pages(iopm_pages, order);

And we can move init_msrpm_offsets() above the allocation of iopm_pages to
avoid the need for rewinding. But I don't have a strong opinion on this, as
it goes beyond a simple change to the return type.

>+		return r;
>+	}
> 
> 	kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> 				     XFEATURE_MASK_BNDCSR);
>-- 
>2.49.0.1204.g71687c7c1d-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init
  2025-05-29 23:39 ` [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
@ 2025-06-03  7:18   ` Chao Gao
  0 siblings, 0 replies; 60+ messages in thread
From: Chao Gao @ 2025-06-03  7:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:47PM -0700, Sean Christopherson wrote:
>Tag init_msrpm_offsets() and add_msr_offset() with __init, as they're used
>only during hardware setup to map potential passthrough MSRs to offsets in
>the bitmap.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
  2025-05-29 23:39 ` [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
@ 2025-06-03  7:57   ` Chao Gao
  0 siblings, 0 replies; 60+ messages in thread
From: Chao Gao @ 2025-06-03  7:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:48PM -0700, Sean Christopherson wrote:
>Drop the unnecessary and dangerous value-terminated behavior of
>direct_access_msrs, and simply iterate over the actual size of the array.
>The use in svm_set_x2apic_msr_interception() is especially sketchy, as it
>relies on unused capacity being zero-initialized, and '0' being outside
>the range of x2APIC MSRs.
>
>To ensure the array and shadow_msr_intercept stay synchronized, simply
>assert that their sizes are identical (note the six 64-bit-only MSRs).
>
>Note, direct_access_msrs will soon be removed entirely; keeping the assert
>synchronized with the array isn't expected to be a long-term maintenance
>burden.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  2025-05-29 23:39 ` [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
@ 2025-06-03  8:06   ` Chao Gao
  2025-06-03 13:26     ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Chao Gao @ 2025-06-03  8:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:49PM -0700, Sean Christopherson wrote:
>WARN and kill the VM instead of panicking the host if KVM attempts to set
>or query MSR interception for an unsupported MSR.  Accessing the MSR
>interception bitmaps only meaningfully affects post-VMRUN behavior, and
>KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN,
>i.e. there is no need to panic the entire host.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> arch/x86/kvm/svm/svm.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>index 36a99b87a47f..d5d11cb0c987 100644
>--- a/arch/x86/kvm/svm/svm.c
>+++ b/arch/x86/kvm/svm/svm.c
>@@ -827,7 +827,8 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> 	bit_write = 2 * (msr & 0x0f) + 1;
> 	tmp       = msrpm[offset];

Not an issue with this patch, but shouldn't the offset be checked against
MSR_INVALID before being used to index msrpm[]?

> 
>-	BUG_ON(offset == MSR_INVALID);
>+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
>+		return false;
> 
> 	return test_bit(bit_write, &tmp);
> }
>@@ -858,7 +859,8 @@ static void set_msr_interception_bitmap(struct kvm_vcpu *vcpu, u32 *msrpm,
> 	bit_write = 2 * (msr & 0x0f) + 1;
> 	tmp       = msrpm[offset];
> 
>-	BUG_ON(offset == MSR_INVALID);
>+	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
>+		return;
> 
> 	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> 	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
>-- 
>2.49.0.1204.g71687c7c1d-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
  2025-06-03  8:06   ` Chao Gao
@ 2025-06-03 13:26     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-03 13:26 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Tue, Jun 03, 2025, Chao Gao wrote:
> On Thu, May 29, 2025 at 04:39:49PM -0700, Sean Christopherson wrote:
> >WARN and kill the VM instead of panicking the host if KVM attempts to set
> >or query MSR interception for an unsupported MSR.  Accessing the MSR
> >interception bitmaps only meaningfully affects post-VMRUN behavior, and
> >KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN,
> >i.e. there is no need to panic the entire host.
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
> >---
> > arch/x86/kvm/svm/svm.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >index 36a99b87a47f..d5d11cb0c987 100644
> >--- a/arch/x86/kvm/svm/svm.c
> >+++ b/arch/x86/kvm/svm/svm.c
> >@@ -827,7 +827,8 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > 	bit_write = 2 * (msr & 0x0f) + 1;
> > 	tmp       = msrpm[offset];
> 
> not an issue with this patch. but shouldn't the offset be checked against
> MSR_INVALID before being used to index msrpm[]?

Oof, yes.  To some extent, it _is_ a problem with this patch, because using
KVM_BUG_ON() makes the OOB access less fatal.  Though it's just a load, and code
that should be unreachable, it's still worth cleaning up.

Anyways, I'll place the KVM_BUG_ON()s in the right location as part of this patch.
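
For illustration, the msr_write_intercepted() side could end up looking
something like this (a rough sketch of the reordering, not the exact respin):

	offset    = svm_msrpm_offset(msr);

	/* Sketch only: validate the offset before it's used to index msrpm[]. */
	if (KVM_BUG_ON(offset == MSR_INVALID, vcpu->kvm))
		return false;

	bit_write = 2 * (msr & 0x0f) + 1;
	tmp       = msrpm[offset];

	return test_bit(bit_write, &tmp);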

Thanks!

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
  2025-06-03  7:17   ` Chao Gao
@ 2025-06-03 15:28     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-03 15:28 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Tue, Jun 03, 2025, Chao Gao wrote:
> On Thu, May 29, 2025 at 04:39:46PM -0700, Sean Christopherson wrote:
> >WARN and reject module loading if there is a problem with KVM's MSR
> >interception bitmaps.  Panicking the host in this situation is inexcusable
> >since it is trivially easy to propagate the error up the stack.
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
> >---
> > arch/x86/kvm/svm/svm.c | 27 +++++++++++++++------------
> > 1 file changed, 15 insertions(+), 12 deletions(-)
> >
> >diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >index 0ad1a6d4fb6d..bd75ff8e4f20 100644
> >--- a/arch/x86/kvm/svm/svm.c
> >+++ b/arch/x86/kvm/svm/svm.c
> >@@ -945,7 +945,7 @@ static void svm_msr_filter_changed(struct kvm_vcpu *vcpu)
> > 	}
> > }
> > 
> >-static void add_msr_offset(u32 offset)
> >+static int add_msr_offset(u32 offset)
> > {
> > 	int i;
> > 
> >@@ -953,7 +953,7 @@ static void add_msr_offset(u32 offset)
> > 
> > 		/* Offset already in list? */
> > 		if (msrpm_offsets[i] == offset)
> >-			return;
> >+			return 0;
> > 
> > 		/* Slot used by another offset? */
> > 		if (msrpm_offsets[i] != MSR_INVALID)
> >@@ -962,17 +962,13 @@ static void add_msr_offset(u32 offset)
> > 		/* Add offset to list */
> > 		msrpm_offsets[i] = offset;
> > 
> >-		return;
> >+		return 0;
> > 	}
> > 
> >-	/*
> >-	 * If this BUG triggers the msrpm_offsets table has an overflow. Just
> >-	 * increase MSRPM_OFFSETS in this case.
> >-	 */
> >-	BUG();
> >+	return -EIO;
> 
> Would -ENOSPC be more appropriate here?

Hmm, yeah.  IIRC, I initially had -ENOSPC, but switched to -EIO to be consistent
with how KVM typically reports its internal issues during module load.  But as
you point out, the error code isn't propagated up the stack.

> And, instead of returning an integer, using a boolean might be better since
> the error code isn't propagated upwards.

I strongly prefer to return 0/-errno in any function that isn't a clear cut
predicate, e.g. "return -EIO" is very obviously an error path (and -ENOSPC is
even better), whereas understanding "return false" requires reading the rest of
the function (and IMO this code isn't all that intuitive).
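
Purely as a throwaway illustration (made-up names, nothing from the series),
the 0/-errno style keeps the failure path obvious in the callee:

static u32 table[16];
static int nr_entries;

/* 0/-errno: "return -ENOSPC" is self-evidently the error path... */
static int table_add(u32 val)
{
	if (nr_entries >= ARRAY_SIZE(table))
		return -ENOSPC;

	table[nr_entries++] = val;
	return 0;
}

/*
 * ...whereas a bool variant returning "false" at the same spot forces the
 * reader back into the callee to learn what true/false actually mean.
 */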

> > }
> > 
> >-static void init_msrpm_offsets(void)
> >+static int init_msrpm_offsets(void)
> > {
> > 	int i;
> > 
> >@@ -982,10 +978,13 @@ static void init_msrpm_offsets(void)
> > 		u32 offset;
> > 
> > 		offset = svm_msrpm_offset(direct_access_msrs[i].index);
> >-		BUG_ON(offset == MSR_INVALID);
> >+		if (WARN_ON(offset == MSR_INVALID))
> >+			return -EIO;
> > 
> >-		add_msr_offset(offset);
> >+		if (WARN_ON_ONCE(add_msr_offset(offset)))
> >+			return -EIO;
> > 	}
> >+	return 0;
> > }
> > 
> > void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
> >@@ -5511,7 +5510,11 @@ static __init int svm_hardware_setup(void)
> > 	memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
> > 	iopm_base = __sme_page_pa(iopm_pages);
> > 
> >-	init_msrpm_offsets();
> >+	r = init_msrpm_offsets();
> >+	if (r) {
> >+		__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
> 
> __free_pages(iopm_pages, order);

Oh, yeah, good call.

> And we can move init_msrpm_offsets() above the allocation of iopm_pages to
> avoid the need for rewinding. But I don't have a strong opinion on this, as
> it goes beyond a simple change to the return type.

I considered that too.  I decided not to bother since this code goes away by the
end of the series.  I thought about skipping this patch entirely, but I kept it
since it should be easy to backport, e.g. if someone wants to make their tree more
developer friendly, and on the off chance the patches later in the series need to
be dropped or reverted.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change
  2025-06-03  5:47   ` Mi, Dapeng
@ 2025-06-04  4:43     ` Manali Shukla
  0 siblings, 0 replies; 60+ messages in thread
From: Manali Shukla @ 2025-06-04  4:43 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Mi, Dapeng

On 6/3/2025 11:17 AM, Mi, Dapeng wrote:
> 
> On 5/30/2025 7:40 AM, Sean Christopherson wrote:
>> Re-read MSR_{FS,GS}_BASE after restoring the "allow everything" userspace
>> MSR filter to verify that KVM stops forwarding exits to userspace.  This
>> can also be used in conjunction with manual verification (e.g. printk) to
>> ensure KVM is correctly updating the MSR bitmaps consumed by hardware.
>>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>  tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
>> index 32b2794b78fe..8463a9956410 100644
>> --- a/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
>> +++ b/tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c
>> @@ -343,6 +343,12 @@ static void guest_code_permission_bitmap(void)
>>  	data = test_rdmsr(MSR_GS_BASE);
>>  	GUEST_ASSERT(data == MSR_GS_BASE);
>>  
>> +	/* Access the MSRs again to ensure KVM has disabled interception.*/
>> +	data = test_rdmsr(MSR_FS_BASE);
>> +	GUEST_ASSERT(data != MSR_FS_BASE);
>> +	data = test_rdmsr(MSR_GS_BASE);
>> +	GUEST_ASSERT(data != MSR_GS_BASE);
>> +
>>  	GUEST_DONE();
>>  }
>>  
>> @@ -682,6 +688,8 @@ KVM_ONE_VCPU_TEST(user_msr, msr_permission_bitmap, guest_code_permission_bitmap)
>>  		    "Expected ucall state to be UCALL_SYNC.");
>>  	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_gs);
>>  	run_guest_then_process_rdmsr(vcpu, MSR_GS_BASE);
>> +
>> +	vm_ioctl(vm, KVM_X86_SET_MSR_FILTER, &filter_allow);
>>  	run_guest_then_process_ucall_done(vcpu);
>>  }
>>  
> 
> Test passes on Intel platform (Sapphire Rapids).
> 
> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> 
> 

This test passes on an AMD platform (Genoa).

Tested-by: Manali Shukla <Manali.Shukla@amd.com>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
@ 2025-06-04  5:43   ` Chao Gao
  2025-06-04 13:56     ` Sean Christopherson
  2025-06-04 15:35   ` Paolo Bonzini
  2025-06-04 15:35   ` Paolo Bonzini
  2 siblings, 1 reply; 60+ messages in thread
From: Chao Gao @ 2025-06-04  5:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:53PM -0700, Sean Christopherson wrote:
>Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to
>merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap.  This will eventually
>allow for the removal of direct_access_msrs, as the only path where
>tracking the offsets is truly justified is the merge for nested SVM, where
>merging in chunks is an easy way to batch uaccess reads/writes.
>
>Opportunistically omit the x2APIC MSRs from the merge-specific array
>instead of filtering them out at runtime.
>
>Note, disabling interception of XSS, EFER, PAT, GHCB, and TSC_AUX is
>mutually exclusive with nested virtualization, as KVM passes through the
>MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization
>for SEV+ guests.  Defer removing those MSRs to a future cleanup in order
>to make this refactoring as benign as possible.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> arch/x86/kvm/svm/nested.c | 72 +++++++++++++++++++++++++++++++++------
> arch/x86/kvm/svm/svm.c    |  4 +++
> arch/x86/kvm/svm/svm.h    |  2 ++
> 3 files changed, 67 insertions(+), 11 deletions(-)
>
>diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>index 89a77f0f1cc8..e53020939e60 100644
>--- a/arch/x86/kvm/svm/nested.c
>+++ b/arch/x86/kvm/svm/nested.c
>@@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
> 	}
> }
> 
>+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;

I understand how the array size (i.e., 9) was determined :). But adding a
comment explaining this would be quite helpful.

>+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
>+
>+int __init nested_svm_init_msrpm_merge_offsets(void)
>+{
>+	const u32 merge_msrs[] = {
>+		MSR_STAR,
>+		MSR_IA32_SYSENTER_CS,
>+		MSR_IA32_SYSENTER_EIP,
>+		MSR_IA32_SYSENTER_ESP,
>+	#ifdef CONFIG_X86_64
>+		MSR_GS_BASE,
>+		MSR_FS_BASE,
>+		MSR_KERNEL_GS_BASE,
>+		MSR_LSTAR,
>+		MSR_CSTAR,
>+		MSR_SYSCALL_MASK,
>+	#endif
>+		MSR_IA32_SPEC_CTRL,
>+		MSR_IA32_PRED_CMD,
>+		MSR_IA32_FLUSH_CMD,

MSR_IA32_DEBUGCTLMSR is missing, but it's benign since it shares the same
offset as MSR_IA32_LAST* below.

I'm a bit concerned that we might overlook adding new MSRs to this array in the
future, which could lead to tricky bugs. But I have no idea how to avoid this.
Removing this array and iterating over direct_access_msrs[] directly is an
option, but that contradicts this series, as one of its purposes is to remove
direct_access_msrs[].

>+		MSR_IA32_LASTBRANCHFROMIP,
>+		MSR_IA32_LASTBRANCHTOIP,
>+		MSR_IA32_LASTINTFROMIP,
>+		MSR_IA32_LASTINTTOIP,
>+
>+		MSR_IA32_XSS,
>+		MSR_EFER,
>+		MSR_IA32_CR_PAT,
>+		MSR_AMD64_SEV_ES_GHCB,
>+		MSR_TSC_AUX,
>+	};


> 
> 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
>diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>index 1c70293400bc..84dd1f220986 100644
>--- a/arch/x86/kvm/svm/svm.c
>+++ b/arch/x86/kvm/svm/svm.c
>@@ -5689,6 +5689,10 @@ static int __init svm_init(void)
> 	if (!kvm_is_svm_supported())
> 		return -EOPNOTSUPP;
> 
>+	r = nested_svm_init_msrpm_merge_offsets();
>+	if (r)
>+		return r;
>+

If the offset array is used for nested virtualization only, how about guarding
this with nested virtualization? For example, in svm_hardware_setup():

	if (nested) {
		r = nested_svm_init_msrpm_merge_offsets();
		if (r)
			goto err;

		pr_info("Nested Virtualization enabled\n");
		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
	}


> 	r = kvm_x86_vendor_init(&svm_init_ops);
> 	if (r)
> 		return r;
>diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>index 909b9af6b3c1..0a8041d70994 100644
>--- a/arch/x86/kvm/svm/svm.h
>+++ b/arch/x86/kvm/svm/svm.h
>@@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
> 	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
> }
> 
>+int __init nested_svm_init_msrpm_merge_offsets(void);
>+
> int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
> 			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
> void svm_leave_nested(struct kvm_vcpu *vcpu);
>-- 
>2.49.0.1204.g71687c7c1d-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-06-04  5:43   ` Chao Gao
@ 2025-06-04 13:56     ` Sean Christopherson
  2025-06-05  8:08       ` Chao Gao
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-06-04 13:56 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Wed, Jun 04, 2025, Chao Gao wrote:
> >diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> >index 89a77f0f1cc8..e53020939e60 100644
> >--- a/arch/x86/kvm/svm/nested.c
> >+++ b/arch/x86/kvm/svm/nested.c
> >@@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
> > 	}
> > }
> > 
> >+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
> 
> I understand how the array size (i.e., 9) was determined :). But adding a
> comment explaining this would be quite helpful.

Yeah, I'll write a comment explaining what all is going on.

> >+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
> >+
> >+int __init nested_svm_init_msrpm_merge_offsets(void)
> >+{
> >+	const u32 merge_msrs[] = {
> >+		MSR_STAR,
> >+		MSR_IA32_SYSENTER_CS,
> >+		MSR_IA32_SYSENTER_EIP,
> >+		MSR_IA32_SYSENTER_ESP,
> >+	#ifdef CONFIG_X86_64
> >+		MSR_GS_BASE,
> >+		MSR_FS_BASE,
> >+		MSR_KERNEL_GS_BASE,
> >+		MSR_LSTAR,
> >+		MSR_CSTAR,
> >+		MSR_SYSCALL_MASK,
> >+	#endif
> >+		MSR_IA32_SPEC_CTRL,
> >+		MSR_IA32_PRED_CMD,
> >+		MSR_IA32_FLUSH_CMD,
> 
> MSR_IA32_DEBUGCTLMSR is missing, but it's benign since it shares the same
> offset as MSR_IA32_LAST* below.

Gah.  Once all is said and done, it's not supposed to be in this list because it's
only passed through for SEV-ES guests, but my intent was to keep it in this patch,
and then remove it along with XSS, EFER, PAT, GHCB, and TSC_AUX in the next.

This made me realize that merging in chunks has a novel flaw: if there is an MSR
that KVM *doesn't* want to give L2 access to, then KVM needs to ensure its offset
isn't processed, i.e. that there isn't a "collision" with another MSR.  I don't
think it's a major concern, because the x2APIC MSRs are nicely isolated, and off
the top of my head I can't think of any MSRs that fall into that bucket.  But it's
something worth calling out in a comment, at least.
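
To put a number on the "collision" angle (my arithmetic, not something
called out in the patch):

/*
 * Each MSR consumes 2 bits (read + write) in the MSRPM, so a u32 chunk
 * covers 16 MSRs, i.e. MSRs that differ only in their low 4 bits share a
 * chunk.  E.g. MSR_IA32_DEBUGCTLMSR (0x1d9) and MSR_IA32_LASTBRANCHFROMIP
 * (0x1db) both land in the chunk covering 0x1d0 - 0x1df, so merging that
 * chunk for the LASTBRANCH MSRs also processes L1's DEBUGCTL bits.
 */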

> I'm a bit concerned that we might overlook adding new MSRs to this array in the
> future, which could lead to tricky bugs. But I have no idea how to avoid this.

Me either.  One option would be to use wrapper macros for the interception helpers
to fill an array at compile time (similar to how kernel exception fixup works),
but (a) it requires modifications to the linker scripts to generate the arrays,
(b) is quite confusing/complex, and (c) it doesn't actually solve the problem,
it just inverts the problem.  Because as above, there are MSRs we *don't* want
to expose to L2, and so we'd need to add yet more code to filter those out.

And the failure mode for the inverted case would be worse, because if we missed
an MSR, KVM would incorrectly give L2 access to an MSR.  Whereas with the current
approach, a missed MSR simply means L2 gets a slower path; but functionally, it's
fine (and it has to be fine, because KVM can't force L1 to disable interception).

> Removing this array and iterating over direct_access_msrs[] directly is an
> option but it contradicts this series as one of its purposes is to remove
> direct_access_msrs[].

Using direct_access_msrs[] wouldn't solve the problem either, because nothing
ensures _that_ array is up-to-date either.

The best idea I have is to add a test that verifies the MSRs that are supposed
to be passed through actually are passed through.  It's still effectively manual
checking, but it would require us to screw up twice, i.e. forget to update both
the array and the test.  The problem is that there's no easy and foolproof way to
verify that an MSR is passed through in a selftest.

E.g. it would be possible to precisely detect L2 => L0 MSR exits via a BPF program,
but incorporating a BPF program into a KVM selftest just to detect exits isn't
something I'm keen on doing (or maintaining).

Using the "exits" stat isn't foolproof due to NMIs (IRQs can be accounted for via
"irq_exits", and to a lesser extent page faults (especially if shadow paging is
in use).

If KVM provided an "msr_exits" stat, it would be trivial to verify interception
via a selftest, but I can't quite convince myself that MSR exits are interesting
enough to warrant their own stat.
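
Just to sketch what that could look like (hypothetical: the "msr_exits" stat
doesn't exist, and the vm_get_stat()-style accessor is an assumption about
the selftest harness, not a reference to actual API):

/* Hypothetical selftest fragment, assuming a per-VM "msr_exits" stat. */
static void guest_access_passthrough_msrs(void)
{
	/* Expected to be passed through, i.e. to not exit to KVM. */
	wrmsr(MSR_GS_BASE, rdmsr(MSR_GS_BASE));
	GUEST_DONE();
}

static void assert_no_msr_exits(struct kvm_vcpu *vcpu)
{
	uint64_t before = vm_get_stat(vcpu->vm, "msr_exits");

	vcpu_run(vcpu);

	TEST_ASSERT(vm_get_stat(vcpu->vm, "msr_exits") == before,
		    "Guest MSR accesses should not have reached KVM");
}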

> >+		MSR_IA32_LASTBRANCHFROMIP,
> >+		MSR_IA32_LASTBRANCHTOIP,
> >+		MSR_IA32_LASTINTFROMIP,
> >+		MSR_IA32_LASTINTTOIP,
> >+
> >+		MSR_IA32_XSS,
> >+		MSR_EFER,
> >+		MSR_IA32_CR_PAT,
> >+		MSR_AMD64_SEV_ES_GHCB,
> >+		MSR_TSC_AUX,
> >+	};
> 
> 
> > 
> > 		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
> >diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >index 1c70293400bc..84dd1f220986 100644
> >--- a/arch/x86/kvm/svm/svm.c
> >+++ b/arch/x86/kvm/svm/svm.c
> >@@ -5689,6 +5689,10 @@ static int __init svm_init(void)
> > 	if (!kvm_is_svm_supported())
> > 		return -EOPNOTSUPP;
> > 
> >+	r = nested_svm_init_msrpm_merge_offsets();
> >+	if (r)
> >+		return r;
> >+
> 
> If the offset array is used for nested virtualization only, how about guarding
> this with nested virtualization? For example, in svm_hardware_setup():

Good idea, I'll do that in the next version.

> 	if (nested) {
> 		r = nested_svm_init_msrpm_merge_offsets();
> 		if (r)
> 			goto err;
> 
> 		pr_info("Nested Virtualization enabled\n");
> 		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
> 	}
> 
> 
> > 	r = kvm_x86_vendor_init(&svm_init_ops);
> > 	if (r)
> > 		return r;
> >diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> >index 909b9af6b3c1..0a8041d70994 100644
> >--- a/arch/x86/kvm/svm/svm.h
> >+++ b/arch/x86/kvm/svm/svm.h
> >@@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
> > 	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
> > }
> > 
> >+int __init nested_svm_init_msrpm_merge_offsets(void);
> >+
> > int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
> > 			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
> > void svm_leave_nested(struct kvm_vcpu *vcpu);
> >-- 
> >2.49.0.1204.g71687c7c1d-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
  2025-06-04  5:43   ` Chao Gao
@ 2025-06-04 15:35   ` Paolo Bonzini
  2025-06-04 15:49     ` Sean Christopherson
  2025-06-04 15:35   ` Paolo Bonzini
  2 siblings, 1 reply; 60+ messages in thread
From: Paolo Bonzini @ 2025-06-04 15:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On 5/30/25 01:39, Sean Christopherson wrote:
> Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to
> merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap.  This will eventually
> allow for the removal of direct_access_msrs, as the only path where
> tracking the offsets is truly justified is the merge for nested SVM, where
> merging in chunks is an easy way to batch uaccess reads/writes.
> 
> Opportunistically omit the x2APIC MSRs from the merge-specific array
> instead of filtering them out at runtime.
> 
> Note, disabling interception of XSS, EFER, PAT, GHCB, and TSC_AUX is
> mutually exclusive with nested virtualization, as KVM passes through the
> MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization
> for SEV+ guests.  Defer removing those MSRs to a future cleanup in order
> to make this refactoring as benign as possible.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/nested.c | 72 +++++++++++++++++++++++++++++++++------
>   arch/x86/kvm/svm/svm.c    |  4 +++
>   arch/x86/kvm/svm/svm.h    |  2 ++
>   3 files changed, 67 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 89a77f0f1cc8..e53020939e60 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   	}
>   }
>   
> +static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
> +static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
> +
> +int __init nested_svm_init_msrpm_merge_offsets(void)
> +{
> +	const u32 merge_msrs[] = {

"static const", please.

Paolo

> +		MSR_STAR,
> +		MSR_IA32_SYSENTER_CS,
> +		MSR_IA32_SYSENTER_EIP,
> +		MSR_IA32_SYSENTER_ESP,
> +	#ifdef CONFIG_X86_64
> +		MSR_GS_BASE,
> +		MSR_FS_BASE,
> +		MSR_KERNEL_GS_BASE,
> +		MSR_LSTAR,
> +		MSR_CSTAR,
> +		MSR_SYSCALL_MASK,
> +	#endif
> +		MSR_IA32_SPEC_CTRL,
> +		MSR_IA32_PRED_CMD,
> +		MSR_IA32_FLUSH_CMD,
> +		MSR_IA32_LASTBRANCHFROMIP,
> +		MSR_IA32_LASTBRANCHTOIP,
> +		MSR_IA32_LASTINTFROMIP,
> +		MSR_IA32_LASTINTTOIP,
> +
> +		MSR_IA32_XSS,
> +		MSR_EFER,
> +		MSR_IA32_CR_PAT,
> +		MSR_AMD64_SEV_ES_GHCB,
> +		MSR_TSC_AUX,
> +	};
> +	int i, j;
> +
> +	for (i = 0; i < ARRAY_SIZE(merge_msrs); i++) {
> +		u32 offset = svm_msrpm_offset(merge_msrs[i]);
> +
> +		if (WARN_ON(offset == MSR_INVALID))
> +			return -EIO;
> +
> +		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
> +			if (nested_svm_msrpm_merge_offsets[j] == offset)
> +				break;
> +		}
> +
> +		if (j < nested_svm_nr_msrpm_merge_offsets)
> +			continue;
> +
> +		if (WARN_ON(j >= ARRAY_SIZE(nested_svm_msrpm_merge_offsets)))
> +			return -EIO;
> +
> +		nested_svm_msrpm_merge_offsets[j] = offset;
> +		nested_svm_nr_msrpm_merge_offsets++;
> +	}
> +
> +	return 0;
> +}
> +
>   /*
>    * Merge L0's (KVM) and L1's (Nested VMCB) MSR permission bitmaps. The function
>    * is optimized in that it only merges the parts where KVM MSR permission bitmap
> @@ -216,19 +274,11 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
>   	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
>   		return true;
>   
> -	for (i = 0; i < MSRPM_OFFSETS; i++) {
> -		u32 value, p;
> +	for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
> +		const int p = nested_svm_msrpm_merge_offsets[i];
> +		u32 value;
>   		u64 offset;
>   
> -		if (msrpm_offsets[i] == 0xffffffff)
> -			break;
> -
> -		p      = msrpm_offsets[i];
> -
> -		/* x2apic msrs are intercepted always for the nested guest */
> -		if (is_x2apic_msrpm_offset(p))
> -			continue;
> -
>   		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
>   
>   		if (kvm_vcpu_read_guest(vcpu, offset, &value, 4))
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1c70293400bc..84dd1f220986 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5689,6 +5689,10 @@ static int __init svm_init(void)
>   	if (!kvm_is_svm_supported())
>   		return -EOPNOTSUPP;
>   
> +	r = nested_svm_init_msrpm_merge_offsets();
> +	if (r)
> +		return r;
> +
>   	r = kvm_x86_vendor_init(&svm_init_ops);
>   	if (r)
>   		return r;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 909b9af6b3c1..0a8041d70994 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
>   	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
>   }
>   
> +int __init nested_svm_init_msrpm_merge_offsets(void);
> +
>   int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
>   			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
>   void svm_leave_nested(struct kvm_vcpu *vcpu);


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-06-04 15:35   ` Paolo Bonzini
@ 2025-06-04 15:49     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-04 15:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On Wed, Jun 04, 2025, Paolo Bonzini wrote:
> On 5/30/25 01:39, Sean Christopherson wrote:
> > Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to
> > merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap.  This will eventually
> > allow for the removal of direct_access_msrs, as the only path where
> > tracking the offsets is truly justified is the merge for nested SVM, where
> > merging in chunks is an easy way to batch uaccess reads/writes.
> > 
> > Opportunistically omit the x2APIC MSRs from the merge-specific array
> > instead of filtering them out at runtime.
> > 
> > Note, disabling interception of XSS, EFER, PAT, GHCB, and TSC_AUX is
> > mutually exclusive with nested virtualization, as KVM passes through the
> > MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization
> > for SEV+ guests.  Defer removing those MSRs to a future cleanup in order
> > to make this refactoring as benign as possible.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/svm/nested.c | 72 +++++++++++++++++++++++++++++++++------
> >   arch/x86/kvm/svm/svm.c    |  4 +++
> >   arch/x86/kvm/svm/svm.h    |  2 ++
> >   3 files changed, 67 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 89a77f0f1cc8..e53020939e60 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >   	}
> >   }
> > +static int nested_svm_msrpm_merge_offsets[9] __ro_after_init;
> > +static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
> > +
> > +int __init nested_svm_init_msrpm_merge_offsets(void)
> > +{
> > +	const u32 merge_msrs[] = {
> 
> "static const", please.

Ugh, I was thinking the compiler would be magical enough to not generate code to
fill an on-stack array at runtime, but that's not the case.

AFAICT, tagging it __initdata works, so I'll do this to hopefully ensure the
memory is discarded after module load.

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb4808cf4711..af530f45bf64 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -205,7 +205,7 @@ static int svm_msrpm_offset(u32 msr)
 
 int __init nested_svm_init_msrpm_merge_offsets(void)
 {
-       const u32 merge_msrs[] = {
+       static const u32 __initdata merge_msrs[] = {
                MSR_STAR,
                MSR_IA32_SYSENTER_CS,
                MSR_IA32_SYSENTER_EIP,

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
  2025-05-29 23:39 ` [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
@ 2025-06-04 16:11   ` Paolo Bonzini
  2025-06-04 17:35     ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Paolo Bonzini @ 2025-06-04 16:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On 5/30/25 01:39, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 47a36a9a7fe5..e432cd7a7889 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -628,6 +628,50 @@ static_assert(SVM_MSRS_PER_RANGE == 8192);
>   #define SVM_MSRPM_RANGE_1_BASE_MSR	0xc0000000
>   #define SVM_MSRPM_RANGE_2_BASE_MSR	0xc0010000
>   
> +#define SVM_MSRPM_FIRST_MSR(range_nr)	\
> +	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR)
> +#define SVM_MSRPM_LAST_MSR(range_nr)	\
> +	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR + SVM_MSRS_PER_RANGE - 1)
> +
> +#define SVM_MSRPM_BIT_NR(range_nr, msr)						\
> +	(range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +			\
> +	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR)
> +
> +#define SVM_MSRPM_SANITY_CHECK_BITS(range_nr)					\
> +static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
> +	      range_nr * 2048 * 8 + 2);						\
> +static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
> +	      range_nr * 2048 * 8 + 14);
> +
> +SVM_MSRPM_SANITY_CHECK_BITS(0);
> +SVM_MSRPM_SANITY_CHECK_BITS(1);
> +SVM_MSRPM_SANITY_CHECK_BITS(2);

Replying here for patches 11/25/26.  None of this is needed, just write 
a function like this:

static inline u32 svm_msr_bit(u32 msr)
{
	u32 msr_base = msr & ~(SVM_MSRS_PER_RANGE - 1);
	if (msr_base == SVM_MSRPM_RANGE_0_BASE_MSR)
		return SVM_MSRPM_BIT_NR(0, msr);
	if (msr_base == SVM_MSRPM_RANGE_1_BASE_MSR)
		return SVM_MSRPM_BIT_NR(1, msr);
	if (msr_base == SVM_MSRPM_RANGE_2_BASE_MSR)
		return SVM_MSRPM_BIT_NR(2, msr);
	return MSR_INVALID;
}

and you can throw away most of the other macros.  For example:

> +#define SVM_BUILD_MSR_BITMAP_CASE(bitmap, range_nr, msr, bitop, bit_rw)		\
> +	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
> +		return bitop##_bit(SVM_MSRPM_BIT_NR(range_nr, msr) + bit_rw, bitmap);

... becomes a lot more lowercase:

static inline rtype svm_##action##_msr_bitmap_##access(
	unsigned long *bitmap, u32 msr)
{
	u32 bit = svm_msr_bit(msr);
	if (bit == MSR_INVALID)
		return true;
	return bitop##_bit(bit + bit_rw, bitmap);
}


In patch 25, also, you just get

static u32 svm_msrpm_offset(u32 msr)
{
	u32 bit = svm_msr_bit(msr);
	if (bit == MSR_INVALID)
		return MSR_INVALID;
	return bit / BITS_PER_BYTE;
}

And you change everything to -EINVAL in patch 26 to kill MSR_INVALID.

Another nit...

> +#define BUILD_SVM_MSR_BITMAP_HELPERS(ret_type, action, bitop)			\
> +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0)	\
> +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 1)
> +
> +BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
> +BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
> +BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)

Yes it's a bit of duplication, but no need for the nesting, just do:

BUILD_SVM_MSR_BITMAP_HELPERS(bool, test,  test,    read,  0)
BUILD_SVM_MSR_BITMAP_HELPERS(bool, test,  test,    write, 1)
BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear, read,  0)
BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear, write, 1)
BUILD_SVM_MSR_BITMAP_HELPERS(void, set,   __set,   read,  0)
BUILD_SVM_MSR_BITMAP_HELPERS(void, set,   __set,   write, 1)

Otherwise, really nice.

Paolo


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
  2025-05-29 23:40 ` [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
@ 2025-06-04 16:19   ` Paolo Bonzini
  2025-06-04 16:28     ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Paolo Bonzini @ 2025-06-04 16:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On 5/30/25 01:40, Sean Christopherson wrote:
> Access the MSRPM using u32/4-byte chunks (and appropriately adjusted
> offsets) only when merging L0 and L1 bitmaps as part of emulating VMRUN.
> The only reason to batch accesses to MSRPMs is to avoid the overhead of
> uaccess operations (e.g. STAC/CLAC and bounds checks) when reading L1's
> bitmap pointed at by vmcb12.  For all other uses, either per-bit accesses
> are more than fast enough (no uaccess), or KVM is only accessing a single
> bit (nested_svm_exit_handled_msr()) and so there's nothing to batch.
> 
> In addition to (hopefully) documenting the uniqueness of the merging code,
> restricting chunked access to _just_ the merging code will allow for
> increasing the chunk size (to unsigned long) with minimal risk.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/nested.c | 50 ++++++++++++++++-----------------------
>   arch/x86/kvm/svm/svm.h    | 18 ++++++++++----
>   2 files changed, 34 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e07e10fb52a5..a4e98ada732b 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -187,31 +187,19 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
>   static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
>   
> -static const u32 msrpm_ranges[] = {
> -	SVM_MSRPM_RANGE_0_BASE_MSR,
> -	SVM_MSRPM_RANGE_1_BASE_MSR,
> -	SVM_MSRPM_RANGE_2_BASE_MSR
> -};
> +#define SVM_BUILD_MSR_BYTE_NR_CASE(range_nr, msr)				\
> +	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
> +		return SVM_MSRPM_BYTE_NR(range_nr, msr);
>   
>   static u32 svm_msrpm_offset(u32 msr)
>   {
> -	u32 offset;
> -	int i;
> -
> -	for (i = 0; i < ARRAY_SIZE(msrpm_ranges); i++) {
> -		if (msr < msrpm_ranges[i] ||
> -		    msr >= msrpm_ranges[i] + SVM_MSRS_PER_RANGE)
> -			continue;
> -
> -		offset  = (msr - msrpm_ranges[i]) / SVM_MSRS_PER_BYTE;
> -		offset += (i * SVM_MSRPM_BYTES_PER_RANGE);  /* add range offset */
> -
> -		/* Now we have the u8 offset - but need the u32 offset */
> -		return offset / 4;
> +	switch (msr) {
> +	SVM_BUILD_MSR_BYTE_NR_CASE(0, msr)
> +	SVM_BUILD_MSR_BYTE_NR_CASE(1, msr)
> +	SVM_BUILD_MSR_BYTE_NR_CASE(2, msr)
> +	default:
> +		return MSR_INVALID;
>   	}
> -
> -	/* MSR not in any range */
> -	return MSR_INVALID;
>   }
>   
>   int __init nested_svm_init_msrpm_merge_offsets(void)
> @@ -245,6 +233,12 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
>   		if (WARN_ON(offset == MSR_INVALID))
>   			return -EIO;
>   
> +		/*
> +		 * Merging is done in 32-bit chunks to reduce the number of
> +		 * accesses to L1's bitmap.
> +		 */
> +		offset /= sizeof(u32);
> +
>   		for (j = 0; j < nested_svm_nr_msrpm_merge_offsets; j++) {
>   			if (nested_svm_msrpm_merge_offsets[j] == offset)
>   				break;
> @@ -1363,8 +1357,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
>   
>   static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
>   {
> -	u32 offset, msr, value;
> -	int write, mask;
> +	u32 offset, msr;
> +	int write;
> +	u8 value;
>   
>   	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
>   		return NESTED_EXIT_HOST;
> @@ -1372,18 +1367,15 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
>   	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
>   	offset = svm_msrpm_offset(msr);
>   	write  = svm->vmcb->control.exit_info_1 & 1;
> -	mask   = 1 << ((2 * (msr & 0xf)) + write);

This is wrong.  The bit to read isn't always bit 0 or bit 1, therefore 
mask needs to remain.  But it can be written easily as:

	msr = svm->vcpu.arch.regs[VCPU_REGS_RCX];
	write = svm->vmcb->control.exit_info_1 & 1;

	bit = svm_msr_bit(msr);
	if (bit == MSR_INVALID)
		return NESTED_EXIT_DONE;
	offset = bit / BITS_PER_BYTE;
	mask = BIT(write) << (bit & (BITS_PER_BYTE - 1));

and this even removes the need to use svm_msrpm_offset() in 
nested_svm_exit_handled_msr().
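
The tail of the function can then stay essentially as in your diff below,
reading a single byte and testing the computed mask (sketch only, reusing
the names above, with value declared as a u8):

	if (kvm_vcpu_read_guest(&svm->vcpu,
				svm->nested.ctl.msrpm_base_pa + offset,
				&value, sizeof(value)))
		return NESTED_EXIT_DONE;

	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;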


At this point, it may even make sense to keep the adjustment for the 
offset in svm_msrpm_offset(), like this:

static u32 svm_msrpm_offset(u32 msr)
{
	u32 bit = svm_msr_bit(msr);
	if (bit == MSR_INVALID)
		return MSR_INVALID;

	/*
	 * Merging is done in 32-bit chunks to reduce the number of
	 * accesses to L1's bitmap.
	 */
	return bit / (BITS_PER_BYTE * sizeof(u32));
}

I'll let you be the judge on this.

Paolo

>   	if (offset == MSR_INVALID)
>   		return NESTED_EXIT_DONE;
>   
> -	/* Offset is in 32 bit units but need in 8 bit units */
> -	offset *= 4;
> -
> -	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset, &value, 4))
> +	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset,
> +				&value, sizeof(value)))
>   		return NESTED_EXIT_DONE;
>   
> -	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
> +	return (value & BIT(write)) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
>   }
>   
>   static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 77287c870967..155b6089fcd2 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -634,15 +634,23 @@ static_assert(SVM_MSRS_PER_RANGE == 8192);
>   	(range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +			\
>   	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR)
>   
> -#define SVM_MSRPM_SANITY_CHECK_BITS(range_nr)					\
> +#define SVM_MSRPM_BYTE_NR(range_nr, msr)					\
> +	(range_nr * SVM_MSRPM_BYTES_PER_RANGE +					\
> +	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) / SVM_MSRS_PER_BYTE)
> +
> +#define SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(range_nr)				\
>   static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
>   	      range_nr * 2048 * 8 + 2);						\
>   static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
> -	      range_nr * 2048 * 8 + 14);
> +	      range_nr * 2048 * 8 + 14);					\
> +static_assert(SVM_MSRPM_BYTE_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
> +	      range_nr * 2048);							\
> +static_assert(SVM_MSRPM_BYTE_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
> +	      range_nr * 2048 + 1);
>   
> -SVM_MSRPM_SANITY_CHECK_BITS(0);
> -SVM_MSRPM_SANITY_CHECK_BITS(1);
> -SVM_MSRPM_SANITY_CHECK_BITS(2);
> +SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(0);
> +SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(1);
> +SVM_MSRPM_SANITY_CHECK_BITS_AND_BYTES(2);
>   
>   #define SVM_BUILD_MSR_BITMAP_CASE(bitmap, range_nr, msr, bitop, bit_rw)		\
>   	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
  2025-06-04 16:19   ` Paolo Bonzini
@ 2025-06-04 16:28     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-04 16:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On Wed, Jun 04, 2025, Paolo Bonzini wrote:
> On 5/30/25 01:40, Sean Christopherson wrote:
> > @@ -1363,8 +1357,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
> >   static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
> >   {
> > -	u32 offset, msr, value;
> > -	int write, mask;
> > +	u32 offset, msr;
> > +	int write;
> > +	u8 value;
> >   	if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
> >   		return NESTED_EXIT_HOST;
> > @@ -1372,18 +1367,15 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
> >   	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
> >   	offset = svm_msrpm_offset(msr);
> >   	write  = svm->vmcb->control.exit_info_1 & 1;
> > -	mask   = 1 << ((2 * (msr & 0xf)) + write);
> 
> This is wrong.  The bit to read isn't always bit 0 or bit 1, therefore mask
> needs to remain.

/facepalm

Duh.  I managed to forget that multiple MSRs are packed into a byte.  Hrm, which
means our nSVM test is even more worthless than I thought.  I'll see if I can get
it to detect this bug.
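
For posterity, the packing I keep fumbling, as a rough sketch (not code from
this series; nested_msr_intercepted() and msrpm_bit_nr are made-up names, the
latter standing in for whatever helper computes the MSR's read-bit number):

	/*
	 * Each MSR owns two consecutive bits in the MSRPM, the read-intercept
	 * bit immediately followed by the write-intercept bit, i.e. four MSRs
	 * share every byte.
	 */
	static bool nested_msr_intercepted(const u8 *msrpm, u32 msrpm_bit_nr,
					   bool write)
	{
		u32 bit = msrpm_bit_nr + write;

		return msrpm[bit / BITS_PER_BYTE] & BIT(bit & (BITS_PER_BYTE - 1));
	}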

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
  2025-06-04 16:11   ` Paolo Bonzini
@ 2025-06-04 17:35     ` Sean Christopherson
  2025-06-04 19:13       ` Paolo Bonzini
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2025-06-04 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On Wed, Jun 04, 2025, Paolo Bonzini wrote:
> On 5/30/25 01:39, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 47a36a9a7fe5..e432cd7a7889 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -628,6 +628,50 @@ static_assert(SVM_MSRS_PER_RANGE == 8192);
> >   #define SVM_MSRPM_RANGE_1_BASE_MSR	0xc0000000
> >   #define SVM_MSRPM_RANGE_2_BASE_MSR	0xc0010000
> > +#define SVM_MSRPM_FIRST_MSR(range_nr)	\
> > +	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR)
> > +#define SVM_MSRPM_LAST_MSR(range_nr)	\
> > +	(SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR + SVM_MSRS_PER_RANGE - 1)
> > +
> > +#define SVM_MSRPM_BIT_NR(range_nr, msr)						\
> > +	(range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +			\
> > +	 (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR)
> > +
> > +#define SVM_MSRPM_SANITY_CHECK_BITS(range_nr)					\
> > +static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 1) ==	\
> > +	      range_nr * 2048 * 8 + 2);						\
> > +static_assert(SVM_MSRPM_BIT_NR(range_nr, SVM_MSRPM_FIRST_MSR(range_nr) + 7) ==	\
> > +	      range_nr * 2048 * 8 + 14);
> > +
> > +SVM_MSRPM_SANITY_CHECK_BITS(0);
> > +SVM_MSRPM_SANITY_CHECK_BITS(1);
> > +SVM_MSRPM_SANITY_CHECK_BITS(2);
> 
> Replying here for patches 11/25/26.  None of this is needed, just write a
> function like this:
> 
> static inline u32 svm_msr_bit(u32 msr)
> {
> 	u32 msr_base = msr & ~(SVM_MSRS_PER_RANGE - 1);

Ooh, clever.

> 	if (msr_base == SVM_MSRPM_RANGE_0_BASE_MSR)
> 		return SVM_MSRPM_BIT_NR(0, msr);
> 	if (msr_base == SVM_MSRPM_RANGE_1_BASE_MSR)
> 		return SVM_MSRPM_BIT_NR(1, msr);
> 	if (msr_base == SVM_MSRPM_RANGE_2_BASE_MSR)
> 		return SVM_MSRPM_BIT_NR(2, msr);
> 	return MSR_INVALID;

I initially had something like this, but I don't like the potential for typos,
e.g. to fat finger something like:

	if (msr_base == SVM_MSRPM_RANGE_2_BASE_MSR)
		return SVM_MSRPM_BIT_NR(1, msr);

Which is how I ended up with the (admittedly ugly) CASE macros.  Would you be ok
keeping that wrinkle?  E.g.

	#define SVM_MSR_BIT_NR_CASE(range_nr, msr)					\
	case SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR:					\
		return range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +		\
		       (msr - SVM_MSRPM_RANGE_## range_nr ##_BASE_MSR) * SVM_BITS_PER_MSR;

	static __always_inline int svm_msrpm_bit_nr(u32 msr)
	{
		switch (msr & ~(SVM_MSRS_PER_RANGE - 1)) {
		SVM_MSR_BIT_NR_CASE(0, msr)
		SVM_MSR_BIT_NR_CASE(1, msr)
		SVM_MSR_BIT_NR_CASE(2, msr)
		default:
			return -EINVAL;
		}
	}

Actually, better idea!  Hopefully.  With your masking trick, there's no need to
do subtraction to get the offset within a range, which means getting the bit/byte
number for an MSR can be done entirely programmatically.  And if we do that, then
the SVM_MSRPM_RANGE_xxx_BASE_MSR defines can go away, and the (very trivial)
copy+paste that I dislike also goes away.

Completely untested, but how about this?

	#define SVM_MSRPM_OFFSET_MASK (SVM_MSRS_PER_RANGE - 1)

	static __always_inline int svm_msrpm_bit_nr(u32 msr)
	{
		int range_nr;

		switch (msr & ~SVM_MSRPM_OFFSET_MASK) {
		case 0:
			range_nr = 0;
			break;
		case 0xc0000000:
			range_nr = 1;
			break;
		case 0xc0010000:
			range_nr = 2;
			break;
		default:
			return -EINVAL;
		}

		return range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +
		       (msr & SVM_MSRPM_OFFSET_MASK) * SVM_BITS_PER_MSR;
	}

	static inline int svm_msrpm_byte_nr(u32 msr)
	{
		return svm_msrpm_bit_nr(msr) / BITS_PER_BYTE;
	}

The open coded literals aren't pretty, but VMX does the same thing, precisely
because I didn't want any code besides the innermost helper dealing with the
msr => offset math.

> }
> 
> and you can throw away most of the other macros.  For example:
> 
> > +#define SVM_BUILD_MSR_BITMAP_CASE(bitmap, range_nr, msr, bitop, bit_rw)		\
> > +	case SVM_MSRPM_FIRST_MSR(range_nr) ... SVM_MSRPM_LAST_MSR(range_nr):	\
> > +		return bitop##_bit(SVM_MSRPM_BIT_NR(range_nr, msr) + bit_rw, bitmap);
> 
> ... becomes a lot more lowercase:
> 
> static inline rtype svm_##action##_msr_bitmap_##access(
> 	unsigned long *bitmap, u32 msr)
> {
> 	u32 bit = svm_msr_bit(msr);
> 	if (bit == MSR_INVALID)
> 		return true;
> 	return bitop##_bit(bit + bit_rw, bitmap);

Yeah, much cleaner.

> }
> 
> 
> In patch 25, also, you just get
> 
> static u32 svm_msrpm_offset(u32 msr)
> {
> 	u32 bit = svm_msr_bit(msr);
> 	if (bit == MSR_INVALID)
> 		return MSR_INVALID;
> 	return bit / BITS_PER_BYTE;
> }
> 
> And you change everything to -EINVAL in patch 26 to kill MSR_INVALID.
> 
> Another nit...
> 
> > +#define BUILD_SVM_MSR_BITMAP_HELPERS(ret_type, action, bitop)			\
> > +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0)	\
> > +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 1)
> > +
> > +BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
> > +BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
> > +BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
> Yes it's a bit of duplication, but no need for the nesting, just do:

I don't have a super strong preference, but I do want to be consistent between
VMX and SVM, and VMX has the nesting (unsurprisingly, also written by me).  And
for that, the nested macros add a bit more value due to reads vs writes being in
entirely different areas of the bitmap.

#define BUILD_VMX_MSR_BITMAP_HELPERS(ret_type, action, bitop)		       \
	__BUILD_VMX_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0x0)     \
	__BUILD_VMX_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 0x800)

BUILD_VMX_MSR_BITMAP_HELPERS(bool, test, test)
BUILD_VMX_MSR_BITMAP_HELPERS(void, clear, __clear)
BUILD_VMX_MSR_BITMAP_HELPERS(void, set, __set)

That could be mitigated to some extent by adding a #define to communicate the
offset, but IMO the nested macros are less ugly than that.
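
For reference, a minimal sketch of the shape of the generated VMX helpers
(illustrative only, not the exact in-tree macro expansion): the per-access
byte offset gets baked into each helper precisely because reads and writes
live in different regions of the 4K bitmap.

	static inline bool vmx_msr_bitmap_test(unsigned long *bitmap, u32 msr,
					       unsigned int base)
	{
		/* Low MSRs (0x0 - 0x1fff) come first within each region. */
		if (msr <= 0x1fff)
			return test_bit(msr, bitmap + base / sizeof(unsigned long));

		/* High MSRs (0xc0000000 - 0xc0001fff) follow at +0x400 bytes. */
		if (msr - 0xc0000000 <= 0x1fff)
			return test_bit(msr & 0x1fff,
					bitmap + (base + 0x400) / sizeof(unsigned long));

		/* Out-of-range MSRs are always treated as intercepted. */
		return true;
	}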

> BUILD_SVM_MSR_BITMAP_HELPERS(bool, test,  test,    read,  0)
> BUILD_SVM_MSR_BITMAP_HELPERS(bool, test,  test,    write, 1)
> BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear, read,  0)
> BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear, write, 1)
> BUILD_SVM_MSR_BITMAP_HELPERS(void, set,   __set,   read,  0)
> BUILD_SVM_MSR_BITMAP_HELPERS(void, set,   __set,   write, 1)
> 
> Otherwise, really nice.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
  2025-06-04 17:35     ` Sean Christopherson
@ 2025-06-04 19:13       ` Paolo Bonzini
  0 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2025-06-04 19:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, Xin Li, Chao Gao, Dapeng Mi

On 6/4/25 19:35, Sean Christopherson wrote:
> On Wed, Jun 04, 2025, Paolo Bonzini wrote:
>> Replying here for patches 11/25/26.  None of this is needed, just write a
>> function like this:
>>
>> static inline u32 svm_msr_bit(u32 msr)
>> {
>> 	u32 msr_base = msr & ~(SVM_MSRS_PER_RANGE - 1);
> 
> Ooh, clever.
> 
>> 	if (msr_base == SVM_MSRPM_RANGE_0_BASE_MSR)
>> 		return SVM_MSRPM_BIT_NR(0, msr);
>> 	if (msr_base == SVM_MSRPM_RANGE_1_BASE_MSR)
>> 		return SVM_MSRPM_BIT_NR(1, msr);
>> 	if (msr_base == SVM_MSRPM_RANGE_2_BASE_MSR)
>> 		return SVM_MSRPM_BIT_NR(2, msr);
>> 	return MSR_INVALID;
> 
> I initially had something like this, but I don't like the potential for typos,
> e.g. to fat finger something like:
> 
> 	if (msr_base == SVM_MSRPM_RANGE_2_BASE_MSR)
> 		return SVM_MSRPM_BIT_NR(1, msr);
> 
> Which is how I ended up with the (admittedly ugly) CASE macros.  [...]
> Actually, better idea!  Hopefully.  With your masking trick, there's no need to
> do subtraction to get the offset within a range, which means getting the bit/byte
> number for an MSR can be done entirely programmatically.  And if we do that, then
> the SVM_MSRPM_RANGE_xxx_BASE_MSR defines can go away, and the (very trivial)
> copy+paste that I dislike also goes away.
> 
> Completely untested, but how about this?
> 
> 	#define SVM_MSRPM_OFFSET_MASK (SVM_MSRS_PER_RANGE - 1)
> 
> 	static __always_inline int svm_msrpm_bit_nr(u32 msr)

(yeah, after hitting send I noticed that msr->msrpm would have been better)

> 	{
> 		int range_nr;
> 
> 		switch (msr & ~SVM_MSRPM_OFFSET_MASK) {
> 		case 0:
> 			range_nr = 0;
> 			break;
> 		case 0xc0000000:
> 			range_nr = 1;
> 			break;
> 		case 0xc0010000:
> 			range_nr = 2;
> 			break;
> 		default:
> 			return -EINVAL;
> 		}

I was actually going to propose something very similar; I refrained only 
because I wasn't sure whether there would be other remaining uses of 
SVM_MSRPM_RANGE_?_BASE_MSR.  The above is nice.

> 		return range_nr * SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE +
> 		       (msr & SVM_MSRPM_OFFSET_MASK) * SVM_BITS_PER_MSR;

Or this too:

   return ((range_nr * SVM_MSRS_PER_RANGE)
           + (msr & SVM_MSRPM_OFFSET_MASK)) * SVM_BITS_PER_MSR;

depending on personal taste.  A few fewer macros, a few more parentheses; 
the two forms are equivalent, since SVM_MSRPM_BYTES_PER_RANGE * BITS_PER_BYTE 
equals SVM_MSRS_PER_RANGE * SVM_BITS_PER_MSR.

That removes the enjoyment of seeing everything collapse into a single 
LEA instruction (X*2+CONST), as was the case with SVM_MSRPM_BIT_NR.  But 
I agree that these versions are about as nice as the code can be made.

> The open coded literals aren't pretty, but VMX does the same thing, precisely
> because I didn't want any code besides the innermost helper dealing with the
> msr => offset math.

>>> +#define BUILD_SVM_MSR_BITMAP_HELPERS(ret_type, action, bitop)			\
>>> +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, read,  0)	\
>>> +	__BUILD_SVM_MSR_BITMAP_HELPER(ret_type, action, bitop, write, 1)
>>> +
>>> +BUILD_SVM_MSR_BITMAP_HELPERS(bool, test, test)
>>> +BUILD_SVM_MSR_BITMAP_HELPERS(void, clear, __clear)
>>> +BUILD_SVM_MSR_BITMAP_HELPERS(void, set, __set)
>> Yes it's a bit duplication, but no need for the nesting, just do:
> 
> I don't have a super strong preference, but I do want to be consistent between
> VMX and SVM, and VMX has the nesting (unsurprisingly, also written by me).  And
> for that, the nested macros add a bit more value due to reads vs writes being in
> entirely different areas of the bitmap.

Yeah, fair enough.  Since it's copied from VMX it makes sense.

Paolo


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/28] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
  2025-05-31 18:01   ` Francesco Lavra
@ 2025-06-05  6:24   ` Chao Gao
  2025-06-05 16:39     ` Sean Christopherson
  1 sibling, 1 reply; 60+ messages in thread
From: Chao Gao @ 2025-06-05  6:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

>+static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>+{
>+	struct vcpu_svm *svm = to_svm(vcpu);
>+
>+	svm_vcpu_init_msrpm(vcpu);
>+
>+	if (lbrv)
>+		svm_recalc_lbr_msr_intercepts(vcpu);
>+
>+	if (boot_cpu_has(X86_FEATURE_IBPB))
>+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
>+					  !guest_has_pred_cmd_msr(vcpu));
>+
>+	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
>+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
>+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
>+
>+	/*
>+	 * Unconditionally disable interception of SPEC_CTRL if V_SPEC_CTRL is
>+	 * supported, i.e. if VMRUN/#VMEXIT context switch MSR_IA32_SPEC_CTRL.
>+	 */
>+	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);

I think there is a bug in the original code. KVM should inject #GP when guests
try to access unsupported MSRs. Specifically, a guest w/o spec_ctrl support
should get #GP when it tries to access the MSR regardless of V_SPEC_CTRL
support on the host.

So, here it should be:

	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
					  !guest_has_spec_ctrl_msr(vcpu));

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
  2025-05-30 23:38   ` Xin Li
  2025-06-03  3:52   ` Mi, Dapeng
@ 2025-06-05  6:59   ` Chao Gao
  2 siblings, 0 replies; 60+ messages in thread
From: Chao Gao @ 2025-06-05  6:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:40:01PM -0700, Sean Christopherson wrote:
>On a userspace MSR filter change, recalculate all MSR intercepts using the
>filter-agnostic logic instead of maintaining a "shadow copy" of KVM's
>desired intercepts.  The shadow bitmaps add yet another point of failure,
>are confusing (e.g. what does "handled specially" mean!?!?), an eyesore,
>and a maintenance burden.
>
>Given that KVM *must* be able to recalculate the correct intercepts at any
>given time, and that MSR filter updates are not hot paths, there is zero
>benefit to maintaining the shadow bitmaps.
>
>Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com
>Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: Xin Li <xin@zytor.com>
>Cc: Chao Gao <chao.gao@intel.com>
>Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

one nit below,

>+
>+	if (vcpu->arch.xfd_no_write_intercept)
>+		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
>+
>+

Remove one newline here.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
  2025-06-04 13:56     ` Sean Christopherson
@ 2025-06-05  8:08       ` Chao Gao
  0 siblings, 0 replies; 60+ messages in thread
From: Chao Gao @ 2025-06-05  8:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

>Me either.  One option would be to use wrapper macros for the interception helpers
>to fill an array at compile time (similar to how kernel exception fixup works),
>but (a) it requires modifications to the linker scripts to generate the arrays,
>(b) is quite confusing/complex, and (c) it doesn't actually solve the problem,
>it just inverts the problem.  Because as above, there are MSRs we *don't* want
>to expose to L2, and so we'd need to add yet more code to filter those out.
>
>And the failure mode for the inverted case would be worse, because if we missed
>an MSR, KVM would incorrectly give L2 access to an MSR.  Whereas with the current
>approach, a missed MSR simply means L2 gets a slower path; but functionally, it's
>fine (and it has to be fine, because KVM can't force L1 to disable interception).

I agree. The risk of breaking functionality should be very low.

>
>> Removing this array and iterating over direct_access_msrs[] directly is an
>> option but it contradicts this series as one of its purposes is to remove
>> direct_access_msrs[].
>
>Using direct_access_msrs[] wouldn't solve the problem either, because nothing
>ensures _that_ array is up-to-date either.
>
>The best idea I have is to add a test that verifies the MSRs that are supposed
>to be passed through actually are passed through.  It's still effectively manual
>checking, but it would require us to screw up twice, i.e. forget to update both
>the array and the test.  The problem is that there's no easy and foolproof way to
>verify that an MSR is passed through in a selftest.
>
>E.g. it would be possible to precisely detect L2 => L0 MSR exits via a BPF program,
>but incorporating a BPF program into a KVM selftest just to detect exits isn't
>something I'm keen on doing (or maintaining).
>
>Using the "exits" stat isn't foolproof due to NMIs (IRQs can be accounted for via
>"irq_exits"), and to a lesser extent page faults (especially if shadow paging is
>in use).
>
>If KVM provided an "msr_exits" stats, it would be trivial to verify interception
>via a selftest, but I can't quite convince myself that MSR exits are interesting
>enough to warrant their own stat.

Yes, there doesn't seem to be an easy solution to this problem. Given its low
risk, I think we can live with it for now and revisit the issue if it truly
becomes a recurring problem (i.e., people keep forgetting to update the array).

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
  2025-05-29 23:39 ` [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
@ 2025-06-05  8:19   ` Chao Gao
  0 siblings, 0 replies; 60+ messages in thread
From: Chao Gao @ 2025-06-05  8:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, May 29, 2025 at 04:39:57PM -0700, Sean Christopherson wrote:
>Add and use SVM MSR interception APIs (in most paths) to match VMX's
>APIs and nomenclature.  Specifically, add SVM variants of:
>
>        vmx_disable_intercept_for_msr(vcpu, msr, type)
>        vmx_enable_intercept_for_msr(vcpu, msr, type)
>        vmx_set_intercept_for_msr(vcpu, msr, type, intercept)
>
>to eventually replace SVM's single helper:
>
>        set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write)
>
>which is awkward to use (in all cases, KVM either applies the same logic
>for both reads and writes, or intercepts one of read or write), and is
>unintuitive due to using '0' to indicate interception should be *set*.
>
>Keep the guts of the old API for the moment to avoid churning the MSR
>filter code, as that mess will be overhauled in the near future.  Leave
>behind a temporary comment to call out that the shadow bitmaps have
>inverted polarity relative to the bitmaps consumed by hardware.
>
>No functional change intended.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

two nits below:

>+void svm_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
> {
>-	set_shadow_msr_intercept(vcpu, msr, read, write);
>-	set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
>+	struct vcpu_svm *svm = to_svm(vcpu);
>+	void *msrpm = svm->msrpm;
>+
>+	/* Note, the shadow intercept bitmaps have inverted polarity. */
>+	set_shadow_msr_intercept(vcpu, msr, type & MSR_TYPE_R, type & MSR_TYPE_W);
>+
>+	/*
>+	 * Don't disabled interception for the MSR if userspace wants to

s/disabled/disable

<snip>

>+void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
>+{
>+	struct vcpu_svm *svm = to_svm(vcpu);
>+	void *msrpm = svm->msrpm;
>+
>+

Remove one newline here.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/28] KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
  2025-06-05  6:24   ` Chao Gao
@ 2025-06-05 16:39     ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2025-06-05 16:39 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Borislav Petkov, Xin Li,
	Dapeng Mi

On Thu, Jun 05, 2025, Chao Gao wrote:
> >+static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> >+{
> >+	struct vcpu_svm *svm = to_svm(vcpu);
> >+
> >+	svm_vcpu_init_msrpm(vcpu);
> >+
> >+	if (lbrv)
> >+		svm_recalc_lbr_msr_intercepts(vcpu);
> >+
> >+	if (boot_cpu_has(X86_FEATURE_IBPB))
> >+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> >+					  !guest_has_pred_cmd_msr(vcpu));
> >+
> >+	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> >+		svm_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> >+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> >+
> >+	/*
> >+	 * Unconditionally disable interception of SPEC_CTRL if V_SPEC_CTRL is
> >+	 * supported, i.e. if VMRUN/#VMEXIT context switch MSR_IA32_SPEC_CTRL.
> >+	 */

Tangentially related, the comment above isn't quite correct.  MSR_IA32_SPEC_CTRL
isn't purely context switched; the CPU merges together the host and guest values.
Same end result though: KVM doesn't need to manually context switch the MSR.

> >+	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> >+		svm_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
> 
> I think there is a bug in the original code. KVM should inject #GP when guests
> try to access unsupported MSRs. Specifically, a guest w/o spec_ctrl support
> should get #GP when it tries to access the MSR regardless of V_SPEC_CTRL
> support on the host.
> 
> So, here should be 
> 
> 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> 		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
> 					  !guest_has_spec_ctrl_msr(vcpu));

Right you are!  I'll slot in a patch earlier in the series, and then it'll end up
looking like this.
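
(Sketch only; this is essentially your suggestion verbatim, and the exact
form and placement may shift once the earlier patch is slotted in.)

	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		svm_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
					  !guest_has_spec_ctrl_msr(vcpu));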

I'll also post a KUT testcase.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2025-06-05 16:39 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-29 23:39 [PATCH 00/28] KVM: x86: Clean up MSR interception code Sean Christopherson
2025-05-29 23:39 ` [PATCH 01/28] KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails Sean Christopherson
2025-06-03  7:17   ` Chao Gao
2025-06-03 15:28     ` Sean Christopherson
2025-05-29 23:39 ` [PATCH 02/28] KVM: SVM: Tag MSR bitmap initialization helpers with __init Sean Christopherson
2025-06-03  7:18   ` Chao Gao
2025-05-29 23:39 ` [PATCH 03/28] KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Sean Christopherson
2025-06-03  7:57   ` Chao Gao
2025-05-29 23:39 ` [PATCH 04/28] KVM: SVM: Kill the VM instead of the host if MSR interception is buggy Sean Christopherson
2025-06-03  8:06   ` Chao Gao
2025-06-03 13:26     ` Sean Christopherson
2025-05-29 23:39 ` [PATCH 05/28] KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Sean Christopherson
2025-06-03  2:53   ` Mi, Dapeng
2025-05-29 23:39 ` [PATCH 06/28] KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Sean Christopherson
2025-06-03  2:32   ` Mi, Dapeng
2025-05-29 23:39 ` [PATCH 07/28] KVM: SVM: Clean up macros related to architectural MSRPM definitions Sean Christopherson
2025-05-29 23:39 ` [PATCH 08/28] KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Sean Christopherson
2025-06-04  5:43   ` Chao Gao
2025-06-04 13:56     ` Sean Christopherson
2025-06-05  8:08       ` Chao Gao
2025-06-04 15:35   ` Paolo Bonzini
2025-06-04 15:49     ` Sean Christopherson
2025-06-04 15:35   ` Paolo Bonzini
2025-05-29 23:39 ` [PATCH 09/28] KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Sean Christopherson
2025-05-29 23:39 ` [PATCH 10/28] KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Sean Christopherson
2025-05-29 23:39 ` [PATCH 11/28] KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Sean Christopherson
2025-06-04 16:11   ` Paolo Bonzini
2025-06-04 17:35     ` Sean Christopherson
2025-06-04 19:13       ` Paolo Bonzini
2025-05-29 23:39 ` [PATCH 12/28] KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Sean Christopherson
2025-06-05  8:19   ` Chao Gao
2025-05-29 23:39 ` [PATCH 13/28] KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Sean Christopherson
2025-05-29 23:39 ` [PATCH 14/28] KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Sean Christopherson
2025-05-29 23:40 ` [PATCH 15/28] KVM: x86: Move definition of X2APIC_MSR() to lapic.h Sean Christopherson
2025-05-29 23:40 ` [PATCH 16/28] KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change Sean Christopherson
2025-05-30 23:38   ` Xin Li
2025-06-02 17:10     ` Sean Christopherson
2025-06-03  3:52   ` Mi, Dapeng
2025-06-05  6:59   ` Chao Gao
2025-05-29 23:40 ` [PATCH 17/28] KVM: SVM: " Sean Christopherson
2025-05-31 18:01   ` Francesco Lavra
2025-06-02 17:08     ` Sean Christopherson
2025-06-05  6:24   ` Chao Gao
2025-06-05 16:39     ` Sean Christopherson
2025-05-29 23:40 ` [PATCH 18/28] KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Sean Christopherson
2025-05-30 23:47   ` Xin Li
2025-05-29 23:40 ` [PATCH 19/28] KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Sean Christopherson
2025-05-29 23:40 ` [PATCH 20/28] KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Sean Christopherson
2025-05-29 23:40 ` [PATCH 21/28] KVM: SVM: Merge "after set CPUID" intercept recalc helpers Sean Christopherson
2025-05-29 23:40 ` [PATCH 22/28] KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Sean Christopherson
2025-05-29 23:40 ` [PATCH 23/28] KVM: SVM: Move svm_msrpm_offset() to nested.c Sean Christopherson
2025-05-29 23:40 ` [PATCH 24/28] KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Sean Christopherson
2025-05-29 23:40 ` [PATCH 25/28] KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Sean Christopherson
2025-06-04 16:19   ` Paolo Bonzini
2025-06-04 16:28     ` Sean Christopherson
2025-05-29 23:40 ` [PATCH 26/28] KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Sean Christopherson
2025-05-29 23:40 ` [PATCH 27/28] KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels Sean Christopherson
2025-05-29 23:40 ` [PATCH 28/28] KVM: selftests: Verify KVM disable interception (for userspace) on filter change Sean Christopherson
2025-06-03  5:47   ` Mi, Dapeng
2025-06-04  4:43     ` Manali Shukla
