* [PATCH v15 00/41] KVM: x86: Mega-CET
@ 2025-09-12 23:22 Sean Christopherson
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

This series is (hopefully) all of the in-flight CET virtualization patches
in one big bundle.  Please holler if I missed a patch or three as this is what
I am planning on applying for 6.18 (modulo fixups and whatnot), i.e. if there's
something else that's needed to enable CET virtualization, now's the time...

Patches 1-3 probably need the most attention, as they are new in v15 and I
don't have a fully working SEV-ES setup (don't have the right guest firmware,
ugh).  Though testing on everything would be much appreciated.

I kept almost all Tested-by tags even for patches that I massaged a bit, and
only dropped tags for the "don't emulate CET stuff" patch.  In theory, the
changes I've made *should* be benign.  Please yell, loudly, if I broke
something and/or you want me to drop your Tested-by.

v15:
 - Collect reviews (hopefully I got 'em all).
 - Add support for KVM_GET_REG_LIST.
 - Load FPU when accessing XSTATE MSRs via ONE_REG ioctls.
 - Explicitly return -EINVAL on kvm_set_one_msr() failure.
 - Make is_xstate_managed_msr() more precise (check guest caps).
 - Dedup guts of kvm_{g,s}et_xstate_msr() (as kvm_access_xstate_msr()).
 - WARN if KVM uses kvm_access_xstate_msr() to access an MSR that isn't
   managed via XSAVE.
 - Document why S_CET isn't treated as an XSTATE-managed MSR.
 - Mark VMCB_CET as clean/dirty as appropriate.
 - Add nSVM support for the CET VMCB fields.
 - Add an "msrs" selftest to coverage ONE_REG and host vs. guest accesses in
   general.
 - Add patches to READ_ONCE() guest-writable GHCB fields, and to check the
   validity of XCR0 "writes".
 - Check the validity of XSS "writes" via common MSR emulation.
 - Add {CP,HV,VC,SX}_VECTOR definitions so that tracing and selftests can
   pretty print them.
 - Add pretty printing for unexpected exceptions in selftests.
 - Tweak the emulator rejection to be more precise (grab S_CET vs. U_CET based
   on CPL for near transfers), and to avoid unnecessary reads of CR4, S_CET, and
   U_CET.

Intel (v14): https://lkml.kernel.org/r/20250909093953.202028-1-chao.gao%40intel.com
AMD    (v4): https://lore.kernel.org/all/20250908201750.98824-1-john.allen@amd.com
grsec  (v3): https://lkml.kernel.org/r/20250813205957.14135-1-minipli%40grsecurity.net

Chao Gao (4):
  KVM: x86: Check XSS validity against guest CPUIDs
  KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
  KVM: nVMX: Add consistency checks for CET states
  KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state

John Allen (4):
  KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs
  KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: x86: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Enable shadow stack virtualization for SVM

Mathias Krause (1):
  KVM: VMX: Make CR4.CET a guest owned bit

Sean Christopherson (17):
  KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to
    kvm_get_cached_sw_exit_code()
  KVM: SEV: Read save fields from GHCB exactly once
  KVM: SEV: Validate XCR0 provided by guest in GHCB
  KVM: x86: Report XSS as to-be-saved if there are supported features
  KVM: x86: Load guest FPU state when accessing XSAVE-managed MSRs
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: selftests: Add ex_str() to print human friendly name of exception
    vectors
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported
    features
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Verify MSRs are (not) in save/restore list when
    (un)supported

Yang Weijiang (15):
  KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
  KVM: x86: Initialize kvm_caps.supported_xss
  KVM: x86: Add fault checks for guest CR4.CET setting
  KVM: x86: Report KVM supported CET MSRs as to-be-saved
  KVM: VMX: Introduce CET VMCS fields and control bits
  KVM: x86: Enable guest SSP read/write interface with new uAPIs
  KVM: VMX: Emulate read and write to CET MSRs
  KVM: x86: Save and reload SSP to/from SMRAM
  KVM: VMX: Set up interception for CET MSRs
  KVM: VMX: Set host constant supervisor states to VMCS fields
  KVM: x86: Don't emulate instructions affected by CET features
  KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
  KVM: nVMX: Prepare for enabling CET support for nested guest

 Documentation/virt/kvm/api.rst                |  14 +-
 arch/x86/include/asm/kvm_host.h               |   6 +-
 arch/x86/include/asm/vmx.h                    |   9 +
 arch/x86/include/uapi/asm/kvm.h               |  34 ++
 arch/x86/kvm/cpuid.c                          |  17 +-
 arch/x86/kvm/emulate.c                        |  58 ++-
 arch/x86/kvm/kvm_cache_regs.h                 |   3 +-
 arch/x86/kvm/smm.c                            |   8 +
 arch/x86/kvm/smm.h                            |   2 +-
 arch/x86/kvm/svm/nested.c                     |  20 +
 arch/x86/kvm/svm/sev.c                        |  23 +-
 arch/x86/kvm/svm/svm.c                        |  46 +-
 arch/x86/kvm/svm/svm.h                        |  30 +-
 arch/x86/kvm/trace.h                          |   5 +-
 arch/x86/kvm/vmx/capabilities.h               |   9 +
 arch/x86/kvm/vmx/nested.c                     | 163 ++++++-
 arch/x86/kvm/vmx/nested.h                     |   5 +
 arch/x86/kvm/vmx/vmcs12.c                     |   6 +
 arch/x86/kvm/vmx/vmcs12.h                     |  14 +-
 arch/x86/kvm/vmx/vmx.c                        |  84 +++-
 arch/x86/kvm/vmx/vmx.h                        |   9 +-
 arch/x86/kvm/x86.c                            | 362 +++++++++++++-
 arch/x86/kvm/x86.h                            |  37 ++
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/include/x86/processor.h     |   2 +
 .../testing/selftests/kvm/lib/x86/processor.c |  33 ++
 .../selftests/kvm/x86/hyperv_features.c       |  16 +-
 tools/testing/selftests/kvm/x86/msrs_test.c   | 440 ++++++++++++++++++
 .../selftests/kvm/x86/vmx_pmu_caps_test.c     |   4 +-
 .../selftests/kvm/x86/xcr0_cpuid_test.c       |  12 +-
 30 files changed, 1382 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/msrs_test.c


base-commit: b33f3c899e27cad5a62b15f9e3724fb5e61378c4
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 01/41] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make
it clear that KVM is getting the cached value, not reading directly from
the guest-controlled GHCB.  More importantly, vacating
kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built
kvm_ghcb_get_##field() helper to read values from the GHCB.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2fdd2e478a97..fe8d148b76c0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3216,7 +3216,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 		kvfree(svm->sev_es.ghcb_sa);
 }
 
-static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
+static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control)
 {
 	return (((u64)control->exit_code_hi) << 32) | control->exit_code;
 }
@@ -3242,7 +3242,7 @@ static void dump_ghcb(struct vcpu_svm *svm)
 	 */
 	pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
-	       kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
+	       kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
 	       control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
@@ -3331,7 +3331,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	 * Retrieve the exit code now even though it may not be marked valid
 	 * as it could help with debugging.
 	 */
-	exit_code = kvm_ghcb_get_sw_exit_code(control);
+	exit_code = kvm_get_cached_sw_exit_code(control);
 
 	/* Only GHCB Usage code 0 is supported */
 	if (svm->sev_es.ghcb->ghcb_usage) {
@@ -4336,7 +4336,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 	svm_vmgexit_success(svm, 0);
 
-	exit_code = kvm_ghcb_get_sw_exit_code(control);
+	exit_code = kvm_get_cached_sw_exit_code(control);
 	switch (exit_code) {
 	case SVM_VMGEXIT_MMIO_READ:
 		ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 02/41] KVM: SEV: Read save fields from GHCB exactly once
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
GHCB get() utility to help guard against TOCTOU bugs.  Using READ_ONCE()
doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
redoing get() after checking the initial value, but at least addresses
all potential TOCTOU issues in the current KVM code base.
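
As an illustration of the bug class, here is a minimal sketch of a double
fetch from guest-writable memory ("struct shared", "limit", and the helpers
are hypothetical, not actual GHCB code):

  struct shared {
          u64 scratch;                    /* guest-writable at any time */
  };

  /* BUGGY: two fetches; the guest can change ->scratch between them. */
  u64 get_racy(struct shared *s)
  {
          if (s->scratch <= limit)
                  return s->scratch;      /* may no longer be <= limit! */
          return 0;
  }

  /* FIXED: one fetch; the value that was checked is the value returned. */
  u64 get_once(struct shared *s)
  {
          u64 val = READ_ONCE(s->scratch);

          return val <= limit ? val : 0;
  }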

Opportunistically reduce the indentation of the macro-defined helpers and
clean up the alignment.

Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c |  8 ++++----
 arch/x86/kvm/svm/svm.h | 26 ++++++++++++++++----------
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fe8d148b76c0..37abbda28685 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3304,16 +3304,16 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
 
 	if (kvm_ghcb_xcr0_is_valid(svm)) {
-		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+		vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	}
 
 	/* Copy the GHCB exit information into the VMCB fields */
-	exit_code = ghcb_get_sw_exit_code(ghcb);
+	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
 	control->exit_code = lower_32_bits(exit_code);
 	control->exit_code_hi = upper_32_bits(exit_code);
-	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
-	control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
+	control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(ghcb);
+	control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(ghcb);
 	svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
 
 	/* Clear the valid entries fields */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d39c0b17988..c2316adde3cc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -913,16 +913,22 @@ void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted,
 void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
 
 #define DEFINE_KVM_GHCB_ACCESSORS(field)						\
-	static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
-	{									\
-		return test_bit(GHCB_BITMAP_IDX(field),				\
-				(unsigned long *)&svm->sev_es.valid_bitmap);	\
-	}									\
-										\
-	static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
-	{									\
-		return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0;	\
-	}									\
+static __always_inline u64 kvm_ghcb_get_##field(struct ghcb *ghcb)			\
+{											\
+	return READ_ONCE(ghcb->save.field);						\
+}											\
+											\
+static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm)	\
+{											\
+	return test_bit(GHCB_BITMAP_IDX(field),						\
+			(unsigned long *)&svm->sev_es.valid_bitmap);			\
+}											\
+											\
+static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm,	\
+							   struct ghcb *ghcb)		\
+{											\
+	return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(ghcb) : 0;	\
+}
 
 DEFINE_KVM_GHCB_ACCESSORS(cpl)
 DEFINE_KVM_GHCB_ACCESSORS(rax)
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 03/41] KVM: SEV: Validate XCR0 provided by guest in GHCB
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's
software model in order to validate the new XCR0 against KVM's view of
the supported XCR0.  Allowing garbage is thankfully mostly benign, as
kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected
state, xstate_required_size() will simply provide garbage back to the
guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS
will only harm the guest (setting XCR0 will fail).

However, allowing the guest to put junk into a field that KVM assumes is
valid is a CVE waiting to happen.  And as a bonus, using the proper API
eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty.

Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits
if XCR0 isn't actually changing (relative to KVM's previous snapshot).
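
For reference, an abridged sketch of the validation that __kvm_set_xcr()
performs (illustrative, not a verbatim copy of arch/x86/kvm/x86.c):

  if (index != XCR_XFEATURE_ENABLED_MASK)         /* only XCR0 exists */
          return 1;
  if (!(xcr0 & XFEATURE_MASK_FP))                 /* x87 must stay enabled */
          return 1;
  if ((xcr0 & XFEATURE_MASK_YMM) && !(xcr0 & XFEATURE_MASK_SSE))
          return 1;                               /* AVX requires SSE */
  if (xcr0 & ~(vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP))
          return 1;                               /* no unsupported bits */

sev_es_sync_from_ghcb() deliberately ignores the return value; on failure,
vcpu->arch.xcr0 simply retains its previous value.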

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm/sev.c          | 6 ++----
 arch/x86/kvm/x86.c              | 3 ++-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cb86f3cca3e9..2762554cbb7b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2209,6 +2209,7 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
 unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 37abbda28685..0cd77a87dd84 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3303,10 +3303,8 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 
 	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
 
-	if (kvm_ghcb_xcr0_is_valid(svm)) {
-		vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
-		vcpu->arch.cpuid_dynamic_bits_dirty = true;
-	}
+	if (kvm_ghcb_xcr0_is_valid(svm))
+		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
 
 	/* Copy the GHCB exit information into the VMCB fields */
 	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d85fbafc679..ba4915456615 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1235,7 +1235,7 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
 }
 #endif
 
-static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 {
 	u64 xcr0 = xcr;
 	u64 old_xcr0 = vcpu->arch.xcr0;
@@ -1279,6 +1279,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__kvm_set_xcr);
 
 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 {
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Enable the KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs, as
well as registers that are not MSRs, through them, along with KVM_GET_REG_LIST
support to enumerate which KVM-defined registers are supported.

This is in preparation for allowing userspace to read/write the guest SSP
register, which is needed for the upcoming CET virtualization support.

Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
added for registers that lack existing KVM uAPIs to access them. The "KVM"
in the name is intended to be vague to give KVM flexibility to include
other potential registers.  More precise names like "SYNTHETIC" and
"SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
be conflated with synthetic guest-visible MSRs) and may put KVM into a
corner (e.g. if KVM wants to change how a KVM-defined register is modeled
internally).

Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).
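
As a worked example of the encoding, the ONE_REG id for MSR_IA32_U_CET
(MSR index 0x6a0) decomposes as follows, per the macros added here and the
documented "0x2030 0002 <msr number:32>" pattern:

  __u64 id = KVM_X86_REG_MSR(0x6a0);      /* MSR_IA32_U_CET */

  /*
   *   KVM_REG_X86             0x2000000000000000
   * | KVM_REG_SIZE_U64        0x0030000000000000
   * | type (MSR == 2) << 32   0x0000000200000000
   * | MSR index               0x00000000000006a0
   * = 0x20300002000006a0
   */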

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst  |   6 +-
 arch/x86/include/uapi/asm/kvm.h |  26 +++++++++
 arch/x86/kvm/x86.c              | 100 ++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ffc350b649ad..abd02675a24d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2908,6 +2908,8 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
 
   0x9030 0000 0002 <reg:16>
 
+x86 MSR registers have the following id bit patterns::
+  0x2030 0002 <msr number:32>
 
 4.69 KVM_GET_ONE_REG
 --------------------
@@ -3588,7 +3590,7 @@ VCPU matching underlying host.
 ---------------------
 
 :Capability: basic
-:Architectures: arm64, mips, riscv
+:Architectures: arm64, mips, riscv, x86 (if KVM_CAP_ONE_REG)
 :Type: vcpu ioctl
 :Parameters: struct kvm_reg_list (in/out)
 :Returns: 0 on success; -1 on error
@@ -3631,6 +3633,8 @@ Note that s390 does not support KVM_GET_REG_LIST for historical reasons
 
 - KVM_REG_S390_GBEA
 
+Note, for x86, all MSRs enumerated by KVM_GET_MSR_INDEX_LIST are supported as
+type KVM_X86_REG_TYPE_MSR, but are NOT enumerated via KVM_GET_REG_LIST.
 
 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
 -----------------------------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..508b713ca52e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -411,6 +411,32 @@ struct kvm_xcrs {
 	__u64 padding[16];
 };
 
+#define KVM_X86_REG_TYPE_MSR		2
+#define KVM_X86_REG_TYPE_KVM		3
+
+#define KVM_X86_KVM_REG_SIZE(reg)						\
+({										\
+	reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0;			\
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg)					\
+({										\
+	__u64 type_size = (__u64)type << 32;					\
+										\
+	type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 :		\
+		     type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) :	\
+		     0;								\
+	type_size;								\
+})
+
+#define KVM_X86_REG_ENCODE(type, index)				\
+	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index)					\
+	KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index)					\
+	KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ba4915456615..771b7c883c66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4735,6 +4735,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_MEMORY_FAULT_INFO:
 	case KVM_CAP_X86_GUEST_MODE:
+	case KVM_CAP_ONE_REG:
 		r = 1;
 		break;
 	case KVM_CAP_PRE_FAULT_MEMORY:
@@ -5913,6 +5914,98 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	}
 }
 
+struct kvm_x86_reg_id {
+	__u32 index;
+	__u8  type;
+	__u8  rsvd1;
+	__u8  rsvd2:4;
+	__u8  size:4;
+	__u8  x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
+{
+	return -EINVAL;
+}
+
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+	u64 val;
+
+	if (do_get_msr(vcpu, msr, &val))
+		return -EINVAL;
+
+	if (put_user(val, user_val))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+	u64 val;
+
+	if (get_user(val, user_val))
+		return -EFAULT;
+
+	if (do_set_msr(vcpu, msr, &val))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+			       void __user *argp)
+{
+	struct kvm_one_reg one_reg;
+	struct kvm_x86_reg_id *reg;
+	u64 __user *user_val;
+	int r;
+
+	if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
+		return -EFAULT;
+
+	if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+		return -EINVAL;
+
+	reg = (struct kvm_x86_reg_id *)&one_reg.id;
+	if (reg->rsvd1 || reg->rsvd2)
+		return -EINVAL;
+
+	if (reg->type == KVM_X86_REG_TYPE_KVM) {
+		r = kvm_translate_kvm_reg(reg);
+		if (r)
+			return r;
+	}
+
+	if (reg->type != KVM_X86_REG_TYPE_MSR)
+		return -EINVAL;
+
+	if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+		return -EINVAL;
+
+	guard(srcu)(&vcpu->kvm->srcu);
+
+	user_val = u64_to_user_ptr(one_reg.addr);
+	if (ioctl == KVM_GET_ONE_REG)
+		r = kvm_get_one_msr(vcpu, reg->index, user_val);
+	else
+		r = kvm_set_one_msr(vcpu, reg->index, user_val);
+
+	return r;
+}
+
+static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+			    struct kvm_reg_list __user *user_list)
+{
+	u64 nr_regs = 0;
+
+	if (put_user(nr_regs, &user_list->n))
+		return -EFAULT;
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -6029,6 +6122,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}
+	case KVM_GET_ONE_REG:
+	case KVM_SET_ONE_REG:
+		r = kvm_get_set_one_reg(vcpu, ioctl, argp);
+		break;
+	case KVM_GET_REG_LIST:
+		r = kvm_get_reg_list(vcpu, argp);
+		break;
 	case KVM_TPR_ACCESS_REPORTING: {
 		struct kvm_tpr_access_ctl tac;
 
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 05/41] KVM: x86: Report XSS as to-be-saved if there are supported features
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add MSR_IA32_XSS to the list of MSRs reported to userspace if supported_xss
is non-zero, i.e. if KVM supports at least one XSS-based feature.

Before the CET virtualization series, the guest's MSR_IA32_XSS is guaranteed
to be 0, i.e. XSAVES/XRSTORS executes in non-root mode with XSS == 0, which
is equivalent to XSAVE/XRSTOR.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 771b7c883c66..3b4258b38ad8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -332,7 +332,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
 	MSR_IA32_UMWAIT_CONTROL,
 
-	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7499,6 +7499,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 06/41] KVM: x86: Check XSS validity against guest CPUIDs
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Chao Gao <chao.gao@intel.com>

Maintain per-guest valid XSS bits and check XSS validity against them rather
than against KVM capabilities, to prevent setting bits that are supported by
KVM but not exposed to a given guest.

Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
host_initiated check.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            | 12 ++++++++++++
 arch/x86/kvm/x86.c              |  7 +++----
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2762554cbb7b..d931d72d23c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -815,7 +815,6 @@ struct kvm_vcpu_arch {
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
-	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
 	u64 perf_capabilities;
@@ -876,6 +875,8 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 ia32_xss;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index ad6cadf09930..46cf616663e6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
 						       struct kvm_cpuid_entry2 *entry,
 						       unsigned int x86_feature,
@@ -424,6 +435,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	}
 
 	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+	vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
 
 	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3b4258b38ad8..5a5af40c06a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3984,15 +3984,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_XSS:
-		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-			return 1;
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+			return KVM_MSR_RET_UNSUPPORTED;
 		/*
 		 * KVM supports exposing PT to the guest, but does not support
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
 		 * XSAVES/XRSTORS to save/restore PT MSRs.
 		 */
-		if (data & ~kvm_caps.supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 07/41] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Update CPUID.(EAX=0DH,ECX=1).EBX to reflect the current required xstate size
whenever the guest's XSS MSR is modified.  CPUID.(EAX=0DH,ECX=1).EBX reports
the required storage size of all enabled xstate features in
(XCR0 | IA32_XSS); the guest can consult this value to allocate a
sufficiently sized xsave buffer.

Note, KVM does not yet support any XSS-based features, i.e. supported_xss
is guaranteed to be zero at this time.

Opportunistically skip CPUID updates if XSS value doesn't change.
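
As a sketch of the guest-side flow this enables (rdmsr/wrmsr/cpuid_count
and alloc_aligned() stand in for whatever helpers a real guest uses):

  u64 xss = rdmsr(MSR_IA32_XSS);

  /* Enable a new supervisor xstate feature... */
  wrmsr(MSR_IA32_XSS, xss | new_feature_bit);

  /* ...re-read CPUID.(EAX=0DH,ECX=1).EBX, which now reflects the storage
   * size required for (XCR0 | IA32_XSS)... */
  cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);

  /* ...and size the XSAVES area accordingly. */
  xsave_area = alloc_aligned(ebx, 64);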

Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 46cf616663e6..b5f87254ced7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+						 vcpu->arch.ia32_xss, true);
 }
 
 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a5af40c06a9..519d58b82f7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3993,6 +3993,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 */
 		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
+		if (vcpu->arch.ia32_xss == data)
+			break;
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 08/41] KVM: x86: Initialize kvm_caps.supported_xss
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Set the initial kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if
XSAVES is supported.  host_xss contains the host-supported xstate feature
bits used for thread FPU context switching, while KVM_SUPPORTED_XSS holds
all XSS feature bits enabled by KVM.  The resulting value represents the
supervisor xstates that are available to the guest and are backed by the
host FPU framework for swapping {guest,host} XSAVE-managed registers/MSRs.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: relocate and enhance comment about PT / XSS[8] ]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 519d58b82f7f..c5e38d6943fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -217,6 +217,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+/*
+ * Note, KVM supports exposing PT to the guest, but does not support context
+ * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
+ * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
+ * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
+ */
+#define KVM_SUPPORTED_XSS     0
+
 bool __read_mostly allow_smaller_maxphyaddr = 0;
 EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
 
@@ -3986,11 +3994,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_XSS:
 		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
 			return KVM_MSR_RET_UNSUPPORTED;
-		/*
-		 * KVM supports exposing PT to the guest, but does not support
-		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
-		 * XSAVES/XRSTORS to save/restore PT MSRs.
-		 */
+
 		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
 		if (vcpu->arch.ia32_xss == data)
@@ -9818,14 +9822,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
+		kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
+	}
+
 	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
 	kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
 
 	rdmsrq_safe(MSR_EFER, &kvm_host.efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
-
 	kvm_init_pmu_capability(ops->pmu_ops);
 
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 09/41] KVM: x86: Load guest FPU state when accessing XSAVE-managed MSRs
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Load the guest's FPU state if userspace is accessing MSRs whose values are
managed by XSAVES.  Introduce two helpers, kvm_{get,set}_xstate_msr(), to
facilitate access to such MSRs.

If MSRs in kvm_caps.supported_xss are passed through to the guest, the guest
MSRs are swapped with the host's values before the vCPU exits to userspace,
and swapped back in after it re-enters the kernel, before the next VM-Entry.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

The two helpers are placed here to make it explicit that accessing
XSAVE-managed MSRs requires special checks and handling to guarantee the
correctness of reads and writes to those MSRs.
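
For context, a hypothetical userspace sequence that exercises this path
(vcpu_fd and error handling elided):

  struct {
          struct kvm_msrs hdr;
          struct kvm_msr_entry entries[1];
  } msrs = {
          .hdr.nmsrs        = 1,
          .entries[0].index = MSR_IA32_PL3_SSP,
  };

  /*
   * __msr_io() sees an XSTATE-managed MSR, loads the guest FPU, accesses
   * the MSR in hardware, then restores the host FPU state.
   */
  ioctl(vcpu_fd, KVM_GET_MSRS, &msrs);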

Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop S_CET, add big comment, move accessors to x86.c]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 86 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5e38d6943fe..a95ca2fbd3a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -3801,6 +3804,66 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.  Note!  S_CET is _not_ context
+ * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
+ * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
+ * the value saved/restored via XSTATE is always the host's value.  That detail
+ * is _extremely_ important, as the guest's S_CET must _never_ be resident in
+ * hardware while executing in the host.  Loading guest values for U_CET and
+ * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
+ * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
+ * privilege levels, i.e. are effectively only consumed by userspace as well.
+ */
+static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	if (!vcpu)
+		return false;
+
+	switch (msr) {
+	case MSR_IA32_U_CET:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
+		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+	default:
+		return false;
+	}
+}
+
+/*
+ * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
+ * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
+ * guest FPU should have been loaded already.
+ */
+static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
+						  struct msr_data *msr_info,
+						  int access)
+{
+	BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
+
+	KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+
+	kvm_fpu_get();
+	if (access == MSR_TYPE_R)
+		rdmsrq(msr_info->index, msr_info->data);
+	else
+		wrmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
+}
+
+static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u32 msr = msr_info->index;
@@ -4551,11 +4614,25 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		/*
+		 * If userspace is accessing one or more XSTATE-managed MSRs,
+		 * temporarily load the guest's FPU state so that the guest's
+		 * MSR value(s) is resident in hardware, i.e. so that KVM can
+		 * get/set the MSR via RDMSR/WRMSR.
+		 */
+		if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
@@ -5965,6 +6042,7 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
 	struct kvm_one_reg one_reg;
 	struct kvm_x86_reg_id *reg;
 	u64 __user *user_val;
+	bool load_fpu;
 	int r;
 
 	if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
@@ -5991,12 +6069,18 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
 
 	guard(srcu)(&vcpu->kvm->srcu);
 
+	load_fpu = is_xstate_managed_msr(vcpu, reg->index);
+	if (load_fpu)
+		kvm_load_guest_fpu(vcpu);
+
 	user_val = u64_to_user_ptr(one_reg.addr);
 	if (ioctl == KVM_GET_ONE_REG)
 		r = kvm_get_one_msr(vcpu, reg->index, user_val);
 	else
 		r = kvm_set_one_msr(vcpu, reg->index, user_val);
 
+	if (load_fpu)
+		kvm_put_guest_fpu(vcpu);
 	return r;
 }
 
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 10/41] KVM: x86: Add fault checks for guest CR4.CET setting
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Check potential faults for CR4.CET setting per Intel SDM requirements.
CET can be enabled if and only if CR0.WP == 1, i.e. setting CR4.CET ==
1 faults if CR0.WP == 0 and setting CR0.WP == 0 fails if CR4.CET == 1.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a95ca2fbd3a9..5653ddfe124e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1176,6 +1176,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	kvm_x86_call(set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1376,6 +1379,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	kvm_x86_call(set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 11/41] KVM: x86: Report KVM supported CET MSRs as to-be-saved
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Add CET MSRs to the list of MSRs reported to userspace if the feature,
i.e. IBT or SHSTK, associated with the MSRs is supported by KVM.

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5653ddfe124e..2c9908bc8b32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -344,6 +344,10 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7598,6 +7602,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!kvm_caps.supported_xss)
 			return;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+			return;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+			return;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.51.0.384.g4c02a37b29-goog



* [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits
From: Sean Christopherson @ 2025-09-12 23:22 UTC
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks.  It provides
two sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT),
to defend against such control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces an instruction (ENDBRANCH) to mark valid target addresses
  of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates
  a #CP. The instruction behaves as a NOP on platforms that have no CET.

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

  MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK pointer table, whose
			entries are indexed by the IST field of the interrupt
			gate descriptor.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores current active SSP.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following
VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following
VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE
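
As an illustrative sketch (KVM's actual wiring lands in later patches in
this series; guest_s_cet/guest_ssp/guest_ssp_tbl are placeholder values),
a hypervisor opting in would do something along the lines of:

  vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_CET_STATE);
  vmcs_writel(GUEST_S_CET, guest_s_cet);
  vmcs_writel(GUEST_SSP, guest_ssp);
  vmcs_writel(GUEST_INTR_SSP_TABLE, guest_ssp_tbl);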

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..ce10a7e2d3d9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE                  0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -119,6 +120,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -369,6 +371,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -381,6 +386,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-15  6:55   ` Xiaoyao Li
  2025-09-12 23:22 ` [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
                   ` (30 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Enable a guest shadow stack pointer (SSP) access interface via new uAPIs.
The guest SSP is a hardware register with a corresponding VMCS field that
saves and restores the guest value on VM-{Exit,Entry}. KVM handles SSP as
a fake/synthetic MSR for userspace access.

Use a translation helper to map the SSP synthetic index to the
KVM-internal MSR index so that userspace doesn't need to be aware of
KVM's management of synthetic MSRs, and to avoid conflicts.
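
A minimal userspace sketch (not part of the patch) of reading the guest
SSP through the new interface; the 0x2030000300000000 encoding is
KVM_X86_REG_KVM(KVM_REG_GUEST_SSP) per the documentation change below,
and vcpu_fd is assumed to be an open vCPU file descriptor:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int read_guest_ssp(int vcpu_fd, uint64_t *ssp)
{
	struct kvm_one_reg reg = {
		.id   = 0x2030000300000000ULL,	/* KVM-defined reg, index 0 */
		.addr = (uintptr_t)ssp,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}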

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst  |  8 ++++++++
 arch/x86/include/uapi/asm/kvm.h |  3 +++
 arch/x86/kvm/x86.c              | 23 +++++++++++++++++++++--
 arch/x86/kvm/x86.h              | 10 ++++++++++
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd02675a24d..6ae24c5ca559 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2911,6 +2911,14 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
 x86 MSR registers have the following id bit patterns::
   0x2030 0002 <msr number:32>
 
+Following are the KVM-defined registers for x86:
+
+======================= ========= =============================================
+    Encoding            Register  Description
+======================= ========= =============================================
+  0x2030 0003 0000 0000 SSP       Shadow Stack Pointer
+======================= ========= =============================================
+
 4.69 KVM_GET_ONE_REG
 --------------------
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 508b713ca52e..8cc79eca34b2 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -437,6 +437,9 @@ struct kvm_xcrs {
 #define KVM_X86_REG_KVM(index)					\
 	KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
 
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP	0
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c9908bc8b32..460ceae11495 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6017,7 +6017,15 @@ struct kvm_x86_reg_id {
 
 static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
 {
-	return -EINVAL;
+	switch (reg->index) {
+	case KVM_REG_GUEST_SSP:
+		reg->type = KVM_X86_REG_TYPE_MSR;
+		reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
 }
 
 static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
@@ -6097,11 +6105,22 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
 static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
 			    struct kvm_reg_list __user *user_list)
 {
-	u64 nr_regs = 0;
+	u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
+	u64 user_nr_regs;
+
+	if (get_user(user_nr_regs, &user_list->n))
+		return -EFAULT;
 
 	if (put_user(nr_regs, &user_list->n))
 		return -EFAULT;
 
+	if (user_nr_regs < nr_regs)
+		return -E2BIG;
+
+	if (nr_regs &&
+	    put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
+		return -EFAULT;
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 786e36fcd0fb..a7c9c72fca93 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -101,6 +101,16 @@ do {											\
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
 
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP	0x4b564dff
+
 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)
 {
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (12 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-16  7:07   ` Xiaoyao Li
  2025-09-17  7:52   ` Binbin Wu
  2025-09-12 23:22 ` [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
                   ` (29 subsequent siblings)
  43 siblings, 2 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Add an emulation interface for CET MSR accesses. The emulation code is
split into a common part and a vendor-specific part. The former performs
common checks for MSRs, e.g. accessibility and data validity, then routes
the operation either to the XSAVE-managed MSRs via the helpers or to the
CET VMCS fields.

In hardware, SSP can only be read via RDSSP; writing it requires
destructive and potentially faulting operations such as
SAVEPREVSSP/RSTORSSP or SETSSBSY/CLRSSBSY. Let the host instead use a
pseudo-MSR that is just a wrapper for the GUEST_SSP field of the VMCS.
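
For reference, the IA32_{U,S}_CET bit layout that the validity checks in
this patch operate on (a sketch of the architectural layout; the CET_*
names below match arch/x86/include/asm/msr-index.h):

#define CET_SHSTK_EN		BIT_ULL(0)	/* enable shadow stacks */
#define CET_WRSS_EN		BIT_ULL(1)	/* enable WRSS{D,Q} writes */
#define CET_ENDBR_EN		BIT_ULL(2)	/* enable indirect branch tracking */
#define CET_LEG_IW_EN		BIT_ULL(3)	/* legacy code interwork enable */
#define CET_NO_TRACK_EN		BIT_ULL(4)	/* honor the NOTRACK prefix */
#define CET_SUPPRESS_DISABLE	BIT_ULL(5)	/* disable IBT suppression */
/* bits 9:6 are reserved */
#define CET_SUPPRESS		BIT_ULL(10)	/* suppress IBT faults */
#define CET_WAIT_ENDBR		BIT_ULL(11)	/* tracker state is WAIT_ENDBR */
/* bits 63:12 hold the legacy code page bitmap base (linear address) */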

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 18 ++++++++++++
 arch/x86/kvm/x86.c     | 64 ++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.h     | 23 +++++++++++++++
 3 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 227b45430ad8..4fc1dbba2eb0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2106,6 +2106,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_S_CET:
+		msr_info->data = vmcs_readl(GUEST_S_CET);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		msr_info->data = vmcs_readl(GUEST_SSP);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmx_guest_debugctl_read();
 		break;
@@ -2424,6 +2433,15 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_S_CET:
+		vmcs_writel(GUEST_S_CET, data);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		vmcs_writel(GUEST_SSP, data);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data & PMU_CAP_LBR_FMT) {
 			if ((data & PMU_CAP_LBR_FMT) !=
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 460ceae11495..0b67b1b0e361 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1890,6 +1890,44 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 
 		data = (u32)data;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!kvm_is_valid_u_s_cet(vcpu, data))
+			return 1;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+		/*
+		 * Note that the MSR emulation here is flawed when a vCPU
+		 * doesn't support the Intel 64 architecture. The expected
+		 * architectural behavior in this case is that the upper 32
+		 * bits do not exist and should always read '0'. However,
+		 * because the actual hardware on which the virtual CPU is
+		 * running does support Intel 64, XRSTORS/XSAVES in the
+		 * guest could observe behavior that violates the
+		 * architecture. Intercepting XRSTORS/XSAVES for this
+		 * special case isn't deemed worthwhile.
+		 */
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		/*
+		 * MSR_IA32_INT_SSP_TAB is not present on processors that do
+		 * not support Intel 64 architecture.
+		 */
+		if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (is_noncanonical_msr_address(data, vcpu))
+			return 1;
+		/* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+		if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+			return 1;
+		break;
 	}
 
 	msr.data = data;
@@ -1934,6 +1972,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
 	}
 
 	msr.index = index;
@@ -3864,12 +3916,12 @@ static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
 	kvm_fpu_put();
 }
 
-static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
 }
 
-static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
 }
@@ -4255,6 +4307,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_set_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4604,6 +4660,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_get_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a7c9c72fca93..076eccba0f7e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -710,4 +710,27 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+#define CET_US_RESERVED_BITS		GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+	if (data & CET_US_RESERVED_BITS)
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (data & CET_US_SHSTK_MASK_BITS))
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+	    (data & CET_US_IBT_MASK_BITS))
+		return false;
+	if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+		return false;
+	/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+		return false;
+
+	return true;
+}
 #endif
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (13 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-16  7:37   ` Xiaoyao Li
  2025-09-17  7:53   ` Binbin Wu
  2025-09-12 23:22 ` [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
                   ` (28 subsequent siblings)
  43 siblings, 2 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Save the CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates the
architectural behavior when the guest enters/leaves SMM, i.e. saves
registers to SMRAM on SMM entry and reloads them on SMM exit. Per the
SDM, SSP is one such register on 64-bit architectures, so add support
for saving/reloading SSP.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/smm.c | 8 ++++++++
 arch/x86/kvm/smm.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 5dd8a1646800..b0b14ba37f9a 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -269,6 +269,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
 
 	smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 }
 #endif
 
@@ -558,6 +562,10 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 	kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
 	ctxt->interruptibility = (u8)smstate->int_shadow;
 
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+		return X86EMUL_UNHANDLEABLE;
+
 	return X86EMUL_CONTINUE;
 }
 #endif
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index 551703fbe200..db3c88f16138 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
 	u32 smbase;
 	u32 reserved4[5];
 
-	/* ssp and svm_* fields below are not implemented by KVM */
 	u64 ssp;
+	/* svm_* fields below are not implemented by KVM */
 	u64 svm_guest_pat;
 	u64 svm_host_efer;
 	u64 svm_host_cr4;
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (14 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-15 17:21   ` Xin Li
                     ` (2 more replies)
  2025-09-12 23:22 ` [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
                   ` (27 subsequent siblings)
  43 siblings, 3 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Enable/disable interception of CET MSRs based on the associated feature
configuration.

Pass through CET MSRs that are managed by XSAVE, as they cannot be
intercepted without also intercepting XSAVE. However, intercepting XSAVE
would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.

Note, this MSR design introduces an architectural limitation on SHSTK and
IBT control for the guest: when SHSTK is exposed, IBT is also available
to the guest from an architectural perspective, since IBT relies on a
subset of the SHSTK-relevant MSRs.
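
To make the coupling above concrete, here is the interception rule that
the hunk below implements, restated as a standalone predicate (a sketch,
not code from the patch):

#include <stdbool.h>

/* shstk/ibt: the guest's capabilities; is_pl_ssp: MSR_IA32_PL{0..3}_SSP */
static bool intercept_cet_msr(bool shstk, bool ibt, bool is_pl_ssp)
{
	if (is_pl_ssp)
		return !shstk;		/* SHSTK-only, XSAVE-managed MSRs */

	/* MSR_IA32_{U,S}_CET are shared by SHSTK and IBT. */
	return !shstk && !ibt;
}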

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4fc1dbba2eb0..adf5af30e537 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4101,6 +4101,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 
 void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
+	bool intercept;
+
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
@@ -4146,6 +4148,23 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+			    !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
+	}
+
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
 	 * filtered by userspace.
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (15 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-16  7:44   ` Xiaoyao Li
  2025-09-17  8:48   ` Xiaoyao Li
  2025-09-12 23:22 ` [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
                   ` (26 subsequent siblings)
  43 siblings, 2 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Explicitly set constant values in the HOST_{S_CET,SSP,INTR_SSP_TABLE}
fields. Kernel IBT is supported and the MSR_IA32_S_CET setting is static
post-boot (the exception is the BIOS-call case, which a vCPU thread never
crosses), so KVM doesn't need to refresh the HOST_S_CET field before
every VM-Enter/VM-Exit sequence.

Host supervisor shadow stacks are not currently enabled and SSP is not
accessible in kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stacks are enabled for
CPL3, SSP is reloaded from PL3_SSP before the CPU returns to userspace.
See SDM Vol. 2A/B, Chapters 3/4 (SYSCALL/SYSRET/SYSENTER/SYSEXIT/RDSSP/
CALL, etc.) for details.

Prevent KVM module loading if the host's supervisor shadow stack enable
bit (SHSTK_EN) is set in MSR_IA32_S_CET, as KVM cannot co-exist with it
correctly.

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: snapshot host S_CET if SHSTK *or* IBT is supported]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/capabilities.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 15 +++++++++++++++
 arch/x86/kvm/x86.c              | 12 ++++++++++++
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 5316c27f6099..7d290b2cb0f4 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -103,6 +103,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index adf5af30e537..e8155635cb42 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4320,6 +4320,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 	if (cpu_has_load_ia32_efer())
 		vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
+
+	/*
+	 * Supervisor shadow stacks are not enabled on the host, i.e. the
+	 * host's IA32_S_CET.SHSTK_EN bit is guaranteed to be 0, and per
+	 * the SDM (RDSSP instruction), SSP is not readable at CPL0, so
+	 * resetting the two registers to 0 at VM-Exit does no harm to
+	 * kernel execution. When execution returns to userspace, SSP is
+	 * reloaded from IA32_PL3_SSP. See SDM Vol.2A/B Chapters 3 and 4
+	 * for details.
+	 */
+	if (cpu_has_load_cet_ctrl()) {
+		vmcs_writel(HOST_S_CET, kvm_host.s_cet);
+		vmcs_writel(HOST_SSP, 0);
+		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+	}
 }
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b67b1b0e361..15f208c44cbd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9982,6 +9982,18 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -EIO;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT)) {
+		rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
+		/*
+		 * Linux doesn't yet support supervisor shadow stacks (SSS), so
+		 * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+		 * clobber the host values.  Yell and refuse to load if SSS is
+		 * unexpectedly enabled, e.g. to avoid crashing the host.
+		 */
+		if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
+			return -EIO;
+	}
+
 	memset(&kvm_caps, 0, sizeof(kvm_caps));
 
 	x86_emulator_cache = kvm_alloc_emulator_cache();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 076eccba0f7e..65cbd454c4f1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,7 @@ struct kvm_host_values {
 	u64 efer;
 	u64 xcr0;
 	u64 xss;
+	u64 s_cet;
 	u64 arch_capabilities;
 };
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (16 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-17  8:16   ` Chao Gao
                     ` (2 more replies)
  2025-09-12 23:22 ` [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
                   ` (25 subsequent siblings)
  43 siblings, 3 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
affected by Shadow Stacks and/or Indirect Branch Tracking when said
features are enabled in the guest, as fully emulating CET would require
significant complexity for no practical benefit (KVM shouldn't need to
emulate branch instructions on modern hosts).  Simply doing nothing isn't
an option as that would allow a malicious entity to subvert CET
protections via the emulator.

Note!  On far transfers, do NOT consult the current privilege level and
instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
can be in play for the target privilege level, i.e. checking the current
privilege could get a false negative, and KVM doesn't know the target
privilege level until emulation gets under way.

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Mathias Krause <minipli@grsecurity.net>
Cc: John Allen <john.allen@amd.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/emulate.c | 58 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 542d3664afa3..e4be54a677b0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,8 @@
 #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
 #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
+#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4068,9 +4070,9 @@ static const struct opcode group4[] = {
 static const struct opcode group5[] = {
 	F(DstMem | SrcNone | Lock,		em_inc),
 	F(DstMem | SrcNone | Lock,		em_dec),
-	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
-	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
-	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
+	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk, em_call_near_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_call_far),
+	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
 	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
 	I(SrcMem | Stack | TwoMemOp,		em_push), D(Undefined),
 };
@@ -4332,11 +4334,11 @@ static const struct opcode opcode_table[256] = {
 	/* 0xC8 - 0xCF */
 	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
 	I(Stack | IsBranch, em_leave),
-	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
-	I(ImplicitOps | IsBranch, em_ret_far),
-	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+	I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+	I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
 	D(ImplicitOps | No64 | IsBranch),
-	II(ImplicitOps | IsBranch, em_iret, iret),
+	II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
 	/* 0xD0 - 0xD7 */
 	G(Src2One | ByteOp, group2), G(Src2One, group2),
 	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4352,7 +4354,7 @@ static const struct opcode opcode_table[256] = {
 	I2bvIP(SrcImmUByte | DstAcc, em_in,  in,  check_perm_in),
 	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
 	/* 0xE8 - 0xEF */
-	I(SrcImm | NearBranch | IsBranch, em_call),
+	I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
 	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
 	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
 	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4371,7 +4373,7 @@ static const struct opcode opcode_table[256] = {
 static const struct opcode twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	G(0, group6), GD(0, &group7), N, N,
-	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+	N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_syscall),
 	II(ImplicitOps | Priv, em_clts, clts), N,
 	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
 	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4402,8 +4404,8 @@ static const struct opcode twobyte_table[256] = {
 	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
 	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
 	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
-	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
-	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+	I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_sysenter),
+	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
 	N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x40 - 0x4F */
@@ -4941,6 +4943,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 	if (ctxt->d == 0)
 		return EMULATION_FAILED;
 
+	/*
+	 * Reject emulation if KVM might need to emulate shadow stack updates
+	 * and/or indirect branch tracking enforcement, which the emulator
+	 * doesn't support.
+	 */
+	if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
+	    ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+		u64 u_cet = 0, s_cet = 0;
+
+		/*
+		 * Check both User and Supervisor on far transfers as inter-
+		 * privilege level transfers are impacted by CET at the target
+		 * privilege levels, and that is not known at this time.  The
+		 * expectation is that the guest will not require emulation
+		 * of any CET-affected instructions at any privilege level.
+		 */
+		if (!(opcode.flags & NearBranch))
+			u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+		else if (ctxt->ops->cpl(ctxt) == 3)
+			u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+		else
+			s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+
+		if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
+		    (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
+			return EMULATION_FAILED;
+
+		if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
+			return EMULATION_FAILED;
+
+		if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
+			return EMULATION_FAILED;
+	}
+
 	ctxt->execute = opcode.u.execute;
 
 	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (17 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-18  1:57   ` Binbin Wu
  2025-09-18  2:18   ` Binbin Wu
  2025-09-12 23:22 ` [PATCH v15 20/41] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
                   ` (24 subsequent siblings)
  43 siblings, 2 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Expose CET features to the guest if KVM and the host can support them,
and clear the CPUID feature bits if they cannot.

Set the CPUID feature bits so that CET features are available in guest
CPUID. Add CR4.CET bit support in order to allow the guest to set the
CET master control bit.

Disable KVM's CET support if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET.

Set the CET load bits in the VM_ENTRY/VM_EXIT control fields to keep the
guest's CET xstates isolated from the host's.

On platforms with VMX_BASIC[bit56] == 0, injecting #CP at VMX entry with
an error code will fail, while if VMX_BASIC[bit56] == 1, #CP injection
with or without an error code is allowed. Disable the CET feature bits if
the MSR bit is cleared so that a nested VMM can inject #CP if and only if
VMX_BASIC[bit56] == 1.

Don't expose CET if either of the {U,S}_CET xstate bits is cleared in the
host XSS, or if XSAVES isn't supported.

CET MSRs are reset to 0 on RESET, power-up, and INIT, so clear the guest
CET xsave-area fields so that the guest's CET MSRs are reset accordingly
after those events.

Meanwhile, explicitly disable SHSTK and IBT for SVM, as KVM's CET
enabling for SVM is not yet ready.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/kvm/cpuid.c            |  2 ++
 arch/x86/kvm/svm/svm.c          |  4 ++++
 arch/x86/kvm/vmx/capabilities.h |  5 +++++
 arch/x86/kvm/vmx/vmx.c          | 30 +++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  6 ++++--
 arch/x86/kvm/x86.c              | 22 +++++++++++++++++++---
 arch/x86/kvm/x86.h              |  3 +++
 9 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d931d72d23c9..8c106c8c9081 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -142,7 +142,7 @@
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
 			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
-			  | X86_CR4_LAM_SUP))
+			  | X86_CR4_LAM_SUP | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ce10a7e2d3d9..c85c50019523 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -134,6 +134,7 @@
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC		BIT_ULL(56)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b5f87254ced7..ee05b876c656 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -944,6 +944,7 @@ void kvm_set_cpu_caps(void)
 		VENDOR_F(WAITPKG),
 		F(SGX_LC),
 		F(BUS_LOCK_DETECT),
+		X86_64_F(SHSTK),
 	);
 
 	/*
@@ -970,6 +971,7 @@ void kvm_set_cpu_caps(void)
 		F(AMX_INT8),
 		F(AMX_BF16),
 		F(FLUSH_L1D),
+		F(IBT),
 	);
 
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1650de78648a..d4e1fdcf56da 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5223,6 +5223,10 @@ static __init void svm_set_cpu_caps(void)
 	kvm_caps.supported_perf_cap = 0;
 	kvm_caps.supported_xss = 0;
 
+	/* KVM doesn't yet support CET virtualization for SVM. */
+	kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+	kvm_cpu_cap_clear(X86_FEATURE_IBT);
+
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
 		kvm_cpu_cap_set(X86_FEATURE_SVM);
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 7d290b2cb0f4..47b0dec8665a 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -76,6 +76,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
 	return	vmcs_config.basic & VMX_BASIC_INOUT;
 }
 
+static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
+{
+	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 static inline bool cpu_has_virtual_nmis(void)
 {
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e8155635cb42..8d2186d6549f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2615,6 +2615,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		{ VM_ENTRY_LOAD_IA32_EFER,		VM_EXIT_LOAD_IA32_EFER },
 		{ VM_ENTRY_LOAD_BNDCFGS,		VM_EXIT_CLEAR_BNDCFGS },
 		{ VM_ENTRY_LOAD_IA32_RTIT_CTL,		VM_EXIT_CLEAR_IA32_RTIT_CTL },
+		{ VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE },
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -4882,6 +4883,14 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
 
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, 0);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
+	}
+	if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+	    kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, 0);
+
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
 	vpid_sync_context(vmx->vpid);
@@ -6349,6 +6358,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
 
+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6379,6 +6392,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 	if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+		       vmcs_readl(HOST_INTR_SSP_TABLE));
 
 	pr_err("*** Control State ***\n");
 	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
@@ -7963,7 +7980,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
@@ -7975,6 +7991,18 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	/*
+	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
+	 * enforce CET HW behaviors in emulator. On platforms with
+	 * VMX_BASIC[bit56] == 0, injecting #CP at VMX entry with an error
+	 * code fails, so disable CET in this case too.
+	 */
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
 }
 
 static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 24d65dac5e89..08a9a0075404 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -484,7 +484,8 @@ static inline u8 vmx_get_rvi(void)
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_CET_STATE)
 
 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
@@ -506,7 +507,8 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_LOAD_CET_STATE)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 15f208c44cbd..c78acab2ff3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -226,7 +226,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
  * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
  * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
  */
-#define KVM_SUPPORTED_XSS     0
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
+				 XFEATURE_MASK_CET_KERNEL)
 
 bool __read_mostly allow_smaller_maxphyaddr = 0;
 EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -10080,6 +10081,20 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+					    XFEATURE_MASK_CET_KERNEL);
+
+	if ((kvm_caps.supported_xss & (XFEATURE_MASK_CET_USER |
+	     XFEATURE_MASK_CET_KERNEL)) !=
+	    (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+					    XFEATURE_MASK_CET_KERNEL);
+	}
+
 	if (kvm_caps.has_tsc_control) {
 		/*
 		 * Make sure the user can only configure tsc_khz values that
@@ -12735,10 +12750,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/*
 	 * On INIT, only select XSTATE components are zeroed, most components
 	 * are unchanged.  Currently, the only components that are zeroed and
-	 * supported by KVM are MPX related.
+	 * supported by KVM are MPX and CET related.
 	 */
 	xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
-			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
+			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL);
 	if (!xfeatures_mask)
 		return;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 65cbd454c4f1..f3dc77f006f9 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -680,6 +680,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		__reserved_bits |= X86_CR4_PCIDE;       \
 	if (!__cpu_has(__c, X86_FEATURE_LAM))           \
 		__reserved_bits |= X86_CR4_LAM_SUP;     \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&       \
+	    !__cpu_has(__c, X86_FEATURE_IBT))           \
+		__reserved_bits |= X86_CR4_CET;         \
 	__reserved_bits;                                \
 })
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 20/41] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (18 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-18  2:27   ` Binbin Wu
  2025-09-12 23:22 ` [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
                   ` (23 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Per the SDM (Vol. 3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector"

Modify the has_error_code check performed before injecting events into
the nested guest.  Reject an error code outright when the guest is in
real mode or the event is not a hardware exception; otherwise, enforce
the vector-based consistency check only if the vCPU doesn't enumerate
bit 56 in VMX_BASIC.  This makes the logic consistent with the SDM.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 28 +++++++++++++++++++---------
 arch/x86/kvm/vmx/nested.h |  5 +++++
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2156c9a854f4..14f9822b611d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1272,9 +1272,10 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
 	const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
 				 VMX_BASIC_INOUT |
-				 VMX_BASIC_TRUE_CTLS;
+				 VMX_BASIC_TRUE_CTLS |
+				 VMX_BASIC_NO_HW_ERROR_CODE_CC;
 
-	const u64 reserved_bits = GENMASK_ULL(63, 56) |
+	const u64 reserved_bits = GENMASK_ULL(63, 57) |
 				  GENMASK_ULL(47, 45) |
 				  BIT_ULL(31);
 
@@ -2949,7 +2950,6 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
 		u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
 		bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
-		bool should_have_error_code;
 		bool urg = nested_cpu_has2(vmcs12,
 					   SECONDARY_EXEC_UNRESTRICTED_GUEST);
 		bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
@@ -2966,12 +2966,20 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 			return -EINVAL;
 
-		/* VM-entry interruption-info field: deliver error code */
-		should_have_error_code =
-			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
-		if (CC(has_error_code != should_have_error_code))
-			return -EINVAL;
+		/*
+		 * Cannot deliver error code in real mode or if the interrupt
+		 * type is not hardware exception. For other cases, do the
+		 * consistency check only if the vCPU doesn't enumerate
+		 * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+		 */
+		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+			if (CC(has_error_code))
+				return -EINVAL;
+		} else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+			if (CC(has_error_code !=
+			       x86_exception_has_error_code(vector)))
+				return -EINVAL;
+		}
 
 		/* VM-entry exception error code */
 		if (CC(has_error_code &&
@@ -7214,6 +7222,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 	msrs->basic |= VMX_BASIC_TRUE_CTLS;
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+	if (cpu_has_vmx_basic_no_hw_errcode())
+		msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 6eedcfc91070..983484d42ebf 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -309,6 +309,11 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 	       __kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
+{
+	return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
 #define nested_guest_cr4_valid	nested_cr4_valid
 #define nested_host_cr4_valid	nested_cr4_valid
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (19 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 20/41] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
@ 2025-09-12 23:22 ` Sean Christopherson
  2025-09-15 17:45   ` Xin Li
  2025-09-18  4:48   ` Xin Li
  2025-09-12 23:23 ` [PATCH v15 22/41] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
                   ` (22 subsequent siblings)
  43 siblings, 2 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Yang Weijiang <weijiang.yang@intel.com>

Set up the CET MSRs, the related VM_ENTRY/EXIT control bits, and the
fixed CR4 settings to enable CET for nested VMs.

vmcs12 and vmcs02 need to be synced when L2 exits to L1 and when L1
resumes L2, so that each can observe the correct CET state.

Note that consistency checks on CET state during VM-Entry will be added
later to keep this patch from becoming too large. Advertising the new CET
VM_ENTRY/EXIT control bits is also deferred until after the consistency
checks are added.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 77 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmcs12.c |  6 +++
 arch/x86/kvm/vmx/vmcs12.h | 14 ++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 +
 arch/x86/kvm/vmx/vmx.h    |  3 ++
 5 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 14f9822b611d..51d69f368689 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -721,6 +721,24 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_MPERF, MSR_TYPE_R);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &map);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -2521,6 +2539,32 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 	}
 }
 
+static void vmcs_read_cet_state(struct kvm_vcpu *vcpu, u64 *s_cet,
+				u64 *ssp, u64 *ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		*s_cet = vmcs_readl(GUEST_S_CET);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		*ssp = vmcs_readl(GUEST_SSP);
+		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+	}
+}
+
+static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
+				 u64 ssp, u64 ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, s_cet);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, ssp);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+	}
+}
+
 static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
 	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
@@ -2637,6 +2681,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+		vmcs_write_cet_state(&vmx->vcpu, vmcs12->guest_s_cet,
+				     vmcs12->guest_ssp, vmcs12->guest_ssp_tbl);
+
 	set_cr4_guest_host_mask(vmx);
 }
 
@@ -2676,6 +2724,13 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
 		vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
 	}
+
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		vmcs_write_cet_state(vcpu, vmx->nested.pre_vmenter_s_cet,
+				     vmx->nested.pre_vmenter_ssp,
+				     vmx->nested.pre_vmenter_ssp_tbl);
+
 	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
@@ -3552,6 +3607,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
 
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		vmcs_read_cet_state(vcpu, &vmx->nested.pre_vmenter_s_cet,
+				    &vmx->nested.pre_vmenter_ssp,
+				    &vmx->nested.pre_vmenter_ssp_tbl);
+
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
@@ -4635,6 +4696,10 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
 	if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
 		vmcs12->guest_ia32_efer = vcpu->arch.efer;
+
+	vmcs_read_cet_state(&vmx->vcpu, &vmcs12->guest_s_cet,
+			    &vmcs12->guest_ssp,
+			    &vmcs12->guest_ssp_tbl);
 }
 
 /*
@@ -4760,6 +4825,18 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
 		vmcs_write64(GUEST_BNDCFGS, 0);
 
+	/*
+	 * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set;
+	 * otherwise, CET state should be retained across VM-exit, i.e.,
+	 * guest values should be propagated from vmcs12 to vmcs01.
+	 */
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
+		vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+				     vmcs12->host_ssp_tbl);
+	else
+		vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+				     vmcs12->guest_ssp_tbl);
+
 	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
 		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
 		vcpu->arch.pat = vmcs12->host_ia32_pat;
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 56fd150a6f24..4ad6b16525b9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8d2186d6549f..989008f5307e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7749,6 +7749,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,	      ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,	      edx, feature_bit(IBT));
 
 	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
 	cr4_fixed1_update(X86_CR4_LAM_SUP,    eax, feature_bit(LAM));
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 08a9a0075404..ecfdba666465 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -181,6 +181,9 @@ struct nested_vmx {
 	 */
 	u64 pre_vmenter_debugctl;
 	u64 pre_vmenter_bndcfgs;
+	u64 pre_vmenter_s_cet;
+	u64 pre_vmenter_ssp;
+	u64 pre_vmenter_ssp_tbl;
 
 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 22/41] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (20 preceding siblings ...)
  2025-09-12 23:22 ` [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 23/41] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Chao Gao <chao.gao@intel.com>

Add consistency checks for CR4.CET and CR0.WP in the guest-state and
host-state areas of vmcs12, so that a configuration with CR4.CET set and
CR0.WP clear results in VM-entry failure, as architecturally required.
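
As a hedged illustration (not code from this patch), the requirement
reduces to a single predicate, applied identically to the guest-state and
host-state fields:

	/* VM-entry fails if CR4.CET is set while CR0.WP is clear. */
	static bool cet_cr_pair_valid(u64 cr0, u64 cr4)
	{
		return !(cr4 & X86_CR4_CET) || (cr0 & X86_CR0_WP);
	}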

Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 51d69f368689..a73f38d7eea1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3111,6 +3111,9 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
 		return -EINVAL;
 
+	if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
 	    CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
 		return -EINVAL;
@@ -3225,6 +3228,9 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	    CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
 		return -EINVAL;
 
+	if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
 	    (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
 	     CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 23/41] KVM: nVMX: Add consistency checks for CET states
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (21 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 22/41] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 24/41] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Chao Gao <chao.gao@intel.com>

Introduce consistency checks for CET states during nested VM-entry.

A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET states during VM-entry as documented in SDM
Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
VM-entry to emulate the architectural behavior.

In summary, there are three kinds of checks on guest/host CET states
during VM-entry:

A. Checks applied to both guest states and host states:

 * The IA32_S_CET field must not have any reserved bits set; bits 10
   (SUPPRESS) and 11 (TRACKER) cannot both be set.
 * SSP should not have bits 1:0 set.
 * The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.

B. Checks applied to host states only

 * IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
   after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32
   bits cleared.

C. Checks applied to guest states only:

 * The IA32_S_CET MSR and SSP are not required to be canonical (canonical
   means bits 63:N-1 are identical, where N is the CPU's maximum
   linear-address width), but bits 63:N of SSP must be identical; see the
   sketch below.
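
The "bits 63:N identical" requirement can be expressed as a canonicality
check with the address width widened by one bit.  A minimal sketch,
assuming the kernel's sign-extension form of a canonical check (the helper
below is illustrative, not necessarily this series' exact code):

	/* Canonical for width V <=> value survives sign-extension from bit V-1. */
	static inline bool canonical_for_width(u64 la, int vaddr_bits)
	{
		return ((s64)la << (64 - vaddr_bits) >> (64 - vaddr_bits)) == la;
	}

	/* Width N + 1 demands only that bits 63:N all match bit N. */
	bool ssp_upper_ok = canonical_for_width(ssp, max_host_virt_addr_bits() + 1);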

Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 47 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a73f38d7eea1..edb3b877a0f6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3101,6 +3101,17 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
 	return !__is_canonical_address(la, l1_address_bits_on_exit);
 }
 
+static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
+{
+	if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
+		return false;
+
+	if (is_noncanonical_msr_address(ssp_tbl, vcpu))
+		return false;
+
+	return true;
+}
+
 static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 				       struct vmcs12 *vmcs12)
 {
@@ -3170,6 +3181,26 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 	}
 
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+		if (CC(!is_valid_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+					   vmcs12->host_ssp_tbl)))
+			return -EINVAL;
+
+		/*
+		 * IA32_S_CET and SSP must be canonical if the host will
+		 * enter 64-bit mode after VM-exit; otherwise, higher
+		 * 32-bits must be all 0s.
+		 */
+		if (ia32e) {
+			if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+			    CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+				return -EINVAL;
+		} else {
+			if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+				return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -3280,6 +3311,22 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	     CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
 		return -EINVAL;
 
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+		if (CC(!is_valid_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+					   vmcs12->guest_ssp_tbl)))
+			return -EINVAL;
+
+		/*
+		 * Guest SSP must have 63:N bits identical, rather than
+		 * be canonical (i.e., 63:N-1 bits identical), where N is
+		 * the CPU's maximum linear-address width. Similar to
+		 * is_noncanonical_msr_address(), use the host's
+		 * linear-address width.
+		 */
+		if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1)))
+			return -EINVAL;
+	}
+
 	if (nested_check_guest_non_reg_state(vmcs12))
 		return -EINVAL;
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 24/41] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (22 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 23/41] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Chao Gao <chao.gao@intel.com>

Advertise the new VM-Entry/Exit control bits now that all nested support
for CET virtualization, including consistency checks, is in place.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index edb3b877a0f6..d7e2fb30fc1a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7176,7 +7176,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -7198,7 +7198,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (23 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 24/41] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-15 17:56   ` Xin Li
  2025-09-12 23:23 ` [PATCH v15 26/41] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
                   ` (18 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: John Allen <john.allen@amd.com>

Emulate shadow stack MSR accesses by reading from and writing to the
corresponding fields in the VMCB.

Signed-off-by: John Allen <john.allen@amd.com>
[sean: mark VMCB_CET dirty/clean as appropriate]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 21 +++++++++++++++++++++
 arch/x86/kvm/svm/svm.h |  3 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d4e1fdcf56da..0c0115b52e5c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2767,6 +2767,15 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (guest_cpuid_is_intel_compatible(vcpu))
 			msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
 		break;
+	case MSR_IA32_S_CET:
+		msr_info->data = svm->vmcb->save.s_cet;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		msr_info->data = svm->vmcb->save.isst_addr;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		msr_info->data = svm->vmcb->save.ssp;
+		break;
 	case MSR_TSC_AUX:
 		msr_info->data = svm->tsc_aux;
 		break;
@@ -2999,6 +3008,18 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
 		svm->sysenter_esp_hi = guest_cpuid_is_intel_compatible(vcpu) ? (data >> 32) : 0;
 		break;
+	case MSR_IA32_S_CET:
+		svm->vmcb->save.s_cet = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		svm->vmcb->save.isst_addr = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		svm->vmcb->save.ssp = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
 	case MSR_TSC_AUX:
 		/*
 		 * TSC_AUX is always virtualized for SEV-ES guests when the
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c2316adde3cc..a42e95883b45 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -74,6 +74,7 @@ enum {
 			  * AVIC PHYSICAL_TABLE pointer,
 			  * AVIC LOGICAL_TABLE pointer
 			  */
+	VMCB_CET,	 /* S_CET, SSP, ISST_ADDR */
 	VMCB_SW = 31,    /* Reserved for hypervisor/software use */
 };
 
@@ -82,7 +83,7 @@ enum {
 	(1U << VMCB_ASID) | (1U << VMCB_INTR) |			\
 	(1U << VMCB_NPT) | (1U << VMCB_CR) | (1U << VMCB_DR) |	\
 	(1U << VMCB_DT) | (1U << VMCB_SEG) | (1U << VMCB_CR2) |	\
-	(1U << VMCB_LBR) | (1U << VMCB_AVIC) |			\
+	(1U << VMCB_LBR) | (1U << VMCB_AVIC) | (1U << VMCB_CET) | \
 	(1U << VMCB_SW))
 
 /* TPR and CR2 are always written before VMRUN */
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 26/41] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (24 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 27/41] KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions Sean Christopherson
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
simply copies the entire save area).  SVM doesn't provide a way to
disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
nested support before advertising SHSTK to userspace.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 826473f2d7c7..a6443feab252 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -636,6 +636,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(vmcb02, VMCB_DT);
 	}
 
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
+		vmcb02->save.s_cet  = vmcb12->save.s_cet;
+		vmcb02->save.isst_addr = vmcb12->save.isst_addr;
+		vmcb02->save.ssp = vmcb12->save.ssp;
+		vmcb_mark_dirty(vmcb02, VMCB_CET);
+	}
+
 	kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
 
 	svm_set_efer(vcpu, svm->nested.save.efer);
@@ -1044,6 +1052,12 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 	to_save->rsp = from_save->rsp;
 	to_save->rip = from_save->rip;
 	to_save->cpl = 0;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		to_save->s_cet  = from_save->s_cet;
+		to_save->isst_addr = from_save->isst_addr;
+		to_save->ssp = from_save->ssp;
+	}
 }
 
 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -1111,6 +1125,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->save.dr6    = svm->vcpu.arch.dr6;
 	vmcb12->save.cpl    = vmcb02->save.cpl;
 
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcb12->save.s_cet	= vmcb02->save.s_cet;
+		vmcb12->save.isst_addr	= vmcb02->save.isst_addr;
+		vmcb12->save.ssp	= vmcb02->save.ssp;
+	}
+
 	vmcb12->control.int_state         = vmcb02->control.int_state;
 	vmcb12->control.exit_code         = vmcb02->control.exit_code;
 	vmcb12->control.exit_code_hi      = vmcb02->control.exit_code_hi;
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 27/41] KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (25 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 26/41] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 28/41] KVM: x86: SVM: Pass through shadow stack MSRs as appropriate Sean Christopherson
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: John Allen <john.allen@amd.com>

Add shadow stack VMCB fields to dump_vmcb. PL0_SSP, PL1_SSP, PL2_SSP,
PL3_SSP, and U_CET are part of the SEV-ES save area and are encrypted,
but can be decrypted and dumped if the guest policy allows debugging.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0c0115b52e5c..c0a16481b9c3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3410,6 +3410,10 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	       "rip:", save->rip, "rflags:", save->rflags);
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "rsp:", save->rsp, "rax:", save->rax);
+	pr_err("%-15s %016llx %-13s %016llx\n",
+	       "s_cet:", save->s_cet, "ssp:", save->ssp);
+	pr_err("%-15s %016llx\n",
+	       "isst_addr:", save->isst_addr);
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "star:", save01->star, "lstar:", save01->lstar);
 	pr_err("%-15s %016llx %-13s %016llx\n",
@@ -3434,6 +3438,13 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 		pr_err("%-15s %016llx\n",
 		       "sev_features", vmsa->sev_features);
 
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl0_ssp:", vmsa->pl0_ssp, "pl1_ssp:", vmsa->pl1_ssp);
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl2_ssp:", vmsa->pl2_ssp, "pl3_ssp:", vmsa->pl3_ssp);
+		pr_err("%-15s %016llx\n",
+		       "u_cet:", vmsa->u_cet);
+
 		pr_err("%-15s %016llx %-13s %016llx\n",
 		       "rax:", vmsa->rax, "rbx:", vmsa->rbx);
 		pr_err("%-15s %016llx %-13s %016llx\n",
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 28/41] KVM: x86: SVM: Pass through shadow stack MSRs as appropriate
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (26 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 27/41] KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: John Allen <john.allen@amd.com>

Pass through XSAVE-managed CET MSRs on SVM when KVM supports shadow
stack.  These cannot be intercepted without also intercepting XSAVE,
which would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it remains intercepted.
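
For illustration only (not guest code from this series): because the
XSAVE-managed CET state components are accessible via XSAVES/XRSTORS, a
guest kernel could sidestep a WRMSR intercept entirely by editing the
in-memory XSAVE image.  The CET_U component number (XSS bit 11) follows
the SDM; pl3_ssp_offset and new_ssp are hypothetical stand-ins.

	u8 area[4096] __aligned(64);
	u64 rfbm = BIT_ULL(11);	/* CET_U: MSR_IA32_U_CET + PL3_SSP */

	/* Dump CET_U state to memory... */
	asm volatile("xsaves (%0)" :: "r"(area), "a"((u32)rfbm), "d"((u32)(rfbm >> 32)) : "memory");
	/* ...patch the PL3_SSP image at its compacted-format offset... */
	*(u64 *)(area + pl3_ssp_offset) = new_ssp;
	/* ...and reload it without ever executing WRMSR. */
	asm volatile("xrstors (%0)" :: "r"(area), "a"((u32)rfbm), "d"((u32)(rfbm >> 32)) : "memory");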

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c0a16481b9c3..dc4d34e6af33 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -844,6 +844,17 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
 	}
 
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		bool shstk_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, !shstk_enabled);
+	}
+
 	if (sev_es_guest(vcpu->kvm))
 		sev_es_recalc_msr_intercepts(vcpu);
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (27 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 28/41] KVM: x86: SVM: Pass through shadow stack MSRs as appropriate Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-16 18:55   ` John Allen
  2025-09-12 23:23 ` [PATCH v15 30/41] KVM: SVM: Enable shadow stack virtualization for SVM Sean Christopherson
                   ` (14 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Synchronize XSS from the GHCB to KVM's internal tracking if the guest
marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
of XSS in order to compute the required XSTATE size when emulating
CPUID.0xD.0x1 for the guest.
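
For context, the required size is a function of the enabled feature bits.
A condensed sketch, loosely following KVM's xstate_required_size() (the
compacted-format accounting here is an approximation):

	static u32 xstate_size_sketch(u64 xcr0, u64 xss)
	{
		u32 size = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;	/* legacy region + header */
		u32 eax, ebx, ecx, edx;
		int i;

		for (i = 2; i < 64; i++) {
			if (!((xcr0 | xss) & BIT_ULL(i)))
				continue;
			cpuid_count(0xD, i, &eax, &ebx, &ecx, &edx);
			/* EAX = component size, ECX[1] = 64-byte alignment */
			if (ecx & BIT(1))
				size = ALIGN(size, 64);
			size += eax;
		}
		return size;
	}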

Treat the incoming XSS change as an emulated write, i.e. validate the
guest-provided value, to avoid letting the guest load garbage into KVM's
tracking.  Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Note, emulating the change as an MSR write also takes care of side effects,
e.g. marking dynamic CPUID bits as dirty.

Suggested-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 3 +++
 arch/x86/kvm/svm/svm.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd77a87dd84..0cd32df7b9b6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	if (kvm_ghcb_xcr0_is_valid(svm))
 		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
 
+	if (kvm_ghcb_xss_is_valid(svm))
+		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
+
 	/* Copy the GHCB exit information into the VMCB fields */
 	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
 	control->exit_code = lower_32_bits(exit_code);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a42e95883b45..10d764878bcc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -942,5 +942,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
 DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
 DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)
 
 #endif
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 30/41] KVM: SVM: Enable shadow stack virtualization for SVM
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (28 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 31/41] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: John Allen <john.allen@amd.com>

Enable shadow stack virtualization for SVM by removing the explicit
clearing of the SHSTK CPU capability and the zeroing of
kvm_caps.supported_xss.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dc4d34e6af33..f2e96cf72938 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5264,10 +5264,7 @@ static __init void svm_set_cpu_caps(void)
 	kvm_set_cpu_caps();
 
 	kvm_caps.supported_perf_cap = 0;
-	kvm_caps.supported_xss = 0;
 
-	/* KVM doesn't yet support CET virtualization for SVM. */
-	kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
 	kvm_cpu_cap_clear(X86_FEATURE_IBT);
 
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 31/41] KVM: x86: Add human friendly formatting for #XM, and #VE
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (29 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 30/41] KVM: SVM: Enable shadow stack virtualization for SVM Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 32/41] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add XM_VECTOR and VE_VECTOR pretty-printing for
trace_kvm_inj_exception().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/trace.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 57d79fd31df0..06da19b370c5 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -461,8 +461,8 @@ TRACE_EVENT(kvm_inj_virq,
 
 #define kvm_trace_sym_exc						\
 	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),	\
-	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),		\
-	EXS(MF), EXS(AC), EXS(MC)
+	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF),	\
+	EXS(AC), EXS(MC), EXS(XM), EXS(VE)
 
 /*
  * Tracepoint for kvm interrupt injection:
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 32/41] KVM: x86: Define Control Protection Exception (#CP) vector
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (30 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 31/41] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 33/41] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Sean Christopherson
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
along with human friendly formatting for trace_kvm_inj_exception().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/uapi/asm/kvm.h | 1 +
 arch/x86/kvm/trace.h            | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 8cc79eca34b2..6faf0dcedf74 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -35,6 +35,7 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 06da19b370c5..322913dda626 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -462,7 +462,7 @@ TRACE_EVENT(kvm_inj_virq,
 #define kvm_trace_sym_exc						\
 	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),	\
 	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF),	\
-	EXS(AC), EXS(MC), EXS(XM), EXS(VE)
+	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP)
 
 /*
  * Tracepoint for kvm interrupt injection:
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 33/41] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (31 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 32/41] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 34/41] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add {HV,VC,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception,
VMM Communication Exception, and SVM Security Exception vectors, along with
human friendly formatting for trace_kvm_inj_exception().

Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is
also unlikely to ever be used.  Add the architectural collateral mostly for
completeness, and on the off chance that hardware goes off the rails.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/uapi/asm/kvm.h | 4 ++++
 arch/x86/kvm/trace.h            | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 6faf0dcedf74..2f0386d79f6e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -37,6 +37,10 @@
 #define VE_VECTOR 20
 #define CP_VECTOR 21
 
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
+
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 322913dda626..e79bc9cb7162 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -462,7 +462,8 @@ TRACE_EVENT(kvm_inj_virq,
 #define kvm_trace_sym_exc						\
 	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),	\
 	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF),	\
-	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP)
+	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP),			\
+	EXS(HV), EXS(VC), EXS(SX)
 
 /*
  * Tracepoint for kvm interrupt injection:
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 34/41] KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (32 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 33/41] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-15  9:07   ` Chao Gao
  2025-09-12 23:23 ` [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
                   ` (9 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/include/x86/processor.h     |  2 ++
 .../testing/selftests/kvm/lib/x86/processor.c | 33 +++++++++++++++++++
 .../selftests/kvm/x86/hyperv_features.c       | 16 ++++-----
 .../selftests/kvm/x86/vmx_pmu_caps_test.c     |  4 +--
 .../selftests/kvm/x86/xcr0_cpuid_test.c       | 12 +++----
 5 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index efcc4b1de523..2ad84f3809e8 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -34,6 +34,8 @@ extern uint64_t guest_tsc_khz;
 
 #define NMI_VECTOR		0x02
 
+const char *ex_str(int vector);
+
 #define X86_EFLAGS_FIXED	 (1u << 1)
 
 #define X86_CR4_VME		(1ul << 0)
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 3b63c99f7b96..f9182dbd07f2 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -23,6 +23,39 @@ bool host_cpu_is_intel;
 bool is_forced_emulation_enabled;
 uint64_t guest_tsc_khz;
 
+const char *ex_str(int vector)
+{
+	switch (vector) {
+#define VEC_STR(v) case v##_VECTOR: return "#" #v
+	case DE_VECTOR: return "no exception";
+	case KVM_MAGIC_DE_VECTOR: return "#DE";
+	VEC_STR(DB);
+	VEC_STR(NMI);
+	VEC_STR(BP);
+	VEC_STR(OF);
+	VEC_STR(BR);
+	VEC_STR(UD);
+	VEC_STR(NM);
+	VEC_STR(DF);
+	VEC_STR(TS);
+	VEC_STR(NP);
+	VEC_STR(SS);
+	VEC_STR(GP);
+	VEC_STR(PF);
+	VEC_STR(MF);
+	VEC_STR(AC);
+	VEC_STR(MC);
+	VEC_STR(XM);
+	VEC_STR(VE);
+	VEC_STR(CP);
+	VEC_STR(HV);
+	VEC_STR(VC);
+	VEC_STR(SX);
+	default: return "#??";
+#undef VEC_STR
+	}
+}
+
 static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
 {
 	fprintf(stream, "%*srax: 0x%.16llx rbx: 0x%.16llx "
diff --git a/tools/testing/selftests/kvm/x86/hyperv_features.c b/tools/testing/selftests/kvm/x86/hyperv_features.c
index 068e9c69710d..99d327084172 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_features.c
@@ -54,12 +54,12 @@ static void guest_msr(struct msr_data *msr)
 
 	if (msr->fault_expected)
 		__GUEST_ASSERT(vector == GP_VECTOR,
-			       "Expected #GP on %sMSR(0x%x), got vector '0x%x'",
-			       msr->write ? "WR" : "RD", msr->idx, vector);
+			       "Expected #GP on %sMSR(0x%x), got %s",
+			       msr->write ? "WR" : "RD", msr->idx, ex_str(vector));
 	else
 		__GUEST_ASSERT(!vector,
-			       "Expected success on %sMSR(0x%x), got vector '0x%x'",
-			       msr->write ? "WR" : "RD", msr->idx, vector);
+			       "Expected success on %sMSR(0x%x), got %s",
+			       msr->write ? "WR" : "RD", msr->idx, ex_str(vector));
 
 	if (vector || is_write_only_msr(msr->idx))
 		goto done;
@@ -102,12 +102,12 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
 	vector = __hyperv_hypercall(hcall->control, input, output, &res);
 	if (hcall->ud_expected) {
 		__GUEST_ASSERT(vector == UD_VECTOR,
-			       "Expected #UD for control '%lu', got vector '0x%x'",
-			       hcall->control, vector);
+			       "Expected #UD for control '%lu', got %s",
+			       hcall->control, ex_str(vector));
 	} else {
 		__GUEST_ASSERT(!vector,
-			       "Expected no exception for control '%lu', got vector '0x%x'",
-			       hcall->control, vector);
+			       "Expected no exception for control '%lu', got %s",
+			       hcall->control, ex_str(vector));
 		GUEST_ASSERT_EQ(res, hcall->expect);
 	}
 
diff --git a/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
index a1f5ff45d518..7d37f0cd4eb9 100644
--- a/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
@@ -56,8 +56,8 @@ static void guest_test_perf_capabilities_gp(uint64_t val)
 	uint8_t vector = wrmsr_safe(MSR_IA32_PERF_CAPABILITIES, val);
 
 	__GUEST_ASSERT(vector == GP_VECTOR,
-		       "Expected #GP for value '0x%lx', got vector '0x%x'",
-		       val, vector);
+		       "Expected #GP for value '0x%lx', got %s",
+		       val, ex_str(vector));
 }
 
 static void guest_code(uint64_t current_val)
diff --git a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
index c8a5c5e51661..d038c1571729 100644
--- a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
+++ b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
@@ -81,13 +81,13 @@ static void guest_code(void)
 
 	vector = xsetbv_safe(0, XFEATURE_MASK_FP);
 	__GUEST_ASSERT(!vector,
-		       "Expected success on XSETBV(FP), got vector '0x%x'",
-		       vector);
+		       "Expected success on XSETBV(FP), got %s",
+		       ex_str(vector));
 
 	vector = xsetbv_safe(0, supported_xcr0);
 	__GUEST_ASSERT(!vector,
-		       "Expected success on XSETBV(0x%lx), got vector '0x%x'",
-		       supported_xcr0, vector);
+		       "Expected success on XSETBV(0x%lx), got %s",
+		       supported_xcr0, ex_str(vector));
 
 	for (i = 0; i < 64; i++) {
 		if (supported_xcr0 & BIT_ULL(i))
@@ -95,8 +95,8 @@ static void guest_code(void)
 
 		vector = xsetbv_safe(0, supported_xcr0 | BIT_ULL(i));
 		__GUEST_ASSERT(vector == GP_VECTOR,
-			       "Expected #GP on XSETBV(0x%llx), supported XCR0 = %lx, got vector '0x%x'",
-			       BIT_ULL(i), supported_xcr0, vector);
+			       "Expected #GP on XSETBV(0x%llx), supported XCR0 = %lx, got %s",
+			       BIT_ULL(i), supported_xcr0, ex_str(vector));
 	}
 
 	GUEST_DONE();
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (33 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 34/41] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-15  8:22   ` Chao Gao
  2025-09-12 23:23 ` [PATCH v15 36/41] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
                   ` (8 subsequent siblings)
  43 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, and expect success/failure based on whether or not the
vCPU supports the MSR according to supported CPUID.

Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide additional testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all of which are extremely difficult to validate
in KUT.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm    |   1 +
 tools/testing/selftests/kvm/x86/msrs_test.c | 267 ++++++++++++++++++++
 2 files changed, 268 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/msrs_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 66c82f51837b..1d1b77dabb36 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86 += x86/kvm_clock_test
 TEST_GEN_PROGS_x86 += x86/kvm_pv_test
 TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
 TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
+TEST_GEN_PROGS_x86 += x86/msrs_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/platform_info_test
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
new file mode 100644
index 000000000000..dcb429cf1440
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -0,0 +1,267 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/msr-index.h>
+
+#include <stdint.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+
+/* Use HYPERVISOR for MSRs that are emulated unconditionally (HYPERVISOR itself is always set for KVM guests). */
+#define X86_FEATURE_NONE X86_FEATURE_HYPERVISOR
+
+struct kvm_msr {
+	const struct kvm_x86_cpu_feature feature;
+	const char *name;
+	const u64 reset_val;
+	const u64 write_val;
+	const u64 rsvd_val;
+	const u32 index;
+};
+
+#define __MSR_TEST(msr, str, val, rsvd, reset, feat)			\
+{									\
+	.index = msr,							\
+	.name = str,							\
+	.write_val = val,						\
+	.rsvd_val = rsvd,						\
+	.reset_val = reset,						\
+	.feature = X86_FEATURE_ ##feat,					\
+}
+
+#define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat)			\
+	__MSR_TEST(msr, #msr, val, rsvd, reset, feat)
+
+#define MSR_TEST(msr, val, rsvd, feat)					\
+	__MSR_TEST(msr, #msr, val, rsvd, 0, feat)
+
+/*
+ * Note, use a page aligned value for the canonical value so that the value
+ * is compatible with MSRs that use bits 11:0 for things other than addresses.
+ */
+static const u64 canonical_val = 0x123456789000ull;
+
+#define MSR_TEST_CANONICAL(msr, feat)					\
+	__MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
+
+/*
+ * The main struct must be scoped to a function due to the use of structures to
+ * define features.  For the global structure, allocate enough space for the
+ * foreseeable future without getting too ridiculous, to minimize maintenance
+ * costs (bumping the array size every time an MSR is added is really annoying).
+ */
+static struct kvm_msr msrs[128];
+static int idx;
+
+static u64 fixup_rdmsr_val(u32 msr, u64 want)
+{
+	/* AMD CPUs drop bits 63:32, and KVM is supposed to emulate that. */
+	if (host_cpu_is_amd &&
+	    (msr == MSR_IA32_SYSENTER_ESP || msr == MSR_IA32_SYSENTER_EIP))
+		want &= GENMASK_ULL(31, 0);
+
+	return want;
+}
+
+static void __rdmsr(u32 msr, u64 want)
+{
+	u64 val;
+	u8 vec;
+
+	vec = rdmsr_safe(msr, &val);
+	__GUEST_ASSERT(!vec, "Unexpected %s on RDMSR(0x%x)", ex_str(vec), msr);
+
+	__GUEST_ASSERT(val == want, "Wanted 0x%lx from RDMSR(0x%x), got 0x%lx",
+		       want, msr, val);
+}
+
+static void __wrmsr(u32 msr, u64 val)
+{
+	u8 vec;
+
+	vec = wrmsr_safe(msr, val);
+	__GUEST_ASSERT(!vec, "Unexpected %s on WRMSR(0x%x, 0x%lx)",
+		       ex_str(vec), msr, val);
+	__rdmsr(msr, fixup_rdmsr_val(msr, val));
+}
+
+static void guest_test_supported_msr(const struct kvm_msr *msr)
+{
+	__rdmsr(msr->index, msr->reset_val);
+	__wrmsr(msr->index, msr->write_val);
+	GUEST_SYNC(fixup_rdmsr_val(msr->index, msr->write_val));
+
+	__rdmsr(msr->index, msr->reset_val);
+}
+
+static void guest_test_unsupported_msr(const struct kvm_msr *msr)
+{
+	u64 val;
+	u8 vec;
+
+	vec = rdmsr_safe(msr->index, &val);
+	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s",
+		       msr->index, ex_str(vec));
+
+	vec = wrmsr_safe(msr->index, msr->write_val);
+	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+		       msr->index, msr->write_val, ex_str(vec));
+
+	GUEST_SYNC(0);
+}
+
+static void guest_main(void)
+{
+	for (;;) {
+		const struct kvm_msr *msr = &msrs[READ_ONCE(idx)];
+
+		if (this_cpu_has(msr->feature))
+			guest_test_supported_msr(msr);
+		else
+			guest_test_unsupported_msr(msr);
+
+		/*
+		 * Skip the "reserved" value check if the CPU will truncate
+		 * the written value (e.g. SYSENTER on AMD), in which case the
+		 * upper bits are simply ignored.
+		 */
+		if (msr->rsvd_val &&
+		    msr->rsvd_val == fixup_rdmsr_val(msr->index, msr->rsvd_val)) {
+			u8 vec = wrmsr_safe(msr->index, msr->rsvd_val);
+
+			__GUEST_ASSERT(vec == GP_VECTOR,
+				       "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+				       msr->index, msr->rsvd_val, ex_str(vec));
+		}
+
+		GUEST_SYNC(msr->reset_val);
+	}
+}
+
+static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
+{
+	u64 reset_val = msrs[idx].reset_val;
+	u32 msr = msrs[idx].index;
+	u64 val;
+
+	if (!kvm_cpu_has(msrs[idx].feature))
+		return;
+
+	val = vcpu_get_msr(vcpu, msr);
+	TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+		    guest_val, msr, val);
+
+	vcpu_set_msr(vcpu, msr, reset_val);
+
+	val = vcpu_get_msr(vcpu, msr);
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+		    reset_val, msr, val);
+}
+
+static void do_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	for (;;) {
+		vcpu_run(vcpu);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_SYNC:
+			host_test_msr(vcpu, uc.args[1]);
+			return;
+		case UCALL_PRINTF:
+			pr_info("%s", uc.buffer);
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+		case UCALL_DONE:
+			TEST_FAIL("Unexpected UCALL_DONE");
+		default:
+			TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+		}
+	}
+}
+
+static void __vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
+{
+	int i;
+
+	for (i = 0; i < NR_VCPUS; i++)
+		do_vcpu_run(vcpus[i]);
+}
+
+static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
+{
+	__vcpus_run(vcpus, NR_VCPUS);
+	__vcpus_run(vcpus, NR_VCPUS);
+}
+
+#define MISC_ENABLES_RESET_VAL (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
+
+static void test_msrs(void)
+{
+	const struct kvm_msr __msrs[] = {
+		MSR_TEST_NON_ZERO(MSR_IA32_MISC_ENABLE,
+				  MISC_ENABLES_RESET_VAL | MSR_IA32_MISC_ENABLE_FAST_STRING,
+				  MSR_IA32_MISC_ENABLE_FAST_STRING, MISC_ENABLES_RESET_VAL, NONE),
+		MSR_TEST_NON_ZERO(MSR_IA32_CR_PAT, 0x07070707, 0, 0x7040600070406, NONE),
+
+		MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE),
+		/*
+		 * SYSENTER_{ESP,EIP} are technically non-canonical on Intel,
+		 * but KVM doesn't emulate that behavior on emulated writes,
+		 * i.e. this test will observe different behavior if the MSR
+		 * writes are handled by hardware vs. KVM.  KVM's behavior is
+		 * intended (though far from ideal), so don't bother testing
+		 * non-canonical values.
+		 */
+		MSR_TEST(MSR_IA32_SYSENTER_ESP, canonical_val, 0, NONE),
+		MSR_TEST(MSR_IA32_SYSENTER_EIP, canonical_val, 0, NONE),
+
+		MSR_TEST_CANONICAL(MSR_FS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_GS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_KERNEL_GS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_LSTAR, LM),
+		MSR_TEST_CANONICAL(MSR_CSTAR, LM),
+		MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
+
+		MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL1_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL2_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
+	};
+
+	/*
+	 * Create two vCPUs, but run them on the same task, to validate KVM's
+	 * context switching of MSR state.  Don't pin the task to a pCPU to
+	 * also validate KVM's handling of cross-pCPU migration.
+	 */
+	const int NR_VCPUS = 2;
+	struct kvm_vcpu *vcpus[NR_VCPUS];
+	struct kvm_vm *vm;
+
+	kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
+	kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
+	memcpy(msrs, __msrs, sizeof(__msrs));
+
+	vm = vm_create_with_vcpus(NR_VCPUS, guest_main, vcpus);
+
+	sync_global_to_guest(vm, msrs);
+
+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+		sync_global_to_guest(vm, idx);
+
+		vcpus_run(vcpus, NR_VCPUS);
+		vcpus_run(vcpus, NR_VCPUS);
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	test_msrs();
+}
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 36/41] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (34 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 37/41] KVM: selftests: Extend MSRs test to validate vCPUs without supported features Sean Christopherson
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to
handle due to the MSRs existing if IBT *or* SHSTK is supported.  To deal
with Intel's wonderful decision to bundle IBT and SHSTK under CET, track
the "second" feature and skip RDMSR #GP tests to avoid false failures when
running on a CPU with only one of IBT or SHSTK.
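
The architectural rule being exercised, as an illustrative pseudo-predicate
(the helper names below are not the selftests API):

	/* The {S,U}_CET MSRs exist if either feature is present... */
	bool cet_msrs_exist = cpu_has(IBT) || cpu_has(SHSTK);

	/* ...but each feature only makes its own enable bit(s) settable. */
	u64 settable_bits = (cpu_has(IBT)   ? CET_ENDBR_EN : 0) |
			    (cpu_has(SHSTK) ? CET_SHSTK_EN : 0);

	/* RDMSR can #GP only if *neither* feature exists, hence "feature2". */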

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index dcb429cf1440..095d49d07235 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -11,6 +11,7 @@
 
 struct kvm_msr {
 	const struct kvm_x86_cpu_feature feature;
+	const struct kvm_x86_cpu_feature feature2;
 	const char *name;
 	const u64 reset_val;
 	const u64 write_val;
@@ -18,7 +19,7 @@ struct kvm_msr {
 	const u32 index;
 };
 
-#define __MSR_TEST(msr, str, val, rsvd, reset, feat)			\
+#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2)		\
 {									\
 	.index = msr,							\
 	.name = str,							\
@@ -26,14 +27,21 @@ struct kvm_msr {
 	.rsvd_val = rsvd,						\
 	.reset_val = reset,						\
 	.feature = X86_FEATURE_ ##feat,					\
+	.feature2 = X86_FEATURE_ ##f2,					\
 }
 
+#define __MSR_TEST(msr, str, val, rsvd, reset, feat)			\
+	____MSR_TEST(msr, str, val, rsvd, reset, feat, feat)
+
 #define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat)			\
 	__MSR_TEST(msr, #msr, val, rsvd, reset, feat)
 
 #define MSR_TEST(msr, val, rsvd, feat)					\
 	__MSR_TEST(msr, #msr, val, rsvd, 0, feat)
 
+#define MSR_TEST2(msr, val, rsvd, feat, f2)				\
+	____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2)
+
 /*
  * Note, use a page aligned value for the canonical value so that the value
  * is compatible with MSRs that use bits 11:0 for things other than addresses.
@@ -98,10 +106,18 @@ static void guest_test_unsupported_msr(const struct kvm_msr *msr)
 	u64 val;
 	u8 vec;
 
+	/*
+	 * Skip the RDMSR #GP test if the secondary feature is supported: the MSR
+	 * then exists, and only the to-be-written value depends on the primary feature.
+	 */
+	if (this_cpu_has(msr->feature2))
+		goto skip_rdmsr_gp;
+
 	vec = rdmsr_safe(msr->index, &val);
 	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s",
 		       msr->index, ex_str(vec));
 
+skip_rdmsr_gp:
 	vec = wrmsr_safe(msr->index, msr->write_val);
 	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
 		       msr->index, msr->write_val, ex_str(vec));
@@ -224,6 +240,10 @@ static void test_msrs(void)
 		MSR_TEST_CANONICAL(MSR_CSTAR, LM),
 		MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
 
+		MSR_TEST2(MSR_IA32_S_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+		MSR_TEST2(MSR_IA32_S_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
+		MSR_TEST2(MSR_IA32_U_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+		MSR_TEST2(MSR_IA32_U_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
 		MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
 		MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
 		MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
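
For reference, here is roughly what one of the new MSR_TEST2() entries
expands to under the macros above; this is a hand expansion for
illustration (the variable name is made up), not code from the patch:

	/*
	 * Hand expansion of
	 *   MSR_TEST2(MSR_IA32_S_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT)
	 * per ____MSR_TEST().
	 */
	struct kvm_msr s_cet_entry = {
		.index     = MSR_IA32_S_CET,
		.name      = "MSR_IA32_S_CET",	/* from #msr */
		.write_val = CET_SHSTK_EN,	/* legal iff the primary feature exists */
		.rsvd_val  = CET_RESERVED,	/* must #GP on WRMSR */
		.reset_val = 0,
		.feature   = X86_FEATURE_SHSTK,	/* primary feature */
		.feature2  = X86_FEATURE_IBT,	/* skips the RDMSR #GP test if present */
	};
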
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 37/41] KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (35 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 36/41] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 38/41] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add a third vCPU to the MSRs test that runs with all features disabled in
the vCPU's CPUID model, to verify that KVM does the right thing with
respect to emulating accesses to MSRs that shouldn't exist.  Use the same
VM to verify that KVM is honoring the vCPU model, e.g. isn't looking at
per-VM state when emulating MSR accesses.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/msrs_test.c | 28 ++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 095d49d07235..98892467438c 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -254,12 +254,17 @@ static void test_msrs(void)
 		MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
 	};
 
+	const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE;
+	const struct kvm_x86_cpu_feature feat_lm = X86_FEATURE_LM;
+
 	/*
-	 * Create two vCPUs, but run them on the same task, to validate KVM's
+	 * Create three vCPUs, but run them on the same task, to validate KVM's
 	 * context switching of MSR state.  Don't pin the task to a pCPU to
-	 * also validate KVM's handling of cross-pCPU migration.
+	 * also validate KVM's handling of cross-pCPU migration.  Use the full
+	 * set of features for the first two vCPUs, but clear all features in
+	 * the third vCPU in order to test both positive and negative paths.
 	 */
-	const int NR_VCPUS = 2;
+	const int NR_VCPUS = 3;
 	struct kvm_vcpu *vcpus[NR_VCPUS];
 	struct kvm_vm *vm;
 
@@ -271,6 +276,23 @@ static void test_msrs(void)
 
 	sync_global_to_guest(vm, msrs);
 
+	/*
+	 * Clear features in the "unsupported features" vCPU.  This needs to be
+	 * done before the first vCPU run as KVM's ABI is that guest CPUID is
+	 * immutable once the vCPU has been run.
+	 */
+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+		/*
+		 * Don't clear LM; selftests are 64-bit only, and KVM doesn't
+		 * honor LM=0 for MSRs that are supposed to exist if and only
+		 * if the vCPU is a 64-bit model.  Ditto for NONE; clearing a
+		 * fake feature flag will result in false failures.
+		 */
+		if (memcmp(&msrs[idx].feature, &feat_lm, sizeof(feat_lm)) &&
+		    memcmp(&msrs[idx].feature, &feat_none, sizeof(feat_none)))
+			vcpu_clear_cpuid_feature(vcpus[2], msrs[idx].feature);
+	}
+
 	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
 		sync_global_to_guest(vm, idx);
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 38/41] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (36 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 37/41] KVM: selftests: Extend MSRs test to validate vCPUs without supported features Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 39/41] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed
via ONE_REG and through the dedicated MSR ioctls.  For simplicity, run
the test twice, e.g. instead of trying to get MSR values into the exact
right state when switching write methods.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 98892467438c..53e155ba15d4 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -153,6 +153,9 @@ static void guest_main(void)
 	}
 }
 
+static bool has_one_reg;
+static bool use_one_reg;
+
 static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
 {
 	u64 reset_val = msrs[idx].reset_val;
@@ -166,11 +169,21 @@ static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
 	TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
 		    guest_val, msr, val);
 
-	vcpu_set_msr(vcpu, msr, reset_val);
+	if (use_one_reg)
+		vcpu_set_reg(vcpu, KVM_X86_REG_MSR(msr), reset_val);
+	else
+		vcpu_set_msr(vcpu, msr, reset_val);
 
 	val = vcpu_get_msr(vcpu, msr);
 	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
 		    reset_val, msr, val);
+
+	if (!has_one_reg)
+		return;
+
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_MSR(msr));
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    reset_val, msr, val);
 }
 
 static void do_vcpu_run(struct kvm_vcpu *vcpu)
@@ -305,5 +318,12 @@ static void test_msrs(void)
 
 int main(void)
 {
+	has_one_reg = kvm_has_cap(KVM_CAP_ONE_REG);
+
 	test_msrs();
+
+	if (has_one_reg) {
+		use_one_reg = true;
+		test_msrs();
+	}
 }
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 39/41] KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (37 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 38/41] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 40/41] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
test.  While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
to any particular internal implementation, i.e. to not commit in uAPI to
handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
purposes is a-ok and a natural fit given the semantics of SSP.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/msrs_test.c | 97 ++++++++++++++++++++-
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 53e155ba15d4..6a956cfe0c65 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -17,9 +17,10 @@ struct kvm_msr {
 	const u64 write_val;
 	const u64 rsvd_val;
 	const u32 index;
+	const bool is_kvm_defined;
 };
 
-#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2)		\
+#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2, is_kvm)	\
 {									\
 	.index = msr,							\
 	.name = str,							\
@@ -28,10 +29,11 @@ struct kvm_msr {
 	.reset_val = reset,						\
 	.feature = X86_FEATURE_ ##feat,					\
 	.feature2 = X86_FEATURE_ ##f2,					\
+	.is_kvm_defined = is_kvm,					\
 }
 
 #define __MSR_TEST(msr, str, val, rsvd, reset, feat)			\
-	____MSR_TEST(msr, str, val, rsvd, reset, feat, feat)
+	____MSR_TEST(msr, str, val, rsvd, reset, feat, feat, false)
 
 #define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat)			\
 	__MSR_TEST(msr, #msr, val, rsvd, reset, feat)
@@ -40,7 +42,7 @@ struct kvm_msr {
 	__MSR_TEST(msr, #msr, val, rsvd, 0, feat)
 
 #define MSR_TEST2(msr, val, rsvd, feat, f2)				\
-	____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2)
+	____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2, false)
 
 /*
  * Note, use a page aligned value for the canonical value so that the value
@@ -51,6 +53,9 @@ static const u64 canonical_val = 0x123456789000ull;
 #define MSR_TEST_CANONICAL(msr, feat)					\
 	__MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
 
+#define MSR_TEST_KVM(msr, val, rsvd, feat)				\
+	____MSR_TEST(KVM_REG_ ##msr, #msr, val, rsvd, 0, feat, feat, true)
+
 /*
  * The main struct must be scoped to a function due to the use of structures to
  * define features.  For the global structure, allocate enough space for the
@@ -156,6 +161,83 @@ static void guest_main(void)
 static bool has_one_reg;
 static bool use_one_reg;
 
+#define KVM_X86_MAX_NR_REGS	1
+
+static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg)
+{
+	struct {
+		struct kvm_reg_list list;
+		u64 regs[KVM_X86_MAX_NR_REGS];
+	} regs = {};
+	int r, i;
+
+	/*
+	 * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported
+	 * regs, then the vCPU obviously doesn't support the reg.
+	 */
+	r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list.n);
+	if (!r)
+		return false;
+
+	TEST_ASSERT_EQ(errno, E2BIG);
+
+	/*
+	 * KVM x86 is expected to support enumerating a relatively small number
+	 * of regs.  The majority of registers supported by KVM_{G,S}ET_ONE_REG
+	 * are enumerated via other ioctls, e.g. KVM_GET_MSR_INDEX_LIST.  For
+	 * simplicity, hardcode the maximum number of regs and manually update
+	 * the test as necessary.
+	 */
+	TEST_ASSERT(regs.list.n <= KVM_X86_MAX_NR_REGS,
+		    "KVM reports %llu regs, test expects at most %u regs, stale test?",
+		    regs.list.n, KVM_X86_MAX_NR_REGS);
+
+	vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list.n);
+	for (i = 0; i < regs.list.n; i++) {
+		if (regs.regs[i] == reg)
+			return true;
+	}
+
+	return false;
+}
+
+static void host_test_kvm_reg(struct kvm_vcpu *vcpu)
+{
+	bool has_reg = vcpu_cpuid_has(vcpu, msrs[idx].feature);
+	u64 reset_val = msrs[idx].reset_val;
+	u64 write_val = msrs[idx].write_val;
+	u64 rsvd_val = msrs[idx].rsvd_val;
+	u32 reg = msrs[idx].index;
+	u64 val;
+	int r;
+
+	if (!use_one_reg)
+		return;
+
+	TEST_ASSERT_EQ(vcpu_has_reg(vcpu, KVM_X86_REG_KVM(reg)), has_reg);
+
+	if (!has_reg) {
+		r = __vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg), &val);
+		TEST_ASSERT(r && errno == EINVAL,
+			    "Expected failure on get_reg(0x%x)", reg);
+		rsvd_val = 0;
+		goto out;
+	}
+
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    reset_val, reg, val);
+
+	vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), write_val);
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+	TEST_ASSERT(val == write_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    write_val, reg, val);
+
+out:
+	r = __vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), rsvd_val);
+	TEST_ASSERT(r, "Expected failure on set_reg(0x%x, 0x%lx)", reg, rsvd_val);
+}
+
 static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
 {
 	u64 reset_val = msrs[idx].reset_val;
@@ -265,6 +347,8 @@ static void test_msrs(void)
 		MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
 		MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
 		MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
+
+		MSR_TEST_KVM(GUEST_SSP, canonical_val, NONCANONICAL, SHSTK),
 	};
 
 	const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE;
@@ -280,6 +364,7 @@ static void test_msrs(void)
 	const int NR_VCPUS = 3;
 	struct kvm_vcpu *vcpus[NR_VCPUS];
 	struct kvm_vm *vm;
+	int i;
 
 	kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
 	kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
@@ -307,6 +392,12 @@ static void test_msrs(void)
 	}
 
 	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+		if (msrs[idx].is_kvm_defined) {
+			for (i = 0; i < NR_VCPUS; i++)
+				host_test_kvm_reg(vcpus[i]);
+			continue;
+		}
+
 		sync_global_to_guest(vm, idx);
 
 		vcpus_run(vcpus, NR_VCPUS);
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 40/41] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (38 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 39/41] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-12 23:23 ` [PATCH v15 41/41] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

Add a check in the MSRs test to verify that KVM's reported support for
MSRs with feature bits is consistent between KVM's MSR save/restore lists
and KVM's supported CPUID.

To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
track the "second" feature to avoid false failures when running on a CPU
with only one of IBT or SHSTK.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 6a956cfe0c65..442409e40da0 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -392,12 +392,32 @@ static void test_msrs(void)
 	}
 
 	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
-		if (msrs[idx].is_kvm_defined) {
+		struct kvm_msr *msr = &msrs[idx];
+
+		if (msr->is_kvm_defined) {
 			for (i = 0; i < NR_VCPUS; i++)
 				host_test_kvm_reg(vcpus[i]);
 			continue;
 		}
 
+		/*
+		 * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
+		 * are consistent with respect to MSRs whose existence is
+		 * enumerated via CPUID.  Note, using LM as a dummy feature
+		 * is a-ok here as well, as all MSRs that abuse LM should be
+		 * unconditionally reported in the save/restore list (and
+		 * selftests are 64-bit only).  Note #2, skip the check for
+		 * FS/GS.base MSRs, as they aren't reported in the save/restore
+		 * list since their state is managed via SREGS.
+		 */
+		TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
+			    kvm_msr_is_in_save_restore_list(msr->index) ==
+			    (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
+			    "%s %s save/restore list, but %s according to CPUID", msr->name,
+			    kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't",
+			    (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ?
+			    "supported" : "unsupported");
+
 		sync_global_to_guest(vm, idx);
 
 		vcpus_run(vcpus, NR_VCPUS);
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v15 41/41] KVM: VMX: Make CR4.CET a guest owned bit
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (39 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 40/41] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
@ 2025-09-12 23:23 ` Sean Christopherson
  2025-09-15 13:18 ` [PATCH v15 00/41] KVM: x86: Mega-CET Mathias Krause
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-12 23:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

From: Mathias Krause <minipli@grsecurity.net>

Make CR4.CET a guest-owned bit under VMX by extending
KVM_POSSIBLE_CR4_GUEST_BITS accordingly.

There's no need to intercept changes to CR4.CET, as it's neither
included in KVM's MMU role bits, nor does KVM specifically care about
the actual value of a (nested) guest's CR4.CET, beyond enforcing
architectural constraints, i.e. making sure that CR0.WP=1 if
CR4.CET=1.

Intercepting writes to CR4.CET is particularly bad for grsecurity
kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
make heavy use of read-only kernel objects and use a CPU-local CR0.WP
toggle to override the protection when needed. Under a CET-enabled kernel, this
also requires toggling CR4.CET, hence the motivation to make it
guest-owned.

Using the old test from [1] gives the following runtime numbers (perf
stat -r 5 ssdd 10 50000):

* grsec guest on linux-6.16-rc5 + cet patches:
  2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )

* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
  1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )

Not only does not intercepting CR4.CET make the test run ~35% faster,
it's also more stable with less fluctuation due to fewer VMEXITs.

Therefore, make CR4.CET a guest-owned bit where possible.

This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.

If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.

Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/kvm_cache_regs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 36a8786db291..8ddb01191d6f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS	(X86_CR0_TS | X86_CR0_WP)
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_CET)
 
 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
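
As background on what "guest-owned" buys: KVM only re-reads guest-owned CR4
bits from hardware when it actually needs them, instead of intercepting the
writes. A minimal sketch of the consumer side, modeled on kvm_read_cr4_bits()
from this same header (illustrative, may not match the tree verbatim):

	static __always_inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu,
						       ulong mask)
	{
		ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;

		/*
		 * If a requested bit may be guest-owned and CR4 hasn't been
		 * cached since the last VM-Exit, refresh the cached value
		 * from the VMCS/VMCB before consulting vcpu->arch.cr4.
		 */
		if ((tmask & vcpu->arch.cr4_guest_owned_bits) &&
		    !kvm_register_is_available(vcpu, VCPU_EXREG_CR4))
			kvm_x86_call(cache_reg)(vcpu, VCPU_EXREG_CR4);

		return vcpu->arch.cr4 & mask;
	}
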
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  2025-09-12 23:22 ` [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Sean Christopherson
@ 2025-09-15  6:29   ` Xiaoyao Li
  2025-09-16  7:10   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-15  6:29 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
> other non-MSR registers through them, along with support for
> KVM_GET_REG_LIST to enumerate support for KVM-defined registers.
> 
> This is in preparation for allowing userspace to read/write the guest SSP
> register, which is needed for the upcoming CET virtualization support.
> 
> Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
> KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
> added for registers that lack existing KVM uAPIs to access them. The "KVM"
> in the name is intended to be vague to give KVM flexibility to include
> other potential registers.  More precise names like "SYNTHETIC" and
> "SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
> be conflated with synthetic guest-visible MSRs) and may put KVM into a
> corner (e.g. if KVM wants to change how a KVM-defined register is modeled
> internally).
> 
> Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
> duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
> registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
> can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 11/41] KVM: x86: Report KVM supported CET MSRs as to-be-saved
  2025-09-12 23:22 ` [PATCH v15 11/41] KVM: x86: Report KVM supported CET MSRs as to-be-saved Sean Christopherson
@ 2025-09-15  6:30   ` Xiaoyao Li
  2025-09-16  8:46   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-15  6:30 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Add CET MSRs to the list of MSRs reported to userspace if the feature,
> i.e. IBT or SHSTK, associated with the MSRs is supported by KVM.
> 
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/x86.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5653ddfe124e..2c9908bc8b32 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -344,6 +344,10 @@ static const u32 msrs_to_save_base[] = {
>   	MSR_IA32_UMWAIT_CONTROL,
>   
>   	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
> +
> +	MSR_IA32_U_CET, MSR_IA32_S_CET,
> +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
> +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
>   };
>   
>   static const u32 msrs_to_save_pmu[] = {
> @@ -7598,6 +7602,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>   		if (!kvm_caps.supported_xss)
>   			return;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +			return;
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
> +			return;
> +		fallthrough;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> +			return;
> +		break;
>   	default:
>   		break;
>   	}


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-09-12 23:22 ` [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits Sean Christopherson
@ 2025-09-15  6:31   ` Xiaoyao Li
  2025-09-16  9:00   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-15  6:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Control-flow Enforcement Technology (CET) is a kind of CPU feature used
> to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks.
> It provides two sub-features (SHSTK, IBT) to defend against ROP/COP/JOP
> style control-flow subversion attacks.
> 
> Shadow Stack (SHSTK):
>    A shadow stack is a second stack used exclusively for control transfer
>    operations. The shadow stack is separate from the data/normal stack and
>    can be enabled individually in user and kernel mode. When shadow stack
>    is enabled, CALL pushes the return address on both the data and shadow
>    stack. RET pops the return address from both stacks and compares them.
>    If the return addresses from the two stacks do not match, the processor
>    generates a #CP.
> 
> Indirect Branch Tracking (IBT):
>    IBT introduces an instruction (ENDBRANCH) to mark valid target addresses of
>    indirect branches (CALL, JMP etc...). If an indirect branch is executed
>    and the next instruction is _not_ an ENDBRANCH, the processor generates
>    a #CP. This instruction behaves as a NOP on platforms that have no CET.
> 
> Several new CET MSRs are defined to support CET:
>    MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.
> 
>    MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.
> 
>    MSR_IA32_INT_SSP_TAB: Linear address of SHSTK pointer table, whose entry
> 			is indexed by IST of interrupt gate desc.
> 
> Two XSAVES state bits are introduced for CET:
>    IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
>    IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
> 
> Six VMCS fields are introduced for CET:
>    {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
>    {HOST,GUEST}_SSP: Stores current active SSP.
>    {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
> 
> On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
> control fields:
> If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following
> VMCS fields at VM-Exit:
>    HOST_S_CET
>    HOST_SSP
>    HOST_INTR_SSP_TABLE
> 
> If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following
> VMCS fields at VM-Entry:
>    GUEST_S_CET
>    GUEST_SSP
>    GUEST_INTR_SSP_TABLE
> 
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/include/asm/vmx.h | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index cca7d6641287..ce10a7e2d3d9 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -106,6 +106,7 @@
>   #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>   #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>   #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
> +#define VM_EXIT_LOAD_CET_STATE                  0x10000000
>   
>   #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>   
> @@ -119,6 +120,7 @@
>   #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>   #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
>   #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
> +#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
>   
>   #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
>   
> @@ -369,6 +371,9 @@ enum vmcs_field {
>   	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>   	GUEST_SYSENTER_ESP              = 0x00006824,
>   	GUEST_SYSENTER_EIP              = 0x00006826,
> +	GUEST_S_CET                     = 0x00006828,
> +	GUEST_SSP                       = 0x0000682a,
> +	GUEST_INTR_SSP_TABLE            = 0x0000682c,
>   	HOST_CR0                        = 0x00006c00,
>   	HOST_CR3                        = 0x00006c02,
>   	HOST_CR4                        = 0x00006c04,
> @@ -381,6 +386,9 @@ enum vmcs_field {
>   	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>   	HOST_RSP                        = 0x00006c14,
>   	HOST_RIP                        = 0x00006c16,
> +	HOST_S_CET                      = 0x00006c18,
> +	HOST_SSP                        = 0x00006c1a,
> +	HOST_INTR_SSP_TABLE             = 0x00006c1c
>   };
>   
>   /*
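
As an aside, a minimal sketch of how these definitions could be consumed on
the VMX side to load guest CET state at VM-Entry; vmx_load_guest_cet() is a
made-up helper name for illustration, not the actual KVM code:

	static void vmx_load_guest_cet(struct kvm_vcpu *vcpu, u64 s_cet,
				       u64 ssp, u64 ssp_tbl)
	{
		/* Have the CPU load the GUEST_* CET fields at VM-Entry. */
		vm_entry_controls_setbit(to_vmx(vcpu), VM_ENTRY_LOAD_CET_STATE);

		vmcs_writel(GUEST_S_CET, s_cet);
		vmcs_writel(GUEST_SSP, ssp);
		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
	}
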


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-12 23:22 ` [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
@ 2025-09-15  6:55   ` Xiaoyao Li
  2025-09-15 22:12     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-15  6:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Enable a guest shadow stack pointer (SSP) access interface with the new
> uAPIs. CET guest SSP is a HW register which has a corresponding VMCS field
> to save and restore guest values when VM-{Exit,Entry} happens. KVM handles
> SSP as a fake/synthetic MSR for userspace access.
> 
> Use a translation helper to set up mapping for SSP synthetic index and
> KVM-internal MSR index so that userspace doesn't need to take care of
> KVM's management for synthetic MSRs and avoid conflicts.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   Documentation/virt/kvm/api.rst  |  8 ++++++++
>   arch/x86/include/uapi/asm/kvm.h |  3 +++
>   arch/x86/kvm/x86.c              | 23 +++++++++++++++++++++--
>   arch/x86/kvm/x86.h              | 10 ++++++++++
>   4 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index abd02675a24d..6ae24c5ca559 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -2911,6 +2911,14 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
>   x86 MSR registers have the following id bit patterns::
>     0x2030 0002 <msr number:32>
>   
> +Following are the KVM-defined registers for x86:
> +
> +======================= ========= =============================================
> +    Encoding            Register  Description
> +======================= ========= =============================================
> +  0x2030 0003 0000 0000 SSP       Shadow Stack Pointer
> +======================= ========= =============================================
> +
>   4.69 KVM_GET_ONE_REG
>   --------------------
>   
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 508b713ca52e..8cc79eca34b2 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -437,6 +437,9 @@ struct kvm_xcrs {
>   #define KVM_X86_REG_KVM(index)					\
>   	KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
>   
> +/* KVM-defined registers starting from 0 */
> +#define KVM_REG_GUEST_SSP	0
> +
>   #define KVM_SYNC_X86_REGS      (1UL << 0)
>   #define KVM_SYNC_X86_SREGS     (1UL << 1)
>   #define KVM_SYNC_X86_EVENTS    (1UL << 2)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2c9908bc8b32..460ceae11495 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6017,7 +6017,15 @@ struct kvm_x86_reg_id {
>   
>   static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
>   {
> -	return -EINVAL;
> +	switch (reg->index) {
> +	case KVM_REG_GUEST_SSP:
> +		reg->type = KVM_X86_REG_TYPE_MSR;
> +		reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
>   }
>   
>   static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
> @@ -6097,11 +6105,22 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
>   static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
>   			    struct kvm_reg_list __user *user_list)
>   {
> -	u64 nr_regs = 0;
> +	u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;

I wonder what the semantics are of KVM returning KVM_REG_GUEST_SSP in
KVM_GET_REG_LIST. Does it guarantee that KVM_{G,S}ET_ONE_REG returns -EINVAL
for KVM_REG_GUEST_SSP when it's not enumerated by KVM_GET_REG_LIST?

If so, there's a hole: KVM_{G,S}ET_ONE_REG can still succeed on GUEST_SSP
even if !guest_cpu_cap_has(), when @ignore_msrs is true.
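
As an aside, the enumeration protocol implied here is the usual two-call
dance: query the count with n=0, expect E2BIG, then fetch the list. A minimal
userspace sketch (get_reg_list() is a made-up helper, and the vCPU fd setup
plus fuller error handling are assumed to exist elsewhere):

	#include <errno.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static struct kvm_reg_list *get_reg_list(int vcpu_fd)
	{
		struct kvm_reg_list probe = { .n = 0 };
		struct kvm_reg_list *list;

		/* KVM stores the count in probe.n, and fails with E2BIG if any regs exist. */
		if (!ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe))
			return NULL;			/* no supported regs */
		if (errno != E2BIG)
			return NULL;			/* unexpected failure */

		list = calloc(1, sizeof(*list) + probe.n * sizeof(__u64));
		if (!list)
			return NULL;

		list->n = probe.n;
		if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list)) {
			free(list);
			return NULL;
		}
		return list;	/* e.g. list->reg[0] == KVM_X86_REG_KVM(KVM_REG_GUEST_SSP) */
	}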

> +	u64 user_nr_regs;
> +
> +	if (get_user(user_nr_regs, &user_list->n))
> +		return -EFAULT;
>   
>   	if (put_user(nr_regs, &user_list->n))
>   		return -EFAULT;
>   
> +	if (user_nr_regs < nr_regs)
> +		return -E2BIG;
> +
> +	if (nr_regs &&
> +	    put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
> +		return -EFAULT;
> +
>   	return 0;
>   }
>   
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 786e36fcd0fb..a7c9c72fca93 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -101,6 +101,16 @@ do {											\
>   #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
>   #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
>   
> +/*
> + * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
> + * are arbitrary and have no meaning, the only requirement is that they don't
> + * conflict with "real" MSRs that KVM supports. Use values at the upper end
> + * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
> + * will be usable until KVM exhausts its supply of paravirtual MSR indices.
> + */
> +
> +#define MSR_KVM_INTERNAL_GUEST_SSP	0x4b564dff
> +
>   static inline unsigned int __grow_ple_window(unsigned int val,
>   		unsigned int base, unsigned int modifier, unsigned int max)
>   {


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write
  2025-09-12 23:23 ` [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
@ 2025-09-15  8:22   ` Chao Gao
  2025-09-15 17:00     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Chao Gao @ 2025-09-15  8:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

>+static void __vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
>+{
>+	int i;
>+
>+	for (i = 0; i < NR_VCPUS; i++)
>+		do_vcpu_run(vcpus[i]);
>+}
>+
>+static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
>+{
>+	__vcpus_run(vcpus, NR_VCPUS);
>+	__vcpus_run(vcpus, NR_VCPUS);

...

>+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
>+		sync_global_to_guest(vm, idx);
>+
>+		vcpus_run(vcpus, NR_VCPUS);
>+		vcpus_run(vcpus, NR_VCPUS);

We enter each vCPU 4 times for each MSR here. If I count correctly, only two of
them are needed, as the guest code syncs with the host twice for each MSR (once
in guest_test_{un,}supported_msr(), and once at the end of guest_main()).

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 34/41] KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  2025-09-12 23:23 ` [PATCH v15 34/41] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
@ 2025-09-15  9:07   ` Chao Gao
  0 siblings, 0 replies; 130+ messages in thread
From: Chao Gao @ 2025-09-15  9:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Fri, Sep 12, 2025 at 04:23:12PM -0700, Sean Christopherson wrote:
>Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
>lengths reasonable) and use it in assert messages that currently print the
>raw vector number.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

There are two more assert messages still printing the raw numbers. Feel free to
squash the below patch into yours.

assert_ucall_vector() in nested_exceptions_test.c could be converted as well but
its use of FAKE_TRIPLE_FAULT_VECTOR makes me hesitant to do that (e.g., we need
to assign a name to the faked vector in the common code).

From 90c502e97be6acc37f84a086c50ecb180719ea46 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Mon, 15 Sep 2025 01:41:03 -0700
Subject: [PATCH] KVM: selftests: Use ex_str() to print human friendly name of
 exception vectors

Convert assert messages that are still printing the raw vector numbers to
print human-friendly names.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/testing/selftests/kvm/x86/monitor_mwait_test.c | 8 ++++----
 tools/testing/selftests/kvm/x86/pmu_counters_test.c  | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
index 0eb371c62ab8..e45c028d2a7e 100644
--- a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
+++ b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
@@ -30,12 +30,12 @@ do {									\
									\
	if (fault_wanted)						\
		__GUEST_ASSERT((vector) == UD_VECTOR,			\
-			       "Expected #UD on " insn " for testcase '0x%x', got '0x%x'", \
-			       testcase, vector);			\
+			       "Expected #UD on " insn " for testcase '0x%x', got %s", \
+			       testcase, ex_str(vector));		\
	else								\
		__GUEST_ASSERT(!(vector),				\
-			       "Expected success on " insn " for testcase '0x%x', got '0x%x'", \
-			       testcase, vector);			\
+			       "Expected success on " insn " for testcase '0x%x', got %s", \
+			       testcase, ex_str(vector));		\
 } while (0)
 
 static void guest_monitor_wait(void *arg)
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index 8aaaf25b6111..36eb2658f891 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -344,8 +344,8 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
 
 #define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector)		\
 __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector,			\
-	       "Expected %s on " #insn "(0x%x), got vector %u",			\
-	       expect_gp ? "#GP" : "no fault", msr, vector)			\
+	       "Expected %s on " #insn "(0x%x), got %s",			\
+	       expect_gp ? "#GP" : "no fault", msr, ex_str(vector))		\
 
 #define GUEST_ASSERT_PMC_VALUE(insn, msr, val, expected)			\
	__GUEST_ASSERT(val == expected,					\
-- 
2.47.3

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 00/41] KVM: x86: Mega-CET
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (40 preceding siblings ...)
  2025-09-12 23:23 ` [PATCH v15 41/41] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
@ 2025-09-15 13:18 ` Mathias Krause
  2025-09-15 21:20 ` John Allen
  2025-09-16 13:53 ` Chao Gao
  43 siblings, 0 replies; 130+ messages in thread
From: Mathias Krause @ 2025-09-15 13:18 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Chao Gao
  Cc: kvm, linux-kernel, Tom Lendacky, John Allen, Rick Edgecombe,
	Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

Am 13.09.25 um 01:22 schrieb Sean Christopherson:
> This series is (hopefully) all of the in-flight CET virtualization patches
> in one big bundle.  Please holler if I missed a patch or three as this is what
> I am planning on applying for 6.18 (modulo fixups and whatnot), i.e. if there's
> something else that's needed to enable CET virtualization, now's the time...
> 
> Patches 1-3 probably need the most attention, as they are new in v15 and I
> don't have a fully working SEV-ES setup (don't have the right guest firmware,
> ugh).  Though testing on everything would be much appreciated.
> 
> I kept almost all Tested-by tags even for patches that I massaged a bit, and
> only dropped tags for the "don't emulate CET stuff" patch.  In theory, the
> changes I've made *should* be benign.  Please yell, loudly, if I broken
> something and/or you want me to drop your Tested-by.

I retested this series on my Alder Lake NUC (i7-1260P) and with the
attached hacky patch on top of Chao's QEMU branch[1] -- which points to
commit 02364ef48c96 ("fixup! target/i386: Enable XSAVES support for CET
states") for me right now -- the KUT CET tests[2] pass just fine on the
host as well as within a guest, i.e. nested. Therefore my Tested-by
still stands -- at least for the Intel/VMX part.

Thanks,
Mathias

[1] https://github.com/gaochaointel/qemu-dev#qemu-cet
[2]
https://lore.kernel.org/kvm/20250626073459.12990-1-minipli@grsecurity.net/

[-- Attachment #2: qemu_cet_v15.diff --]
[-- Type: text/x-patch, Size: 1423 bytes --]

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ce3c52fd0f7d..d07dfc714a3c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5320,8 +5320,7 @@ static int kvm_get_nested_state(X86CPU *cpu)
     return ret;
 }
 
-#define KVM_X86_REG_SYNTHETIC_MSR   BIT_ULL(35)
-#define REG_MSR_INDEX(x)            (KVM_X86_REG_SYNTHETIC_MSR | x)
+#define KVM_X86_REG_SSP     (0x20300003ULL << 32 | 0x00000000)
 
 static bool has_cet_ssp(CPUState *cpu)
 {
@@ -5409,9 +5408,9 @@ int kvm_arch_put_registers(CPUState *cpu, int level, Error **errp)
     }
 
     if (has_cet_ssp(cpu)) {
-        ret = kvm_set_one_reg(cpu, REG_MSR_INDEX(0ull), &env->guest_ssp);
+        ret = kvm_set_one_reg(cpu, KVM_X86_REG_SSP, &env->guest_ssp);
         if (ret) {
-            error_report("Failed to set KVM_REG_MSR, ret = %d\n", ret);
+            error_report("Failed to set KVM_REG_MSR, ret = %d", ret);
         }
     }
 
@@ -5489,9 +5488,9 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
         goto out;
     }
     if (has_cet_ssp(cs)) {
-        ret = kvm_get_one_reg(cs, REG_MSR_INDEX(0ull), &env->guest_ssp);
+        ret = kvm_get_one_reg(cs, KVM_X86_REG_SSP, &env->guest_ssp);
         if (ret) {
-                error_report("Failed to get KVM_REG_MSR, ret = %d\n", ret);
+                error_report("Failed to get KVM_REG_MSR, ret = %d", ret);
         }
     }
     ret = kvm_get_apic(cpu);
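
For reference, the hardcoded KVM_X86_REG_SSP constant above follows the ID
patterns documented in the series (0x2030 0002 <msr:32> for MSRs, and
0x2030 0003 <index:32> for KVM-defined registers). Written out generically,
with macro names borrowed from the kernel selftests (a sketch, not QEMU's
actual definitions):

	/* High bits: x86 arch + 64-bit size; type byte 0x02 = MSR, 0x03 = KVM-defined. */
	#define KVM_X86_REG_MSR(msr)	((0x20300002ULL << 32) | (uint32_t)(msr))
	#define KVM_X86_REG_KVM(idx)	((0x20300003ULL << 32) | (uint32_t)(idx))

	/* e.g. KVM_X86_REG_KVM(KVM_REG_GUEST_SSP) == 0x2030000300000000 == KVM_X86_REG_SSP */
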

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 01/41] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()
  2025-09-12 23:22 ` [PATCH v15 01/41] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Sean Christopherson
@ 2025-09-15 16:15   ` Tom Lendacky
  2025-09-15 16:30     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Tom Lendacky @ 2025-09-15 16:15 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Mathias Krause, John Allen, Rick Edgecombe,
	Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/25 18:22, Sean Christopherson wrote:
> Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make
> it clear that KVM is getting the cached value, not reading directly from
> the guest-controlled GHCB.  More importantly, vacating
> kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built
> kvm_ghcb_get_##field() helper to read values from the GHCB.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

Makes me wonder if we want to create kvm_get_cached_sw_exit_info_{1,2}
routines rather than referencing control->exit_info_{1,2} directly?

> ---
>  arch/x86/kvm/svm/sev.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2fdd2e478a97..fe8d148b76c0 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3216,7 +3216,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>  		kvfree(svm->sev_es.ghcb_sa);
>  }
>  
> -static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
> +static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control)
>  {
>  	return (((u64)control->exit_code_hi) << 32) | control->exit_code;
>  }
> @@ -3242,7 +3242,7 @@ static void dump_ghcb(struct vcpu_svm *svm)
>  	 */
>  	pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
>  	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
> -	       kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
> +	       kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
>  	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
>  	       control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
>  	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
> @@ -3331,7 +3331,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  	 * Retrieve the exit code now even though it may not be marked valid
>  	 * as it could help with debugging.
>  	 */
> -	exit_code = kvm_ghcb_get_sw_exit_code(control);
> +	exit_code = kvm_get_cached_sw_exit_code(control);
>  
>  	/* Only GHCB Usage code 0 is supported */
>  	if (svm->sev_es.ghcb->ghcb_usage) {
> @@ -4336,7 +4336,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  
>  	svm_vmgexit_success(svm, 0);
>  
> -	exit_code = kvm_ghcb_get_sw_exit_code(control);
> +	exit_code = kvm_get_cached_sw_exit_code(control);
>  	switch (exit_code) {
>  	case SVM_VMGEXIT_MMIO_READ:
>  		ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 01/41] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()
  2025-09-15 16:15   ` Tom Lendacky
@ 2025-09-15 16:30     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 16:30 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Mon, Sep 15, 2025, Tom Lendacky wrote:
> On 9/12/25 18:22, Sean Christopherson wrote:
> > Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make
> > it clear that KVM is getting the cached value, not reading directly from
> > the guest-controlled GHCB.  More importantly, vacating
> > kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built
> > kvm_ghcb_get_##field() helper to read values from the GHCB.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Makes me wonder if we want to create kvm_get_cached_sw_exit_info_{1,2}
> routines rather than referencing control->exit_info_{1,2} directly?

I think I'd prefer to avoid creating more wrappers?  I generally don't like having
wrappers for accessing a single field, especially an architecturally defined field,
because I don't like having to bounce through extra layers to understand what's
going on.

But for the GHCB wrappers, they're a necessary evil due to the associated valid
bits and the volatility (guest-writable) of the fields.  For the "cached" fields,
I don't think we need a wrappers; providing a helper to splice the two exit code
fields together is helpful, but accessing the control area directly for "simple"
cases feels very natural to me since that's how KVM works for "normal" VMs.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 35/41] KVM: selftests: Add an MSR test to exercise guest/host and read/write
  2025-09-15  8:22   ` Chao Gao
@ 2025-09-15 17:00     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 17:00 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Mon, Sep 15, 2025, Chao Gao wrote:
> >+static void __vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
> >+{
> >+	int i;
> >+
> >+	for (i = 0; i < NR_VCPUS; i++)
> >+		do_vcpu_run(vcpus[i]);
> >+}
> >+
> >+static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
> >+{
> >+	__vcpus_run(vcpus, NR_VCPUS);
> >+	__vcpus_run(vcpus, NR_VCPUS);
> 
> ...
> 
> >+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
> >+		sync_global_to_guest(vm, idx);
> >+
> >+		vcpus_run(vcpus, NR_VCPUS);
> >+		vcpus_run(vcpus, NR_VCPUS);
> 
> We enter each vCPU 4 times for each MSR here. If I count correctly, only two of
> them are needed as the guest code syncs with the host twice for each MSR (one in
> guest_test_{un,}supported_msr(), the other at the end of guest_main()).

I'm 99% certain you're correct and that the second run is unnecessary.  I suspect
this is leftover crud from an earlier incarnation of the test that used a separate
VM for the "unsupported" features case (before I realized that the test could
abuse and test the fact that KVM doesn't require homogeneous vCPU models).

I'll triple check and post a fixup.

Thanks!

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-12 23:22 ` [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Sean Christopherson
@ 2025-09-15 17:04   ` Xin Li
  2025-09-16  6:51   ` Xiaoyao Li
  2025-09-16  8:28   ` Binbin Wu
  2 siblings, 0 replies; 130+ messages in thread
From: Xin Li @ 2025-09-15 17:04 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/2025 4:22 PM, Sean Christopherson wrote:
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
> to facilitate access to such kind of MSRs.
> 
> If MSRs supported in kvm_caps.supported_xss are passed through to the guest,
> the guest MSRs are swapped with the host's before the vCPU exits to userspace
> and after it reenters the kernel, before the next VM-Entry.
> 
> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> explicitly check @vcpu is non-null before attempting to load guest state.
> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
> loading guest FPU state (which doesn't exist).
> 
> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
> 
> The two helpers are put here in order to make clear that accessing
> XSAVE-managed MSRs requires special checks and handling to guarantee the
> correctness of reads/writes to the MSRs.
> 
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: drop S_CET, add big comment, move accessors to x86.c]
> Signed-off-by: Sean Christopherson <seanjc@google.com>


Reviewed-by: Xin Li (Intel) <xin@zytor.com>


> ---
>   arch/x86/kvm/x86.c | 86 +++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5e38d6943fe..a95ca2fbd3a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3801,6 +3804,66 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>   	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
>   }
>   
> +/*
> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
> + * switched with the rest of guest FPU state.  Note!  S_CET is _not_ context
> + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
> + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
> + * the value saved/restored via XSTATE is always the host's value.  That detail
> + * is _extremely_ important, as the guest's S_CET must _never_ be resident in
> + * hardware while executing in the host.  Loading guest values for U_CET and
> + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
> + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
> + * privilege levels, i.e. are effectively only consumed by userspace as well.
> + */
> +static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +	if (!vcpu)
> +		return false;
> +
> +	switch (msr) {
> +	case MSR_IA32_U_CET:
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
> +		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> +	default:
> +		return false;
> +	}
> +}

With this new version of is_xstate_managed_msr(), which checks against vcpu
capabilities instead of KVM, patch 9 of KVM FRED patches[1] no longer needs
to make any change to it.  And this is the only conflict when I apply KVM
FRED patches on top of this v15 mega-CET patch series.

[1] https://lore.kernel.org/lkml/20250829153149.2871901-10-xin@zytor.com/
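
For illustration, a minimal sketch of how such a helper pair gates FPU
loading around a host-initiated access.  This is not the applied code; the
wrapper name and the kvm_load_guest_fpu()/kvm_put_guest_fpu() pairing are
assumptions based on the commit message:

	static void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
					  struct msr_data *msr, bool write)
	{
		/* Only MSRs context switched via XSTATE need the FPU swap. */
		if (WARN_ON_ONCE(!is_xstate_managed_msr(vcpu, msr->index)))
			return;

		/* Swap in guest XSAVE state so RDMSR/WRMSR hit guest values. */
		kvm_load_guest_fpu(vcpu);
		if (write)
			wrmsrl(msr->index, msr->data);
		else
			rdmsrl(msr->index, msr->data);
		kvm_put_guest_fpu(vcpu);	/* swap host state back */
	}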




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs
  2025-09-12 23:22 ` [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
@ 2025-09-15 17:21   ` Xin Li
  2025-09-16  7:40   ` Xiaoyao Li
  2025-09-17  8:32   ` Binbin Wu
  2 siblings, 0 replies; 130+ messages in thread
From: Xin Li @ 2025-09-15 17:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/2025 4:22 PM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Enable/disable CET MSRs interception per associated feature configuration.
> 
> Pass through CET MSRs that are managed by XSAVE, as they cannot be
> intercepted without also intercepting XSAVE. However, intercepting XSAVE
> would likely cause unacceptable performance overhead.
> MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
> 
> Note, this MSR design introduces an architectural limitation on SHSTK and
> IBT control for the guest, i.e., when SHSTK is exposed, IBT is also
> available to the guest from an architectural perspective, since IBT relies
> on a subset of the SHSTK-relevant MSRs.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>


Reviewed-by: Xin Li (Intel) <xin@zytor.com>


> ---
>   arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4fc1dbba2eb0..adf5af30e537 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4101,6 +4101,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   
>   void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
> +	bool intercept;
> +
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
> @@ -4146,6 +4148,23 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
>   					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
>   
> +	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> +		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> +
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);


As you suggested, this is also the correct interception setting for FRED.

Thanks!
     Xin
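
Extrapolating the quoted (truncated) hunk, the remaining XSAVE-managed
shadow stack MSRs presumably follow the same pattern; a sketch only, the
exact list in the applied patch may differ:

	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);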

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 02/41] KVM: SEV: Read save fields from GHCB exactly once
  2025-09-12 23:22 ` [PATCH v15 02/41] KVM: SEV: Read save fields from GHCB exactly once Sean Christopherson
@ 2025-09-15 17:32   ` Tom Lendacky
  2025-09-15 21:08     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Tom Lendacky @ 2025-09-15 17:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Mathias Krause, John Allen, Rick Edgecombe,
	Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/25 18:22, Sean Christopherson wrote:
> Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
> GHCB get() utility to help guard against TOCTOU bugs.  Using READ_ONCE()
> doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
> redoing get() after checking the initial value, but at least addresses
> all potential TOCTOU issues in the current KVM code base.
> 
> Opportunistically reduce the indentation of the macro-defined helpers and
> clean up the alignment.
> 
> Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

Just wondering if we should make the kvm_ghcb_get_*() routines take just
a struct vcpu_svm pointer so that they don't get confused with the
ghcb_get_*() routines? The current users just use svm->sev_es.ghcb
to set the ghcb variable that gets passed in anyway. That way the KVM
versions look specifically like KVM versions.

> ---
>  arch/x86/kvm/svm/sev.c |  8 ++++----
>  arch/x86/kvm/svm/svm.h | 26 ++++++++++++++++----------
>  2 files changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index fe8d148b76c0..37abbda28685 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3304,16 +3304,16 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
>  
>  	if (kvm_ghcb_xcr0_is_valid(svm)) {
> -		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> +		vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
>  		vcpu->arch.cpuid_dynamic_bits_dirty = true;
>  	}
>  
>  	/* Copy the GHCB exit information into the VMCB fields */
> -	exit_code = ghcb_get_sw_exit_code(ghcb);
> +	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
>  	control->exit_code = lower_32_bits(exit_code);
>  	control->exit_code_hi = upper_32_bits(exit_code);
> -	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
> -	control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
> +	control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(ghcb);
> +	control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(ghcb);
>  	svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
>  
>  	/* Clear the valid entries fields */
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5d39c0b17988..c2316adde3cc 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -913,16 +913,22 @@ void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted,
>  void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
>  
>  #define DEFINE_KVM_GHCB_ACCESSORS(field)						\
> -	static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
> -	{									\
> -		return test_bit(GHCB_BITMAP_IDX(field),				\
> -				(unsigned long *)&svm->sev_es.valid_bitmap);	\
> -	}									\
> -										\
> -	static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
> -	{									\
> -		return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0;	\
> -	}									\
> +static __always_inline u64 kvm_ghcb_get_##field(struct ghcb *ghcb)			\
> +{											\
> +	return READ_ONCE(ghcb->save.field);						\
> +}											\
> +											\
> +static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm)	\
> +{											\
> +	return test_bit(GHCB_BITMAP_IDX(field),						\
> +			(unsigned long *)&svm->sev_es.valid_bitmap);			\
> +}											\
> +											\
> +static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm,	\
> +							   struct ghcb *ghcb)		\
> +{											\
> +	return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(ghcb) : 0;	\
> +}
>  
>  DEFINE_KVM_GHCB_ACCESSORS(cpl)
>  DEFINE_KVM_GHCB_ACCESSORS(rax)
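
For illustration, the double-fetch pattern that READ_ONCE() mitigates; a
sketch only, is_valid_exit_code() and handle_exit() are hypothetical:

	/*
	 * BAD: two loads of a guest-writable field.  The guest can change
	 * sw_exit_code between the validity check and the use.
	 */
	if (!is_valid_exit_code(ghcb->save.sw_exit_code))
		return -EINVAL;
	return handle_exit(ghcb->save.sw_exit_code);

	/* GOOD: snapshot exactly once, then validate and consume the copy. */
	u64 exit_code = kvm_ghcb_get_sw_exit_code(ghcb);

	if (!is_valid_exit_code(exit_code))
		return -EINVAL;
	return handle_exit(exit_code);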

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest
  2025-09-12 23:22 ` [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
@ 2025-09-15 17:45   ` Xin Li
  2025-09-18  4:48   ` Xin Li
  1 sibling, 0 replies; 130+ messages in thread
From: Xin Li @ 2025-09-15 17:45 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/2025 4:22 PM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting
> to enable CET for nested VM.
> 
> vmcs12 and vmcs02 need to be synced when L2 exits to L1 or when L1 wants
> to resume L2, so that each can observe the correct CET state.
> 
> Please note that consistency checks regarding CET state during VM-Entry
> will be added later to prevent this patch from becoming too large.
> Advertising the new CET VM_ENTRY/EXIT control bits is also deferred
> until after the consistency checks are added.
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>


Reviewed-by: Xin Li (Intel) <xin@zytor.com>


> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 14f9822b611d..51d69f368689 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4760,6 +4825,18 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>   	if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
>   		vmcs_write64(GUEST_BNDCFGS, 0);
>   
> +	/*
> +	 * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set,
> +	 * otherwise CET state should be retained across VM-exit, i.e.,
> +	 * guest values should be propagated from vmcs12 to vmcs01.
> +	 */
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
> +		vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
> +				     vmcs12->host_ssp_tbl);
> +	else
> +		vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
> +				     vmcs12->guest_ssp_tbl);
> +
>   	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
>   		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
>   		vcpu->arch.pat = vmcs12->host_ia32_pat;

Also tested with VM exit load CET bit set and cleared, both passed, so

Tested-by: Xin Li (Intel) <xin@zytor.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs
  2025-09-12 23:23 ` [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
@ 2025-09-15 17:56   ` Xin Li
  2025-09-15 20:43     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Xin Li @ 2025-09-15 17:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On 9/12/2025 4:23 PM, Sean Christopherson wrote:
> From: John Allen <john.allen@amd.com>
> 
> Emulate shadow stack MSR access by reading and writing to the
> corresponding fields in the VMCB.
> 
> Signed-off-by: John Allen <john.allen@amd.com>
> [sean: mark VMCB_CET dirty/clean as appropriate]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

For the shortlog, shouldn't we use "KVM: SVM:"?

I don't see any change to common x86 code in this patch.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 03/41] KVM: SEV: Validate XCR0 provided by guest in GHCB
  2025-09-12 23:22 ` [PATCH v15 03/41] KVM: SEV: Validate XCR0 provided by guest in GHCB Sean Christopherson
@ 2025-09-15 18:41   ` Tom Lendacky
  2025-09-15 21:22     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Tom Lendacky @ 2025-09-15 18:41 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Mathias Krause, John Allen, Rick Edgecombe,
	Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/25 18:22, Sean Christopherson wrote:
> Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's
> software model in order to validate the new XCR0 against KVM's view of
> the supported XCR0.  Allowing garbage is thankfully mostly benign, as
> kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected
> state, xstate_required_size() will simply provide garbage back to the
> guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS
> will only harm the guest (setting XCR0 will fail).
> 
> However, allowing the guest to put junk into a field that KVM assumes is
> valid is a CVE waiting to happen.  And as a bonus, using the proper API
> eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty.
> 
> Simply ignore bad values, as either the guest managed to get an
> unsupported value into hardware, or the guest is misbehaving and providing
> pure garbage.  In either case, KVM can't fix the broken guest.
> 
> Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits
> if XCR0 isn't actually changing (relative to KVM's previous snapshot).
> 
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

A question below, but otherwise:

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

(successfully booted and ran some quick tests against the first 3
patches without any issues on both an SEV-ES and SEV-SNP guest).

> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/svm/sev.c          | 6 ++----
>  arch/x86/kvm/x86.c              | 3 ++-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cb86f3cca3e9..2762554cbb7b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2209,6 +2209,7 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
>  unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
>  unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
> +int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
>  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
>  
>  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 37abbda28685..0cd77a87dd84 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3303,10 +3303,8 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  
>  	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
>  
> -	if (kvm_ghcb_xcr0_is_valid(svm)) {
> -		vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
> -		vcpu->arch.cpuid_dynamic_bits_dirty = true;
> -	}
> +	if (kvm_ghcb_xcr0_is_valid(svm))
> +		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));

Would a vcpu_unimpl() be appropriate here if __kvm_set_xcr() returns
something other than 0? It might help with debugging if the guest is
doing something it shouldn't.

Thanks,
Tom

>  
>  	/* Copy the GHCB exit information into the VMCB fields */
>  	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6d85fbafc679..ba4915456615 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1235,7 +1235,7 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
>  }
>  #endif
>  
> -static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
> +int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>  {
>  	u64 xcr0 = xcr;
>  	u64 old_xcr0 = vcpu->arch.xcr0;
> @@ -1279,6 +1279,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>  		vcpu->arch.cpuid_dynamic_bits_dirty = true;
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(__kvm_set_xcr);
>  
>  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
>  {

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 25/41] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs
  2025-09-15 17:56   ` Xin Li
@ 2025-09-15 20:43     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 20:43 UTC (permalink / raw)
  To: Xin Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Mon, Sep 15, 2025, Xin Li wrote:
> On 9/12/2025 4:23 PM, Sean Christopherson wrote:
> > From: John Allen <john.allen@amd.com>
> > 
> > Emulate shadow stack MSR access by reading and writing to the
> > corresponding fields in the VMCB.
> > 
> > Signed-off-by: John Allen <john.allen@amd.com>
> > [sean: mark VMCB_CET dirty/clean as appropriate]
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> For the shortlog, shouldn't we use "KVM: SVM:"?

Yep, I simply missed that goof.  Thanks!

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 02/41] KVM: SEV: Read save fields from GHCB exactly once
  2025-09-15 17:32   ` Tom Lendacky
@ 2025-09-15 21:08     ` Sean Christopherson
  2025-09-17 21:47       ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 21:08 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Mon, Sep 15, 2025, Tom Lendacky wrote:
> On 9/12/25 18:22, Sean Christopherson wrote:
> > Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
> > GHCB get() utility to help guard against TOCTOU bugs.  Using READ_ONCE()
> > doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
> > redoing get() after checking the initial value, but at least addresses
> > all potential TOCTOU issues in the current KVM code base.
> > 
> > Opportunistically reduce the indentation of the macro-defined helpers and
> > clean up the alignment.
> > 
> > Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Just wondering if we should make the kvm_ghcb_get_*() routines take just
> a struct vcpu_svm pointer so that they don't get confused with the
> ghcb_get_*() routines? The current users just use svm->sev_es.ghcb
> to set the ghcb variable that gets passed in anyway. That way the KVM
> versions look specifically like KVM versions.

Yeah, that's a great idea.  I'll send a patch, and then as Boris put it, play
patch tetris to avoid unnecessary dependencies (I want to keep the CET series
in a separate branch for a variety of reasons).
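
A sketch of what Tom's suggestion might look like, assuming
svm->sev_es.ghcb is the mapped GHCB (per the current users):

	#define DEFINE_KVM_GHCB_ACCESSORS(field)				\
	static __always_inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm)	\
	{									\
		return READ_ONCE(svm->sev_es.ghcb->save.field);			\
	}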

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 00/41] KVM: x86: Mega-CET
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (41 preceding siblings ...)
  2025-09-15 13:18 ` [PATCH v15 00/41] KVM: x86: Mega-CET Mathias Krause
@ 2025-09-15 21:20 ` John Allen
  2025-09-16 13:53 ` Chao Gao
  43 siblings, 0 replies; 130+ messages in thread
From: John Allen @ 2025-09-15 21:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Fri, Sep 12, 2025 at 04:22:38PM -0700, Sean Christopherson wrote:
> This series is (hopefully) all of the in-flight CET virtualization patches
> in one big bundle.  Please holler if I missed a patch or three as this is what
> I am planning on applying for 6.18 (modulo fixups and whatnot), i.e. if there's
> something else that's needed to enable CET virtualization, now's the time...
> 
> Patches 1-3 probably need the most attention, as they are new in v15 and I
> don't have a fully working SEV-ES setup (don't have the right guest firmware,
> ugh).  Though testing on everything would be much appreciated.

It looks like there may be regressions with SEV-ES here. Running the
test_shadow_stack_64 selftest in the guest now hangs in the gup write.
Skipping the gup test seems to indicate there are some other issues as
well.

This reminded me that with the last version of the series, I noted an
issue with the test_32bit selftest and SEV-ES in the guest. That test would
segfault in sigaction32 and seemed to indicate some incompatibility
between the test and SEV-ES, as it could be reproduced with a stripped-down
version of the test without shadow stack enabled. I'm still
investigating this as well, but the above failures seem to be new.

I'll have some time to investigate further tomorrow.

Thanks,
John

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 03/41] KVM: SEV: Validate XCR0 provided by guest in GHCB
  2025-09-15 18:41   ` Tom Lendacky
@ 2025-09-15 21:22     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 21:22 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Mon, Sep 15, 2025, Tom Lendacky wrote:
> On 9/12/25 18:22, Sean Christopherson wrote:
> (successfully booted and ran some quick tests against the first 3
> patches without any issues on both an SEV-ES and SEV-SNP guest).

Nice, thanks!

> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 37abbda28685..0cd77a87dd84 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3303,10 +3303,8 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >  
> >  	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
> >  
> > -	if (kvm_ghcb_xcr0_is_valid(svm)) {
> > -		vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
> > -		vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > -	}
> > +	if (kvm_ghcb_xcr0_is_valid(svm))
> > +		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> 
> > Would a vcpu_unimpl() be appropriate here if __kvm_set_xcr() returns
> something other than 0? It might help with debugging if the guest is
> doing something it shouldn't.

I don't want to use vcpu_unimpl(), because this isn't likely to occur due to lack
of KVM support.  Hrm, but I see now that sev.c abuses vcpu_unimpl() for a whole
pile of things that aren't due to lack of KVM support.  I can certainly appreciate
how painful it can be to debug random -EINVAL failures, but printing via
vcpu_unimpl() isn't an approach that I want to encourage as it's inherently
flawed (requires access to kernel logs, is ratelimited, and can still cause
jitter even though it's ratelimited if multiple CPUs happen to contend the printk
locks at just the wrong time).

For now, I think it's best to do nothing.  Long term, I think we need to figure
out a solution for logging errors of this nature, because this is far from the
first time this sort of thing has popped up.  E.g. nested VMX consistency check
failures are another case where having precise logs would be super helpful, but
telling userspace to enable tracepoints and scrape the tracefs logs (or run BPF
programs) isn't exactly a great experience.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-15  6:55   ` Xiaoyao Li
@ 2025-09-15 22:12     ` Sean Christopherson
  2025-09-16  5:52       ` Xiaoyao Li
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-15 22:12 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On Mon, Sep 15, 2025, Xiaoyao Li wrote:
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > @@ -6097,11 +6105,22 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
> >   static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
> >   			    struct kvm_reg_list __user *user_list)
> >   {
> > -	u64 nr_regs = 0;
> > +	u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
> 
> I wonder what's the semantic of KVM returning KVM_REG_GUEST_SSP on
> KVM_GET_REG_LIST. Does it ensure KVM_{G,S}ET_ONE_REG returns -EINVAL on
> KVM_REG_GUEST_SSP when it's not enumerated by KVM_GET_REG_LIST?
> 
> If so, that's broken: KVM_{G,S}ET_ONE_REG can succeed on GUEST_SSP even if
> !guest_cpu_cap_has(), when @ignore_msrs is true.

Ugh, great catch.  Too many knobs.  The best idea I've got is to exempt KVM-
internal MSRs from ignore_msrs and report_ignored_msrs on host-initiated writes.
That's unfortunately still a userspace visible change, and would continue to be
userspace-visible, e.g. if we wanted to change the magic value for
MSR_KVM_INTERNAL_GUEST_SSP.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c78acab2ff3f..6a50261d1c5c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -511,6 +511,11 @@ static bool kvm_is_advertised_msr(u32 msr_index)
        return false;
 }
 
+static bool kvm_is_internal_msr(u32 msr)
+{
+       return msr == MSR_KVM_INTERNAL_GUEST_SSP;
+}
+
 typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
                            bool host_initiated);
 
@@ -544,6 +549,9 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
        if (host_initiated && !*data && kvm_is_advertised_msr(msr))
                return 0;
 
+       if (host_initiated && kvm_is_internal_msr(msr))
+               return ret;
+
        if (!ignore_msrs) {
                kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
                                      op, msr, *data);

Alternatively, simply exempt host writes from ignore_msrs.  Aha!  And KVM even
documents that as the behavior:

	kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
			Default is 0 (don't ignore, but inject #GP)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c78acab2ff3f..177253e75b41 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -544,7 +544,7 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
        if (host_initiated && !*data && kvm_is_advertised_msr(msr))
                return 0;
 
-       if (!ignore_msrs) {
+       if (host_initiated || !ignore_msrs) {
                kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
                                      op, msr, *data);
                return ret;

So while it's technically an ABI change (arguable since it's guarded by an
off-by-default param), I suspect we can get away with it.  Hmm, commit 6abe9c1386e5
("KVM: X86: Move ignore_msrs handling upper the stack") exempted KVM-internal
MSR accesses from ignore_msrs, but doesn't provide much in the way of justification
for _why_ that's desirable.

Argh, and that same mini-series extended the behavior to feature MSRs, again
without seeming to consider whether or not it's actually desirable to suppress
bad VMM accesses.  Even worse, that decision likely generated an absurd amount
of churn and noise due to splattering helpers and variants all over the place. :-(

commit 12bc2132b15e0a969b3f455d90a5f215ef239eff
Author:     Peter Xu <peterx@redhat.com>
AuthorDate: Mon Jun 22 18:04:42 2020 -0400
Commit:     Paolo Bonzini <pbonzini@redhat.com>
CommitDate: Wed Jul 8 16:21:40 2020 -0400

    KVM: X86: Do the same ignore_msrs check for feature msrs
    
    Logically the ignore_msrs and report_ignored_msrs should also apply to feature
    MSRs.  Add them in.

For 6.18, I think the safe play is to go with the first path (exempt KVM-internal
MSRs), and then try to go for the second approach (exempt all host accesses) for
6.19.  KVM's ABI for ignore_msrs=true is already all kinds of messed up, so I'm
not terribly concerned about temporarily making it marginally worse.

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-15 22:12     ` Sean Christopherson
@ 2025-09-16  5:52       ` Xiaoyao Li
  2025-09-19 17:47         ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  5:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/16/2025 6:12 AM, Sean Christopherson wrote:
> On Mon, Sep 15, 2025, Xiaoyao Li wrote:
>> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
>>> @@ -6097,11 +6105,22 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
>>>    static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
>>>    			    struct kvm_reg_list __user *user_list)
>>>    {
>>> -	u64 nr_regs = 0;
>>> +	u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
>>
>> I wonder what's the semantic of KVM returning KVM_REG_GUEST_SSP on
>> KVM_GET_REG_LIST. Does it ensure KVM_{G,S}ET_ONE_REG returns -EINVAL on
>> KVM_REG_GUEST_SSP when it's not enumerated by KVM_GET_REG_LIST?
>>
>> If so, that's broken: KVM_{G,S}ET_ONE_REG can succeed on GUEST_SSP even if
>> !guest_cpu_cap_has(), when @ignore_msrs is true.
> 
> Ugh, great catch.  Too many knobs.  The best idea I've got is to exempt KVM-
> internal MSRs from ignore_msrs and report_ignored_msrs on host-initiated writes.
> That's unfortunately still a userspace visible change, and would continue to be
> userspace-visible, e.g. if we wanted to change the magic value for
> MSR_KVM_INTERNAL_GUEST_SSP.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c78acab2ff3f..6a50261d1c5c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -511,6 +511,11 @@ static bool kvm_is_advertised_msr(u32 msr_index)
>          return false;
>   }
>   
> +static bool kvm_is_internal_msr(u32 msr)
> +{
> +       return msr == MSR_KVM_INTERNAL_GUEST_SSP;
> +}
> +
>   typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>                              bool host_initiated);
>   
> @@ -544,6 +549,9 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
>          if (host_initiated && !*data && kvm_is_advertised_msr(msr))
>                  return 0;
>   
> +       if (host_initiated && kvm_is_internal_msr(msr))
> +               return ret;
> +
>          if (!ignore_msrs) {
>                  kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
>                                        op, msr, *data);
> 
> Alternatively, simply exempt host writes from ignore_msrs.  Aha!  And KVM even
> documents that as the behavior:
> 
> 	kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
> 			Default is 0 (don't ignore, but inject #GP)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c78acab2ff3f..177253e75b41 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -544,7 +544,7 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
>          if (host_initiated && !*data && kvm_is_advertised_msr(msr))
>                  return 0;
>   
> -       if (!ignore_msrs) {
> +       if (host_initiated || !ignore_msrs) {
>                  kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
>                                        op, msr, *data);
>                  return ret;
> 
> So while it's technically an ABI change (arguable since it's guarded by an
> off-by-default param), I suspect we can get away with it.  Hmm, commit 6abe9c1386e5
> ("KVM: X86: Move ignore_msrs handling upper the stack") exempted KVM-internal
> MSR accesses from ignore_msrs, but doesn't provide much in the way of justification
> for _why_ that's desirable.
> 
> Argh, and that same mini-series extended the behavior to feature MSRs, again
> without seeming to consider whether or not it's actually desirable to suppress
> bad VMM accesses.  Even worse, that decision likely generated an absurd amount
> of churn and noise due to splattering helpers and variants all over the place. :-(
> 
> commit 12bc2132b15e0a969b3f455d90a5f215ef239eff
> Author:     Peter Xu <peterx@redhat.com>
> AuthorDate: Mon Jun 22 18:04:42 2020 -0400
> Commit:     Paolo Bonzini <pbonzini@redhat.com>
> CommitDate: Wed Jul 8 16:21:40 2020 -0400
> 
>      KVM: X86: Do the same ignore_msrs check for feature msrs
>      
>      Logically the ignore_msrs and report_ignored_msrs should also apply to feature
>      MSRs.  Add them in.
> 
> For 6.18, I think the safe play is to go with the first path (exempt KVM-internal
> MSRs), and then try to go for the second approach (exempt all host accesses) for
> 6.19.  KVM's ABI for ignore_msrs=true is already all kinds of messed up, so I'm
> not terribly concerned about temporarily making it marginally worse.

Looks OK to me.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-12 23:22 ` [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Sean Christopherson
  2025-09-15 17:04   ` Xin Li
@ 2025-09-16  6:51   ` Xiaoyao Li
  2025-09-16  8:28   ` Binbin Wu
  2 siblings, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  6:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
> to facilitate access to such MSRs.
> 
> If MSRs supported in kvm_caps.supported_xss are passed through to the
> guest, the guest MSRs are swapped with the host's before the vCPU exits to
> userspace, and swapped back after it reenters the kernel, before the next
> VM-entry.
> 
> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> explicitly check @vcpu is non-null before attempting to load guest state.
> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
> loading guest FPU state (which doesn't exist).
> 
> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
> 
> The two helpers are placed here to make clear that accessing XSAVE-managed
> MSRs requires special checks and handling to guarantee the correctness of
> reads and writes to those MSRs.
> 
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: drop S_CET, add big comment, move accessors to x86.c]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-12 23:22 ` [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
@ 2025-09-16  7:07   ` Xiaoyao Li
  2025-09-16  7:48     ` Chao Gao
  2025-09-19 22:11     ` Sean Christopherson
  2025-09-17  7:52   ` Binbin Wu
  1 sibling, 2 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  7:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Add an emulation interface for CET MSR accesses. The emulation code is
> split into a common part and a vendor-specific part. The former does common
> checks for the MSRs, e.g., accessibility and data validity, then hands the
> operation off to either the XSAVE-managed MSRs (via the helpers) or the CET
> VMCS fields.
> 
> SSP can only be read via RDSSP; changing it requires destructive and
> potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
> SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
> for the GUEST_SSP field of the VMCS.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]

Is the change/update of "drop call to kvm_set_xstate_msr() for S_CET" 
true for this patch?

> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
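
The pseudo-MSR wrapper described above, sketched for VMX.  The MSR name
follows the downthread discussion (MSR_KVM_INTERNAL_GUEST_SSP); the actual
plumbing in the applied series may differ:

	/* in vmx_get_msr(), sketch */
	case MSR_KVM_INTERNAL_GUEST_SSP:
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;

	/* in vmx_set_msr(), sketch */
	case MSR_KVM_INTERNAL_GUEST_SSP:
		vmcs_writel(GUEST_SSP, msr_info->data);
		break;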

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  2025-09-12 23:22 ` [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Sean Christopherson
  2025-09-15  6:29   ` Xiaoyao Li
@ 2025-09-16  7:10   ` Binbin Wu
  2025-09-17 13:14     ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  7:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
> other non-MSR registers through them, along with support for
> KVM_GET_REG_LIST to enumerate support for KVM-defined registers.
>
> This is in preparation for allowing userspace to read/write the guest SSP
> register, which is needed for the upcoming CET virtualization support.
>
> Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
> KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
> added for registers that lack existing KVM uAPIs to access them. The "KVM"
> in the name is intended to be vague to give KVM flexibility to include
> other potential registers.  More precise names like "SYNTHETIC" and
> "SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
> be conflated with synthetic guest-visible MSRs) and may put KVM into a
> corner (e.g. if KVM wants to change how a KVM-defined register is modeled
> internally).
>
> Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
> duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
> registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
> can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

One nit below.

> ---
>   Documentation/virt/kvm/api.rst  |   6 +-
>   arch/x86/include/uapi/asm/kvm.h |  26 +++++++++
>   arch/x86/kvm/x86.c              | 100 ++++++++++++++++++++++++++++++++
>   3 files changed, 131 insertions(+), 1 deletion(-)
[...]
> +
> +#define KVM_X86_REG_ENCODE(type, index)				\
Nit:
Is it better to use KVM_X86_REG_ID, so that a case-insensitive search for
that string ties the encoding macro to the structure it encodes
(struct kvm_x86_reg_id)?


> +	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
> +
[...]
>   
> +struct kvm_x86_reg_id {
> +	__u32 index;
> +	__u8  type;
> +	__u8  rsvd1;
> +	__u8  rsvd2:4;
> +	__u8  size:4;
> +	__u8  x86;
> +};
> +
>
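
For context, a hypothetical userspace sketch of the quoted encoding;
KVM_REG_GUEST_SSP and vcpu_fd are assumptions, error handling elided:

	__u64 ssp;
	struct kvm_one_reg reg = {
		.id   = KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, KVM_REG_GUEST_SSP),
		.addr = (__u64)(unsigned long)&ssp,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
		perror("KVM_GET_ONE_REG");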

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 05/41] KVM: x86: Report XSS as to-be-saved if there are supported features
  2025-09-12 23:22 ` [PATCH v15 05/41] KVM: x86: Report XSS as to-be-saved if there are supported features Sean Christopherson
@ 2025-09-16  7:12   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  7:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> Add MSR_IA32_XSS to the list of MSRs reported to userspace if supported_xss
> is non-zero, i.e. KVM supports at least one XSS-based feature.
>
> Before the CET virtualization series, guest MSR_IA32_XSS is guaranteed to
> be 0, i.e., XSAVES/XRSTORS executes in non-root mode with XSS == 0, which
> is equivalent to XSAVE/XRSTOR.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/x86.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 771b7c883c66..3b4258b38ad8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -332,7 +332,7 @@ static const u32 msrs_to_save_base[] = {
>   	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
>   	MSR_IA32_UMWAIT_CONTROL,
>   
> -	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
> +	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
>   };
>   
>   static const u32 msrs_to_save_pmu[] = {
> @@ -7499,6 +7499,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>   		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
>   			return;
>   		break;
> +	case MSR_IA32_XSS:
> +		if (!kvm_caps.supported_xss)
> +			return;
> +		break;
>   	default:
>   		break;
>   	}


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 06/41] KVM: x86: Check XSS validity against guest CPUIDs
  2025-09-12 23:22 ` [PATCH v15 06/41] KVM: x86: Check XSS validity against guest CPUIDs Sean Christopherson
@ 2025-09-16  7:20   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  7:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Chao Gao <chao.gao@intel.com>
>
> Maintain per-guest valid XSS bits and check XSS validity against them
> rather than against KVM capabilities. This is to prevent bits that are
> supported by KVM but not supported for a guest from being set.
>
> Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
> if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
> KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
> host_initiated check.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/include/asm/kvm_host.h |  3 ++-
>   arch/x86/kvm/cpuid.c            | 12 ++++++++++++
>   arch/x86/kvm/x86.c              |  7 +++----
>   3 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2762554cbb7b..d931d72d23c9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -815,7 +815,6 @@ struct kvm_vcpu_arch {
>   	bool at_instruction_boundary;
>   	bool tpr_access_reporting;
>   	bool xfd_no_write_intercept;
> -	u64 ia32_xss;
>   	u64 microcode_version;
>   	u64 arch_capabilities;
>   	u64 perf_capabilities;
> @@ -876,6 +875,8 @@ struct kvm_vcpu_arch {
>   
>   	u64 xcr0;
>   	u64 guest_supported_xcr0;
> +	u64 ia32_xss;
> +	u64 guest_supported_xss;
>   
>   	struct kvm_pio_request pio;
>   	void *pio_data;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index ad6cadf09930..46cf616663e6 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
>   	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
>   }
>   
> +static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpuid_entry2 *best;
> +
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
> +	if (!best)
> +		return 0;
> +
> +	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
> +}
> +
>   static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
>   						       struct kvm_cpuid_entry2 *entry,
>   						       unsigned int x86_feature,
> @@ -424,6 +435,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	}
>   
>   	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
> +	vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
>   
>   	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3b4258b38ad8..5a5af40c06a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3984,15 +3984,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		}
>   		break;
>   	case MSR_IA32_XSS:
> -		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> -			return 1;
> +		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> +			return KVM_MSR_RET_UNSUPPORTED;
>   		/*
>   		 * KVM supports exposing PT to the guest, but does not support
>   		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>   		 * XSAVES/XRSTORS to save/restore PT MSRs.
>   		 */
> -		if (data & ~kvm_caps.supported_xss)
> +		if (data & ~vcpu->arch.guest_supported_xss)
>   			return 1;
>   		vcpu->arch.ia32_xss = data;
>   		vcpu->arch.cpuid_dynamic_bits_dirty = true;


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 07/41] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2025-09-12 23:22 ` [PATCH v15 07/41] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Sean Christopherson
@ 2025-09-16  7:23   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  7:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Update CPUID.(EAX=0DH,ECX=1).EBX to reflect the current required xstate
> size when the XSS MSR is modified.
> CPUID.(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
> xstate features in (XCR0 | IA32_XSS). The guest can use this CPUID value
> to allocate a sufficiently large xsave buffer.
>
> Note, KVM does not yet support any XSS-based features, i.e. supported_xss
> is guaranteed to be zero at this time.
>
> Opportunistically skip CPUID updates if the XSS value doesn't change.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/cpuid.c | 3 ++-
>   arch/x86/kvm/x86.c   | 2 ++
>   2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 46cf616663e6..b5f87254ced7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>   	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
>   	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>   		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> -		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> +		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
> +						 vcpu->arch.ia32_xss, true);
>   }
>   
>   static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5a5af40c06a9..519d58b82f7f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3993,6 +3993,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		 */
>   		if (data & ~vcpu->arch.guest_supported_xss)
>   			return 1;
> +		if (vcpu->arch.ia32_xss == data)
> +			break;
>   		vcpu->arch.ia32_xss = data;
>   		vcpu->arch.cpuid_dynamic_bits_dirty = true;
>   		break;


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 08/41] KVM: x86: Initialize kvm_caps.supported_xss
  2025-09-12 23:22 ` [PATCH v15 08/41] KVM: x86: Initialize kvm_caps.supported_xss Sean Christopherson
@ 2025-09-16  7:29   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  7:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Set the initial kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if
> XSAVES is supported. host_xss contains the host-supported xstate feature
> bits for thread FPU context switching, while KVM_SUPPORTED_XSS includes all
> XSS feature bits enabled by KVM. The resulting value represents the
> supervisor xstates that are available to the guest and are backed by the
> host FPU framework for swapping {guest,host} XSAVE-managed registers/MSRs.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: relocate and enhance comment about PT / XSS[8] ]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/x86.c | 23 +++++++++++++++--------
>   1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 519d58b82f7f..c5e38d6943fe 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -217,6 +217,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>   				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>   				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>   
> +/*
> + * Note, KVM supports exposing PT to the guest, but does not support context
> + * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
> + * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
> + * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
> + */
> +#define KVM_SUPPORTED_XSS     0
> +
>   bool __read_mostly allow_smaller_maxphyaddr = 0;
>   EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
>   
> @@ -3986,11 +3994,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_IA32_XSS:
>   		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
>   			return KVM_MSR_RET_UNSUPPORTED;
> -		/*
> -		 * KVM supports exposing PT to the guest, but does not support
> -		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
> -		 * XSAVES/XRSTORS to save/restore PT MSRs.
> -		 */
> +
>   		if (data & ~vcpu->arch.guest_supported_xss)
>   			return 1;
>   		if (vcpu->arch.ia32_xss == data)
> @@ -9818,14 +9822,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>   		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>   	}
> +
> +	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> +		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
> +		kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
> +	}
> +
>   	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
>   	kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
>   
>   	rdmsrq_safe(MSR_EFER, &kvm_host.efer);
>   
> -	if (boot_cpu_has(X86_FEATURE_XSAVES))
> -		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
> -
>   	kvm_init_pmu_capability(ops->pmu_ops);
>   
>   	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM
  2025-09-12 23:22 ` [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
@ 2025-09-16  7:37   ` Xiaoyao Li
  2025-09-17  7:53   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  7:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates architected
> hardware behavior when the guest enters/leaves SMM mode, i.e., saves
> registers to SMRAM on SMM entry and reloads them on exit from SMM. Per the
> SDM, SSP is one such register on 64-bit architectures, so add support for
> saving and reloading SSP.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
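
Sketched flow, with hypothetical accessor names; the real patch plumbs SSP
through the 64-bit SMRAM state-save area:

	/* enter_smm_save_state_64(), sketch: stash SSP on SMI */
	smram->ssp = kvm_get_guest_ssp(vcpu);		/* hypothetical accessor */

	/* rsm_load_state_64(), sketch: restore SSP on RSM */
	kvm_set_guest_ssp(vcpu, smram->ssp);		/* hypothetical accessor */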


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs
  2025-09-12 23:22 ` [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
  2025-09-15 17:21   ` Xin Li
@ 2025-09-16  7:40   ` Xiaoyao Li
  2025-09-17  8:32   ` Binbin Wu
  2 siblings, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  7:40 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Enable/disable CET MSRs interception per associated feature configuration.
> 
> Pass through CET MSRs that are managed by XSAVE, as they cannot be
> intercepted without also intercepting XSAVE. However, intercepting XSAVE
> would likely cause unacceptable performance overhead.
> MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
> 
> Note, this MSR design introduces an architectural limitation on SHSTK and
> IBT control for the guest, i.e., when SHSTK is exposed, IBT is also
> available to the guest from an architectural perspective, since IBT relies
> on a subset of the SHSTK-relevant MSRs.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields
  2025-09-12 23:22 ` [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
@ 2025-09-16  7:44   ` Xiaoyao Li
  2025-09-17  8:48   ` Xiaoyao Li
  1 sibling, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  7:44 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Save constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} fields
> explicitly. Kernel IBT is supported and the setting in MSR_IA32_S_CET is
> static post-boot (the exception is the BIOS call case, but a vCPU thread
> never crosses it), so KVM doesn't need to refresh the HOST_S_CET field
> before every VM-Enter/VM-Exit sequence.
> 
> Host supervisor shadow stack is not enabled for now and SSP is not
> accessible to kernel mode, thus it's safe to set the host
> IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled for
> CPL3, SSP is reloaded from PL3_SSP before the CPU returns to userspace;
> see the SDM, Vol 2A/B Chapter 3/4, for SYSCALL/SYSRET/SYSENTER/SYSEXIT/
> RDSSP/CALL etc.
> 
> Prevent KVM module loading if host supervisor shadow stack SHSTK_EN is set
> in MSR_IA32_S_CET, as KVM cannot co-exist with it correctly.
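
A minimal sketch of the idea, assuming the module-init plumbing (the VMCS
field names are from this series; CET_SHSTK_EN stands in for the SHSTK_EN
bit of MSR_IA32_S_CET):

	u64 host_s_cet = 0;

	if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT))
		rdmsrq(MSR_IA32_S_CET, host_s_cet);

	/* Refuse to load if the host kernel runs with supervisor shadow stacks. */
	if (host_s_cet & CET_SHSTK_EN)
		return -EOPNOTSUPP;

	vmcs_writel(HOST_S_CET, host_s_cet);	/* static post-boot */
	vmcs_writel(HOST_SSP, 0);		/* supervisor SHSTK not enabled */
	vmcs_writel(HOST_INTR_SSP_TABLE, 0);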
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: snapshot host S_CET if SHSTK *or* IBT is supported]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-16  7:07   ` Xiaoyao Li
@ 2025-09-16  7:48     ` Chao Gao
  2025-09-16  8:10       ` Xiaoyao Li
  2025-09-19 22:11     ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Chao Gao @ 2025-09-16  7:48 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel,
	Tom Lendacky, Mathias Krause, John Allen, Rick Edgecombe,
	Maxim Levitsky, Zhang Yi Z

On Tue, Sep 16, 2025 at 03:07:06PM +0800, Xiaoyao Li wrote:
>On 9/13/2025 7:22 AM, Sean Christopherson wrote:
>> From: Yang Weijiang <weijiang.yang@intel.com>
>> 
>> Add an emulation interface for CET MSR access. The emulation code is
>> split into a common part and a vendor-specific part. The former does
>> common checks for MSRs, e.g. accessibility, data validity, etc., then
>> routes the operation either to the XSAVE-managed MSRs via the helpers or
>> to the CET VMCS fields.
>> 
>> SSP can only be read via RDSSP; even writing it requires destructive and
>> potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
>> SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
>> for the GUEST_SSP field of the VMCS.
>> 
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> Tested-by: Mathias Krause <minipli@grsecurity.net>
>> Tested-by: John Allen <john.allen@amd.com>
>> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
>
>Is the change/update of "drop call to kvm_set_xstate_msr() for S_CET" true
>for this patch?

v14 has that call, but it is incorrect. So Sean dropped it. See v14:

https://lore.kernel.org/kvm/20250909093953.202028-12-chao.gao@intel.com/

>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
>Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-16  7:48     ` Chao Gao
@ 2025-09-16  8:10       ` Xiaoyao Li
  0 siblings, 0 replies; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-16  8:10 UTC (permalink / raw)
  To: Chao Gao
  Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel,
	Tom Lendacky, Mathias Krause, John Allen, Rick Edgecombe,
	Maxim Levitsky, Zhang Yi Z

On 9/16/2025 3:48 PM, Chao Gao wrote:
> On Tue, Sep 16, 2025 at 03:07:06PM +0800, Xiaoyao Li wrote:
>> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
>>> From: Yang Weijiang <weijiang.yang@intel.com>
>>>
>>> Add an emulation interface for CET MSR access. The emulation code is
>>> split into a common part and a vendor-specific part. The former does
>>> common checks for MSRs, e.g. accessibility, data validity, etc., then
>>> routes the operation either to the XSAVE-managed MSRs via the helpers or
>>> to the CET VMCS fields.
>>>
>>> SSP can only be read via RDSSP; even writing it requires destructive and
>>> potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
>>> SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
>>> for the GUEST_SSP field of the VMCS.
>>>
>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>>> Tested-by: Mathias Krause <minipli@grsecurity.net>
>>> Tested-by: John Allen <john.allen@amd.com>
>>> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>> [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
>>
>> Is the change/update of "drop call to kvm_set_xstate_msr() for S_CET" true
>> for this patch?
> 
> v14 has that call, but it is incorrect. So Sean dropped it. See v14:
> 
> https://lore.kernel.org/kvm/20250909093953.202028-12-chao.gao@intel.com/

Sorry, my fault. I missed it somehow when bouncing between the 3 MSR 
handlers.

>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>
>> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-12 23:22 ` [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Sean Christopherson
  2025-09-15 17:04   ` Xin Li
  2025-09-16  6:51   ` Xiaoyao Li
@ 2025-09-16  8:28   ` Binbin Wu
  2025-09-17  2:51     ` Binbin Wu
  2025-09-17 12:47     ` Sean Christopherson
  2 siblings, 2 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  8:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
> to facilitate access to such MSRs.
>
> If MSRs supported in kvm_caps.supported_xss are passed through to the
> guest, the guest's MSR values are swapped with the host's before the vCPU
> exits to userspace and after it reenters the kernel before the next
> VM-entry.
>
> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> explicitly check @vcpu is non-null before attempting to load guest state.
> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
> loading guest FPU state (which doesn't exist).
>
> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> The two helpers are put here to make explicit that accessing
> XSAVE-managed MSRs requires special checks and handling to guarantee that
> reads/writes of the MSRs are correct.
>
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: drop S_CET, add big comment, move accessors to x86.c]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

Two nits below.

> ---
>   arch/x86/kvm/x86.c | 86 +++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5e38d6943fe..a95ca2fbd3a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>   static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>   
>   static DEFINE_MUTEX(vendor_module_lock);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> +
>   struct kvm_x86_ops kvm_x86_ops __read_mostly;
>   
>   #define KVM_X86_OP(func)					     \
> @@ -3801,6 +3804,66 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>   	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
>   }
>   
> +/*
> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
> + * switched with the rest of guest FPU state.  Note!  S_CET is _not_ context
> + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
> + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
> + * the value saved/restored via XSTATE is always the host's value.  That detail
> + * is _extremely_ important, as the guest's S_CET must _never_ be resident in
> + * hardware while executing in the host.  Loading guest values for U_CET and
> + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
> + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
> + * privilegel levels, i.e. are effectively only consumed by userspace as well.
> + */
> +static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +	if (!vcpu)
> +		return false;
> +
> +	switch (msr) {
> +	case MSR_IA32_U_CET:
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
> +		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> +	default:
> +		return false;
> +	}
> +}
> +
> +/*
> + * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated


Lock is unconditional and reload is conditional.
"and/or" seems not accurate?

> + * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
> + * guest FPU should have been loaded already.
> + */
> +static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
> +						  struct msr_data *msr_info,
> +						  int access)
> +{
> +	BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
> +
> +	KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
> +	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
> +
> +	kvm_fpu_get();
> +	if (access == MSR_TYPE_R)
> +		rdmsrq(msr_info->index, msr_info->data);
> +	else
> +		wrmsrq(msr_info->index, msr_info->data);
> +	kvm_fpu_put();
> +}
> +
> +static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +{
> +	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
> +}
> +
> +static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +{
> +	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
> +}
> +
>   int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   {
>   	u32 msr = msr_info->index;
> @@ -4551,11 +4614,25 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
>   		    int (*do_msr)(struct kvm_vcpu *vcpu,
>   				  unsigned index, u64 *data))
>   {
> +	bool fpu_loaded = false;
>   	int i;
>   
> -	for (i = 0; i < msrs->nmsrs; ++i)
> +	for (i = 0; i < msrs->nmsrs; ++i) {
> +		/*
> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
> +		 * temporarily load the guest's FPU state so that the guest's
> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can

Using "i.e." and "so that" together feels repetitive.[...]

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 10/41] KVM: x86: Add fault checks for guest CR4.CET setting
  2025-09-12 23:22 ` [PATCH v15 10/41] KVM: x86: Add fault checks for guest CR4.CET setting Sean Christopherson
@ 2025-09-16  8:33   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  8:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Check potential faults for CR4.CET setting per Intel SDM requirements.
> CET can be enabled if and only if CR0.WP == 1, i.e. setting CR4.CET ==
> 1 faults if CR0.WP == 0 and setting CR0.WP == 0 fails if CR4.CET == 1.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/x86.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a95ca2fbd3a9..5653ddfe124e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1176,6 +1176,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>   	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
>   		return 1;
>   
> +	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
> +		return 1;
> +
>   	kvm_x86_call(set_cr0)(vcpu, cr0);
>   
>   	kvm_post_set_cr0(vcpu, old_cr0, cr0);
> @@ -1376,6 +1379,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>   			return 1;
>   	}
>   
> +	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
> +		return 1;
> +
>   	kvm_x86_call(set_cr4)(vcpu, cr4);
>   
>   	kvm_post_set_cr4(vcpu, old_cr4, cr4);


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 11/41] KVM: x86: Report KVM supported CET MSRs as to-be-saved
  2025-09-12 23:22 ` [PATCH v15 11/41] KVM: x86: Report KVM supported CET MSRs as to-be-saved Sean Christopherson
  2025-09-15  6:30   ` Xiaoyao Li
@ 2025-09-16  8:46   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  8:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add CET MSRs to the list of MSRs reported to userspace if the feature,
> i.e. IBT or SHSTK, associated with the MSRs is supported by KVM.
>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/x86.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5653ddfe124e..2c9908bc8b32 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -344,6 +344,10 @@ static const u32 msrs_to_save_base[] = {
>   	MSR_IA32_UMWAIT_CONTROL,
>   
>   	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
> +
> +	MSR_IA32_U_CET, MSR_IA32_S_CET,
> +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
> +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
>   };
>   
>   static const u32 msrs_to_save_pmu[] = {
> @@ -7598,6 +7602,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>   		if (!kvm_caps.supported_xss)
>   			return;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +			return;
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
> +			return;
> +		fallthrough;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> +			return;
> +		break;
>   	default:
>   		break;
>   	}


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-09-12 23:22 ` [PATCH v15 12/41] KVM: VMX: Introduce CET VMCS fields and control bits Sean Christopherson
  2025-09-15  6:31   ` Xiaoyao Li
@ 2025-09-16  9:00   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-16  9:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Control-flow Enforcement Technology (CET) is a CPU feature used to
> prevent Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks.
> It provides two sub-features (SHSTK, IBT) to defend against such
> control-flow subversion attacks.
>
> Shadow Stack (SHSTK):
>    A shadow stack is a second stack used exclusively for control transfer
>    operations. The shadow stack is separate from the data/normal stack and
>    can be enabled individually in user and kernel mode. When shadow stack
>    is enabled, CALL pushes the return address on both the data and shadow
>    stack. RET pops the return address from both stacks and compares them.
>    If the return addresses from the two stacks do not match, the processor
>    generates a #CP.
>
> Indirect Branch Tracking (IBT):
>    IBT introduces an instruction (ENDBRANCH) to mark valid target addresses of
>    indirect branches (CALL, JMP etc...). If an indirect branch is executed
>    and the next instruction is _not_ an ENDBRANCH, the processor generates
>    a #CP. These instruction behaves as a NOP on platforms that have no CET.

These -> The

>
> Several new CET MSRs are defined to support CET:
>    MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.
>
>    MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.
>
>    MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK pointer table, whose
> 			entries are indexed by the IST field of the interrupt
> 			gate descriptor.
>
> Two XSAVES state bits are introduced for CET:
>    IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
>    IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
>
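
(In the kernel's XSTATE code these correspond to XFEATURE_CET_USER and
XFEATURE_CET_KERNEL; the sketch below illustrates the bit positions and is
not a quote of the patch:)

	#define XFEATURE_MASK_CET_USER		BIT_ULL(11)	/* user mode CET state */
	#define XFEATURE_MASK_CET_KERNEL	BIT_ULL(12)	/* supervisor CET state */
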
> Six VMCS fields are introduced for CET:
>    {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
>    {HOST,GUEST}_SSP: Stores current active SSP.
>    {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
>
> On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
> control fields:
> If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from the following
> VMCS fields at VM-Exit:
>    HOST_S_CET
>    HOST_SSP
>    HOST_INTR_SSP_TABLE
>
> If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from the following
> VMCS fields at VM-Entry:
>    GUEST_S_CET
>    GUEST_SSP
>    GUEST_INTR_SSP_TABLE
>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/include/asm/vmx.h | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index cca7d6641287..ce10a7e2d3d9 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -106,6 +106,7 @@
>   #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>   #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>   #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
> +#define VM_EXIT_LOAD_CET_STATE                  0x10000000
>   
>   #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>   
> @@ -119,6 +120,7 @@
>   #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>   #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
>   #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
> +#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
>   
>   #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
>   
> @@ -369,6 +371,9 @@ enum vmcs_field {
>   	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>   	GUEST_SYSENTER_ESP              = 0x00006824,
>   	GUEST_SYSENTER_EIP              = 0x00006826,
> +	GUEST_S_CET                     = 0x00006828,
> +	GUEST_SSP                       = 0x0000682a,
> +	GUEST_INTR_SSP_TABLE            = 0x0000682c,
>   	HOST_CR0                        = 0x00006c00,
>   	HOST_CR3                        = 0x00006c02,
>   	HOST_CR4                        = 0x00006c04,
> @@ -381,6 +386,9 @@ enum vmcs_field {
>   	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>   	HOST_RSP                        = 0x00006c14,
>   	HOST_RIP                        = 0x00006c16,
> +	HOST_S_CET                      = 0x00006c18,
> +	HOST_SSP                        = 0x00006c1a,
> +	HOST_INTR_SSP_TABLE             = 0x00006c1c
>   };
>   
>   /*


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 00/41] KVM: x86: Mega-CET
  2025-09-12 23:22 [PATCH v15 00/41] KVM: x86: Mega-CET Sean Christopherson
                   ` (42 preceding siblings ...)
  2025-09-15 21:20 ` John Allen
@ 2025-09-16 13:53 ` Chao Gao
  43 siblings, 0 replies; 130+ messages in thread
From: Chao Gao @ 2025-09-16 13:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Fri, Sep 12, 2025 at 04:22:38PM -0700, Sean Christopherson wrote:
>This series is (hopefully) all of the in-flight CET virtualization patches
>in one big bundle.  Please holler if I missed a patch or three as this is what
>I am planning on applying for 6.18 (modulo fixups and whatnot), i.e. if there's
>something else that's needed to enable CET virtualization, now's the time...
>
>Patches 1-3 probably need the most attention, as they are new in v15 and I
>don't have a fully working SEV-ES setup (don't have the right guest firmware,
>ugh).  Though testing on everything would be much appreciated.
>

I tested this series on my EMR system using patched KUT [1][2][3], kselftest,
and glibc tests. No CET test failures or regressions were observed.

[1]: https://lore.kernel.org/kvm/20250626073459.12990-1-minipli@grsecurity.net/
[2]: https://lore.kernel.org/kvm/20250915144936.113996-1-chao.gao@intel.com/
[3]: https://github.com/xinli-intel/kvm-unit-tests/commit/f1df81c3189a3328adb47c7dd6cd985830fe738f

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-12 23:23 ` [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
@ 2025-09-16 18:55   ` John Allen
  2025-09-16 19:53     ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-16 18:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> of XSS in order to compute the required XSTATE size when emulating
> CPUID.0xD.0x1 for the guest.
> 
> Treat the incoming XSS change as an emulated write, i.e. validate the
> guest-provided value, to avoid letting the guest load garbage into KVM's
> tracking.  Simply ignore bad values, as either the guest managed to get an
> unsupported value into hardware, or the guest is misbehaving and providing
> pure garbage.  In either case, KVM can't fix the broken guest.
> 
> Note, emulating the change as an MSR write also takes care of side effects,
> e.g. marking dynamic CPUID bits as dirty.
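
(For reference, the common XSS handler that performs this validation, as
visible in the debug diff later in this thread, is roughly:)

	if (data & ~vcpu->arch.guest_supported_xss)
		return 1;
	if (vcpu->arch.ia32_xss == data)
		break;
	vcpu->arch.ia32_xss = data;
	vcpu->arch.cpuid_dynamic_bits_dirty = true;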
> 
> Suggested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/sev.c | 3 +++
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0cd77a87dd84..0cd32df7b9b6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  	if (kvm_ghcb_xcr0_is_valid(svm))
>  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
>  
> +	if (kvm_ghcb_xss_is_valid(svm))
> +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> +

It looks like this is the change that caused the selftest regression
with sev-es. It's not yet clear to me what the problem is though.

Thanks,
John

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-16 18:55   ` John Allen
@ 2025-09-16 19:53     ` Sean Christopherson
  2025-09-16 20:33       ` John Allen
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-16 19:53 UTC (permalink / raw)
  To: John Allen
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Tue, Sep 16, 2025, John Allen wrote:
> On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> > of XSS in order to compute the required XSTATE size when emulating
> > CPUID.0xD.0x1 for the guest.
> > 
> > Treat the incoming XSS change as an emulated write, i.e. validate the
> > guest-provided value, to avoid letting the guest load garbage into KVM's
> > tracking.  Simply ignore bad values, as either the guest managed to get an
> > unsupported value into hardware, or the guest is misbehaving and providing
> > pure garbage.  In either case, KVM can't fix the broken guest.
> > 
> > Note, emulating the change as an MSR write also takes care of side effects,
> > e.g. marking dynamic CPUID bits as dirty.
> > 
> > Suggested-by: John Allen <john.allen@amd.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/sev.c | 3 +++
> >  arch/x86/kvm/svm/svm.h | 1 +
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 0cd77a87dd84..0cd32df7b9b6 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >  	if (kvm_ghcb_xcr0_is_valid(svm))
> >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> >  
> > +	if (kvm_ghcb_xss_is_valid(svm))
> > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > +
> 
> It looks like this is the change that caused the selftest regression
> with sev-es. It's not yet clear to me what the problem is though.

Do you see any WARNs in the guest kernel log?

The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
only used by init_xstate_size(), and I would expect the guest kernel's sanity
checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
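
For reference, the dynamic update in question is roughly this path in
kvm_update_cpuid_runtime() (paraphrased, not a verbatim quote):

	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
	if (best)
		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
						 vcpu->arch.ia32_xss, true);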

Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
was masking a pre-existing (or just different) bug, and that "fixing" that flaw
by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
the bug.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-16 19:53     ` Sean Christopherson
@ 2025-09-16 20:33       ` John Allen
  2025-09-16 21:38         ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-16 20:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > > marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> > > of XSS in order to compute the required XSTATE size when emulating
> > > CPUID.0xD.0x1 for the guest.
> > > 
> > > Treat the incoming XSS change as an emulated write, i.e. validate the
> > > guest-provided value, to avoid letting the guest load garbage into KVM's
> > > tracking.  Simply ignore bad values, as either the guest managed to get an
> > > unsupported value into hardware, or the guest is misbehaving and providing
> > > pure garbage.  In either case, KVM can't fix the broken guest.
> > > 
> > > Note, emulating the change as an MSR write also takes care of side effects,
> > > e.g. marking dynamic CPUID bits as dirty.
> > > 
> > > Suggested-by: John Allen <john.allen@amd.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/svm/sev.c | 3 +++
> > >  arch/x86/kvm/svm/svm.h | 1 +
> > >  2 files changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > >  
> > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > +
> > 
> > It looks like this is the change that caused the selftest regression
> > with sev-es. It's not yet clear to me what the problem is though.
> 
> Do you see any WARNs in the guest kernel log?
> 
> The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> only used by init_xstate_size(), and I would expect the guest kernel's sanity
> checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.

Yes, actually that looks to be the case:

[    0.463504] ------------[ cut here ]------------
[    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
[    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
[    0.466443] Modules linked in:
[    0.467445] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.17.0-rc3-shstk-v15+ #6 PREEMPT(voluntary)
[    0.468443] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[    0.469444] RIP: 0010:paranoid_xstate_size_valid+0x101/0x140
[    0.470443] Code: 89 44 24 04 e8 00 fa ff ff 8b 44 24 04 eb c2 89 da 89 c6 48 c7 c7 80 f4 bc 9e 89 44 24 04 c6 05 9d a3 a4 ff 01 e8 3f fa fb fd <0f> 0b 8b 44 24 04 eb ce 80 3d 8a a3 a4 ff 00 74 09 e8 c9 f9 ff ff
[    0.471443] RSP: 0000:ffffffff9ee03e80 EFLAGS: 00010286
[    0.472443] RAX: 0000000000000000 RBX: 0000000000000348 RCX: c0000000fffeffff
[    0.473443] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd83c00
[    0.474443] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000003
[    0.475443] R10: ffffffff9ee03d20 R11: ffff8c04fff8ffe8 R12: 0000000000000001
[    0.476443] R13: ffffffffffffffff R14: 0000000000000001 R15: 000000007c135000
[    0.477443] FS:  0000000000000000(0000) GS:ffff8c051c118000(0000) knlGS:0000000000000000
[    0.478443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.479443] CR2: ffff8c03f4c01000 CR3: 0008000f73822001 CR4: 0000000000f70ef0
[    0.480445] PKRU: 55555554
[    0.480967] Call Trace:
[    0.481446]  <TASK>
[    0.481856]  init_xstate_size+0xa8/0x160
[    0.482444]  fpu__init_system_xstate+0x1c4/0x500
[    0.483444]  fpu__init_system+0x93/0xc0
[    0.484443]  arch_cpu_finalize_init+0xd2/0x160
[    0.485290]  start_kernel+0x330/0x470
[    0.485444]  x86_64_start_reservations+0x14/0x30
[    0.486443]  x86_64_start_kernel+0xd0/0xe0
[    0.487443]  common_startup_64+0x13e/0x141
[    0.488444]  </TASK>
[    0.488879] ---[ end trace 0000000000000000 ]--

> 
> Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
> was masking a pre-existing (or just different) bug, and that "fixing" that flaw
> by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
> the bug.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-16 20:33       ` John Allen
@ 2025-09-16 21:38         ` Sean Christopherson
  2025-09-16 22:55           ` John Allen
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-16 21:38 UTC (permalink / raw)
  To: John Allen
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Tue, Sep 16, 2025, John Allen wrote:
> On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > On Tue, Sep 16, 2025, John Allen wrote:
> > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > >  
> > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > +
> > > 
> > > It looks like this is the change that caused the selftest regression
> > > with sev-es. It's not yet clear to me what the problem is though.
> > 
> > Do you see any WARNs in the guest kernel log?
> > 
> > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> 
> Yes, actually that looks to be the case:
> 
> [    0.463504] ------------[ cut here ]------------
> [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140

Can you run with the below printk tracing in the host (and optionally tracing in
the guest for its updates)?  Compile tested only.

There should be very few XSS updates, so this _shouldn't_ spam/crash your host :-)

---
 arch/x86/kvm/svm/sev.c |  6 ++++--
 arch/x86/kvm/x86.c     | 15 ++++++++++++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd32df7b9b6..8ac87d623767 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3306,8 +3306,10 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	if (kvm_ghcb_xcr0_is_valid(svm))
 		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
 
-	if (kvm_ghcb_xss_is_valid(svm))
-		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
+	if (kvm_ghcb_xss_is_valid(svm)) {
+		if (__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb)))
+			pr_warn("Dropped XSS update, val = %llx\n", kvm_ghcb_get_xss(ghcb));
+	}
 
 	/* Copy the GHCB exit information into the VMCB fields */
 	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c78acab2ff3f..a846ed69ce2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4118,13 +4118,22 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_XSS:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES)) {
+			pr_warn("Guest CPUID doesn't have XSAVES\n");
 			return KVM_MSR_RET_UNSUPPORTED;
+		}
 
-		if (data & ~vcpu->arch.guest_supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss) {
+			pr_warn("Invalid XSS: supported = %llx, val = %llx\n",
+				vcpu->arch.guest_supported_xss, data);
 			return 1;
-		if (vcpu->arch.ia32_xss == data)
+		}
+		if (vcpu->arch.ia32_xss == data) {
+			pr_warn("XSS already set to val = %llx, eliding updates\n", data);
 			break;
+		}
+
+		pr_warn("XSS updated to val = %llx, marking CPUID dirty\n", data);
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;

base-commit: 14298d819d5a6b7180a4089e7d2121ca3551dc6c
--

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-16 21:38         ` Sean Christopherson
@ 2025-09-16 22:55           ` John Allen
  2025-09-18 19:48             ` John Allen
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-16 22:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > >  
> > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > +
> > > > 
> > > > It looks like this is the change that caused the selftest regression
> > > > with sev-es. It's not yet clear to me what the problem is though.
> > > 
> > > Do you see any WARNs in the guest kernel log?
> > > 
> > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > 
> > Yes, actually that looks to be the case:
> > 
> > [    0.463504] ------------[ cut here ]------------
> > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> 
> Can you run with the below printk tracing in the host (and optionally tracing in
> the guest for its updates)?  Compile tested only.

Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
cpus followed by "XSS already set to val = 0, eliding updates" times the
number of cpus. This is with host tracing only. I can try with guest
tracing too in the morning.

Thanks,
John

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-16  8:28   ` Binbin Wu
@ 2025-09-17  2:51     ` Binbin Wu
  2025-09-17 12:47     ` Sean Christopherson
  1 sibling, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-17  2:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/16/2025 4:28 PM, Binbin Wu wrote:
>
>
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
>> Load the guest's FPU state if userspace is accessing MSRs whose values
>> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
>> to facilitate access to such MSRs.
>>
>> If MSRs supported in kvm_caps.supported_xss are passed through to the
>> guest, the guest's MSR values are swapped with the host's before the vCPU
>> exits to userspace and after it reenters the kernel before the next
>> VM-entry.
>>
>> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
>> explicitly check @vcpu is non-null before attempting to load guest state.
>> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
>> loading guest FPU state (which doesn't exist).
>>
>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> access MSRs that have not been exposed to the guest, e.g. it might do
>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>>
>> The two helpers are put here to make explicit that accessing
>> XSAVE-managed MSRs requires special checks and handling to guarantee that
>> reads/writes of the MSRs are correct.
>>
>> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Tested-by: Mathias Krause <minipli@grsecurity.net>
>> Tested-by: John Allen <john.allen@amd.com>
>> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> [sean: drop S_CET, add big comment, move accessors to x86.c]
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>
> Two nits below.
>
>> ---
>>   arch/x86/kvm/x86.c | 86 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 85 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c5e38d6943fe..a95ca2fbd3a9 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>>   static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>>     static DEFINE_MUTEX(vendor_module_lock);
>> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>> +
>>   struct kvm_x86_ops kvm_x86_ops __read_mostly;
>>     #define KVM_X86_OP(func)                         \
>> @@ -3801,6 +3804,66 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>>       mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
>>   }
>>   +/*
>> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
>> + * switched with the rest of guest FPU state.  Note!  S_CET is _not_ context
>> + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
>> + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
>> + * the value saved/restored via XSTATE is always the host's value.  That detail
>> + * is _extremely_ important, as the guest's S_CET must _never_ be resident in
>> + * hardware while executing in the host.  Loading guest values for U_CET and
>> + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
>> + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
>> + * privilegel levels, i.e. are effectively only consumed by userspace as well.
>> + */

privilegel -> privilege



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-12 23:22 ` [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
  2025-09-16  7:07   ` Xiaoyao Li
@ 2025-09-17  7:52   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-17  7:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add an emulation interface for CET MSR access. The emulation code is
> split into a common part and a vendor-specific part. The former does
> common checks for MSRs, e.g. accessibility, data validity, etc., then
> routes the operation either to the XSAVE-managed MSRs via the helpers or
> to the CET VMCS fields.
>
> SSP can only be read via RDSSP; even writing it requires destructive and
> potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
> SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
> for the GUEST_SSP field of the VMCS.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/vmx/vmx.c | 18 ++++++++++++
>   arch/x86/kvm/x86.c     | 64 ++++++++++++++++++++++++++++++++++++++++--
>   arch/x86/kvm/x86.h     | 23 +++++++++++++++
>   3 files changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 227b45430ad8..4fc1dbba2eb0 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2106,6 +2106,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		else
>   			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>   		break;
> +	case MSR_IA32_S_CET:
> +		msr_info->data = vmcs_readl(GUEST_S_CET);
> +		break;
> +	case MSR_KVM_INTERNAL_GUEST_SSP:
> +		msr_info->data = vmcs_readl(GUEST_SSP);
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> +		break;
>   	case MSR_IA32_DEBUGCTLMSR:
>   		msr_info->data = vmx_guest_debugctl_read();
>   		break;
> @@ -2424,6 +2433,15 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		else
>   			vmx->pt_desc.guest.addr_a[index / 2] = data;
>   		break;
> +	case MSR_IA32_S_CET:
> +		vmcs_writel(GUEST_S_CET, data);
> +		break;
> +	case MSR_KVM_INTERNAL_GUEST_SSP:
> +		vmcs_writel(GUEST_SSP, data);
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> +		break;
>   	case MSR_IA32_PERF_CAPABILITIES:
>   		if (data & PMU_CAP_LBR_FMT) {
>   			if ((data & PMU_CAP_LBR_FMT) !=
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 460ceae11495..0b67b1b0e361 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1890,6 +1890,44 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>   
>   		data = (u32)data;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
> +			return KVM_MSR_RET_UNSUPPORTED;
> +		if (!kvm_is_valid_u_s_cet(vcpu, data))
> +			return 1;
> +		break;
> +	case MSR_KVM_INTERNAL_GUEST_SSP:
> +		if (!host_initiated)
> +			return 1;
> +		fallthrough;
> +		/*
> +		 * Note that the MSR emulation here is flawed when a vCPU
> +		 * doesn't support the Intel 64 architecture. The expected
> +		 * architectural behavior in this case is that the upper 32
> +		 * bits do not exist and should always read '0'. However,
> +		 * because the actual hardware on which the virtual CPU is
> +		 * running does support Intel 64, XRSTORS/XSAVES in the
> +		 * guest could observe behavior that violates the
> +		 * architecture. Intercepting XRSTORS/XSAVES for this
> +		 * special case isn't deemed worthwhile.
> +		 */
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> +			return KVM_MSR_RET_UNSUPPORTED;
> +		/*
> +		 * MSR_IA32_INT_SSP_TAB is not present on processors that do
> +		 * not support Intel 64 architecture.
> +		 */
> +		if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
> +			return KVM_MSR_RET_UNSUPPORTED;
> +		if (is_noncanonical_msr_address(data, vcpu))
> +			return 1;
> +		/* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
> +		if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
> +			return 1;
> +		break;
>   	}
>   
>   	msr.data = data;
> @@ -1934,6 +1972,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>   		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
>   			return 1;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
> +			return KVM_MSR_RET_UNSUPPORTED;
> +		break;
> +	case MSR_KVM_INTERNAL_GUEST_SSP:
> +		if (!host_initiated)
> +			return 1;
> +		fallthrough;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> +			return KVM_MSR_RET_UNSUPPORTED;
> +		break;
>   	}
>   
>   	msr.index = index;
> @@ -3864,12 +3916,12 @@ static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
>   	kvm_fpu_put();
>   }
>   
> -static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   {
>   	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
>   }
>   
> -static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   {
>   	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
>   }
> @@ -4255,6 +4307,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vcpu->arch.guest_fpu.xfd_err = data;
>   		break;
>   #endif
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		kvm_set_xstate_msr(vcpu, msr_info);
> +		break;
>   	default:
>   		if (kvm_pmu_is_valid_msr(vcpu, msr))
>   			return kvm_pmu_set_msr(vcpu, msr_info);
> @@ -4604,6 +4660,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
>   		break;
>   #endif
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		kvm_get_xstate_msr(vcpu, msr_info);
> +		break;
>   	default:
>   		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
>   			return kvm_pmu_get_msr(vcpu, msr_info);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index a7c9c72fca93..076eccba0f7e 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -710,4 +710,27 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
>   
>   int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>   
> +#define CET_US_RESERVED_BITS		GENMASK(9, 6)
> +#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
> +#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
> +#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
> +
> +static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	if (data & CET_US_RESERVED_BITS)
> +		return false;
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> +	    (data & CET_US_SHSTK_MASK_BITS))
> +		return false;
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
> +	    (data & CET_US_IBT_MASK_BITS))
> +		return false;
> +	if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
> +		return false;
> +	/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
> +	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
> +		return false;
> +
> +	return true;
> +}
>   #endif


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM
  2025-09-12 23:22 ` [PATCH v15 15/41] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
  2025-09-16  7:37   ` Xiaoyao Li
@ 2025-09-17  7:53   ` Binbin Wu
  1 sibling, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-17  7:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates HW arch
> behavior when the guest enters/leaves SMM mode, i.e. saves registers to
> SMRAM on entry to SMM and reloads them on exit from SMM. Per the SDM, SSP
> is one such register on 64-bit architectures, so add support for SSP.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>   arch/x86/kvm/smm.c | 8 ++++++++
>   arch/x86/kvm/smm.h | 2 +-
>   2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index 5dd8a1646800..b0b14ba37f9a 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -269,6 +269,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
>   	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
>   
>   	smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
> +
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> +	    kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
> +		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>   }
>   #endif
>   
> @@ -558,6 +562,10 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
>   	kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
>   	ctxt->interruptibility = (u8)smstate->int_shadow;
>   
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> +	    kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
> +		return X86EMUL_UNHANDLEABLE;
> +
>   	return X86EMUL_CONTINUE;
>   }
>   #endif
> diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
> index 551703fbe200..db3c88f16138 100644
> --- a/arch/x86/kvm/smm.h
> +++ b/arch/x86/kvm/smm.h
> @@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
>   	u32 smbase;
>   	u32 reserved4[5];
>   
> -	/* ssp and svm_* fields below are not implemented by KVM */
>   	u64 ssp;
> +	/* svm_* fields below are not implemented by KVM */
>   	u64 svm_guest_pat;
>   	u64 svm_host_efer;
>   	u64 svm_host_cr4;


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-12 23:22 ` [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
@ 2025-09-17  8:16   ` Chao Gao
  2025-09-17 21:15     ` Sean Christopherson
  2025-09-17  8:19   ` Xiaoyao Li
  2025-09-17  8:45   ` Binbin Wu
  2 siblings, 1 reply; 130+ messages in thread
From: Chao Gao @ 2025-09-17  8:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Fri, Sep 12, 2025 at 04:22:56PM -0700, Sean Christopherson wrote:
>From: Yang Weijiang <weijiang.yang@intel.com>
>
>Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
>affected by Shadow Stacks and/or Indirect Branch Tracking when said
>features are enabled in the guest, as fully emulating CET would require
>significant complexity for no practical benefit (KVM shouldn't need to
>emulate branch instructions on modern hosts).  Simply doing nothing isn't
>an option as that would allow a malicious entity to subvert CET
>protections via the emulator.
>
>Note!  On far transfers, do NOT consult the current privilege level and
>instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
>Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
>can be in play for the target privilege level, i.e. checking the current
>privilege could get a false negative, and KVM doesn't know the target
>privilege level until emulation gets under way.

I modified KUT's cet.c to verify that near jumps, near returns, and far
transfers (e.g., IRET) trigger the emulation failure logic added by this
patch when guests enable Shadow Stack or IBT.

I found only one minor issue: near return instructions were not tagged with
ShadowStack. The following diff fixes this issue:

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4be54a677b0..b1c9816bd5c6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4326,8 +4326,8 @@ static const struct opcode opcode_table[256] = {
	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
	/* 0xC0 - 0xC7 */
	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
-	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
-	I(ImplicitOps | NearBranch | IsBranch, em_ret),
+	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
+	I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
	G(ByteOp, group11), G(0, group11),


And for reference, below are the changes I made to KUT's cet.c

diff --git a/x86/cet.c b/x86/cet.c
index 42d2b1fc..ff6b17f6 100644
--- a/x86/cet.c
+++ b/x86/cet.c
@@ -30,6 +30,8 @@ static u64 cet_shstk_func(void)
	 */
	printf("Try to temper the return-address, this causes #CP on returning...\n");
	*(ret_addr + 1) = 0xdeaddead;
+	/* Verify that near return causes emulation failure */
+	asm volatile (KVM_FEP "ret\n");
 
	return 0;
 }
@@ -45,7 +47,8 @@ static u64 cet_ibt_func(void)
	asm volatile ("movq $2, %rcx\n"
		      "dec %rcx\n"
		      "leaq 2f(%rip), %rax\n"
-		      "jmp *%rax \n"
+		      /* Verify that near jmp causes emulation failure */
+		      KVM_FEP "jmp *%rax \n"
		      "2:\n"
		      "dec %rcx\n");
	return 0;
@@ -111,6 +114,12 @@ int main(int ac, char **av)
	/* Enable CET master control bit in CR4. */
	write_cr4(read_cr4() | X86_CR4_CET);
 
+	/*
+	 * Verify "Far transfers" causes emulation failure even if shadow
+	 * stack isn't enabled for the current privilege level
+	 */
+	asm volatile (KVM_FEP "iret\n");
+
	printf("Unit test for CET user mode...\n");
	run_in_user((usermode_func)cet_shstk_func, GP_VECTOR, 0, 0, 0, 0, &rvc);
	report(cp_count == 1, "Completed shadow-stack protection test successfully.");

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-12 23:22 ` [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
  2025-09-17  8:16   ` Chao Gao
@ 2025-09-17  8:19   ` Xiaoyao Li
  2025-09-18 14:15     ` Chao Gao
  2025-09-17  8:45   ` Binbin Wu
  2 siblings, 1 reply; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-17  8:19 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
> affected by Shadow Stacks and/or Indirect Branch Tracking when said
> features are enabled in the guest, as fully emulating CET would require
> significant complexity for no practical benefit (KVM shouldn't need to
> emulate branch instructions on modern hosts).  Simply doing nothing isn't
> an option as that would allow a malicious entity to subvert CET
> protections via the emulator.
> 
> Note!  On far transfers, do NOT consult the current privilege level and
> instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
> Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
> can be in play for the target privilege level, i.e. checking the current
> privilege could get a false negative, and KVM doesn't know the target
> privilege level until emulation gets under way.
> 
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Cc: Mathias Krause <minipli@grsecurity.net>
> Cc: John Allen <john.allen@amd.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/emulate.c | 58 ++++++++++++++++++++++++++++++++++--------
>   1 file changed, 47 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 542d3664afa3..e4be54a677b0 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -178,6 +178,8 @@
>   #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
>   #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
>   #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
> +#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
> +#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
>   
>   #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
>   
> @@ -4068,9 +4070,9 @@ static const struct opcode group4[] = {
>   static const struct opcode group5[] = {
>   	F(DstMem | SrcNone | Lock,		em_inc),
>   	F(DstMem | SrcNone | Lock,		em_dec),
> -	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
> -	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> -	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
> +	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk, em_call_near_abs),
> +	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_call_far),
> +	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),

>   	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),

It seems this entry for 'FF /5' (Jump far, absolute indirect) needs to 
set ShadowStack and IndirBrnTrk as well?
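
i.e., presumably something like this (a sketch mirroring the neighboring
entries, not a tested change):

	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_jmp_far),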

>   	I(SrcMem | Stack | TwoMemOp,		em_push), D(Undefined),
>   };
> @@ -4332,11 +4334,11 @@ static const struct opcode opcode_table[256] = {
>   	/* 0xC8 - 0xCF */
>   	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
>   	I(Stack | IsBranch, em_leave),
> -	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
> -	I(ImplicitOps | IsBranch, em_ret_far),
> -	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
> +	I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
> +	I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
> +	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
>   	D(ImplicitOps | No64 | IsBranch),
> -	II(ImplicitOps | IsBranch, em_iret, iret),
> +	II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
>   	/* 0xD0 - 0xD7 */
>   	G(Src2One | ByteOp, group2), G(Src2One, group2),
>   	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
> @@ -4352,7 +4354,7 @@ static const struct opcode opcode_table[256] = {
>   	I2bvIP(SrcImmUByte | DstAcc, em_in,  in,  check_perm_in),
>   	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
>   	/* 0xE8 - 0xEF */
> -	I(SrcImm | NearBranch | IsBranch, em_call),
> +	I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
>   	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
>   	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
>   	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
> @@ -4371,7 +4373,7 @@ static const struct opcode opcode_table[256] = {
>   static const struct opcode twobyte_table[256] = {
>   	/* 0x00 - 0x0F */
>   	G(0, group6), GD(0, &group7), N, N,
> -	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
> +	N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_syscall),
>   	II(ImplicitOps | Priv, em_clts, clts), N,
>   	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
>   	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
> @@ -4402,8 +4404,8 @@ static const struct opcode twobyte_table[256] = {
>   	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
>   	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
>   	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
> -	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
> -	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
> +	I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_sysenter),
> +	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
>   	N, N,
>   	N, N, N, N, N, N, N, N,
>   	/* 0x40 - 0x4F */
> @@ -4941,6 +4943,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>   	if (ctxt->d == 0)
>   		return EMULATION_FAILED;
>   
> +	/*
> +	 * Reject emulation if KVM might need to emulate shadow stack updates
> +	 * and/or indirect branch tracking enforcement, which the emulator
> +	 * doesn't support.
> +	 */
> +	if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
> +	    ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> +		u64 u_cet = 0, s_cet = 0;
> +
> +		/*
> +		 * Check both User and Supervisor on far transfers as inter-
> +		 * privilege level transfers are impacted by CET at the target
> +		 * privilege levels, and that is not known at this time.  The
> +		 * the expectation is that the guest will not require emulation
> +		 * of any CET-affected instructions at any privilege level.
> +		 */
> +		if (!(opcode.flags & NearBranch))
> +			u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +		else if (ctxt->ops->cpl(ctxt) == 3)
> +			u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +		else
> +			s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +
> +		if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
> +		    (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
> +			return EMULATION_FAILED;
> +
> +		if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
> +			return EMULATION_FAILED;
> +
> +		if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
> +			return EMULATION_FAILED;
> +	}

I'm not sure other than 'jmp far' case I pointed above, if any more 
instruction/case that are protected by shadow stack or IBT are missed.
(I'm not really good at identifying all of them. Just identify one case 
drains my energy)

At least, the part to return EMULATION_FAILED for the cases where shadow 
stack/IBT protection is needed looks good to me. So, for this part:

Reviewed-by: Xiaoyao Li <xiaoyao.li@Intel.com>

>   	ctxt->execute = opcode.u.execute;
>   
>   	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs
  2025-09-12 23:22 ` [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
  2025-09-15 17:21   ` Xin Li
  2025-09-16  7:40   ` Xiaoyao Li
@ 2025-09-17  8:32   ` Binbin Wu
  2025-09-17 13:44     ` Sean Christopherson
  2 siblings, 1 reply; 130+ messages in thread
From: Binbin Wu @ 2025-09-17  8:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Enable/disable CET MSR interception per the associated feature configuration.
>
> Pass through CET MSRs that are managed by XSAVE, as they cannot be
> intercepted without also intercepting XSAVE. However, intercepting XSAVE
> would likely cause unacceptable performance overhead.
The description "managed by XSAVE" may be a bit confusing here, because
KVM has a function is_xstate_managed_msr(), and MSR_IA32_S_CET is not xstate
managed according to it.

Otherwise,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
>
> Note, this MSR design introduces an architectural limitation on SHSTK and
> IBT control for the guest, i.e., when SHSTK is exposed, IBT is also
> available to the guest from an architectural perspective, since IBT relies
> on a subset of the SHSTK-relevant MSRs.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4fc1dbba2eb0..adf5af30e537 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4101,6 +4101,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   
>   void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   {
> +	bool intercept;
> +
>   	if (!cpu_has_vmx_msr_bitmap())
>   		return;
>   
> @@ -4146,6 +4148,23 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>   		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
>   					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
>   
> +	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> +		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> +
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
> +	}
> +
> +	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
> +		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
> +			    !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> +
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept);
> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
> +	}
> +
>   	/*
>   	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
>   	 * filtered by userspace.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-12 23:22 ` [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
  2025-09-17  8:16   ` Chao Gao
  2025-09-17  8:19   ` Xiaoyao Li
@ 2025-09-17  8:45   ` Binbin Wu
  2 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-17  8:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
> affected by Shadow Stacks and/or Indirect Branch Tracking when said
> features are enabled in the guest, as fully emulating CET would require
> significant complexity for no practical benefit (KVM shouldn't need to
> emulate branch instructions on modern hosts).  Simply doing nothing isn't
> an option as that would allow a malicious entity to subvert CET
> protections via the emulator.
>
> Note!  On far transfers, do NOT consult the current privilege level and
> instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
> Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
> can be in play for the target privilege level, i.e. checking the current
> privilege could get a false negative, and KVM doesn't know the target
> privilege level until emulation gets under way.

About the emulator, there is a VMX exit reason EXIT_REASON_TASK_SWITCH.
The VM Exit triggers the following path:
EXIT_REASON_TASK_SWITCH
     handle_task_switch
         kvm_task_switch
             emulator_task_switch

According to SDM, in Vol 3 Chapter "Task Management", section "Executing a Task"
"If shadow stack is enabled, then the SSP of the task is located at the 4 bytes
  at offset 104 in the 32-bit TSS and is used by the processor to establish the
  SSP when a task switch occurs from a task associated with this TSS. Note that
  the processor does not write the SSP of the task initiating the task switch to
  the TSS of that task, and instead the SSP of the previous task is pushed onto
  the shadow stack of the new task."

This case is not covered, although using CET in 32-bit guests should be a corner
case.
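
For reference, a rough sketch of the field the SDM describes (layout assumed
from the quoted text above, not taken from KVM code):

	/* Sketch only: the SDM places the SSP at offset 104 in the 32-bit TSS. */
	struct tss32_shstk_view {
		u8  other_fields[104];
		u32 ssp;	/* established as the SSP on a task switch into this TSS */
	};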


>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Cc: Mathias Krause <minipli@grsecurity.net>
> Cc: John Allen <john.allen@amd.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/emulate.c | 58 ++++++++++++++++++++++++++++++++++--------
>   1 file changed, 47 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 542d3664afa3..e4be54a677b0 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -178,6 +178,8 @@
>   #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
>   #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
>   #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
> +#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
> +#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
>   
>   #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
>   
> @@ -4068,9 +4070,9 @@ static const struct opcode group4[] = {
>   static const struct opcode group5[] = {
>   	F(DstMem | SrcNone | Lock,		em_inc),
>   	F(DstMem | SrcNone | Lock,		em_dec),
> -	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
> -	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> -	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
> +	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk, em_call_near_abs),
> +	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_call_far),
> +	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
>   	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
>   	I(SrcMem | Stack | TwoMemOp,		em_push), D(Undefined),
>   };
> @@ -4332,11 +4334,11 @@ static const struct opcode opcode_table[256] = {
>   	/* 0xC8 - 0xCF */
>   	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
>   	I(Stack | IsBranch, em_leave),
> -	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
> -	I(ImplicitOps | IsBranch, em_ret_far),
> -	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
> +	I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
> +	I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
> +	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
>   	D(ImplicitOps | No64 | IsBranch),
> -	II(ImplicitOps | IsBranch, em_iret, iret),
> +	II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
>   	/* 0xD0 - 0xD7 */
>   	G(Src2One | ByteOp, group2), G(Src2One, group2),
>   	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
> @@ -4352,7 +4354,7 @@ static const struct opcode opcode_table[256] = {
>   	I2bvIP(SrcImmUByte | DstAcc, em_in,  in,  check_perm_in),
>   	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
>   	/* 0xE8 - 0xEF */
> -	I(SrcImm | NearBranch | IsBranch, em_call),
> +	I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
>   	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
>   	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
>   	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
> @@ -4371,7 +4373,7 @@ static const struct opcode opcode_table[256] = {
>   static const struct opcode twobyte_table[256] = {
>   	/* 0x00 - 0x0F */
>   	G(0, group6), GD(0, &group7), N, N,
> -	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
> +	N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_syscall),
>   	II(ImplicitOps | Priv, em_clts, clts), N,
>   	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
>   	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
> @@ -4402,8 +4404,8 @@ static const struct opcode twobyte_table[256] = {
>   	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
>   	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
>   	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
> -	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
> -	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
> +	I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk, em_sysenter),
> +	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
>   	N, N,
>   	N, N, N, N, N, N, N, N,
>   	/* 0x40 - 0x4F */
> @@ -4941,6 +4943,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>   	if (ctxt->d == 0)
>   		return EMULATION_FAILED;
>   
> +	/*
> +	 * Reject emulation if KVM might need to emulate shadow stack updates
> +	 * and/or indirect branch tracking enforcement, which the emulator
> +	 * doesn't support.
> +	 */
> +	if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
> +	    ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> +		u64 u_cet = 0, s_cet = 0;
> +
> +		/*
> +		 * Check both User and Supervisor on far transfers as inter-
> +		 * privilege level transfers are impacted by CET at the target
> +		 * privilege levels, and that is not known at this time.  The
> +		 * the expectation is that the guest will not require emulation
> +		 * of any CET-affected instructions at any privilege level.
> +		 */
> +		if (!(opcode.flags & NearBranch))
> +			u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +		else if (ctxt->ops->cpl(ctxt) == 3)
> +			u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +		else
> +			s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +
> +		if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
> +		    (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
> +			return EMULATION_FAILED;
> +
> +		if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
> +			return EMULATION_FAILED;
> +
> +		if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
> +			return EMULATION_FAILED;
> +	}
> +
>   	ctxt->execute = opcode.u.execute;
>   
>   	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields
  2025-09-12 23:22 ` [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
  2025-09-16  7:44   ` Xiaoyao Li
@ 2025-09-17  8:48   ` Xiaoyao Li
  2025-09-17 21:25     ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-17  8:48 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/13/2025 7:22 AM, Sean Christopherson wrote:
...
> +static inline bool cpu_has_load_cet_ctrl(void)
> +{
> +	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
> +}

When looking at patch 19, I realized that

   { VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE }

is added into vmcs_entry_exit_pairs[] there.

So ...

>   static inline bool cpu_has_vmx_mpx(void)
>   {
>   	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index adf5af30e537..e8155635cb42 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4320,6 +4320,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
>   
>   	if (cpu_has_load_ia32_efer())
>   		vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
> +
> +	/*
> +	 * Supervisor shadow stack is not enabled on host side, i.e.,
> +	 * host IA32_S_CET.SHSTK_EN bit is guaranteed to 0 now, per SDM
> +	 * description(RDSSP instruction), SSP is not readable in CPL0,
> +	 * so resetting the two registers to 0s at VM-Exit does no harm
> +	 * to kernel execution. When execution flow exits to userspace,
> +	 * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter
> +	 * 3 and 4 for details.
> +	 */
> +	if (cpu_has_load_cet_ctrl()) {

... cpu_has_load_cet_ctrl() cannot ensure the existence of host CET 
fields, unless we change it to check vmcs_config.vmexit_ctrl or add CET 
entry_exit pair into the vmcs_entry_exit_pairs[] in this patch.

> +		vmcs_writel(HOST_S_CET, kvm_host.s_cet);
> +		vmcs_writel(HOST_SSP, 0);
> +		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
> +	}
>   }

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-16  8:28   ` Binbin Wu
  2025-09-17  2:51     ` Binbin Wu
@ 2025-09-17 12:47     ` Sean Christopherson
  2025-09-17 21:56       ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 12:47 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Tue, Sep 16, 2025, Binbin Wu wrote:
> > +/*
> > + * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
> 
> 
> Lock is unconditional and reload is conditional.
> "and/or" seems not accurate?

Agreed.  This?

/*
 * Lock and (re)load guest FPU and access xstate MSRs. For accesses initiated
 * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
 * guest FPU should have been loaded already.
 */

> 
> > + * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
> > + * guest FPU should have been loaded already.
> > + */
> > +static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
> > +						  struct msr_data *msr_info,
> > +						  int access)
> > +{
> > +	BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
> > +
> > +	KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
> > +	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
> > +
> > +	kvm_fpu_get();
> > +	if (access == MSR_TYPE_R)
> > +		rdmsrq(msr_info->index, msr_info->data);
> > +	else
> > +		wrmsrq(msr_info->index, msr_info->data);
> > +	kvm_fpu_put();
> > +}
> > +
> > +static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > +{
> > +	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
> > +}
> > +
> > +static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > +{
> > +	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
> > +}
> > +
> >   int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >   {
> >   	u32 msr = msr_info->index;
> > @@ -4551,11 +4614,25 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> >   		    int (*do_msr)(struct kvm_vcpu *vcpu,
> >   				  unsigned index, u64 *data))
> >   {
> > +	bool fpu_loaded = false;
> >   	int i;
> > -	for (i = 0; i < msrs->nmsrs; ++i)
> > +	for (i = 0; i < msrs->nmsrs; ++i) {
> > +		/*
> > +		 * If userspace is accessing one or more XSTATE-managed MSRs,
> > +		 * temporarily load the guest's FPU state so that the guest's
> > +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
> 
> Using "i.e." and "so that" together feels repetitive.[...]

		/*
		 * If userspace is accessing one or more XSTATE-managed MSRs,
		 * temporarily load the guest's FPU state so that the guest's
		 * MSR value(s) is resident in hardware and thus can be accessed
		 * via RDMSR/WRMSR.
		 */
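
For context, a rough sketch of how the loop presumably ends up shaped (the put
side and exact parameter names are assumptions, not copied from the diff):

	for (i = 0; i < msrs->nmsrs; ++i) {
		/* Load guest FPU state at most once, and only if needed. */
		if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
			kvm_load_guest_fpu(vcpu);
			fpu_loaded = true;
		}
		if (do_msr(vcpu, entries[i].index, &entries[i].data))
			break;
	}
	if (fpu_loaded)
		kvm_put_guest_fpu(vcpu);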

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 04/41] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  2025-09-16  7:10   ` Binbin Wu
@ 2025-09-17 13:14     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 13:14 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Tue, Sep 16, 2025, Binbin Wu wrote:
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > ---
> >   Documentation/virt/kvm/api.rst  |   6 +-
> >   arch/x86/include/uapi/asm/kvm.h |  26 +++++++++
> >   arch/x86/kvm/x86.c              | 100 ++++++++++++++++++++++++++++++++
> >   3 files changed, 131 insertions(+), 1 deletion(-)
> [...]
> > +
> > +#define KVM_X86_REG_ENCODE(type, index)				\
> Nit:
> Is it better to use KVM_X86_REG_ID so that, when searching for the string
> case-insensitively, the encoding and its structure can be related to each
> other?

I'm leaning _very_ slightly toward ENCODE, but I am a-ok with ID too.  Anyone
else have a preference?  If not, I'll go with Binbin's suggestion of ID.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 16/41] KVM: VMX: Set up interception for CET MSRs
  2025-09-17  8:32   ` Binbin Wu
@ 2025-09-17 13:44     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 13:44 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Wed, Sep 17, 2025, Binbin Wu wrote:
> 
> 
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > From: Yang Weijiang <weijiang.yang@intel.com>
> > 
> > Enable/disable CET MSR interception per the associated feature configuration.
> > 
> > Pass through CET MSRs that are managed by XSAVE, as they cannot be
> > intercepted without also intercepting XSAVE. However, intercepting XSAVE
> > would likely cause unacceptable performance overhead.
> The description "managed by XSAVE" may be a bit confusing here, because
> KVM has a function is_xstate_managed_msr(), and MSR_IA32_S_CET is not xstate
> managed according to it.

Ooh, yeah, definitely confusing.  And the XSAVE part is also misleading to some
extent, because strictly speaking it's XSAVES/XRSTORS.  And performance isn't
the main concern, it's the complexity of emulating XSAVES/XRSTORS that's the
non-starter.  I think it's also worth calling out that the code intentionally
doesn't check XSAVES support.

  Disable interception for CET MSRs that can be accessed via XSAVES/XRSTORS,
  as accesses through XSTATE aren't subject to MSR interception checks, i.e.
  cannot be intercepted without intercepting and emulating XSAVES/XRSTORS,
  and KVM doesn't support emulating XSAVE/XRSTOR instructions.

  Don't condition interception on the guest actually having XSAVES as there
  is no benefit to intercepting the accesses.  The MSRs in question are
  either context switched by the CPU on VM-Enter/VM-Exit or by KVM via
  XSAVES/XRSTORS (KVM requires XSAVES to virtualize SHSTK), i.e. KVM is
  going to load guest values into hardware irrespective of XSAVES support.

> Otherwise,
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> 
> > MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
> > 
> > Note, this MSR design introduces an architectural limitation on SHSTK and
> > IBT control for the guest, i.e., when SHSTK is exposed, IBT is also
> > available to the guest from an architectural perspective, since IBT relies
> > on a subset of the SHSTK-relevant MSRs.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > Tested-by: Mathias Krause <minipli@grsecurity.net>
> > Tested-by: John Allen <john.allen@amd.com>
> > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 4fc1dbba2eb0..adf5af30e537 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4101,6 +4101,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
> >   void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> >   {
> > +	bool intercept;
> > +
> >   	if (!cpu_has_vmx_msr_bitmap())
> >   		return;
> > @@ -4146,6 +4148,23 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> >   		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> >   					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
> > +	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> > +		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> > +
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
> > +	}
> > +
> > +	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
> > +		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
> > +			    !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
> > +
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept);
> > +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
> > +	}
> > +
> >   	/*
> >   	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
> >   	 * filtered by userspace.
> 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-17  8:16   ` Chao Gao
@ 2025-09-17 21:15     ` Sean Christopherson
  2025-09-18 14:54       ` Chao Gao
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 21:15 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Wed, Sep 17, 2025, Chao Gao wrote:
> On Fri, Sep 12, 2025 at 04:22:56PM -0700, Sean Christopherson wrote:
> >From: Yang Weijiang <weijiang.yang@intel.com>
> >
> >Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
> >affected by Shadow Stacks and/or Indirect Branch Tracking when said
> >features are enabled in the guest, as fully emulating CET would require
> >significant complexity for no practical benefit (KVM shouldn't need to
> >emulate branch instructions on modern hosts).  Simply doing nothing isn't
> >an option as that would allow a malicious entity to subvert CET
> >protections via the emulator.
> >
> >Note!  On far transfers, do NOT consult the current privilege level and
> >instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
> >Supervisor mode.  On inter-privilege level far transfers, SHSTK and IBT
> >can be in play for the target privilege level, i.e. checking the current
> >privilege could get a false negative, and KVM doesn't know the target
> >privilege level until emulation gets under way.
> 
> I modified KUT's cet.c to verify that near jumps, near returns, and far
> transfers (e.g., IRET) trigger the emulation failure logic added by this
> patch when guests enable Shadow Stack or IBT.
> 
> I found only one minor issue: near return instructions were not tagged with
> ShadowStack.

Heh, I had just found this through inspection.

> The following diff fixes this issue:
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index e4be54a677b0..b1c9816bd5c6 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4326,8 +4326,8 @@ static const struct opcode opcode_table[256] = {
> 	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
> 	/* 0xC0 - 0xC7 */
> 	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
> -	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
> -	I(ImplicitOps | NearBranch | IsBranch, em_ret),
> +	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
> +	I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),

Tangentially related to this bug, I think we should avoid manual annotation
where possible.  I don't see an easy way to do that for ShadowStack, but for IBT
we can use IsBranch, NearBranch and the SrcXXX operands to detect IBT-affected
instructions.  It's obviously more complex, but programmatically detecting
indirect branches should be less error prone.  I'll do so in the next version.
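
A rough sketch of what the programmatic detection could look like (flag names
are assumed from emulate.c, the predicate itself is made up):

	/*
	 * Sketch only: treat a branch as "indirect", i.e. subject to IBT, if
	 * its target comes from a ModRM register/memory operand rather than
	 * from an immediate encoded in the instruction.
	 */
	static bool insn_is_ibt_protected(u64 flags)
	{
		u64 src = flags & SrcMask;

		return (flags & IsBranch) &&
		       (src == SrcMem || src == SrcMemFAddr);
	}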

> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
> 	G(ByteOp, group11), G(0, group11),
> 
> 
> And for reference, below are the changes I made to KUT's cet.c

I now have a more comprehensive set of testcases, and it can be upstreamed
(relies on KVM's default behavior of injecting #UD at CPL==3 on failed emulation).

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 17/41] KVM: VMX: Set host constant supervisor states to VMCS fields
  2025-09-17  8:48   ` Xiaoyao Li
@ 2025-09-17 21:25     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 21:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On Wed, Sep 17, 2025, Xiaoyao Li wrote:
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> ...
> > +static inline bool cpu_has_load_cet_ctrl(void)
> > +{
> > +	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
> > +}
> 
> When looking at the patch 19, I realize that
> 
>   { VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE }
> 
> is added into vmcs_entry_exit_pairs[] there.
> 
> So ...
> 
> >   static inline bool cpu_has_vmx_mpx(void)
> >   {
> >   	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index adf5af30e537..e8155635cb42 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4320,6 +4320,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
> >   	if (cpu_has_load_ia32_efer())
> >   		vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
> > +
> > +	/*
> > +	 * Supervisor shadow stack is not enabled on the host side, i.e.,
> > +	 * the host IA32_S_CET.SHSTK_EN bit is guaranteed to be 0 now. Per the
> > +	 * SDM description (RDSSP instruction), SSP is not readable at CPL0,
> > +	 * so resetting the two registers to 0s at VM-Exit does no harm
> > +	 * to kernel execution. When execution flow exits to userspace,
> > +	 * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter
> > +	 * 3 and 4 for details.
> > +	 */
> > +	if (cpu_has_load_cet_ctrl()) {
> 
> ... cpu_has_load_cet_ctrl() cannot ensure the existence of host CET fields,
> unless we change it to check vmcs_config.vmexit_ctrl or add CET entry_exit
> pair into the vmcs_entry_exit_pairs[] in this patch.

I *love* the attention to detail, but I think we're actually good, technically.

cpu_has_load_cet_ctrl() will always return %false until patch 19, because
VM_ENTRY_LOAD_CET_STATE isn't added to the set of OPTIONAL controls until then,
i.e. VM_ENTRY_LOAD_CET_STATE won't be set in vmcs_config.vmentry_ctrl until
the exit control is as well (and the sanity check is in place).
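
For reference, the pairing check in setup_vmcs_config() looks roughly like this
(reconstructed from memory, details may differ):

	for (i = 0; i < ARRAY_SIZE(vmcs_entry_exit_pairs); i++) {
		u32 n_ctrl = vmcs_entry_exit_pairs[i].entry_control;
		u32 x_ctrl = vmcs_entry_exit_pairs[i].exit_control;

		/* Skip the pair if both, or neither, controls are supported. */
		if (!(_vmentry_control & n_ctrl) == !(_vmexit_control & x_ctrl))
			continue;

		pr_warn_once("Inconsistent VM-Entry/VM-Exit pair, entry = %x, exit = %x\n",
			     _vmentry_control & n_ctrl, _vmexit_control & x_ctrl);

		_vmentry_control &= ~n_ctrl;
		_vmexit_control &= ~x_ctrl;
	}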

> > +		vmcs_writel(HOST_S_CET, kvm_host.s_cet);
> > +		vmcs_writel(HOST_SSP, 0);
> > +		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
> > +	}
> >   }

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 02/41] KVM: SEV: Read save fields from GHCB exactly once
  2025-09-15 21:08     ` Sean Christopherson
@ 2025-09-17 21:47       ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 21:47 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On Mon, Sep 15, 2025, Sean Christopherson wrote:
> On Mon, Sep 15, 2025, Tom Lendacky wrote:
> > On 9/12/25 18:22, Sean Christopherson wrote:
> > > Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
> > > GHCB get() utility to help guard against TOCTOU bugs.  Using READ_ONCE()
> > > doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
> > > redoing get() after checking the initial value, but at least addresses
> > > all potential TOCTOU issues in the current KVM code base.
> > > 
> > > Opportunistically reduce the indentation of the macro-defined helpers and
> > > clean up the alignment.
> > > 
> > > Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > 
> > Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> > 
> > Just wondering if we should make the kvm_ghcb_get_*() routines take just
> > a struct vcpu_svm pointer so that they don't get confused with the
> > ghcb_get_*() routines? The current uses are just using svm->sev_es.ghcb
> > to set the ghcb variable that gets used anyway. That way the KVM
> > versions look specifically like KVM versions.
> 
> Yeah, that's a great idea.  I'll send a patch, 

Actually, I'll do that straightaway in this patch (need to send a v16 anyways).
Introducing kvm_ghcb_get_##field() and then immediately changing all callers is
ridiculous, and if this ends up getting backported to LTS kernels, it'd be better
to backport the final form, e.g. so that additional fixes don't generate conflicts
that could have been easily avoided.
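
i.e., the accessors would presumably end up shaped something like this (a
sketch; the real macro name and body may differ):

	#define DEFINE_KVM_GHCB_ACCESSOR(field)					\
	static inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm)		\
	{									\
		return READ_ONCE(svm->sev_es.ghcb->save.field);			\
	}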

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 09/41] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-09-17 12:47     ` Sean Christopherson
@ 2025-09-17 21:56       ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-17 21:56 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Wed, Sep 17, 2025, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, Binbin Wu wrote:
> > > +/*
> > > + * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
> > 
> > 
> > Lock is unconditional and reload is conditional.
> > "and/or" seems not accurate?
> 
> Agreed.  This?
> 
> /*
>  * Lock and (re)load guest FPU and access xstate MSRs. For accesses initiated
>  * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
>  * guest FPU should have been loaded already.
>  */

That's not very good either.

/*
 * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
 * MSR that is managed via XSTATE.  Note, the caller is responsible for doing
 * the initial FPU load, this helper only ensures that guest state is resident
 * in hardware (the kernel can load its FPU state in IRQ context).
 */

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-12 23:22 ` [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
@ 2025-09-18  1:57   ` Binbin Wu
  2025-09-19 22:57     ` Sean Christopherson
  2025-09-18  2:18   ` Binbin Wu
  1 sibling, 1 reply; 130+ messages in thread
From: Binbin Wu @ 2025-09-18  1:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Expose CET features to the guest if KVM and the host can support them;
> clear the CPUID feature bits if they cannot.
>
> Set CPUID feature bits so that CET features are available in guest CPUID.
> Add CR4.CET bit support in order to allow the guest to set the CET master
> control bit.
>
> Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
> KVM does not support emulating CET.
>
> Set the CET load-bits in the VM_ENTRY/VM_EXIT control fields to keep guest
> CET xstates isolated from the host's.
>
> On platforms with VMX_BASIC[bit56] == 0, injecting #CP at VM-Entry with an
> error code will fail, while if VMX_BASIC[bit56] == 1, #CP injection with or
> without an error code is allowed. Disable the CET feature bits if the MSR
> bit is cleared so that a nested VMM can inject #CP if and only if
> VMX_BASIC[bit56] == 1.
>
> Don't expose the CET features if either of the {U,S}_CET xstate bits is
> cleared in host XSS or if XSAVES isn't supported.
>
> CET MSRs are reset to 0s after RESET, power-up and INIT; clear the guest
> CET xsave-area fields so that guest CET MSRs are reset to 0s after these
> events.
>
> Meanwhile, explicitly disable SHSTK and IBT for SVM because KVM's CET
> enabling for SVM is not ready.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

One nit below.

[...]
> 			\
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 15f208c44cbd..c78acab2ff3f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -226,7 +226,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>    * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
>    * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
>    */
> -#define KVM_SUPPORTED_XSS     0
> +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
> +				 XFEATURE_MASK_CET_KERNEL)

Since XFEATURE_MASK_CET_USER and XFEATURE_MASK_CET_KERNEL are always checked or
set together, does it make sense to use a macro for the two bits?
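
e.g., something like (the name is made up):

	#define XFEATURE_MASK_CET_ALL	(XFEATURE_MASK_CET_USER | \
					 XFEATURE_MASK_CET_KERNEL)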

>   
>   bool __read_mostly allow_smaller_maxphyaddr = 0;
>   EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
> @@ -10080,6 +10081,20 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
>   		kvm_caps.supported_xss = 0;
>   
> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
> +					    XFEATURE_MASK_CET_KERNEL);
> +
> +	if ((kvm_caps.supported_xss & (XFEATURE_MASK_CET_USER |
> +	     XFEATURE_MASK_CET_KERNEL)) !=
> +	    (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)) {
> +		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> +		kvm_cpu_cap_clear(X86_FEATURE_IBT);
> +		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
> +					    XFEATURE_MASK_CET_KERNEL);
> +	}
> +
>   	if (kvm_caps.has_tsc_control) {
>   		/*
>   		 * Make sure the user can only configure tsc_khz values that
> @@ -12735,10 +12750,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
>   	/*
>   	 * On INIT, only select XSTATE components are zeroed, most components
>   	 * are unchanged.  Currently, the only components that are zeroed and
> -	 * supported by KVM are MPX related.
> +	 * supported by KVM are MPX and CET related.
>   	 */
>   	xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
> -			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
> +			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
> +			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL);
>   	if (!xfeatures_mask)
>   		return;
>   
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 65cbd454c4f1..f3dc77f006f9 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -680,6 +680,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>   		__reserved_bits |= X86_CR4_PCIDE;       \
>   	if (!__cpu_has(__c, X86_FEATURE_LAM))           \
>   		__reserved_bits |= X86_CR4_LAM_SUP;     \
> +	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&       \
> +	    !__cpu_has(__c, X86_FEATURE_IBT))           \
> +		__reserved_bits |= X86_CR4_CET;         \
>   	__reserved_bits;                                \
>   })
>   


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-12 23:22 ` [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
  2025-09-18  1:57   ` Binbin Wu
@ 2025-09-18  2:18   ` Binbin Wu
  2025-09-18 18:05     ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Binbin Wu @ 2025-09-18  2:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
[...]
>   
> +static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
> +{
> +	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> +}
> +

I think "_cc" should be appended to the function name, although it would make
the function name longer. Without "_cc", the meaning is different and confusing.
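
i.e.:

	static inline bool cpu_has_vmx_basic_no_hw_errcode_cc(void)
	{
		return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
	}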



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 20/41] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
  2025-09-12 23:22 ` [PATCH v15 20/41] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
@ 2025-09-18  2:27   ` Binbin Wu
  0 siblings, 0 replies; 130+ messages in thread
From: Binbin Wu @ 2025-09-18  2:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z



On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Per SDM description(Vol.3D, Appendix A.1):
> "If bit 56 is read as 1, software can use VM entry to deliver a hardware
> exception with or without an error code, regardless of vector"
>
> Modify has_error_code check before inject events to nested guest. Only
                                       ^
                                    injecting
> enforce the check when guest is in real mode, the exception is not hard
> exception and the platform doesn't enumerate bit56 in VMX_BASIC, in all
> other case ignore the check to make the logic consistent with SDM.
        ^
       cases
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest
  2025-09-12 23:22 ` [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
  2025-09-15 17:45   ` Xin Li
@ 2025-09-18  4:48   ` Xin Li
  2025-09-18 18:05     ` Sean Christopherson
  1 sibling, 1 reply; 130+ messages in thread
From: Xin Li @ 2025-09-18  4:48 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li, Zhang Yi Z

On 9/12/2025 4:22 PM, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
> index 56fd150a6f24..4ad6b16525b9 100644
> --- a/arch/x86/kvm/vmx/vmcs12.h
> +++ b/arch/x86/kvm/vmx/vmcs12.h
> @@ -117,7 +117,13 @@ struct __packed vmcs12 {
>   	natural_width host_ia32_sysenter_eip;
>   	natural_width host_rsp;
>   	natural_width host_rip;
> -	natural_width paddingl[8]; /* room for future expansion */
> +	natural_width host_s_cet;
> +	natural_width host_ssp;
> +	natural_width host_ssp_tbl;
> +	natural_width guest_s_cet;
> +	natural_width guest_ssp;
> +	natural_width guest_ssp_tbl;
> +	natural_width paddingl[2]; /* room for future expansion */
>   	u32 pin_based_vm_exec_control;
>   	u32 cpu_based_vm_exec_control;
>   	u32 exception_bitmap;
> @@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
>   	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
>   	CHECK_OFFSET(host_rsp, 664);
>   	CHECK_OFFSET(host_rip, 672);
> +	CHECK_OFFSET(host_s_cet, 680);
> +	CHECK_OFFSET(host_ssp, 688);
> +	CHECK_OFFSET(host_ssp_tbl, 696);
> +	CHECK_OFFSET(guest_s_cet, 704);
> +	CHECK_OFFSET(guest_ssp, 712);
> +	CHECK_OFFSET(guest_ssp_tbl, 720);
>   	CHECK_OFFSET(pin_based_vm_exec_control, 744);
>   	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
>   	CHECK_OFFSET(exception_bitmap, 752);


This patch modifies struct vmcs12 without updating the corresponding vmcs12
definition in Documentation/virt/kvm/x86/nested-vmx.rst.  However,
duplicating the definition within the same source tree seems unnecessary
and prone to inconsistencies.  E.g., the following fields are missing in
Documentation/virt/kvm/x86/nested-vmx.rst:

	...
	u64 posted_intr_desc_addr;
	...
	u64 eoi_exit_bitmap0;
	u64 eoi_exit_bitmap1;
	u64 eoi_exit_bitmap2;
	u64 eoi_exit_bitmap3;
	u64 xss_exit_bitmap;
	...

What's more, the 64-bit padding fields are completely messed up; we have
used 9 u64 after host_ia32_efer:

         u64 host_ia32_perf_global_ctrl;
         u64 vmread_bitmap;
         u64 vmwrite_bitmap;
         u64 vm_function_control;
         u64 eptp_list_address;
         u64 pml_address;
         u64 encls_exiting_bitmap;
         u64 tsc_multiplier;
         u64 padding64[1]; /* room for future expansion */


But it's 8 u64 after host_ia32_efer in the documentation:

	u64 padding64[8]; /* room for future expansion */


We probably should remove it from Documentation/virt/kvm/x86/nested-vmx.rst
and instead add a reference to arch/x86/kvm/vmx/vmcs12.h.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-17  8:19   ` Xiaoyao Li
@ 2025-09-18 14:15     ` Chao Gao
  2025-09-19  1:25       ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Chao Gao @ 2025-09-18 14:15 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel,
	Tom Lendacky, Mathias Krause, John Allen, Rick Edgecombe,
	Maxim Levitsky, Zhang Yi Z

>> 
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 542d3664afa3..e4be54a677b0 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -178,6 +178,8 @@
>>   #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
>>   #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
>>   #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
>> +#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
>> +#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
>>   #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
>> @@ -4068,9 +4070,9 @@ static const struct opcode group4[] = {
>>   static const struct opcode group5[] = {
>>   	F(DstMem | SrcNone | Lock,		em_inc),
>>   	F(DstMem | SrcNone | Lock,		em_dec),
>> -	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
>> -	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
>> -	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
>> +	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk, em_call_near_abs),
>> +	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_call_far),
>> +	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
>
>>   	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
>
>It seems this entry for 'FF 05' (Jump far, absolute indirect) needs to set
>ShadowStack and IndirBrnTrk as well?

Yes. I just checked the pseudo code of the JMP instruction in SDM vol2. A far
jump to a CONFORMING-CODE-SEGMENT or NONCONFORMING-CODE-SEGMENT is affected by
both shadow stack and IBT, and a far jump to a call gate is affected by IBT.

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-17 21:15     ` Sean Christopherson
@ 2025-09-18 14:54       ` Chao Gao
  2025-09-18 18:02         ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Chao Gao @ 2025-09-18 14:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -4326,8 +4326,8 @@ static const struct opcode opcode_table[256] = {
>> 	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
>> 	/* 0xC0 - 0xC7 */
>> 	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
>> -	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
>> -	I(ImplicitOps | NearBranch | IsBranch, em_ret),
>> +	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
>> +	I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
>
>Tangentially related to this bug, I think we should avoid manual annotation
>where possible.  I don't see an easy way to do that for ShadowStack, but for IBT
>we can use IsBranch, NearBranch and the SrcXXX operands to detect IBT-affected
>instructions.  It's obviously more complex, but programmatically detecting
>indirect branches should be less error prone.  I'll do so in the next version.
>
>> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
>> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
>> 	G(ByteOp, group11), G(0, group11),
>> 
>> 
>> And for reference, below are the changes I made to KUT's cet.c
>
>I now have a more comprehensive set of testcases, and it can be upstreamed
>(relies on KVM's default behavior of injecting #UD at CPL==3 on failed emulation).

IIUC, for KVM_FEP-prefixed instructions, the emulation type is set to
EMULTYPE_TRAP_UD_FORCED. Regardless of the CPL and
KVM_CAP_EXIT_ON_EMULATION_FAILURE, KVM will always inject #UD on failed
emulation.

		r = x86_decode_emulated_instruction(vcpu, emulation_type,
						    insn, insn_len);
		if (r != EMULATION_OK)  {
			if ((emulation_type & EMULTYPE_TRAP_UD) ||
			    (emulation_type & EMULTYPE_TRAP_UD_FORCED)) {
				kvm_queue_exception(vcpu, UD_VECTOR);
				return 1;
			}

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-18 14:54       ` Chao Gao
@ 2025-09-18 18:02         ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-18 18:02 UTC (permalink / raw)
  To: Chao Gao
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Thu, Sep 18, 2025, Chao Gao wrote:
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -4326,8 +4326,8 @@ static const struct opcode opcode_table[256] = {
> >> 	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
> >> 	/* 0xC0 - 0xC7 */
> >> 	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
> >> -	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
> >> -	I(ImplicitOps | NearBranch | IsBranch, em_ret),
> >> +	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
> >> +	I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
> >
> >Tangentially related to this bug, I think we should avoid manual annotation
> >where possible.  I don't see an easy way to do that for ShadowStack, but for IBT
> >we can use IsBranch, NearBranch and the SrcXXX operands to detect IBT-affected
> >instructions.  It's obviously more complex, but programmatically detecting
> >indirect branches should be less error prone.  I'll do so in the next version.
> >
> >> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
> >> 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
> >> 	G(ByteOp, group11), G(0, group11),
> >> 
> >> 
> >> And for reference, below are the changes I made to KUT's cet.c
> >
> >I now have a more comprehensive set of testcases, and it can be upstreamed
> >(relies on KVM's default behavior of injecting #UD at CPL==3 on failed emulation).
> 
> IIUC, for KVM_FEP-prefixed instructions, the emulation type is set to
> EMULTYPE_TRAP_UD_FORCED. Regardless of the CPL and
> KVM_CAP_EXIT_ON_EMULATION_FAILURE, KVM will always inject #UD on failed
> emulation.
> 
> 		r = x86_decode_emulated_instruction(vcpu, emulation_type,
> 						    insn, insn_len);
> 		if (r != EMULATION_OK)  {
> 			if ((emulation_type & EMULTYPE_TRAP_UD) ||
> 			    (emulation_type & EMULTYPE_TRAP_UD_FORCED)) {
> 				kvm_queue_exception(vcpu, UD_VECTOR);
> 				return 1;
> 			}

Oh, right.  Nice, that makes the KUT testcase much less hacky :-)

* Re: [PATCH v15 21/41] KVM: nVMX: Prepare for enabling CET support for nested guest
  2025-09-18  4:48   ` Xin Li
@ 2025-09-18 18:05     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-18 18:05 UTC (permalink / raw)
  To: Xin Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Wed, Sep 17, 2025, Xin Li wrote:
> On 9/12/2025 4:22 PM, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
> > index 56fd150a6f24..4ad6b16525b9 100644
> > --- a/arch/x86/kvm/vmx/vmcs12.h
> > +++ b/arch/x86/kvm/vmx/vmcs12.h
> > @@ -117,7 +117,13 @@ struct __packed vmcs12 {
> >   	natural_width host_ia32_sysenter_eip;
> >   	natural_width host_rsp;
> >   	natural_width host_rip;
> > -	natural_width paddingl[8]; /* room for future expansion */
> > +	natural_width host_s_cet;
> > +	natural_width host_ssp;
> > +	natural_width host_ssp_tbl;
> > +	natural_width guest_s_cet;
> > +	natural_width guest_ssp;
> > +	natural_width guest_ssp_tbl;
> > +	natural_width paddingl[2]; /* room for future expansion */
> >   	u32 pin_based_vm_exec_control;
> >   	u32 cpu_based_vm_exec_control;
> >   	u32 exception_bitmap;
> > @@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
> >   	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
> >   	CHECK_OFFSET(host_rsp, 664);
> >   	CHECK_OFFSET(host_rip, 672);
> > +	CHECK_OFFSET(host_s_cet, 680);
> > +	CHECK_OFFSET(host_ssp, 688);
> > +	CHECK_OFFSET(host_ssp_tbl, 696);
> > +	CHECK_OFFSET(guest_s_cet, 704);
> > +	CHECK_OFFSET(guest_ssp, 712);
> > +	CHECK_OFFSET(guest_ssp_tbl, 720);
> >   	CHECK_OFFSET(pin_based_vm_exec_control, 744);
> >   	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
> >   	CHECK_OFFSET(exception_bitmap, 752);
> 
> 
> This patch modifies struct vmcs12 without updating the corresponding vmcs12
> definition in Documentation/virt/kvm/x86/nested-vmx.rst.  However,
> duplicating the definition within the same source tree seems unnecessary
> and prone to inconsistencies.  E.g., the following fields are missing in
> Documentation/virt/kvm/x86/nested-vmx.rst:
> 
> 	...
> 	u64 posted_intr_desc_addr;
> 	...
> 	u64 eoi_exit_bitmap0;
> 	u64 eoi_exit_bitmap1;
> 	u64 eoi_exit_bitmap2;
> 	u64 eoi_exit_bitmap3;
> 	u64 xss_exit_bitmap;
> 	...
> 
> What's more, the 64-bit padding fields are completely messed up; we have
> used 9 u64 fields after host_ia32_efer:
> 
>         u64 host_ia32_perf_global_ctrl;
>         u64 vmread_bitmap;
>         u64 vmwrite_bitmap;
>         u64 vm_function_control;
>         u64 eptp_list_address;
>         u64 pml_address;
>         u64 encls_exiting_bitmap;
>         u64 tsc_multiplier;
>         u64 padding64[1]; /* room for future expansion */
> 
> 
> But it's 8 u64 fields after host_ia32_efer in the documentation:
> 
> 	u64 padding64[8]; /* room for future expansion */
> 
> 
> We probably should remove it from Documentation/virt/kvm/x86/nested-vmx.rst
> and instead add a reference to arch/x86/kvm/vmx/vmcs12.h.

Yeah, the paragraph above is also stale, see commit cb9fb5fc12ef ("KVM: nVMX:
Update VMCS12_REVISION comment to state it should never change") (I forgot that
Documentation/virt/kvm/x86/nested-vmx.rst existed).

  For convenience, we repeat the content of struct vmcs12 here. If the internals
  of this structure changes, this can break live migration across KVM versions.
  VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
  struct shadow_vmcs is ever changed.

* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-18  2:18   ` Binbin Wu
@ 2025-09-18 18:05     ` Sean Christopherson
  2025-09-19  7:10       ` Xiaoyao Li
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-18 18:05 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Thu, Sep 18, 2025, Binbin Wu wrote:
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> [...]
> > +static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
> > +{
> > +	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> > +}
> > +
> 
> I think "_cc" should be appended to the function name, although it would make
> the function name longer. Without "_cc", the meaning is different and confusing.

+1, got it fixed up.

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-16 22:55           ` John Allen
@ 2025-09-18 19:48             ` John Allen
  2025-09-18 20:34               ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-18 19:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> > On Tue, Sep 16, 2025, John Allen wrote:
> > > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > > >  
> > > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > > +
> > > > > 
> > > > > It looks like this is the change that caused the selftest regression
> > > > > with sev-es. It's not yet clear to me what the problem is though.
> > > > 
> > > > Do you see any WARNs in the guest kernel log?
> > > > 
> > > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > > 
> > > Yes, actually that looks to be the case:
> > > 
> > > [    0.463504] ------------[ cut here ]------------
> > > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> > 
> > Can you run with the below printk tracing in the host (and optionally tracing in
> > the guest for its updates)?  Compile tested only.
> 
> Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> cpus followed by "XSS already set to val = 0, eliding updates" times the
> number of cpus. This is with host tracing only. I can try with guest
> tracing too in the morning.

Ok, I think I see the problem. The cases above where we were seeing the
added print statements from kvm_set_msr_common were not situations where
we were going through the __kvm_emulate_msr_write via
sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
context, we never end up getting to kvm_set_msr_common because we hit
the following statement at the top of svm_set_msr:

if (sev_es_prevent_msr_access(vcpu, msr))
	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;

So I'm not sure if this would force using the original method of
directly setting arch.ia32_xss or if there's some additional handling
here that we need in this scenario to allow the msr access.

Thanks,
John

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 19:48             ` John Allen
@ 2025-09-18 20:34               ` Sean Christopherson
  2025-09-18 20:44                 ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-18 20:34 UTC (permalink / raw)
  To: John Allen
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Thu, Sep 18, 2025, John Allen wrote:
> On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > > > >  
> > > > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > > > +
> > > > > > 
> > > > > > It looks like this is the change that caused the selftest regression
> > > > > > with sev-es. It's not yet clear to me what the problem is though.
> > > > > 
> > > > > Do you see any WARNs in the guest kernel log?
> > > > > 
> > > > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > > > 
> > > > Yes, actually that looks to be the case:
> > > > 
> > > > [    0.463504] ------------[ cut here ]------------
> > > > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > > > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> > > 
> > > Can you run with the below printk tracing in the host (and optionally tracing in
> > > the guest for its updates)?  Compile tested only.
> > 
> > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > number of cpus. This is with host tracing only. I can try with guest
> > tracing too in the morning.
> 
> Ok, I think I see the problem. The cases above where we were seeing the
> added print statements from kvm_set_msr_common were not situations where
> we were going through the __kvm_emulate_msr_write via
> sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> context, we never end up getting to kvm_set_msr_common because we hit
> the following statement at the top of svm_set_msr:
> 
> if (sev_es_prevent_msr_access(vcpu, msr))
> 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;

Gah, I was looking for something like that but couldn't find it, obviously.

> So I'm not sure if this would force using the original method of
> directly setting arch.ia32_xss or if there's some additional handling
> here that we need in this scenario to allow the msr access.

Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
the helper, and then this patch can use that API.  I like the symmetry between
__kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
end-around on svm_set_msr() by calling kvm_set_msr_common() directly.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 945f7da60107..ace9f321d2c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
+int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 94d9acc94c9a..462aebc54135 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
                __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
 
        if (kvm_ghcb_xss_is_valid(svm))
-               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
+               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
 
        /* Copy the GHCB exit information into the VMCB fields */
        exit_code = kvm_ghcb_get_sw_exit_code(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bbc187ab428..9b81e92a8de5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
 
+int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
+{
+       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+               return KVM_MSR_RET_UNSUPPORTED;
+
+       if (xss & ~vcpu->arch.guest_supported_xss)
+               return 1;
+       if (vcpu->arch.ia32_xss == xss)
+               return 0;
+
+       vcpu->arch.ia32_xss = xss;
+       vcpu->arch.cpuid_dynamic_bits_dirty = true;
+       return 0;
+}
+EXPORT_SYMBOL_GPL(__kvm_set_xss);
+
 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
        return __kvm_is_valid_cr4(vcpu, cr4) &&
@@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                }
                break;
        case MSR_IA32_XSS:
-               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-                       return KVM_MSR_RET_UNSUPPORTED;
-
-               if (data & ~vcpu->arch.guest_supported_xss)
-                       return 1;
-               if (vcpu->arch.ia32_xss == data)
-                       break;
-               vcpu->arch.ia32_xss = data;
-               vcpu->arch.cpuid_dynamic_bits_dirty = true;
-               break;
+               return __kvm_set_xss(vcpu, data);
        case MSR_SMI_COUNT:
                if (!msr_info->host_initiated)
                        return 1;

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 20:34               ` Sean Christopherson
@ 2025-09-18 20:44                 ` Sean Christopherson
  2025-09-18 21:23                   ` John Allen
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-18 20:44 UTC (permalink / raw)
  To: John Allen
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Thu, Sep 18, 2025, Sean Christopherson wrote:
> On Thu, Sep 18, 2025, John Allen wrote:
> > On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > > number of cpus. This is with host tracing only. I can try with guest
> > > tracing too in the morning.
> > 
> > Ok, I think I see the problem. The cases above where we were seeing the
> > added print statements from kvm_set_msr_common were not situations where
> > we were going through the __kvm_emulate_msr_write via
> > sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> > context, we never end up getting to kvm_set_msr_common because we hit
> > the following statement at the top of svm_set_msr:
> > 
> > if (sev_es_prevent_msr_access(vcpu, msr))
> > 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
> 
> Gah, I was looking for something like that but couldn't find it, obviously.
> 
> > So I'm not sure if this would force using the original method of
> > directly setting arch.ia32_xss or if there's some additional handling
> > here that we need in this scenario to allow the msr access.
> 
> Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
> the helper, and then this patch can use that API.  I like the symmetry between
> __kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
> end-around on svm_set_msr() by calling kvm_set_msr_common() directly.

Scratch that, KVM supports intra-host (and inter-host?) migration of SEV-ES
guests and so needs to allow the host to save/restore XSS, otherwise a guest
that *knows* its XSS hasn't changed could get stale/bad CPUID emulation if the
guest doesn't provide XSS in the GHCB on every exit.

So while seemingly hacky, I'm pretty sure the right solution is actually:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cabe1950b160..d48bf20c865b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2721,8 +2721,8 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
 static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
                                      struct msr_data *msr_info)
 {
-       return sev_es_guest(vcpu->kvm) &&
-              vcpu->arch.guest_state_protected &&
+       return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
+              msr_info->index != MSR_IA32_XSS &&
               !msr_write_intercepted(vcpu, msr_info->index);
 }
 
Side topic, checking msr_write_intercepted() is likely wrong.  It's a bad
heuristic for "managed in the VMSA".  MSRs that _KVM_ loads into hardware and
context switches should still be accessible.  I haven't looked to see if this is
a problem in practice.
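
E.g., an explicit enumeration seems more direct; below is a purely
illustrative sketch (sev_es_msr_in_vmsa() is a hypothetical helper, and the
list is neither audited nor exhaustive):

/* Hypothetical: MSRs that hardware context switches via the VMSA. */
static bool sev_es_msr_in_vmsa(u32 index)
{
	switch (index) {
	case MSR_EFER:
	case MSR_STAR:
	case MSR_LSTAR:
	case MSR_CSTAR:
	case MSR_SYSCALL_MASK:
	case MSR_KERNEL_GS_BASE:
	case MSR_IA32_SYSENTER_CS:
	case MSR_IA32_SYSENTER_ESP:
	case MSR_IA32_SYSENTER_EIP:
		return true;
	default:
		return false;
	}
}

static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
				      struct msr_data *msr_info)
{
	return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
	       sev_es_msr_in_vmsa(msr_info->index);
}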

> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 945f7da60107..ace9f321d2c9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
>  int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
>  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
> +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
>  
>  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
>  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 94d9acc94c9a..462aebc54135 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>                 __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
>  
>         if (kvm_ghcb_xss_is_valid(svm))
> -               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
> +               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
>  
>         /* Copy the GHCB exit information into the VMCB fields */
>         exit_code = kvm_ghcb_get_sw_exit_code(svm);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bbc187ab428..9b81e92a8de5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
>  
> +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
> +{
> +       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> +               return KVM_MSR_RET_UNSUPPORTED;
> +
> +       if (xss & ~vcpu->arch.guest_supported_xss)
> +               return 1;
> +       if (vcpu->arch.ia32_xss == xss)
> +               return 0;
> +
> +       vcpu->arch.ia32_xss = xss;
> +       vcpu->arch.cpuid_dynamic_bits_dirty = true;
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(__kvm_set_xss);
> +
>  static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>         return __kvm_is_valid_cr4(vcpu, cr4) &&
> @@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 }
>                 break;
>         case MSR_IA32_XSS:
> -               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> -                       return KVM_MSR_RET_UNSUPPORTED;
> -
> -               if (data & ~vcpu->arch.guest_supported_xss)
> -                       return 1;
> -               if (vcpu->arch.ia32_xss == data)
> -                       break;
> -               vcpu->arch.ia32_xss = data;
> -               vcpu->arch.cpuid_dynamic_bits_dirty = true;
> -               break;
> +               return __kvm_set_xss(vcpu, data);
>         case MSR_SMI_COUNT:
>                 if (!msr_info->host_initiated)
>                         return 1;

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 20:44                 ` Sean Christopherson
@ 2025-09-18 21:23                   ` John Allen
  2025-09-18 21:42                     ` Edgecombe, Rick P
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-18 21:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li

On Thu, Sep 18, 2025 at 01:44:13PM -0700, Sean Christopherson wrote:
> On Thu, Sep 18, 2025, Sean Christopherson wrote:
> > On Thu, Sep 18, 2025, John Allen wrote:
> > > On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > > > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > > > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > > > number of cpus. This is with host tracing only. I can try with guest
> > > > tracing too in the morning.
> > > 
> > > Ok, I think I see the problem. The cases above where we were seeing the
> > > added print statements from kvm_set_msr_common were not situations where
> > > we were going through the __kvm_emulate_msr_write via
> > > sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> > > context, we never end up getting to kvm_set_msr_common because we hit
> > > the following statement at the top of svm_set_msr:
> > > 
> > > if (sev_es_prevent_msr_access(vcpu, msr))
> > > 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
> > 
> > Gah, I was looking for something like that but couldn't find it, obviously.
> > 
> > > So I'm not sure if this would force using the original method of
> > > directly setting arch.ia32_xss or if there's some additional handling
> > > here that we need in this scenario to allow the msr access.
> > 
> > Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
> > the helper, and then this patch can use that API.  I like the symmetry between
> > __kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
> > end-around on svm_set_msr() by calling kvm_set_msr_common() directly.
> 
> Scratch that, KVM supports intra-host (and inter-host?) migration of SEV-ES
> guests and so needs to allow the host to save/restore XSS, otherwise a guest
> that *knows* its XSS hasn't changed could get stale/bad CPUID emulation if the
> guest doesn't provide XSS in the GHCB on every exit.
> 
> So while seemingly hacky, I'm pretty sure the right solution is actually:
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cabe1950b160..d48bf20c865b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2721,8 +2721,8 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
>  static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
>                                       struct msr_data *msr_info)
>  {
> -       return sev_es_guest(vcpu->kvm) &&
> -              vcpu->arch.guest_state_protected &&
> +       return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
> +              msr_info->index != MSR_IA32_XSS &&
>                !msr_write_intercepted(vcpu, msr_info->index);
>  }

Yes, it looks like this fixes the regression. Thanks!

The 32bit selftest still doesn't work properly with sev-es, but that was
a problem with the previous version too. I suspect there's some
incompatibility between sev-es and the test, but I haven't been able to
get a good answer on why that might be.

Thanks,
John

>  
> Side topic, checking msr_write_intercepted() is likely wrong.  It's a bad
> heuristic for "managed in the VMSA".  MSRs that _KVM_ loads into hardware and
> context switches should still be accessible.  I haven't looked to see if this is
> a problem in practice.
> 
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 945f7da60107..ace9f321d2c9 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
> >  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
> >  int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> >  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
> > +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
> >  
> >  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> >  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 94d9acc94c9a..462aebc54135 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >                 __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
> >  
> >         if (kvm_ghcb_xss_is_valid(svm))
> > -               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
> > +               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
> >  
> >         /* Copy the GHCB exit information into the VMCB fields */
> >         exit_code = kvm_ghcb_get_sw_exit_code(svm);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5bbc187ab428..9b81e92a8de5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
> >  }
> >  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
> >  
> > +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
> > +{
> > +       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> > +               return KVM_MSR_RET_UNSUPPORTED;
> > +
> > +       if (xss & ~vcpu->arch.guest_supported_xss)
> > +               return 1;
> > +       if (vcpu->arch.ia32_xss == xss)
> > +               return 0;
> > +
> > +       vcpu->arch.ia32_xss = xss;
> > +       vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(__kvm_set_xss);
> > +
> >  static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  {
> >         return __kvm_is_valid_cr4(vcpu, cr4) &&
> > @@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 }
> >                 break;
> >         case MSR_IA32_XSS:
> > -               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> > -                       return KVM_MSR_RET_UNSUPPORTED;
> > -
> > -               if (data & ~vcpu->arch.guest_supported_xss)
> > -                       return 1;
> > -               if (vcpu->arch.ia32_xss == data)
> > -                       break;
> > -               vcpu->arch.ia32_xss = data;
> > -               vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > -               break;
> > +               return __kvm_set_xss(vcpu, data);
> >         case MSR_SMI_COUNT:
> >                 if (!msr_info->host_initiated)
> >                         return 1;

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 21:23                   ` John Allen
@ 2025-09-18 21:42                     ` Edgecombe, Rick P
  2025-09-18 22:18                       ` John Allen
  0 siblings, 1 reply; 130+ messages in thread
From: Edgecombe, Rick P @ 2025-09-18 21:42 UTC (permalink / raw)
  To: seanjc@google.com, john.allen@amd.com
  Cc: Gao, Chao, Li, Xiaoyao, linux-kernel@vger.kernel.org,
	thomas.lendacky@amd.com, minipli@grsecurity.net,
	kvm@vger.kernel.org, pbonzini@redhat.com, mlevitsk@redhat.com

On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> The 32bit selftest still doesn't work properly with sev-es, but that was
> a problem with the previous version too. I suspect there's some
> incompatibility between sev-es and the test, but I haven't been able to
> get a good answer on why that might be.

You are talking about test_32bit() in test_shadow_stack.c?

That test relies on a specific CET arch behavior. If you try to transition to a
32 bit compatibility mode segment with an SSP with high bits set (outside the 32
bit address space), a #GP will be triggered by the HW. The test verifies that
this happens and the kernel handles it appropriately. Could it be platform/mode
difference and not KVM issue?

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 21:42                     ` Edgecombe, Rick P
@ 2025-09-18 22:18                       ` John Allen
  2025-09-19 13:40                         ` Tom Lendacky
  0 siblings, 1 reply; 130+ messages in thread
From: John Allen @ 2025-09-18 22:18 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: seanjc@google.com, Gao, Chao, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, thomas.lendacky@amd.com,
	minipli@grsecurity.net, kvm@vger.kernel.org, pbonzini@redhat.com,
	mlevitsk@redhat.com

On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> > The 32bit selftest still doesn't work properly with sev-es, but that was
> > a problem with the previous version too. I suspect there's some
> > incompatibility between sev-es and the test, but I haven't been able to
> > get a good answer on why that might be.
> 
> You are talking about test_32bit() in test_shadow_stack.c?

Yes, that's right.

> 
> That test relies on a specific CET arch behavior. If you try to transition to a
> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
> bit address space), a #GP will be triggered by the HW. The test verifies that
> this happens and the kernel handles it appropriately. Could it be platform/mode
> difference and not KVM issue?

I'm fairly certain that this is an issue with any sev-es guest. The
unexpected seg fault happens when we isolate the sigaction32 call used
in the test regardless of shadow stack support. So I wonder if it's
something similar to the case that the test is checking for. Maybe
something to do with the C bit.

Thanks,
John

* Re: [PATCH v15 18/41] KVM: x86: Don't emulate instructions affected by CET features
  2025-09-18 14:15     ` Chao Gao
@ 2025-09-19  1:25       ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19  1:25 UTC (permalink / raw)
  To: Chao Gao
  Cc: Xiaoyao Li, Paolo Bonzini, kvm, linux-kernel, Tom Lendacky,
	Mathias Krause, John Allen, Rick Edgecombe, Maxim Levitsky,
	Zhang Yi Z

On Thu, Sep 18, 2025, Chao Gao wrote:
> >> 
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> index 542d3664afa3..e4be54a677b0 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -178,6 +178,8 @@
> >>   #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
> >>   #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
> >>   #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
> >> +#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
> >> +#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
> >>   #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
> >> @@ -4068,9 +4070,9 @@ static const struct opcode group4[] = {
> >>   static const struct opcode group5[] = {
> >>   	F(DstMem | SrcNone | Lock,		em_inc),
> >>   	F(DstMem | SrcNone | Lock,		em_dec),
> >> -	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
> >> -	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> >> -	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
> >> +	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk, em_call_near_abs),
> >> +	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk, em_call_far),
> >> +	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
> >
> >>   	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
> >
> >It seems this entry for 'FF 05' (Jump far, absolute indirect) needs to set
> >ShadowStack and IndirBrnTrk as well?
> 
> Yes. I just checked the pseudo code of the JMP instruction in SDM vol2. A far
> jump to a CONFORMING-CODE-SEGMENT or NONCONFORMING-CODE-SEGMENT is affected by
> both shadow stack and IBT, and a far jump to a call gate is affected by IBT.

The SHSTK interaction is only a #GP condition though, and it's not _that_ awful
to emulate.  While somewhat silly, I think it makes sense to reject FAR JMP if
it's IBT, but implement the SHSTK check.  Rejecting a JMP instruction for SHSTK
is weird/confusing (though definitely easier).

* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-18 18:05     ` Sean Christopherson
@ 2025-09-19  7:10       ` Xiaoyao Li
  2025-09-19 14:25         ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Xiaoyao Li @ 2025-09-19  7:10 UTC (permalink / raw)
  To: Sean Christopherson, Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On 9/19/2025 2:05 AM, Sean Christopherson wrote:
> On Thu, Sep 18, 2025, Binbin Wu wrote:
>> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
>> [...]
>>> +static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
>>> +{
>>> +	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
>>> +}
>>> +
>>
>> I think "_cc" should be appended to the function name, although it would make
>> the function name longer. Without "_cc", the meaning is different and confusing.
> 
> +1, got it fixed up.

May I ask what the 'CC' means?

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-18 22:18                       ` John Allen
@ 2025-09-19 13:40                         ` Tom Lendacky
  2025-09-19 16:13                           ` John Allen
  2025-09-19 17:29                           ` Edgecombe, Rick P
  0 siblings, 2 replies; 130+ messages in thread
From: Tom Lendacky @ 2025-09-19 13:40 UTC (permalink / raw)
  To: John Allen, Edgecombe, Rick P
  Cc: seanjc@google.com, Gao, Chao, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, minipli@grsecurity.net,
	kvm@vger.kernel.org, pbonzini@redhat.com, mlevitsk@redhat.com

On 9/18/25 17:18, John Allen wrote:
> On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
>> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
>>> The 32bit selftest still doesn't work properly with sev-es, but that was
>>> a problem with the previous version too. I suspect there's some
>>> incompatibility between sev-es and the test, but I haven't been able to
>>> get a good answer on why that might be.
>>
>> You are talking about test_32bit() in test_shadow_stack.c?
> 
> Yes, that's right.
> 
>>
>> That test relies on a specific CET arch behavior. If you try to transition to a
>> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
>> bit address space), a #GP will be triggered by the HW. The test verifies that
>> this happens and the kernel handles it appropriately. Could it be platform/mode
>> difference and not KVM issue?
> 
> I'm fairly certain that this is an issue with any sev-es guest. The
> unexpected seg fault happens when we isolate the sigaction32 call used
> in the test regardless of shadow stack support. So I wonder if it's
> something similar to the case that the test is checking for. Maybe
> something to do with the C bit.

Likely something to do with the encryption bit since, if set, it will
generate an invalid address in 32-bit, right?

For SEV-ES, we transition to 64-bit very quickly because of the use of the
encryption bit, which is why, for example, we don't support SEV-ES /
SEV-SNP in the OvmfIa32X64.dsc package.

Thanks,
Tom

> 
> Thanks,
> John


* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-19  7:10       ` Xiaoyao Li
@ 2025-09-19 14:25         ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19 14:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Binbin Wu, Paolo Bonzini, kvm, linux-kernel, Tom Lendacky,
	Mathias Krause, John Allen, Rick Edgecombe, Chao Gao,
	Maxim Levitsky, Zhang Yi Z

On Fri, Sep 19, 2025, Xiaoyao Li wrote:
> On 9/19/2025 2:05 AM, Sean Christopherson wrote:
> > On Thu, Sep 18, 2025, Binbin Wu wrote:
> > > On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > > [...]
> > > > +static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
> > > > +{
> > > > +	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> > > > +}
> > > > +
> > > 
> > > I think "_cc" should be appended to the function name, although it would make
> > > the function name longer. Without "_cc", the meaning is different and confusing.
> > 
> > +1, got it fixed up.
> 
> May I ask what the 'CC' means?

Consistency Check.  It's obviously a bit terse in this context, but it's a well-
established acronym in KVM, so I think/hope someone who really wanted to figure
out what it means could do so with a bit of searching.

$ git grep -w CC | grep define
svm/nested.c:#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
vmx/hyperv.c:#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
vmx/nested.c:#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK

$ git grep -w CC | wc -l
156

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-19 13:40                         ` Tom Lendacky
@ 2025-09-19 16:13                           ` John Allen
  2025-09-19 17:29                           ` Edgecombe, Rick P
  1 sibling, 0 replies; 130+ messages in thread
From: John Allen @ 2025-09-19 16:13 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Edgecombe, Rick P, seanjc@google.com, Gao, Chao, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, minipli@grsecurity.net,
	kvm@vger.kernel.org, pbonzini@redhat.com, mlevitsk@redhat.com

On Fri, Sep 19, 2025 at 08:40:15AM -0500, Tom Lendacky wrote:
> On 9/18/25 17:18, John Allen wrote:
> > On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
> >> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> >>> The 32bit selftest still doesn't work properly with sev-es, but that was
> >>> a problem with the previous version too. I suspect there's some
> >>> incompatibility between sev-es and the test, but I haven't been able to
> >>> get a good answer on why that might be.
> >>
> >> You are talking about test_32bit() in test_shadow_stack.c?
> > 
> > Yes, that's right.
> > 
> >>
> >> That test relies on a specific CET arch behavior. If you try to transition to a
> >> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
> >> bit address space), a #GP will be triggered by the HW. The test verifies that
> >> this happens and the kernel handles it appropriately. Could it be platform/mode
> >> difference and not KVM issue?
> > 
> > I'm fairly certain that this is an issue with any sev-es guest. The
> > unexpected seg fault happens when we isolate the sigaction32 call used
> > in the test regardless of shadow stack support. So I wonder if it's
> > something similar to the case that the test is checking for. Maybe
> > something to do with the C bit.
> 
> Likely something to do with the encryption bit since, if set, it will
> generate an invalid address in 32-bit, right?
> 
> For SEV-ES, we transition to 64-bit very quickly because of the use of the
> encryption bit, which is why, for example, we don't support SEV-ES /
> SEV-SNP in the OvmfIa32X64.dsc package.

Ok, I knew this sounded familiar. This came up in a discussion a while
back. The reason this doesn't work is that "int 0x80" is blocked in
SEV/SEV-ES guests. See:
b82a8dbd3d2f ("x86/coco: Disable 32-bit emulation by default on TDX and SEV")

So I don't think this should be a blocker for this series, but it is
something we'll want to address in the selftest. However, I'm not sure
how we can check if we're running from an SEV or SEV-ES guest from
userspace. Maybe we could attempt the int 0x80 and catch the seg fault
in which case we assume that we're running under SEV or SEV-ES or some
other situation where int 0x80 isn't supported? Seems hacky and like it
could mask other failures.
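
For illustration, a minimal (untested) sketch of that probe, assuming the
blocked int 0x80 surfaces as a plain SIGSEGV and that getpid is nr 20 in
the 32-bit syscall table:

#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>

static sigjmp_buf int80_jmp;

static void int80_sigsegv(int sig)
{
	siglongjmp(int80_jmp, 1);
}

/* Returns false if int 0x80 faults, e.g. under SEV/SEV-ES. */
static bool int80_works(void)
{
	struct sigaction sa = { .sa_handler = int80_sigsegv };
	struct sigaction old;
	bool ok = false;
	long ret;

	sigaction(SIGSEGV, &sa, &old);
	if (!sigsetjmp(int80_jmp, 1)) {
		/* 32-bit getpid, harmless when legacy syscalls work */
		asm volatile("int $0x80" : "=a"(ret) : "a"(20L) : "memory");
		ok = ret > 0;
	}
	sigaction(SIGSEGV, &old, NULL);
	return ok;
}

test_32bit() could then be skipped when this returns false, with the caveat
above that it would also hide unrelated int 0x80 breakage.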

Thanks,
John

* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-19 13:40                         ` Tom Lendacky
  2025-09-19 16:13                           ` John Allen
@ 2025-09-19 17:29                           ` Edgecombe, Rick P
  2025-09-19 20:58                             ` Edgecombe, Rick P
  1 sibling, 1 reply; 130+ messages in thread
From: Edgecombe, Rick P @ 2025-09-19 17:29 UTC (permalink / raw)
  To: thomas.lendacky@amd.com, john.allen@amd.com
  Cc: Gao, Chao, seanjc@google.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, minipli@grsecurity.net,
	mlevitsk@redhat.com, kvm@vger.kernel.org, pbonzini@redhat.com

On Fri, 2025-09-19 at 08:40 -0500, Tom Lendacky wrote:
> Likely something to do with the encryption bit since, if set, will
> generate an invalid address in 32-bit, right?

But the SSP is a virtual address and the C-bit is a physical thing.

> 
> For SEV-ES, we transition to 64-bit very quickly because of the use of the
> encryption bit, which is why, for example, we don't support SEV-ES /
> SEV-SNP in the OvmfIa32X64.dsc package.

This sounds like it's about the lack of ability to set the C-bit in the page
table, rather than having the C-bit set in a virtual address. In compatibility
mode you are not using 32 bit page tables, so the C-bit should be available like
normal I think. Not an expert in 32 bit/compatibility mode though.



More background on this test/behavior: During the tail end of the shadow stack
enabling, there was a concern raised that we didn't un-support 32 bit shadow
stack cleanly enough. We blocked it from being allowed in 32 bit apps, but
nothing stopped an app from enabling it in 64 bit and then switching to 32 bit
mode without the kernel getting a chance to block it. The simplest, get-it-done
type solution was to just not allocate shadow stacks in the space where they
could be usable in 32 bit mode and let the HW catch it.

But the whole point is just to not allow 32 bit mode CET. Sounds like SEV-ES
covers this another way - don't support 32 bit at all. I wonder if we should
just patch the test to skip the 32 bit test on coco VMs?

PS, we don't support CET on TDX currently even though it doesn't require
everything in this series, but I just remembered (forehead slap) that on the way
upstream the extra CET-TDX exclusion got pulled out. After this series, it would
be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
me see how the test goes in TDX and get back to you.

* Re: [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-16  5:52       ` Xiaoyao Li
@ 2025-09-19 17:47         ` Sean Christopherson
  2025-09-19 17:58           ` Sean Christopherson
  0 siblings, 1 reply; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19 17:47 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On Tue, Sep 16, 2025, Xiaoyao Li wrote:
> On 9/16/2025 6:12 AM, Sean Christopherson wrote:
> > For 6.18, I think the safe play is to go with the first path (exempt KVM-internal
> > MSRs), and then try to go for the second approach (exempt all host accesses) for
> > 6.19.  KVM's ABI for ignore_msrs=true is already all kinds of messed up, so I'm
> > not terribly concerned about temporarily making it marginally worse.
> 
> Looks OK to me.

Actually, better idea.  Just use kvm_msr_{read,write}() for ONE_REG and bypass
the ignore_msrs crud.  It's new uAPI, so we can define the semantics to be anything
we want.  I see zero reason for ignore_msrs to apply to host accesses, and even
less reason for it to apply to ONE_REG.

Then there's no need to special case GUEST_SSP, and what to do about ignore_msrs
for host accesses remains an orthogonal discussion.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ed25d33aaee..4adfece25630 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5932,7 +5932,7 @@ static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
 {
        u64 val;
 
-       if (do_get_msr(vcpu, msr, &val))
+       if (kvm_msr_read(vcpu, msr, &val))
                return -EINVAL;
 
        if (put_user(val, user_val))
@@ -5948,7 +5948,7 @@ static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
        if (get_user(val, user_val))
                return -EFAULT;
 
-       if (do_set_msr(vcpu, msr, &val))
+       if (kvm_msr_write(vcpu, msr, &val))
                return -EINVAL;
 
        return 0;

* Re: [PATCH v15 13/41] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-09-19 17:47         ` Sean Christopherson
@ 2025-09-19 17:58           ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19 17:58 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On Fri, Sep 19, 2025, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, Xiaoyao Li wrote:
> > On 9/16/2025 6:12 AM, Sean Christopherson wrote:
> > > For 6.18, I think the safe play is to go with the first path (exempt KVM-internal
> > > MSRs), and then try to go for the second approach (exempt all host accesses) for
> > > 6.19.  KVM's ABI for ignore_msrs=true is already all kinds of messed up, so I'm
> > > not terribly concerned about temporarily making it marginally worse.
> > 
> > Looks OK to me.
> 
> Actually, better idea.  Just use kvm_msr_{read,write}() for ONE_REG and bypass
> the ignore_msrs crud.  It's new uAPI, so we can define the semantics to be anything
> we want.  I see zero reason for ignore_msrs to apply to host accesses, and even
> less reason for it to apply to ONE_REG.
> 
> Then there's no need to special case GUEST_SSP, and what to do about ignore_msrs
> for host accesses remains an orthogonal discussion.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4ed25d33aaee..4adfece25630 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5932,7 +5932,7 @@ static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
>  {
>         u64 val;
>  
> -       if (do_get_msr(vcpu, msr, &val))
> +       if (kvm_msr_read(vcpu, msr, &val))
>                 return -EINVAL;
>  
>         if (put_user(val, user_val))
> @@ -5948,7 +5948,7 @@ static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
>         if (get_user(val, user_val))
>                 return -EFAULT;
>  
> -       if (do_set_msr(vcpu, msr, &val))
> +       if (kvm_msr_write(vcpu, msr, &val))
>                 return -EINVAL;
>  
>         return 0;

Never mind, that would cause problems for using ONE_REG for actual MSRs.  Most
importantly, it would let userspace bypass the feature MSR restrictions in
do_set_msr().

I think the best option is to immediately reject translation.  That way host
accesses to whatever KVM uses for the internal SSP MSR index are unaffected by
the introduction of ONE_REG support.  E.g. modifying kvm_do_msr_access() would
mean that userspace would see different behavior for MSR_KVM_INTERNAL_GUEST_SSP
versus all other MSRs.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab7f8c41d93b..720540f102e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6016,10 +6016,20 @@ struct kvm_x86_reg_id {
        __u8  x86;
 };
 
-static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
+static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
+                                struct kvm_x86_reg_id *reg)
 {
        switch (reg->index) {
        case KVM_REG_GUEST_SSP:
+               /*
+                * FIXME: If host-initiated accesses are ever exempted from
+                * ignore_msrs (in kvm_do_msr_access()), drop this manual check
+                * and rely on KVM's standard checks to reject accesses to regs
+                * that don't exist.
+                */
+               if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+                       return -EINVAL;
+
                reg->type = KVM_X86_REG_TYPE_MSR;
                reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
                break;
@@ -6075,7 +6085,7 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
                return -EINVAL;
 
        if (reg->type == KVM_X86_REG_TYPE_KVM) {
-               r = kvm_translate_kvm_reg(reg);
+               r = kvm_translate_kvm_reg(vcpu, reg);
                if (r)
                        return r;
        }


* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-19 17:29                           ` Edgecombe, Rick P
@ 2025-09-19 20:58                             ` Edgecombe, Rick P
  2025-09-22  9:19                               ` Kiryl Shutsemau
  0 siblings, 1 reply; 130+ messages in thread
From: Edgecombe, Rick P @ 2025-09-19 20:58 UTC (permalink / raw)
  To: thomas.lendacky@amd.com, kas@kernel.org, john.allen@amd.com
  Cc: Gao, Chao, seanjc@google.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, minipli@grsecurity.net,
	mlevitsk@redhat.com, kvm@vger.kernel.org, pbonzini@redhat.com

+Kiryl, a CET selftest that does int80 fails on SEV-ES.

On Fri, 2025-09-19 at 10:29 -0700, Rick Edgecombe wrote:
> PS, we don't support CET on TDX currently even though it doesn't require
> everything in this series, but I just remembered (forehead slap) that on the way
> upstream the extra CET-TDX exclusion got pulled out. After this series, it would
> be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
> me see how the test goes in TDX and get back to you.

The test passes on a TDX guest:

[INFO]	new_ssp = 7f8c8d7ffff8, *new_ssp = 7f8c8d800001
[INFO]	changing ssp from 7f8c8e1ffff0 to 7f8c8d7ffff8
[INFO]	ssp is now 7f8c8d800000
[OK]	Shadow stack pivot
[OK]	Shadow stack faults
[INFO]	Corrupting shadow stack
[INFO]	Generated shadow stack violation successfully
[OK]	Shadow stack violation test
[INFO]	Gup read -> shstk access success
[INFO]	Gup write -> shstk access success
[INFO]	Violation from normal write
[INFO]	Gup read -> write access success
[INFO]	Violation from normal write
[INFO]	Gup write -> write access success
[INFO]	Cow gup write -> write access success
[OK]	Shadow gup test
[INFO]	Violation from shstk access
[OK]	mprotect() test
[OK]	Userfaultfd test
[OK]	Guard gap test, other mapping's gaps
[OK]	Guard gap test, placement mapping's gaps
[OK]	Ptrace test
[OK]	32 bit test
[OK]	Uretprobe test


I guess int80 was re-enabled for TDX after being disabled for both coco
families. See commits starting back from f4116bfc4462 ("x86/tdx: Allow 32-bit
emulation by default"). Not sure why it was done that way. If there is some
way to re-enable int80 for SEV-ES too, we can leave the test as is. But if
you decide to disable the 32-bit test to resolve this, please leave it
working for TDX.




* Re: [PATCH v15 14/41] KVM: VMX: Emulate read and write to CET MSRs
  2025-09-16  7:07   ` Xiaoyao Li
  2025-09-16  7:48     ` Chao Gao
@ 2025-09-19 22:11     ` Sean Christopherson
  1 sibling, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:11 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z

On Tue, Sep 16, 2025, Xiaoyao Li wrote:
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > From: Yang Weijiang <weijiang.yang@intel.com>
> > 
> > Add an emulation interface for CET MSR accesses. The emulation code is
> > split into a common part and a vendor-specific part. The former performs
> > the common checks for the MSRs (e.g. accessibility and data validity),
> > then routes the operation either to the XSAVE-managed MSRs via the
> > helpers or to the CET VMCS fields.
> > 
> > SSP can only be read via RDSSP, and writing it requires destructive and
> > potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
> > SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
> > for the GUEST_SSP field of the VMCS.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > Tested-by: Mathias Krause <minipli@grsecurity.net>
> > Tested-by: John Allen <john.allen@amd.com>
> > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> > [sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
> 
> Does the note "drop call to kvm_set_xstate_msr() for S_CET" actually apply
> to this patch?

Yes?  My comment there is stating what I did relative to the patch Chao sent.
It's not relative to any existing code.
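
To make the changelog's common/vendor split concrete, a minimal sketch of
the VMX side of the SSP pseudo-MSR (illustrative, not the exact hunks from
the patch; it assumes the common code has already performed the
accessibility and validity checks described above):

	/* In vmx_get_msr(), sketch only: */
	case MSR_KVM_INTERNAL_GUEST_SSP:
		/* The synthetic MSR is just a window onto the VMCS field. */
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;

	/* In vmx_set_msr(), sketch only: */
	case MSR_KVM_INTERNAL_GUEST_SSP:
		vmcs_writel(GUEST_SSP, msr_info->data);
		break;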


* Re: [PATCH v15 19/41] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-09-18  1:57   ` Binbin Wu
@ 2025-09-19 22:57     ` Sean Christopherson
  0 siblings, 0 replies; 130+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:57 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
	John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Xiaoyao Li,
	Zhang Yi Z

On Thu, Sep 18, 2025, Binbin Wu wrote:
> 
> 
> On 9/13/2025 7:22 AM, Sean Christopherson wrote:
> > From: Yang Weijiang <weijiang.yang@intel.com>
> > 
> > Expose CET features to the guest if KVM and the host can support them;
> > clear the CPUID feature bits if they cannot.
> > 
> > Set the CPUID feature bits so that CET features are available in guest
> > CPUID, and add CR4.CET support so that the guest can set the CET master
> > control bit.
> > 
> > Disable KVM's CET support if unrestricted_guest is unsupported/disabled,
> > as KVM does not support emulating CET.
> > 
> > The CET load-bits in the VM_ENTRY/VM_EXIT control fields should be set
> > to keep the guest's CET xstates isolated from the host's.
> > 
> > On platforms with VMX_BASIC[bit56] == 0, injecting #CP at VMX entry with
> > an error code will fail, whereas if VMX_BASIC[bit56] == 1, #CP injection
> > with or without an error code is allowed. Disable the CET feature bits
> > if the MSR bit is cleared so that a nested VMM can inject #CP if and
> > only if VMX_BASIC[bit56] == 1.
> > 
> > Don't expose the CET feature if either of the {U,S}_CET xstate bits is
> > cleared in the host XSS, or if XSAVES isn't supported.
> > 
> > CET MSRs are reset to 0 on RESET, power-up, and INIT; clear the guest
> > CET xsave-area fields so that the guest's CET MSRs are reset to 0 after
> > those events.
> > 
> > Meanwhile, explicitly disable SHSTK and IBT for SVM, because CET KVM
> > enabling for SVM is not ready.
> > 
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> > Tested-by: Mathias Krause <minipli@grsecurity.net>
> > Tested-by: John Allen <john.allen@amd.com>
> > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> 
> One nit below.
> 
> [...]
> > 			\
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 15f208c44cbd..c78acab2ff3f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -226,7 +226,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> >    * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
> >    * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
> >    */
> > -#define KVM_SUPPORTED_XSS     0
> > +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
> > +				 XFEATURE_MASK_CET_KERNEL)
> 
> Since XFEATURE_MASK_CET_USER and XFEATURE_MASK_CET_KERNEL are always checked or
> set together, does it make sense to use a macro for the two bits?

Good call.  I was going to say "eh, we can do that later", but it's a massive
improvement for readability.
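
One way to spell that suggestion (the XFEATURE_MASK_CET_ALL name is
illustrative, not necessarily what was applied):

#define XFEATURE_MASK_CET_ALL	(XFEATURE_MASK_CET_USER | \
				 XFEATURE_MASK_CET_KERNEL)

#define KVM_SUPPORTED_XSS	XFEATURE_MASK_CET_ALL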


* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-19 20:58                             ` Edgecombe, Rick P
@ 2025-09-22  9:19                               ` Kiryl Shutsemau
  2025-09-22  9:33                                 ` Upadhyay, Neeraj
  0 siblings, 1 reply; 130+ messages in thread
From: Kiryl Shutsemau @ 2025-09-22  9:19 UTC (permalink / raw)
  To: Neeraj Upadhyay, Edgecombe, Rick P
  Cc: thomas.lendacky@amd.com, john.allen@amd.com, Gao, Chao,
	seanjc@google.com, Li, Xiaoyao, linux-kernel@vger.kernel.org,
	minipli@grsecurity.net, mlevitsk@redhat.com, kvm@vger.kernel.org,
	pbonzini@redhat.com

On Fri, Sep 19, 2025 at 08:58:45PM +0000, Edgecombe, Rick P wrote:
> +Kiryl, a CET selftest that does int80 fails on SEV-ES.
> 
> On Fri, 2025-09-19 at 10:29 -0700, Rick Edgecombe wrote:
> > PS, we don't support CET on TDX currently even though it doesn't require
> > everything in this series, but I just remembered (forehead slap) that on the way
> > upstream the extra CET-TDX exclusion got pulled out. After this series, it would
> > be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
> > me see how the test goes in TDX and get back to you.
> 
> The test passes on a TDX guest:
> 
> [INFO]	new_ssp = 7f8c8d7ffff8, *new_ssp = 7f8c8d800001
> [INFO]	changing ssp from 7f8c8e1ffff0 to 7f8c8d7ffff8
> [INFO]	ssp is now 7f8c8d800000
> [OK]	Shadow stack pivot
> [OK]	Shadow stack faults
> [INFO]	Corrupting shadow stack
> [INFO]	Generated shadow stack violation successfully
> [OK]	Shadow stack violation test
> [INFO]	Gup read -> shstk access success
> [INFO]	Gup write -> shstk access success
> [INFO]	Violation from normal write
> [INFO]	Gup read -> write access success
> [INFO]	Violation from normal write
> [INFO]	Gup write -> write access success
> [INFO]	Cow gup write -> write access success
> [OK]	Shadow gup test
> [INFO]	Violation from shstk access
> [OK]	mprotect() test
> [OK]	Userfaultfd test
> [OK]	Guard gap test, other mapping's gaps
> [OK]	Guard gap test, placement mapping's gaps
> [OK]	Ptrace test
> [OK]	32 bit test
> [OK]	Uretprobe test
> 
> 
> I guess int80 was re-enabled for TDX after being disabled for both coco
> families. See commits starting back from f4116bfc4462 ("x86/tdx: Allow
> 32-bit emulation by default"). Not sure why it was done that way. If there
> is some way to re-enable int80 for SEV-ES too, we can leave the test as is.
> But if you decide to disable the 32-bit test to resolve this, please leave
> it working for TDX.

In the TDX case, the vAPIC state is protected from the VMM. It covers the
ISR, so the guest can safely check the ISR to tell whether the exception
is external or internal.
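
A sketch of that kind of check (illustrative only; it assumes the kernel's
apic_read() accessor rather than quoting the actual int80 entry code):

	/*
	 * Each APIC in-service register covers 32 vectors, at 0x10 strides.
	 * If the in-service bit for the incoming vector is set, the event
	 * is a genuine external interrupt; if it is clear, the vector came
	 * from a software INT instruction.
	 */
	static bool vector_is_external(unsigned int vector)
	{
		u32 isr = apic_read(APIC_ISR + (vector / 32) * 0x10);

		return isr & BIT(vector % 32);
	}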

IIUC, the vAPIC state is controlled by the VMM in the SEV case, so the
ISR is not reliable.

I am not sure if Secure AVIC[1] changes the situation for AMD.

Neeraj?

[1] https://lore.kernel.org/all/20250811094444.203161-1-Neeraj.Upadhyay@amd.com/

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-22  9:19                               ` Kiryl Shutsemau
@ 2025-09-22  9:33                                 ` Upadhyay, Neeraj
  2025-09-22  9:54                                   ` Kiryl Shutsemau
  0 siblings, 1 reply; 130+ messages in thread
From: Upadhyay, Neeraj @ 2025-09-22  9:33 UTC (permalink / raw)
  To: Kiryl Shutsemau, Edgecombe, Rick P
  Cc: thomas.lendacky@amd.com, john.allen@amd.com, Gao, Chao,
	seanjc@google.com, Li, Xiaoyao, linux-kernel@vger.kernel.org,
	minipli@grsecurity.net, mlevitsk@redhat.com, kvm@vger.kernel.org,
	pbonzini@redhat.com, naveen.rao


> 
> In the TDX case, the vAPIC state is protected from the VMM. It covers the
> ISR, so the guest can safely check the ISR to tell whether the exception
> is external or internal.
> 
> IIUC, the vAPIC state is controlled by the VMM in the SEV case, so the
> ISR is not reliable.
> 
> I am not sure if Secure AVIC[1] changes the situation for AMD.
> 
> Neeraj?
> 

For Secure AVIC-enabled guests, the guest's vAPIC ISR state is not
visible to (and not controlled by) the host or the VMM.


- Neeraj



* Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  2025-09-22  9:33                                 ` Upadhyay, Neeraj
@ 2025-09-22  9:54                                   ` Kiryl Shutsemau
  0 siblings, 0 replies; 130+ messages in thread
From: Kiryl Shutsemau @ 2025-09-22  9:54 UTC (permalink / raw)
  To: Upadhyay, Neeraj
  Cc: Edgecombe, Rick P, thomas.lendacky@amd.com, john.allen@amd.com,
	Gao, Chao, seanjc@google.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, minipli@grsecurity.net,
	mlevitsk@redhat.com, kvm@vger.kernel.org, pbonzini@redhat.com,
	naveen.rao

On Mon, Sep 22, 2025 at 03:03:59PM +0530, Upadhyay, Neeraj wrote:
> 
> > 
> > In the TDX case, the vAPIC state is protected from the VMM. It covers the
> > ISR, so the guest can safely check the ISR to tell whether the exception
> > is external or internal.
> > 
> > IIUC, the vAPIC state is controlled by the VMM in the SEV case, so the
> > ISR is not reliable.
> > 
> > I am not sure if Secure AVIC[1] changes the situation for AMD.
> > 
> > Neeraj?
> > 
> 
> For Secure AVIC-enabled guests, the guest's vAPIC ISR state is not
> visible to (and not controlled by) the host or the VMM.

In this case, I think you should make ia32_disable() in sme_early_init()
conditional on !Secure AVIC.
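
If that's the direction, the change could look roughly like this (a sketch;
CC_ATTR_SNP_SECURE_AVIC is a hypothetical attribute name, since the Secure
AVIC series is still in flight, and the exact guard in sme_early_init()
may differ):

	/*
	 * Sketch: keep 32-bit emulation disabled on SEV guests unless the
	 * guest owns its vAPIC ISR state via Secure AVIC.
	 */
	if (!cc_platform_has(CC_ATTR_SNP_SECURE_AVIC))
		ia32_disable();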

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

