kvm.vger.kernel.org archive mirror
* [PATCH v11 00/23] Enable CET Virtualization
@ 2025-07-04  8:49 Chao Gao
  2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
                   ` (24 more replies)
  0 siblings, 25 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner

The FPU support for CET virtualization has already been merged into the tip
tree. This v11 adds Intel CET virtualization in KVM and is based on
tip/master plus Sean's MSR cleanups. For your convenience, it is also
available at

  https://github.com/gaochaointel/linux-dev cet-v11

Changes in v11 (Most changes are suggested by Sean. Thanks!):
1. Rebased onto the latest tip tree + Sean's MSR cleanups
2. Made patch 1's shortlog informative and accurate
3. Slotted in two cleanup patches from Sean (patch 3/4)
4. Used the KVM_GET/SET_ONE_REG ioctls for userspace to read/write SSP.
   A KVM-defined MSR index is still assigned for SSP, but the index is
   no longer part of the uAPI.
5. Used KVM_MSR_RET_UNSUPPORTED to reject accesses to unsupported CET MSRs
6. Synthesized a triple fault when reading/writing SSP fails during
   entry to or exit from SMM
7. Removed an inappropriate "quirk" in v10 that advertised IBT to userspace
   when the hardware supports it but the host does not enable it.
8. Disabled IBT/SHSTK explicitly for SVM to avoid them being enabled
   accidentally on AMD CPUs before the AMD CET series lands, because
   IBT/SHSTK are advertised in KVM x86 common code while only Intel
   support is added by this series.
9. Re-ordered "Don't emulate branch instructions" (patch 18) before
   advertising CET support to userspace.
10. Added consistency checks for CR4.CET and other CET MSRs during VM-entry
    (patches 22-23)

Control-flow Enforcement Technology (CET) is a CPU feature that defends
against Return/Call/Jump-Oriented Programming (ROP/COP/JOP) style
control-flow subversion attacks. It provides two sub-features: Shadow
Stack (SHSTK) and Indirect Branch Tracking (IBT).

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces a new instruction (ENDBRANCH) to mark valid target
  addresses of indirect branches (CALL, JMP, etc.). If an indirect branch
  is executed and the next instruction is _not_ an ENDBRANCH, the
  processor generates a #CP. This instruction behaves as a NOP on
  platforms that don't support CET.

CET states management:
======================
KVM cooperates with the host kernel's FPU framework to manage guest CET
registers. With the CET supervisor mode state support in this series, KVM
can save/restore the full set of guest CET xsave-managed states.

CET user mode and supervisor mode xstates, i.e., MSR_IA32_{U_CET,PL3_SSP}
and MSR_IA32_PL{0,1,2}_SSP, depend on the host FPU framework to swap guest
and host xstates. On VM-Exit, guest CET xstates are saved to the guest fpu
area and host CET xstates are loaded from task/thread context before the
vCPU returns to userspace, and vice versa on VM-Entry. See
kvm_{load,put}_guest_fpu() for details. Guest CET xstate management
therefore depends on the CET xstate bits (U_CET/S_CET) being set in the
host XSS MSR.

CET supervisor mode states are grouped into two categories: XSAVE-managed
and non-XSAVE-managed. The former includes MSR_IA32_PL{0,1,2}_SSP and is
controlled by the CET supervisor mode bit (S_CET) in XSS; the latter
consists of MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL.

VMX introduces new VMCS fields, {GUEST,HOST}_{S_CET,SSP,INTR_SSP_TABLE},
to manage the guest/host non-XSAVE-managed states. When the VMX CET
entry/exit load bits are set, guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP}
are loaded from the corresponding fields on VM-Entry/VM-Exit respectively.
With these new fields, such supervisor states require no additional KVM
save/reload actions.

Tests:
======================
This series passed the basic CET user shadow stack test and the kernel IBT
test in both L1 and L2 guests.
The patch series _does_ impact existing VMX test cases in KVM-unit-tests;
the failures have been fixed here[1].
A new selftest[2] is introduced to test accessibility of the CET MSRs.

Note, this series hasn't been tested on AMD platform yet.

To run the user SHSTK test and kernel IBT test in a guest, a CET-capable
platform is required, e.g., a Sapphire Rapids server. Follow the steps
below to build the binaries:

1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.

2. Guest kernel: Pull a mainline kernel (>= v6.6), opt in to the
CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK options, and build
with a CET-enabled GCC (>= 8.5.0).
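In terms of the guest kernel configuration, step 2 amounts to the following .config fragment:

```
CONFIG_X86_KERNEL_IBT=y
CONFIG_X86_USER_SHADOW_STACK=y
```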

3. Apply the CET QEMU patches[3] before building mainline QEMU.

Check kernel selftest test_shadow_stack_64 output:
[INFO]  new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO]  changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO]  ssp is now 7f8c82101000
[OK]    Shadow stack pivot
[OK]    Shadow stack faults
[INFO]  Corrupting shadow stack
[INFO]  Generated shadow stack violation successfully
[OK]    Shadow stack violation test
[INFO]  Gup read -> shstk access success
[INFO]  Gup write -> shstk access success
[INFO]  Violation from normal write
[INFO]  Gup read -> write access success
[INFO]  Violation from normal write
[INFO]  Gup write -> write access success
[INFO]  Cow gup write -> write access success
[OK]    Shadow gup test
[INFO]  Violation from shstk access
[OK]    mprotect() test
[SKIP]  Userfaultfd unavailable.
[OK]    32 bit test

Chao Gao (3):
  KVM: x86: Zero XSTATE components on INIT by iterating over supported
    features
  KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
  KVM: nVMX: Add consistency checks for CET states

Sean Christopherson (3):
  KVM: x86: Manually clear MPX state only on INIT
  KVM: x86: Report XSS as to-be-saved if there are supported features
  KVM: x86: Load guest FPU state when access XSAVE-managed MSRs

Yang Weijiang (17):
  KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest
    accesses
  KVM: x86: Add kvm_msr_{read,write}() helpers
  KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
  KVM: x86: Initialize kvm_caps.supported_xss
  KVM: x86: Add fault checks for guest CR4.CET setting
  KVM: x86: Report KVM supported CET MSRs as to-be-saved
  KVM: VMX: Introduce CET VMCS fields and control bits
  KVM: x86: Enable guest SSP read/write interface with new uAPIs
  KVM: VMX: Emulate read and write to CET MSRs
  KVM: x86: Save and reload SSP to/from SMRAM
  KVM: VMX: Set up interception for CET MSRs
  KVM: VMX: Set host constant supervisor states to VMCS fields
  KVM: x86: Don't emulate instructions guarded by CET
  KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
  KVM: nVMX: Enable CET support for nested guest

 arch/x86/include/asm/kvm_host.h |  16 +-
 arch/x86/include/asm/vmx.h      |   9 +
 arch/x86/include/uapi/asm/kvm.h |  13 ++
 arch/x86/kvm/cpuid.c            |  19 +-
 arch/x86/kvm/emulate.c          |  46 +++--
 arch/x86/kvm/smm.c              |  12 +-
 arch/x86/kvm/smm.h              |   2 +-
 arch/x86/kvm/svm/svm.c          |   4 +
 arch/x86/kvm/vmx/capabilities.h |   9 +
 arch/x86/kvm/vmx/nested.c       | 174 +++++++++++++++--
 arch/x86/kvm/vmx/nested.h       |   5 +
 arch/x86/kvm/vmx/vmcs12.c       |   6 +
 arch/x86/kvm/vmx/vmcs12.h       |  14 +-
 arch/x86/kvm/vmx/vmx.c          |  85 ++++++++-
 arch/x86/kvm/vmx/vmx.h          |   9 +-
 arch/x86/kvm/x86.c              | 326 ++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.h              |  61 ++++++
 17 files changed, 725 insertions(+), 85 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-24 11:37   ` Huang, Kai
  2025-07-28 22:31   ` Xin Li
  2025-07-04  8:49 ` [PATCH v11 02/23] KVM: x86: Add kvm_msr_{read,write}() helpers Chao Gao
                   ` (23 subsequent siblings)
  24 siblings, 2 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Rename kvm_{g,s}et_msr()* to kvm_emulate_msr_{read,write}()* to make it
more obvious that KVM uses these helpers to emulate guest behaviors,
i.e., host_initiated == false in these helpers.

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  8 ++++----
 arch/x86/kvm/smm.c              |  4 ++--
 arch/x86/kvm/vmx/nested.c       | 13 +++++++------
 arch/x86/kvm/x86.c              | 28 +++++++++++++++-------------
 4 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 142a8421400f..1f3f8601747f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2150,11 +2150,11 @@ void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
 
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_emulate_msr_write_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 9864c057187d..51d0646622ef 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -529,7 +529,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 
 	vcpu->arch.smbase =         smstate->smbase;
 
-	if (kvm_set_msr(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
+	if (kvm_emulate_msr_write(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
 		return X86EMUL_UNHANDLEABLE;
 
 	rsm_load_seg_64(vcpu, &smstate->tr, VCPU_SREG_TR);
@@ -620,7 +620,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 
 		/* And finally go back to 32-bit mode.  */
 		efer = 0;
-		kvm_set_msr(vcpu, MSR_EFER, efer);
+		kvm_emulate_msr_write(vcpu, MSR_EFER, efer);
 	}
 #endif
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c69df3aba8d1..e7374834453c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -991,7 +991,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
 				__func__, i, e.index, e.reserved);
 			goto fail;
 		}
-		if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
+		if (kvm_emulate_msr_write_with_filter(vcpu, e.index, e.value)) {
 			pr_debug_ratelimited(
 				"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
 				__func__, i, e.index, e.value);
@@ -1027,7 +1027,7 @@ static bool nested_vmx_get_vmexit_msr_value(struct kvm_vcpu *vcpu,
 		}
 	}
 
-	if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
+	if (kvm_emulate_msr_read_with_filter(vcpu, msr_index, data)) {
 		pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
 			msr_index);
 		return false;
@@ -2764,7 +2764,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
-	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+	    WARN_ON_ONCE(kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
 				     vmcs12->guest_ia32_perf_global_ctrl))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
@@ -4752,8 +4752,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	}
 	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)))
-		WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
-					 vmcs12->host_ia32_perf_global_ctrl));
+		WARN_ON_ONCE(kvm_emulate_msr_write(vcpu,
+					MSR_CORE_PERF_GLOBAL_CTRL,
+					vmcs12->host_ia32_perf_global_ctrl));
 
 	/* Set L1 segment info according to Intel SDM
 	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
@@ -4931,7 +4932,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 				goto vmabort;
 			}
 
-			if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) {
+			if (kvm_emulate_msr_write_with_filter(vcpu, h.index, h.value)) {
 				pr_debug_ratelimited(
 					"%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
 					__func__, j, h.index, h.value);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7543dac7ae70..11d84075cd14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1929,33 +1929,35 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 				 __kvm_get_msr);
 }
 
-int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index,
+				     u64 *data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
 		return KVM_MSR_RET_FILTERED;
 	return kvm_get_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);
+EXPORT_SYMBOL_GPL(kvm_emulate_msr_read_with_filter);
 
-int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
+int kvm_emulate_msr_write_with_filter(struct kvm_vcpu *vcpu, u32 index,
+				      u64 data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
 		return KVM_MSR_RET_FILTERED;
 	return kvm_set_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter);
+EXPORT_SYMBOL_GPL(kvm_emulate_msr_write_with_filter);
 
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
 	return kvm_get_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr);
+EXPORT_SYMBOL_GPL(kvm_emulate_msr_read);
 
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
 	return kvm_set_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_set_msr);
+EXPORT_SYMBOL_GPL(kvm_emulate_msr_write);
 
 static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
 {
@@ -2027,7 +2029,7 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 	u64 data;
 	int r;
 
-	r = kvm_get_msr_with_filter(vcpu, ecx, &data);
+	r = kvm_emulate_msr_read_with_filter(vcpu, ecx, &data);
 
 	if (!r) {
 		trace_kvm_msr_read(ecx, data);
@@ -2052,7 +2054,7 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	u64 data = kvm_read_edx_eax(vcpu);
 	int r;
 
-	r = kvm_set_msr_with_filter(vcpu, ecx, data);
+	r = kvm_emulate_msr_write_with_filter(vcpu, ecx, data);
 
 	if (!r) {
 		trace_kvm_msr_write(ecx, data);
@@ -8484,7 +8486,7 @@ static int emulator_get_msr_with_filter(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r;
 
-	r = kvm_get_msr_with_filter(vcpu, msr_index, pdata);
+	r = kvm_emulate_msr_read_with_filter(vcpu, msr_index, pdata);
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
 
@@ -8507,7 +8509,7 @@ static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r;
 
-	r = kvm_set_msr_with_filter(vcpu, msr_index, data);
+	r = kvm_emulate_msr_write_with_filter(vcpu, msr_index, data);
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
 
@@ -8527,7 +8529,7 @@ static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
 static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
 			    u32 msr_index, u64 *pdata)
 {
-	return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
+	return kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
 }
 
 static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc)
-- 
2.47.1



* [PATCH v11 02/23] KVM: x86: Add kvm_msr_{read,write}() helpers
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
  2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 03/23] KVM: x86: Manually clear MPX state only on INIT Chao Gao
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Wrap __kvm_{get,set}_msr() into two new helpers for KVM usage and use the
helpers to replace existing usage of the raw functions.
kvm_msr_{read,write}() are KVM-internal helpers, used when KVM needs to
get/set an MSR value while emulating CPU behavior, i.e., host_initiated ==
%true in the helpers.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/x86.c              | 16 +++++++++++++---
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f3f8601747f..e07a03dbce6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2152,9 +2152,10 @@ void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
 int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
 int kvm_emulate_msr_write_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
 int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
 int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b2d006756e02..9db246671885 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1992,7 +1992,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 		if (function == 7 && index == 0) {
 			u64 data;
 			if ((*ebx & (feature_bit(RTM) | feature_bit(HLE))) &&
-			    !__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+			    !kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) &&
 			    (data & TSX_CTRL_CPUID_CLEAR))
 				*ebx &= ~(feature_bit(RTM) | feature_bit(HLE));
 		} else if (function == 0x80000007) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11d84075cd14..51b37492142c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1895,8 +1895,8 @@ static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
  */
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
-		  bool host_initiated)
+static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+			 bool host_initiated)
 {
 	struct msr_data msr;
 	int ret;
@@ -1922,6 +1922,16 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 	return ret;
 }
 
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+	return __kvm_set_msr(vcpu, index, data, true);
+}
+
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+	return __kvm_get_msr(vcpu, index, data, true);
+}
+
 static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 				     u32 index, u64 *data, bool host_initiated)
 {
@@ -12591,7 +12601,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 						  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
 
 		__kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
-		__kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
+		kvm_msr_write(vcpu, MSR_IA32_XSS, 0);
 	}
 
 	/* All GPRs except RDX (handled below) are zeroed on RESET/INIT. */
-- 
2.47.1



* [PATCH v11 03/23] KVM: x86: Manually clear MPX state only on INIT
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
  2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
  2025-07-04  8:49 ` [PATCH v11 02/23] KVM: x86: Add kvm_msr_{read,write}() helpers Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 04/23] KVM: x86: Zero XSTATE components on INIT by iterating over supported features Chao Gao
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Sean Christopherson <seanjc@google.com>

Don't manually clear/zero MPX state on RESET, as the guest FPU state is
zero allocated and KVM only does RESET during vCPU creation, i.e. the
relevant state is guaranteed to be all zeroes.

Opportunistically move the relevant code into a helper in anticipation of
adding support for CET shadow stacks, which also has state that is zeroed
on INIT.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 46 ++++++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 51b37492142c..c956b36314fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12517,6 +12517,35 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvfree(vcpu->arch.cpuid_entries);
 }
 
+static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+	struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
+
+	/*
+	 * Guest FPU state is zero allocated and so doesn't need to be manually
+	 * cleared on RESET, i.e. during vCPU creation.
+	 */
+	if (!init_event || !fpstate)
+		return;
+
+	/*
+	 * On INIT, only select XSTATE components are zeroed, most components
+	 * are unchanged.  Currently, the only components that are zeroed and
+	 * supported by KVM are MPX related.
+	 */
+	if (!kvm_mpx_supported())
+		return;
+
+	/*
+	 * All paths that lead to INIT are required to load the guest's FPU
+	 * state (because most paths are buried in KVM_RUN).
+	 */
+	kvm_put_guest_fpu(vcpu);
+	fpstate_clear_xstate_component(fpstate, XFEATURE_BNDREGS);
+	fpstate_clear_xstate_component(fpstate, XFEATURE_BNDCSR);
+	kvm_load_guest_fpu(vcpu);
+}
+
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_cpuid_entry2 *cpuid_0x1;
@@ -12574,22 +12603,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_async_pf_hash_reset(vcpu);
 	vcpu->arch.apf.halted = false;
 
-	if (vcpu->arch.guest_fpu.fpstate && kvm_mpx_supported()) {
-		struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
-
-		/*
-		 * All paths that lead to INIT are required to load the guest's
-		 * FPU state (because most paths are buried in KVM_RUN).
-		 */
-		if (init_event)
-			kvm_put_guest_fpu(vcpu);
-
-		fpstate_clear_xstate_component(fpstate, XFEATURE_BNDREGS);
-		fpstate_clear_xstate_component(fpstate, XFEATURE_BNDCSR);
-
-		if (init_event)
-			kvm_load_guest_fpu(vcpu);
-	}
+	kvm_xstate_reset(vcpu, init_event);
 
 	if (!init_event) {
 		vcpu->arch.smbase = 0x30000;
-- 
2.47.1



* [PATCH v11 04/23] KVM: x86: Zero XSTATE components on INIT by iterating over supported features
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (2 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 03/23] KVM: x86: Manually clear MPX state only on INIT Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 05/23] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Tweak the code a bit to facilitate resetting more xstate components in
the future, e.g., CET's xstate-managed MSRs.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c956b36314fb..b9d1c84f0794 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12520,6 +12520,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
+	u64 xfeatures_mask;
+	int i;
 
 	/*
 	 * Guest FPU state is zero allocated and so doesn't need to be manually
@@ -12533,16 +12535,20 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
 	 * are unchanged.  Currently, the only components that are zeroed and
 	 * supported by KVM are MPX related.
 	 */
-	if (!kvm_mpx_supported())
+	xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
+			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+	if (!xfeatures_mask)
 		return;
 
+	BUILD_BUG_ON(sizeof(xfeatures_mask) * BITS_PER_BYTE <= XFEATURE_MAX);
+
 	/*
 	 * All paths that lead to INIT are required to load the guest's FPU
 	 * state (because most paths are buried in KVM_RUN).
 	 */
 	kvm_put_guest_fpu(vcpu);
-	fpstate_clear_xstate_component(fpstate, XFEATURE_BNDREGS);
-	fpstate_clear_xstate_component(fpstate, XFEATURE_BNDCSR);
+	for_each_set_bit(i, (unsigned long *)&xfeatures_mask, XFEATURE_MAX)
+		fpstate_clear_xstate_component(fpstate, i);
 	kvm_load_guest_fpu(vcpu);
 }
 
-- 
2.47.1



* [PATCH v11 05/23] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (3 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 04/23] KVM: x86: Zero XSTATE components on INIT by iterating over supported features Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 06/23] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Enable the KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access hardware
MSRs or KVM synthetic MSRs through them.

In the previous CET KVM series [1], KVM "stole" an MSR from the PV MSR
space and accessed it via the KVM_{G,S}ET_MSRS uAPIs, but that approach
pollutes the PV MSR space and hides the difference between synthetic MSRs
and normal hardware-defined MSRs.

Now carve out a separate space in the KVM-customized MSR address space for
synthetic MSRs. The synthetic MSRs are not exposed to userspace via
KVM_GET_MSR_INDEX_LIST; instead, userspace complies with KVM's setup and
composes the uAPI params. KVM synthetic MSR indices start from 0 and
increase linearly. Userspace callers should tag the MSR type correctly in
order to access the intended hardware or synthetic MSR.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/ [1]
---
 arch/x86/include/uapi/asm/kvm.h | 10 +++++
 arch/x86/kvm/x86.c              | 66 +++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..e72d9e6c1739 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -411,6 +411,16 @@ struct kvm_xcrs {
 	__u64 padding[16];
 };
 
+#define KVM_X86_REG_MSR			(1 << 2)
+#define KVM_X86_REG_SYNTHETIC		(1 << 3)
+
+struct kvm_x86_reg_id {
+	__u32 index;
+	__u8 type;
+	__u8 rsvd;
+	__u16 rsvd16;
+};
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9d1c84f0794..e5c2bf4a90e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2237,6 +2237,31 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
 
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+	u64 val;
+	int r;
+
+	r = do_get_msr(vcpu, msr, &val);
+	if (r)
+		return r;
+
+	if (put_user(val, value))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+	u64 val;
+
+	if (get_user(val, value))
+		return -EFAULT;
+
+	return do_set_msr(vcpu, msr, &val);
+}
+
 #ifdef CONFIG_X86_64
 struct pvclock_clock {
 	int vclock_mode;
@@ -5896,6 +5921,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	}
 }
 
+static int kvm_translate_synthetic_msr(struct kvm_x86_reg_id *reg)
+{
+	return -EINVAL;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -6012,6 +6042,42 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}
+	case KVM_GET_ONE_REG:
+	case KVM_SET_ONE_REG: {
+		struct kvm_x86_reg_id *id;
+		struct kvm_one_reg reg;
+		u64 __user *value;
+
+		r = -EFAULT;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			break;
+
+		r = -EINVAL;
+		id = (struct kvm_x86_reg_id *)&reg.id;
+		if (id->rsvd || id->rsvd16)
+			break;
+
+		if (id->type != KVM_X86_REG_MSR &&
+		    id->type != KVM_X86_REG_SYNTHETIC)
+			break;
+
+		if (id->type == KVM_X86_REG_SYNTHETIC) {
+			r = kvm_translate_synthetic_msr(id);
+			if (r)
+				break;
+		}
+
+		r = -EINVAL;
+		if (id->type != KVM_X86_REG_MSR)
+			break;
+
+		value = u64_to_user_ptr(reg.addr);
+		if (ioctl == KVM_GET_ONE_REG)
+			r = kvm_get_one_msr(vcpu, id->index, value);
+		else
+			r = kvm_set_one_msr(vcpu, id->index, value);
+		break;
+	}
 	case KVM_TPR_ACCESS_REPORTING: {
 		struct kvm_tpr_access_ctl tac;
 
-- 
2.47.1
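The KVM_{GET,SET}_ONE_REG handling above reinterprets the generic kvm_one_reg.id as a struct kvm_x86_reg_id. As a minimal userspace-side sketch (struct layout and type constants copied from the uAPI hunk; the helper names are illustrative, not part of KVM), this is how an ID can be packed and validated the same way the ioctl does:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_X86_REG_MSR       (1 << 2)
#define KVM_X86_REG_SYNTHETIC (1 << 3)

struct kvm_x86_reg_id {
	uint32_t index;
	uint8_t  type;
	uint8_t  rsvd;
	uint16_t rsvd16;
};

/* Pack a register ID the way kvm_arch_vcpu_ioctl() will reinterpret it. */
static uint64_t kvm_x86_reg_encode(uint8_t type, uint32_t index)
{
	struct kvm_x86_reg_id id = { .index = index, .type = type };
	uint64_t raw;

	memcpy(&raw, &id, sizeof(raw));
	return raw;
}

/* Mirror of the ioctl's sanity checks: reserved bits clear, known type. */
static int kvm_x86_reg_id_valid(uint64_t raw)
{
	struct kvm_x86_reg_id id;

	memcpy(&id, &raw, sizeof(id));
	if (id.rsvd || id.rsvd16)
		return 0;
	return id.type == KVM_X86_REG_MSR || id.type == KVM_X86_REG_SYNTHETIC;
}
```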


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 06/23] KVM: x86: Report XSS as to-be-saved if there are supported features
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (4 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 05/23] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 07/23] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Sean Christopherson <seanjc@google.com>

Add MSR_IA32_XSS to the list of MSRs reported to userspace if supported_xss
is non-zero, i.e. KVM supports at least one XSS-based feature.

Before the CET virtualization series lands, the guest's MSR_IA32_XSS is
guaranteed to be 0, i.e., XSAVES/XRSTORS is executed in non-root mode
with XSS == 0, which is equivalent to XSAVE/XRSTOR.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5c2bf4a90e6..a018f327d5a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -332,7 +332,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
 	MSR_IA32_UMWAIT_CONTROL,
 
-	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7573,6 +7573,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.47.1



* [PATCH v11 07/23] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (5 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 06/23] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 08/23] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Zhang Yi Z, Chao Gao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Update CPUID.(EAX=0DH,ECX=1).EBX to reflect the current required xstate
size whenever the guest's XSS MSR is modified.
CPUID.(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
xstate features in (XCR0 | IA32_XSS). The guest can consult this CPUID
value to allocate a sufficiently large xsave buffer.

Note, KVM does not yet support any XSS based features, i.e. supported_xss
is guaranteed to be zero at this time.

Opportunistically return KVM_MSR_RET_UNSUPPORTED if guest CPUID doesn't
enumerate XSAVES. Since KVM_MSR_RET_UNSUPPORTED takes care of
host_initiated cases, drop the host_initiated check.
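For reference, the computation mirrored by cpuid_get_supported_xss()
combines CPUID.(EAX=0DH,ECX=1) ECX (bits 31:0) and EDX (bits 63:32) into a
mask and intersects it with KVM's capabilities. A small standalone sketch
(function name and sample values are illustrative, not KVM code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of cpuid_get_supported_xss(): CPUID.(EAX=0DH,ECX=1) reports the
 * supported IA32_XSS bits in ECX (low 32 bits) and EDX (high 32 bits);
 * KVM additionally masks with the XSS features it supports.
 */
static uint64_t supported_xss_from_cpuid(uint32_t ecx, uint32_t edx,
					 uint64_t kvm_supported_xss)
{
	return (ecx | ((uint64_t)edx << 32)) & kvm_supported_xss;
}
```

With CET, bits 11 (CET_U) and 12 (CET_S) would appear in ECX, i.e. a raw value of 0x1800.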

Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            | 15 ++++++++++++++-
 arch/x86/kvm/x86.c              |  9 +++++----
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e07a03dbce6a..30d9d434c048 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,7 +804,6 @@ struct kvm_vcpu_arch {
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
-	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
 	u64 perf_capabilities;
@@ -865,6 +864,8 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
+	u64 ia32_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9db246671885..9b45607f9b37 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
 						       struct kvm_cpuid_entry2 *entry,
 						       unsigned int x86_feature,
@@ -305,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+						 vcpu->arch.ia32_xss, true);
 }
 
 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
@@ -424,6 +436,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	}
 
 	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+	vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
 
 	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a018f327d5a1..dd984c6acae5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3992,16 +3992,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_XSS:
-		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-			return 1;
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+			return KVM_MSR_RET_UNSUPPORTED;
 		/*
 		 * KVM supports exposing PT to the guest, but does not support
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
 		 * XSAVES/XRSTORS to save/restore PT MSRs.
 		 */
-		if (data & ~kvm_caps.supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
+		if (vcpu->arch.ia32_xss == data)
+			break;
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;
-- 
2.47.1



* [PATCH v11 08/23] KVM: x86: Initialize kvm_caps.supported_xss
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (6 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 07/23] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 09/23] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Set the initial kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS)
if XSAVES is supported. host_xss contains the host-supported xstate
feature bits for thread FPU context switching, and KVM_SUPPORTED_XSS
includes all KVM-enabled XSS feature bits. The resulting value represents
the supervisor xstates that are available to the guest and are backed by
the host FPU framework for swapping {guest,host} XSAVE-managed
registers/MSRs.
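The initialization logic can be sketched standalone (KVM_SUPPORTED_XSS is 0 at this point in the series and is extended by later patches; the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* 0 in this patch; CET bits are added to the mask by later patches. */
#define KVM_SUPPORTED_XSS 0ULL

/*
 * Mirror of the kvm_x86_vendor_init() hunk: advertise only the
 * intersection of host-enumerated and KVM-enabled supervisor xstate bits.
 */
static uint64_t kvm_init_supported_xss(int host_has_xsaves, uint64_t host_xss)
{
	return host_has_xsaves ? (host_xss & KVM_SUPPORTED_XSS) : 0;
}
```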

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dd984c6acae5..fcbb4566b4c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -220,6 +220,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS     0
+
 bool __read_mostly allow_smaller_maxphyaddr = 0;
 EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
 
@@ -9892,14 +9894,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
+		kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
+	}
+
 	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
 	kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
 
 	rdmsrq_safe(MSR_EFER, &kvm_host.efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
-
 	kvm_init_pmu_capability(ops->pmu_ops);
 
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
-- 
2.47.1



* [PATCH v11 09/23] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (7 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 08/23] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 10/23] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Sean Christopherson <seanjc@google.com>

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
to facilitate access to such MSRs.

If MSRs supported in kvm_caps.supported_xss are passed through to the
guest, the guest's MSRs are swapped with the host's before the vCPU exits
to userspace and after it re-enters the kernel before the next VM-entry.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

The two helpers are placed in x86.h to make it clear that accessing
XSAVE-managed MSRs requires special checks and handling to guarantee the
correctness of reads and writes.
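As a standalone illustration of the classification that is_xstate_managed_msr() performs (architectural MSR indices per the Intel SDM; MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB are deliberately absent because they are managed via VMCS fields rather than XSTATE):

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CET MSR indices (Intel SDM). */
#define MSR_IA32_U_CET   0x000006a0
#define MSR_IA32_PL0_SSP 0x000006a4
#define MSR_IA32_PL3_SSP 0x000006a7

/*
 * Mirror of is_xstate_managed_msr(): these MSRs live in the guest's XSAVE
 * image, so userspace accesses require the guest FPU state to be loaded.
 */
static int is_xstate_managed_msr(uint32_t index)
{
	return index == MSR_IA32_U_CET ||
	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
}
```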

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.h | 24 ++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fcbb4566b4c6..15169867bc14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4547,6 +4550,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+/*
+ *  Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ *  switched with the rest of guest FPU state.
+ */
+static bool is_xstate_managed_msr(u32 index)
+{
+	switch (index) {
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -4557,11 +4575,26 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		/*
+		 * If userspace is accessing one or more XSTATE-managed MSRs,
+		 * temporarily load the guest's FPU state so that the guest's
+		 * MSR value(s) is resident in hardware, i.e. so that KVM can
+		 * get/set the MSR via RDMSR/WRMSR.
+		 */
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_managed_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 832f0faf4779..2914e99059c9 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -667,4 +667,28 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+/*
+ * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
+ * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
+ * guest FPU should have been loaded already.
+ */
+
+static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	rdmsrl(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	wrmsrl(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
 #endif
-- 
2.47.1



* [PATCH v11 10/23] KVM: x86: Add fault checks for guest CR4.CET setting
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (8 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 09/23] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 11/23] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Check for potential faults when the guest sets CR4.CET, per Intel SDM
requirements. CET can be enabled if and only if CR0.WP == 1, i.e. setting
CR4.CET == 1 faults if CR0.WP == 0, and clearing CR0.WP fails if
CR4.CET == 1.
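The dependency reduces to a single predicate over the two control registers. A minimal sketch (helper name is illustrative; the bit positions are the architectural CR0.WP and CR4.CET bits):

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_WP  (1ULL << 16)
#define X86_CR4_CET (1ULL << 23)

/* CR4.CET == 1 requires CR0.WP == 1; any state violating this must fault. */
static int cet_cr_state_valid(uint64_t cr0, uint64_t cr4)
{
	return !(cr4 & X86_CR4_CET) || (cr0 & X86_CR0_WP);
}
```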

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 15169867bc14..260368ba3134 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1169,6 +1169,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	kvm_x86_call(set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1368,6 +1371,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	kvm_x86_call(set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
-- 
2.47.1



* [PATCH v11 11/23] KVM: x86: Report KVM supported CET MSRs as to-be-saved
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (9 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 10/23] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Add CET MSRs to the list of MSRs reported to userspace if the feature
associated with the MSRs, i.e. IBT or SHSTK, is supported by KVM.

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 260368ba3134..f6cf371ee16a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -338,6 +338,10 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7619,6 +7623,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!kvm_caps.supported_xss)
 			return;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+			return;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+			return;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.47.1



* [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (10 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 11/23] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-28 22:53   ` Xin Li
  2025-07-04  8:49 ` [PATCH v11 13/23] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Zhang Yi Z, Chao Gao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Control-flow Enforcement Technology (CET) is a CPU feature used to
prevent Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks.
It provides two sub-features (SHSTK and IBT) to defend against these
styles of control-flow subversion.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stacks. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces a new instruction (ENDBRANCH) to mark valid target
  addresses of indirect branches (CALL, JMP, etc.). If an indirect branch
  is executed and the next instruction is _not_ an ENDBRANCH, the
  processor generates a #CP. The instruction behaves as a NOP on
  platforms without CET.

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

  MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK pointer table, whose
			entries are indexed by the IST field of the
			interrupt gate descriptor.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores current active SSP.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET state is loaded from the
following VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, guest CET state is loaded from the
following VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE
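The new encodings follow the existing VMCS convention: natural-width fields use even encodings spaced by 2, with guest-state fields in the 0x68xx group and host-state fields in the 0x6cxx group. A sketch checking the values from this patch:

```c
#include <assert.h>

/* Natural-width VMCS field encodings added by this patch. */
enum cet_vmcs_fields {
	GUEST_S_CET          = 0x6828,
	GUEST_SSP            = 0x682a,
	GUEST_INTR_SSP_TABLE = 0x682c,
	HOST_S_CET           = 0x6c18,
	HOST_SSP             = 0x6c1a,
	HOST_INTR_SSP_TABLE  = 0x6c1c,
};
```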

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..ce10a7e2d3d9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE                  0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -119,6 +120,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -369,6 +371,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -381,6 +386,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
-- 
2.47.1



* [PATCH v11 13/23] KVM: x86: Enable guest SSP read/write interface with new uAPIs
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (11 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 14/23] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Enable a guest shadow stack pointer (SSP) access interface with new
uAPIs. The guest SSP is a hardware register with a corresponding VMCS
field to save/restore the guest value on VM-{Exit,Entry}. KVM handles
SSP as a synthetic MSR for userspace access.

Use a translation helper to map the SSP synthetic index to the
KVM-internal MSR index, so that userspace doesn't need to know about
KVM's management of synthetic MSRs and conflicts are avoided.
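A sketch of the translation (constants from the patch; MSR_KVM_INTERNAL_GUEST_SSP is KVM-internal and not uAPI, reproduced here only to illustrate what kvm_translate_synthetic_msr() produces):

```c
#include <assert.h>
#include <stdint.h>

#define KVM_X86_REG_MSR            (1 << 2)
#define KVM_SYNTHETIC_GUEST_SSP    0
#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff  /* KVM-internal, not uAPI */

/* Mirror of kvm_translate_synthetic_msr(): synthetic index -> internal MSR. */
static int translate_synthetic_msr(uint32_t index, uint8_t *type,
				   uint32_t *msr_index)
{
	switch (index) {
	case KVM_SYNTHETIC_GUEST_SSP:
		*type = KVM_X86_REG_MSR;
		*msr_index = MSR_KVM_INTERNAL_GUEST_SSP;
		return 0;
	default:
		return -1;  /* -EINVAL in the kernel */
	}
}
```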

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/uapi/asm/kvm.h |  3 +++
 arch/x86/kvm/x86.c              | 10 +++++++++-
 arch/x86/kvm/x86.h              | 10 ++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index e72d9e6c1739..a4870d9c9279 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -421,6 +421,9 @@ struct kvm_x86_reg_id {
 	__u16 rsvd16;
 };
 
+/* KVM synthetic MSR indices, starting from 0 */
+#define KVM_SYNTHETIC_GUEST_SSP 0
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6cf371ee16a..2373968ea7fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5969,7 +5969,15 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 static int kvm_translate_synthetic_msr(struct kvm_x86_reg_id *reg)
 {
-	return -EINVAL;
+	switch (reg->index) {
+	case KVM_SYNTHETIC_GUEST_SSP:
+		reg->type = KVM_X86_REG_MSR;
+		reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
 }
 
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2914e99059c9..17b1485fa2f4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -79,6 +79,16 @@ void kvm_spurious_fault(void);
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
 
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP	0x4b564dff
+
 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)
 {
-- 
2.47.1



* [PATCH v11 14/23] KVM: VMX: Emulate read and write to CET MSRs
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (12 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 13/23] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 15/23] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Add an emulation interface for CET MSR accesses. The emulation code is
split into a common part and a vendor-specific part. The former performs
common checks for the MSRs, e.g., accessibility, data validity, etc.,
then passes the operation to either the XSAVE-managed MSRs via the
helpers or the CET VMCS fields.

SSP can only be read via RDSSP. Writing it requires destructive and
potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
for the GUEST_SSP field of the VMCS.
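The common checks applied to writes of the SSP-family MSRs can be sketched standalone (a 48-bit canonical check is used here for simplicity; real KVM uses is_noncanonical_msr_address(), which honors the vCPU's maximum linear-address width, and the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Canonical for a 48-bit virtual address space: bits 63:47 sign-extend bit 47. */
static int is_canonical_48(uint64_t addr)
{
	return (uint64_t)(((int64_t)(addr << 16)) >> 16) == addr;
}

/*
 * Mirror of the __kvm_set_msr() checks: the value must be a canonical
 * address, and every SSP MSR except MSR_IA32_INT_SSP_TAB must be 4-byte
 * aligned.
 */
static int ssp_msr_write_valid(uint64_t data, int is_int_ssp_tab)
{
	if (!is_canonical_48(data))
		return 0;
	return is_int_ssp_tab || (data & 3) == 0;
}
```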

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v11: use KVM_MSR_RET_UNSUPPORTED to refuse MSR accesses to CET MSRs if
they are not supported according to guest CPUIDs
---
 arch/x86/kvm/vmx/vmx.c | 18 ++++++++++++++++++
 arch/x86/kvm/x86.c     | 43 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     | 23 ++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f81710d7d992..136c77e91474 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2089,6 +2089,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_S_CET:
+		msr_info->data = vmcs_readl(GUEST_S_CET);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		msr_info->data = vmcs_readl(GUEST_SSP);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmx_guest_debugctl_read();
 		break;
@@ -2407,6 +2416,15 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_S_CET:
+		vmcs_writel(GUEST_S_CET, data);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		vmcs_writel(GUEST_SSP, data);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data & PMU_CAP_LBR_FMT) {
 			if ((data & PMU_CAP_LBR_FMT) !=
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2373968ea7fd..9ff7996d7534 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1882,6 +1882,27 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 
 		data = (u32)data;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!is_cet_msr_valid(vcpu, data))
+			return 1;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (is_noncanonical_msr_address(data, vcpu))
+			return 1;
+		/* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+		if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+			return 1;
+		break;
 	}
 
 	msr.data = data;
@@ -1926,6 +1947,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
 	}
 
 	msr.index = index;
@@ -4201,6 +4236,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_set_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4550,6 +4589,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_get_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 17b1485fa2f4..1b5a96329c64 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -701,4 +701,27 @@ static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
 	kvm_fpu_put();
 }
 
+#define CET_US_RESERVED_BITS		GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
+
+static inline bool is_cet_msr_valid(struct kvm_vcpu *vcpu, u64 data)
+{
+	if (data & CET_US_RESERVED_BITS)
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (data & CET_US_SHSTK_MASK_BITS))
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+	    (data & CET_US_IBT_MASK_BITS))
+		return false;
+	if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+		return false;
+	/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+		return false;
+
+	return true;
+}
 #endif
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 15/23] KVM: x86: Save and reload SSP to/from SMRAM
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (13 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 14/23] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 16/23] KVM: VMX: Set up interception for CET MSRs Chao Gao
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Save the CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates the
architectural behavior when the guest enters/leaves SMM, i.e., it saves
registers to SMRAM on SMM entry and reloads them on SMM exit. Per the
SDM, SSP is one such register on 64-bit architectures, so add support
for saving/restoring SSP.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>

---
v11:
 1)Synthesize triple-fault if KVM fails to kvm_msr_{write,read} guest SSP.
 2)Do nothing for SSP for 32-bit guests when entering/exiting SMM.
---
 arch/x86/kvm/smm.c | 8 ++++++++
 arch/x86/kvm/smm.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 51d0646622ef..de1fce62ecd2 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -269,6 +269,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
 
 	smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 }
 #endif
 
@@ -558,6 +562,10 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 	kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
 	ctxt->interruptibility = (u8)smstate->int_shadow;
 
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+		return X86EMUL_UNHANDLEABLE;
+
 	return X86EMUL_CONTINUE;
 }
 #endif
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index 551703fbe200..db3c88f16138 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
 	u32 smbase;
 	u32 reserved4[5];
 
-	/* ssp and svm_* fields below are not implemented by KVM */
 	u64 ssp;
+	/* svm_* fields below are not implemented by KVM */
 	u64 svm_guest_pat;
 	u64 svm_host_efer;
 	u64 svm_host_cr4;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 16/23] KVM: VMX: Set up interception for CET MSRs
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (14 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 15/23] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 17/23] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Enable/disable interception of CET MSRs based on the associated feature
configuration.

The Shadow Stack feature requires all CET MSRs to be passed through to
the guest so that it can be supported in both user and supervisor mode,
while the IBT feature only depends on MSR_IA32_{U,S}_CET to enable user
and supervisor IBT.

Note, this MSR design introduces an architectural limitation on SHSTK
and IBT control for the guest: when SHSTK is exposed, IBT is also
available to the guest from an architectural perspective, since IBT
relies on a subset of the SHSTK-relevant MSRs.
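The interception decisions above can be sketched as two predicates (a
hedged model with hypothetical helper names, where returning true means
"intercept"); note how exposing SHSTK alone is enough to pass through
U_CET/S_CET, which is the architectural limitation described above:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the pass-through decisions: the SSP-family MSRs are passed
 * through only when SHSTK is exposed to the guest, while U_CET/S_CET
 * are passed through when either SHSTK or IBT is exposed.
 */
static bool intercept_ssp_msrs(bool guest_shstk)
{
	return !guest_shstk;
}

static bool intercept_cet_ctrl_msrs(bool guest_shstk, bool guest_ibt)
{
	return !guest_shstk && !guest_ibt;
}
```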

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>

---
v11:
Rebase onto Sean's MSR cleanups.
---
 arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 136c77e91474..ba46c1dcdb9d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4084,6 +4084,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 
 void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
+	bool set;
+
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
 
@@ -4125,6 +4127,24 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		set = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, set);
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+		set = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+		      !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, set);
+	}
+
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
 	 * filtered by userspace.
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 17/23] KVM: VMX: Set host constant supervisor states to VMCS fields
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (15 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 16/23] KVM: VMX: Set up interception for CET MSRs Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 18/23] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Save constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} fields
explicitly. Kernel IBT is supported and the MSR_IA32_S_CET setting is
static post-boot (the exception is the BIOS-call case, but a vCPU
thread never crosses it), so KVM doesn't need to refresh the HOST_S_CET
field before every VM-Entry/VM-Exit sequence.

Host supervisor shadow stack is not currently enabled and SSP is not
accessible to kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled for
CPL3, SSP is reloaded from IA32_PL3_SSP before the CPU exits to
userspace. See SDM Vol. 2A/B Chapters 3 and 4 for SYSCALL/SYSRET/
SYSENTER/SYSEXIT/RDSSP/CALL etc.

Prevent KVM module loading if host supervisor shadow stack is enabled,
i.e., SHSTK_EN is set in MSR_IA32_S_CET, as KVM cannot coexist with it
correctly.
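The load-time rejection described above reduces to a one-bit check on
the host's MSR_IA32_S_CET value (a hedged sketch; the helper name is
hypothetical, not the KVM function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CET_SHSTK_EN (1ULL << 0)

/*
 * Model of the load-time check: refuse to load KVM if the host has
 * supervisor shadow stack enabled in MSR_IA32_S_CET, since KVM does
 * not save/restore the associated state and would clobber it.
 */
static bool kvm_may_load(uint64_t host_s_cet)
{
	return !(host_s_cet & CET_SHSTK_EN);
}
```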

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/vmx/capabilities.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 15 +++++++++++++++
 arch/x86/kvm/x86.c              | 12 ++++++++++++
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index cb6588238f46..0f0e1717dc80 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -104,6 +104,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba46c1dcdb9d..3d6da3836e6b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4300,6 +4300,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 	if (cpu_has_load_ia32_efer())
 		vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
+
+	/*
+	 * Supervisor shadow stack is not enabled on host side, i.e.,
+	 * host IA32_S_CET.SHSTK_EN bit is guaranteed to 0 now, per SDM
+	 * description(RDSSP instruction), SSP is not readable in CPL0,
+	 * so resetting the two registers to 0s at VM-Exit does no harm
+	 * to kernel execution. When execution flow exits to userspace,
+	 * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter
+	 * 3 and 4 for details.
+	 */
+	if (cpu_has_load_cet_ctrl()) {
+		vmcs_writel(HOST_S_CET, kvm_host.s_cet);
+		vmcs_writel(HOST_SSP, 0);
+		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+	}
 }
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9ff7996d7534..b17a8bf84db3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9975,6 +9975,18 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -EIO;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_SHSTK)) {
+		rdmsrl(MSR_IA32_S_CET, kvm_host.s_cet);
+		/*
+		 * Linux doesn't yet support supervisor shadow stacks (SSS), so
+		 * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+		 * clobber the host values.  Yell and refuse to load if SSS is
+		 * unexpectedly enabled, e.g. to avoid crashing the host.
+		 */
+		if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
+			return -EIO;
+	}
+
 	memset(&kvm_caps, 0, sizeof(kvm_caps));
 
 	x86_emulator_cache = kvm_alloc_emulator_cache();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1b5a96329c64..8d2049a1f41b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,7 @@ struct kvm_host_values {
 	u64 efer;
 	u64 xcr0;
 	u64 xss;
+	u64 s_cet;
 	u64 arch_capabilities;
 };
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 18/23] KVM: x86: Don't emulate instructions guarded by CET
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (16 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 17/23] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Don't emulate branch instructions, e.g., CALL/RET/JMP etc., when CET is
active in the guest; return KVM_INTERNAL_ERROR_EMULATION to userspace
to handle it.

KVM doesn't emulate the CPU behaviors that enforce CET protections
while emulating guest instructions; instead, it stops emulation upon
detecting that the instruction being processed is CET-protected. By
doing so, it avoids generating bogus #CP exceptions in the guest and
prevents subversion of CET-protected execution flows from the guest
side.
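The stop-emulation decision can be modeled as a pure function of the
CET control MSRs and the opcode annotations (a hedged user-space
sketch; the flag bit positions mirror the emulator's annotations and
the helper name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CET_SHSTK_EN  (1ULL << 0)
#define CET_ENDBR_EN  (1ULL << 2)

/* Opcode annotations mirroring the emulator's flag bits. */
#define ShadowStack   (1ULL << 57)
#define IndirBrnTrk   (1ULL << 58)

/*
 * Refuse emulation when the instruction is protected by a CET feature
 * that is enabled in either MSR_IA32_U_CET or MSR_IA32_S_CET.
 */
static bool stop_emulation(uint64_t u_cet, uint64_t s_cet, uint64_t flags)
{
	bool stop;

	stop = (((u_cet | s_cet) & CET_SHSTK_EN) && (flags & ShadowStack));
	stop |= (((u_cet | s_cet) & CET_ENDBR_EN) && (flags & IndirBrnTrk));
	return stop;
}
```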

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/emulate.c | 46 ++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1349e278cd2a..80b9d1e4a50a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,8 @@
 #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
 #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57)  /* Instruction protected by Shadow Stack. */
+#define IndirBrnTrk ((u64)1 << 58)  /* Instruction protected by IBT. */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4068,9 +4070,11 @@ static const struct opcode group4[] = {
 static const struct opcode group5[] = {
 	F(DstMem | SrcNone | Lock,		em_inc),
 	F(DstMem | SrcNone | Lock,		em_dec),
-	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
-	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
-	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
+	I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk,
+	em_call_near_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk,
+	em_call_far),
+	I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
 	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
 	I(SrcMem | Stack | TwoMemOp,		em_push), D(Undefined),
 };
@@ -4332,11 +4336,11 @@ static const struct opcode opcode_table[256] = {
 	/* 0xC8 - 0xCF */
 	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
 	I(Stack | IsBranch, em_leave),
-	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
-	I(ImplicitOps | IsBranch, em_ret_far),
-	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+	I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+	I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
 	D(ImplicitOps | No64 | IsBranch),
-	II(ImplicitOps | IsBranch, em_iret, iret),
+	II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
 	/* 0xD0 - 0xD7 */
 	G(Src2One | ByteOp, group2), G(Src2One, group2),
 	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4352,7 +4356,7 @@ static const struct opcode opcode_table[256] = {
 	I2bvIP(SrcImmUByte | DstAcc, em_in,  in,  check_perm_in),
 	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
 	/* 0xE8 - 0xEF */
-	I(SrcImm | NearBranch | IsBranch, em_call),
+	I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
 	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
 	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
 	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4371,7 +4375,8 @@ static const struct opcode opcode_table[256] = {
 static const struct opcode twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	G(0, group6), GD(0, &group7), N, N,
-	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+	N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
+	em_syscall),
 	II(ImplicitOps | Priv, em_clts, clts), N,
 	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
 	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4402,8 +4407,9 @@ static const struct opcode twobyte_table[256] = {
 	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
 	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
 	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
-	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
-	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+	I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
+	em_sysenter),
+	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
 	N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x40 - 0x4F */
@@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 	if (ctxt->d == 0)
 		return EMULATION_FAILED;
 
+	if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+		u64 u_cet, s_cet;
+		bool stop_em;
+
+		if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
+		    ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
+			return EMULATION_FAILED;
+
+		stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
+			  (opcode.flags & ShadowStack);
+
+		stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
+			   (opcode.flags & IndirBrnTrk);
+
+		if (stop_em)
+			return EMULATION_FAILED;
+	}
+
 	ctxt->execute = opcode.u.execute;
 
 	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (17 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 18/23] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-21 15:51   ` Mathias Krause
  2025-08-06 20:58   ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace John Allen
  2025-07-04  8:49 ` [PATCH v11 20/23] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
                   ` (5 subsequent siblings)
  24 siblings, 2 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Expose CET features to the guest if KVM and the host can support them;
clear the CPUID feature bits otherwise.

Set CPUID feature bits so that CET features are available in guest
CPUID. Add CR4.CET bit support to allow the guest to set the CET master
control bit.

Disable KVM CET support if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET.

Set the CET load-bits in the VM_ENTRY/VM_EXIT control fields to keep
guest CET xstates isolated from the host's.

On platforms with VMX_BASIC[bit56] == 0, injecting #CP with an error
code at VM entry fails, while with VMX_BASIC[bit56] == 1, #CP injection
with or without an error code is allowed. Disable the CET feature bits
if the MSR bit is cleared so that a nested VMM can inject #CP if and
only if VMX_BASIC[bit56] == 1.

Don't expose CET features if either of the {U,S}_CET xstate bits is
cleared in host XSS or if XSAVES isn't supported.

CET MSRs are reset to 0 on RESET, power-up, and INIT; clear the guest
CET xsave-area fields so that guest CET MSRs are reset to 0 after those
events.

Meanwhile, explicitly disable SHSTK and IBT for SVM, because CET
enabling in KVM for SVM is not yet ready.
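The "all-or-nothing" XSS gating above can be modeled as follows (a
hedged sketch: the xfeature bit numbers are the Linux CET_USER/
CET_KERNEL component numbers, and the helper name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux xfeature bit numbers for the CET xstate components. */
#define XFEATURE_MASK_CET_USER   (1ULL << 11)
#define XFEATURE_MASK_CET_KERNEL (1ULL << 12)
#define CET_XSS (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)

/*
 * CET is advertised only when XSAVES supports *both* CET xstate
 * components; otherwise the CET xstate bits are cleared together
 * (and the SHSTK/IBT CPUID bits with them).
 */
static uint64_t finalize_cet_xss(uint64_t supported_xss, bool *cet_ok)
{
	*cet_ok = (supported_xss & CET_XSS) == CET_XSS;
	if (!*cet_ok)
		supported_xss &= ~CET_XSS;
	return supported_xss;
}
```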

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>

---
v11:
 1) Remove IBT CPUID reference to raw CPUID info after discussion.
 2) Disable SHSTK/IBT support in SVM explicitly per Sean's comment.
 3) Handle GUEST_S_CET field as a common field for SHSTK and IBT.
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/kvm/cpuid.c            |  2 ++
 arch/x86/kvm/svm/svm.c          |  4 ++++
 arch/x86/kvm/vmx/capabilities.h |  5 +++++
 arch/x86/kvm/vmx/vmx.c          | 30 +++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  6 ++++--
 arch/x86/kvm/x86.c              | 22 +++++++++++++++++++---
 arch/x86/kvm/x86.h              |  3 +++
 9 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 30d9d434c048..2aca91c7ae1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -142,7 +142,7 @@
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
 			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
-			  | X86_CR4_LAM_SUP))
+			  | X86_CR4_LAM_SUP | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ce10a7e2d3d9..c85c50019523 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -134,6 +134,7 @@
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC		BIT_ULL(56)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9b45607f9b37..7007ce792706 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -944,6 +944,7 @@ void kvm_set_cpu_caps(void)
 		VENDOR_F(WAITPKG),
 		F(SGX_LC),
 		F(BUS_LOCK_DETECT),
+		F(SHSTK),
 	);
 
 	/*
@@ -970,6 +971,7 @@ void kvm_set_cpu_caps(void)
 		F(AMX_INT8),
 		F(AMX_BF16),
 		F(FLUSH_L1D),
+		F(IBT),
 	);
 
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 803574920e41..6375695ce285 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5223,6 +5223,10 @@ static __init void svm_set_cpu_caps(void)
 	kvm_caps.supported_perf_cap = 0;
 	kvm_caps.supported_xss = 0;
 
+	/* KVM doesn't yet support CET virtualization for SVM. */
+	kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+	kvm_cpu_cap_clear(X86_FEATURE_IBT);
+
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
 		kvm_cpu_cap_set(X86_FEATURE_SVM);
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 0f0e1717dc80..09c130d2e595 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -77,6 +77,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
 	return	vmcs_config.basic & VMX_BASIC_INOUT;
 }
 
+static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
+{
+	return	vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 static inline bool cpu_has_virtual_nmis(void)
 {
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3d6da3836e6b..d837876e3726 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2598,6 +2598,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		{ VM_ENTRY_LOAD_IA32_EFER,		VM_EXIT_LOAD_IA32_EFER },
 		{ VM_ENTRY_LOAD_BNDCFGS,		VM_EXIT_CLEAR_BNDCFGS },
 		{ VM_ENTRY_LOAD_IA32_RTIT_CTL,		VM_EXIT_CLEAR_IA32_RTIT_CTL },
+		{ VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE },
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -4862,6 +4863,14 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
 
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, 0);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
+	}
+	if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+	    kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, 0);
+
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
 	vpid_sync_context(vmx->vpid);
@@ -6310,6 +6319,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
 
+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6340,6 +6353,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 	if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+		       vmcs_readl(HOST_INTR_SSP_TABLE));
 
 	pr_err("*** Control State ***\n");
 	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
@@ -7917,7 +7934,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
@@ -7929,6 +7945,18 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	/*
+	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
+	 * enforce CET HW behaviors in emulator. On platforms with
+	 * VMX_BASIC[bit56] == 0, inject #CP at VMX entry with error code
+	 * fails, so disable CET in this case too.
+	 */
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
 }
 
 static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 87174d961c85..f93062965a7a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -482,7 +482,8 @@ static inline u8 vmx_get_rvi(void)
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_CET_STATE)
 
 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
@@ -504,7 +505,8 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_LOAD_CET_STATE)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b17a8bf84db3..3b99124f8985 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -223,7 +223,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS     0
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
+				 XFEATURE_MASK_CET_KERNEL)
 
 bool __read_mostly allow_smaller_maxphyaddr = 0;
 EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -10073,6 +10074,20 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+					    XFEATURE_MASK_CET_KERNEL);
+
+	if ((kvm_caps.supported_xss & (XFEATURE_MASK_CET_USER |
+	     XFEATURE_MASK_CET_KERNEL)) !=
+	    (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+					    XFEATURE_MASK_CET_KERNEL);
+	}
+
 	if (kvm_caps.has_tsc_control) {
 		/*
 		 * Make sure the user can only configure tsc_khz values that
@@ -12729,10 +12744,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/*
 	 * On INIT, only select XSTATE components are zeroed, most components
 	 * are unchanged.  Currently, the only components that are zeroed and
-	 * supported by KVM are MPX related.
+	 * supported by KVM are MPX and CET related.
 	 */
 	xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
-			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
+			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL);
 	if (!xfeatures_mask)
 		return;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8d2049a1f41b..e54c779e74a3 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -647,6 +647,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		__reserved_bits |= X86_CR4_PCIDE;       \
 	if (!__cpu_has(__c, X86_FEATURE_LAM))           \
 		__reserved_bits |= X86_CR4_LAM_SUP;     \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&       \
+	    !__cpu_has(__c, X86_FEATURE_IBT))           \
+		__reserved_bits |= X86_CR4_CET;         \
 	__reserved_bits;                                \
 })
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 20/23] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (18 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest Chao Gao
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Per the SDM (Vol. 3D, Appendix A.1):

"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector"

Modify the has_error_code consistency check applied when injecting events
into the nested guest: reject an error code outright if the guest is in
real mode or the event is not a hardware exception; otherwise, enforce
that the error code matches the vector's architectural rule only when the
vCPU doesn't enumerate bit 56 in VMX_BASIC. In all other cases, skip the
check to make the logic consistent with the SDM.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 28 +++++++++++++++++++---------
 arch/x86/kvm/vmx/nested.h |  5 +++++
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e7374834453c..683a9cad04df 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1266,9 +1266,10 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
 	const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
 				 VMX_BASIC_INOUT |
-				 VMX_BASIC_TRUE_CTLS;
+				 VMX_BASIC_TRUE_CTLS |
+				 VMX_BASIC_NO_HW_ERROR_CODE_CC;
 
-	const u64 reserved_bits = GENMASK_ULL(63, 56) |
+	const u64 reserved_bits = GENMASK_ULL(63, 57) |
 				  GENMASK_ULL(47, 45) |
 				  BIT_ULL(31);
 
@@ -2943,7 +2944,6 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
 		u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
 		bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
-		bool should_have_error_code;
 		bool urg = nested_cpu_has2(vmcs12,
 					   SECONDARY_EXEC_UNRESTRICTED_GUEST);
 		bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
@@ -2960,12 +2960,20 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 			return -EINVAL;
 
-		/* VM-entry interruption-info field: deliver error code */
-		should_have_error_code =
-			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
-		if (CC(has_error_code != should_have_error_code))
-			return -EINVAL;
+		/*
+		 * Cannot deliver error code in real mode or if the interrupt
+		 * type is not hardware exception. For other cases, do the
+		 * consistency check only if the vCPU doesn't enumerate
+		 * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+		 */
+		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+			if (CC(has_error_code))
+				return -EINVAL;
+		} else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+			if (CC(has_error_code !=
+			       x86_exception_has_error_code(vector)))
+				return -EINVAL;
+		}
 
 		/* VM-entry exception error code */
 		if (CC(has_error_code &&
@@ -7200,6 +7208,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 	msrs->basic |= VMX_BASIC_TRUE_CTLS;
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+	if (cpu_has_vmx_basic_no_hw_errcode())
+		msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 6eedcfc91070..983484d42ebf 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -309,6 +309,11 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 	       __kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
+{
+	return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
 #define nested_guest_cr4_valid	nested_cr4_valid
 #define nested_host_cr4_valid	nested_cr4_valid
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (19 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 20/23] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-28  6:30   ` Xin Li
  2025-07-04  8:49 ` [PATCH v11 22/23] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

From: Yang Weijiang <weijiang.yang@intel.com>

Set up the CET MSRs, the related VM_ENTRY/EXIT control bits, and the CR4
fixed-bit setting to enable CET for nested VMs.

vmcs12 and vmcs02 need to be synced when L2 exits to L1 and when L1
resumes L2, so that the correct CET state is observed on either side.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>

---
v11:
Handle GUEST_S_CET as a common field for SHSTK and IBT.
---
 arch/x86/kvm/vmx/nested.c | 80 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmcs12.c |  6 +++
 arch/x86/kvm/vmx/vmcs12.h | 14 ++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 +
 arch/x86/kvm/vmx/vmx.h    |  3 ++
 5 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 683a9cad04df..3e6f7b4fc374 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -715,6 +715,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+	/* Pass CET MSRs to nested VM if L0 and L1 are set to pass-through. */
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &map);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -2515,6 +2537,30 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 	}
 }
 
+static inline void cet_vmcs_fields_get(struct kvm_vcpu *vcpu, u64 *ssp,
+				       u64 *s_cet, u64 *ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		*ssp = vmcs_readl(GUEST_SSP);
+		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+	}
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		*s_cet = vmcs_readl(GUEST_S_CET);
+}
+
+static inline void cet_vmcs_fields_set(struct kvm_vcpu *vcpu, u64 ssp,
+				       u64 s_cet, u64 ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, ssp);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+	}
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, s_cet);
+}
+
 static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
 	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
@@ -2631,6 +2677,11 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+		cet_vmcs_fields_set(&vmx->vcpu, vmcs12->guest_ssp,
+				    vmcs12->guest_s_cet,
+				    vmcs12->guest_ssp_tbl);
+
 	set_cr4_guest_host_mask(vmx);
 }
 
@@ -2670,6 +2721,13 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
 		vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
 	}
+
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		cet_vmcs_fields_set(vcpu, vmx->nested.pre_vmenter_ssp,
+				    vmx->nested.pre_vmenter_s_cet,
+				    vmx->nested.pre_vmenter_ssp_tbl);
+
 	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
@@ -3546,6 +3604,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
 
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		cet_vmcs_fields_get(vcpu, &vmx->nested.pre_vmenter_ssp,
+				    &vmx->nested.pre_vmenter_s_cet,
+				    &vmx->nested.pre_vmenter_ssp_tbl);
+
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
@@ -4473,6 +4537,9 @@ static bool is_vmcs12_ext_field(unsigned long field)
 	case GUEST_IDTR_BASE:
 	case GUEST_PENDING_DBG_EXCEPTIONS:
 	case GUEST_BNDCFGS:
+	case GUEST_SSP:
+	case GUEST_S_CET:
+	case GUEST_INTR_SSP_TABLE:
 		return true;
 	default:
 		break;
@@ -4523,6 +4590,10 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+	cet_vmcs_fields_get(&vmx->vcpu, &vmcs12->guest_ssp,
+			    &vmcs12->guest_s_cet,
+			    &vmcs12->guest_ssp_tbl);
+
 	vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false;
 }
 
@@ -4754,6 +4825,10 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
 		vmcs_write64(GUEST_BNDCFGS, 0);
 
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
+		cet_vmcs_fields_set(vcpu, vmcs12->host_ssp, vmcs12->host_s_cet,
+				    vmcs12->host_ssp_tbl);
+
 	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
 		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
 		vcpu->arch.pat = vmcs12->host_ia32_pat;
@@ -7032,7 +7107,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -7054,7 +7129,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 56fd150a6f24..4ad6b16525b9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d837876e3726..46188e1a01a7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7703,6 +7703,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,	      ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,	      edx, feature_bit(IBT));
 
 	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
 	cr4_fixed1_update(X86_CR4_LAM_SUP,    eax, feature_bit(LAM));
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f93062965a7a..c7b037ee1dce 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -181,6 +181,9 @@ struct nested_vmx {
 	 */
 	u64 pre_vmenter_debugctl;
 	u64 pre_vmenter_bndcfgs;
+	u64 pre_vmenter_ssp;
+	u64 pre_vmenter_s_cet;
+	u64 pre_vmenter_ssp_tbl;
 
 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 22/23] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (20 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-04  8:49 ` [PATCH v11 23/23] KVM: nVMX: Add consistency checks for CET states Chao Gao
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Add consistency checks for CR4.CET and CR0.WP in the guest-state and
host-state areas of vmcs12. This ensures that a configuration with
CR4.CET set and CR0.WP clear results in VM-entry failure, aligning with
architectural behavior.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3e6f7b4fc374..362e241a2cbb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3108,6 +3108,9 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
 		return -EINVAL;
 
+	if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
 	    CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
 		return -EINVAL;
@@ -3222,6 +3225,9 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	    CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
 		return -EINVAL;
 
+	if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
 	    (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
 	     CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v11 23/23] KVM: nVMX: Add consistency checks for CET states
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (21 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 22/23] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
@ 2025-07-04  8:49 ` Chao Gao
  2025-07-06 16:51 ` [PATCH v11 00/23] Enable CET Virtualization Xiaoyao Li
  2025-07-21 15:35 ` Mathias Krause
  24 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-04  8:49 UTC (permalink / raw)
  To: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Chao Gao, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Introduce consistency checks for CET states during nested VM-entry.

A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET state during VM-entry, as documented in the
SDM, Vol. 3, chapter "VM Entries". Implement all of these checks during
nested VM-entry to emulate the architectural behavior.

In summary, there are three kinds of checks on guest/host CET states
during VM-entry:

A. Checks applied to both guest states and host states:

 * The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
   and 11 (TRACKER) cannot both be set.
 * SSP should not have bits 1:0 set.
 * The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.

B. Checks applied to host states only:

 * IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
   after VM-exit. Otherwise, IA32_S_CET and SSP must have their upper 32
   bits cleared.

C. Checks applied to guest states only:

 * IA32_S_CET MSR and SSP are not required to be canonical (i.e., bits
   63:N-1 identical, where N is the CPU's maximum linear-address width).
   However, bits 63:N of SSP must be identical.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 47 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 362e241a2cbb..3d16d97d8dc7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3098,6 +3098,17 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
 	return !__is_canonical_address(la, l1_address_bits_on_exit);
 }
 
+static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
+{
+	if (!is_cet_msr_valid(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
+		return false;
+
+	if (is_noncanonical_msr_address(ssp_tbl, vcpu))
+		return false;
+
+	return true;
+}
+
 static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 				       struct vmcs12 *vmcs12)
 {
@@ -3167,6 +3178,26 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 	}
 
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+		if (CC(!is_valid_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+					   vmcs12->host_ssp_tbl)))
+			return -EINVAL;
+
+		/*
+		 * IA32_S_CET and SSP must be canonical if the host will
+		 * enter 64-bit mode after VM-exit; otherwise, higher
+		 * 32-bits must be all 0s.
+		 */
+		if (ia32e) {
+			if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+			    CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+				return -EINVAL;
+		} else {
+			if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+				return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -3277,6 +3308,22 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	     CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
 		return -EINVAL;
 
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+		if (CC(!is_valid_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+					   vmcs12->guest_ssp_tbl)))
+			return -EINVAL;
+
+		/*
+		 * Guest SSP must have 63:N bits identical, rather than
+		 * be canonical (i.e., 63:N-1 bits identical), where N is
+		 * the CPU's maximum linear-address width. Similar to
+		 * is_noncanonical_msr_address(), use the host's
+		 * linear-address width.
+		 */
+		if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1)))
+			return -EINVAL;
+	}
+
 	if (nested_check_guest_non_reg_state(vmcs12))
 		return -EINVAL;
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (22 preceding siblings ...)
  2025-07-04  8:49 ` [PATCH v11 23/23] KVM: nVMX: Add consistency checks for CET states Chao Gao
@ 2025-07-06 16:51 ` Xiaoyao Li
  2025-07-07  1:32   ` Chao Gao
  2025-07-21 15:35 ` Mathias Krause
  24 siblings, 1 reply; 49+ messages in thread
From: Xiaoyao Li @ 2025-07-06 16:51 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

Hi Chao,

On 7/4/2025 4:49 PM, Chao Gao wrote:
> Tests:
> ======================
> This series passed basic CET user shadow stack test and kernel IBT test in L1
> and L2 guest.
> The patch series_has_ impact to existing vmx test cases in KVM-unit-tests,the
> failures have been fixed here[1].
> One new selftest app[2] is introduced for testing CET MSRs accessibilities.
> 
> Note, this series hasn't been tested on AMD platform yet.
> 
> To run user SHSTK test and kernel IBT test in guest, an CET capable platform
> is required, e.g., Sapphire Rapids server, and follow below steps to build
> the binaries:
> 
> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
> 
> 2. Guest kernel: Pull kernel (>= v6.6), opt-in CONFIG_X86_KERNEL_IBT
> and CONFIG_X86_USER_SHADOW_STACK options. Build with CET enabled gcc versions
> (>= 8.5.0).
> 
> 3. Apply CET QEMU patches[3] before build mainline QEMU.

You forgot to provide the links of [1][2][3].

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-06 16:51 ` [PATCH v11 00/23] Enable CET Virtualization Xiaoyao Li
@ 2025-07-07  1:32   ` Chao Gao
  2025-07-16 20:36     ` John Allen
  0 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-07  1:32 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	xin, Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On Mon, Jul 07, 2025 at 12:51:14AM +0800, Xiaoyao Li wrote:
>Hi Chao,
>
>On 7/4/2025 4:49 PM, Chao Gao wrote:
>> Tests:
>> ======================
>> This series passed basic CET user shadow stack test and kernel IBT test in L1
>> and L2 guest.
>> The patch series_has_ impact to existing vmx test cases in KVM-unit-tests,the
>> failures have been fixed here[1].
>> One new selftest app[2] is introduced for testing CET MSRs accessibilities.
>> 
>> Note, this series hasn't been tested on AMD platform yet.
>> 
>> To run user SHSTK test and kernel IBT test in guest, an CET capable platform
>> is required, e.g., Sapphire Rapids server, and follow below steps to build
>> the binaries:
>> 
>> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
>> 
>> 2. Guest kernel: Pull kernel (>= v6.6), opt-in CONFIG_X86_KERNEL_IBT
>> and CONFIG_X86_USER_SHADOW_STACK options. Build with CET enabled gcc versions
>> (>= 8.5.0).
>> 
>> 3. Apply CET QEMU patches[3] before build mainline QEMU.
>
>You forgot to provide the links of [1][2][3].

Oops, thanks for catching this.

Here are the links:

[1]: KVM-unit-tests fixup:
https://lore.kernel.org/all/20230913235006.74172-1-weijiang.yang@intel.com/
[2]: Selftest for CET MSRs:
https://lore.kernel.org/all/20230914064201.85605-1-weijiang.yang@intel.com/
[3]: QEMU patch:
https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/

Please note that [1] has already been merged. And [3] is an older version of
CET for QEMU; I plan to post a new version for QEMU after the KVM series is
merged.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-07  1:32   ` Chao Gao
@ 2025-07-16 20:36     ` John Allen
  2025-07-17  7:00       ` Mathias Krause
  0 siblings, 1 reply; 49+ messages in thread
From: John Allen @ 2025-07-16 20:36 UTC (permalink / raw)
  To: Chao Gao
  Cc: Xiaoyao Li, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, weijiang.yang, minipli, xin,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On Mon, Jul 07, 2025 at 09:32:37AM +0800, Chao Gao wrote:
> On Mon, Jul 07, 2025 at 12:51:14AM +0800, Xiaoyao Li wrote:
> >Hi Chao,
> >
> >On 7/4/2025 4:49 PM, Chao Gao wrote:
> >> Tests:
> >> ======================
> >> This series passed basic CET user shadow stack test and kernel IBT test in L1
> >> and L2 guest.
> >> The patch series_has_ impact to existing vmx test cases in KVM-unit-tests,the
> >> failures have been fixed here[1].
> >> One new selftest app[2] is introduced for testing CET MSRs accessibilities.
> >> 
> >> Note, this series hasn't been tested on AMD platform yet.
> >> 
> >> To run user SHSTK test and kernel IBT test in guest, an CET capable platform
> >> is required, e.g., Sapphire Rapids server, and follow below steps to build
> >> the binaries:
> >> 
> >> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
> >> 
> >> 2. Guest kernel: Pull kernel (>= v6.6), opt-in CONFIG_X86_KERNEL_IBT
> >> and CONFIG_X86_USER_SHADOW_STACK options. Build with CET enabled gcc versions
> >> (>= 8.5.0).
> >> 
> >> 3. Apply CET QEMU patches[3] before build mainline QEMU.
> >
> >You forgot to provide the links of [1][2][3].
> 
> Oops, thanks for catching this.
> 
> Here are the links:
> 
> [1]: KVM-unit-tests fixup:
> https://lore.kernel.org/all/20230913235006.74172-1-weijiang.yang@intel.com/
> [2]: Selftest for CET MSRs:
> https://lore.kernel.org/all/20230914064201.85605-1-weijiang.yang@intel.com/
> [3]: QEMU patch:
> https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/
> 
> Please note that [1] has already been merged. And [3] is an older version of
> CET for QEMU; I plan to post a new version for QEMU after the KVM series is
> merged.

Do you happen to have a branch with the in-progress qemu patches you are
testing with? I'm working on testing on AMD and I'm having issues
getting this old version of the series to work properly.

Thanks,
John


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-16 20:36     ` John Allen
@ 2025-07-17  7:00       ` Mathias Krause
  2025-07-17  7:57         ` Chao Gao
  0 siblings, 1 reply; 49+ messages in thread
From: Mathias Krause @ 2025-07-17  7:00 UTC (permalink / raw)
  To: John Allen, Chao Gao
  Cc: Xiaoyao Li, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, weijiang.yang, xin, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On 16.07.25 22:36, John Allen wrote:
> On Mon, Jul 07, 2025 at 09:32:37AM +0800, Chao Gao wrote:
>> On Mon, Jul 07, 2025 at 12:51:14AM +0800, Xiaoyao Li wrote:
>>> Hi Chao,
>>>
>>> On 7/4/2025 4:49 PM, Chao Gao wrote:
>>>> Tests:
>>>> ======================
>>>> This series passed basic CET user shadow stack test and kernel IBT test in L1
>>>> and L2 guest.
>>>> The patch series_has_ impact to existing vmx test cases in KVM-unit-tests,the
>>>> failures have been fixed here[1].
>>>> One new selftest app[2] is introduced for testing CET MSRs accessibilities.
>>>>
>>>> Note, this series hasn't been tested on AMD platform yet.
>>>>
>>>> To run user SHSTK test and kernel IBT test in guest, an CET capable platform
>>>> is required, e.g., Sapphire Rapids server, and follow below steps to build
>>>> the binaries:
>>>>
>>>> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
>>>>
>>>> 2. Guest kernel: Pull kernel (>= v6.6), opt-in CONFIG_X86_KERNEL_IBT
>>>> and CONFIG_X86_USER_SHADOW_STACK options. Build with CET enabled gcc versions
>>>> (>= 8.5.0).
>>>>
>>>> 3. Apply CET QEMU patches[3] before build mainline QEMU.
>>>
>>> You forgot to provide the links of [1][2][3].
>>
>> Oops, thanks for catching this.
>>
>> Here are the links:
>>
>> [1]: KVM-unit-tests fixup:
>> https://lore.kernel.org/all/20230913235006.74172-1-weijiang.yang@intel.com/
>> [2]: Selftest for CET MSRs:
>> https://lore.kernel.org/all/20230914064201.85605-1-weijiang.yang@intel.com/
>> [3]: QEMU patch:
>> https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/
>>
>> Please note that [1] has already been merged. And [3] is an older version of
>> CET for QEMU; I plan to post a new version for QEMU after the KVM series is
>> merged.
> 
> Do you happen to have a branch with the in-progress qemu patches you are
> testing with? I'm working on testing on AMD and I'm having issues
> getting this old version of the series to work properly.

For me the old patches worked by changing the #define of
MSR_KVM_GUEST_SSP from 0x4b564d09 to 0x4b564dff -- on top of QEMU 9.0.1,
that is.

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-17  7:00       ` Mathias Krause
@ 2025-07-17  7:57         ` Chao Gao
  2025-07-21 18:08           ` John Allen
  0 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-17  7:57 UTC (permalink / raw)
  To: Mathias Krause
  Cc: John Allen, Xiaoyao Li, kvm, linux-kernel, x86, seanjc, pbonzini,
	dave.hansen, rick.p.edgecombe, mlevitsk, weijiang.yang, xin,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On Thu, Jul 17, 2025 at 09:00:04AM +0200, Mathias Krause wrote:
>On 16.07.25 22:36, John Allen wrote:
>> On Mon, Jul 07, 2025 at 09:32:37AM +0800, Chao Gao wrote:
>>> On Mon, Jul 07, 2025 at 12:51:14AM +0800, Xiaoyao Li wrote:
>>>> Hi Chao,
>>>>
>>>> On 7/4/2025 4:49 PM, Chao Gao wrote:
>>>>> Tests:
>>>>> ======================
>>>>> This series passed the basic CET user shadow stack test and kernel IBT test in L1
>>>>> and L2 guests.
>>>>> The patch series _has_ an impact on existing vmx test cases in KVM-unit-tests; the
>>>>> failures have been fixed here[1].
>>>>> One new selftest app[2] is introduced for testing the accessibility of CET MSRs.
>>>>>
>>>>> Note, this series hasn't been tested on AMD platform yet.
>>>>>
>>>>> To run the user SHSTK test and kernel IBT test in a guest, a CET-capable platform
>>>>> is required, e.g., a Sapphire Rapids server. Follow the steps below to build
>>>>> the binaries:
>>>>>
>>>>> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
>>>>>
>>>>> 2. Guest kernel: Pull a kernel (>= v6.6) and opt in to the CONFIG_X86_KERNEL_IBT
>>>>> and CONFIG_X86_USER_SHADOW_STACK options. Build with a CET-enabled gcc version
>>>>> (>= 8.5.0).
>>>>>
>>>>> 3. Apply the CET QEMU patches[3] before building mainline QEMU.
>>>>
>>>> You forgot to provide the links of [1][2][3].
>>>
>>> Oops, thanks for catching this.
>>>
>>> Here are the links:
>>>
>>> [1]: KVM-unit-tests fixup:
>>> https://lore.kernel.org/all/20230913235006.74172-1-weijiang.yang@intel.com/
>>> [2]: Selftest for CET MSRs:
>>> https://lore.kernel.org/all/20230914064201.85605-1-weijiang.yang@intel.com/
>>> [3]: QEMU patch:
>>> https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/
>>>
>>> Please note that [1] has already been merged. And [3] is an older version of
>>> CET for QEMU; I plan to post a new version for QEMU after the KVM series is
>>> merged.
>> 
>> Do you happen to have a branch with the in-progress qemu patches you are
>> testing with? I'm working on testing on AMD and I'm having issues
>> getting this old version of the series to work properly.

Hi John,

Try this branch:

https://github.com/gaochaointel/qemu-dev qemu-cet

Disclaimer: I haven't cleaned up the QEMU patches yet, so they are not of
upstream quality.

>
>For me the old patches worked by changing the #define of
>MSR_KVM_GUEST_SSP from 0x4b564d09 to 0x4b564dff -- on top of QEMU 9.0.1,
>that is.

Please note that aliasing guest SSP to the virtual MSR indexed by 0x4b564dff is
not part of KVM uAPI in the v11 series. This means the index 0x4b564dff isn't
stable; userspace should read/write guest SSP via KVM_GET/SET_ONE_REG ioctls.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
                   ` (23 preceding siblings ...)
  2025-07-06 16:51 ` [PATCH v11 00/23] Enable CET Virtualization Xiaoyao Li
@ 2025-07-21 15:35 ` Mathias Krause
  24 siblings, 0 replies; 49+ messages in thread
From: Mathias Krause @ 2025-07-21 15:35 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, xin,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On 04.07.25 10:49, Chao Gao wrote:
> The FPU support for CET virtualization has already been merged into the tip
> tree. This v11 adds Intel CET virtualization in KVM and is based on
> tip/master plus Sean's MSR cleanups. For your convenience, it is also
> available at
> 
>   https://github.com/gaochaointel/linux-dev cet-v11
> 
> Changes in v11 (Most changes are suggested by Sean. Thanks!):
> 1. Rebased onto the latest tip tree + Sean's MSR cleanups
> 2. Made patch 1's shortlog informative and accurate
> 3. Slotted in two cleanup patches from Sean (patch 3/4)
> 4. Used KVM_GET/SET_ONE_REG ioctl for userspace to read/write SSP.
>    still assigned a KVM-defined MSR index for SSP but the index isn't
>    part of uAPI now.
> 5. Used KVM_MSR_RET_UNSUPPORTED to reject accesses to unsupported CET MSRs
> 6. Synthesized triple-fault when reading/writing SSP failed during
>    entering into SMM or exiting from SMM
> 7. Removed an inappropriate "quirk" in v10 that advertised IBT to userspace
>    when the hardware supports it but the host does not enable it.
> 8. Disabled IBT/SHSTK explicitly for SVM to avoid them being enabled on
>    AMD CPU accidentally before AMD CET series lands. Because IBT/SHSTK are
>    advertised in KVM x86 common code but only Intel support is added by
>    this series.
> 9. Re-ordered "Don't emulate branch instructions" (patch 18) before
>    advertising CET support to userspace.
> 10.Added consistency checks for CR4.CET and other CET MSRs during VM-entry
>    (patches 22-23)
> 
> [...]

I tested this with your work-in-progress QEMU support branch from [1]
and it worked well on my Alder Lake NUC (i7-1260P).

The host kernel has IBT and user shadow stacks enabled, as does the
guest kernel. KUT CET tests[2] ran fine on the host as well as in the
guest, i.e. nested works too.

Therefore,

Tested-by: Mathias Krause <minipli@grsecurity.net>

[1] https://github.com/gaochaointel/qemu-dev#qemu-cet
[2] https://lore.kernel.org/kvm/20250626073459.12990-1-minipli@grsecurity.net/

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-04  8:49 ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
@ 2025-07-21 15:51   ` Mathias Krause
  2025-07-21 17:45     ` Sean Christopherson
  2025-08-06 20:58   ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace John Allen
  1 sibling, 1 reply; 49+ messages in thread
From: Mathias Krause @ 2025-07-21 15:51 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

On 04.07.25 10:49, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Expose CET features to guest if KVM/host can support them, clear CPUID
> feature bits if KVM/host cannot support.
> [...]

Can we please make CR4.CET a guest-owned bit as well (sending a patch in
a second)? It's a logical continuation of making CR0.WP a guest-owned
bit, just that it's even easier this time, as no MMU role bits are
involved, and it still makes a big difference, at least for grsecurity
guest kernels.

Using the old test from [1] gives the following numbers (perf stat -r 5
ssdd 10 50000):

* grsec guest on linux-6.16-rc5 + cet patches:
  2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )

* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
  1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )

Not only is it ~35% faster, it's also more stable, with less fluctuation
due to fewer VMEXITs, I believe.

Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/

[-- Attachment #2: 0001-KVM-VMX-Make-CR4.CET-a-guest-owned-bit.patch --]
[-- Type: text/x-patch, Size: 1260 bytes --]

From 14ef5d8b952744c46c32f16fea3b29184cde3e65 Mon Sep 17 00:00:00 2001
From: Mathias Krause <minipli@grsecurity.net>
Date: Mon, 21 Jul 2025 13:45:55 +0200
Subject: [PATCH] KVM: VMX: Make CR4.CET a guest owned bit

There's no need to intercept changes to CR4.CET; make it a guest-owned
bit where possible.

This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 arch/x86/kvm/kvm_cache_regs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 36a8786db291..8ddb01191d6f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS	(X86_CR0_TS | X86_CR0_WP)
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_CET)
 
 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-21 15:51   ` Mathias Krause
@ 2025-07-21 17:45     ` Sean Christopherson
  2025-07-22  5:49       ` Mathias Krause
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2025-07-21 17:45 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Chao Gao, kvm, linux-kernel, x86, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Mon, Jul 21, 2025, Mathias Krause wrote:
> On 04.07.25 10:49, Chao Gao wrote:
> > From: Yang Weijiang <weijiang.yang@intel.com>
> > 
> > Expose CET features to guest if KVM/host can support them, clear CPUID
> > feature bits if KVM/host cannot support.
> > [...]
> 
> Can we please make CR4.CET a guest-owned bit as well (sending a patch in
> a second)? It's a logical continuation of making CR0.WP a guest-owned
> bit, just that it's even easier this time, as no MMU role bits are
> involved, and it still makes a big difference, at least for grsecurity
> guest kernels.

Out of curiosity, what's the use case for toggling CR4.CET at runtime?

> Using the old test from [1] gives the following numbers (perf stat -r 5
> ssdd 10 50000):
> 
> * grsec guest on linux-6.16-rc5 + cet patches:
>   2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )
> 
> * grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
>   1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )
> 
> Not only is it ~35% faster, it's also more stable, with less fluctuation
> due to fewer VMEXITs, I believe.
> 
> Thanks,
> Mathias
> 
> [1]
> https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/

> From 14ef5d8b952744c46c32f16fea3b29184cde3e65 Mon Sep 17 00:00:00 2001
> From: Mathias Krause <minipli@grsecurity.net>
> Date: Mon, 21 Jul 2025 13:45:55 +0200
> Subject: [PATCH] KVM: VMX: Make CR4.CET a guest owned bit
> 
> There's no need to intercept changes of CR4.CET, make it a guest-owned
> bit where possible.

In the changelog, please elaborate on the assertion that CR4.CET doesn't need to
be intercepted, and include the motivation and perf numbers.  KVM's "rule" is
to disable interception of something if and only if there is a good reason for
doing so, because generally speaking intercepting is safer.  E.g. KVM bugs are
less likely to put the host at risk.  "Because we can" isn't a good reason :-)

E.g. at one point CR4.LA57 was a guest-owned bit, and the code was buggy.  Fixing
things took far more effort than it should have, as there was no justification for
the logic (IIRC, it was done purely on the whims of the original developer).

KVM has had many such cases, where some weird behavior was never documented/justified,
and I really, really want to avoid committing the same sins that have caused me
so much pain :-)

> This change is VMX-specific, as SVM has no such fine-grained control
> register intercept control.
> 
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> ---
>  arch/x86/kvm/kvm_cache_regs.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 36a8786db291..8ddb01191d6f 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -7,7 +7,8 @@
>  #define KVM_POSSIBLE_CR0_GUEST_BITS	(X86_CR0_TS | X86_CR0_WP)
>  #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
>  	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
> -	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
> +	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
> +	 | X86_CR4_CET)
>  
>  #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
>  #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
> -- 
> 2.47.2
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 00/23] Enable CET Virtualization
  2025-07-17  7:57         ` Chao Gao
@ 2025-07-21 18:08           ` John Allen
  0 siblings, 0 replies; 49+ messages in thread
From: John Allen @ 2025-07-21 18:08 UTC (permalink / raw)
  To: Chao Gao
  Cc: Mathias Krause, Xiaoyao Li, kvm, linux-kernel, x86, seanjc,
	pbonzini, dave.hansen, rick.p.edgecombe, mlevitsk, weijiang.yang,
	xin, Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner

On Thu, Jul 17, 2025 at 03:57:03PM +0800, Chao Gao wrote:
> On Thu, Jul 17, 2025 at 09:00:04AM +0200, Mathias Krause wrote:
> >On 16.07.25 22:36, John Allen wrote:
> >> On Mon, Jul 07, 2025 at 09:32:37AM +0800, Chao Gao wrote:
> >>> On Mon, Jul 07, 2025 at 12:51:14AM +0800, Xiaoyao Li wrote:
> >>>> Hi Chao,
> >>>>
> >>>> On 7/4/2025 4:49 PM, Chao Gao wrote:
> >>>>> Tests:
> >>>>> ======================
> >>>>> This series passed the basic CET user shadow stack test and kernel IBT test in L1
> >>>>> and L2 guests.
> >>>>> The patch series _has_ an impact on existing vmx test cases in KVM-unit-tests; the
> >>>>> failures have been fixed here[1].
> >>>>> One new selftest app[2] is introduced for testing the accessibility of CET MSRs.
> >>>>>
> >>>>> Note, this series hasn't been tested on AMD platform yet.
> >>>>>
> >>>>> To run the user SHSTK test and kernel IBT test in a guest, a CET-capable platform
> >>>>> is required, e.g., a Sapphire Rapids server. Follow the steps below to build
> >>>>> the binaries:
> >>>>>
> >>>>> 1. Host kernel: Apply this series to mainline kernel (>= v6.6) and build.
> >>>>>
> >>>>> 2. Guest kernel: Pull a kernel (>= v6.6) and opt in to the CONFIG_X86_KERNEL_IBT
> >>>>> and CONFIG_X86_USER_SHADOW_STACK options. Build with a CET-enabled gcc version
> >>>>> (>= 8.5.0).
> >>>>>
> >>>>> 3. Apply the CET QEMU patches[3] before building mainline QEMU.
> >>>>
> >>>> You forgot to provide the links of [1][2][3].
> >>>
> >>> Oops, thanks for catching this.
> >>>
> >>> Here are the links:
> >>>
> >>> [1]: KVM-unit-tests fixup:
> >>> https://lore.kernel.org/all/20230913235006.74172-1-weijiang.yang@intel.com/
> >>> [2]: Selftest for CET MSRs:
> >>> https://lore.kernel.org/all/20230914064201.85605-1-weijiang.yang@intel.com/
> >>> [3]: QEMU patch:
> >>> https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/
> >>>
> >>> Please note that [1] has already been merged. And [3] is an older version of
> >>> CET for QEMU; I plan to post a new version for QEMU after the KVM series is
> >>> merged.
> >> 
> >> Do you happen to have a branch with the in-progress qemu patches you are
> >> testing with? I'm working on testing on AMD and I'm having issues
> >> getting this old version of the series to work properly.
> 
> Hi John,
> 
> Try this branch:
> 
> https://github.com/gaochaointel/qemu-dev qemu-cet
> 
> Disclaimer: I haven't cleaned up the QEMU patches yet, so they are not of
> upstream quality.
> 
> >
> >For me the old patches worked by changing the #define of
> >MSR_KVM_GUEST_SSP from 0x4b564d09 to 0x4b564dff -- on top of QEMU 9.0.1,
> >that is.
> 
> Please note that aliasing guest SSP to the virtual MSR indexed by 0x4b564dff is
> not part of KVM uAPI in the v11 series. This means the index 0x4b564dff isn't
> stable; userspace should read/write guest SSP via KVM_GET/SET_ONE_REG ioctls.

Thanks, tested on AMD. There are a couple of minor details I still need to
work out, but I should have an updated SVM series out soon.

Tested-by: John Allen <john.allen@amd.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-21 17:45     ` Sean Christopherson
@ 2025-07-22  5:49       ` Mathias Krause
  2025-07-22 14:13         ` Sean Christopherson
  2025-07-22 21:25         ` [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit Mathias Krause
  0 siblings, 2 replies; 49+ messages in thread
From: Mathias Krause @ 2025-07-22  5:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, kvm, linux-kernel, x86, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On 21.07.25 19:45, Sean Christopherson wrote:
> On Mon, Jul 21, 2025, Mathias Krause wrote:
>> Can we please make CR4.CET a guest-owned bit as well (sending a patch in
>> a second)? It's a logical continuation of making CR0.WP a guest-owned
>> bit, just that it's even easier this time, as no MMU role bits are
>> involved, and it still makes a big difference, at least for grsecurity
>> guest kernels.
> 
> Out of curiosity, what's the use case for toggling CR4.CET at runtime?

Plain and simple: architectural requirements to be able to toggle CR0.WP.

>> Using the old test from [1] gives the following numbers (perf stat -r 5
>> ssdd 10 50000):
>>
>> * grsec guest on linux-6.16-rc5 + cet patches:
>>   2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )
>>
>> * grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
>>   1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )
>>
>> Not only is it ~35% faster, it's also more stable, with less fluctuation
>> due to fewer VMEXITs, I believe.

The above test exercises the "pain path" by single-stepping a process and
constantly switching between tracer and tracee. The scheduling part
involves a CR0.WP toggle in grsecurity to be able to write to r/o
memory, which now requires toggling CR4.CET as well.

>>
>> Thanks,
>> Mathias
>>
>> [1]
>> https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/
> 
>> From 14ef5d8b952744c46c32f16fea3b29184cde3e65 Mon Sep 17 00:00:00 2001
>> From: Mathias Krause <minipli@grsecurity.net>
>> Date: Mon, 21 Jul 2025 13:45:55 +0200
>> Subject: [PATCH] KVM: VMX: Make CR4.CET a guest owned bit
>>
>> There's no need to intercept changes of CR4.CET, make it a guest-owned
>> bit where possible.
> 
> In the changelog, please elaborate on the assertion that CR4.CET doesn't need to
> be intercepted, and include the motivation and perf numbers.  KVM's "rule" is
> to disable interception of something if and only if there is a good reason for
> doing so, because generally speaking intercepting is safer.  E.g. KVM bugs are
> less likely to put the host at risk.  "Because we can" isn't a good reason :-)

Understood, will extend the changelog accordingly.

> 
> E.g. at one point CR4.LA57 was a guest-owned bit, and the code was buggy.  Fixing
> things took far more effort than it should have, as there was no justification for
> the logic (IIRC, it was done purely on the whims of the original developer).
> 
> KVM has had many such cases, where some weird behavior was never documented/justified,
> and I really, really want to avoid committing the same sins that have caused me
> so much pain :-)

I totally understand your reasoning; "just because" shouldn't be the
justification. In this case, however, not making it a guest-owned bit
has a big performance impact for grsecurity, which we would like to address.

The defines and asserts regarding KVM_MMU_CR4_ROLE_BITS and
KVM_POSSIBLE_CR4_GUEST_BITS (and KVM_MMU_CR0_ROLE_BITS /
KVM_POSSIBLE_CR0_GUEST_BITS too) should catch future attempts at
involving CR4.CET in any MMU-related decisions, and I found no other use
of the (nested) guest's CR4.CET value besides sanity checks to
prevent invalid architectural state (CR4.CET=1 but CR0.WP=0).

That is, imho, existing documentation of the expectations for
guest-owned bits, and, even better, BUILD_BUG_ON()s enforcing them.


Thanks,
Mathias

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-22  5:49       ` Mathias Krause
@ 2025-07-22 14:13         ` Sean Christopherson
  2025-07-22 21:25         ` [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit Mathias Krause
  1 sibling, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-07-22 14:13 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Chao Gao, kvm, linux-kernel, x86, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Tue, Jul 22, 2025, Mathias Krause wrote:
> On 21.07.25 19:45, Sean Christopherson wrote:
> > On Mon, Jul 21, 2025, Mathias Krause wrote:
> >> Can we please make CR4.CET a guest-owned bit as well (sending a patch in
> >> a second)? It's a logical continuation of making CR0.WP a guest-owned
> >> bit, just that it's even easier this time, as no MMU role bits are
> >> involved, and it still makes a big difference, at least for grsecurity
> >> guest kernels.
> > 
> > Out of curiosity, what's the use case for toggling CR4.CET at runtime?
> 
> Plain and simple: architectural requirements to be able to toggle CR0.WP.

Ugh, right.  That was less fun than I was expecting :-)

> > E.g. at one point CR4.LA57 was a guest-owned bit, and the code was buggy.  Fixing
> > things took far more effort than it should have, as there was no justification for
> > the logic (IIRC, it was done purely on the whims of the original developer).
> > 
> > KVM has had many such cases, where some weird behavior was never documented/justified,
> > and I really, really want to avoid committing the same sins that have caused me
> > so much pain :-)
> 
> I totally understand your reasoning, "just because" shouldn't be the
> justification. In this case, however, not making it a guest-owned bit
> has a big performance impact for grsecurity, which we would like to address.

Oh, I'm not objecting to the change, at all.  I just want to make sure we capture
the justification in the changelog.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit
  2025-07-22  5:49       ` Mathias Krause
  2025-07-22 14:13         ` Sean Christopherson
@ 2025-07-22 21:25         ` Mathias Krause
  2025-07-23  6:24           ` Chao Gao
  1 sibling, 1 reply; 49+ messages in thread
From: Mathias Krause @ 2025-07-22 21:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Mathias Krause, Chao Gao, kvm, linux-kernel, x86, pbonzini,
	dave.hansen, rick.p.edgecombe, mlevitsk, john.allen,
	weijiang.yang, xin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

There's no need to intercept changes to CR4.CET, as it's neither
included in KVM's MMU role bits, nor does KVM specifically care about
the actual value of a (nested) guest's CR4.CET, besides
enforcing architectural constraints, i.e. making sure that CR0.WP=1 if
CR4.CET=1.

Intercepting writes to CR4.CET is particularly bad for grsecurity
kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
make heavy use of read-only kernel objects and use a CPU-local CR0.WP
toggle to override the write protection when needed. Under a CET-enabled kernel, this
also requires toggling CR4.CET, hence the motivation to make it
guest-owned.

Using the old test from [1] gives the following runtime numbers (perf
stat -r 5 ssdd 10 50000):

* grsec guest on linux-6.16-rc5 + cet patches:
  2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )

* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
  1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )

Not only does not intercepting CR4.CET make the test run ~35% faster,
it's also more stable, with less fluctuation due to fewer VMEXITs, I believe.

Therefore, make CR4.CET a guest-owned bit where possible.

This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.

If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.

Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
v2:
- provide motivation and performance numbers

 arch/x86/kvm/kvm_cache_regs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 36a8786db291..8ddb01191d6f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS	(X86_CR0_TS | X86_CR0_WP)
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_CET)
 
 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit
  2025-07-22 21:25         ` [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit Mathias Krause
@ 2025-07-23  6:24           ` Chao Gao
  0 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-23  6:24 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Sean Christopherson, kvm, linux-kernel, x86, pbonzini,
	dave.hansen, rick.p.edgecombe, mlevitsk, john.allen,
	weijiang.yang, xin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


It is recommended to first state what a patch does before providing
the background and motivation. See

https://docs.kernel.org/process/maintainer-kvm-x86.html#changelog

>There's no need to intercept changes to CR4.CET, as it's neither
>included in KVM's MMU role bits, nor does KVM specifically care about
>the actual value of a (nested) guest's CR4.CET, besides
>enforcing architectural constraints, i.e. making sure that CR0.WP=1 if
>CR4.CET=1.
>
>Intercepting writes to CR4.CET is particularly bad for grsecurity
>kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
>heavily make use of read-only kernel objects and use a cpu-local CR0.WP
>toggle to override it, when needed. Under a CET-enabled kernel, this
>also requires toggling CR4.CET, hence the motivation to make it
>guest-owned.
>
>Using the old test from [1] gives the following runtime numbers (perf
>stat -r 5 ssdd 10 50000):
>
>* grsec guest on linux-6.16-rc5 + cet patches:
>  2.4647 +- 0.0706 seconds time elapsed  ( +-  2.86% )
>
>* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
>  1.5648 +- 0.0240 seconds time elapsed  ( +-  1.53% )
>
>Not only does not intercepting CR4.CET make the test run ~35% faster, it's
>also more stable, with less fluctuation due to fewer VMEXITs, I believe.
>
>Therefore, make CR4.CET a guest-owned bit where possible.
>
>This change is VMX-specific, as SVM has no such fine-grained control
>register intercept control.

Ah, that's why the shortlog is "KVM: VMX". I was wondering why the shortlog
specifically mentions VMX while the patch actually touches x86 common code.

>
>If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
>value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
>and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.
>
>Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
>Signed-off-by: Mathias Krause <minipli@grsecurity.net>

The patch looks good. So,

Reviewed-by: Chao Gao <chao.gao@intel.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
@ 2025-07-24 11:37   ` Huang, Kai
  2025-07-24 13:31     ` Sean Christopherson
  2025-07-28 22:31   ` Xin Li
  1 sibling, 1 reply; 49+ messages in thread
From: Huang, Kai @ 2025-07-24 11:37 UTC (permalink / raw)
  To: kvm@vger.kernel.org, pbonzini@redhat.com, Hansen, Dave,
	linux-kernel@vger.kernel.org, Gao, Chao, x86@kernel.org,
	seanjc@google.com
  Cc: Yang, Weijiang, Edgecombe, Rick P, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, john.allen@amd.com,
	mingo@redhat.com, tglx@linutronix.de, minipli@grsecurity.net,
	mlevitsk@redhat.com, xin@zytor.com

On Fri, 2025-07-04 at 01:49 -0700, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> Rename kvm_{g,s}et_msr()* to kvm_emulate_msr_{read,write}()* to make it
> more obvious that KVM uses these helpers to emulate guest behaviors,
> i.e., host_initiated == false in these helpers.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>

Nit: I don't think your Reviewed-by is needed if the chain already has
your SoB?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-24 11:37   ` Huang, Kai
@ 2025-07-24 13:31     ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-07-24 13:31 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm@vger.kernel.org, pbonzini@redhat.com, Dave Hansen,
	linux-kernel@vger.kernel.org, Chao Gao, x86@kernel.org,
	Weijiang Yang, Rick P Edgecombe, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, john.allen@amd.com,
	mingo@redhat.com, tglx@linutronix.de, minipli@grsecurity.net,
	mlevitsk@redhat.com, xin@zytor.com

On Thu, Jul 24, 2025, Kai Huang wrote:
> On Fri, 2025-07-04 at 01:49 -0700, Chao Gao wrote:
> > From: Yang Weijiang <weijiang.yang@intel.com>
> > 
> > Rename kvm_{g,s}et_msr()* to kvm_emulate_msr_{read,write}()* to make it
> > more obvious that KVM uses these helpers to emulate guest behaviors,
> > i.e., host_initiated == false in these helpers.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Suggested-by: Chao Gao <chao.gao@intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > Reviewed-by: Chao Gao <chao.gao@intel.com>
> 
> Nit: I don't think your Reviewed-by is needed if the chain already has
> your SoB?

Keep the Reviewed-by, it's still useful, e.g. to communicate that Chao has done
more than just shepherd the patch along.


* Re: [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest
  2025-07-04  8:49 ` [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest Chao Gao
@ 2025-07-28  6:30   ` Xin Li
  2025-07-28  8:42     ` Chao Gao
  0 siblings, 1 reply; 49+ messages in thread
From: Xin Li @ 2025-07-28  6:30 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

> @@ -2515,6 +2537,30 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
>   	}
>   }
>   
> +static inline void cet_vmcs_fields_get(struct kvm_vcpu *vcpu, u64 *ssp,
> +				       u64 *s_cet, u64 *ssp_tbl)
> +{
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
> +		*ssp = vmcs_readl(GUEST_SSP);
> +		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
> +	}
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
> +	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> +		*s_cet = vmcs_readl(GUEST_S_CET);
> +}
> +
> +static inline void cet_vmcs_fields_set(struct kvm_vcpu *vcpu, u64 ssp,
> +				       u64 s_cet, u64 ssp_tbl)
> +{
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
> +		vmcs_writel(GUEST_SSP, ssp);
> +		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
> +	}
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
> +	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> +		vmcs_writel(GUEST_S_CET, s_cet);
> +}
> +
>   static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>   {
>   	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);


The order of the arguments is a bit weird to me; I would move s_cet
before ssp.  Then it is consistent with the order in
https://lore.kernel.org/kvm/20250704085027.182163-13-chao.gao@intel.com/


> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -181,6 +181,9 @@ struct nested_vmx {
>   	 */
>   	u64 pre_vmenter_debugctl;
>   	u64 pre_vmenter_bndcfgs;
> +	u64 pre_vmenter_ssp;
> +	u64 pre_vmenter_s_cet;
> +	u64 pre_vmenter_ssp_tbl;
>   
>   	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
>   	int l1_tpr_threshold;

Same here.


* Re: [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest
  2025-07-28  6:30   ` Xin Li
@ 2025-07-28  8:42     ` Chao Gao
  0 siblings, 0 replies; 49+ messages in thread
From: Chao Gao @ 2025-07-28  8:42 UTC (permalink / raw)
  To: Xin Li
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Sun, Jul 27, 2025 at 11:30:28PM -0700, Xin Li wrote:
>> @@ -2515,6 +2537,30 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
>>   	}
>>   }
>> +static inline void cet_vmcs_fields_get(struct kvm_vcpu *vcpu, u64 *ssp,
>> +				       u64 *s_cet, u64 *ssp_tbl)
>> +{
>> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
>> +		*ssp = vmcs_readl(GUEST_SSP);
>> +		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
>> +	}
>> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
>> +	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
>> +		*s_cet = vmcs_readl(GUEST_S_CET);
>> +}
>> +
>> +static inline void cet_vmcs_fields_set(struct kvm_vcpu *vcpu, u64 ssp,
>> +				       u64 s_cet, u64 ssp_tbl)
>> +{
>> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
>> +		vmcs_writel(GUEST_SSP, ssp);
>> +		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
>> +	}
>> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
>> +	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
>> +		vmcs_writel(GUEST_S_CET, s_cet);
>> +}
>> +
>>   static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>>   {
>>   	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
>
>
>The order of the arguments is a bit weird to me; I would move s_cet
>before ssp.  Then it is consistent with the order in
>https://lore.kernel.org/kvm/20250704085027.182163-13-chao.gao@intel.com/

Sure. I can reorder the arguments.


* Re: [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
  2025-07-24 11:37   ` Huang, Kai
@ 2025-07-28 22:31   ` Xin Li
  2025-07-29  0:45     ` Chao Gao
  1 sibling, 1 reply; 49+ messages in thread
From: Xin Li @ 2025-07-28 22:31 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On 7/4/2025 1:49 AM, Chao Gao wrote:
> @@ -2764,7 +2764,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>   
>   	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
>   	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
> -	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
> +	    WARN_ON_ONCE(kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
>   				     vmcs12->guest_ia32_perf_global_ctrl))) {

Not sure whether the alignment should be adjusted to match the modified
line above.

>   		*entry_failure_code = ENTRY_FAIL_DEFAULT;
>   		return -EINVAL;
> @@ -4752,8 +4752,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>   	}
>   	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) &&
>   	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)))
> -		WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
> -					 vmcs12->host_ia32_perf_global_ctrl));
> +		WARN_ON_ONCE(kvm_emulate_msr_write(vcpu,
> +					MSR_CORE_PERF_GLOBAL_CTRL,
> +					vmcs12->host_ia32_perf_global_ctrl));

Same here.

>   
>   	/* Set L1 segment info according to Intel SDM
>   	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7543dac7ae70..11d84075cd14 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1929,33 +1929,35 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
>   				 __kvm_get_msr);
>   }
>   
> -int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
> +int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index,
> +				     u64 *data)

I think the extra newline doesn't improve readability, but it's the
maintainer's call.

>   {
>   	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
>   		return KVM_MSR_RET_FILTERED;
>   	return kvm_get_msr_ignored_check(vcpu, index, data, false);
>   }
> -EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);
> +EXPORT_SYMBOL_GPL(kvm_emulate_msr_read_with_filter);
>   
> -int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
> +int kvm_emulate_msr_write_with_filter(struct kvm_vcpu *vcpu, u32 index,

Ditto.

> +				      u64 data)
>   {
>   	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
>   		return KVM_MSR_RET_FILTERED;



* Re: [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-07-04  8:49 ` [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
@ 2025-07-28 22:53   ` Xin Li
  2025-07-29  1:30     ` Chao Gao
  0 siblings, 1 reply; 49+ messages in thread
From: Xin Li @ 2025-07-28 22:53 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen
  Cc: rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Zhang Yi Z, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On 7/4/2025 1:49 AM, Chao Gao wrote:
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index cca7d6641287..ce10a7e2d3d9 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -106,6 +106,7 @@
>   #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>   #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>   #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
> +#define VM_EXIT_LOAD_CET_STATE                  0x10000000
>   
>   #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>   
> @@ -119,6 +120,7 @@
>   #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>   #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
>   #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
> +#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
>   
>   #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
>   
> @@ -369,6 +371,9 @@ enum vmcs_field {
>   	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>   	GUEST_SYSENTER_ESP              = 0x00006824,
>   	GUEST_SYSENTER_EIP              = 0x00006826,
> +	GUEST_S_CET                     = 0x00006828,
> +	GUEST_SSP                       = 0x0000682a,
> +	GUEST_INTR_SSP_TABLE            = 0x0000682c,
>   	HOST_CR0                        = 0x00006c00,
>   	HOST_CR3                        = 0x00006c02,
>   	HOST_CR4                        = 0x00006c04,
> @@ -381,6 +386,9 @@ enum vmcs_field {
>   	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>   	HOST_RSP                        = 0x00006c14,
>   	HOST_RIP                        = 0x00006c16,
> +	HOST_S_CET                      = 0x00006c18,
> +	HOST_SSP                        = 0x00006c1a,
> +	HOST_INTR_SSP_TABLE             = 0x00006c1c
>   };
>   
>   /*

A comment not on this patch itself.

Both spaces and tabs are currently used to align columns in 
arch/x86/include/asm/vmx.h.  Can we standardize on one, either spaces or 
tabs?

Thanks!
     Xin


* Re: [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-28 22:31   ` Xin Li
@ 2025-07-29  0:45     ` Chao Gao
  2025-07-29 18:19       ` Sean Christopherson
  0 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-29  0:45 UTC (permalink / raw)
  To: Xin Li
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Mon, Jul 28, 2025 at 03:31:41PM -0700, Xin Li wrote:
>On 7/4/2025 1:49 AM, Chao Gao wrote:
>> @@ -2764,7 +2764,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>   	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
>>   	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
>> -	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
>> +	    WARN_ON_ONCE(kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
>>   				     vmcs12->guest_ia32_perf_global_ctrl))) {
>
>Not sure if the alignment should be adjusted based on the above modified
>line.

I prefer to align the indentation. so will do.

>
>>   		*entry_failure_code = ENTRY_FAIL_DEFAULT;
>>   		return -EINVAL;
>> @@ -4752,8 +4752,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>   	}
>>   	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) &&
>>   	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)))
>> -		WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
>> -					 vmcs12->host_ia32_perf_global_ctrl));
>> +		WARN_ON_ONCE(kvm_emulate_msr_write(vcpu,
>> +					MSR_CORE_PERF_GLOBAL_CTRL,
>> +					vmcs12->host_ia32_perf_global_ctrl));
>
>Same here.

ack.

>
>>   	/* Set L1 segment info according to Intel SDM
>>   	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 7543dac7ae70..11d84075cd14 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1929,33 +1929,35 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
>>   				 __kvm_get_msr);
>>   }
>> -int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
>> +int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index,
>> +				     u64 *data)
>
>I think the extra new line doesn't improve readability, but it's the
>maintainer's call.
>

Sure. It seems "let it poke out" is Sean's preference; I've seen him make
similar requests several times, e.g.,

https://lore.kernel.org/kvm/ZjQgA0ml4-mRJC-e@google.com/


* Re: [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-07-28 22:53   ` Xin Li
@ 2025-07-29  1:30     ` Chao Gao
  2025-07-29  2:17       ` Xin Li
  0 siblings, 1 reply; 49+ messages in thread
From: Chao Gao @ 2025-07-29  1:30 UTC (permalink / raw)
  To: Xin Li
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Zhang Yi Z, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Mon, Jul 28, 2025 at 03:53:14PM -0700, Xin Li wrote:
>On 7/4/2025 1:49 AM, Chao Gao wrote:
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index cca7d6641287..ce10a7e2d3d9 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> @@ -106,6 +106,7 @@
>>   #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>>   #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>>   #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
>> +#define VM_EXIT_LOAD_CET_STATE                  0x10000000
>>   #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>> @@ -119,6 +120,7 @@
>>   #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>>   #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
>>   #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
>> +#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
>>   #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
>> @@ -369,6 +371,9 @@ enum vmcs_field {
>>   	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>>   	GUEST_SYSENTER_ESP              = 0x00006824,
>>   	GUEST_SYSENTER_EIP              = 0x00006826,
>> +	GUEST_S_CET                     = 0x00006828,
>> +	GUEST_SSP                       = 0x0000682a,
>> +	GUEST_INTR_SSP_TABLE            = 0x0000682c,
>>   	HOST_CR0                        = 0x00006c00,
>>   	HOST_CR3                        = 0x00006c02,
>>   	HOST_CR4                        = 0x00006c04,
>> @@ -381,6 +386,9 @@ enum vmcs_field {
>>   	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>>   	HOST_RSP                        = 0x00006c14,
>>   	HOST_RIP                        = 0x00006c16,
>> +	HOST_S_CET                      = 0x00006c18,
>> +	HOST_SSP                        = 0x00006c1a,
>> +	HOST_INTR_SSP_TABLE             = 0x00006c1c
>>   };
>>   /*
>
>A comment not on this patch itself.
>
>Both spaces and tabs are currently used to align columns in
>arch/x86/include/asm/vmx.h.  Can we standardize on one, either spaces or
>tabs?

I'm okay with using tabs or spaces consistently, but doing so will cause a lot
of code churn and make it slightly harder to use git-blame. Let's see what
others think.

>
>Thanks!
>    Xin


* Re: [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits
  2025-07-29  1:30     ` Chao Gao
@ 2025-07-29  2:17       ` Xin Li
  0 siblings, 0 replies; 49+ messages in thread
From: Xin Li @ 2025-07-29  2:17 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Zhang Yi Z, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On 7/28/2025 6:30 PM, Chao Gao wrote:
>> A comment not on this patch itself.
>>
>> Both spaces and tabs are currently used to align columns in
>> arch/x86/include/asm/vmx.h.  Can we standardize on one, either spaces or
>> tabs?
> I'm okay with using tabs or spaces consistently, but doing so will cause a lot
> of code churn and make it slightly harder to use git-blame. Let's see what
> others think.

Agreed, let's see what Sean would say :)



* Re: [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses
  2025-07-29  0:45     ` Chao Gao
@ 2025-07-29 18:19       ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-07-29 18:19 UTC (permalink / raw)
  To: Chao Gao
  Cc: Xin Li, kvm, linux-kernel, x86, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, john.allen, weijiang.yang, minipli,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Tue, Jul 29, 2025, Chao Gao wrote:
> On Mon, Jul 28, 2025 at 03:31:41PM -0700, Xin Li wrote:
> >>   	/* Set L1 segment info according to Intel SDM
> >>   	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 7543dac7ae70..11d84075cd14 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -1929,33 +1929,35 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
> >>   				 __kvm_get_msr);
> >>   }
> >> -int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
> >> +int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index,
> >> +				     u64 *data)
> >
> >I think the extra new line doesn't improve readability, but it's the
> >maintainer's call.
> >
> 
> Sure. Seems "let it poke out" is Sean's preference. I saw he made similar
> requests several times. e.g.,

Depends on the situation.  I'd probably mentally flip a coin in this case.

But what I'd actually do here is choose names that are (a) less verbose and (b)
capture the relationship between the APIs.  Instead of:

  int kvm_emulate_msr_read_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
  int kvm_emulate_msr_write_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
  int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
  int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);

rename to:

  int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
  int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
  int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
  int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);

And then we can do a follow-up patch to solidify the relationship:

--
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 29 Jul 2025 11:13:48 -0700
Subject: [PATCH] KVM: x86: Use double-underscore read/write MSR helpers as
 appropriate

Use the double-underscore helpers for emulating MSR reads and writes in
the no-underscore versions to better capture the relationship between the
two sets of APIs (the double-underscore versions don't honor userspace MSR
filters).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09b106a5afdf..65c787bcfe8b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1932,11 +1932,24 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 				 __kvm_get_msr);
 }
 
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+	return kvm_get_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_GPL(__kvm_emulate_msr_read);
+
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+	return kvm_set_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_GPL(__kvm_emulate_msr_write);
+
 int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
 		return KVM_MSR_RET_FILTERED;
-	return kvm_get_msr_ignored_check(vcpu, index, data, false);
+
+	return __kvm_emulate_msr_read(vcpu, index, data);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_msr_read);
 
@@ -1944,21 +1957,11 @@ int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
 		return KVM_MSR_RET_FILTERED;
-	return kvm_set_msr_ignored_check(vcpu, index, data, false);
+
+	return __kvm_emulate_msr_write(vcpu, index, data);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_msr_write);
 
-int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
-	return kvm_get_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_GPL(__kvm_emulate_msr_read);
-
-int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
-	return kvm_set_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_GPL(__kvm_emulate_msr_write);
 
 static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
 {

base-commit: 1877e7b0749cbaa2d2ba4056eeda93adb373f7d4
--


* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-07-04  8:49 ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
  2025-07-21 15:51   ` Mathias Krause
@ 2025-08-06 20:58   ` John Allen
  2025-08-06 22:47     ` Sean Christopherson
  1 sibling, 1 reply; 49+ messages in thread
From: John Allen @ 2025-08-06 20:58 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-kernel, x86, seanjc, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, weijiang.yang, minipli, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Fri, Jul 04, 2025 at 01:49:50AM -0700, Chao Gao wrote:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 803574920e41..6375695ce285 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5223,6 +5223,10 @@ static __init void svm_set_cpu_caps(void)
>  	kvm_caps.supported_perf_cap = 0;
>  	kvm_caps.supported_xss = 0;
>  
> +	/* KVM doesn't yet support CET virtualization for SVM. */
> +	kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> +	kvm_cpu_cap_clear(X86_FEATURE_IBT);
> +

Since AMD isn't supporting IBT, I'm not sure it makes sense to clear IBT
here, since it doesn't look like we're clearing other features that
aren't supported in hardware. For compatibility, my series just removes
both lines here, but the IBT clearing is probably not needed in this
series.

Thanks,
John


* Re: [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
  2025-08-06 20:58   ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace John Allen
@ 2025-08-06 22:47     ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-08-06 22:47 UTC (permalink / raw)
  To: John Allen
  Cc: Chao Gao, kvm, linux-kernel, x86, pbonzini, dave.hansen,
	rick.p.edgecombe, mlevitsk, weijiang.yang, minipli, xin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Wed, Aug 06, 2025, John Allen wrote:
> On Fri, Jul 04, 2025 at 01:49:50AM -0700, Chao Gao wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 803574920e41..6375695ce285 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5223,6 +5223,10 @@ static __init void svm_set_cpu_caps(void)
> >  	kvm_caps.supported_perf_cap = 0;
> >  	kvm_caps.supported_xss = 0;
> >  
> > +	/* KVM doesn't yet support CET virtualization for SVM. */
> > +	kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> > +	kvm_cpu_cap_clear(X86_FEATURE_IBT);
> > +
> 
> Since AMD isn't supporting IBT, 

Isn't supporting IBT, yet.  :-)

I totally believe that AMD doesn't have any plans to support IBT, but unless
IBT virtualization would Just Work (would it?), we should leave this in, because
being paranoid is basically free. 

> not sure if it makes sense to clear IBT here since it doesn't look like we're
> clearing other features that we don't support in hardware. For compatibility,
> my series just removes both lines here, but the IBT clearing is probably not
> needed in this series.


end of thread, other threads:[~2025-08-06 22:47 UTC | newest]

Thread overview: 49+ messages
2025-07-04  8:49 [PATCH v11 00/23] Enable CET Virtualization Chao Gao
2025-07-04  8:49 ` [PATCH v11 01/23] KVM: x86: Rename kvm_{g,s}et_msr()* to show that they emulate guest accesses Chao Gao
2025-07-24 11:37   ` Huang, Kai
2025-07-24 13:31     ` Sean Christopherson
2025-07-28 22:31   ` Xin Li
2025-07-29  0:45     ` Chao Gao
2025-07-29 18:19       ` Sean Christopherson
2025-07-04  8:49 ` [PATCH v11 02/23] KVM: x86: Add kvm_msr_{read,write}() helpers Chao Gao
2025-07-04  8:49 ` [PATCH v11 03/23] KVM: x86: Manually clear MPX state only on INIT Chao Gao
2025-07-04  8:49 ` [PATCH v11 04/23] KVM: x86: Zero XSTATE components on INIT by iterating over supported features Chao Gao
2025-07-04  8:49 ` [PATCH v11 05/23] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
2025-07-04  8:49 ` [PATCH v11 06/23] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
2025-07-04  8:49 ` [PATCH v11 07/23] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
2025-07-04  8:49 ` [PATCH v11 08/23] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
2025-07-04  8:49 ` [PATCH v11 09/23] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
2025-07-04  8:49 ` [PATCH v11 10/23] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
2025-07-04  8:49 ` [PATCH v11 11/23] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
2025-07-04  8:49 ` [PATCH v11 12/23] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
2025-07-28 22:53   ` Xin Li
2025-07-29  1:30     ` Chao Gao
2025-07-29  2:17       ` Xin Li
2025-07-04  8:49 ` [PATCH v11 13/23] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
2025-07-04  8:49 ` [PATCH v11 14/23] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
2025-07-04  8:49 ` [PATCH v11 15/23] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
2025-07-04  8:49 ` [PATCH v11 16/23] KVM: VMX: Set up interception for CET MSRs Chao Gao
2025-07-04  8:49 ` [PATCH v11 17/23] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
2025-07-04  8:49 ` [PATCH v11 18/23] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
2025-07-04  8:49 ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
2025-07-21 15:51   ` Mathias Krause
2025-07-21 17:45     ` Sean Christopherson
2025-07-22  5:49       ` Mathias Krause
2025-07-22 14:13         ` Sean Christopherson
2025-07-22 21:25         ` [PATCH v2] KVM: VMX: Make CR4.CET a guest owned bit Mathias Krause
2025-07-23  6:24           ` Chao Gao
2025-08-06 20:58   ` [PATCH v11 19/23] KVM: x86: Enable CET virtualization for VMX and advertise to userspace John Allen
2025-08-06 22:47     ` Sean Christopherson
2025-07-04  8:49 ` [PATCH v11 20/23] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
2025-07-04  8:49 ` [PATCH v11 21/23] KVM: nVMX: Enable CET support for nested guest Chao Gao
2025-07-28  6:30   ` Xin Li
2025-07-28  8:42     ` Chao Gao
2025-07-04  8:49 ` [PATCH v11 22/23] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
2025-07-04  8:49 ` [PATCH v11 23/23] KVM: nVMX: Add consistency checks for CET states Chao Gao
2025-07-06 16:51 ` [PATCH v11 00/23] Enable CET Virtualization Xiaoyao Li
2025-07-07  1:32   ` Chao Gao
2025-07-16 20:36     ` John Allen
2025-07-17  7:00       ` Mathias Krause
2025-07-17  7:57         ` Chao Gao
2025-07-21 18:08           ` John Allen
2025-07-21 15:35 ` Mathias Krause
