* [PATCH v14 00/22] Enable CET Virtualization
@ 2025-09-09 9:39 Chao Gao
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
` (22 more replies)
0 siblings, 23 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
The FPU support for CET virtualization has already been merged into 6.17-rc1.
Building on that, this series introduces Intel CET virtualization support for
KVM.
Changes in v14
1. rename the type of guest SSP register to KVM_X86_REG_KVM and add docs
for register IDs in api.rst (Sean, Xiaoyao)
2. update commit message of patch 1
3. use rdmsrq/wrmsrq() instead of rdmsrl/wrmsrl() in patch 6 (Xin)
4. split the introduction of per-guest guest_supported_xss into a
separate patch. (Xiaoyao)
5. make guest FPU and VMCS consistent regarding MSR_IA32_S_CET
6. collect reviews from Xiaoyao.
---
Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two
sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), to
defend against such control-flow subversion attacks.
Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
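As an aside, a user-space thread opts into shadow stack via arch_prctl();
below is a minimal sketch, assuming a CET-capable kernel and CPU (the
constants are copied from asm/prctl.h in case installed headers lack them):

	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef ARCH_SHSTK_ENABLE
	#define ARCH_SHSTK_ENABLE	0x5001	/* from asm/prctl.h */
	#define ARCH_SHSTK_SHSTK	(1ULL << 0)
	#endif

	/* Ask the kernel to enable a user shadow stack for this thread. */
	static int enable_user_shstk(void)
	{
		return syscall(SYS_arch_prctl, ARCH_SHSTK_ENABLE,
			       ARCH_SHSTK_SHSTK);
	}

In practice the dynamic linker typically does this based on the ELF GNU
property bits, so applications rarely call it directly.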
Indirect Branch Tracking (IBT):
IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses
of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
and the next instruction is _not_ an ENDBRANCH, the processor generates a
#CP. The instruction behaves as a NOP on platforms that don't support CET.
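For reference, code built with gcc's -fcf-protection=branch simply gains an
ENDBR64 at every indirect-branch target. The snippet below only illustrates
that placement and is not code from this series:

	/* Build with: gcc -fcf-protection=branch -O2 ibt.c */
	void target(void)	/* compiler emits endbr64 as the first insn */
	{
	}

	int main(void)
	{
		void (*fp)(void) = target;

		fp();	/* would #CP if IBT were active and the target lacked ENDBR64 */
		return 0;
	}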
CET states management
=====================
KVM cooperates with the host kernel's FPU framework to manage guest CET
registers. With the CET supervisor mode state support added in this series,
KVM can save/restore the full set of guest CET XSAVE-managed states.

CET user mode and supervisor mode xstates, i.e., MSR_IA32_{U_CET,PL3_SSP}
and MSR_IA32_PL{0,1,2}_SSP, rely on the host FPU framework to swap guest and
host xstates. Guest CET xstates are saved to the guest FPU area and host CET
xstates are loaded from the task/thread context before the vCPU returns to
userspace, and vice versa on re-entry. See kvm_{load,put}_guest_fpu() for
details.

CET supervisor mode states are grouped into two categories: XSAVE-managed
and non-XSAVE-managed. The former includes MSR_IA32_PL{0,1,2}_SSP and is
controlled by the CET supervisor state bit (bit 12) in IA32_XSS; the latter
consists of MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB.

VMX introduces new VMCS fields, {GUEST,HOST}_{S_CET,SSP,INTR_SSP_TABLE}, to
handle the guest/host non-XSAVE-managed states. When the VMX CET entry/exit
load bits are set, guest/host S_CET, SSP, and the interrupt SSP table
address are loaded from the corresponding fields at VM-Entry/VM-Exit
respectively. With these new fields, such supervisor states require no
additional KVM save/reload actions.
Tests
======
This series has successfully passed the basic CET user shadow stack test
and kernel IBT test in both L1 and L2 guests. The newly added
KVM-unit-tests [2] also passed, and its v11 has been tested with the AMD
CET series by John [3].
For your convenience, you can use my WIP QEMU [1] for testing.
[1]: https://github.com/gaochaointel/qemu-dev qemu-cet
[2]: https://lore.kernel.org/kvm/20250626073459.12990-1-minipli@grsecurity.net/
[3]: https://lore.kernel.org/kvm/aH6CH+x5mCDrvtoz@AUSJOHALLEN.amd.com/
Chao Gao (5):
KVM: x86: Check XSS validity against guest CPUIDs
KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
KVM: nVMX: Add consistency checks for CET states
KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG
Sean Christopherson (2):
KVM: x86: Report XSS as to-be-saved if there are supported features
KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
Yang Weijiang (15):
KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM: x86: Initialize kvm_caps.supported_xss
KVM: x86: Add fault checks for guest CR4.CET setting
KVM: x86: Report KVM supported CET MSRs as to-be-saved
KVM: VMX: Introduce CET VMCS fields and control bits
KVM: x86: Enable guest SSP read/write interface with new uAPIs
KVM: VMX: Emulate read and write to CET MSRs
KVM: x86: Save and reload SSP to/from SMRAM
KVM: VMX: Set up interception for CET MSRs
KVM: VMX: Set host constant supervisor states to VMCS fields
KVM: x86: Don't emulate instructions guarded by CET
KVM: x86: Enable CET virtualization for VMX and advertise to userspace
KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
KVM: nVMX: Prepare for enabling CET support for nested guest
Documentation/virt/kvm/api.rst | 9 +
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/include/asm/vmx.h | 9 +
arch/x86/include/uapi/asm/kvm.h | 29 ++
arch/x86/kvm/cpuid.c | 17 +-
arch/x86/kvm/emulate.c | 46 ++-
arch/x86/kvm/smm.c | 8 +
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/svm/svm.c | 4 +
arch/x86/kvm/vmx/capabilities.h | 9 +
arch/x86/kvm/vmx/nested.c | 163 ++++++++++-
arch/x86/kvm/vmx/nested.h | 5 +
arch/x86/kvm/vmx/vmcs12.c | 6 +
arch/x86/kvm/vmx/vmcs12.h | 14 +-
arch/x86/kvm/vmx/vmx.c | 85 +++++-
arch/x86/kvm/vmx/vmx.h | 9 +-
arch/x86/kvm/x86.c | 264 +++++++++++++++++-
arch/x86/kvm/x86.h | 61 ++++
tools/arch/x86/include/uapi/asm/kvm.h | 29 ++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/x86/get_set_one_reg.c | 30 ++
21 files changed, 764 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/get_set_one_reg.c
--
2.47.3
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:03 ` Xiaoyao Li
` (2 more replies)
2025-09-09 9:39 ` [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
` (21 subsequent siblings)
22 siblings, 3 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
other non-MSR registers through them.
This is in preparation for allowing userspace to read/write the guest SSP
register, which is needed for the upcoming CET virtualization support.
Currently, two register types are supported: KVM_X86_REG_TYPE_MSR and
KVM_X86_REG_TYPE_KVM. All MSRs use the former type; the latter is added for
registers that lack existing KVM uAPIs to access them. The "KVM" in the name
is intentionally vague to give KVM the flexibility to include other potential
registers. More specific names, like "SYNTHETIC" and "SYNTHETIC_MSR", were
considered, but both are confusing and could paint KVM into a corner.
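For illustration, userspace could read an arbitrary MSR through the new
ioctl roughly as follows (a minimal sketch using the macros added below;
vCPU fd setup and error handling are assumed/elided):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	static int read_msr_one_reg(int vcpu_fd, __u32 msr, __u64 *val)
	{
		struct kvm_one_reg reg = {
			.id   = KVM_X86_REG_MSR(msr),
			.addr = (__u64)(unsigned long)val,
		};

		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}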
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/ [1]
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14:
- Rename the group type of guest SSP register to KVM_X86_REG_KVM
- Add docs for id patterns for x86 in api.rst
- Update commit message
---
Documentation/virt/kvm/api.rst | 2 +
arch/x86/include/uapi/asm/kvm.h | 26 +++++++++++
arch/x86/kvm/x86.c | 78 +++++++++++++++++++++++++++++++++
3 files changed, 106 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6aa40ee05a4a..28fc12b46eeb 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2908,6 +2908,8 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
0x9030 0000 0002 <reg:16>
+x86 MSR registers have the following id bit patterns::
+ 0x2030 0002 <msr number:32>
4.69 KVM_GET_ONE_REG
--------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..508b713ca52e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -411,6 +411,32 @@ struct kvm_xcrs {
__u64 padding[16];
};
+#define KVM_X86_REG_TYPE_MSR 2
+#define KVM_X86_REG_TYPE_KVM 3
+
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+ reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
+({ \
+ __u64 type_size = (__u64)type << 32; \
+ \
+ type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
+ type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
+ 0; \
+ type_size; \
+})
+
+#define KVM_X86_REG_ENCODE(type, index) \
+ (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
+
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7ba2cdfdac44..f32d3edfc7b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2254,6 +2254,31 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
return kvm_set_msr_ignored_check(vcpu, index, *data, true);
}
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+ u64 val;
+ int r;
+
+ r = do_get_msr(vcpu, msr, &val);
+ if (r)
+ return r;
+
+ if (put_user(val, value))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+ u64 val;
+
+ if (get_user(val, value))
+ return -EFAULT;
+
+ return do_set_msr(vcpu, msr, &val);
+}
+
#ifdef CONFIG_X86_64
struct pvclock_clock {
int vclock_mode;
@@ -4737,6 +4762,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_MEMORY_FAULT_INFO:
case KVM_CAP_X86_GUEST_MODE:
+ case KVM_CAP_ONE_REG:
r = 1;
break;
case KVM_CAP_PRE_FAULT_MEMORY:
@@ -5915,6 +5941,20 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}
+struct kvm_x86_reg_id {
+ __u32 index;
+ __u8 type;
+ __u8 rsvd1;
+ __u8 rsvd2:4;
+ __u8 size:4;
+ __u8 x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
+{
+ return -EINVAL;
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -6031,6 +6071,44 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
srcu_read_unlock(&vcpu->kvm->srcu, idx);
break;
}
+ case KVM_GET_ONE_REG:
+ case KVM_SET_ONE_REG: {
+ struct kvm_x86_reg_id *id;
+ struct kvm_one_reg reg;
+ u64 __user *value;
+
+ r = -EFAULT;
+ if (copy_from_user(&reg, argp, sizeof(reg)))
+ break;
+
+ r = -EINVAL;
+ if ((reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+ break;
+
+ id = (struct kvm_x86_reg_id *)&reg.id;
+ if (id->rsvd1 || id->rsvd2)
+ break;
+
+ if (id->type == KVM_X86_REG_TYPE_KVM) {
+ r = kvm_translate_kvm_reg(id);
+ if (r)
+ break;
+ }
+
+ r = -EINVAL;
+ if (id->type != KVM_X86_REG_TYPE_MSR)
+ break;
+
+ if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+ break;
+
+ value = u64_to_user_ptr(reg.addr);
+ if (ioctl == KVM_GET_ONE_REG)
+ r = kvm_get_one_msr(vcpu, id->index, value);
+ else
+ r = kvm_set_one_msr(vcpu, id->index, value);
+ break;
+ }
case KVM_TPR_ACCESS_REPORTING: {
struct kvm_tpr_access_ctl tac;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-11 6:52 ` Binbin Wu
2025-09-09 9:39 ` [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs Chao Gao
` (20 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Sean Christopherson <seanjc@google.com>
Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
is non-zero, i.e. KVM supports at least one XSS based feature.
Before the CET virtualization series, guest MSR_IA32_XSS is guaranteed to be
0, i.e., XSAVES/XRSTORS executes in non-root mode with XSS == 0, which is
equivalent in effect to XSAVE/XRSTOR.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/x86.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32d3edfc7b1..47b60f275fd7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -335,7 +335,7 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
MSR_IA32_UMWAIT_CONTROL,
- MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
};
static const u32 msrs_to_save_pmu[] = {
@@ -7470,6 +7470,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
return;
break;
+ case MSR_IA32_XSS:
+ if (!kvm_caps.supported_xss)
+ return;
+ break;
default:
break;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
2025-09-09 9:39 ` [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:22 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
` (19 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
Maintain per-guest valid XSS bits and check XSS validity against them rather
than against KVM capabilities. This prevents setting bits that are supported
by KVM but not supported for a particular guest.
Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
host_initiated check.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14 - new, introduce guest_supported_xss in a separate patch (Xiaoyao)
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/cpuid.c | 12 ++++++++++++
arch/x86/kvm/x86.c | 7 +++----
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0d3cc0fc27af..b2983c830247 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -815,7 +815,6 @@ struct kvm_vcpu_arch {
bool at_instruction_boundary;
bool tpr_access_reporting;
bool xfd_no_write_intercept;
- u64 ia32_xss;
u64 microcode_version;
u64 arch_capabilities;
u64 perf_capabilities;
@@ -876,6 +875,8 @@ struct kvm_vcpu_arch {
u64 xcr0;
u64 guest_supported_xcr0;
+ u64 ia32_xss;
+ u64 guest_supported_xss;
struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index ad6cadf09930..46cf616663e6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+ if (!best)
+ return 0;
+
+ return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
struct kvm_cpuid_entry2 *entry,
unsigned int x86_feature,
@@ -424,6 +435,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
}
vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+ vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47b60f275fd7..6c167117018c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4011,15 +4011,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_XSS:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return 1;
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return KVM_MSR_RET_UNSUPPORTED;
/*
* KVM supports exposing PT to the guest, but does not support
* IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
* XSAVES/XRSTORS to save/restore PT MSRs.
*/
- if (data & ~kvm_caps.supported_xss)
+ if (data & ~vcpu->arch.guest_supported_xss)
return 1;
vcpu->arch.ia32_xss = data;
vcpu->arch.cpuid_dynamic_bits_dirty = true;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (2 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:23 ` Xiaoyao Li
2025-09-11 7:02 ` Binbin Wu
2025-09-09 9:39 ` [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
` (18 subsequent siblings)
22 siblings, 2 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Update CPUID.(EAX=0DH,ECX=1).EBX to reflect the current required xstate size
whenever the guest's XSS MSR is modified.

CPUID.(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
xstate features in (XCR0 | IA32_XSS). The guest can consult this CPUID value
to allocate a sufficiently large xsave buffer.
Note, KVM does not yet support any XSS based features, i.e. supported_xss
is guaranteed to be zero at this time.
Opportunistically skip CPUID updates if XSS value doesn't change.
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/cpuid.c | 3 ++-
arch/x86/kvm/x86.c | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 46cf616663e6..b5f87254ced7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
- best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+ best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+ vcpu->arch.ia32_xss, true);
}
static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c167117018c..bbae3bf405c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4020,6 +4020,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*/
if (data & ~vcpu->arch.guest_supported_xss)
return 1;
+ if (vcpu->arch.ia32_xss == data)
+ break;
vcpu->arch.ia32_xss = data;
vcpu->arch.cpuid_dynamic_bits_dirty = true;
break;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (3 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:36 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
` (17 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Initialize kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if
XSAVES is supported. host_xss contains the host-supported xstate feature
bits used for thread FPU context switching, while KVM_SUPPORTED_XSS includes
all XSS feature bits enabled by KVM. The resulting value represents the
supervisor xstates that are available to guests and are backed by the host
FPU framework for swapping {guest,host} XSAVE-managed registers/MSRs.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/x86.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbae3bf405c7..c15e8c00dc7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -220,6 +220,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
+#define KVM_SUPPORTED_XSS 0
+
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -9789,14 +9791,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
}
+
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+ rdmsrq(MSR_IA32_XSS, kvm_host.xss);
+ kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
+ }
+
kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
rdmsrq_safe(MSR_EFER, &kvm_host.efer);
- if (boot_cpu_has(X86_FEATURE_XSAVES))
- rdmsrq(MSR_IA32_XSS, kvm_host.xss);
-
kvm_init_pmu_capability(ops->pmu_ops);
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (4 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:37 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
` (16 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Sean Christopherson <seanjc@google.com>
Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
to facilitate access to such kind of MSRs.
If MSRs in kvm_caps.supported_xss are passed through to the guest, the guest
MSRs are swapped with the host's before the vCPU exits to userspace and
swapped back after it re-enters the kernel, before the next VM-entry.
Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).
Note that guest_cpuid_has() is not queried as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.
The two helpers are placed here to make explicit that accessing
XSAVE-managed MSRs requires special checks and handling to guarantee the
correctness of reads and writes to those MSRs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14:
- s/rdmsrl/rdmsrq, s/wrmsrl/wrmsrq (Xin)
- return true in is_xstate_managed_msr() for MSR_IA32_S_CET
---
arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.h | 24 ++++++++++++++++++++++++
2 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c15e8c00dc7d..7c0a07be6b64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
struct kvm_x86_ops kvm_x86_ops __read_mostly;
#define KVM_X86_OP(func) \
@@ -4566,6 +4569,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
EXPORT_SYMBOL_GPL(kvm_get_msr_common);
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.
+ */
+static bool is_xstate_managed_msr(u32 index)
+{
+ switch (index) {
+ case MSR_IA32_S_CET:
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* Read or write a bunch of msrs. All parameters are kernel addresses.
*
@@ -4576,11 +4595,26 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
int (*do_msr)(struct kvm_vcpu *vcpu,
unsigned index, u64 *data))
{
+ bool fpu_loaded = false;
int i;
- for (i = 0; i < msrs->nmsrs; ++i)
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ /*
+ * If userspace is accessing one or more XSTATE-managed MSRs,
+ * temporarily load the guest's FPU state so that the guest's
+ * MSR value(s) is resident in hardware, i.e. so that KVM can
+ * get/set the MSR via RDMSR/WRMSR.
+ */
+ if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+ is_xstate_managed_msr(entries[i].index)) {
+ kvm_load_guest_fpu(vcpu);
+ fpu_loaded = true;
+ }
if (do_msr(vcpu, entries[i].index, &entries[i].data))
break;
+ }
+ if (fpu_loaded)
+ kvm_put_guest_fpu(vcpu);
return i;
}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index eb3088684e8a..34afe43579bb 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -701,4 +701,28 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
+/*
+ * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
+ * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
+ * guest FPU should have been loaded already.
+ */
+
+static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+ kvm_fpu_get();
+ rdmsrq(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+ kvm_fpu_get();
+ wrmsrq(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
#endif
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (5 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 9:38 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 08/22] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
` (15 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Check potential faults for CR4.CET setting per Intel SDM requirements.
CET can be enabled if and only if CR0.WP == 1, i.e. setting CR4.CET ==
1 faults if CR0.WP == 0 and setting CR0.WP == 0 fails if CR4.CET == 1.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/x86.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c0a07be6b64..50c192c99a7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1173,6 +1173,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
(is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
return 1;
+ if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+ return 1;
+
kvm_x86_call(set_cr0)(vcpu, cr0);
kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1372,6 +1375,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
}
+ if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+ return 1;
+
kvm_x86_call(set_cr4)(vcpu, cr4);
kvm_post_set_cr4(vcpu, old_cr4, cr4);
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 08/22] KVM: x86: Report KVM supported CET MSRs as to-be-saved
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (6 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 09/22] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
` (14 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Add CET MSRs to the list of MSRs reported to userspace if the feature,
i.e. IBT or SHSTK, associated with the MSRs is supported by KVM.
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/x86.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50c192c99a7e..691f8e68046f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -341,6 +341,10 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_UMWAIT_CONTROL,
MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+ MSR_IA32_U_CET, MSR_IA32_S_CET,
+ MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+ MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
};
static const u32 msrs_to_save_pmu[] = {
@@ -7517,6 +7521,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!kvm_caps.supported_xss)
return;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+ return;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return;
+ break;
default:
break;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 09/22] KVM: VMX: Introduce CET VMCS fields and control bits
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (7 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 08/22] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 10/22] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
` (13 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two
sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), to
defend against such control-flow subversion attacks.
Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
Indirect Branch Tracking (IBT):
IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses
of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
and the next instruction is _not_ an ENDBRANCH, the processor generates a
#CP. The instruction behaves as a NOP on platforms without CET.
Several new CET MSRs are defined to support CET:
MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.
MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.
MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK pointer table, whose
entries are indexed by the IST field of an interrupt gate descriptor.
Two XSAVES state bits are introduced for CET:
IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
Six VMCS fields are introduced for CET:
{HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
{HOST,GUEST}_SSP: Stores current active SSP.
{HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following
VMCS fields at VM-Exit:
HOST_S_CET
HOST_SSP
HOST_INTR_SSP_TABLE
If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following
VMCS fields at VM-Entry:
GUEST_S_CET
GUEST_SSP
GUEST_INTR_SSP_TABLE
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/vmx.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..ce10a7e2d3d9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,7 @@
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_LOAD_CET_STATE 0x10000000
#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
@@ -119,6 +120,7 @@
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_CET_STATE 0x00100000
#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
@@ -369,6 +371,9 @@ enum vmcs_field {
GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
GUEST_SYSENTER_ESP = 0x00006824,
GUEST_SYSENTER_EIP = 0x00006826,
+ GUEST_S_CET = 0x00006828,
+ GUEST_SSP = 0x0000682a,
+ GUEST_INTR_SSP_TABLE = 0x0000682c,
HOST_CR0 = 0x00006c00,
HOST_CR3 = 0x00006c02,
HOST_CR4 = 0x00006c04,
@@ -381,6 +386,9 @@ enum vmcs_field {
HOST_IA32_SYSENTER_EIP = 0x00006c12,
HOST_RSP = 0x00006c14,
HOST_RIP = 0x00006c16,
+ HOST_S_CET = 0x00006c18,
+ HOST_SSP = 0x00006c1a,
+ HOST_INTR_SSP_TABLE = 0x00006c1c
};
/*
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 10/22] KVM: x86: Enable guest SSP read/write interface with new uAPIs
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (8 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 09/22] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
` (12 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Enable a guest shadow stack pointer (SSP) access interface with the new
uAPIs. The guest SSP is a hardware register with a corresponding VMCS field
to save/restore the guest value on VM-{Exit,Entry}. KVM handles SSP as a
synthetic MSR for userspace access.

Use a translation helper to map the SSP synthetic index to a KVM-internal
MSR index so that userspace need not care about KVM's management of
synthetic MSRs, and conflicts are avoided.
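With this in place, userspace can read the guest SSP via the generic ioctl,
e.g. (an illustrative fragment; vcpu_fd setup and headers are assumed):

	__u64 ssp;
	struct kvm_one_reg reg = {
		.id   = KVM_X86_REG_KVM(KVM_REG_GUEST_SSP),
		.addr = (__u64)(unsigned long)&ssp,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
		perror("KVM_GET_ONE_REG");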
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Documentation/virt/kvm/api.rst | 7 +++++++
arch/x86/include/uapi/asm/kvm.h | 3 +++
arch/x86/kvm/x86.c | 10 +++++++++-
arch/x86/kvm/x86.h | 10 ++++++++++
4 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 28fc12b46eeb..2b999408a768 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2911,6 +2911,13 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
x86 MSR registers have the following id bit patterns::
0x2030 0002 <msr number:32>
+Following are the KVM-defined registers for x86:
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x2030 0003 0000 0000 SSP Shadow Stack Pointer
+======================= ========= =============================================
+
4.69 KVM_GET_ONE_REG
--------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 508b713ca52e..8cc79eca34b2 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -437,6 +437,9 @@ struct kvm_xcrs {
#define KVM_X86_REG_KVM(index) \
KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
+
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 691f8e68046f..a6036eab3852 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5999,7 +5999,15 @@ struct kvm_x86_reg_id {
static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
{
- return -EINVAL;
+ switch (reg->index) {
+ case KVM_REG_GUEST_SSP:
+ reg->type = KVM_X86_REG_TYPE_MSR;
+ reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
}
long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 34afe43579bb..cf4f73a95825 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -101,6 +101,16 @@ do { \
#define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
#define KVM_SVM_DEFAULT_PLE_WINDOW 3000
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
+
static inline unsigned int __grow_ple_window(unsigned int val,
unsigned int base, unsigned int modifier, unsigned int max)
{
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (9 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 10/22] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-11 8:05 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 12/22] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
` (11 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Add an emulation interface for CET MSR accesses. The emulation code is
split into a common part and a vendor-specific part. The former performs
common checks for the MSRs, e.g., accessibility and data validity, then
hands the operation off either to the XSAVE-managed MSRs via the helpers or
to the CET VMCS fields.

SSP can only be read via RDSSP; even writing it requires destructive and
potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
around the GUEST_SSP field of the VMCS.
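As an illustration of the common checks, a host-initiated write of a
misaligned SSP value is expected to be rejected (a sketch only; 0x6a7 is
MSR_IA32_PL3_SSP's index, and vcpu_fd setup is assumed):

	__u64 bad_ssp = 0x1002;	/* not 4-byte aligned */
	struct kvm_one_reg reg = {
		.id   = KVM_X86_REG_MSR(0x6a7),	/* MSR_IA32_PL3_SSP */
		.addr = (__u64)(unsigned long)&bad_ssp,
	};

	/* Expected to fail per the alignment check added below. */
	if (!ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg))
		fprintf(stderr, "misaligned SSP write unexpectedly succeeded\n");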
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14:
- Update both the hardware MSR value and the VMCS field when userspace
  writes to MSR_IA32_S_CET. This keeps guest FPU and VMCS always consistent
  regarding MSR_IA32_S_CET.
---
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++
arch/x86/kvm/x86.c | 60 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 23 ++++++++++++++++
3 files changed, 102 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 227b45430ad8..22bd71bebfad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2106,6 +2106,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
+ case MSR_IA32_S_CET:
+ msr_info->data = vmcs_readl(GUEST_S_CET);
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ msr_info->data = vmcs_readl(GUEST_SSP);
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmx_guest_debugctl_read();
break;
@@ -2424,6 +2433,16 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+ case MSR_IA32_S_CET:
+ vmcs_writel(GUEST_S_CET, data);
+ kvm_set_xstate_msr(vcpu, msr_info);
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ vmcs_writel(GUEST_SSP, data);
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+ break;
case MSR_IA32_PERF_CAPABILITIES:
if (data & PMU_CAP_LBR_FMT) {
if ((data & PMU_CAP_LBR_FMT) !=
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6036eab3852..79861b7ad44d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1886,6 +1886,44 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
data = (u32)data;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (!kvm_is_valid_u_s_cet(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ /*
+ * Note that the MSR emulation here is flawed when a vCPU
+ * doesn't support the Intel 64 architecture. The expected
+ * architectural behavior in this case is that the upper 32
+ * bits do not exist and should always read '0'. However,
+ * because the actual hardware on which the virtual CPU is
+ * running does support Intel 64, XRSTORS/XSAVES in the
+ * guest could observe behavior that violates the
+ * architecture. Intercepting XRSTORS/XSAVES for this
+ * special case isn't deemed worthwhile.
+ */
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ /*
+ * MSR_IA32_INT_SSP_TAB is not present on processors that do
+ * not support Intel 64 architecture.
+ */
+ if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+ if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+ return 1;
+ break;
}
msr.data = data;
@@ -1930,6 +1968,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
!guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
return 1;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
}
msr.index = index;
@@ -4220,6 +4272,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_set_xstate_msr(vcpu, msr_info);
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4569,6 +4625,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vcpu->arch.guest_fpu.xfd_err;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_get_xstate_msr(vcpu, msr_info);
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index cf4f73a95825..95d2a82a4674 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -735,4 +735,27 @@ static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
kvm_fpu_put();
}
+#define CET_US_RESERVED_BITS GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+ if (data & CET_US_RESERVED_BITS)
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_US_SHSTK_MASK_BITS))
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_US_IBT_MASK_BITS))
+ return false;
+ if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+ return false;
+ /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+ if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+ return false;
+
+ return true;
+}
#endif
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 12/22] KVM: x86: Save and reload SSP to/from SMRAM
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (10 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 13/22] KVM: VMX: Set up interception for CET MSRs Chao Gao
` (10 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates the
architectural behavior when a guest enters/leaves SMM mode, i.e., registers
are saved to SMRAM on SMM entry and reloaded from it on SMM exit. Per the
SDM, SSP is one such register on 64-bit architectures, so add support for
it.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/smm.c | 8 ++++++++
arch/x86/kvm/smm.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 5dd8a1646800..b0b14ba37f9a 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -269,6 +269,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
}
#endif
@@ -558,6 +562,10 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
ctxt->interruptibility = (u8)smstate->int_shadow;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+ return X86EMUL_UNHANDLEABLE;
+
return X86EMUL_CONTINUE;
}
#endif
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index 551703fbe200..db3c88f16138 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
u32 smbase;
u32 reserved4[5];
- /* ssp and svm_* fields below are not implemented by KVM */
u64 ssp;
+ /* svm_* fields below are not implemented by KVM */
u64 svm_guest_pat;
u64 svm_host_efer;
u64 svm_host_cr4;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 13/22] KVM: VMX: Set up interception for CET MSRs
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (11 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 12/22] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
` (9 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Enable/disable interception of CET MSRs based on the associated feature
configuration.
Pass through CET MSRs that are managed by XSAVE, as they cannot be
intercepted without also intercepting XSAVE. However, intercepting XSAVE
would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
Note, this MSR design introduces an architectural limitation on SHSTK and
IBT control for the guest, i.e., when SHSTK is exposed, IBT is also
available to the guest from an architectural perspective, since IBT relies
on a subset of the SHSTK-related MSRs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22bd71bebfad..70f5a9e05cec 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4102,6 +4102,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
{
+ bool intercept;
+
if (!cpu_has_vmx_msr_bitmap())
return;
@@ -4147,6 +4149,23 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
+ }
+
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+ intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
+ }
+
/*
* x2APIC and LBR MSR intercepts are modified on-demand and cannot be
* filtered by userspace.
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (12 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 13/22] KVM: VMX: Set up interception for CET MSRs Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-12 22:04 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
` (8 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Save constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} fields
explicitly.
Kernel IBT is supported and the MSR_IA32_S_CET setting is static post-boot
(the exception is the BIOS call case, but a vCPU thread never crosses
that), so KVM doesn't need to refresh the HOST_S_CET field before every
VM-Entry/VM-Exit sequence.
Host supervisor shadow stack is not enabled for now and SSP is not
accessible to kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled for
CPL3, SSP is reloaded from IA32_PL3_SSP before the kernel exits to
userspace. See SDM Vol 2A/B Chapters 3/4 for SYSCALL/SYSRET/SYSENTER/
SYSEXIT/RDSSP/CALL etc.
Prevent KVM module loading if host supervisor shadow stack, i.e. SHSTK_EN
in MSR_IA32_S_CET, is set, as KVM cannot co-exist with it correctly.
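For context, a sketch of the MSR_IA32_S_CET control bits that the module
load check relies on (bit names as defined in the kernel's
arch/x86/include/asm/msr-index.h; shown here for reference only):
  #define CET_SHSTK_EN   BIT_ULL(0)   /* shadow stacks enabled at this CPL */
  #define CET_WRSS_EN    BIT_ULL(1)   /* WRSS{D,Q} instructions enabled */
  #define CET_ENDBR_EN   BIT_ULL(2)   /* indirect branch tracking enabled */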
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/capabilities.h | 4 ++++
arch/x86/kvm/vmx/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 12 ++++++++++++
arch/x86/kvm/x86.h | 1 +
4 files changed, 32 insertions(+)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 5316c27f6099..7d290b2cb0f4 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -103,6 +103,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
}
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
static inline bool cpu_has_vmx_mpx(void)
{
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 70f5a9e05cec..3430a17ecd23 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4321,6 +4321,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
if (cpu_has_load_ia32_efer())
vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
+
+ /*
+ * Supervisor shadow stack is not enabled on the host side, i.e.,
+ * the host IA32_S_CET.SHSTK_EN bit is guaranteed to be 0 now. Per
+ * the SDM description (RDSSP instruction), SSP is not readable in
+ * CPL0, so resetting the two registers to 0 at VM-Exit does no
+ * harm to kernel execution. When the execution flow exits to
+ * userspace, SSP is reloaded from IA32_PL3_SSP. See SDM Vol.2A/B
+ * Chapters 3 and 4 for details.
+ */
+ if (cpu_has_load_cet_ctrl()) {
+ vmcs_writel(HOST_S_CET, kvm_host.s_cet);
+ vmcs_writel(HOST_SSP, 0);
+ vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+ }
}
void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79861b7ad44d..d67aef261638 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9890,6 +9890,18 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
return -EIO;
}
+ if (boot_cpu_has(X86_FEATURE_SHSTK)) {
+ rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
+ /*
+ * Linux doesn't yet support supervisor shadow stacks (SSS), so
+ * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+ * clobber the host values. Yell and refuse to load if SSS is
+ * unexpectedly enabled, e.g. to avoid crashing the host.
+ */
+ if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
+ return -EIO;
+ }
+
memset(&kvm_caps, 0, sizeof(kvm_caps));
x86_emulator_cache = kvm_alloc_emulator_cache();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 95d2a82a4674..3da60b046ce8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,7 @@ struct kvm_host_values {
u64 efer;
u64 xcr0;
u64 xss;
+ u64 s_cet;
u64 arch_capabilities;
};
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (13 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-11 9:18 ` Xiaoyao Li
2025-09-12 14:42 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 16/22] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
` (7 subsequent siblings)
22 siblings, 2 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Don't emulate branch instructions, e.g., CALL/RET/JMP etc., when CET is
active in the guest; return KVM_INTERNAL_ERROR_EMULATION to userspace to
handle it.
KVM doesn't emulate the CPU's CET checks while emulating guest
instructions; instead, it stops emulation upon detecting that the
instruction being processed is CET-protected. By doing so, it avoids
generating bogus #CP exceptions in the guest and prevents CET-protected
execution flows from being subverted from the guest side.
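A rough sketch of how a VMM run loop might observe this bail-out
(hypothetical userspace fragment; vcpu_fd and mmap_size, obtained via
KVM_GET_VCPU_MMAP_SIZE, are assumed to be set up in the usual way):
  struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, vcpu_fd, 0);

  if (!ioctl(vcpu_fd, KVM_RUN, 0) &&
      run->exit_reason == KVM_EXIT_INTERNAL_ERROR &&
      run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
          /*
           * Emulation was refused, e.g., a CET-protected branch hit
           * the emulator; the VMM cannot safely resume the guest.
           */
          fprintf(stderr, "emulation failure, stopping guest\n");
  }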
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/emulate.c | 46 ++++++++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 542d3664afa3..97a4d1e69583 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,8 @@
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57) /* Instruction protected by Shadow Stack. */
+#define IndirBrnTrk ((u64)1 << 58) /* Instruction protected by IBT. */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -4068,9 +4070,11 @@ static const struct opcode group4[] = {
static const struct opcode group5[] = {
F(DstMem | SrcNone | Lock, em_inc),
F(DstMem | SrcNone | Lock, em_dec),
- I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
- I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
- I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
+ I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk,
+ em_call_near_abs),
+ I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk,
+ em_call_far),
+ I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
};
@@ -4332,11 +4336,11 @@ static const struct opcode opcode_table[256] = {
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
I(Stack | IsBranch, em_leave),
- I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
- I(ImplicitOps | IsBranch, em_ret_far),
- D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+ I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+ I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+ D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
D(ImplicitOps | No64 | IsBranch),
- II(ImplicitOps | IsBranch, em_iret, iret),
+ II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
/* 0xD0 - 0xD7 */
G(Src2One | ByteOp, group2), G(Src2One, group2),
G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4352,7 +4356,7 @@ static const struct opcode opcode_table[256] = {
I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
/* 0xE8 - 0xEF */
- I(SrcImm | NearBranch | IsBranch, em_call),
+ I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
D(SrcImm | ImplicitOps | NearBranch | IsBranch),
I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4371,7 +4375,8 @@ static const struct opcode opcode_table[256] = {
static const struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
G(0, group6), GD(0, &group7), N, N,
- N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+ N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
+ em_syscall),
II(ImplicitOps | Priv, em_clts, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4402,8 +4407,9 @@ static const struct opcode twobyte_table[256] = {
IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
II(ImplicitOps | Priv, em_rdmsr, rdmsr),
IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
- I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
- I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+ I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
+ em_sysenter),
+ I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
@@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->d == 0)
return EMULATION_FAILED;
+ if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+ u64 u_cet, s_cet;
+ bool stop_em;
+
+ if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
+ ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
+ return EMULATION_FAILED;
+
+ stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
+ (opcode.flags & ShadowStack);
+
+ stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
+ (opcode.flags & IndirBrnTrk);
+
+ if (stop_em)
+ return EMULATION_FAILED;
+ }
+
ctxt->execute = opcode.u.execute;
if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 16/22] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (14 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 17/22] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
` (6 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Expose CET features to the guest if KVM and the host can support them;
clear the CPUID feature bits if they cannot.
Set the CPUID feature bits so that CET features are available in guest
CPUID. Add CR4.CET bit support in order to allow the guest to set the
CET master control bit.
Disable KVM's CET feature if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET.
Set the CET load-bits in the VM_ENTRY/VM_EXIT control fields to keep
guest CET xstates isolated from the host's.
On platforms with VMX_BASIC[bit56] == 0, injecting a #CP at VM-entry with
an error code will fail, while with VMX_BASIC[bit56] == 1, #CP injection
with or without an error code is allowed. Disable the CET feature bits if
the MSR bit is cleared so that a nested VMM can inject #CP if and only if
VMX_BASIC[bit56] == 1.
Don't expose the CET feature if either of the {U,S}_CET xstate bits is
cleared in host XSS or if XSAVES isn't supported.
CET MSRs are reset to 0 after RESET, power-up and INIT; clear the guest
CET xsave-area fields so that guest CET MSRs are reset to 0 after those
events.
Meanwhile, explicitly disable SHSTK and IBT for SVM because CET KVM
enabling for SVM is not ready.
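For completeness, a userspace-side sketch of how the new bits surface via
KVM_GET_SUPPORTED_CPUID (hypothetical helper; SHSTK is enumerated in
CPUID.(EAX=7,ECX=0):ECX[7] and IBT in CPUID.(EAX=7,ECX=0):EDX[20]):
  /* Returns true if KVM advertises SHSTK or IBT. */
  static bool kvm_advertises_cet(const struct kvm_cpuid2 *cpuid)
  {
          for (unsigned int i = 0; i < cpuid->nent; i++) {
                  const struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

                  if (e->function == 7 && e->index == 0)
                          return (e->ecx & (1u << 7)) ||   /* SHSTK */
                                 (e->edx & (1u << 20));    /* IBT */
          }
          return false;
  }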
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/cpuid.c | 2 ++
arch/x86/kvm/svm/svm.c | 4 ++++
arch/x86/kvm/vmx/capabilities.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 30 +++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 ++++--
arch/x86/kvm/x86.c | 22 +++++++++++++++++++---
arch/x86/kvm/x86.h | 3 +++
9 files changed, 68 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2983c830247..e947204b7f21 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -142,7 +142,7 @@
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
| X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
- | X86_CR4_LAM_SUP))
+ | X86_CR4_LAM_SUP | X86_CR4_CET))
#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ce10a7e2d3d9..c85c50019523 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -134,6 +134,7 @@
#define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
#define VMX_BASIC_INOUT BIT_ULL(54)
#define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
{
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b5f87254ced7..ee05b876c656 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -944,6 +944,7 @@ void kvm_set_cpu_caps(void)
VENDOR_F(WAITPKG),
F(SGX_LC),
F(BUS_LOCK_DETECT),
+ X86_64_F(SHSTK),
);
/*
@@ -970,6 +971,7 @@ void kvm_set_cpu_caps(void)
F(AMX_INT8),
F(AMX_BF16),
F(FLUSH_L1D),
+ F(IBT),
);
if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cb4f81be0024..e4af4907c7d8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5224,6 +5224,10 @@ static __init void svm_set_cpu_caps(void)
kvm_caps.supported_perf_cap = 0;
kvm_caps.supported_xss = 0;
+ /* KVM doesn't yet support CET virtualization for SVM. */
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+
/* CPUID 0x80000001 and 0x8000000A (SVM features) */
if (nested) {
kvm_cpu_cap_set(X86_FEATURE_SVM);
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 7d290b2cb0f4..47b0dec8665a 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -76,6 +76,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
return vmcs_config.basic & VMX_BASIC_INOUT;
}
+static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
+{
+ return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
static inline bool cpu_has_virtual_nmis(void)
{
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3430a17ecd23..820a2d1f3bd7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2616,6 +2616,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
{ VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
{ VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
{ VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
+ { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
};
memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -4883,6 +4884,14 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ vmcs_writel(GUEST_SSP, 0);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
+ }
+ if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+ kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ vmcs_writel(GUEST_S_CET, 0);
+
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
vpid_sync_context(vmx->vpid);
@@ -6350,6 +6359,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
+ if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+ pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+ vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+ vmcs_readl(GUEST_INTR_SSP_TABLE));
pr_err("*** Host State ***\n");
pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6380,6 +6393,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+ if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+ pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+ vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+ vmcs_readl(HOST_INTR_SSP_TABLE));
pr_err("*** Control State ***\n");
pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
@@ -7964,7 +7981,6 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_UMIP);
/* CPUID 0xD.1 */
- kvm_caps.supported_xss = 0;
if (!cpu_has_vmx_xsaves())
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
@@ -7976,6 +7992,18 @@ static __init void vmx_set_cpu_caps(void)
if (cpu_has_vmx_waitpkg())
kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+ /*
+ * Disable CET if unrestricted_guest is unsupported, as KVM doesn't
+ * enforce CET HW behaviors in the emulator. On platforms with
+ * VMX_BASIC[bit56] == 0, injecting #CP at VM-entry with an error
+ * code fails, so disable CET in that case too.
+ */
+ if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+ !cpu_has_vmx_basic_no_hw_errcode()) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ }
}
static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 24d65dac5e89..08a9a0075404 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -484,7 +484,8 @@ static inline u8 vmx_get_rvi(void)
VM_ENTRY_LOAD_IA32_EFER | \
VM_ENTRY_LOAD_BNDCFGS | \
VM_ENTRY_PT_CONCEAL_PIP | \
- VM_ENTRY_LOAD_IA32_RTIT_CTL)
+ VM_ENTRY_LOAD_IA32_RTIT_CTL | \
+ VM_ENTRY_LOAD_CET_STATE)
#define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
(VM_EXIT_SAVE_DEBUG_CONTROLS | \
@@ -506,7 +507,8 @@ static inline u8 vmx_get_rvi(void)
VM_EXIT_LOAD_IA32_EFER | \
VM_EXIT_CLEAR_BNDCFGS | \
VM_EXIT_PT_CONCEAL_PIP | \
- VM_EXIT_CLEAR_IA32_RTIT_CTL)
+ VM_EXIT_CLEAR_IA32_RTIT_CTL | \
+ VM_EXIT_LOAD_CET_STATE)
#define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
(PIN_BASED_EXT_INTR_MASK | \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d67aef261638..6f64a3355274 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -223,7 +223,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
-#define KVM_SUPPORTED_XSS 0
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER | \
+ XFEATURE_MASK_CET_KERNEL)
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -9988,6 +9989,20 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
kvm_caps.supported_xss = 0;
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+ XFEATURE_MASK_CET_KERNEL);
+
+ if ((kvm_caps.supported_xss & (XFEATURE_MASK_CET_USER |
+ XFEATURE_MASK_CET_KERNEL)) !=
+ (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ kvm_caps.supported_xss &= ~(XFEATURE_MASK_CET_USER |
+ XFEATURE_MASK_CET_KERNEL);
+ }
+
if (kvm_caps.has_tsc_control) {
/*
* Make sure the user can only configure tsc_khz values that
@@ -12643,10 +12658,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
/*
* On INIT, only select XSTATE components are zeroed, most components
* are unchanged. Currently, the only components that are zeroed and
- * supported by KVM are MPX related.
+ * supported by KVM are MPX and CET related.
*/
xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
- (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+ (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
+ XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL);
if (!xfeatures_mask)
return;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 3da60b046ce8..728e01781ae8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -681,6 +681,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
__reserved_bits |= X86_CR4_PCIDE; \
if (!__cpu_has(__c, X86_FEATURE_LAM)) \
__reserved_bits |= X86_CR4_LAM_SUP; \
+ if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
+ !__cpu_has(__c, X86_FEATURE_IBT)) \
+ __reserved_bits |= X86_CR4_CET; \
__reserved_bits; \
})
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 17/22] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (15 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 16/22] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 18/22] KVM: nVMX: Prepare for enabling CET support for nested guest Chao Gao
` (5 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Per the SDM (Vol.3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector"
Modify the has_error_code check performed before injecting events into the
nested guest: delivering an error code is never allowed when the guest is
in real mode or the event is not a hardware exception, and the vector-based
consistency check is enforced only when the platform doesn't enumerate
bit 56 in VMX_BASIC. In all other cases the check is skipped, making the
logic consistent with the SDM.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 28 +++++++++++++++++++---------
arch/x86/kvm/vmx/nested.h | 5 +++++
2 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2156c9a854f4..14f9822b611d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1272,9 +1272,10 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
{
const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
VMX_BASIC_INOUT |
- VMX_BASIC_TRUE_CTLS;
+ VMX_BASIC_TRUE_CTLS |
+ VMX_BASIC_NO_HW_ERROR_CODE_CC;
- const u64 reserved_bits = GENMASK_ULL(63, 56) |
+ const u64 reserved_bits = GENMASK_ULL(63, 57) |
GENMASK_ULL(47, 45) |
BIT_ULL(31);
@@ -2949,7 +2950,6 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
- bool should_have_error_code;
bool urg = nested_cpu_has2(vmcs12,
SECONDARY_EXEC_UNRESTRICTED_GUEST);
bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
@@ -2966,12 +2966,20 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
return -EINVAL;
- /* VM-entry interruption-info field: deliver error code */
- should_have_error_code =
- intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
- x86_exception_has_error_code(vector);
- if (CC(has_error_code != should_have_error_code))
- return -EINVAL;
+ /*
+ * Cannot deliver error code in real mode or if the interrupt
+ * type is not hardware exception. For other cases, do the
+ * consistency check only if the vCPU doesn't enumerate
+ * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+ */
+ if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+ if (CC(has_error_code))
+ return -EINVAL;
+ } else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+ if (CC(has_error_code !=
+ x86_exception_has_error_code(vector)))
+ return -EINVAL;
+ }
/* VM-entry exception error code */
if (CC(has_error_code &&
@@ -7214,6 +7222,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
msrs->basic |= VMX_BASIC_TRUE_CTLS;
if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
+ if (cpu_has_vmx_basic_no_hw_errcode())
+ msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
}
static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 6eedcfc91070..983484d42ebf 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -309,6 +309,11 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
__kvm_is_valid_cr4(vcpu, val);
}
+static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
+{
+ return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
/* No difference in the restrictions on guest and host CR4 in VMX operation. */
#define nested_guest_cr4_valid nested_cr4_valid
#define nested_host_cr4_valid nested_cr4_valid
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 18/22] KVM: nVMX: Prepare for enabling CET support for nested guest
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (16 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 17/22] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 19/22] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
` (4 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
From: Yang Weijiang <weijiang.yang@intel.com>
Set up CET MSRs, the related VM_ENTRY/EXIT control bits, and the fixed CR4
setting to enable CET for nested VMs.
vmcs12 and vmcs02 need to be synced when L2 exits to L1 and when L1 wants
to resume L2, so that each observes the correct CET state.
Please note that consistency checks on CET state during VM-entry will be
added later, to prevent this patch from becoming too large. Advertising
the new CET VM_ENTRY/EXIT control bits is also deferred until after the
consistency checks are added.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 77 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmcs12.c | 6 +++
arch/x86/kvm/vmx/vmcs12.h | 14 ++++++-
arch/x86/kvm/vmx/vmx.c | 2 +
arch/x86/kvm/vmx/vmx.h | 3 ++
5 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 14f9822b611d..51d69f368689 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -721,6 +721,24 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_MPERF, MSR_TYPE_R);
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_U_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_S_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
kvm_vcpu_unmap(vcpu, &map);
vmx->nested.force_msr_bitmap_recalc = false;
@@ -2521,6 +2539,32 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
}
}
+static void vmcs_read_cet_state(struct kvm_vcpu *vcpu, u64 *s_cet,
+ u64 *ssp, u64 *ssp_tbl)
+{
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ *s_cet = vmcs_readl(GUEST_S_CET);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+ *ssp = vmcs_readl(GUEST_SSP);
+ *ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ }
+}
+
+static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
+ u64 ssp, u64 ssp_tbl)
+{
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ vmcs_writel(GUEST_S_CET, s_cet);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+ vmcs_writel(GUEST_SSP, ssp);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+ }
+}
+
static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
{
struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
@@ -2637,6 +2681,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+ vmcs_write_cet_state(&vmx->vcpu, vmcs12->guest_s_cet,
+ vmcs12->guest_ssp, vmcs12->guest_ssp_tbl);
+
set_cr4_guest_host_mask(vmx);
}
@@ -2676,6 +2724,13 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
}
+
+ if (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+ vmcs_write_cet_state(vcpu, vmx->nested.pre_vmenter_s_cet,
+ vmx->nested.pre_vmenter_ssp,
+ vmx->nested.pre_vmenter_ssp_tbl);
+
if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
@@ -3552,6 +3607,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+ if (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+ vmcs_read_cet_state(vcpu, &vmx->nested.pre_vmenter_s_cet,
+ &vmx->nested.pre_vmenter_ssp,
+ &vmx->nested.pre_vmenter_ssp_tbl);
+
/*
* Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
* nested early checks are disabled. In the event of a "late" VM-Fail,
@@ -4635,6 +4696,10 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
vmcs12->guest_ia32_efer = vcpu->arch.efer;
+
+ vmcs_read_cet_state(&vmx->vcpu, &vmcs12->guest_s_cet,
+ &vmcs12->guest_ssp,
+ &vmcs12->guest_ssp_tbl);
}
/*
@@ -4760,6 +4825,18 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
vmcs_write64(GUEST_BNDCFGS, 0);
+ /*
+ * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set;
+ * otherwise CET state should be retained across VM-exit, i.e.,
+ * guest values should be propagated from vmcs12 to vmcs01.
+ */
+ if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
+ vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+ vmcs12->host_ssp_tbl);
+ else
+ vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+ vmcs12->guest_ssp_tbl);
+
if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
vcpu->arch.pat = vmcs12->host_ia32_pat;
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+ FIELD(GUEST_S_CET, guest_s_cet),
+ FIELD(GUEST_SSP, guest_ssp),
+ FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
FIELD(HOST_CR0, host_cr0),
FIELD(HOST_CR3, host_cr3),
FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
FIELD(HOST_RSP, host_rsp),
FIELD(HOST_RIP, host_rip),
+ FIELD(HOST_S_CET, host_s_cet),
+ FIELD(HOST_SSP, host_ssp),
+ FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
};
const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 56fd150a6f24..4ad6b16525b9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
natural_width host_ia32_sysenter_eip;
natural_width host_rsp;
natural_width host_rip;
- natural_width paddingl[8]; /* room for future expansion */
+ natural_width host_s_cet;
+ natural_width host_ssp;
+ natural_width host_ssp_tbl;
+ natural_width guest_s_cet;
+ natural_width guest_ssp;
+ natural_width guest_ssp_tbl;
+ natural_width paddingl[2]; /* room for future expansion */
u32 pin_based_vm_exec_control;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
@@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
CHECK_OFFSET(host_ia32_sysenter_eip, 656);
CHECK_OFFSET(host_rsp, 664);
CHECK_OFFSET(host_rip, 672);
+ CHECK_OFFSET(host_s_cet, 680);
+ CHECK_OFFSET(host_ssp, 688);
+ CHECK_OFFSET(host_ssp_tbl, 696);
+ CHECK_OFFSET(guest_s_cet, 704);
+ CHECK_OFFSET(guest_ssp, 712);
+ CHECK_OFFSET(guest_ssp_tbl, 720);
CHECK_OFFSET(pin_based_vm_exec_control, 744);
CHECK_OFFSET(cpu_based_vm_exec_control, 748);
CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 820a2d1f3bd7..92daf63c9487 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7750,6 +7750,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+ cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
+ cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));
entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM));
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 08a9a0075404..ecfdba666465 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -181,6 +181,9 @@ struct nested_vmx {
*/
u64 pre_vmenter_debugctl;
u64 pre_vmenter_bndcfgs;
+ u64 pre_vmenter_s_cet;
+ u64 pre_vmenter_ssp;
+ u64 pre_vmenter_ssp_tbl;
/* to migrate it to L1 if L2 writes to L1's CR8 directly */
int l1_tpr_threshold;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 19/22] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (17 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 18/22] KVM: nVMX: Prepare for enabling CET support for nested guest Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 20/22] KVM: nVMX: Add consistency checks for CET states Chao Gao
` (3 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
Add consistency checks for CR4.CET and CR0.WP in the guest-state and
host-state areas of vmcs12. This ensures that configurations with CR4.CET
set and CR0.WP clear result in VM-entry failure, aligning with
architectural behavior.
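The architectural rule being enforced, in sketch form (hypothetical helper,
not part of the patch; X86_CR0_WP and X86_CR4_CET are the standard kernel
bit masks):
  /* CR4.CET == 1 is legal only if CR0.WP == 1. */
  static bool cet_cr_state_valid(unsigned long cr0, unsigned long cr4)
  {
          return !(cr4 & X86_CR4_CET) || (cr0 & X86_CR0_WP);
  }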
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 51d69f368689..a73f38d7eea1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3111,6 +3111,9 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
return -EINVAL;
+ if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
+ return -EINVAL;
+
if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
return -EINVAL;
@@ -3225,6 +3228,9 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
return -EINVAL;
+ if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+ return -EINVAL;
+
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
(CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 20/22] KVM: nVMX: Add consistency checks for CET states
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (18 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 19/22] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 21/22] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Chao Gao
` (2 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
Introduce consistency checks for CET states during nested VM-entry.
A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET states during VM-entry as documented in SDM
Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
VM-entry to emulate the architectural behavior.
In summary, there are three kinds of checks on guest/host CET states
during VM-entry:
A. Checks applied to both guest states and host states:
* The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
and 11 (TRACKER) cannot both be set.
* SSP must not have bits 1:0 set, i.e., it must be 4-byte aligned.
* The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.
B. Checks applied to host states only:
* IA32_S_CET and SSP must be canonical if the CPU enters 64-bit mode
after VM-exit. Otherwise, IA32_S_CET and SSP must have their upper 32
bits cleared.
C. Checks applied to guest states only:
* IA32_S_CET and SSP are not required to be canonical (i.e., bits 63:N-1
need not be identical, where N is the CPU's maximum linear-address
width), but bits 63:N of SSP must be identical; see the sketch below.
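A small illustrative helper for the guest SSP rule above (a sketch, not
part of the patch): requiring bits 63:N to be identical is the same as
checking canonicality for a linear-address width of N + 1, which is how
the hunk below implements it via __is_canonical_address():
  /* True if bits 63:N of @ssp are identical (N = linear-address width). */
  static bool ssp_high_bits_identical(u64 ssp, unsigned int n)
  {
          /* canonical for width n+1 <=> sign-extended from bit n */
          return __is_canonical_address(ssp, n + 1);
  }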
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 47 +++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a73f38d7eea1..edb3b877a0f6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3101,6 +3101,17 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
return !__is_canonical_address(la, l1_address_bits_on_exit);
}
+static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
+{
+ if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
+ return false;
+
+ if (is_noncanonical_msr_address(ssp_tbl, vcpu))
+ return false;
+
+ return true;
+}
+
static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
@@ -3170,6 +3181,26 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
return -EINVAL;
}
+ if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+ if (CC(!is_valid_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+ vmcs12->host_ssp_tbl)))
+ return -EINVAL;
+
+ /*
+ * IA32_S_CET and SSP must be canonical if the host will
+ * enter 64-bit mode after VM-exit; otherwise, higher
+ * 32-bits must be all 0s.
+ */
+ if (ia32e) {
+ if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+ CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+ return -EINVAL;
+ } else {
+ if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+ return -EINVAL;
+ }
+ }
+
return 0;
}
@@ -3280,6 +3311,22 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
return -EINVAL;
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+ if (CC(!is_valid_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+ vmcs12->guest_ssp_tbl)))
+ return -EINVAL;
+
+ /*
+ * Guest SSP must have 63:N bits identical, rather than
+ * be canonical (i.e., 63:N-1 bits identical), where N is
+ * the CPU's maximum linear-address width. Similar to
+ * is_noncanonical_msr_address(), use the host's
+ * linear-address width.
+ */
+ if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1)))
+ return -EINVAL;
+ }
+
if (nested_check_guest_non_reg_state(vmcs12))
return -EINVAL;
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 21/22] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (19 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 20/22] KVM: nVMX: Add consistency checks for CET states Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-09 9:39 ` [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG Chao Gao
2025-09-09 9:52 ` [PATCH v14 00/22] Enable CET Virtualization Chao Gao
22 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
Advertise the new VM-Entry/Exit control bits now that all nested support
for CET virtualization, including the consistency checks, is in place.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/nested.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index edb3b877a0f6..d7e2fb30fc1a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7176,7 +7176,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
VM_EXIT_HOST_ADDR_SPACE_SIZE |
#endif
VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
- VM_EXIT_CLEAR_BNDCFGS;
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
msrs->exit_ctls_high |=
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -7198,7 +7198,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
#endif
- VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_LOAD_CET_STATE;
msrs->entry_ctls_high |=
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (20 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 21/22] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Chao Gao
@ 2025-09-09 9:39 ` Chao Gao
2025-09-10 18:06 ` Sean Christopherson
2025-09-09 9:52 ` [PATCH v14 00/22] Enable CET Virtualization Chao Gao
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:39 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin, xiaoyao.li
Add tests for the newly added KVM_{GET,SET}_ONE_REG support on x86. Verify
that the new ioctls can read and write both real MSRs and KVM-defined
registers.
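For reference, the raw ioctl sequence that the selftest helpers wrap (a
sketch using the generic KVM_{GET,SET}_ONE_REG ABI; error handling
omitted):
  u64 data;
  struct kvm_one_reg reg = {
          .id   = KVM_X86_REG_KVM(KVM_REG_GUEST_SSP),
          .addr = (u64)&data,
  };

  ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);  /* read the guest SSP */
  ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);  /* write it back */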
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
tools/arch/x86/include/uapi/asm/kvm.h | 29 ++++++++++++++++++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/x86/get_set_one_reg.c | 30 +++++++++++++++++++
3 files changed, 60 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/get_set_one_reg.c
diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h
index 6f3499507c5e..59ac0b46ebcc 100644
--- a/tools/arch/x86/include/uapi/asm/kvm.h
+++ b/tools/arch/x86/include/uapi/asm/kvm.h
@@ -411,6 +411,35 @@ struct kvm_xcrs {
__u64 padding[16];
};
+#define KVM_X86_REG_TYPE_MSR 2
+#define KVM_X86_REG_TYPE_KVM 3
+
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+ reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
+({ \
+ __u64 type_size = (__u64)type << 32; \
+ \
+ type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
+ type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
+ 0; \
+ type_size; \
+})
+
+#define KVM_X86_REG_ENCODE(type, index) \
+ (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
+
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
+
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f6fe7a07a0a2..9a375d5faf1c 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -136,6 +136,7 @@ TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
TEST_GEN_PROGS_x86 += x86/aperfmperf_test
+TEST_GEN_PROGS_x86 += x86/get_set_one_reg
TEST_GEN_PROGS_x86 += access_tracking_perf_test
TEST_GEN_PROGS_x86 += coalesced_io_test
TEST_GEN_PROGS_x86 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86/get_set_one_reg.c b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
new file mode 100644
index 000000000000..8a4dbc812214
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <fcntl.h>
+#include <stdint.h>
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ u64 data;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ONE_REG));
+
+ vm = vm_create_with_one_vcpu(&vcpu, NULL);
+
+ TEST_ASSERT_EQ(__vcpu_get_reg(vcpu, KVM_X86_REG_MSR(MSR_EFER), &data), 0);
+ TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_MSR(MSR_EFER), data), 0);
+
+ if (kvm_cpu_has(X86_FEATURE_SHSTK)) {
+ TEST_ASSERT_EQ(__vcpu_get_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &data), 0);
+ TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), data), 0);
+ }
+
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.47.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v14 00/22] Enable CET Virtualization
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
` (21 preceding siblings ...)
2025-09-09 9:39 ` [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG Chao Gao
@ 2025-09-09 9:52 ` Chao Gao
2025-09-10 18:29 ` Sean Christopherson
22 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-09 9:52 UTC (permalink / raw)
To: kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, x86, xin, xiaoyao.li
On Tue, Sep 09, 2025 at 02:39:31AM -0700, Chao Gao wrote:
>The FPU support for CET virtualization has already been merged into 6.17-rc1.
>Building on that, this series introduces Intel CET virtualization support for
>KVM.
>
>Changes in v14
>1. rename the type of guest SSP register to KVM_X86_REG_KVM and add docs
> for register IDs in api.rst (Sean, Xiaoyao)
>2. update commit message of patch 1
>3. use rdmsrq/wrmsrq() instead of rdmsrl/wrmsrl() in patch 6 (Xin)
>4. split the introduction of per-guest guest_supported_xss into a
>separate patch. (Xiaoyao)
>5. make guest FPU and VMCS consistent regarding MSR_IA32_S_CET
>6. collect reviews from Xiaoyao.
(Removed Weijiang's Intel email as it is bouncing)
Below is the diff between v13 and v14:
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6aa40ee05a4a..2b999408a768 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2908,6 +2908,15 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
0x9030 0000 0002 <reg:16>
+x86 MSR registers have the following id bit patterns::
+ 0x2030 0002 <msr number:32>
+
+Following are the KVM-defined registers for x86:
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x2030 0003 0000 0000 SSP Shadow Stack Pointer
+======================= ========= =============================================
4.69 KVM_GET_ONE_REG
--------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 061c0cd73d39..e947204b7f21 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -875,8 +875,8 @@ struct kvm_vcpu_arch {
u64 xcr0;
u64 guest_supported_xcr0;
- u64 guest_supported_xss;
u64 ia32_xss;
+ u64 guest_supported_xss;
struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 478d9b63a9db..8cc79eca34b2 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -412,28 +412,33 @@ struct kvm_xcrs {
};
#define KVM_X86_REG_TYPE_MSR 2
-#define KVM_X86_REG_TYPE_SYNTHETIC_MSR 3
+#define KVM_X86_REG_TYPE_KVM 3
-#define KVM_X86_REG_TYPE_SIZE(type) \
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+ reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
({ \
__u64 type_size = (__u64)type << 32; \
\
type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
- type == KVM_X86_REG_TYPE_SYNTHETIC_MSR ? KVM_REG_SIZE_U64 :\
+ type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
0; \
type_size; \
})
#define KVM_X86_REG_ENCODE(type, index) \
- (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type) | index)
+ (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
#define KVM_X86_REG_MSR(index) \
KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
-#define KVM_X86_REG_SYNTHETIC_MSR(index) \
- KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_SYNTHETIC_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
-/* KVM synthetic MSR index staring from 0 */
-#define KVM_SYNTHETIC_GUEST_SSP 0
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 989008f5307e..92daf63c9487 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2435,6 +2435,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_S_CET:
vmcs_writel(GUEST_S_CET, data);
+ kvm_set_xstate_msr(vcpu, msr_info);
break;
case MSR_KVM_INTERNAL_GUEST_SSP:
vmcs_writel(GUEST_SSP, data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9930678f5a3b..6f64a3355274 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4647,6 +4647,7 @@ EXPORT_SYMBOL_GPL(kvm_get_msr_common);
static bool is_xstate_managed_msr(u32 index)
{
switch (index) {
+ case MSR_IA32_S_CET:
case MSR_IA32_U_CET:
case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
return true;
@@ -6051,16 +6052,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
struct kvm_x86_reg_id {
__u32 index;
__u8 type;
- __u8 rsvd;
- __u8 rsvd4:4;
+ __u8 rsvd1;
+ __u8 rsvd2:4;
__u8 size:4;
__u8 x86;
};
-static int kvm_translate_synthetic_msr(struct kvm_x86_reg_id *reg)
+static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
{
switch (reg->index) {
- case KVM_SYNTHETIC_GUEST_SSP:
+ case KVM_REG_GUEST_SSP:
reg->type = KVM_X86_REG_TYPE_MSR;
reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
break;
@@ -6201,18 +6202,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
id = (struct kvm_x86_reg_id *)®.id;
- if (id->rsvd || id->rsvd4)
- break;
-
- if (id->type != KVM_X86_REG_TYPE_MSR &&
- id->type != KVM_X86_REG_TYPE_SYNTHETIC_MSR)
+ if (id->rsvd1 || id->rsvd2)
break;
- if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
- break;
-
- if (id->type == KVM_X86_REG_TYPE_SYNTHETIC_MSR) {
- r = kvm_translate_synthetic_msr(id);
+ if (id->type == KVM_X86_REG_TYPE_KVM) {
+ r = kvm_translate_kvm_reg(id);
if (r)
break;
}
@@ -6221,6 +6215,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (id->type != KVM_X86_REG_TYPE_MSR)
break;
+ if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+ break;
+
value = u64_to_user_ptr(reg.addr);
if (ioctl == KVM_GET_ONE_REG)
r = kvm_get_one_msr(vcpu, id->index, value);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d6b21ba41416..728e01781ae8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -726,7 +726,7 @@ static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
{
KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
kvm_fpu_get();
- rdmsrl(msr_info->index, msr_info->data);
+ rdmsrq(msr_info->index, msr_info->data);
kvm_fpu_put();
}
@@ -735,7 +735,7 @@ static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
{
KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
kvm_fpu_get();
- wrmsrl(msr_info->index, msr_info->data);
+ wrmsrq(msr_info->index, msr_info->data);
kvm_fpu_put();
}
diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h
index 590762820a61..59ac0b46ebcc 100644
--- a/tools/arch/x86/include/uapi/asm/kvm.h
+++ b/tools/arch/x86/include/uapi/asm/kvm.h
@@ -412,28 +412,33 @@ struct kvm_xcrs {
};
#define KVM_X86_REG_TYPE_MSR 2
-#define KVM_X86_REG_TYPE_SYNTHETIC_MSR 3
+#define KVM_X86_REG_TYPE_KVM 3
-#define KVM_X86_REG_TYPE_SIZE(type) \
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+ reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
({ \
__u64 type_size = (__u64)type << 32; \
\
type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
- type == KVM_X86_REG_TYPE_SYNTHETIC_MSR ? KVM_REG_SIZE_U64 :\
+ type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
0; \
type_size; \
})
#define KVM_X86_REG_ENCODE(type, index) \
- (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type) | index)
+ (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
#define KVM_X86_REG_MSR(index) \
KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
-#define KVM_X86_REG_SYNTHETIC_MSR(index) \
- KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_SYNTHETIC_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+ KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_KVM, index)
-/* KVM synthetic MSR index staring from 0 */
-#define KVM_SYNTHETIC_GUEST_SSP 0
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
diff --git a/tools/testing/selftests/kvm/x86/get_set_one_reg.c b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
index 8b069155ddc7..8a4dbc812214 100644
--- a/tools/testing/selftests/kvm/x86/get_set_one_reg.c
+++ b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
@@ -12,7 +12,6 @@ int main(int argc, char *argv[])
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
u64 data;
- int r;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_ONE_REG));
@@ -22,12 +21,8 @@ int main(int argc, char *argv[])
TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_MSR(MSR_EFER), data), 0);
if (kvm_cpu_has(X86_FEATURE_SHSTK)) {
- r = __vcpu_get_reg(vcpu, KVM_X86_REG_SYNTHETIC_MSR(KVM_SYNTHETIC_GUEST_SSP),
- &data);
- TEST_ASSERT_EQ(r, 0);
- r = __vcpu_set_reg(vcpu, KVM_X86_REG_SYNTHETIC_MSR(KVM_SYNTHETIC_GUEST_SSP),
- data);
- TEST_ASSERT_EQ(r, 0);
+ TEST_ASSERT_EQ(__vcpu_get_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &data), 0);
+ TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), data), 0);
}
kvm_vm_free(vm);
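For reference, here is a sketch of how userspace consumes the uapi macros above
(illustrative only; it assumes the generic KVM_REG_* layout from <linux/kvm.h>,
i.e. KVM_REG_X86 == 0x2000000000000000ULL and KVM_REG_SIZE_U64 ==
0x0030000000000000ULL):

	/*
	 * KVM_X86_REG_KVM(KVM_REG_GUEST_SSP) encodes to
	 *   KVM_REG_X86 | ((__u64)KVM_X86_REG_TYPE_KVM << 32) |
	 *   KVM_REG_SIZE_U64 | 0
	 * i.e. 0x2030000300000000ULL.
	 */
	__u64 ssp;
	struct kvm_one_reg reg = {
		.id   = KVM_X86_REG_KVM(KVM_REG_GUEST_SSP),
		.addr = (__u64)(unsigned long)&ssp,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))	/* vcpu_fd is a vCPU fd */
		perror("KVM_GET_ONE_REG");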
* Re: [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
@ 2025-09-10 9:03 ` Xiaoyao Li
2025-09-10 17:17 ` Sean Christopherson
2025-09-10 17:35 ` Sean Christopherson
2 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:03 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
> other non-MSR registers through them.
>
> This is in preparation for allowing userspace to read/write the guest SSP
> register, which is needed for the upcoming CET virtualization support.
>
> Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
> KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
> added for registers that lack existing KVM uAPIs to access them. The "KVM"
> in the name is intended to be vague to give KVM flexibility to include
> other potential registers. We considered some specific names, like
> "SYNTHETIC" and "SYNTHETIC_MSR" before, but both are confusing and may put
> KVM itself into a corner.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/ [1]
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
* Re: [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs
2025-09-09 9:39 ` [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs Chao Gao
@ 2025-09-10 9:22 ` Xiaoyao Li
2025-09-10 11:33 ` Chao Gao
0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:22 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> Maintain per-guest valid XSS bits and check XSS validity against them
> rather than against KVM capabilities. This is to prevent bits that are
> supported by KVM but not supported for a guest from being set.
>
> Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
> if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
> KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
> host_initiated check.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
<snip>
> @@ -4011,15 +4011,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> break;
> case MSR_IA32_XSS:
> - if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> - return 1;
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> + return KVM_MSR_RET_UNSUPPORTED;
> /*
> * KVM supports exposing PT to the guest, but does not support
> * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
> * XSAVES/XRSTORS to save/restore PT MSRs.
> */
Not an issue with this patch, but this doesn't seem like the proper place for
the above comment.
> - if (data & ~kvm_caps.supported_xss)
> + if (data & ~vcpu->arch.guest_supported_xss)
> return 1;
> vcpu->arch.ia32_xss = data;
> vcpu->arch.cpuid_dynamic_bits_dirty = true;
* Re: [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
2025-09-09 9:39 ` [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
@ 2025-09-10 9:23 ` Xiaoyao Li
2025-09-11 7:02 ` Binbin Wu
1 sibling, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:23 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Update CPUID.(EAX=0DH,ECX=1).EBX to reflect current required xstate size
> due to XSS MSR modification.
> CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
> xstate features in (XCR0 | IA32_XSS). The CPUID value can be used by guest
> before allocate sufficient xsave buffer.
>
> Note, KVM does not yet support any XSS based features, i.e. supported_xss
> is guaranteed to be zero at this time.
>
> Opportunistically skip CPUID updates if XSS value doesn't change.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/cpuid.c | 3 ++-
> arch/x86/kvm/x86.c | 2 ++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 46cf616663e6..b5f87254ced7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
> best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
> if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
> cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> - best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> + best->ebx = xstate_required_size(vcpu->arch.xcr0 |
> + vcpu->arch.ia32_xss, true);
> }
>
> static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6c167117018c..bbae3bf405c7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4020,6 +4020,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> */
> if (data & ~vcpu->arch.guest_supported_xss)
> return 1;
> + if (vcpu->arch.ia32_xss == data)
> + break;
> vcpu->arch.ia32_xss = data;
> vcpu->arch.cpuid_dynamic_bits_dirty = true;
> break;
* Re: [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss
2025-09-09 9:39 ` [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
@ 2025-09-10 9:36 ` Xiaoyao Li
0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:36 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Set original kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if
> XSAVES is supported. host_xss contains the host supported xstate feature
> bits for thread FPU context switch, KVM_SUPPORTED_XSS includes all KVM
> enabled XSS feature bits, the resulting value represents the supervisor
> xstates that are available to guest and are backed by host FPU framework
> for swapping {guest,host} XSAVE-managed registers/MSRs.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/x86.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bbae3bf405c7..c15e8c00dc7d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -220,6 +220,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>
> +#define KVM_SUPPORTED_XSS 0
> +
(related to my comment on previous patch)
It seems better to move the comment about Intel PT (IA32_XSS[bit 8]) in
kvm_set_msr_common() to above KVM_SUPPORTED_XSS.
> bool __read_mostly allow_smaller_maxphyaddr = 0;
> EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
>
> @@ -9789,14 +9791,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
> }
> +
> + if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> + rdmsrq(MSR_IA32_XSS, kvm_host.xss);
> + kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
> + }
> +
> kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
> kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
>
> rdmsrq_safe(MSR_EFER, &kvm_host.efer);
>
> - if (boot_cpu_has(X86_FEATURE_XSAVES))
> - rdmsrq(MSR_IA32_XSS, kvm_host.xss);
> -
> kvm_init_pmu_capability(ops->pmu_ops);
>
> if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
* Re: [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-09 9:39 ` [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
@ 2025-09-10 9:37 ` Xiaoyao Li
2025-09-10 11:18 ` Chao Gao
0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:37 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
> to facilitate access to such kind of MSRs.
>
> If MSRs supported in kvm_caps.supported_xss are passed through to guest,
> the guest MSRs are swapped with host's before vCPU exits to userspace and
> after it reenters kernel before next VM-entry.
>
> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> explicitly check @vcpu is non-null before attempting to load guest state.
> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
> loading guest FPU state (which doesn't exist).
>
> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> The two helpers are put here to make it clear that accessing XSAVE-managed
> MSRs requires special checks and handling to guarantee the correctness of
> reads/writes to the MSRs.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v14:
> - s/rdmsrl/rdmsrq, s/wrmsrl/wrmsrq (Xin)
> - return true in is_xstate_managed_msr() for MSR_IA32_S_CET
> ---
> arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.h | 24 ++++++++++++++++++++++++
> 2 files changed, 59 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c15e8c00dc7d..7c0a07be6b64 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>
> static DEFINE_MUTEX(vendor_module_lock);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> +
> struct kvm_x86_ops kvm_x86_ops __read_mostly;
>
> #define KVM_X86_OP(func) \
> @@ -4566,6 +4569,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>
> +/*
> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
> + * switched with the rest of guest FPU state.
> + */
> +static bool is_xstate_managed_msr(u32 index)
> +{
> + switch (index) {
> + case MSR_IA32_S_CET:
> + case MSR_IA32_U_CET:
> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> /*
> * Read or write a bunch of msrs. All parameters are kernel addresses.
> *
> @@ -4576,11 +4595,26 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> int (*do_msr)(struct kvm_vcpu *vcpu,
> unsigned index, u64 *data))
> {
> + bool fpu_loaded = false;
> int i;
>
> - for (i = 0; i < msrs->nmsrs; ++i)
> + for (i = 0; i < msrs->nmsrs; ++i) {
> + /*
> + * If userspace is accessing one or more XSTATE-managed MSRs,
> + * temporarily load the guest's FPU state so that the guest's
> + * MSR value(s) is resident in hardware, i.e. so that KVM can
> + * get/set the MSR via RDMSR/WRMSR.
> + */
> + if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
why not check vcpu->arch.guest_supported_xss?
> + is_xstate_managed_msr(entries[i].index)) {
> + kvm_load_guest_fpu(vcpu);
> + fpu_loaded = true;
> + }
> if (do_msr(vcpu, entries[i].index, &entries[i].data))
> break;
> + }
> + if (fpu_loaded)
> + kvm_put_guest_fpu(vcpu);
>
> return i;
> }
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index eb3088684e8a..34afe43579bb 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -701,4 +701,28 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
>
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>
> +/*
> + * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
> + * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
> + * guest FPU should have been loaded already.
> + */
> +
> +static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info)
> +{
> + KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
> + kvm_fpu_get();
> + rdmsrq(msr_info->index, msr_info->data);
> + kvm_fpu_put();
> +}
> +
> +static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info)
> +{
> + KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
> + kvm_fpu_get();
> + wrmsrq(msr_info->index, msr_info->data);
> + kvm_fpu_put();
> +}
> +
> #endif
* Re: [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting
2025-09-09 9:39 ` [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
@ 2025-09-10 9:38 ` Xiaoyao Li
0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 9:38 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Check potential faults for CR4.CET setting per Intel SDM requirements.
> CET can be enabled if and only if CR0.WP == 1, i.e. setting CR4.CET ==
> 1 faults if CR0.WP == 0 and setting CR0.WP == 0 fails if CR4.CET == 1.
>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/x86.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7c0a07be6b64..50c192c99a7e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1173,6 +1173,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
> return 1;
>
> + if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
> + return 1;
> +
> kvm_x86_call(set_cr0)(vcpu, cr0);
>
> kvm_post_set_cr0(vcpu, old_cr0, cr0);
> @@ -1372,6 +1375,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> return 1;
> }
>
> + if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
> + return 1;
> +
> kvm_x86_call(set_cr4)(vcpu, cr4);
>
> kvm_post_set_cr4(vcpu, old_cr4, cr4);
* Re: [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-10 9:37 ` Xiaoyao Li
@ 2025-09-10 11:18 ` Chao Gao
2025-09-10 13:46 ` Xiaoyao Li
2025-09-10 17:50 ` Sean Christopherson
0 siblings, 2 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-10 11:18 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>On 9/9/2025 5:39 PM, Chao Gao wrote:
>> From: Sean Christopherson <seanjc@google.com>
>>
>> Load the guest's FPU state if userspace is accessing MSRs whose values
>> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
>> to facilitate access to such kind of MSRs.
>>
>> If MSRs supported in kvm_caps.supported_xss are passed through to guest,
>> the guest MSRs are swapped with host's before vCPU exits to userspace and
>> after it reenters kernel before next VM-entry.
>>
>> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
>> explicitly check @vcpu is non-null before attempting to load guest state.
>> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
>> loading guest FPU state (which doesn't exist).
>>
>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> access MSRs that have not been exposed to the guest, e.g. it might do
>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
...
>> + bool fpu_loaded = false;
>> int i;
>> - for (i = 0; i < msrs->nmsrs; ++i)
>> + for (i = 0; i < msrs->nmsrs; ++i) {
>> + /*
>> + * If userspace is accessing one or more XSTATE-managed MSRs,
>> + * temporarily load the guest's FPU state so that the guest's
>> + * MSR value(s) is resident in hardware, i.e. so that KVM can
>> + * get/set the MSR via RDMSR/WRMSR.
>> + */
>> + if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>
>why not check vcpu->arch.guest_supported_xss?
Looks like Sean anticipated someone would ask this question.
* Re: [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs
2025-09-10 9:22 ` Xiaoyao Li
@ 2025-09-10 11:33 ` Chao Gao
2025-09-10 18:47 ` Sean Christopherson
0 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-10 11:33 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On Wed, Sep 10, 2025 at 05:22:15PM +0800, Xiaoyao Li wrote:
>On 9/9/2025 5:39 PM, Chao Gao wrote:
>> Maintain per-guest valid XSS bits and check XSS validity against them
>> rather than against KVM capabilities. This is to prevent bits that are
>> supported by KVM but not supported for a guest from being set.
>>
>> Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
>> if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
>> KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
>> host_initiated check.
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
>
><snip>
>> @@ -4011,15 +4011,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> }
>> break;
>> case MSR_IA32_XSS:
>> - if (!msr_info->host_initiated &&
>> - !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
>> - return 1;
>> + if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
>> + return KVM_MSR_RET_UNSUPPORTED;
>> /*
>> * KVM supports exposing PT to the guest, but does not support
>> * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>> * XSAVES/XRSTORS to save/restore PT MSRs.
>> */
>
>Not an issue with this patch, but this doesn't seem like the proper place for
>the above comment.
Agreed.
I am curious why PT state isn't supported; the reason is apparently missing
from the comment. If it is due to lack of host FPU support, I think the recent
guest-only xfeatures we built for CET can help.
Anyway, PT is only visible on BROKEN kernels, so we won't do anything for now
besides documenting the reason.
* Re: [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-10 11:18 ` Chao Gao
@ 2025-09-10 13:46 ` Xiaoyao Li
2025-09-10 15:24 ` Chao Gao
2025-09-10 17:50 ` Sean Christopherson
1 sibling, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-10 13:46 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On 9/10/2025 7:18 PM, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> On 9/9/2025 5:39 PM, Chao Gao wrote:
>>> From: Sean Christopherson <seanjc@google.com>
>>>
>>> Load the guest's FPU state if userspace is accessing MSRs whose values
>>> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
>>> to facilitate access to such kind of MSRs.
>>>
>>> If MSRs supported in kvm_caps.supported_xss are passed through to guest,
>>> the guest MSRs are swapped with host's before vCPU exits to userspace and
>>> after it reenters kernel before next VM-entry.
>>>
>>> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
>>> explicitly check @vcpu is non-null before attempting to load guest state.
>>> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
>>> loading guest FPU state (which doesn't exist).
>>>
>>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>>> access MSRs that have not been exposed to the guest, e.g. it might do
>>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
>>> + bool fpu_loaded = false;
>>> int i;
>>> - for (i = 0; i < msrs->nmsrs; ++i)
>>> + for (i = 0; i < msrs->nmsrs; ++i) {
>>> + /*
>>> + * If userspace is accessing one or more XSTATE-managed MSRs,
>>> + * temporarily load the guest's FPU state so that the guest's
>>> + * MSR value(s) is resident in hardware, i.e. so that KVM can
>>> + * get/set the MSR via RDMSR/WRMSR.
>>> + */
>>> + if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>>
>> why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.
here it determines whether to call kvm_load_guest_fpu().
- based on kvm_caps.supported_xss, it will always load guest fpu.
- based on vcpu->arch.guest_supported_xss, it depends on whether
userspace calls KVM_SET_CPUID2 and whether it enables any XSS feature.
So the difference is when no XSS feature is enabled for the VM.
In this case, if checking vcpu->arch.guest_supported_xss, it will skip
kvm_load_guest_fpu(). And it will result in GET_MSR getting userspace's
value and SET_MSR changing userspace's value, when the MSR access is
eventually allowed in the later do_msr() callback. Is my understanding
correct?
* Re: [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-10 13:46 ` Xiaoyao Li
@ 2025-09-10 15:24 ` Chao Gao
0 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-10 15:24 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On Wed, Sep 10, 2025 at 09:46:01PM +0800, Xiaoyao Li wrote:
>On 9/10/2025 7:18 PM, Chao Gao wrote:
>> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> > On 9/9/2025 5:39 PM, Chao Gao wrote:
>> > > From: Sean Christopherson <seanjc@google.com>
>> > >
>> > > Load the guest's FPU state if userspace is accessing MSRs whose values
>> > > are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
>> > > to facilitate access to such kind of MSRs.
>> > >
>> > > If MSRs supported in kvm_caps.supported_xss are passed through to guest,
>> > > the guest MSRs are swapped with host's before vCPU exits to userspace and
>> > > after it reenters kernel before next VM-entry.
>> > >
>> > > Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
>> > > explicitly check @vcpu is non-null before attempting to load guest state.
>> > > The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
>> > > loading guest FPU state (which doesn't exist).
>> > >
>> > > Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> > > access MSRs that have not been exposed to the guest, e.g. it might do
>> > > KVM_SET_MSRS prior to KVM_SET_CPUID2.
>>
>> ...
>>
>> > > + bool fpu_loaded = false;
>> > > int i;
>> > > - for (i = 0; i < msrs->nmsrs; ++i)
>> > > + for (i = 0; i < msrs->nmsrs; ++i) {
>> > > + /*
>> > > + * If userspace is accessing one or more XSTATE-managed MSRs,
>> > > + * temporarily load the guest's FPU state so that the guest's
>> > > + * MSR value(s) is resident in hardware, i.e. so that KVM can
>> > > + * get/set the MSR via RDMSR/WRMSR.
>> > > + */
>> > > + if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>> >
>> > why not check vcpu->arch.guest_supported_xss?
>>
>> Looks like Sean anticipated someone would ask this question.
>
>here it determines whether to call kvm_load_guest_fpu().
>
>- based on kvm_caps.supported_xss, it will always load guest fpu.
>- based on vcpu->arch.guest_supported_xss, it depends on whether userspace
>calls KVM_SET_CPUID2 and whether it enables any XSS feature.
>
>So the difference is when no XSS feature is enabled for the VM.
>
>In this case, if checking vcpu->arch.guest_supported_xss, it will skip
>kvm_load_guest_fpu(). And it will result in GET_MSR gets usrerspace's value
>and SET_MSR changes userspace's value, when MSR access is eventually allowed
>in later do_msr() callback. Is my understanding correctly?
Actually, there will be no functional issue.
Those MSR accesses are always "rejected" with KVM_MSR_RET_UNSUPPORTED by
__kvm_set/get_msr() and get fixed up if they are "host_initiated" in
kvm_do_msr_access(). KVM doesn't access any hardware MSRs in the process.
Using vcpu->arch.guest_supported_xss here also works, but the correctness
isn't that obvious for this special case.
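In other words, the fixup amounts to something like this (a simplified sketch,
not the exact kvm_do_msr_access() code; the advertised-MSR check is elided and
the helper name is made up):

	static int fixup_unsupported_msr_access(bool host_initiated, bool write,
						u64 *data, int ret)
	{
		if (ret != KVM_MSR_RET_UNSUPPORTED || !host_initiated)
			return ret;

		if (!write) {
			*data = 0;	/* host-initiated reads are fixed up to '0' */
			return 0;
		}

		/* Only host-initiated writes of '0' are silently accepted. */
		return *data ? ret : 0;
	}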
* Re: [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
2025-09-10 9:03 ` Xiaoyao Li
@ 2025-09-10 17:17 ` Sean Christopherson
2025-09-10 17:35 ` Sean Christopherson
2 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 17:17 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> +static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
> +{
> + u64 val;
> +
> + if (get_user(val, value))
> + return -EFAULT;
> +
> + return do_set_msr(vcpu, msr, &val);
This needs to explicitly return -EINVAL on failure, otherwise KVM will return
semi-arbitrary positive values to userspace.
> +}
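E.g. something like (untested sketch):

	static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
	{
		u64 val;

		if (get_user(val, value))
			return -EFAULT;

		/* Squash KVM-internal positive error codes to -EINVAL. */
		return do_set_msr(vcpu, msr, &val) ? -EINVAL : 0;
	}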
> +
> #ifdef CONFIG_X86_64
> struct pvclock_clock {
> int vclock_mode;
> @@ -4737,6 +4762,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IRQFD_RESAMPLE:
> case KVM_CAP_MEMORY_FAULT_INFO:
> case KVM_CAP_X86_GUEST_MODE:
> + case KVM_CAP_ONE_REG:
We should add (partial) support for KVM_GET_REG_LIST, otherwise the ABI for
handling GUEST_SSP is effectively undefined. And because the ioctl is per-vCPU,
utilizing KVM_GET_REG_LIST gives us the opportunity to avoid the horrors we
created with KVM_GET_MSR_INDEX_LIST, where KVM enumerates MSRs that aren't fully
supported by the vCPU.
If we don't enumerate GUEST_SSP via KVM_GET_REG_LIST, then trying to do
KVM_{G,S}ET_ONE_REG will "unexpectedly" fail if the vCPU doesn't have SHSTK.
By enumerating GUEST_SSP in KVM_GET_REG_LIST _if and only if_ it's fully supported,
we'll have a much more explicit ABI than we do for MSRs. And if we don't do that,
we'd have to special case MSR_KVM_INTERNAL_GUEST_SSP in kvm_is_advertised_msr().
As for MSRs, that's where "partial" support comes in. For MSRs, I think the least
awful option is to keep using KVM_GET_MSR_INDEX_LIST for enumerating MSRs, and
document that any MSRs that can be accessed via KVM_{G,S}ET_MSRS can be accessed
via KVM_{G,S}ET_ONE_REG. That avoids having to bake in different behavior for
MSR vs. ONE_REG accesses (and avoids having to add a pile of code to precisely
enumerate support for per-vCPU MSRs).
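As a strawman, per-vCPU enumeration could look something like the below (a
rough, untested sketch that assumes an arm64-style uAPI with
struct kvm_reg_list { __u64 n; __u64 reg[]; }):

	static int kvm_vcpu_ioctl_get_reg_list(struct kvm_vcpu *vcpu,
					       struct kvm_reg_list __user *ulist)
	{
		u64 capacity, n = 0;

		/* Enumerate GUEST_SSP if and only if the vCPU fully supports SHSTK. */
		if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
			n++;

		if (get_user(capacity, &ulist->n))
			return -EFAULT;
		if (put_user(n, &ulist->n))
			return -EFAULT;
		if (capacity < n)
			return -E2BIG;

		if (n && put_user((u64)KVM_X86_REG_KVM(KVM_REG_GUEST_SSP),
				  &ulist->reg[0]))
			return -EFAULT;

		return 0;
	}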
> r = 1;
> break;
> case KVM_CAP_PRE_FAULT_MEMORY:
> @@ -5915,6 +5941,20 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
> }
> }
>
> +struct kvm_x86_reg_id {
> + __u32 index;
> + __u8 type;
> + __u8 rsvd1;
> + __u8 rsvd2:4;
> + __u8 size:4;
> + __u8 x86;
> +};
> +
> +static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
> +{
> + return -EINVAL;
> +}
> +
> long kvm_arch_vcpu_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg)
> {
> @@ -6031,6 +6071,44 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> srcu_read_unlock(&vcpu->kvm->srcu, idx);
> break;
> }
> + case KVM_GET_ONE_REG:
> + case KVM_SET_ONE_REG: {
> + struct kvm_x86_reg_id *id;
> + struct kvm_one_reg reg;
> + u64 __user *value;
> +
> + r = -EFAULT;
> + if (copy_from_user(&reg, argp, sizeof(reg)))
> + break;
> +
> + r = -EINVAL;
> + if ((reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
> + break;
> +
> + id = (struct kvm_x86_reg_id *)&reg.id;
> + if (id->rsvd1 || id->rsvd2)
> + break;
> +
> + if (id->type == KVM_X86_REG_TYPE_KVM) {
> + r = kvm_translate_kvm_reg(id);
> + if (r)
> + break;
> + }
> +
> + r = -EINVAL;
> + if (id->type != KVM_X86_REG_TYPE_MSR)
> + break;
> +
> + if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
> + break;
> +
> + value = u64_to_user_ptr(reg.addr);
> + if (ioctl == KVM_GET_ONE_REG)
> + r = kvm_get_one_msr(vcpu, id->index, value);
> + else
> + r = kvm_set_one_msr(vcpu, id->index, value);
> + break;
> + }
I think it makes sense to put this in a separate helper, if only so that the
error returns are more obvious.
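E.g. something like this (completely untested):

	static int kvm_vcpu_ioctl_get_set_one_reg(struct kvm_vcpu *vcpu,
						  unsigned int ioctl,
						  struct kvm_one_reg __user *argp)
	{
		struct kvm_x86_reg_id *id;
		struct kvm_one_reg reg;
		int r;

		if (copy_from_user(&reg, argp, sizeof(reg)))
			return -EFAULT;

		if ((reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
			return -EINVAL;

		id = (struct kvm_x86_reg_id *)&reg.id;
		if (id->rsvd1 || id->rsvd2)
			return -EINVAL;

		if (id->type == KVM_X86_REG_TYPE_KVM) {
			r = kvm_translate_kvm_reg(id);
			if (r)
				return r;
		}

		if (id->type != KVM_X86_REG_TYPE_MSR)
			return -EINVAL;

		if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
			return -EINVAL;

		if (ioctl == KVM_GET_ONE_REG)
			return kvm_get_one_msr(vcpu, id->index,
					       u64_to_user_ptr(reg.addr));

		return kvm_set_one_msr(vcpu, id->index, u64_to_user_ptr(reg.addr));
	}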
> case KVM_TPR_ACCESS_REPORTING: {
> struct kvm_tpr_access_ctl tac;
>
> --
> 2.47.3
>
* Re: [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
2025-09-10 9:03 ` Xiaoyao Li
2025-09-10 17:17 ` Sean Christopherson
@ 2025-09-10 17:35 ` Sean Christopherson
2 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 17:35 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> @@ -6031,6 +6071,44 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> srcu_read_unlock(&vcpu->kvm->srcu, idx);
> break;
> }
> + case KVM_GET_ONE_REG:
> + case KVM_SET_ONE_REG: {
> + struct kvm_x86_reg_id *id;
> + struct kvm_one_reg reg;
> + u64 __user *value;
> +
> + r = -EFAULT;
> + if (copy_from_user(&reg, argp, sizeof(reg)))
> + break;
> +
> + r = -EINVAL;
> + if ((reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
> + break;
> +
> + id = (struct kvm_x86_reg_id *)&reg.id;
> + if (id->rsvd1 || id->rsvd2)
> + break;
> +
> + if (id->type == KVM_X86_REG_TYPE_KVM) {
> + r = kvm_translate_kvm_reg(id);
> + if (r)
> + break;
> + }
> +
> + r = -EINVAL;
> + if (id->type != KVM_X86_REG_TYPE_MSR)
> + break;
> +
> + if ((reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
> + break;
> +
Almost forgot. I think it makes sense to grab kvm->srcu here. I'm not entirely
positive that's necessary these days, e.g. after commit 3617c0ee7dec ("KVM: x86/xen:
Only write Xen hypercall page for guest writes to MSR"), but there may still be
paths that need it, and _proving_ that there are no memory or MSR/PMU filter
accesses is practically impossible given how much code is reachable via MSR
emulation.
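I.e. wrap the actual access with something like (sketch):

	int idx;

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	if (ioctl == KVM_GET_ONE_REG)
		r = kvm_get_one_msr(vcpu, id->index, value);
	else
		r = kvm_set_one_msr(vcpu, id->index, value);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);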
If someone wants to put in the effort to prove SRCU isn't needed, then we should
also drop SRCU protection from KVM_{G,S}ET_MSRS.
> + value = u64_to_user_ptr(reg.addr);
> + if (ioctl == KVM_GET_ONE_REG)
> + r = kvm_get_one_msr(vcpu, id->index, value);
> + else
> + r = kvm_set_one_msr(vcpu, id->index, value);
> + break;
> + }
> case KVM_TPR_ACCESS_REPORTING: {
> struct kvm_tpr_access_ctl tac;
>
> --
> 2.47.3
>
* Re: [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-10 11:18 ` Chao Gao
2025-09-10 13:46 ` Xiaoyao Li
@ 2025-09-10 17:50 ` Sean Christopherson
1 sibling, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 17:50 UTC (permalink / raw)
To: Chao Gao
Cc: Xiaoyao Li, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
On Wed, Sep 10, 2025, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
> >On 9/9/2025 5:39 PM, Chao Gao wrote:
> >> From: Sean Christopherson <seanjc@google.com>
> >>
> >> Load the guest's FPU state if userspace is accessing MSRs whose values
> >> are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
> >> to facilitate access to such kind of MSRs.
> >>
> >> If MSRs supported in kvm_caps.supported_xss are passed through to guest,
> >> the guest MSRs are swapped with host's before vCPU exits to userspace and
> >> after it reenters kernel before next VM-entry.
> >>
> >> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> >> explicitly check @vcpu is non-null before attempting to load guest state.
> >> The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without
> >> loading guest FPU state (which doesn't exist).
> >>
> >> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> >> access MSRs that have not been exposed to the guest, e.g. it might do
> >> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
> >> + bool fpu_loaded = false;
> >> int i;
> >> - for (i = 0; i < msrs->nmsrs; ++i)
> >> + for (i = 0; i < msrs->nmsrs; ++i) {
> >> + /*
> >> + * If userspace is accessing one or more XSTATE-managed MSRs,
> >> + * temporarily load the guest's FPU state so that the guest's
> >> + * MSR value(s) is resident in hardware, i.e. so that KVM can
> >> + * get/set the MSR via RDMSR/WRMSR.
> >> + */
> >> + if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
> >
> >why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.
I don't think so, I'm pretty sure querying kvm_caps.supported_xss is a holdover
from the early days of this patch, e.g. before guest_cpu_cap_has() existed, and
potentially even before vcpu->arch.guest_supported_xss existed.
I'm pretty sure we can make this less weird and more accurate:
/*
* Returns true if the MSR in question is managed via XSTATE, i.e. is context
* switched with the rest of guest FPU state. Note! S_CET is _not_ context
* switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
* Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
* the value saved/restored via XSTATE is always the host's value. That detail
* is _extremely_ important, as the guest's S_CET must _never_ be resident in
* hardware while executing in the host. Loading guest values for U_CET and
* PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
* userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
* privilegel levels, i.e. are effectively only consumed by userspace as well.
*/
static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
{
if (!vcpu)
return false;
switch (msr) {
case MSR_IA32_U_CET:
return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
default:
return false;
}
}
Which is very desirable because the KVM_{G,S}ET_ONE_REG path also needs to
load/put the FPU, as found via a WIP selftest that tripped:
KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
And if we simplify is_xstate_managed_msr(), then the accessors can also do:
KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
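E.g. (sketch of the end result):

	static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
					      struct msr_data *msr_info)
	{
		/* Assert both FPU residency and that the MSR is XSTATE-managed. */
		KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
		KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
		kvm_fpu_get();
		rdmsrq(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}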
* Re: [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG
2025-09-09 9:39 ` [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG Chao Gao
@ 2025-09-10 18:06 ` Sean Christopherson
0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 18:06 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> Add tests for newly added KVM_{GET,SET}_ONE_REG support for x86. Verify the
> new ioctls can read and write real MSRs and synthetic MSRs.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> tools/arch/x86/include/uapi/asm/kvm.h | 29 ++++++++++++++++++
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/x86/get_set_one_reg.c | 30 +++++++++++++++++++
> 3 files changed, 60 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86/get_set_one_reg.c
>
> diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h
> index 6f3499507c5e..59ac0b46ebcc 100644
> --- a/tools/arch/x86/include/uapi/asm/kvm.h
> +++ b/tools/arch/x86/include/uapi/asm/kvm.h
Don't copy KVM headers to tools/, KVM selftests don't actually use them (i.e.
copying them is confusing/misleading). The copied headers are mainly used by
tools/perf, and they run a script to synchronize everything.
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index f6fe7a07a0a2..9a375d5faf1c 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -136,6 +136,7 @@ TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
> TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
> TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
> TEST_GEN_PROGS_x86 += x86/aperfmperf_test
> +TEST_GEN_PROGS_x86 += x86/get_set_one_reg
> TEST_GEN_PROGS_x86 += access_tracking_perf_test
> TEST_GEN_PROGS_x86 += coalesced_io_test
> TEST_GEN_PROGS_x86 += dirty_log_perf_test
> diff --git a/tools/testing/selftests/kvm/x86/get_set_one_reg.c b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
> new file mode 100644
> index 000000000000..8a4dbc812214
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86/get_set_one_reg.c
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <fcntl.h>
> +#include <stdint.h>
> +#include <sys/ioctl.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +int main(int argc, char *argv[])
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> + u64 data;
> +
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_ONE_REG));
> +
> + vm = vm_create_with_one_vcpu(&vcpu, NULL);
> +
> + TEST_ASSERT_EQ(__vcpu_get_reg(vcpu, KVM_X86_REG_MSR(MSR_EFER), &data), 0);
> + TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_MSR(MSR_EFER), data), 0);
> +
> + if (kvm_cpu_has(X86_FEATURE_SHSTK)) {
> + TEST_ASSERT_EQ(__vcpu_get_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &data), 0);
> + TEST_ASSERT_EQ(__vcpu_set_reg(vcpu, KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), data), 0);
This isn't a very useful test, nor is it extensible. I finally bit the bullet
and created an MSR test to mostly replace KUT's msr.c, and to add coverage for
KVM_{G,S}ET_ONE_REG and KVM_GET_REG_LIST.
* Re: [PATCH v14 00/22] Enable CET Virtualization
2025-09-09 9:52 ` [PATCH v14 00/22] Enable CET Virtualization Chao Gao
@ 2025-09-10 18:29 ` Sean Christopherson
0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 18:29 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, x86, xin, xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> On Tue, Sep 09, 2025 at 02:39:31AM -0700, Chao Gao wrote:
> >The FPU support for CET virtualization has already been merged into 6.17-rc1.
> >Building on that, this series introduces Intel CET virtualization support for
> >KVM.
> >
> >Changes in v14
> >1. rename the type of guest SSP register to KVM_X86_REG_KVM and add docs
> > for register IDs in api.rst (Sean, Xiaoyao)
> >2. update commit message of patch 1
> >3. use rdmsrq/wrmsrq() instead of rdmsrl/wrmsrl() in patch 6 (Xin)
> >4. split the introduction of per-guest guest_supported_xss into a
> >separate patch. (Xiaoyao)
> >5. make guest FPU and VMCS consistent regarding MSR_IA32_S_CET
> >6. collect reviews from Xiaoyao.
>
> (Removed Weijiang's Intel email as it is bouncing)
Yeah, I'll try to remember to filter it out. I'll post a v15; I have a decent
number of local changes and fixes (see responses, hopefully I captured everything),
along with a more comprehensive selftest.
* Re: [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs
2025-09-10 11:33 ` Chao Gao
@ 2025-09-10 18:47 ` Sean Christopherson
0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-10 18:47 UTC (permalink / raw)
To: Chao Gao
Cc: Xiaoyao Li, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
On Wed, Sep 10, 2025, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:22:15PM +0800, Xiaoyao Li wrote:
> >On 9/9/2025 5:39 PM, Chao Gao wrote:
> >> Maintain per-guest valid XSS bits and check XSS validity against them
> >> rather than against KVM capabilities. This is to prevent bits that are
> >> supported by KVM but not supported for a guest from being set.
> >>
> >> Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
> >> if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
> >> KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
> >> host_initiated check.
> >>
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >
> >Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >
> ><snip>
> >> @@ -4011,15 +4011,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >> }
> >> break;
> >> case MSR_IA32_XSS:
> >> - if (!msr_info->host_initiated &&
> >> - !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> >> - return 1;
> >> + if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> >> + return KVM_MSR_RET_UNSUPPORTED;
> >> /*
> >> * KVM supports exposing PT to the guest, but does not support
> >> * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
> >> * XSAVES/XRSTORS to save/restore PT MSRs.
> >> */
> >
> >Not an issue with this patch, but this doesn't seem like the proper place for
> >the above comment.
>
> Agreed.
It was there to call out that KVM doesn't support any XSS bits even though KVM
supports a feature that architecturally can be context switched via XSS+XSTATE.
I'll find a better home for the comment (probably move it in patch 5 as
Xiaoyao suggested).
> I am curious why PT state isn't supported; the reason is apparently missing
> from the comment. If it is due to lack of host FPU support, I think the recent
> guest-only xfeatures we built for CET can help.
Presumably, perf uses PT across multiple tasks, i.e. doesn't want to context
switch PT state along with everything else. For KVM, PT virtualization is
intertwined with perf, and so wholesale swapping guest PT state simply won't
work.
> Anyway, PT is only visible on BROKEN kernels, so we won't do anything for
> now besides documenting the reason.
Yeah, PT virtualization is riddled with problems, just ignore it.
* Re: [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features
2025-09-09 9:39 ` [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
@ 2025-09-11 6:52 ` Binbin Wu
0 siblings, 0 replies; 53+ messages in thread
From: Binbin Wu @ 2025-09-11 6:52 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
> is non-zero, i.e. KVM supports at least one XSS based feature.
>
> Before enabling CET virtualization series, guest IA32_MSR_XSS is
> guaranteed to be 0, i.e., XSAVES/XRSTORS is executed in non-root mode
> with XSS == 0, which equals to the effect of XSAVE/XRSTOR.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/x86.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f32d3edfc7b1..47b60f275fd7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -335,7 +335,7 @@ static const u32 msrs_to_save_base[] = {
> MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
> MSR_IA32_UMWAIT_CONTROL,
>
> - MSR_IA32_XFD, MSR_IA32_XFD_ERR,
> + MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
> };
>
> static const u32 msrs_to_save_pmu[] = {
> @@ -7470,6 +7470,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
> if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
> return;
> break;
> + case MSR_IA32_XSS:
> + if (!kvm_caps.supported_xss)
> + return;
> + break;
> default:
> break;
> }
* Re: [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
2025-09-09 9:39 ` [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
2025-09-10 9:23 ` Xiaoyao Li
@ 2025-09-11 7:02 ` Binbin Wu
1 sibling, 0 replies; 53+ messages in thread
From: Binbin Wu @ 2025-09-11 7:02 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Update CPUID.(EAX=0DH,ECX=1).EBX to reflect current required xstate size
> due to XSS MSR modification.
> CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
> xstate features in (XCR0 | IA32_XSS). The CPUID value can be used by guest
> before allocate sufficient xsave buffer.
Nit:
allocate -> allocating.
Otherwise,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>
> Note, KVM does not yet support any XSS based features, i.e. supported_xss
> is guaranteed to be zero at this time.
>
> Opportunistically skip CPUID updates if XSS value doesn't change.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/cpuid.c | 3 ++-
> arch/x86/kvm/x86.c | 2 ++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 46cf616663e6..b5f87254ced7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
> best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
> if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
> cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> - best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> + best->ebx = xstate_required_size(vcpu->arch.xcr0 |
> + vcpu->arch.ia32_xss, true);
> }
>
> static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6c167117018c..bbae3bf405c7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4020,6 +4020,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> */
> if (data & ~vcpu->arch.guest_supported_xss)
> return 1;
> + if (vcpu->arch.ia32_xss == data)
> + break;
> vcpu->arch.ia32_xss = data;
> vcpu->arch.cpuid_dynamic_bits_dirty = true;
> break;
* Re: [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs
2025-09-09 9:39 ` [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
@ 2025-09-11 8:05 ` Xiaoyao Li
2025-09-11 9:02 ` Chao Gao
0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-11 8:05 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add an emulation interface for CET MSR accesses. The emulation code is split
> into a common part and a vendor-specific part. The former does common checks
> for MSRs, e.g., accessibility and data validity, then passes the operation
> to either the XSAVE-managed MSRs via the helpers or the CET VMCS fields.
I planned to continue the review after Sean posts v15 as he promised.
But I want to raise my question sooner, so I'll just ask it on v14.
Do we expect to always put the accessibility and data validity checks in
__kvm_{g,s}et_msr() when the handling cannot be put in
kvm_{g,s}et_msr_common() only? i.e., there will be 3 cases:
- All the handling in kvm_{g,s}et_msr_common(), when the MSR emulation is
common to vmx and svm.
- generic accessibility and data validity checks in __kvm_{g,s}et_msr()
and vendor specific handling in {vmx,svm}_{g,s}et_msr()
- generic accessibility and data validity checks in __kvm_{g,s}et_msr(),
vendor specific handling in {vmx,svm}_{g,s}et_msr(), and other generic
handling in kvm_{g,s}et_msr_common()
> SSP can only be read via RDSSP. Writing even requires destructive and
> potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
> SETSSBSY/CLRSSBSY. Let the host use a pseudo-MSR that is just a wrapper
> for the GUEST_SSP field of the VMCS.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v14:
> - Update both hardware MSR value and VMCS field when userspace writes to
> MSR_IA32_S_CET. This keeps guest FPU and VMCS always inconsistent
> regarding MSR_IA32_S_CET.
> ---
> arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++
> arch/x86/kvm/x86.c | 60 ++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.h | 23 ++++++++++++++++
> 3 files changed, 102 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 227b45430ad8..22bd71bebfad 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2106,6 +2106,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
> + case MSR_IA32_S_CET:
> + msr_info->data = vmcs_readl(GUEST_S_CET);
> + break;
> + case MSR_KVM_INTERNAL_GUEST_SSP:
> + msr_info->data = vmcs_readl(GUEST_SSP);
> + break;
> + case MSR_IA32_INT_SSP_TAB:
> + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> + break;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmx_guest_debugctl_read();
> break;
> @@ -2424,6 +2433,16 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
> + case MSR_IA32_S_CET:
> + vmcs_writel(GUEST_S_CET, data);
> + kvm_set_xstate_msr(vcpu, msr_info);
> + break;
> + case MSR_KVM_INTERNAL_GUEST_SSP:
> + vmcs_writel(GUEST_SSP, data);
> + break;
> + case MSR_IA32_INT_SSP_TAB:
> + vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> + break;
> case MSR_IA32_PERF_CAPABILITIES:
> if (data & PMU_CAP_LBR_FMT) {
> if ((data & PMU_CAP_LBR_FMT) !=
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a6036eab3852..79861b7ad44d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1886,6 +1886,44 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>
> data = (u32)data;
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_S_CET:
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> + !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
> + return KVM_MSR_RET_UNSUPPORTED;
> + if (!kvm_is_valid_u_s_cet(vcpu, data))
> + return 1;
> + break;
> + case MSR_KVM_INTERNAL_GUEST_SSP:
> + if (!host_initiated)
> + return 1;
> + fallthrough;
> + /*
> + * Note that the MSR emulation here is flawed when a vCPU
> + * doesn't support the Intel 64 architecture. The expected
> + * architectural behavior in this case is that the upper 32
> + * bits do not exist and should always read '0'. However,
> + * because the actual hardware on which the virtual CPU is
> + * running does support Intel 64, XRSTORS/XSAVES in the
> + * guest could observe behavior that violates the
> + * architecture. Intercepting XRSTORS/XSAVES for this
> + * special case isn't deemed worthwhile.
> + */
> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> + return KVM_MSR_RET_UNSUPPORTED;
> + /*
> + * MSR_IA32_INT_SSP_TAB is not present on processors that do
> + * not support Intel 64 architecture.
> + */
> + if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
> + return KVM_MSR_RET_UNSUPPORTED;
> + if (is_noncanonical_msr_address(data, vcpu))
> + return 1;
> + /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
> + if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
> + return 1;
> + break;
> }
>
> msr.data = data;
> @@ -1930,6 +1968,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
> return 1;
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_S_CET:
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> + !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
> + return KVM_MSR_RET_UNSUPPORTED;
> + break;
> + case MSR_KVM_INTERNAL_GUEST_SSP:
> + if (!host_initiated)
> + return 1;
> + fallthrough;
> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
> + return KVM_MSR_RET_UNSUPPORTED;
> + break;
> }
>
> msr.index = index;
> @@ -4220,6 +4272,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.guest_fpu.xfd_err = data;
> break;
> #endif
> + case MSR_IA32_U_CET:
> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> + kvm_set_xstate_msr(vcpu, msr_info);
> + break;
> default:
> if (kvm_pmu_is_valid_msr(vcpu, msr))
> return kvm_pmu_set_msr(vcpu, msr_info);
> @@ -4569,6 +4625,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data = vcpu->arch.guest_fpu.xfd_err;
> break;
> #endif
> + case MSR_IA32_U_CET:
> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> + kvm_get_xstate_msr(vcpu, msr_info);
> + break;
> default:
> if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
> return kvm_pmu_get_msr(vcpu, msr_info);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index cf4f73a95825..95d2a82a4674 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -735,4 +735,27 @@ static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
> kvm_fpu_put();
> }
>
> +#define CET_US_RESERVED_BITS GENMASK(9, 6)
> +#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
> +#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
> +#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
> +
> +static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
> +{
> + if (data & CET_US_RESERVED_BITS)
> + return false;
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> + (data & CET_US_SHSTK_MASK_BITS))
> + return false;
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
> + (data & CET_US_IBT_MASK_BITS))
> + return false;
> + if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
> + return false;
> + /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
> + if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
> + return false;
> +
> + return true;
> +}
> #endif
* Re: [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs
2025-09-11 8:05 ` Xiaoyao Li
@ 2025-09-11 9:02 ` Chao Gao
2025-09-11 20:24 ` Sean Christopherson
0 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-11 9:02 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On Thu, Sep 11, 2025 at 04:05:23PM +0800, Xiaoyao Li wrote:
>On 9/9/2025 5:39 PM, Chao Gao wrote:
>> From: Yang Weijiang <weijiang.yang@intel.com>
>>
>> Add an emulation interface for CET MSR accesses. The emulation code is split
>> into a common part and a vendor-specific part. The former does common checks
>> for MSRs, e.g., accessibility, data validity etc., then forwards the operation
>> either to the XSAVE-managed MSRs via the helpers or to the CET VMCS fields.
>
>I planned to continue the review after Sean posts v15 as he promised, but I
>want to raise my question sooner, so I'm asking it on v14.
>
>Do we expect to always put the accessibility and data validity checks in
>__kvm_{g,s}et_msr() when the handling cannot live in kvm_{g,s}et_msr_common()
>alone? I.e., there will be three cases:
For checks that are shared between VMX/SVM, I think yes; I see no other
sensible choice, as the other options just cause code duplication. For checks
that are not common, we have to put them into vendor code.
>
>- All the handling in kvm_{g,s}et_msr_common(), when the MSR emulation is
>common to vmx and svm.
>
>- Generic accessibility and data validity checks in __kvm_{g,s}et_msr() and
>vendor-specific handling in {vmx,svm}_{g,s}et_msr().
>
>- Generic accessibility and data validity checks in __kvm_{g,s}et_msr(),
>vendor-specific handling in {vmx,svm}_{g,s}et_msr(), and other generic
>handling in kvm_{g,s}et_msr_common().
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-09 9:39 ` [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
@ 2025-09-11 9:18 ` Xiaoyao Li
2025-09-11 10:42 ` Chao Gao
2025-09-12 14:42 ` Sean Christopherson
1 sibling, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-11 9:18 UTC (permalink / raw)
To: Chao Gao, kvm, linux-kernel
Cc: acme, bp, dave.hansen, hpa, john.allen, mingo, mingo, minipli,
mlevitsk, namhyung, pbonzini, prsampat, rick.p.edgecombe, seanjc,
shuah, tglx, weijiang.yang, x86, xin
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Don't emulate branch instructions, e.g., CALL/RET/JMP etc., when CET
> is active in the guest; return KVM_INTERNAL_ERROR_EMULATION to userspace
> to handle it.
>
> KVM doesn't emulate CPU behaviors to enforce CET protections while
> emulating guest instructions; instead, it stops emulation upon detecting
> that the instruction being processed is CET protected. By doing so, it
> avoids generating bogus #CPs in the guest and prevents CET-protected
> execution flows from being subverted from the guest side.
>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kvm/emulate.c | 46 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 542d3664afa3..97a4d1e69583 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -178,6 +178,8 @@
> #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
> #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
> #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
> +#define ShadowStack ((u64)1 << 57) /* Instruction protected by Shadow Stack. */
> +#define IndirBrnTrk ((u64)1 << 58) /* Instruction protected by IBT. */
>
> #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
>
> @@ -4068,9 +4070,11 @@ static const struct opcode group4[] = {
> static const struct opcode group5[] = {
> F(DstMem | SrcNone | Lock, em_inc),
> F(DstMem | SrcNone | Lock, em_dec),
> - I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
> - I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> - I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
> + I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk,
> + em_call_near_abs),
> + I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk,
> + em_call_far),
> + I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
> I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
> I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
> };
> @@ -4332,11 +4336,11 @@ static const struct opcode opcode_table[256] = {
> /* 0xC8 - 0xCF */
> I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
> I(Stack | IsBranch, em_leave),
> - I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
> - I(ImplicitOps | IsBranch, em_ret_far),
> - D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
> + I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
> + I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
> + D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
> D(ImplicitOps | No64 | IsBranch),
> - II(ImplicitOps | IsBranch, em_iret, iret),
> + II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
> /* 0xD0 - 0xD7 */
> G(Src2One | ByteOp, group2), G(Src2One, group2),
> G(Src2CL | ByteOp, group2), G(Src2CL, group2),
> @@ -4352,7 +4356,7 @@ static const struct opcode opcode_table[256] = {
> I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
> I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
> /* 0xE8 - 0xEF */
> - I(SrcImm | NearBranch | IsBranch, em_call),
> + I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
> D(SrcImm | ImplicitOps | NearBranch | IsBranch),
> I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
> D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
> @@ -4371,7 +4375,8 @@ static const struct opcode opcode_table[256] = {
> static const struct opcode twobyte_table[256] = {
> /* 0x00 - 0x0F */
> G(0, group6), GD(0, &group7), N, N,
> - N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
> + N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
> + em_syscall),
> II(ImplicitOps | Priv, em_clts, clts), N,
> DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
> N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
> @@ -4402,8 +4407,9 @@ static const struct opcode twobyte_table[256] = {
> IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
> II(ImplicitOps | Priv, em_rdmsr, rdmsr),
> IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
> - I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
> - I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
> + I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | IndirBrnTrk,
> + em_sysenter),
> + I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
> N, N,
> N, N, N, N, N, N, N, N,
> /* 0x40 - 0x4F */
> @@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> if (ctxt->d == 0)
> return EMULATION_FAILED;
>
> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> + u64 u_cet, s_cet;
> + bool stop_em;
> +
> + if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
> + ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
> + return EMULATION_FAILED;
> +
> + stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
> + (opcode.flags & ShadowStack);
> +
> + stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
> + (opcode.flags & IndirBrnTrk);
Why not check CPL here? Just for simplicity?
> + if (stop_em)
> + return EMULATION_FAILED;
> + }
> +
> ctxt->execute = opcode.u.execute;
>
> if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-11 9:18 ` Xiaoyao Li
@ 2025-09-11 10:42 ` Chao Gao
2025-09-12 6:23 ` Xiaoyao Li
0 siblings, 1 reply; 53+ messages in thread
From: Chao Gao @ 2025-09-11 10:42 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
>> @@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>> if (ctxt->d == 0)
>> return EMULATION_FAILED;
>> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
>> + u64 u_cet, s_cet;
>> + bool stop_em;
>> +
>> + if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
>> + ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
>> + return EMULATION_FAILED;
>> +
>> + stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
>> + (opcode.flags & ShadowStack);
>> +
>> + stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
>> + (opcode.flags & IndirBrnTrk);
>
>Why not check CPL here? Just for simplicity?
I think so. This is a corner case and we don't want to make it very precise
(and thus complex). The reason is that no one had a strong opinion on whether
to do the CPL check or not. I asked the same question before [*], but I don't
have a strong opinion on this either.
[*]: https://lore.kernel.org/kvm/ZaSQn7RCRTaBK1bc@chao-email/
* Re: [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs
2025-09-11 9:02 ` Chao Gao
@ 2025-09-11 20:24 ` Sean Christopherson
0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-11 20:24 UTC (permalink / raw)
To: Chao Gao
Cc: Xiaoyao Li, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
On Thu, Sep 11, 2025, Chao Gao wrote:
> On Thu, Sep 11, 2025 at 04:05:23PM +0800, Xiaoyao Li wrote:
> >On 9/9/2025 5:39 PM, Chao Gao wrote:
> >> From: Yang Weijiang <weijiang.yang@intel.com>
> >>
> >> Add an emulation interface for CET MSR accesses. The emulation code is split
> >> into a common part and a vendor-specific part. The former does common checks
> >> for MSRs, e.g., accessibility, data validity etc., then forwards the operation
> >> either to the XSAVE-managed MSRs via the helpers or to the CET VMCS fields.
> >
> >I planned to continue the review after Sean posts v15 as he promised, but I
> >want to raise my question sooner, so I'm asking it on v14.
> >
> >Do we expect to always put the accessibility and data validity checks in
> >__kvm_{g,s}et_msr() when the handling cannot live in kvm_{g,s}et_msr_common()
> >alone? I.e., there will be three cases:
>
> For checks that are shared between VMX/SVM, I think yes; I see no other
> sensible choice, as the other options just cause code duplication.
+1. Put as much emulation as possible into x86.c, e.g. validity checks, state
tracking, etc. Ideally, the only thing in vendor code is vendor specific, e.g.
checks that are unique such as CR4.VMXE interactions, and propagation to/from
the VMCS/VMCB. See also https://lore.kernel.org/all/aLDm9YID-r5WWcD9@google.com.
> For checks that are not common, we have to put them into vendor code.
>
> >
> >- All the handling in kvm_{g,s}et_msr_common(), when the MSR emulation is
> >common to vmx and svm.
> >
> >- Generic accessibility and data validity checks in __kvm_{g,s}et_msr() and
> >vendor-specific handling in {vmx,svm}_{g,s}et_msr().
> >
> >- Generic accessibility and data validity checks in __kvm_{g,s}et_msr(),
> >vendor-specific handling in {vmx,svm}_{g,s}et_msr(), and other generic
> >handling in kvm_{g,s}et_msr_common().
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-11 10:42 ` Chao Gao
@ 2025-09-12 6:23 ` Xiaoyao Li
2025-09-12 14:37 ` Sean Christopherson
0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2025-09-12 6:23 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, seanjc, shuah, tglx, weijiang.yang, x86, xin
On 9/11/2025 6:42 PM, Chao Gao wrote:
>>> @@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>>> if (ctxt->d == 0)
>>> return EMULATION_FAILED;
>>> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
>>> + u64 u_cet, s_cet;
>>> + bool stop_em;
>>> +
>>> + if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
>>> + ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
>>> + return EMULATION_FAILED;
>>> +
>>> + stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
>>> + (opcode.flags & ShadowStack);
>>> +
>>> + stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
>>> + (opcode.flags & IndirBrnTrk);
>>
>> Why not check CPL here? Just for simplicity?
>
> I think so. This is a corner case and we don't want to make it very precise
> (and thus complex). The reason is that no one had a strong opinion on whether
> to do the CPL check or not. I asked the same question before [*], but I don't
> have a strong opinion on this either.
I'm OK with it.
But I think we should at least mention it in the changelog, so that people
digging through the history will know the CPL check was skipped intentionally
and that the maintainers were OK with it when the patch was merged.
> [*]: https://lore.kernel.org/kvm/ZaSQn7RCRTaBK1bc@chao-email/
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-12 6:23 ` Xiaoyao Li
@ 2025-09-12 14:37 ` Sean Christopherson
2025-09-12 15:11 ` Sean Christopherson
0 siblings, 1 reply; 53+ messages in thread
From: Sean Christopherson @ 2025-09-12 14:37 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Chao Gao, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
On Fri, Sep 12, 2025, Xiaoyao Li wrote:
> On 9/11/2025 6:42 PM, Chao Gao wrote:
> > > > @@ -4941,6 +4947,24 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> > > > if (ctxt->d == 0)
> > > > return EMULATION_FAILED;
> > > > + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> > > > + u64 u_cet, s_cet;
> > > > + bool stop_em;
> > > > +
> > > > + if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
> > > > + ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
> > > > + return EMULATION_FAILED;
> > > > +
> > > > + stop_em = ((u_cet & CET_SHSTK_EN) || (s_cet & CET_SHSTK_EN)) &&
> > > > + (opcode.flags & ShadowStack);
> > > > +
> > > > + stop_em |= ((u_cet & CET_ENDBR_EN) || (s_cet & CET_ENDBR_EN)) &&
> > > > + (opcode.flags & IndirBrnTrk);
> > >
> > > Why not check CPL here? Just for simplicity?
> >
> > I think so. This is a corner case and we don't want to make it very precise
Checking CPL here would not make the code more complex, e.g. naively it could be
something like:
u64 cet;
int r;
if (ctxt->ops->cpl(ctxt) == 3)
r = ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &cet);
else
r = ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &cet);
if (r)
return EMULATION_FAILED;
if (cet & CET_SHSTK_EN && opcode.flags & ShadowStack)
return EMULATION_FAILED;
if (cet & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
return EMULATION_FAILED;
> > (and thus complex). The reason is that no one had a strong opinion on whether
> > to do the CPL check or not. I asked the same question before [*], but I don't
> > have a strong opinion on this either.
>
> I'm OK with it.
I have a strong opinion. :-)
KVM must NOT check CPL, because inter-privilege level transfers could trigger
CET emulation at both levels. E.g. a FAR CALL will be affected by both shadow
stacks and IBT at the target privilege level.
So this needs more than just a changelog blurb, it needs a comment. The code
can also be cleaned up and optimized. Reading CR4 and two MSRs (via indirect
calls, i.e. potential retpolines) is wasteful for the vast majority of instructions,
and gathering "stop emulation" into a local variable when a positive test is fatal
is pointless.
/*
* Reject emulation if KVM might need to emulate shadow stack updates
* and/or indirect branch tracking enforcement, which the emulator
* doesn't support. Deliberately don't check CPL as inter-privilege
* level transfers can trigger emulation at both privilege levels, and
* the expectation is that the guest will not require emulation of any
* CET-affected instructions at any privilege level.
*/
if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
u64 u_cet, s_cet;
if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
return EMULATION_FAILED;
if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
return EMULATION_FAILED;
if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
return EMULATION_FAILED;
}
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-09 9:39 ` [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
2025-09-11 9:18 ` Xiaoyao Li
@ 2025-09-12 14:42 ` Sean Christopherson
1 sibling, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-12 14:42 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> @@ -4068,9 +4070,11 @@ static const struct opcode group4[] = {
> static const struct opcode group5[] = {
> F(DstMem | SrcNone | Lock, em_inc),
> F(DstMem | SrcNone | Lock, em_dec),
> - I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
> - I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> - I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
> + I(SrcMem | NearBranch | IsBranch | ShadowStack | IndirBrnTrk,
> + em_call_near_abs),
Argh, these wraps are killing me. I spent a good 20 seconds staring at the code
trying to figure out which instructions are affected. There's definitely a bit
of -ENOCOFFEE going on, but there's also zero reason to wrap.
> + I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack | IndirBrnTrk,
> + em_call_far),
> + I(SrcMem | NearBranch | IsBranch | IndirBrnTrk, em_jmp_abs),
> I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
> I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
> };
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-12 14:37 ` Sean Christopherson
@ 2025-09-12 15:11 ` Sean Christopherson
2025-09-16 14:42 ` Chao Gao
0 siblings, 1 reply; 53+ messages in thread
From: Sean Christopherson @ 2025-09-12 15:11 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Chao Gao, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
On Fri, Sep 12, 2025, Sean Christopherson wrote:
> On Fri, Sep 12, 2025, Xiaoyao Li wrote:
> > On 9/11/2025 6:42 PM, Chao Gao wrote:
> > > (and thus complex). The reason is that no one had a strong opinion on whether
> > > to do the CPL check or not. I asked the same question before [*], but I don't
> > > have a strong opinion on this either.
> >
> > I'm OK with it.
>
> I have a strong opinion. :-)
>
> KVM must NOT check CPL, because inter-privilege level transfers could trigger
> CET emulation at both levels. E.g. a FAR CALL will be affected by both shadow
> stacks and IBT at the target privilege level.
>
> So this needs more than just a changelog blurb, it needs a comment. The code
> can also be cleaned up and optimized. Reading CR4 and two MSRs (via indirect
> calls, i.e. potential retpolines) is wasteful for the vast majority of instructions,
> and gathering "stop emulation" into a local variable when a positive test is fatal
> is pointless.
>
> /*
> * Reject emulation if KVM might need to emulate shadow stack updates
> * and/or indirect branch tracking enforcement, which the emulator
> * doesn't support. Deliberately don't check CPL as inter-privilege
> * level transfers can trigger emulation at both privilege levels, and
> * the expectation is that the guest will not require emulation of any
> * CET-affected instructions at any privilege level.
> */
> if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
> ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> u64 u_cet, s_cet;
>
> if (ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet) ||
> ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))
> return EMULATION_FAILED;
>
> if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
> return EMULATION_FAILED;
>
> if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
> return EMULATION_FAILED;
> }
On second thought, I think it's worth doing the CPL checks. Explaining why KVM
doesn't bother with checking privilege level is more work than just writing the
code.
/*
* Reject emulation if KVM might need to emulate shadow stack updates
* and/or indirect branch tracking enforcement, which the emulator
* doesn't support.
*/
if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
u64 u_cet = 0, s_cet = 0;
/*
* Check both User and Supervisor on far transfers as inter-
* privilege level transfers are impacted by CET at the target
* privilege levels, and that is not known at this time. The
* expectation is that the guest will not require emulation
* of any CET-affected instructions at any privilege level.
*/
if (!(opcode.flags & NearBranch)) {
u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
} else if (ctxt->ops->cpl(ctxt) == 3) {
u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
} else {
s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
}
if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
(s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
return EMULATION_FAILED;
if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
return EMULATION_FAILED;
if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
return EMULATION_FAILED;
}
Side topic, has anyone actually tested that this works? I.e. that attempts to
emulate CET-affected instructions result in emulation failure? I'd love to have
a selftest for this (hint, hint), but presumably writing one is non-trivial due
to the need to get the selftest compiled with the necessary annotations, setup,
and whatnot.
* Re: [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields
2025-09-09 9:39 ` [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
@ 2025-09-12 22:04 ` Sean Christopherson
0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2025-09-12 22:04 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-kernel, acme, bp, dave.hansen, hpa, john.allen, mingo,
mingo, minipli, mlevitsk, namhyung, pbonzini, prsampat,
rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin,
xiaoyao.li
On Tue, Sep 09, 2025, Chao Gao wrote:
> void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 79861b7ad44d..d67aef261638 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9890,6 +9890,18 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> return -EIO;
> }
>
> + if (boot_cpu_has(X86_FEATURE_SHSTK)) {
This needs to check for "|| IBT"
> + rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
> + /*
> + * Linux doesn't yet support supervisor shadow stacks (SSS), so
> + * KVM doesn't save/restore the associated MSRs, i.e. KVM may
> + * clobber the host values. Yell and refuse to load if SSS is
> + * unexpectedly enabled, e.g. to avoid crashing the host.
> + */
> + if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
> + return -EIO;
> + }
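I.e., something like this (an illustrative sketch of the fix, not a posted
patch):

        /* MSR_IA32_S_CET exists if either SHSTK or IBT is supported. */
        if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT)) {
                rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
                /*
                 * Linux doesn't yet support supervisor shadow stacks (SSS),
                 * so yell and refuse to load if SSS is unexpectedly enabled.
                 */
                if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
                        return -EIO;
        }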
* Re: [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET
2025-09-12 15:11 ` Sean Christopherson
@ 2025-09-16 14:42 ` Chao Gao
0 siblings, 0 replies; 53+ messages in thread
From: Chao Gao @ 2025-09-16 14:42 UTC (permalink / raw)
To: Sean Christopherson
Cc: Xiaoyao Li, kvm, linux-kernel, acme, bp, dave.hansen, hpa,
john.allen, mingo, mingo, minipli, mlevitsk, namhyung, pbonzini,
prsampat, rick.p.edgecombe, shuah, tglx, weijiang.yang, x86, xin
>On second thought, I think it's worth doing the CPL checks. Explaining why KVM
>doesn't bother with checking privilege level is more work than just writing the
>code.
>
> /*
> * Reject emulation if KVM might need to emulate shadow stack updates
> * and/or indirect branch tracking enforcement, which the emulator
> * doesn't support.
> */
> if (opcode.flags & (ShadowStack | IndirBrnTrk) &&
> ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> u64 u_cet = 0, s_cet = 0;
>
> /*
> * Check both User and Supervisor on far transfers as inter-
> * privilege level transfers are impacted by CET at the target
> * privilege levels, and that is not known at this time. The
> * expectation is that the guest will not require emulation
> * of any CET-affected instructions at any privilege level.
> */
> if (!(opcode.flags & NearBranch)) {
> u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> } else if (ctxt->ops->cpl(ctxt) == 3) {
> u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> } else {
> s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> }
>
> if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
> (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
> return EMULATION_FAILED;
>
> if ((u_cet | s_cet) & CET_SHSTK_EN && opcode.flags & ShadowStack)
> return EMULATION_FAILED;
>
> if ((u_cet | s_cet) & CET_ENDBR_EN && opcode.flags & IndirBrnTrk)
> return EMULATION_FAILED;
> }
>
>Side topic, has anyone actually tested that this works? I.e. that attempts to
>emulate CET-affected instructions result in emulation failure?
I haven't. :(
>I'd love to have
>a selftest for this (hint, hint), but presumably writing one is non-trivial due
>to the need to get the selftest compiled with the necessary annotations, setup,
>and whatnot.
Sure. I'll try to write a selftest for this, but I'm unsure about its
complexity. Can you clarify what you mean by "necessary annotations,
setup..."? It seems to me that some simple assembly code, like
test_em_rdmsr(), should work.
For now, I plan to do a quick test by tweaking KUT's cet.c to force
emulation of CET-affected instructions.
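Roughly along these lines (an untested, hypothetical sketch: test_fep_call()
is an invented name, and it assumes KVM is loaded with
kvm.force_emulation_prefix=1 so that the ud2 + 'kvm' prefix forces KVM to
emulate the next instruction, the same trick KUT's x86/emulator.c uses):

        /* Forced-emulation prefix recognized when force_emulation_prefix=1. */
        #define KVM_FEP "ud2; .byte 'k', 'v', 'm';"

        static void test_fep_call(void)
        {
                /*
                 * With CR4.CET and U_CET.SH_STK_EN enabled by cet.c's existing
                 * setup, force emulation of a near CALL. KVM is expected to
                 * refuse and exit to userspace with
                 * KVM_INTERNAL_ERROR_EMULATION; reaching the report below
                 * means the CALL was (wrongly) emulated.
                 */
                asm volatile(KVM_FEP "call 1f\n\t"
                             "1: addq $8, %%rsp" ::: "memory");
                report_fail("emulated a shadow-stack-protected near CALL");
        }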
end of thread, other threads:[~2025-09-16 14:43 UTC | newest]
Thread overview: 53+ messages
2025-09-09 9:39 [PATCH v14 00/22] Enable CET Virtualization Chao Gao
2025-09-09 9:39 ` [PATCH v14 01/22] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Chao Gao
2025-09-10 9:03 ` Xiaoyao Li
2025-09-10 17:17 ` Sean Christopherson
2025-09-10 17:35 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 02/22] KVM: x86: Report XSS as to-be-saved if there are supported features Chao Gao
2025-09-11 6:52 ` Binbin Wu
2025-09-09 9:39 ` [PATCH v14 03/22] KVM: x86: Check XSS validity against guest CPUIDs Chao Gao
2025-09-10 9:22 ` Xiaoyao Li
2025-09-10 11:33 ` Chao Gao
2025-09-10 18:47 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 04/22] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Chao Gao
2025-09-10 9:23 ` Xiaoyao Li
2025-09-11 7:02 ` Binbin Wu
2025-09-09 9:39 ` [PATCH v14 05/22] KVM: x86: Initialize kvm_caps.supported_xss Chao Gao
2025-09-10 9:36 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 06/22] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Chao Gao
2025-09-10 9:37 ` Xiaoyao Li
2025-09-10 11:18 ` Chao Gao
2025-09-10 13:46 ` Xiaoyao Li
2025-09-10 15:24 ` Chao Gao
2025-09-10 17:50 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 07/22] KVM: x86: Add fault checks for guest CR4.CET setting Chao Gao
2025-09-10 9:38 ` Xiaoyao Li
2025-09-09 9:39 ` [PATCH v14 08/22] KVM: x86: Report KVM supported CET MSRs as to-be-saved Chao Gao
2025-09-09 9:39 ` [PATCH v14 09/22] KVM: VMX: Introduce CET VMCS fields and control bits Chao Gao
2025-09-09 9:39 ` [PATCH v14 10/22] KVM: x86: Enable guest SSP read/write interface with new uAPIs Chao Gao
2025-09-09 9:39 ` [PATCH v14 11/22] KVM: VMX: Emulate read and write to CET MSRs Chao Gao
2025-09-11 8:05 ` Xiaoyao Li
2025-09-11 9:02 ` Chao Gao
2025-09-11 20:24 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 12/22] KVM: x86: Save and reload SSP to/from SMRAM Chao Gao
2025-09-09 9:39 ` [PATCH v14 13/22] KVM: VMX: Set up interception for CET MSRs Chao Gao
2025-09-09 9:39 ` [PATCH v14 14/22] KVM: VMX: Set host constant supervisor states to VMCS fields Chao Gao
2025-09-12 22:04 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 15/22] KVM: x86: Don't emulate instructions guarded by CET Chao Gao
2025-09-11 9:18 ` Xiaoyao Li
2025-09-11 10:42 ` Chao Gao
2025-09-12 6:23 ` Xiaoyao Li
2025-09-12 14:37 ` Sean Christopherson
2025-09-12 15:11 ` Sean Christopherson
2025-09-16 14:42 ` Chao Gao
2025-09-12 14:42 ` Sean Christopherson
2025-09-09 9:39 ` [PATCH v14 16/22] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Chao Gao
2025-09-09 9:39 ` [PATCH v14 17/22] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Chao Gao
2025-09-09 9:39 ` [PATCH v14 18/22] KVM: nVMX: Prepare for enabling CET support for nested guest Chao Gao
2025-09-09 9:39 ` [PATCH v14 19/22] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Chao Gao
2025-09-09 9:39 ` [PATCH v14 20/22] KVM: nVMX: Add consistency checks for CET states Chao Gao
2025-09-09 9:39 ` [PATCH v14 21/22] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Chao Gao
2025-09-09 9:39 ` [PATCH v14 22/22] KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG Chao Gao
2025-09-10 18:06 ` Sean Christopherson
2025-09-09 9:52 ` [PATCH v14 00/22] Enable CET Virtualization Chao Gao
2025-09-10 18:29 ` Sean Christopherson